This page shows an example of using emacs's regex find and replace to update HTML image tags in all files in a directory.
For all HTML image tags of the form:
<img src="paraboloid.png" alt="" width="832" height="513">
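Add a value to the “alt” attribute. The value should be the image file name, but without the file extension. For example, the tag above should become:
<img src="paraboloid.png" alt="paraboloid" width="832" height="513">
This needs to be done for about 100 files inside a directory and its subdirectories.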
Call find-dired, give the directory name, then give
-name "*html" as the find argument. The result is a dired buffer listing all HTML files in that directory and its subdirectories.
Now mark the files you want by typing 【% m】 (dired-mark-files-regexp), then give the pattern
\.html. This marks all HTML files.
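To double-check from Lisp which files are in scope, here is a minimal sketch. It assumes Emacs 25.1 or later (for directory-files-recursively). The function name my-list-html-files and the directory "~/web/" are hypothetical, and the pattern is slightly stricter than the find argument above because it requires a literal ".html" at the end of the name.

```elisp
;; Sketch: list every file whose name ends in ".html" under DIR,
;; including subdirectories.
;; `my-list-html-files' is a hypothetical name, not part of Emacs.
(defun my-list-html-files (dir)
  "Return a list of the .html files in DIR and its subdirectories."
  (directory-files-recursively dir "\\.html\\'"))

;; Example call; "~/web/" is a made-up directory.
;; (my-list-html-files "~/web/")
```

The length of the returned list gives a quick sanity check against the number of files marked in dired.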
To do a regexp replace on all the marked files, call
dired-do-query-replace-regexp. 〔➤ Emacs: Interactively Find ＆ Replace Text in Directory〕
The next job is to give the regex search pattern. This is simple:
<img src="\([^"]+?\)" alt="" width="\([0-9]+\)" height="\([0-9]+\)">
This regex captures the file name, the width and height.
〔➤ Text Pattern Matching in Emacs (emacs regex tutorial)〕
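Before running the pattern on real files, it can help to try it on one sample tag. The sketch below is one way to do that. The names my-img-regex and my-img-parts are hypothetical. Note that inside Lisp source each regex backslash is doubled and each double quote is escaped, which is not needed when typing the pattern in the minibuffer.

```elisp
;; Sketch: try the search pattern on the sample tag from this page.
;; `my-img-regex' and `my-img-parts' are hypothetical names.
(defvar my-img-regex
  "<img src=\"\\([^\"]+?\\)\" alt=\"\" width=\"\\([0-9]+\\)\" height=\"\\([0-9]+\\)\">"
  "The search pattern from this page, written as a Lisp string.")

(defun my-img-parts (tag)
  "Return (FILE WIDTH HEIGHT) captured from TAG, or nil if TAG does not match."
  (save-match-data
    (when (string-match my-img-regex tag)
      (list (match-string 1 tag)
            (match-string 2 tag)
            (match-string 3 tag)))))

;; (my-img-parts "<img src=\"paraboloid.png\" alt=\"\" width=\"832\" height=\"513\">")
;; returns ("paraboloid.png" "832" "513")
```

A tag that already has a non-empty alt value does not match, so it would be left untouched by the replacement.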
Since emacs 22, you can give an elisp expression as the replacement, by using this syntax in the replacement string:
\,‹elisp code›, where ‹elisp code› is a lisp expression.
The heart of this task is writing the elisp expression that gives us the replacement string, where the alt part is the transformed version of the file name. This is surprisingly simple too. Here's the lisp expression we need:
(concat "<img src=\"" (match-string 1) "\" alt=\"" (replace-regexp-in-string "\\.png\\'" "" (replace-regexp-in-string "_" " " (match-string 1))) "\" width=\"" (match-string 2) "\" height=\"" (match-string 3) "\">" )
match-string simply gives us the matched values. The interesting part is the
replace-regexp-in-string calls we use to generate the value for alt. First, we replace each “_” with a space, then we delete the “.png”. The pattern "\\.png\\'" escapes the dot and anchors the match at the end of the string, so only the file extension is removed. That's all there is to it.
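The alt-text transformation can also be pulled out into a small function and tried on its own. This is a sketch, not the expression used on this page: the name my-alt-from-file-name and the second sample file name are hypothetical, and the pattern escapes the dot and anchors at the end of the string so that only a trailing “.png” is removed.

```elisp
;; Sketch: the alt-text transformation as a standalone function.
;; `my-alt-from-file-name' is a hypothetical name.
(defun my-alt-from-file-name (file-name)
  "Return alt text for FILE-NAME.
Underscores become spaces and a trailing \".png\" is dropped."
  (replace-regexp-in-string
   "\\.png\\'" ""
   (replace-regexp-in-string "_" " " file-name)))

;; (my-alt-from-file-name "paraboloid.png")
;; returns "paraboloid"
;; (my-alt-from-file-name "hyperbolic_paraboloid.png")  ; made-up file name
;; returns "hyperbolic paraboloid"
```

With such a function defined, the replacement string typed in the minibuffer could plausibly be shortened to <img src="\1" alt="\,(my-alt-from-file-name \1)" width="\2" height="\3">, since \1 inside a \, expression stands for the first captured group.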
Finally, we call
dired-do-query-replace-regexp in the dired buffer (hotkey is Q). Give the search pattern above, then give \, followed by the lisp expression as the replacement.
〔➤ Emacs: Interactively Find ＆ Replace Text in Directory〕
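To try the same replacement on a scratch buffer before touching real files, the search pattern and the replacement expression can be combined into one function. This is a non-interactive sketch under the same assumptions as the rest of this page (PNG files, empty alt attribute). The name my-fill-img-alt is hypothetical, and it does not ask for confirmation the way the interactive command does.

```elisp
;; Sketch: apply the replacement to every matching tag in the current buffer.
;; `my-fill-img-alt' is a hypothetical name.
(defun my-fill-img-alt ()
  "In the current buffer, fill each empty alt attribute from the image file name."
  (save-excursion
    (goto-char (point-min))
    (while (re-search-forward
            "<img src=\"\\([^\"]+?\\)\" alt=\"\" width=\"\\([0-9]+\\)\" height=\"\\([0-9]+\\)\">"
            nil t)
      ;; `replace-regexp-in-string' preserves the match data, so the
      ;; `match-string' calls below still refer to the tag just found.
      (let ((alt (replace-regexp-in-string
                  "\\.png\\'" ""
                  (replace-regexp-in-string "_" " " (match-string 1)))))
        ;; FIXEDCASE t and LITERAL t: insert the text exactly as built.
        (replace-match
         (concat "<img src=\"" (match-string 1) "\" alt=\"" alt
                 "\" width=\"" (match-string 2)
                 "\" height=\"" (match-string 3) "\">")
         t t)))))
```

Paste a few sample tags into a scratch buffer and run M-: (my-fill-img-alt) to see the result.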
Without emacs, the above operation might take an hour or two, and is tedious and error prone. Even with expertise in Perl or Python scripting, the problem is the lack of interactive see-and-do: a script changes every file in one batch, and you cannot inspect each change as it is made. With emacs, the whole operation takes less than 5 minutes.
Suppose you are given a task where hundreds of valid HTML files in
a directory need to be converted to valid XHTML. Note that XHTML has a
slightly different syntax. For example, all tags such as
<li> now need to be closed, and empty tags such as
<br> need to be written in self-closing form, like
<img … />,
<br/>. Also, tags are now case sensitive, so you need to lower case them. Also, image tags now must be wrapped inside a container tag, such as
<div>. The DTD also needs to be changed, and there are many style oriented tags that need to be transformed.
This task seems daunting. You could try to do it in one shot with a Perl script, but it would probably take you days to code it correctly, and if your script has a parsing or regex error, it will delete parts of your files without you knowing it. You could instead take a trial and error approach, running one experimental regex replacement at a time. Still, your script runs in batch: if you make a mistake, you'll have to revert all your files. With mastery of emacs, you can do the above transform using regex find/replace one by one, interactively and safely, saving you time some tenfold.
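Two of the simpler steps can be sketched as string functions, to show the kind of regex each interactive replacement would use. This is only an illustration under stated assumptions: the function names are hypothetical, the first handles only <br> tags without attributes, and the second lower-cases tag names but not attribute names. A real conversion needs many more steps and a check of each change.

```elisp
;; Sketch of two XHTML conversion steps. Both names are hypothetical.

(defun my-xhtml-close-br (html)
  "Return HTML with each attribute-less <br> (any letter case) rewritten as <br/>."
  (let ((case-fold-search t))           ; match <BR> as well as <br>
    (replace-regexp-in-string "<br *>" "<br/>" html t t)))

(defun my-xhtml-downcase-tag-names (html)
  "Return HTML with the name of every opening and closing tag lower-cased."
  ;; The function given as the replacement receives the matched text,
  ;; for example \"<LI\" or \"</UL\".
  (replace-regexp-in-string "</?[A-Za-z][A-Za-z0-9]*" #'downcase html t t))

;; (my-xhtml-close-br "a<BR>b<br >c")
;; returns "a<br/>b<br/>c"
;; (my-xhtml-downcase-tag-names "<UL><LI>One</LI></UL>")
;; returns "<ul><li>One</li></ul>"
```

In the interactive workflow, the same patterns would be given to dired-do-query-replace-regexp one at a time, so each change can be confirmed or skipped.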