This page shows a real-world example of using emacs regex to update the HTML image tags in all files of a directory. It uses emacs's find/replace commands, with the replacement string computed by an elisp expression from the image's file name. You should be familiar with Elisp Language Basics.
I need to set a descriptive alt="image description" attribute on all image tags in all HTML files in a directory. The alt value should be based on the image's file name.
Technically, this page shows how to do find/replace on all files in a dir using emacs regex, with an elisp expression generating the replacement string.
I have many HTML files in a dir. Many have an image tag like this:
<img src="paraboloid.png" alt="math surface" width="832" height="513">
Note that their alt values are all just “math surface”. I want the alt value to be more descriptive, based on the file name. So, in this example, it should be alt="paraboloid".
All these files are inside one dir, most of them in various subdirs. There are about 100 files, and maybe 50 of them have alt="math surface" that needs to be fixed.
The simplest solution is to use a regex with a custom replacement function. (The method described here also works if your image tags have no alt= and you need to add it.)
The solution is quite simple. To do regexp replace on a bunch of files, you can use the built-in command
dired-do-query-replace-regexp.
〔☛ Emacs: Interactively Find/Replace String Patterns on Multiple Files〕
So, all we have to do is go to dired, call that command, give the find string and the replacement string, and we are done.
Since the files are in different subdirectories, I call find-dired first. find-dired runs find and lists the matching files from all subdirectories in a single dired buffer.
So, I type 【Alt+x find-dired】, give the dir name, then give the find arguments -name "*html". The result is a listing of all HTML files in that dir and its subdirs.
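If you want to check from lisp which files that find-dired call will list, directory-files-recursively (emacs 25 or later) does a similar recursive search. Below is a minimal sketch. my-html-files-under is a hypothetical name, and the regex html\' is my approximation of find's -name "*html" (file names ending in html).

```elisp
(defun my-html-files-under (dir)
  "Return the absolute names of all files under DIR whose names end in \"html\"."
  ;; directory-files-recursively walks DIR and all its subdirs, matching
  ;; the regexp against each file's name without its directory part.
  ;; "html\\'" means "html" at the very end of the name, like -name "*html".
  (directory-files-recursively dir "html\\'"))
```

For example, (length (my-html-files-under "~/web/surfaces/")) would count them, where the directory is only a placeholder.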
Then I mark the files I want by typing 【% m】, which invokes dired-mark-files-regexp, and give the pattern \.html, which marks all HTML files.
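A regex with no anchors can match anywhere in a file name, so it's worth trying a marking pattern on a few sample names first. This sketch runs string-match-p over a hand-made list (my-names-matching and the sample names are made up for illustration). Note that \.html also matches a backup name like index.html~, while \.html\' only matches names ending in .html. In a find-dired listing made with -name "*html", such backups aren't listed anyway.

```elisp
(defun my-names-matching (regexp names)
  "Return the members of NAMES that REGEXP matches, in their original order."
  (let (result)
    (dolist (name names (nreverse result))
      ;; string-match-p searches anywhere in NAME and doesn't touch match data.
      (when (string-match-p regexp name)
        (push name result)))))

(my-names-matching "\\.html" '("paraboloid.html" "index.html~" "page.xhtml" "notes.txt"))
;; returns ("paraboloid.html" "index.html~")

(my-names-matching "\\.html\\'" '("paraboloid.html" "index.html~" "page.xhtml" "notes.txt"))
;; returns ("paraboloid.html")
```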
The next step is to give the regex search pattern. This is simple:
<img src="\([^"]+?\)" alt="math surface" width="\([0-9]+\)" height="\([0-9]+\)">
〔☛ Text Pattern Matching in Emacs (emacs regex tutorial)〕
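To experiment with that pattern from lisp (e.g. in the *scratch* buffer), you have to write it as a lisp string: every backslash is doubled and every double quote is escaped. At the interactive prompt you type the single-backslash form shown above. Here is a small sketch that checks what the three groups capture on the sample tag (my-img-regexp is a hypothetical name):

```elisp
;; The search pattern from above as a lisp string literal.
(defconst my-img-regexp
  "<img src=\"\\([^\"]+?\\)\" alt=\"math surface\" width=\"\\([0-9]+\\)\" height=\"\\([0-9]+\\)\">"
  "Regexp matching an img tag whose alt is \"math surface\".")

(let ((tag "<img src=\"paraboloid.png\" alt=\"math surface\" width=\"832\" height=\"513\">"))
  (when (string-match my-img-regexp tag)
    ;; Group 1 is the src value, groups 2 and 3 are width and height.
    (list (match-string 1 tag) (match-string 2 tag) (match-string 3 tag))))
;; returns ("paraboloid.png" "832" "513")
```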
Since emacs 22, you can give an elisp expression as the replacement, by using this syntax when prompted for the replacement string: \,myLispExpression, where “myLispExpression” is lisp code. Emacs evaluates it at each match and uses its value as the replacement text.
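Outside the interactive prompt, a close analogue is to give replace-regexp-in-string a function as its replacement argument. The function receives each matched text, and while it runs, the match data refer to that text, so (match-string 1 m) returns the first group. A minimal sketch, assuming my-fix-alt as a hypothetical name. It uses file-name-sans-extension rather than stripping .png with a regex:

```elisp
(defun my-fix-alt (html)
  "Return HTML with each alt=\"math surface\" replaced by an alt made from the img src."
  (replace-regexp-in-string
   "<img src=\"\\([^\"]+?\\)\" alt=\"math surface\""
   (lambda (m)
     ;; M is the text of one match; the match data now describe M,
     ;; so group 1 (the src value) is (match-string 1 m).
     (let ((src (match-string 1 m)))
       (concat "<img src=\"" src "\" alt=\"" (file-name-sans-extension src) "\"")))
   html
   t    ; FIXEDCASE: don't change the case of the replacement
   t))  ; LITERAL: use the function's result as-is, no \N processing
```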
The heart of this task is to write the elisp expression that gives us the replacement string, where the alt part is a transformed version of the image file name. This is surprisingly simple too. Here's the lisp expression we need:
(concat "<img src=\"" (match-string 1) "\" alt=\"" (replace-regexp-in-string "\\.png\\'" "" (replace-regexp-in-string "_" " " (match-string 1))) "\" width=\"" (match-string 2) "\" height=\"" (match-string 3) "\">")
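Before running it over many files, you can preview what the expression produces for one tag. Inside a \, expression, (match-string N) reads the match from the buffer being searched, so this sketch reproduces that with re-search-forward in a temporary buffer. my-preview-replacement is a hypothetical name, and the expression is the one above with the dot escaped (\\.png\\') so it only matches a literal ".png" at the end.

```elisp
(defun my-preview-replacement (tag)
  "Return what the \\, expression would produce for TAG, or nil if TAG doesn't match."
  (with-temp-buffer
    (insert tag)
    (goto-char (point-min))
    ;; The search pattern from above, written as a lisp string.
    (when (re-search-forward
           "<img src=\"\\([^\"]+?\\)\" alt=\"math surface\" width=\"\\([0-9]+\\)\" height=\"\\([0-9]+\\)\">"
           nil t)
      ;; (match-string N) reads from this buffer.  replace-regexp-in-string
      ;; saves the match data, so the later (match-string 2) and
      ;; (match-string 3) still see the img match.
      (concat "<img src=\"" (match-string 1)
              "\" alt=\"" (replace-regexp-in-string
                           "\\.png\\'" ""
                           (replace-regexp-in-string "_" " " (match-string 1)))
              "\" width=\"" (match-string 2)
              "\" height=\"" (match-string 3) "\">"))))
```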
match-string simply gives us the captured groups. The interesting part is the replace-regexp-in-string calls that generate the alt value: first we replace each “_” with a space, then we delete the “.png”. That's all there is to it.
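If your images aren't all .png, you could wrap the same two-step transform in a named function. This is a sketch under that assumption. my-alt-from-filename is a hypothetical name, and file-name-sans-extension drops whatever the final extension is:

```elisp
(defun my-alt-from-filename (file)
  "Turn an image FILE name like \"surface_of_revolution.png\" into alt text."
  (replace-regexp-in-string
   "_" " "
   ;; Drop any directory part, then the final extension (.png, .jpg, ...).
   (file-name-sans-extension (file-name-nondirectory file))))
```

With it evaluated, a replacement like <img src="\1" alt="\,(my-alt-from-filename \1)" width="\2" height="\3"> should work at the prompt, because emacs lets \, appear in the middle of a replacement and treats \1 inside the expression as (match-string 1).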
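Finally, we call dired-do-query-replace-regexp in the dired buffer, giving it the search pattern above and, as the replacement, \, followed by the lisp expression. In older emacs versions its hotkey is Q. In emacs 25 and later, Q runs dired-do-find-regexp-and-replace instead, so type 【Alt+x dired-do-query-replace-regexp】.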
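For comparison, here is a sketch of the same fix applied to one file with no prompting, roughly like answering "!" to every query. It gives up the see-and-do that makes the interactive command safe, and it rewrites FILE in place, so try it on a copy. Assumptions: my-fix-alt-in-file is a hypothetical name, the pattern only covers the src and alt part of the tag, and the files are plain ASCII or UTF-8.

```elisp
(defun my-fix-alt-in-file (file)
  "Rewrite alt=\"math surface\" img tags in FILE in place; return how many were changed."
  (let ((count 0))
    (with-temp-buffer
      (insert-file-contents file)
      (goto-char (point-min))
      (while (re-search-forward "<img src=\"\\([^\"]+?\\)\" alt=\"math surface\"" nil t)
        (let ((src (match-string 1)))
          ;; Keep src; derive alt from the file name minus its extension.
          ;; file-name-sans-extension saves the match data for replace-match.
          (replace-match (concat "<img src=\"" src "\" alt=\""
                                 (file-name-sans-extension src) "\"")
                         t t)  ; fixed case, literal text
          (setq count (1+ count))))
      ;; Only write the file back if something changed.
      (when (> count 0)
        (write-region (point-min) (point-max) file)))
    count))
```

Combined with directory-files-recursively, (dolist (f (directory-files-recursively "~/web/" "html\\'")) (my-fix-alt-in-file f)) would process a whole tree, where "~/web/" is a placeholder.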
Without emacs, the above operation might take an hour or two and would be tedious and error-prone. Even with expertise in Perl or Python scripting, the problem is the lack of interactive see-and-do. With emacs, the whole operation takes less than 5 minutes.
The project that required this task is Gallery of Famous Surfaces.
Suppose you are given a task where hundreds of valid HTML files in a dir need to be converted to valid XHTML. Note that XHTML has a slightly different syntax. For example, all tags such as <p> and <li> now need to be closed. Tags like <img>, <hr>, <br> need to be written as <img … />, <hr/>, <br/>. Also, tags are now case sensitive, so you need to lowercase them. Also, image tags must now be wrapped inside a container tag, such as <div>. The DTD also needs to be changed, and there are many style-oriented tags that need to be transformed.
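Each of these changes can be tried as a separate regex replacement. Here is a sketch of two of them as lisp string transforms (function names hypothetical). It only handles plain <br> and <hr> without attributes and lowercases element names, not attribute names. Interactively, the same patterns would go to query-replace-regexp one at a time, with \,(downcase \&) as the replacement for the lowercasing step.

```elisp
(defun my-close-void-tags (html)
  "Rewrite <br> and <hr> in HTML as <br/> and <hr/>."
  ;; \\1 in the replacement is the tag name captured by \\(br\\|hr\\).
  (replace-regexp-in-string "<\\(br\\|hr\\)>" "<\\1/>" html t))

(defun my-downcase-tag-names (html)
  "Lowercase the element name in every start and end tag of HTML, e.g. <P> to <p>."
  ;; Matches "<" or "</" plus the name; attributes are left alone.
  ;; #'downcase receives each matched text and returns the replacement.
  (replace-regexp-in-string "</?[A-Za-z][A-Za-z0-9]*" #'downcase html t t))
```

Applied in that order, (my-close-void-tags (my-downcase-tag-names "<P>One<BR>two</P>")) gives "<p>One<br/>two</p>".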
This task seems daunting. You could try writing a Perl script that does it in one shot, but it would probably take you days to code correctly, and if your script has a parsing or regex error, it'll delete parts of your files without you knowing it. You could take a trial-and-error approach, running regex replacements experimentally one at a time, but your script still runs in batch: if you make a mistake, you'll have to revert all your files. With mastery of emacs, you can do the above transformation with regex find/replace one step at a time, interactively and safely, cutting the time needed perhaps tenfold.