This page shows you how to use an elisp function in your regex replacement. This lets you transform the matched text. For example, you can replace “_” with a space, or insert a timestamp or the current file name into your replacement text.
I want to replace a text pattern. However, the replaced text should be a transformed version of the matched text. For example, if the matched text is “emacs_fun”, the replacement text should be “emacs fun”.
Technically, this page shows you how to write an emacs elisp function that takes the matched text as input and returns a new text, and how to tell emacs to use this function for the replacement string. The replacement text can contain dynamically generated text, such as the current file's name, the current time, etc.
Normally, you can write a Perl or Python script to do find/replace on all files in a dir. 〔☛ Python: Find/Replace by Regex Text Pattern〕 However, this process is not interactive. If you want to decide each replacement case by case, this approach won't work, and once you program interactivity into your script, it ceases to be a trivial job.
Emacs comes to the rescue, because it has several interactive regex Find/Replace commands, either on the current file (using query-replace-regexp) or on a list of files (using dired-do-query-replace-regexp). 〔☛ Interactively Find/Replace String Patterns on Multiple Files〕
However, suppose you want the replacement string to be a transformed version of the matched text. This means that instead of constructing the replacement string using \1, \2, etc., you need to use a function that returns text, using the matched text as input.
I have a website with thousands of HTML files. Together they contain 3276 links to articles at Wikipedia. Because my site was written over the years, the link format is not consistent. For example, a link to the article on Stanislaw Szukalski might have any of these formats:
① <a href="http://en.wikipedia.org/wiki/Stanislaw_Szukalski">Stanislaw Szukalski</a>
② <a href="http://en.wikipedia.org/wiki/Stanislaw_Szukalski">Stanislaw_Szukalski</a>
③ <a href="http://en.wikipedia.org/wiki/Stanislaw_Szukalski">http://en.wikipedia.org/wiki/Stanislaw_Szukalski</a>
I want formats 2 and 3 to be replaced with format 1, but not always. For some pages I want to use the full URL as the link text.
To keep this article simple, let's say I just want to replace format 2 with format 1 on a case-by-case basis.
Here, regex cannot do the job by itself, because I need the underscore char “_” in the matched text to be replaced by a space. This means I need to write a function that takes the matched text and returns the desired text.
Emacs 22 added a feature that lets you use an elisp function as your replacement string. You do this by giving the replacement string the form \,(functionName), where functionName is your elisp function.
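As a small illustration of the idea (not part of the original workflow): the text after \, can be any Lisp expression, and inside it \1 stands for the first captured group, so a replacement such as \,(upcase \1) is also possible. Plain Lisp has an analogue of this: replace-regexp-in-string accepts a function as its replacement argument and calls it with the matched text. The helper name my-demo-upcase-underscored below is hypothetical.

```elisp
;; Hypothetical helper, for illustration only.
;; `replace-regexp-in-string' calls the function given as REP with the
;; text of each match and uses the returned string as the replacement.
(defun my-demo-upcase-underscored (text)
  "Return TEXT with every word_word pair upcased."
  (replace-regexp-in-string
   "[a-z]+_[a-z]+"   ; in Lisp code the regex is an ordinary string
   #'upcase          ; called with the matched text
   text
   t                 ; FIXEDCASE: do not adjust the case of the result
   t))               ; LITERAL: do not treat backslashes in the result specially

(my-demo-upcase-underscored "see emacs_fun here")
;; ⇒ "see EMACS_FUN here"
```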
The task here is to write the replacement function.
Let's say our function is named ff. ff takes no arguments. It reads the matched text from the current regex match, replaces each “_” with a space, then returns the new text.
The function skeleton would then look like this:
(defun ff ()
  "temp function. Returns a string based on current regex match."
  ;; 1. get the matched text
  ;; 2. transform the matched text
  ;; 3. return the transformed text
  )
This is conceptually simple. The hard part is knowing how elisp gets the matched text inside emacs, and how to do text replacement on a string in emacs lisp. Here's the solution:
(defun ff ()
  "temp function. Returns a string based on current regex match."
  (replace-regexp-in-string "_" " " (match-string 1)))
(match-string 1) gives you the first captured string. (“1” is for the 1st captured pattern, “2” for the 2nd captured pattern. “0” is the entire match.) replace-regexp-in-string is used to transform the text.
(To make emacs aware of ff, select the whole definition, then call eval-region.)
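To see these two calls at work outside of a replace command, here is a minimal sketch that matches against a string instead of a buffer. The helper name my-demo-clean-link-text is hypothetical. Two details differ from the interactive case: in Lisp code the regex is a string, so each backslash is doubled, and match-string needs the string as a second argument when the match was made with string-match.

```elisp
;; Hypothetical helper, for illustration only.
(defun my-demo-clean-link-text (html)
  "Return the link text in HTML with underscores turned into spaces.
Return nil if HTML contains no link text of the expected form."
  (when (string-match ">\\([_A-Za-z0-9]+\\)</a>" html)
    ;; Group 1 is the link text. Pass HTML so that `match-string'
    ;; extracts from the string rather than from the current buffer.
    (replace-regexp-in-string "_" " " (match-string 1 html))))

(my-demo-clean-link-text
 "<a href=\"http://en.wikipedia.org/wiki/Stanislaw_Szukalski\">Stanislaw_Szukalski</a>")
;; ⇒ "Stanislaw Szukalski"
```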
So, with this function written, we can call query-replace-regexp, then give this pattern:
>\([_A-Za-z0-9]+\)</a>
And the replacement expression would be:
>\,(ff)</a>
(The “>” and “</a>” are part of the matched text, so the replacement has to put them back.) And we are all done.
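To try the replacement function without going through the prompts, the same search and replace can be run in a temporary buffer. This is only a sketch for testing: it replaces every match without asking, and the names my-demo-ff and my-demo-replace-all are hypothetical. Because replace-match replaces the whole match, the code puts the “>” and “</a>” back around the transformed text.

```elisp
;; Same body as ff, under a hypothetical name so this sketch stands alone.
(defun my-demo-ff ()
  "Return group 1 of the current match with underscores turned into spaces."
  (replace-regexp-in-string "_" " " (match-string 1)))

;; Hypothetical helper: non-interactive version of the replace session.
(defun my-demo-replace-all (text)
  "Return TEXT with underscores removed from every matching link text."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    ;; Each successful search sets the match data that my-demo-ff reads.
    (while (re-search-forward ">\\([_A-Za-z0-9]+\\)</a>" nil t)
      ;; t t: keep case as is, and insert the string literally.
      (replace-match (concat ">" (my-demo-ff) "</a>") t t))
    (buffer-string)))

(my-demo-replace-all
 "<a href=\"http://en.wikipedia.org/wiki/Stanislaw_Szukalski\">Stanislaw_Szukalski</a>")
;; ⇒ "<a href=\"http://en.wikipedia.org/wiki/Stanislaw_Szukalski\">Stanislaw Szukalski</a>"
```

Only the link text changes. The underscore inside the href is left alone because the regex requires the “>” just before the captured text.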
This function can be of general use. Whenever you need to replace text patterns with complicated heuristics, you can base your replacement function on the above code.
Here's the actual replacement function I used for this job:
(defun wikipedia-link-replacement ()
  "Returns a canonical form of Wikipedia link from a regex match.

This function is used for query-replace-regexp, to turn the following forms of links:

<a href=\"http://en.wikipedia.org/wiki/event\">event</a>
<a href=\"http://en.wikipedia.org/wiki/Middle_distance\">Middle_distance</a>
<a href=\"http://en.wikipedia.org/wiki/Middle_distance_track_event\">Middle_distance_track_event</a>
<a href=\"http://en.wikipedia.org/wiki/Sapir-Whorf_Hypothesis\">Sapir-Whorf_Hypothesis</a>

into a canonical form. Basically, the link text needs to have “_” replaced by space.
Also, it shouldn't match links that are already in canonical form, nor match non-wikipedia link texts.

The regex to be used for this function is:

<a href=\"http://\\(..\\)\\.wikipedia.org/wiki/\\([^\"]+\\)\">\\(\\([-.A-Za-z0-9]+_\\)+[-.A-Za-z0-9]+ ?\\)</a>

To use this function, call query-replace-regexp, then in the replacement prompt give:

\\,(wikipedia-link-replacement)"
  (let* ((langCode (match-string 1))      ; e.g. \"en\"
         (articlePath (match-string 2))   ; e.g. \"Middle_distance\"
         ;; The new link text is built from the article path.
         (linkText (replace-regexp-in-string "_" " " articlePath)))
    (concat "<a href=\"http://" langCode ".wikipedia.org/wiki/" articlePath "\">"
            linkText "</a>")))
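Before running a long replace session, it can help to check which links the regex actually picks up. The sketch below applies the regex from the docstring to single strings and returns the three captured groups, or nil when there is no match. The helper name my-demo-wikipedia-link-parts is hypothetical.

```elisp
;; Hypothetical helper, for illustration only.
(defun my-demo-wikipedia-link-parts (html)
  "Return (LANG-CODE ARTICLE-PATH LINK-TEXT) if HTML matches, else nil."
  (let ((regex (concat "<a href=\"http://\\(..\\)\\.wikipedia.org/wiki/\\([^\"]+\\)\">"
                       "\\(\\([-.A-Za-z0-9]+_\\)+[-.A-Za-z0-9]+ ?\\)</a>")))
    (when (string-match regex html)
      (list (match-string 1 html)
            (match-string 2 html)
            (match-string 3 html)))))

;; Link text with underscores: matched.
(my-demo-wikipedia-link-parts
 "<a href=\"http://en.wikipedia.org/wiki/Middle_distance\">Middle_distance</a>")
;; ⇒ ("en" "Middle_distance" "Middle_distance")

;; Link text already in canonical form: not matched.
(my-demo-wikipedia-link-parts
 "<a href=\"http://en.wikipedia.org/wiki/Stanislaw_Szukalski\">Stanislaw Szukalski</a>")
;; ⇒ nil
```

The third group requires at least one “_” in the link text, which is what keeps links that are already in canonical form out of the replace session.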
Emacs is beautiful.
I need to find all text of the form:
(info "(emacs) Option Index")
and change it into this form:
<a href="../emacs_manual/Option-Index.html">(info "(emacs) Option Index")</a>
Here's the regex I use:
(info "(emacs) \([^"]+?\)")
The replacement expression is \,(ff), where ff is this elisp function:
(defun ff ()
  "Return an HTML link to the emacs manual page named by the current regex match."
  (let* ((matchedText (match-string 1))
         ;; The file name is the matched text with each space replaced by “-”.
         (replaceText (replace-regexp-in-string " " "-" matchedText))
         (url (concat "../emacs_manual/" replaceText ".html"))
         (anchorText (concat "(info \"(emacs) " matchedText "\")")))
    (concat "<a href=\"" url "\">" anchorText "</a>")))
Here are more examples of using a function as replacement string.