
Emacs Lisp: Multi-Pair String Replacement with Report

Xah Lee

This page shows how to write an Emacs Lisp command that does multi-pair find/replace on the current buffer, and also prints a report of the replacements in another buffer.

This is convenient because you can run several find/replace pairs in one shot. You don't have to eyeball each case, yet you can still glance over the report for possible errors.

Problem Description

Summary

Write a command “title-bracket-to-html-tag” that changes any bracketed title into an HTML tag. For example, if the current file has this text:

〈The Rise of “Worse is Better”〉

it would become:

<cite>The Rise of “Worse is Better”</cite>

The command also should generate a report of all changes made, in a separate buffer.

Detail

In my writings, i use angle brackets for book titles and article titles. For example:

• 〈The Rise of “Worse is Better”〉 (1991) …
• 《The Unix-Hater's Handbook》 (1994) …

The double angle bracket is for book titles, the single is for article titles. 〔☛ Intro to Chinese Punctuation〕

I want the book and article titles to be colored so they are more readable. So, i want HTML tags like this:

• <cite>The Rise of “Worse is Better”</cite> (1991) …
• <cite class="book">The Unix-Hater's Handbook</cite> (1994) …

With proper CSS, the titles will be colored, and the brackets added back. Here's sample CSS code:

cite{color:#822222}
cite:before,cite:after{color:black;font-style:normal}
cite:before{content:"〈"}
cite:after{content:"〉"}
cite.book:before{content:"《"}
cite.book:after{content:"》"}

〔☛ CSS Tutorial〕

It is very tedious to replace each angle bracket with the HTML tag version by hand, even using the Emacs keyboard macro feature. I'd like to have the changes done by pressing just one button.

Solution

Here's a solution. I was surprised that it took me just 15 minutes.

(defun title-bracket-to-html-tag ()
  "Replace all 〈…〉 with <cite>…</cite> in current buffer.
Also replace 《…》 with <cite class=\"book\">…</cite>.
Generate a report of the replaced strings in a separate buffer."
  (interactive)
  (let ((changedItems '()))

    (save-excursion
      ;; book titles
      (goto-char (point-min))
      (while (search-forward-regexp "《\\([^》]+?\\)》" nil t)
        ;; record the title before replace-match changes the buffer
        (setq changedItems (cons (match-string 1) changedItems))
        (replace-match "<cite class=\"book\">\\1</cite>" t))

      ;; article titles
      (goto-char (point-min))
      (while (search-forward-regexp "〈\\([^〉]+?\\)〉" nil t)
        (setq changedItems (cons (match-string 1) changedItems))
        (replace-match "<cite>\\1</cite>" t)))

    ;; print the report; mapc because only the side effect matters
    (with-output-to-temp-buffer "*changed items*"
      (mapc
       (lambda (myTitle)
         (princ myTitle)
         (princ "\n"))
       changedItems))))

Here's an outline of the algorithm:

  1. Search forward by regex for 《…》.
  2. If found, push the title text onto a list (for the report of changed items later).
  3. Replace the match with the cite tag.
  4. Repeat the above until no more title brackets are found.
  5. Do the same for 〈…〉.
  6. When no more are found, print the list as the report.
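
The outline above can also be sketched as a reusable helper that takes any number of find/replace pairs. This is only a minimal sketch, not the code from this page: the name my-replace-pairs-with-report and the shape of its argument (a list of (REGEXP . REPLACEMENT) cons cells, each regexp with one capture group) are assumptions made for illustration.

```elisp
;; Hypothetical helper, not part of the original command.
(defun my-replace-pairs-with-report (pairs)
  "Apply each (REGEXP . REPLACEMENT) in PAIRS to the current buffer.
Each REGEXP is assumed to have one capture group.
Return the list of captured strings, most recent first."
  (let ((changed-items '()))
    (save-excursion
      (dolist (pair pairs)
        (goto-char (point-min))
        (while (re-search-forward (car pair) nil t)
          ;; Record the captured text before replace-match edits the buffer.
          (push (match-string 1) changed-items)
          (replace-match (cdr pair) t))))
    changed-items))

;; Usage on a scratch string, with the two pairs from this page.
(with-temp-buffer
  (insert "《Book A》 and 〈Article B〉")
  (my-replace-pairs-with-report
   '(("《\\([^》]+?\\)》" . "<cite class=\"book\">\\1</cite>")
     ("〈\\([^〉]+?\\)〉" . "<cite>\\1</cite>")))
  (buffer-string))
```

The returned list can then be printed with with-output-to-temp-buffer as in the command above. Because items are pushed onto the front of the list, the report comes out in reverse order of replacement.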

All the functions in this code are very basic and are frequently used for text processing tasks. You can use this function as a template to write your own.

The code is easy to understand. If you find it difficult, have a look at Emacs Lisp Basics and Emacs Lisp Idioms.

You can try the code. Copy and paste the following into a file:

• 〈Defective C++〉 (2007), by Yossi Kreinin. @ yosefk.com.
• 《The Unix-Hater's handbook》 (1994), by Simson Garfinkel, Daniel Weise, Steven Strassmann, and Don Hopkins. The entire book is available at mit.edu. (ℤ local copy)
• 〈The Rise of “Worse is Better”〉 (1991), by Richard P Gabriel. @ dreamsongs.com
Richard Gabriel is a well known figure in the lisp community, the starter of what's now known as XEmacs. He's the recipient of ACM's 1998 Fellows Award and the 2004 Allen Newell Award.
〈The Rise of “Worse is Better”〉 is probably the first article that analyzed the strategy of software success from an evolutionary biology perspective.
• 〈Extreme Programming Explained〉 (2008), by Yossi Kreinin. @ yosefk.com
• 〈Java: Slow, ugly and irrelevant〉 (2001-01-08), by Simson Garfinkel. @ salon.com (ℤ local copy)
• 〈Optimization: Your Worst Enemy〉, (1999), by Joseph M Newcomer. @ flounder.com (ℤ local copy)
• 〈Will it rot my students' brains if they use Mathematica?〉 (2002-05), by Theodore W Gray. @ theodoregray.com (ℤ local copy)
Theodore is the author of Mathematica frontend. The article discusses educational math software, video games, and violence.
• 〈Go To Statement Considered Harmful〉 (1968), by Edsger W Dijkstra. Source; (ℤ local copy)
• 〈Skin Cancer〉 (2000), by Greg Knauss. @ suck.com. (ℤ Local copy)
A satire on Netscape browser and the “Skin” phenomenon.
• 〈Censorzilla〉 (2004), by Jamie Zawinski. @ jwz.org (ℤ local copy)
Jamie, a notorious programmer of XEmacs and the Netscape web browser, has written a webpage that contains code from the Netscape browser before its Open Source release. Note the profanity-laden comments and what they say. It gives an indication of the pain and f*cked-up-ness of the computing industry.
• 〈Let's Make Unix Not Suck〉 (1999), by Miguel De Icaza. @ primates.ximian.com
Miguel de Icaza is the man behind Linux's Gnome project and Mono project. This article was written in the era when unixes did not really have a desktop or any concept of a coherent development framework. It was controversial.
• 《Code Complete: A Practical Handbook of Software Construction》, by Steve C McConnell amazon.
Throw away all your Design Patterns or eXtreme Programming books. If you want a scientific book on software development analysis, read this book instead.
Steve McConnell. «an author of many software engineering textbooks including Code Complete, Rapid Development, and Software Estimation. In 1998, McConnell was named as one of the three most influential people in the software industry by Software Development Magazine, along with Bill Gates and Linus Torvalds.»

Then call “title-bracket-to-html-tag”. It will show an output in a separate window listing all the changed items. Here's the output:

Let's Make Unix Not Suck
Censorzilla
Skin Cancer
Go To Statement Considered Harmful
Will it rot my students' brains if they use Mathematica?
Optimization: Your Worst Enemy
Java: Slow, ugly and irrelevant
Extreme Programming Explained
The Rise of “Worse is Better”
The Rise of “Worse is Better”
Defective C++
Code Complete: A Practical Handbook of Software Construction
The Unix-Hater's handbook

Showing the changed items is important, because your text may have a mismatched bracket. You can glance over the output and see if something is incorrect. This is also why a keyboard macro isn't a good solution here (or, it would need to be creative).
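
To see why the report helps: if a closing 〉 is missing, the regex runs on to the next 〉, so the captured item swallows the text in between and usually contains a second opening bracket or a line break. The following is a minimal sketch of a check that could be run on the list before printing it. The function name my-suspicious-items is hypothetical, and the test (a line break or a stray 〈 or 《 inside the item) is just one assumed heuristic, not something the command on this page does.

```elisp
;; Hypothetical helper for checking the report list.
(defun my-suspicious-items (items)
  "Return the elements of ITEMS that contain a line break or a stray opening bracket."
  (let ((suspicious '()))
    (dolist (item items)
      (when (string-match-p "[\n〈《]" item)
        (push item suspicious)))
    ;; Restore the original order of ITEMS.
    (nreverse suspicious)))

;; The second item is what the article regex captures from
;; "〈First Title (1999) and later 〈Second Title〉",
;; where the first title lacks its closing bracket.
(my-suspicious-items
 '("Defective C++"
   "First Title (1999) and later 〈Second Title"))
```

Here only the second item is returned, since it contains a stray 〈.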

(The book list above is from: The Tech Geekers & Software Engineering.)

Applications

I quote Wikipedia often. Often, a quote has many citation marks like this: {[1], [12], …}. They don't make sense in an excerpt. So, i have a command to remove them. The code is very similar: remove each occurrence of [‹n›], add it to a list, then report the list.

(defun remove-square-brackets ()
  "Delete any text of the form “[‹n›]”.

Work on text selection or current line.
Print all removed text in the *changed items* buffer.

For example, if text is on the line:
 「… was officially announced as Blu-ray Disc[11][12], and …」
then, after the call the line becomes:
 「… was officially announced as Blu-ray Disc, and …」."
  (interactive)

  (let (bdr p1 p2 inputStr resultStr changedItems)
    ;; get-selection-or-unit is a custom function, not built into emacs
    (setq bdr (get-selection-or-unit 'line))
    (setq inputStr (elt bdr 0) p1 (elt bdr 1) p2 (elt bdr 2))

    (setq changedItems '())

    ;; do the removal in a temp buffer, then paste the result back
    (setq resultStr
          (with-temp-buffer
            (insert inputStr)

            (goto-char (point-min))
            (while (search-forward-regexp "\\(\\[[0-9]+?\\]\\)" nil t)
              (setq changedItems (cons (match-string 1) changedItems))
              (replace-match "" t))
            (buffer-string)))

    (delete-region p1 p2)
    (insert resultStr)

    ;; print the report; mapc because only the side effect matters
    (with-output-to-temp-buffer "*changed items*"
      (mapc
       (lambda (myTitle)
         (princ myTitle)
         (princ "\n"))
       changedItems))))

In the above, i used a convenient custom function get-selection-or-unit. You can replace it with thing-at-point 〔☛ Emacs Lisp: Using thing-at-point〕. Or, you can use my library, see: Emacs Lisp: get-selection-or-unit.
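
If you'd rather not depend on a library, here's a minimal sketch of a stand-in built only from standard Emacs functions. The name my-selection-or-line is hypothetical, and the return shape (a vector of text, begin position, end position) is only inferred from how the command above reads the result with elt. The real get-selection-or-unit may behave differently in other respects.

```elisp
;; Hypothetical stand-in for (get-selection-or-unit 'line).
(defun my-selection-or-line ()
  "Return a vector [TEXT BEGIN END] for the active region, else the current line."
  (let ((p1 (if (use-region-p) (region-beginning) (line-beginning-position)))
        (p2 (if (use-region-p) (region-end) (line-end-position))))
    (vector (buffer-substring-no-properties p1 p2) p1 p2)))
```

With this, the line (setq bdr (get-selection-or-unit 'line)) could become (setq bdr (my-selection-or-line)), and the rest of the command stays the same.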

For about 10 examples of using multi-pair find/replace, see: Emacs Lisp Multi-Pair Find/Replace Applications.

Emacs ♥
