Emacs Lisp: Find & Replace Multiple String Pairs

By Xah Lee.

This page shows an example of writing an emacs lisp function that cleans up a file's content by repeatedly applying find/replace operations.


I want to write a command such that it does find/replace on several pairs of {regex string, replace string}, on the current file.

For example, this text:

           Polygon[{{0, -0.00004000, 2.000},
                        {0, -0.00003978, 2.000},
                        {-0.01043, -0.09920, 1.995},
                        {0, -0.09975, 1.995}}],
           Polygon[{{0, -0.00003978, 2.000},
                        {0, -0.00003913, 2.000},
                        {-0.02074, -0.09757, 1.995},
                        {-0.01043, -0.09920, 1.995}}],
           Polygon[{{0, -0.00003913, 2.000},
                        {-0.00001236, -0.00003804, 2.000},
                        {-0.03083, -0.09486, 1.995},
                        {-0.02074, -0.09757, 1.995}}]

Should become this:



I have a website, Math Surface Gallery, which contains a Java applet called JavaView that allows people to view 3D objects with real-time rotation by the mouse. For example, this is one of the Java applet pages: Costa surface applet. There are about 70 such surfaces. Each of these surfaces has a raw data file that the Java applet reads. For example, for the Costa surface above, the raw data file is: costa.mgs.gz. These files are just Mathematica graphics in plain text, compressed with gzip.

The content of the file looks like this:

    Polygon[{{3.552, -0.001061, 2.689}, {3.552, 0.03079, 2.689},
            {3.025, 0.02634, 2.524}, {3.025, -0.001061, 2.524}}],
    Polygon[{{3.552, 0.03079, 2.689}, {3.550, 0.1250, 2.689},
            {3.023, 0.1074, 2.524}, {3.025, 0.02634, 2.524}}],

Because the file contains thousands of polygons, it can take a while for the Java applet to load it from the net. One way to reduce file size is to reduce the number of polygons. But even for a given file, spaces and newline characters can be deleted, and the decimal numbers can be safely truncated to 3 digits.

So, typically, i open the file, call query-replace to replace ", " (comma followed by space) with "," (comma alone), delete newline chars (replacing \n by empty string), and collapse multiple spaces into one. To truncate decimals to 3 places, i call query-replace-regexp with pattern \([0-9]\)\.\([0-9][0-9][0-9]\)[0-9]+ and replace it with \1.\2.

For each file, i have to do multiple replacements. This process gets repetitious. It would be nice to have an emacs command, so i can just press a button and have all these replacements done. This would reduce some 50 keystrokes and eyeballing into a single brainless button punch.


Here's the solution:

(defun xah-clean-Mathematica-graphics-buffer ()
  "Remove whitespace and truncate numbers in the current buffer of a Mathematica graphics file.
This command does several find/replace operations on the current buffer:
removing new lines, collapsing spaces, truncating numbers to 3 decimals, etc.
The goal of these replacements is to reduce the size of a Mathematica Graphics file (.mgs) that is read over the net by JavaView."
  (interactive)

  ;; delete newline chars
  (goto-char 1)
  (while (search-forward "\n" nil t) (replace-match "" nil t))

  ;; collapse each run of 2 or more spaces into one space
  (goto-char 1)
  (while (search-forward-regexp "  +" nil t) (replace-match " " nil t))

  ;; remove the space after each comma
  (goto-char 1)
  (while (search-forward ", " nil t) (replace-match "," nil t))

  ;; truncate decimals to 3 places
  (goto-char 1)
  (while (search-forward-regexp "\\([0-9]\\)\\.\\([0-9][0-9][0-9]\\)[0-9]+" nil t)
    (replace-match "\\1.\\2" t nil)))

This function is very simple. It does a series of replacements using the “while” loop, each time first moving the cursor to the beginning of the file. The core is the following 3 functions: { search-forward, search-forward-regexp, replace-match}.

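The search-forward function takes a string and moves the cursor to the end of the next text that matches it. search-forward-regexp does the same, but takes a regex pattern. With t as the third argument, both return nil when nothing more is found, which ends the “while” loop. The replace-match function simply replaces the text matched by the last search.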
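The loop pattern can be tried on a string instead of a file. The following is a minimal sketch; the function name my-demo-strip-comma-space is made up for this illustration and is not part of the command above. It runs the same search-forward and replace-match loop in a temporary buffer.

```elisp
;; Hypothetical helper for illustration only.
(defun my-demo-strip-comma-space (text)
  "Return TEXT with every \", \" replaced by \",\"."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    ;; The third argument t makes search-forward return nil, instead of
    ;; signaling an error, when there is no further match. That ends the loop.
    (while (search-forward ", " nil t)
      ;; Replace the text matched by the search just done.
      (replace-match "," nil t))
    (buffer-string)))

(my-demo-strip-comma-space "{3.552, 0.03079, 2.689}")
;; returns "{3.552,0.03079,2.689}"
```

After each replace-match the cursor sits at the end of the replacement, so the next search continues from there and the loop always terminates.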

One interesting aspect of search-forward-regexp is that you must write 2 backslashes in the string to get one backslash in the regex. This is because backslash is the escape character in emacs lisp strings, so a literal backslash must itself be escaped by a backslash. The resulting string is then passed to emacs's regex engine. 〔➤see Emacs regex tutorial〕
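The double backslash can be checked with a few expressions, evaluated one at a time (for example with eval-last-sexp). This is only a small sketch using the built-in functions length, string-match and match-string; the sample strings are made up.

```elisp
;; In lisp source, "\\." is a string of 2 characters: a backslash and a dot.
(length "\\.")                 ; returns 2

;; The regex engine receives \. which matches a literal dot.
(string-match "\\." "3.14")    ; returns 1, the index of the dot

;; An unescaped dot matches any character, so the match starts at index 0.
(string-match "." "3.14")      ; returns 0

;; Regex groups are written \( \), hence \\( \\) inside a lisp string.
(string-match "\\([0-9]\\)\\.\\([0-9][0-9][0-9]\\)[0-9]+" "0.03079")  ; returns 0
(match-string 2 "0.03079")     ; returns "030", the 3 digits that are kept
```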

Another thing of interest is that the first 2 optional parameters of the replace-match function are “fixedcase” and “literal”; both are booleans. 〔➤see Emacs Functions Documentation Lookup〕
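The effect of the two flags can be seen with a small experiment. The helper my-demo-replace-first below is hypothetical, written only for this illustration. It replaces the first match in a string and passes “fixedcase” and “literal” straight through to replace-match.

```elisp
;; Hypothetical helper for illustration only.
(defun my-demo-replace-first (text regexp replacement fixedcase literal)
  "Return TEXT with the first match of REGEXP replaced by REPLACEMENT.
FIXEDCASE and LITERAL are passed unchanged to `replace-match'."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (when (re-search-forward regexp nil t)
      (replace-match replacement fixedcase literal))
    (buffer-string)))

;; fixedcase nil: the replacement is adapted to the case of the matched text.
(my-demo-replace-first "HELLO world" "HELLO" "bye" nil t)   ; returns "BYE world"
;; fixedcase t: the replacement is inserted with its own case.
(my-demo-replace-first "HELLO world" "HELLO" "bye" t t)     ; returns "bye world"

;; literal nil: \1 in the replacement stands for the first regex group.
(my-demo-replace-first "x42" "\\([0-9]+\\)" "<\\1>" t nil)  ; returns "x<42>"
;; literal t: the replacement is inserted as is, backslash included.
(my-demo-replace-first "x42" "\\([0-9]+\\)" "<\\1>" t t)    ; returns "x<\\1>"
```

This is why the regex replacement in the command passes nil for “literal” (so that \1 and \2 are expanded), while the plain string replacements pass t.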

You can use this code as a template whenever you need a command that replaces multiple pairs in the current file.

Multi-Pair Replacement Elisp Convenience

PS: Note that in this tutorial, each replacement pair is done using a while loop, and each starts with (goto-char 1). What if you have lots of pairs? Wouldn't it be great if you could simply write:

["alpha" "α"]
["beta" "β"]
["gamma" "γ"]

instead of each with a while loop? For a solution for this, see: Emacs Lisp: Multi-Pair Find Replace: xah-replace-pairs.el.
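To give a rough idea of how such a list of pairs could drive the same loop, here is a minimal sketch. The function my-demo-replace-pairs is hypothetical and is not the interface of xah-replace-pairs.el. It assumes each pair is a vector of 2 plain strings and applies the pairs one after another.

```elisp
;; Hypothetical helper for illustration only,
;; not the interface of xah-replace-pairs.el.
(defun my-demo-replace-pairs (text pairs)
  "Return TEXT after applying each [FIND REPLACE] vector in PAIRS, in order."
  (with-temp-buffer
    (insert text)
    (dolist (pair pairs)
      ;; Same pattern as in the tutorial: go to the start, then loop.
      (goto-char (point-min))
      (while (search-forward (aref pair 0) nil t)
        ;; fixedcase t and literal t: insert the replacement exactly as given.
        (replace-match (aref pair 1) t t)))
    (buffer-string)))

(my-demo-replace-pairs "alpha beta gamma"
                       '(["alpha" "α"] ["beta" "β"] ["gamma" "γ"]))
;; returns "α β γ"
```

Because the pairs are applied in sequence, a later pair can also match text produced by an earlier one. Whether that is acceptable depends on the pairs.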

Mathematica Code

Addendum: here's the Mathematica code to export graphics into a text file forcing all numbers to be printed in a simple d.dddd format.

Otherwise, Mathematica may print numbers in various forms such as 2.25`*^-9, \(7.2389`\), 3.141592653589793238462643383279503`20.



(*the first argument is a Graphics3D object, the second is a name to
save to, the third is number of decimal places for the coordinate
