Elisp: Find Replace Multiple String Pairs

By Xah Lee.

This page shows an example of writing an emacs lisp function that cleans up a file's content by repeated application of find replace operations.


I want to write a command such that it does find replace on several pairs of {regex string, replace string}, on the current file.

For example, this text:

           Polygon[{{0, -0.00004000, 2.000},
                        {0, -0.00003978, 2.000},
                        {-0.01043, -0.09920, 1.995},
                        {0, -0.09975, 1.995}}],
           Polygon[{{0, -0.00003978, 2.000},
                        {0, -0.00003913, 2.000},
                        {-0.02074, -0.09757, 1.995},
                        {-0.01043, -0.09920, 1.995}}],
           Polygon[{{0, -0.00003913, 2.000},
                        {-0.00001236, -0.00003804, 2.000},
                        {-0.03083, -0.09486, 1.995},
                        {-0.02074, -0.09757, 1.995}}]

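Should become this:

    Polygon[{{0,-0.000,2.000},{0,-0.000,2.000},{-0.010,-0.099,1.995},{0,-0.099,1.995}}],Polygon[{{0,-0.000,2.000},{0,-0.000,2.000},{-0.020,-0.097,1.995},{-0.010,-0.099,1.995}}],Polygon[{{0,-0.000,2.000},{-0.000,-0.000,2.000},{-0.030,-0.094,1.995},{-0.020,-0.097,1.995}}]
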

I have a website, Math Surface Gallery, which contains a Java applet called JavaView that allows people to view 3D objects with real-time rotation by the mouse. For example, this is one of the Java applet pages: Costa surface applet. There are about 70 such surfaces. Each of these surfaces has a raw data file that the Java applet reads. For example, for the Costa surface above, the raw data file is: costa.mgs.gz. These files are just Mathematica graphics in plain text, compressed with gzip.

The content of the file looks like this:

    Polygon[{{3.552, -0.001061, 2.689}, {3.552, 0.03079, 2.689},
            {3.025, 0.02634, 2.524}, {3.025, -0.001061, 2.524}}],
    Polygon[{{3.552, 0.03079, 2.689}, {3.550, 0.1250, 2.689},
            {3.023, 0.1074, 2.524}, {3.025, 0.02634, 2.524}}],

Because each file contains thousands of polygons, it can take a while for the Java applet to load it from the net. One way to reduce file size is to reduce the number of polygons. But even without changing the polygons, spaces and newline characters can be deleted, and the decimal numbers can be safely truncated to 3 digits after the decimal point.

So, typically, i open the file, Alt+x query-replace to replace ", " (comma followed by a space) by ",", delete newline chars (replacing \n by empty string), and delete multiple spaces. To truncate decimals to 3 places, i call query-replace-regexp with pattern \([0-9]\)\.\([0-9][0-9][0-9]\)[0-9]+ and replace it with \1.\2.
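To see what the truncation pattern does, here is a small sketch that applies it to a string instead of a buffer. The function name my-truncate-decimals is made up for this illustration. Note that the numbers are cut, not rounded, and a number that already has 3 or fewer digits after the point is left alone.

```elisp
;; Hypothetical helper for illustration only, not part of the command
;; developed on this page.
(defun my-truncate-decimals (str)
  "Return STR with decimal numbers cut to 3 digits after the point."
  (replace-regexp-in-string
   ;; group 1: the digit before the point
   ;; group 2: the first 3 digits after the point
   ;; the trailing [0-9]+ is the part that gets dropped
   "\\([0-9]\\)\\.\\([0-9][0-9][0-9]\\)[0-9]+"
   "\\1.\\2"
   str))

(my-truncate-decimals "{3.552, -0.001061, 2.689}")
;; returns "{3.552, -0.001, 2.689}"
```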

For each file, i have to do multiple replacements. This process gets repetitious. It would be nice to have an emacs command, so i can just press a button and have all these replacements done. This would reduce some 50 keystrokes and eyeballing into a single brainless button punch.


Here's the solution:

(defun xah-clean-Mathematica-graphics-buffer ()
  "Remove whitespace, truncate numbers, of current buffer of Mathematica graphics file.
This command does several find replace operations on the current buffer.
Removing spaces, removing newlines, truncating numbers to 3 decimals, etc.
The goal of these replacements is to reduce the size of a Mathematica Graphics file (.mgs) that is read over the net by JavaView."
  (interactive) ; needed so the function can be called by Alt+x or a key

  ;; delete newlines
  (goto-char 1)
  (while (search-forward "\n" nil t) (replace-match "" nil t))

  ;; collapse each run of spaces into one space
  (goto-char 1)
  (while (re-search-forward "  +" nil t) (replace-match " " nil t))

  ;; delete the space after each comma
  (goto-char 1)
  (while (search-forward ", " nil t) (replace-match "," nil t))

  ;; truncate decimals to 3 digits after the point
  (goto-char 1)
  (while (re-search-forward "\\([0-9]\\)\\.\\([0-9][0-9][0-9]\\)[0-9]+" nil t) (replace-match "\\1.\\2" t nil)))

This function is very simple. It does a series of replacements, each in a “while” loop, each time first moving the cursor to the beginning of the buffer. The core is the following 3 functions: search-forward, search-forward-regexp (the code uses re-search-forward, which is the same function under its other name), and replace-match.

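The search-forward function takes a string and moves the cursor to the end of the next text that matches. search-forward-regexp does the same for a regex pattern. When the third argument is t, both return nil instead of signaling an error when nothing is found, and that is what ends the “while” loop. replace-match simply replaces the text matched by the last search.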
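Here is the search and replace loop by itself, as a minimal sketch. It runs in a temporary buffer so you can evaluate it without touching any file. The name my-delete-all is made up for this example. In a fresh buffer, (point-min) is the same position as 1.

```elisp
;; Hypothetical helper for illustration only.
(defun my-delete-all (needle text)
  "Return TEXT with every literal occurrence of NEEDLE deleted."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    ;; search-forward leaves the cursor after each match.
    ;; With third argument t it returns nil when there is no further
    ;; match, so the loop stops instead of signaling an error.
    (while (search-forward needle nil t)
      ;; replace-match acts on the text found by the last search
      (replace-match "" nil t))
    (buffer-string)))

(my-delete-all "\n" "a,\nb,\nc")
;; returns "a,b,c"
```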

One interesting aspect about search-forward-regexp is that in lisp code you must write 2 backslashes to represent one backslash of the regex. This is because backslash is also the escape character in emacs lisp string syntax, so "\\" in source code is a string containing a single backslash. Then, this string is passed to emacs's regex engine. [see Emacs regex tutorial]
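A few expressions you can evaluate to see the two layers (lisp string syntax first, then regex syntax):

```elisp
;; The lisp string "\\." has 2 characters: a backslash and a dot.
(length "\\.")              ; 2

;; As a regex, backslash dot matches only a literal dot.
;; string-match returns the index where the match starts.
(string-match "\\." "3.14") ; 1

;; Without the backslash, dot matches any character,
;; so the match starts at index 0.
(string-match "." "3.14")   ; 0

;; regexp-quote adds the needed backslashes for you.
;; The result prints as "1\\.5", that is, the 4 characters 1\.5
(regexp-quote "1.5")
```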

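Another thing of interest is that the first 2 optional parameters of the replace-match function are “fixedcase” and “literal”, both booleans. If “fixedcase” is true, the letter case of the replacement text is used as is, otherwise emacs may change it to follow the case of the matched text. If “literal” is true, the replacement string is inserted as is, otherwise sequences such as \1 in it refer to captured groups. This is why the last replacement in the code passes nil for “literal”.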
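The effect of the two flags can be seen with a small sketch. The helper my-replace-first is made up for this example. It replaces just the first match in a string and hands both flags to replace-match.

```elisp
;; Hypothetical helper for illustration only.
(defun my-replace-first (regex rep text fixedcase literal)
  "Replace the first match of REGEX in TEXT by REP.
FIXEDCASE and LITERAL are passed on to `replace-match'."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let ((case-fold-search t)) ; search ignores letter case
      (when (re-search-forward regex nil t)
        (replace-match rep fixedcase literal)))
    (buffer-string)))

;; literal nil: \1 in the replacement means the first captured group
(my-replace-first "\\([0-9]\\)\\.[0-9]+" "\\1" "x=3.14" t nil) ; "x=3"

;; literal t: the replacement is inserted as is, backslash included
(my-replace-first "\\([0-9]\\)\\.[0-9]+" "\\1" "x=3.14" t t)   ; "x=\\1"

;; fixedcase nil: the matched text is all caps, so the replacement is upcased
(my-replace-first "hello" "bye" "HELLO world" nil t)            ; "BYE world"

;; fixedcase t: the replacement is used exactly as written
(my-replace-first "hello" "bye" "HELLO world" t t)              ; "bye world"
```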

You can use this code as a template whenever you need a command that replaces multiple pairs in the current file.

Multi-Pair Replacement Elisp Convenience

PS: Note that in this tutorial, each replacement pair is done using a while loop, and each starts with (goto-char 1). What if you have lots of pairs? Wouldn't it be great if you could simply write:

["alpha" "α"]
["beta" "β"]
["gamma" "γ"]

instead of each with a while loop? For a solution for this, see: Emacs: xah-replace-pairs.el Multi-Pair Find Replace.
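To give a rough idea of how such a convenience can work, here is a minimal sketch. It is not the code of xah-replace-pairs.el. The name my-replace-pairs-in-buffer is made up, and it only does literal (non-regex) replacement. It applies the pairs one after another, so a later pair can also match text inserted by an earlier pair.

```elisp
;; Hypothetical sketch for illustration, not the xah-replace-pairs.el code.
(defun my-replace-pairs-in-buffer (pairs)
  "Do literal find replace in the current buffer for each item of PAIRS.
PAIRS is a list of vectors of the form [FIND REPLACE].
Pairs are applied in order, each over the whole buffer."
  (save-excursion
    (dolist (pair pairs)
      (goto-char (point-min))
      (let ((case-fold-search nil)) ; search is case sensitive
        (while (search-forward (aref pair 0) nil t)
          ;; fixedcase t and literal t: insert the replacement unchanged
          (replace-match (aref pair 1) t t))))))

(with-temp-buffer
  (insert "alpha beta gamma")
  (my-replace-pairs-in-buffer
   '(["alpha" "α"]
     ["beta" "β"]
     ["gamma" "γ"]))
  (buffer-string))
;; returns "α β γ"
```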

Mathematica Code

Addendum: here's the Mathematica code to export graphics into a text file forcing all numbers to be printed in a simple d.dddd format.

Otherwise, Mathematica may print numbers in various forms such as 2.25`*^-9, \(7.2389`\), 3.141592653589793238462643383279503`20.



(*the first argument is a Graphics3D object, the second is a name to
save to, the third is number of decimal places for the coordinate
