Emacs Lisp: Find/Replace Multiple String Pairs


This page shows an example of writing an emacs lisp function that cleans up a file's content by repeated application of find/replace operations.

Problem Description

Summary

I want to write a command that does find/replace on several pairs of {regex string, replace string}, on the current file.

For example, this text:

Graphics3D[{
           Polygon[{{0, -0.00004000, 2.000},
                        {0, -0.00003978, 2.000},
                        {-0.01043, -0.09920, 1.995},
                        {0, -0.09975, 1.995}}],
           Polygon[{{0, -0.00003978, 2.000},
                        {0, -0.00003913, 2.000},
                        {-0.02074, -0.09757, 1.995},
                        {-0.01043, -0.09920, 1.995}}],
           Polygon[{{0, -0.00003913, 2.000},
                        {-0.00001236, -0.00003804, 2.000},
                        {-0.03083, -0.09486, 1.995},
                        {-0.02074, -0.09757, 1.995}}]
           }]

Should become this:

Graphics3D[{Polygon[{{0,-0.000,2.000},{0,-0.000,2.000},{-0.010,-0.099,1.995},{0,-0.099,1.995}}],Polygon[{{0,-0.000,2.000},{0,-0.000,2.000},{-0.020,-0.097,1.995},{-0.010,-0.099,1.995}}],Polygon[{{0,-0.000,2.000},{-0.000,-0.000,2.000},{-0.030,-0.094,1.995},{-0.020,-0.097,1.995}}]}]

Detail

I have a website, Math Surface Gallery, which contains a Java applet called JavaView that allows people to view 3D objects with real-time rotation by the mouse. For example, this is one of the Java applet pages: Costa surface applet. There are about 70 such surfaces. Each of these surfaces has a raw data file that the Java applet reads. For example, for the Costa surface above, the raw data file is: costa.mgs.gz. These files are just Mathematica graphics in plain text, compressed with gzip.

The content of the file looks like this:

Graphics3D[{{
    Polygon[{{3.552, -0.001061, 2.689}, {3.552, 0.03079, 2.689},
            {3.025, 0.02634, 2.524}, {3.025, -0.001061, 2.524}}], 
    Polygon[{{3.552, 0.03079, 2.689}, {3.550, 0.1250, 2.689},
            {3.023, 0.1074, 2.524}, {3.025, 0.02634, 2.524}}], 
    Polygon[{…}],
…
}}]

Because the file contains thousands of polygons, it can take a while for the Java applet to load it from the net. One way to reduce file size is to reduce the number of polygons. But even for a given file, spaces and newline characters can be deleted, and the decimal numbers can be safely truncated to 3 digits.

So, typically, I open the file, call query-replace to replace ", " (comma followed by a space) with ",", delete newline chars (replacing \n by empty string), and delete multiple spaces. To truncate decimals to 3 places, I call query-replace-regexp with pattern \([0-9]\)\.\([0-9][0-9][0-9]\)[0-9]+ and replace it with \1.\2.
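The truncation pattern can be tried on a plain string before running it on a file. The following is a minimal sketch; my-truncate-decimals is a made-up helper name for illustration, not part of the final command. Note that in lisp code each backslash of the pattern is doubled.

```elisp
;; Hypothetical helper (not from the original article): apply the
;; truncation regex to a string instead of a buffer.
(defun my-truncate-decimals (str)
  "Return STR with every decimal number cut to 3 digits after the point."
  (replace-regexp-in-string
   "\\([0-9]\\)\\.\\([0-9][0-9][0-9]\\)[0-9]+" ; digit, point, 3 digits, extra digits
   "\\1.\\2"                                   ; keep all but the extra digits
   str))

(my-truncate-decimals "{-0.01043, -0.09920, 1.995}")
;; ⇒ "{-0.010, -0.099, 1.995}"
```

The extra digits are dropped, not rounded, and a number that already has 3 decimals or fewer is left alone.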

For each file, I have to do multiple replacements. This process gets repetitious. It would be nice to have an emacs command, so I can just press a button and have all these replacements done. This would reduce some 50 keystrokes and eyeballing into a single brainless button punch.

Solution

Here's the solution:

(defun clean-mgs-buffer ()
  "Reduce the size of a mgs file by removing whitespace and truncating numbers.
This command does several find/replace operations on the current buffer:
removing newlines, collapsing spaces, truncating numbers to 3 decimals, etc.
The goal of these replacements is to reduce the size of a Mathematica
Graphics file (.mgs) that is read over the net by JavaView."
  (interactive)

  ;; delete all newline characters
  (goto-char 1)
  (while (search-forward "\n" nil t) (replace-match "" nil t))

  ;; collapse each run of 2 or more spaces into a single space
  (goto-char 1)
  (while (search-forward-regexp "  +" nil t) (replace-match " " nil t))

  ;; delete the space after each comma
  (goto-char 1)
  (while (search-forward ", " nil t) (replace-match "," nil t))

  ;; truncate decimal numbers to 3 digits after the point
  (goto-char 1)
  (while (search-forward-regexp "\\([0-9]\\)\\.\\([0-9][0-9][0-9]\\)[0-9]+" nil t) (replace-match "\\1.\\2" t nil)))

This function is very simple. It does a series of replacements, each using a “while” loop, each time first moving the cursor to the beginning of the file. The core is the following 3 functions: {search-forward, search-forward-regexp, replace-match}.

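The search-forward function takes a string and moves the cursor to the end of the next text that matches it. search-forward-regexp does the same for a regex pattern. With a non-nil third argument, both return nil instead of signaling an error when nothing is found, which is what ends the “while” loop. The replace-match function simply replaces the text matched by the last search.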
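To see how the cursor moves, here is a small sketch that runs one search and replacement in a temporary buffer and then loops over the rest. The function name my-demo-search-and-replace is made up for this illustration.

```elisp
;; Hypothetical demo (not from the original article).
(defun my-demo-search-and-replace ()
  "Return a list: point after the first search, text after the first
replacement, and text after all replacements."
  (with-temp-buffer
    (insert "a, b, c")
    (goto-char (point-min))
    (let* ((found (search-forward ", " nil t)) ; new cursor position, or nil
           (after-first (progn (replace-match "," nil t)
                               (buffer-string))))
      ;; loop over the remaining matches, the same way the command does
      (while (search-forward ", " nil t)
        (replace-match "," nil t))
      (list found after-first (buffer-string)))))

(my-demo-search-and-replace)
;; ⇒ (4 "a,b, c" "a,b,c")
```

The first search leaves the cursor at position 4, right after the matched ", ". Each later search starts from where the previous replacement ended, so the loop never revisits text it has already handled.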

One interesting aspect of search-forward-regexp is that, in lisp code, you must type 2 backslashes to represent one backslash of the regex. This is because a backslash inside an emacs lisp string itself needs a backslash to represent it. The resulting string, with single backslashes, is then passed to emacs's regex engine. (See: Emacs regex tutorial.)
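The doubling can be checked directly on strings. A small sketch, with my-first-dot-position as a made-up name:

```elisp
;; Hypothetical helper (not from the original article).
(defun my-first-dot-position (str)
  "Return the index of the first literal period in STR, or nil if none."
  ;; In source code "\\." is a 2-character string: backslash, period.
  ;; The regex engine reads that as "a literal period".
  (string-match "\\." str))

(length "\\.")                  ; ⇒ 2
(my-first-dot-position "3.14")  ; ⇒ 1
(string-match "." "3.14")       ; ⇒ 0, an unescaped period matches any character
```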

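Another thing of interest is that the first 2 optional parameters of the replace-match function are “fixedcase” and “literal”. Both are booleans. When “fixedcase” is nil, emacs may adapt the letter case of the replacement to the matched text. When “literal” is nil, sequences such as \1 in the replacement refer to the captured groups. (See: Emacs Functions Documentation Lookup.)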
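The effect of “literal” is easy to see with a replacement string that contains \1 and \2. The sketch below uses a made-up wrapper, my-replace-first, that replaces only the first match in a temporary buffer. “fixedcase” is passed as t in both calls so that letter case is never touched.

```elisp
;; Hypothetical helper (not from the original article).
(defun my-replace-first (regexp newtext fixedcase literal text)
  "Replace the first match of REGEXP in TEXT with NEWTEXT; return the result.
FIXEDCASE and LITERAL are passed straight to `replace-match'."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (when (re-search-forward regexp nil t)
      (replace-match newtext fixedcase literal))
    (buffer-string)))

;; literal nil: \1 and \2 stand for the captured groups
(my-replace-first "\\([0-9]\\)\\.\\([0-9]+\\)" "\\2.\\1" t nil "x=1.2345")
;; ⇒ "x=2345.1"

;; literal t: the replacement is inserted character for character
(my-replace-first "\\([0-9]\\)\\.\\([0-9]+\\)" "\\2.\\1" t t "x=1.2345")
;; ⇒ "x=\\2.\\1"   (as a lisp string; the buffer text is x=\2.\1)
```

This is why the command passes a non-nil “literal” for the plain string replacements, and nil only for the last one, which needs \1 and \2.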

You can use this code as a template whenever you need a command that replaces multiple pairs in the current file.

Multi-Pair Replacement Elisp Convenience

PS: Note that in this tutorial, each replacement pair is done using a while loop, and each starts with (goto-char 1). What if you have lots of pairs? Wouldn't it be great if you could simply write:

'(
["alpha" "α"]
["beta" "β"]
["gamma" "γ"]
)

instead of writing a while loop for each pair? For a solution, see: Elisp Package: Multi-Pair String Replacement: xfrp_find_replace_pairs.el.
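To give a rough idea of how such a list can be processed, here is a minimal sketch. It is not the code of the package above, and the name my-replace-pairs-in-buffer is made up. It assumes every pair is a vector of two literal strings (no regex), and it applies the pairs one after another, so a later pair also sees the text produced by an earlier one.

```elisp
;; Hypothetical helper (not the package's implementation).
(defun my-replace-pairs-in-buffer (pairs)
  "Replace text in the current buffer according to PAIRS.
PAIRS is a list of vectors [FIND REPLACEMENT], both literal strings.
Pairs are applied in order, each over the whole buffer."
  (save-excursion                       ; restore the cursor afterwards
    (dolist (pair pairs)
      (goto-char (point-min))
      (while (search-forward (aref pair 0) nil t)
        ;; fixedcase t, literal t: insert the replacement exactly as given
        (replace-match (aref pair 1) t t)))))

(with-temp-buffer
  (insert "alpha beta gamma")
  (my-replace-pairs-in-buffer
   '(["alpha" "α"]
     ["beta" "β"]
     ["gamma" "γ"]))
  (buffer-string))
;; ⇒ "α β γ"
```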

Mathematica Code

Addendum: here's the Mathematica code to export graphics into a text file forcing all numbers to be printed in a simple d.dddd format.

Otherwise, Mathematica may print numbers in various forms such as 2.25`*^-9, \(7.2389`\), 3.141592653589793238462643383279503`20.

writeToFileRounded[expr_Graphics3D,fileName_?StringQ,prec_:4]:=Module[{},
      OpenWrite[fileName];
      WriteString[fileName,"Graphics3D["];
      WriteString[fileName,
        StringReplace[
          ToString@
            NumberForm[First@SetPrecision[Chop[expr,10^-(prec+1)],prec],
              ExponentFunction\[Rule](If[-Infinity<#<Infinity,Null,#]&)],
          "],"->"],\n"]];
      WriteString[fileName,"]"];
      Close[fileName]
      ];

writeToFileRounded[surf,"helicoid.ma",4]

(*the first argument is a Graphics3D object, the second is a name to
save to, the third is number of decimal places for the coordinate
values.*)
