Emacs Lisp Problems: Trim String, Regex Match Data, Lacking Namespace


This page looks at an emacs lisp package for trimming strings, and discusses two problems of emacs lisp: regex match data that is easily lost, and the lack of namespaces.

Emacs Lisp: Trim Whitespace in String Library

Magnar Sveen (the EmacsRocks guy) has written an elisp lib for trimming strings, such as chopping off white space or newline chars. Basically, its functions are wrappers around a few builtin functions. But the lib is really nice: it provides a set of functions with a consistent interface, and you don't have to code your own. The lib is at: https://github.com/magnars/s.el

Coding string trimming functions is actually not as trivial as it seems. For example, here's one I wrote in xeu_elisp_util.el:

(defun trim-string (string)
  "Remove white spaces in beginning and ending of STRING.
White space here is any of: space, tab, emacs newline (line feed, ASCII 10)."
  (replace-regexp-in-string
   "\\`[ \t\n]*" ""
   (replace-regexp-in-string "[ \t\n]*\\'" "" string)))

One of the questions I had to ask myself is about implementation choices. Should I be using replace-regexp-in-string? Which way is faster? There are actually quite a few ways to implement this, and it is not clear which one is fastest unless you spend a few hours experimenting. (Assume you need to call this function tens of thousands of times.) The function I wrote above is NOT fast, because it calls replace-regexp-in-string twice, and that function is implemented in elisp and is quite complex. (Call describe-function to look at its source code.)

(Calling a trim function tens of thousands of times is realistic, and happens a lot in practice. For example, when processing thousands of files line by line.)
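One way to settle the speed question without guessing is to time the candidates. The following is only a rough sketch, assuming the standard benchmark library that ships with emacs. The names my-bench-trim-nested, my-bench-trim-match, and my-bench-compare are made up for this illustration and are not part of any library. Timings vary with the emacs version, the machine, and whether the code is byte-compiled.

```elisp
(require 'benchmark)

;; Hypothetical name: the nested `replace-regexp-in-string' approach.
(defun my-bench-trim-nested (string)
  "Trim STRING with two nested `replace-regexp-in-string' calls."
  (replace-regexp-in-string
   "\\`[ \t\n]*" ""
   (replace-regexp-in-string "[ \t\n]*\\'" "" string)))

;; Hypothetical name: an alternative built on `string-match' and `substring'.
(defun my-bench-trim-match (string)
  "Trim STRING with `string-match' and `substring'."
  (let ((start (if (string-match "\\`[ \t\n]+" string) (match-end 0) 0)))
    ;; Search for the trailing whitespace only from START onward, so an
    ;; all-whitespace string yields an empty result instead of an error.
    (substring string start (string-match "[ \t\n]*\\'" string start))))

(defun my-bench-compare (count)
  "Return an alist of elapsed seconds for COUNT calls of each trim function."
  (let ((sample "  \t some text \n "))
    (list (cons 'nested
                (car (benchmark-run count (my-bench-trim-nested sample))))
          (cons 'match
                (car (benchmark-run count (my-bench-trim-match sample)))))))
```

Evaluating (my-bench-compare 10000) returns the elapsed time in seconds for each variant. Run it several times before drawing conclusions, since garbage collection can skew a single run.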

Magnar Sveen's implementation is better:

(defun s-trim-left (s)
  "Remove whitespace at the beginning of S."
  (if (string-match "\\`[ \t\n\r]+" s)
      (replace-match "" t t s)
    s))

(defun s-trim-right (s)
  "Remove whitespace at the end of S."
  (if (string-match "[ \t\n\r]+\\'" s)
      (replace-match "" t t s)
    s))

(defun s-trim (s)
  "Remove whitespace at the beginning and end of S."
  (s-trim-left (s-trim-right s)))

But there's a cost.

Regex Search and Losing Match Data

Using string-match overwrites the match data, which is global. Whatever an earlier search stored there is gone.

For example, suppose you are processing thousands of HTML files. You want to change all links from relative paths to absolute URLs. Suppose you call search-forward-regexp to search for the link string. You need to get the matched data. Let's say (match-string 1) is the href value, (match-string 2) is the link text, (match-beginning 0) is the beginning of the tag, and (match-end 0) is the end of the tag. Now, you call (match-string 2), then apply a regex to trim white space from the result. At that point, all your other match data is lost.
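Here is a small sketch of that loss, done on a string instead of a buffer so that it stands alone. The function name my-demo-clobbered-match, the sample HTML, and the link regex are made up for illustration.

```elisp
(defun my-demo-clobbered-match ()
  "Return the start of the link match before and after a second regex call."
  (let ((html "<p><a href=\"x.html\"> some text </a></p>"))
    ;; First search: group 1 is the href value, group 2 is the link text.
    (string-match "<a href=\"\\([^\"]*\\)\">\\([^<]*\\)</a>" html)
    (let ((before (match-beginning 0))
          (text (match-string 2 html)))
      ;; First step of a regex-based trim. It succeeds on the link text
      ;; and overwrites the match data of the search above.
      (string-match "\\`[ \t\n\r]+" text)
      ;; The second value now describes the whitespace match, not the link.
      (list before (match-beginning 0)))))
```

The two numbers in the returned list differ. After the trim step, (match-beginning 0) no longer points at the link, and the href group is no longer available either.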

For safety, whenever you use any elisp regex function and want to keep the match data, you should get the data immediately after the call (that is, set it to variables), then continue with whatever you are doing. Again, this isn't very nice, as you'll need to introduce several temp variables.
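The temp variable approach can look like the sketch below. It assumes a simplified link regex that only handles links of the exact form shown, and the names my-demo-trim and my-demo-first-link are hypothetical.

```elisp
;; Hypothetical helper: a trim that uses `string-match' and therefore
;; overwrites the match data each time it is called.
(defun my-demo-trim (s)
  "Remove whitespace at both ends of S."
  (let ((start (if (string-match "\\`[ \t\n\r]+" s) (match-end 0) 0)))
    (substring s start (string-match "[ \t\n\r]*\\'" s start))))

(defun my-demo-first-link (html)
  "Return (HREF TEXT TAG-BEGIN TAG-END) for the first link in HTML, or nil."
  (with-temp-buffer
    (insert html)
    (goto-char (point-min))
    (when (re-search-forward "<a href=\"\\([^\"]*\\)\">\\([^<]*\\)</a>" nil t)
      ;; Copy everything out of the match data before calling anything
      ;; else that might run a regex.
      (let ((href (match-string 1))
            (text (match-string 2))
            (tag-begin (match-beginning 0))
            (tag-end (match-end 0)))
        ;; From here on the match data can be overwritten safely.
        (list href (my-demo-trim text) tag-begin tag-end)))))
```

The four let bindings are the temp variables in question. They cost a few extra lines, but nothing after them depends on the match data staying intact.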

I really wish replace-regexp-in-string were written in C.

Here's an example of text processing that uses regex and requires capturing several pieces of match data and trimming strings: Emacs Lisp vs Perl: Validate Local File Links.

Note: you can use save-match-data, but it's a lisp macro. That solution isn't better than saving the matches yourself.
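For comparison, here is a sketch of the save-match-data route. The macro saves the match data, runs its body, then restores the saved data. The function names my-demo-safe-trim and my-demo-preserved-match are made up, and the HTML string and regex are illustrative only.

```elisp
;; Hypothetical helper: a trim that leaves the caller's match data alone.
(defun my-demo-safe-trim (s)
  "Remove whitespace at both ends of S, preserving the match data."
  (save-match-data
    (let ((start (if (string-match "\\`[ \t\n\r]+" s) (match-end 0) 0)))
      (substring s start (string-match "[ \t\n\r]*\\'" s start)))))

(defun my-demo-preserved-match ()
  "Return (TEXT HREF TAG-BEGIN), reading HREF and TAG-BEGIN after the trim."
  (let ((html "<p><a href=\"x.html\"> some text </a></p>"))
    (string-match "<a href=\"\\([^\"]*\\)\">\\([^<]*\\)</a>" html)
    (let ((text (my-demo-safe-trim (match-string 2 html))))
      ;; The match data still describes the link, not the whitespace.
      (list text (match-string 1 html) (match-beginning 0)))))
```

This moves the bookkeeping into the helper instead of the caller. The cost is that the save and restore happen on every call, whether or not the caller cares about its match data.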

Emacs Lisp Problem of Lacking Namespace

Another issue of elisp is the lack of namespaces. This is a major problem that prevents elisp from growing as a language.
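A minimal sketch of what this means in practice: function names live in one global space, so a later definition silently replaces an earlier one with the same name. The two "libraries" and the name my-demo-clean below are hypothetical.

```elisp
;; Hypothetical library A defines a helper with a generic name.
(defun my-demo-clean (s)
  "Library A: remove spaces at both ends of S."
  (replace-regexp-in-string "\\` +\\| +\\'" "" s))

;; Hypothetical library B, loaded later, happens to pick the same name.
(defun my-demo-clean (s)
  "Library B: upcase S."
  (upcase s))

;; Every caller, including code inside library A, now gets B's version.
(my-demo-clean " abc ")
```

The usual workaround is a naming convention: each library puts a prefix on all of its symbols, such as the s- prefix in s-trim. That is a convention only, and nothing in the language enforces it.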

For example, Magnar's lib is nice and I want to use it. But should I, really? If I use it, I'll have to put another misc lib in my elisp system in a haphazard way. Other languages such as perl, python, ruby have namespaces and a controlled way to install packages. In these langs, a library has expected paths, a standard naming scheme, a standard format for documentation embedded in the source code, and usually a system to install or uninstall it (for example: perl cpan, python pip, ruby gem).

In emacs lisp, there are no namespaces, and elisp packages are not managed. 〔☛ Emacs Lisp's Library System: What's require, load, load-file, autoload, feature? 〕

(The emacs 24 package system is not an elisp language library system. It's a user-land app system, similar to Debian Linux's Advanced Packaging Tool (apt-get). 〔☛ Emacs 24 Packages (ELPA) Tutorial 〕)

When a language lacks namespaces or a standard library system, it ends up with lots of misc packages in the wild, with overlapping functionality, inconsistent interfaces, and varying quality.
