Emacs Lisp Problems: Trim String, Regex Match Data, Lacking Namespace
This page shows an emacs lisp package for trimming strings, and discusses two problems of emacs lisp: regex match data and the lack of namespace.
Elisp: Trim Whitespace in String Library
Magnar Sveen (the EmacsRocks guy) has written an elisp lib for trimming strings, such as chopping off white space or newline chars. The functions are basically wrappers around a few builtin functions. But the lib is really nice, because it provides a set of functions with a consistent interface, and you don't have to code your own. The lib is at: https://github.com/magnars/s.el
Coding string trimming functions is not as trivial as it looks. For example, here's one i wrote in xeu_elisp_util.el:
(defun trim-string (string)
  "Remove white spaces in beginning and ending of STRING.
White space here is any of: space, tab, emacs newline (line feed, ASCII 10)."
  (replace-regexp-in-string "\\`[ \t\n]*" ""
                            (replace-regexp-in-string "[ \t\n]*\\'" "" string)))
One of the questions i had to ask myself was which implementation to choose. Should i use replace-regexp-in-string? Which way is faster? There are quite a few ways to implement this, and it is not clear which one is fastest unless you spend a few hours experimenting (assume you need to call this function tens of thousands of times). The function i wrote above is NOT fast, because it calls replace-regexp-in-string twice, and replace-regexp-in-string is implemented in elisp and is quite complex. (Alt+x describe-function to look at its source code.)
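To illustrate the kind of experiment meant here, the sketch below times trim-string (repeated from above so the block is self-contained) against my-trim-string-by-index, a hypothetical alternative invented for this example. It finds the first and last non-whitespace characters with aref and then calls substring once, with no regex at all. The sample string and the 10000 repetitions are arbitrary assumptions. benchmark-run returns a list of elapsed seconds, garbage collection count, and GC seconds; the numbers depend on your machine and Emacs build, so none are shown here.

```elisp
(require 'benchmark)

;; Same as trim-string above: two calls to replace-regexp-in-string.
(defun trim-string (string)
  "Remove white spaces in beginning and ending of STRING.
White space here is any of: space, tab, emacs newline (line feed, ASCII 10)."
  (replace-regexp-in-string "\\`[ \t\n]*" ""
                            (replace-regexp-in-string "[ \t\n]*\\'" "" string)))

;; Hypothetical index-based version: no regex, match data untouched.
(defun my-trim-string-by-index (string)
  "Remove space, tab, and newline at both ends of STRING (illustrative sketch)."
  (let ((start 0)
        (end (length string)))
    ;; Advance START past leading whitespace.
    (while (and (< start end)
                (memq (aref string start) '(?\s ?\t ?\n)))
      (setq start (1+ start)))
    ;; Move END back past trailing whitespace.
    (while (and (> end start)
                (memq (aref string (1- end)) '(?\s ?\t ?\n)))
      (setq end (1- end)))
    (substring string start end)))

;; Time 10000 calls of each version on the same sample string.
(let ((sample "  \t some text \n"))
  (list (benchmark-run 10000 (trim-string sample))
        (benchmark-run 10000 (my-trim-string-by-index sample))))
```

Both functions return "some text" for the sample; only the timing differs. Which one wins, and by how much, is exactly what you have to measure on your own setup.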
(Calling trim string tens of thousands of times is realistic, and happens a lot in practice. For example, when processing thousands of files line by line.)
Here's Magnar Sveen's implementation:
(defun s-trim-left (s)
  "Remove whitespace at the beginning of S."
  (if (string-match "\\`[ \t\n\r]+" s)
      (replace-match "" t t s)
    s))

(defun s-trim-right (s)
  "Remove whitespace at the end of S."
  (if (string-match "[ \t\n\r]+\\'" s)
      (replace-match "" t t s)
    s))

(defun s-trim (s)
  "Remove whitespace at the beginning and end of S."
  (s-trim-left (s-trim-right s)))
Regex Search and Losing Match Data
string-match will overwrite your match data.
For example, suppose you are processing thousands of HTML files. You want to change all links from relative paths to absolute URLs. Suppose you call search-forward-regexp to search for the link string. Then you need to get the match data:
(match-string 1) is the href value, and
(match-string 2) is the link text, and
(match-beginning 0) is the beginning of the tag, and
(match-end 0) is the end of the tag.
Now, you call (match-string 2), then apply a regex function such as string-match to trim its white space. Now all your other match data is lost, because the new regex call replaced it.
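Here is a minimal sketch of this failure, assuming a temporary buffer holding one made-up link and a simplified regex for the a tag (not a robust HTML parser). The trimming step uses string-match and replace-match, the same way s-trim-left does. The function name my-demo-lost-match-data is hypothetical.

```elisp
(defun my-demo-lost-match-data ()
  "Hypothetical demo: a string-match call overwrites buffer match data."
  (with-temp-buffer
    ;; A made-up HTML fragment with one link.
    (insert "<p><a href=\"x.html\">  Some Link  </a></p>")
    (goto-char (point-min))
    ;; Group 1 is the href value, group 2 the link text.
    (search-forward-regexp "<a href=\"\\([^\"]+\\)\">\\([^<]*\\)</a>")
    (let ((href-before (match-string 1))
          (text (match-string 2)))
      ;; Trim leading whitespace the way s-trim-left does.
      (when (string-match "\\`[ \t\n\r]+" text)
        (setq text (replace-match "" t t text)))
      ;; The match data now describes the string-match on TEXT,
      ;; which has no group 1, so (match-string 1) returns nil.
      (list href-before (match-string 1)))))

(my-demo-lost-match-data)  ; returns ("x.html" nil)
```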
For safety, whenever you use any elisp regex function and want to keep the match data, you should save it immediately after the call (that is, assign it to variables), then continue with whatever you are doing. Again, this isn't very nice, as you'll need to introduce several temp variables.
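A minimal sketch of that advice, using the same made-up HTML fragment and a hypothetical function name: copy every piece of match data you need into let* variables right after the search, and only then run other regex functions.

```elisp
(defun my-demo-save-match-first ()
  "Hypothetical demo: copy match data into variables before trimming."
  (with-temp-buffer
    (insert "<p><a href=\"x.html\">  Some Link  </a></p>")
    (goto-char (point-min))
    (search-forward-regexp "<a href=\"\\([^\"]+\\)\">\\([^<]*\\)</a>")
    ;; Capture everything needed right away, before any other regex call.
    (let* ((href (match-string 1))
           (text (match-string 2))
           (tag-start (match-beginning 0))
           (tag-end (match-end 0)))
      ;; Now it is safe to run other regex functions on TEXT.
      (setq text (replace-regexp-in-string "\\`[ \t\n]+\\|[ \t\n]+\\'" "" text))
      (list href text tag-start tag-end))))

(my-demo-save-match-first)  ; returns ("x.html" "Some Link" 4 38)
```

The four temp variables are exactly the overhead complained about above.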
I really wish replace-regexp-in-string were written in C.
Here's an example of text processing that uses regex and requires capturing several pieces of match data and trimming strings: Emacs Lisp vs Perl: Validate Local File Links.
Note: you can use save-match-data, but it's a lisp macro. That solution isn't better than saving the matches yourself.
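For comparison, a sketch of the save-match-data approach. The hypothetical helper my-trim-left-keeping-match-data wraps s-trim-left style code in save-match-data, which restores the caller's match data when its body returns, so group 1 of the earlier buffer search survives the trim. The HTML fragment and function names are made up for illustration.

```elisp
(defun my-trim-left-keeping-match-data (s)
  "Hypothetical helper: remove leading whitespace from S, like s-trim-left.
The caller's match data is restored afterwards by `save-match-data'."
  (save-match-data
    (if (string-match "\\`[ \t\n\r]+" s)
        (replace-match "" t t s)
      s)))

(defun my-demo-save-match-data ()
  "Hypothetical demo: match data survives a trim wrapped in `save-match-data'."
  (with-temp-buffer
    (insert "<p><a href=\"x.html\">  Some Link  </a></p>")
    (goto-char (point-min))
    (search-forward-regexp "<a href=\"\\([^\"]+\\)\">\\([^<]*\\)</a>")
    (let ((text (my-trim-left-keeping-match-data (match-string 2))))
      ;; Group 1 of the buffer search is still available here.
      (list text (match-string 1)))))

(my-demo-save-match-data)  ; returns ("Some Link  " "x.html")
```

Note that the protection has to be built into every helper that might be called between a search and its use of the match data, which is why it is not obviously better than saving the values yourself.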
Emacs Lisp Problem of Lacking Namespace
Another issue with elisp is the lack of namespace. This is a major problem that prevents elisp from growing as a language.
For example, Magnar's lib is nice and i want to use it, but should i really? If i use it, i'll have to put yet another misc lib into my elisp system in a haphazard way. Other languages such as perl, python, and ruby have namespaces and a controlled way to install packages. In these langs, a library has expected paths, a standard naming scheme, a standard documentation format embedded in the source code, and usually a system to install or uninstall it (For example, perl
In emacs lisp, there's no namespace, and elisp packages are not managed. [see Elisp: load, load-file, autoload]
(The emacs 24 package system is not an elisp language library system. It's a user-land app system, similar to Debian Linux's Advanced Packaging Tool (apt-get). [see Emacs: How to Install Packages Using ELPA, MELPA])
When a language lacks namespaces or a standard library system, it ends up with lots of misc packages in the wild, with overlapping functionality, inconsistent interfaces, and varying quality.
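To make the namespace problem concrete, here is a minimal sketch with two hypothetical libraries (lib-a and lib-b, invented for illustration) that both define a global function named my-trim. Elisp has one global namespace for function names, so whichever definition is evaluated last silently replaces the other. The usual workaround is a name prefix, like the s- prefix in Magnar's s.el, but that is only a convention, not a language feature.

```elisp
;; Imagine this definition comes from lib-a.el:
(defun my-trim (s)
  "lib-a (hypothetical): remove leading and trailing spaces."
  (replace-regexp-in-string "\\` +\\| +\\'" "" s))

;; ... and this one from lib-b.el, loaded later:
(defun my-trim (s)
  "lib-b (hypothetical): remove only trailing whitespace."
  (replace-regexp-in-string "[ \t\n]+\\'" "" s))

;; Code written against lib-a now silently gets lib-b's behavior.
(my-trim "  hello  ")  ; returns "  hello"
```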
Now emacs has a builtin trim string function. See Elisp: Trim String
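As a quick illustration, the builtin functions are string-trim, string-trim-left, and string-trim-right, provided by the subr-x library (added around Emacs 24.4, as far as i know; in recent versions they may already be loaded, and the require is harmless). By default they strip space, tab, newline, and carriage return. This sketch only shows basic calls; see the page referenced above for details.

```elisp
(require 'subr-x)  ; provides string-trim and friends

(string-trim "  \t some text \n")    ; returns "some text"
(string-trim-left "  some text  ")   ; returns "some text  "
(string-trim-right "  some text  ")  ; returns "  some text"
```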