Emacs Lisp: Regex Tutorial

By Xah Lee.

This page is a tutorial on using emacs regex in emacs lisp code.

(Images: “your regex brain”; “emacs lisp regex toothpick syndrome”)

This page gives tips about using regex in elisp code.

Emacs Regex Syntax and Common Patterns

If you are not familiar with emacs regex syntax and commands, first see:

Emacs: Regex Tutorial

How to Test Regex in Emacs Lisp Code

One simple way to test regex is to create a file with the following content:

(re-search-forward "yourRegex")

whatever text to search here

Then, put your cursor to the right of the closing parenthesis and call eval-last-sexp 【Ctrl+x Ctrl+e】. Because re-search-forward searches from the cursor position toward the end of the buffer, the text to search must come after the expression. If your regex matches, the cursor moves to the end of the matched text. If you get a lisp error saying search failed, then your regex didn't match. If you get an “Invalid regexp” error or a lisp syntax error, then you probably screwed up the backslashes.
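If you'd rather not set up a file, you can also run the regex against a string with string-match and look at what matched. Here's a minimal sketch. my-regex-test is a hypothetical name, not part of Emacs. Also note that string-match, like re-search-forward, ignores case when case-fold-search is non-nil, which is the default.

```elisp
;; Hypothetical helper (not part of Emacs): return the text that
;; REGEX matches in STRING, or nil if there is no match.
(defun my-regex-test (regex string)
  "Return the first match of REGEX in STRING, or nil if none."
  (when (string-match regex string)
    (match-string 0 string)))

(my-regex-test "[0-9]+" "width=\"795\" height=\"183\"") ; → "795"
(my-regex-test "[0-9]+" "no digits here")              ; → nil
```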

Newline Character and Tab

Inside an elisp string, \t is the TAB char (Unicode codepoint 9), and \n is the newline char (codepoint 10). You can use [\t\n ]+ to match a sequence of {tab, newline, space}.
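The sketch below uses that pattern to collapse whitespace runs into a single space. The lisp reader turns \t and \n into real TAB and newline chars before the regex engine sees the string. my-collapse-whitespace is a hypothetical name.

```elisp
;; Hypothetical helper: collapse every run of tabs, newlines, and
;; spaces in STRING into a single space.
;; In the lisp string "[\t\n ]+", the reader turns \t and \n into a
;; real TAB and a real newline char, so the regex engine receives a
;; bracket expression containing TAB, newline, and space.
(defun my-collapse-whitespace (string)
  "Replace each run of tab, newline, or space chars in STRING with one space."
  (replace-regexp-in-string "[\t\n ]+" " " string))

(my-collapse-whitespace "a\tb\n\nc   d") ; → "a b c d"
```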

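When a file is opened in Emacs, a newline in the buffer is always \n, regardless of whether your file is from {Unix, Windows, Mac}. Emacs converts line endings according to the buffer's coding system when it reads and writes the file. So, do NOT manually find-replace newline chars to change a file's newline convention. Change the coding system instead, for example with set-buffer-file-coding-system. 〔➤see Emacs: Newline Representations ^M ^J ^L〕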
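You can check this from lisp. The sketch below puts text in a temporary buffer and switches the buffer's file coding system to a CRLF one. It then shows that the buffer text itself still has no carriage return, because the conversion only happens when the file is written. my-newline-demo is a hypothetical name.

```elisp
;; Hypothetical demo: the newline convention is a property of the
;; buffer's coding system, not of the buffer text.
(defun my-newline-demo ()
  "Return (CODING-SYSTEM EOL-TYPE CR-POSITION) for a CRLF-coded temp buffer."
  (with-temp-buffer
    (insert "line1\nline2\n")
    ;; Ask for Windows-style CRLF line endings on save.
    ;; FORCE = t sets the coding system exactly as given.
    (set-buffer-file-coding-system 'utf-8-dos t)
    (list buffer-file-coding-system
          ;; 0 = LF (unix), 1 = CRLF (dos), 2 = CR (mac)
          (coding-system-eol-type buffer-file-coding-system)
          ;; nil: the buffer text contains no CR char
          (string-match-p "\r" (buffer-string)))))

(my-newline-demo) ; → (utf-8-dos 1 nil)
```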

Double Backslash in Lisp Code

A regex string in emacs lisp code needs lots of double backslashes. This is because, in an elisp string, a backslash must itself be escaped by a backslash (written \\). The lisp reader first turns the string literal into the actual string, and only then is that string passed to emacs's regex engine.
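Here's a small check of those two steps. The first expression shows what the reader produces from a string literal. The second shows how the regex engine then interprets that string.

```elisp
;; Step 1, the lisp reader: the 3 source chars \\( become a 2-char
;; string, a backslash (char 92) followed by an open paren (char 40).
(string-to-list "\\(") ; → (92 40)

;; Step 2, the regex engine: it receives a\(b\), where \( ... \)
;; means a capture group, so group 1 is "b".
(let ((text "xab"))
  (when (string-match "a\\(b\\)" text)
    (match-string 1 text))) ; → "b"
```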

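Here's the general rule: every backslash you would type in a regex prompt must be written as \\ in lisp code, and every double quote must be written as \".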

For example, suppose you have this text:

<img src="cat.jpg" alt="my cat" width="795" height="183" />

When you call a command such as query-replace-regexp, you can type the regex in the prompt. Example:

<img src="\([^"]+?\)" alt="\([^"]+?\)" width="\([0-9]+\)" height="\([0-9]+\)" />

But in lisp code, the same regex needs to have many backslash escapes, like this:

(re-search-forward
"<img src=\"\\([^\"]+?\\)\" alt=\"\\([^\"]+?\\)\" width=\"\\([0-9]+\\)\" height=\"\\([0-9]+\\)\" />" )
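To see that the heavily escaped lisp string really is the same regex, here's a sketch that uses it with string-match on the sample text and pulls out the 4 captured groups. my-parse-img-tag is a hypothetical name.

```elisp
;; Hypothetical helper: extract the 4 captured fields from an img tag,
;; using the regex above written as a lisp string.
(defun my-parse-img-tag (text)
  "Return (SRC ALT WIDTH HEIGHT) from the first img tag in TEXT, or nil."
  (when (string-match
         "<img src=\"\\([^\"]+?\\)\" alt=\"\\([^\"]+?\\)\" width=\"\\([0-9]+\\)\" height=\"\\([0-9]+\\)\" />"
         text)
    (list (match-string 1 text)
          (match-string 2 text)
          (match-string 3 text)
          (match-string 4 text))))

(my-parse-img-tag
 "<img src=\"cat.jpg\" alt=\"my cat\" width=\"795\" height=\"183\" />")
;; → ("cat.jpg" "my cat" "795" "183")
```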

The following are lisp string escapes, so they take a single backslash only: {\n, \t, \"}.
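Here's a sketch that mixes both kinds in one regex. \" and \n use a single backslash, and the group parens use a double backslash. my-first-quoted is a hypothetical name.

```elisp
;; \" and \n take ONE backslash: the lisp reader turns them into a
;; literal double quote and a literal newline char.
;; \\( and \\) take TWO: the regex engine must receive \( and \).
(defun my-first-quoted (text)
  "Return the text inside the first pair of double quotes in TEXT.
The quoted part may not contain a newline."
  (when (string-match "\"\\([^\"\n]*\\)\"" text)
    (match-string 1 text)))

(my-first-quoted "title=\"My Cat\"\n") ; → "My Cat"
```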

(info "(elisp) Regular Expressions")

Unicode Representation in String

Emacs Lisp: Unicode Representation in String

How to Find Replace Text

For how to write an elisp function that does find-replace in a buffer, see: Emacs Lisp: Find Replace String in Buffer
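For a quick idea, here's a minimal sketch of the common re-search-forward / replace-match loop. The linked page covers the details. my-replace-regexp-in-buffer is a hypothetical name. The sketch matches case-sensitively, and it assumes the regex cannot match the empty string.

```elisp
;; Hypothetical helper sketching the common search-and-replace loop.
;; REGEX must not be able to match the empty string, or the loop
;; would never advance.
(defun my-replace-regexp-in-buffer (regex to-string)
  "Replace every match of REGEX in the current buffer with TO-STRING.
TO-STRING may use \\1, \\2, ... to refer to capture groups.
Return the number of replacements."
  (let ((case-fold-search nil)   ; match case-sensitively
        (count 0))
    (save-excursion
      (goto-char (point-min))
      (while (re-search-forward regex nil t)
        ;; FIXEDCASE = t: insert TO-STRING with its case unchanged.
        (replace-match to-string t)
        (setq count (1+ count))))
    count))

(with-temp-buffer
  (insert "width=\"795\" height=\"183\"")
  (my-replace-regexp-in-buffer "\\([a-z]+\\)=\"\\([0-9]+\\)\"" "\\1: \\2px")
  (buffer-string))
;; → "width: 795px height: 183px"
```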

Regex and Syntax Table

Warning: the meaning of some character classes in emacs regex, such as [:word:] and [:space:], depends on the current buffer's syntax table, which is normally set by the major mode. For example, which chars are considered “word” in [[:word:]] (and by \w) depends on how they are defined in the syntax table of the current major mode.
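Here's a minimal sketch of the effect. It runs the same regex [[:word:]]+ under two syntax tables. In the first, _ keeps its standard symbol syntax. In the second, _ is changed to word syntax with modify-syntax-entry. my-first-word is a hypothetical name.

```elisp
;; Hypothetical helper: first run of [[:word:]] chars in STRING,
;; with TABLE temporarily installed as the current syntax table.
(defun my-first-word (string table)
  "Return the first match of [[:word:]]+ in STRING under syntax TABLE."
  (with-syntax-table table
    (when (string-match "[[:word:]]+" string)
      (match-string 0 string))))

(let ((standard (make-syntax-table))    ; _ has symbol syntax here
      (underscore (make-syntax-table)))
  ;; Give _ word syntax in the second table only.
  (modify-syntax-entry ?_ "w" underscore)
  (list (my-first-word "foo_bar" standard)
        (my-first-word "foo_bar" underscore)))
;; → ("foo" "foo_bar")
```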

For an example showing the difference, see: Emacs Lisp: Regex Patterns and Syntax Table

Syntax tables are hard to work with, and a regex that relies on them may match differently depending on the major mode. It's best to just put the chars you want explicitly in your regex, for example, [A-Za-z].
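With the chars listed explicitly, the result stays the same whichever syntax table is current. Here's a sketch that runs one explicit regex under two different tables. my-first-identifier is a hypothetical name.

```elisp
;; Hypothetical helper: the chars are listed explicitly, so the
;; result does not depend on any syntax table.
(defun my-first-identifier (string)
  "Return the first run of ASCII letters, digits, or underscores in STRING."
  (when (string-match "[A-Za-z0-9_]+" string)
    (match-string 0 string)))

;; Same answer with _ as symbol syntax or as word syntax:
(list (with-syntax-table (make-syntax-table)
        (my-first-identifier "foo_bar baz"))
      (let ((table (make-syntax-table)))
        (modify-syntax-entry ?_ "w" table)
        (with-syntax-table table
          (my-first-identifier "foo_bar baz"))))
;; → ("foo_bar" "foo_bar")
```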
