Elisp: Regex Tutorial

By Xah Lee.

This page is a tutorial on using regex in emacs lisp code.

Regex Syntax

If you are not familiar with emacs regex syntax, first see:

Emacs: Regex Tutorial

Test Regex in Elisp Code

One simple way to test regex is to create a file with the following content:

(re-search-forward "yourRegex")

whatever text to search here

Then, put your cursor to the right of the closing parenthesis, then Alt+x eval-last-sexp 【Ctrl+x Ctrl+e】. If your regex matches, it'll move the cursor to the end of the matched text. If you get a lisp error saying search failed, then your regex didn't match. If you get a lisp syntax error, then you probably screwed up on the backslashes.
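
You can also test a regex from code, without placing the cursor by hand. Here's a minimal sketch, assuming a hypothetical helper named my-test-regex (it is not a built-in). It puts the sample text in a temp buffer, searches from the beginning, and returns the matched text, or nil if nothing matched.

```elisp
;; my-test-regex is a hypothetical helper name, made up for this example.
(defun my-test-regex (regex sample)
  "Return the first text in SAMPLE matched by REGEX, or nil if no match."
  (with-temp-buffer
    (insert sample)
    (goto-char (point-min))
    ;; Third argument t: return nil on failure instead of signaling an error.
    (when (re-search-forward regex nil t)
      (match-string 0))))

(my-test-regex "[0-9]+" "width 795 px") ; returns "795"
(my-test-regex "cat" "dog")             ; returns nil
```

Passing t as the third argument of re-search-forward is what turns the "search failed" error into a plain nil.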

Newline Character and Tab

Inside an elisp string, \t is the TAB char (Unicode codepoint 9), and \n is newline (codepoint 10). You can use [\t\n ]+ for a sequence of {tab, newline, space}.
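
For example, here's a small sketch that collapses each run of tabs, newlines, and spaces into a single space. The helper name my-collapse-whitespace is made up for this example. replace-regexp-in-string is a built-in function.

```elisp
;; my-collapse-whitespace is a hypothetical helper name.
(defun my-collapse-whitespace (str)
  "Replace each run of tab, newline, or space in STR with one space."
  (replace-regexp-in-string "[\t\n ]+" " " str))

(my-collapse-whitespace "a\t\tb\n\n  c") ; returns "a b c"
```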

When a file is opened in Emacs, newline is always \n, regardless of whether your file is from {Unix, Windows, Mac}. Do NOT manually do find replace on newline chars for changing file newline convention. [see Emacs: Newline Representations ^M ^J ^L]

Double Backslash in Lisp Code

A regex string in emacs lisp needs lots of double backslashes. This is because, in an elisp string, a literal backslash must itself be written as two backslashes. The lisp reader first turns each pair into one backslash, then the resulting string is passed to emacs's regex engine.

Here's the general rule:

  1. The characters {\n, \t} should have just 1 backslash in front.
  2. ASCII double quote character (codepoint 34) needs to have 1 backslash in front, i.e. \"
  3. To give a character its regex meaning instead of its literal meaning (such as a parenthesis used for grouping), or its literal meaning instead of its regex meaning (such as a square bracket), you need double backslashes in front, because a double backslash in a string represents 1 backslash.
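
The 3 rules can be checked by evaluating a few short expressions. This is just a sketch. The sample strings are made up for illustration.

```elisp
;; Rule 1 and 2: one backslash in source, and the escape is 1 char in the string.
(length "\n")  ; returns 1
(length "\"")  ; returns 1

;; Rule 3: two backslashes in source make 1 backslash in the string.
(length "\\(") ; returns 2, a backslash followed by an open paren

;; Backslashed parens are a capture group.
(string-match "\\(cat\\)" "my cat") ; returns 3

;; Plain parens are literal in emacs regex.
(string-match "(cat)" "my (cat)")   ; returns 3

;; Backslashed square brackets are literal.
(string-match "\\[x\\]" "a[x]")     ; returns 1
```

string-match returns the index where the match starts, or nil if there is no match.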

For example, suppose you have this text:

<img src="cat.jpg" alt="my cat" width="795" height="183" />

When you call a command such as list-matching-lines, you can type the regex in the prompt. Example:

<img src="\([^"]+?\)" alt="\([^"]+?\)" width="\([0-9]+\)" height="\([0-9]+\)" />

But in lisp code, the same regex needs to have many backslash escapes, like this:

(re-search-forward
 "<img src=\"\\([^\"]+?\\)\" alt=\"\\([^\"]+?\\)\" width=\"\\([0-9]+\\)\" height=\"\\([0-9]+\\)\" />")

(info "(elisp) Regular Expressions")
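
To see the escaped regex at work, here's a sketch that runs it against the sample img tag and collects the 4 captured groups. The function name my-parse-img-tag is made up for this example. It uses string-match on a string instead of searching a buffer.

```elisp
;; my-parse-img-tag is a hypothetical helper name.
(defun my-parse-img-tag (text)
  "Return a list (SRC ALT WIDTH HEIGHT) from an img tag in TEXT, or nil."
  (let ((regex "<img src=\"\\([^\"]+?\\)\" alt=\"\\([^\"]+?\\)\" width=\"\\([0-9]+\\)\" height=\"\\([0-9]+\\)\" />"))
    (when (string-match regex text)
      ;; After string-match on a string, match-string needs that string as 2nd arg.
      (list (match-string 1 text)
            (match-string 2 text)
            (match-string 3 text)
            (match-string 4 text)))))

(my-parse-img-tag
 "<img src=\"cat.jpg\" alt=\"my cat\" width=\"795\" height=\"183\" />")
;; returns ("cat.jpg" "my cat" "795" "183")
```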

Unicode Representation in String

Elisp: Unicode Escape Sequence

Find Replace Text

Elisp: Find Replace String in Buffer

Regex and Syntax Table

Warning: the meaning of some character classes in emacs depends on the current major mode's syntax table. For example, which chars are considered “word” in [[:word:]] depends on how it's defined in the syntax table of the current major mode.

For an example showing the difference, see: Elisp: Regex Patterns and Syntax Table

Syntax tables are hard to work with, and a regex that depends on one may be unpredictable. Best is just to put the chars you want explicitly in your regex, for example, [A-Za-z].
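
Here's a small sketch of the effect. It matches [[:word:]]+ against the same text under 2 syntax tables: the standard one, and a copy where underscore is made a word character. The helper name my-word-match is made up for this example, and it assumes the standard syntax table has not been modified in your session.

```elisp
;; my-word-match is a hypothetical helper name.
(defun my-word-match (text table)
  "Return the leading run of [[:word:]] chars in TEXT under syntax TABLE."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (with-syntax-table table
      (when (looking-at "[[:word:]]+")
        (match-string 0)))))

(let ((table (make-syntax-table)))   ; new table, inherits from the standard one
  (modify-syntax-entry ?_ "w" table) ; make underscore a word character
  (list (my-word-match "foo_bar" (standard-syntax-table))
        (my-word-match "foo_bar" table)))
;; returns ("foo" "foo_bar")
```

Same regex, same text, different result. An explicit set such as [A-Za-z_]+ gives the same match under both tables.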
