Emacs: Regex Tutorial

By Xah Lee.

A regular expression (regex for short) is a character sequence that represents a pattern for matching text. For example, you can use one to find all email addresses in a file by matching the pattern of an email address.

Note: Regular expressions came from formal language theory in the 1950s, from the work of mathematician Stephen Cole Kleene. There, a regular expression describes a “regular language”, which roughly means a set of strings whose grammar does not generate arbitrarily nested structure. By the 1990s, regex was understood mostly as a tool in programming languages for searching text or doing find and replace. Different programming languages have slightly incompatible syntax and features, and their regex patterns no longer have the original meaning and definition from formal language theory.

What is Wrong with Wildcards * ?

In a shell, you can use wildcards to match file names. For example:

*
Matches 0 or more of any character. For example, *txt matches all file names ending in “txt”.
?
Matches any single character.

This system is called glob pattern. Glob pattern is simple and easy to understand.

But what if you want to match any file name that contains 5 to 10 repeated B?

Or, for example, in an HTML file, you have lines like these:

<img src="img/cat.png" alt="my cat" />
<img src="img/dog.png" alt="her dog" />
<img src="img/house.jpg" alt="friend's house" />

You want to get the file names of all the image tags.

You cannot do that with a glob pattern. You need a more powerful pattern language: regex.
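To make the contrast concrete, here is a minimal Emacs Lisp sketch of both tasks. The function name my-img-file-names is made up for this illustration, and it assumes src is the first attribute of each img tag, as in the lines above. The \{5,10\} repetition syntax is standard Emacs regex. In a Lisp string, each regex backslash is written twice.

```elisp
;; Hypothetical helper, named for this example only.
(defun my-img-file-names (html)
  "Return the list of src values of img tags found in string HTML.
Assumes src is the first attribute of each img tag."
  (let ((pos 0)
        (names '()))
    ;; The group matches one or more characters that are not a double quote.
    (while (string-match "<img src=\"\\([^\"]+\\)\"" html pos)
      (push (match-string 1 html) names)
      (setq pos (match-end 0)))
    (nreverse names)))

(my-img-file-names
 "<img src=\"img/cat.png\" alt=\"my cat\" />
<img src=\"img/dog.png\" alt=\"her dog\" />")
;; returns ("img/cat.png" "img/dog.png")

;; 5 to 10 repetitions of B.
(string-match-p "B\\{5,10\\}" "aBBBBBBc") ; 1, six B in a row starting at index 1
(string-match-p "B\\{5,10\\}" "aBBBc")    ; nil, only three B
```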

Regex Commands

Many Emacs commands take a regex as input. A commonly used one is list-matching-lines.

Example of Using a Regex Command

  1. Open a file with many lines.
  2. Alt+x list-matching-lines
  3. Type th.t and press Enter.

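It will list all lines that contain a match for the regex pattern th.t, for example lines containing “that”. It does not match “this”, because the pattern requires a t after the wildcard. The dot . character in regex is a wildcard, matching any single character except newline.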
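The same pattern can be tried in Emacs Lisp, which is a quick way to check what it matches. This is a small sketch, not part of the interactive command. string-match-p returns the index where the match starts, or nil if there is no match.

```elisp
(let ((case-fold-search nil))   ; make the match case sensitive
  (list (string-match-p "th.t" "that")    ; 0, the dot matches "a"
        (string-match-p "th.t" "th t")    ; 0, the dot matches a space
        (string-match-p "th.t" "this")    ; nil, no t after the wildcard
        (string-match-p "th.t" "tht")))   ; nil, the dot must match one character
;; returns (0 0 nil nil)
```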

Emacs Regex Syntax

Here are commonly used patterns.

.
Any single character except newline ("\n").
\.
One period
[0-9]+
One or more digits
[^0-9]+
One or more non-digit characters
[A-Za-z]+
One or more letters
[-A-Za-z0-9]+
One or more {letter, digit, hyphen}
[_A-Za-z0-9]+
One or more {letter, digit, underscore}
[-_A-Za-z0-9]+
One or more {letter, digit, hyphen, underscore}
[[:ascii:]]+
One or more ASCII chars. (codepoint 0 to 127, inclusive)
[[:nonascii:]]+
One or more non-ASCII characters (For example, Unicode characters)
[\n\t ]+
One or more {newline character, tab, space}.
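Here is a short Emacs Lisp sketch that applies several of the patterns above to one sample string. The helper my-first-match and the sample text are made up for this illustration.

```elisp
;; Hypothetical helper, named for this example only.
(defun my-first-match (regex text)
  "Return the first substring of TEXT that matches REGEX, or nil."
  (let ((case-fold-search nil))
    (when (string-match regex text)
      (match-string 0 text))))

(let ((text "file-name_01.txt"))
  (list
   (my-first-match "[0-9]+" text)          ; "01"
   (my-first-match "[^0-9]+" text)         ; "file-name_"
   (my-first-match "[A-Za-z]+" text)       ; "file"
   (my-first-match "[-A-Za-z0-9]+" text)   ; "file-name"
   (my-first-match "[-_A-Za-z0-9]+" text)  ; "file-name_01"
   (my-first-match "\\.[A-Za-z]+" text)))  ; ".txt"
```

Note how adding or removing the hyphen and underscore in the bracket changes where the match stops.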

Repetition:

+
Match previous pattern 1 or more times. For example, a+ means 1 or more occurrences of “a”, and [0-9]+ means 1 or more digits.
*
Match previous pattern 0 or more times
?
Match previous pattern 0 or 1 time
+?
Match previous pattern 1 or more times, but with minimal match (aka non-greedy)
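The difference between these operators is easiest to see side by side. The following Emacs Lisp sketch uses made-up sample strings.

```elisp
(let ((case-fold-search nil)
      (text "<b>one</b> <b>two</b>"))
  (list
   (string-match-p "ab+c" "ac")     ; nil, + needs at least one b
   (string-match-p "ab*c" "ac")     ; 0, * accepts zero b
   (string-match-p "ab?c" "abbc")   ; nil, ? accepts at most one b
   ;; Greedy: .+ takes as much as it can.
   (progn (string-match "<.+>" text) (match-string 0 text))
   ;; Non-greedy: .+? takes as little as it can.
   (progn (string-match "<.+?>" text) (match-string 0 text))))
;; returns (nil 0 nil "<b>one</b> <b>two</b>" "<b>")
```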

Boundary anchors:

^
Beginning of {line, string, buffer}
$
End of {line, string, buffer}
\`
Beginning of {string, buffer}
\'
End of {string, buffer}
\b
word boundary marker
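The anchors can be compared on a two-line string. This is a small Emacs Lisp sketch with a made-up sample. It assumes the current syntax table treats letters as word characters, which is the case in ordinary modes.

```elisp
(let ((case-fold-search nil)
      (text "one\ntwo"))
  (list
   (string-match-p "^two" text)       ; 4, ^ also matches right after a newline
   (string-match-p "\\`two" text)     ; nil, this anchor matches only at the start of the string
   (string-match-p "one$" text)       ; 0, $ also matches right before a newline
   (string-match-p "one\\'" text)     ; nil, this anchor matches only at the end of the string
   (string-match-p "\\bwo" text)      ; nil, "wo" is in the middle of the word "two"
   (string-match-p "\\btwo\\b" text)))  ; 4
;; returns (4 nil 0 nil nil 4)
```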

Capture:

\([0-9]+\)
Capture digit sequence.
\([A-Za-z]+\)
Capture English letter sequence. Do not use [A-z], because that'll match some punctuation chars too.
\([-A-Za-z]+\)
Capture English letter sequence plus hyphen.
\([-_A-Za-z]+\)
Capture English letter sequence plus hyphen and underscore.
\([-_A-Za-z0-9]+\)
Capture alphanumeric sequence plus hyphen and underscore.
\([^ab]\)
Capture any single character that is neither a nor b.
\([^ab]+\)
Capture a sequence of characters that are neither a nor b.
\([^"]\)
Capture a single character that is not ".
"\([^"]+\)"
Capture the text between double quotes.
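Captured text is what makes find and replace with regex useful. In interactive commands such as query-replace-regexp, the replacement text refers to the captured groups as \1, \2, and so on. The Emacs Lisp sketch below shows the same idea in code, with made-up sample strings. In a Lisp string, each backslash is written twice.

```elisp
(let ((case-fold-search nil)
      (date "date: 2024-01-15")
      (quoted "say \"hello world\" now"))
  (list
   ;; Three capture groups, read back with match-string.
   (when (string-match "\\([0-9]+\\)-\\([0-9]+\\)-\\([0-9]+\\)" date)
     (list (match-string 1 date) (match-string 2 date) (match-string 3 date)))
   ;; Capture the text between double quotes.
   (when (string-match "\"\\([^\"]+\\)\"" quoted)
     (match-string 1 quoted))
   ;; Swap two captured numbers in the replacement.
   (replace-regexp-in-string "\\([0-9]+\\)-\\([0-9]+\\)" "\\2-\\1" "10-20")))
;; returns (("2024" "01" "15") "hello world" "20-10")
```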

A Unicode character can be used literally. For example, → matches the right arrow character.

In Emacs Lisp strings, you can also represent any character by an escape sequence such as \u2192. [see Elisp: Unicode Escape Sequence]
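A minimal Emacs Lisp sketch of the escape sequence in use. The Lisp reader turns \u2192 inside a string into the single right arrow character before the regex engine sees it. The sample strings are made up.

```elisp
(list
 (string-match-p "\u2192" "a \u2192 b")             ; 2, index of the arrow
 (string-match-p "[[:nonascii:]]+" "a \u2192 b")    ; 2, the arrow is non-ASCII
 (string-match-p "[[:nonascii:]]+" "plain ascii"))  ; nil
;; returns (2 2 nil)
```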

For complete list of regex syntax, see: (info "(elisp) Syntax of Regexps")

Newline and Tab

When using interactive commands, Emacs does not understand \n or \t. You need to insert the literal newline or tab character instead.

For example, suppose you want to match 2 lines, where the first line starts with A and the second line starts with B:

A, something
B, other thing

You can use this regex, which itself spans two lines:

A.+
B.+

To type the newline between the two lines, press Ctrl+q Ctrl+j. To type a tab, press Ctrl+q Tab.

[see Emacs Key Syntax Explained]
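In Emacs Lisp code the situation is different, because the Lisp reader converts \n inside a string into a newline character before the regex is used. A minimal sketch of the same two-line match:

```elisp
(let ((case-fold-search nil)
      (text "A, something\nB, other thing"))
  (when (string-match "A.+\nB.+" text)
    (match-string 0 text)))
;; returns the whole two-line text, "A, something\nB, other thing"
```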

Case Sensitivity

By default, in interactive commands, Emacs regex is not case sensitive unless the pattern contains capital letters. That is, dragon will match {dragon, Dragon, DRAGON, draGON}. But Dragon will match only “Dragon”.

Case sensitivity is controlled by the variable case-fold-search. Use Alt+x toggle-case-fold-search to toggle it. Remember to toggle it back when you are done, because case-fold-search is also used by isearch and basically all search and find/replace commands.
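In Emacs Lisp code, a common way to control case sensitivity is to bind case-fold-search with let around the search, so nothing needs to be toggled back. The sketch below also shows one difference from interactive use: string-match follows case-fold-search only, so a capital letter in the pattern does not by itself make the match case sensitive.

```elisp
(list
 (let ((case-fold-search t))   (string-match-p "dragon" "DRAGON"))  ; 0
 (let ((case-fold-search nil)) (string-match-p "dragon" "DRAGON"))  ; nil
 ;; The pattern has a capital D, but case is still ignored here.
 (let ((case-fold-search t))   (string-match-p "Dragon" "DRAGON"))  ; 0
 (let ((case-fold-search nil)) (string-match-p "Dragon" "Dragon"))) ; 0
;; returns (0 nil 0 0)
```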

JavaScript vs Emacs Regex

Regex syntax in most languages (JavaScript, Python, Ruby, Perl, etc.) is similar. Emacs regex is different from them.

Here are the major practical differences.

Major differences between Emacs regex and other languages' regex

              JavaScript    Similar Emacs Lisp
Capture       "()"          "\\(\\)"
digit         "\d"          "[0-9]" or similar "[[:digit:]]"
word          "\w"          "[_A-Za-z0-9]" or similar "[[:word:]]"
whitespace    "\s"          similar "[ \n\t]" or "[[:space:]]"
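As an illustration of the table, here is a made-up JavaScript pattern, /(\d+)\s+(\w+)/, rewritten for Emacs Lisp. This is a sketch of one possible translation. The bracket forms are only similar to the JavaScript classes, not exactly equivalent.

```elisp
;; JavaScript: /(\d+)\s+(\w+)/
(let ((case-fold-search nil)
      (text "order 42   apples_red today"))
  (when (string-match "\\([0-9]+\\)[ \n\t]+\\([_A-Za-z0-9]+\\)" text)
    (list (match-string 1 text)     ; the digits
          (match-string 2 text))))  ; the word after the whitespace
;; returns ("42" "apples_red")
```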

Named Character Class and Syntax Table

Warning: the meaning of some named character classes in Emacs, such as [[:word:]] and [[:space:]], depends on the current major mode's syntax table. [see Elisp: Syntax Table]

Syntax tables are hard to work with, and a regex that depends on them may behave unpredictably. It is best to put the characters you want explicitly in your regex, for example, [A-Za-z0-9]. Use [[:word:]] if you need to match Western alphabets and also Chinese characters, Russian characters, etc.

For an example showing the difference, see: Elisp: Regex Patterns and Syntax Table
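The dependence on the syntax table can be seen in a small Emacs Lisp sketch. It tests the same regex under two syntax tables: the standard one, where the underscore is not a word character, and a modified copy made for this example, where it is.

```elisp
(let ((case-fold-search nil)
      (table (make-syntax-table)))   ; new table that inherits from the standard one
  ;; In this table only, treat underscore as a word character.
  (modify-syntax-entry ?_ "w" table)
  (list
   (with-syntax-table (standard-syntax-table)
     (string-match-p "\\`[[:word:]]+\\'" "foo_bar"))   ; nil
   (with-syntax-table table
     (string-match-p "\\`[[:word:]]+\\'" "foo_bar"))   ; 0
   ;; An explicit character set does not depend on the syntax table.
   (with-syntax-table (standard-syntax-table)
     (string-match-p "\\`[_A-Za-z0-9]+\\'" "foo_bar")))) ; 0
;; returns (nil 0 0)
```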

Interactive Emacs Regex Mode

Emacs has an interactive regex mode. It shows matches as you type. To go into the mode, use Alt+x regexp-builder. (I don't like it.)

I recommend using Alt+x list-matching-lines to test your pattern.

Regex in Emacs Lisp Code

Elisp: Regex Tutorial

Emacs Find Replace
