Emacs: Regex Tutorial

By Xah Lee.

Regex lets you find text by pattern, such as any character that is repeated exactly twice. Regex is like a wildcard, but more flexible.

Regex Commands

The most commonly used command that uses regex is query-replace-regexp. 〔►see Emacs: Find and Replace Commands〕

Other useful ones are:

Emacs Regex Syntax

Here are commonly used patterns.

.                 Any single character except newline ("\n").
\.                One period.
[0-9]+            One or more digits.
[^0-9]+           One or more non-digit characters.
[A-Za-z]+         One or more letters.
[-A-Za-z0-9]+     One or more {letter, digit, hyphen}.
[_A-Za-z0-9]+     One or more {letter, digit, underscore}.
[-_A-Za-z0-9]+    One or more {letter, digit, hyphen, underscore}.
[[:ascii:]]+      One or more ASCII chars. (codepoint 0 to 127, inclusive)
[[:nonascii:]]+   One or more non-ASCII characters. (For example, Unicode characters)
[\n\t ]+          One or more {newline character, tab, space}.
"\([^"]+\)"       Capture text between double quotes.
+                 Match previous pattern 1 or more times. e.g. a+ means 1 or more occurrence of “a”. [0-9]+ means 1 or more occurrence of digit.
*                 Match previous pattern 0 or more times.
?                 Match previous pattern 0 or 1 time.
+?                Match previous pattern 1 or more times, but with minimal match (aka non-greedy).

Boundary anchors

^…                Beginning of {line, string, buffer}.
…$                End of {line, string, buffer}.
\`…               Beginning of {string, buffer}.
…\'               End of {string, buffer}.
\b                Word boundary marker.
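
The patterns above can be tried from Lisp code. The following is a minimal sketch, not part of the original tutorial. The function names my-first-quoted and my-first-match are made up for illustration. Note that inside a Lisp string, each regex backslash is written twice and each double quote is escaped.

```elisp
(defun my-first-quoted (text)
  "Return the text inside the first pair of double quotes in TEXT, or nil."
  ;; The table's capture pattern, written as a Lisp string:
  ;; every backslash is doubled and every double quote is escaped.
  (when (string-match "\"\\([^\"]+\\)\"" text)
    (match-string 1 text)))

(my-first-quoted "say \"hello\" and \"bye\"")   ; ⇒ "hello"

(defun my-first-match (regexp text)
  "Return the first substring of TEXT matching REGEXP, or nil."
  (when (string-match regexp text)
    (match-string 0 text)))

;; Greedy + takes as much as it can; non-greedy +? takes as little as it can.
(my-first-match "<.+>"  "<b>one</b><b>two</b>")   ; ⇒ "<b>one</b><b>two</b>"
(my-first-match "<.+?>" "<b>one</b><b>two</b>")   ; ⇒ "<b>"
```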

Unicode characters can be used literally. For example, → will find the right arrow character.

You can also represent any character by an escape syntax such as \u2192. 〔►see Emacs Lisp: Unicode Representation in String〕
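
A small sketch of both spellings in Lisp code, assuming the file is saved as UTF-8. The function name my-arrow-position is hypothetical. The \u2192 escape is handled by the Lisp string reader, so it applies to Lisp code, not to text typed at an interactive prompt.

```elisp
(defun my-arrow-position (text)
  "Return the index of the first right arrow (U+2192) in TEXT, or nil."
  (string-match "\u2192" text))

(string= "→" "\u2192")                     ; ⇒ t, both spell the same character
(my-arrow-position "a → b")                ; ⇒ 2
(string-match "[[:nonascii:]]+" "a → b")   ; ⇒ 2
```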

For the complete list of regex syntax, see: (info "(elisp) Syntax of Regexps")

Newline and Tab

When using interactive commands, emacs won't understand \n or \t. To enter a newline, press Ctrl+q Ctrl+j. To enter a tab, press Ctrl+q Tab.

(For explanation, see: Emacs's Key Syntax Explained).
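
This limit applies only to interactive prompts. In Lisp code, the string reader turns \n and \t into real newline and tab characters before the regex engine sees them. A minimal sketch, with a hypothetical function name my-collapse-whitespace:

```elisp
(defun my-collapse-whitespace (text)
  "Return TEXT with each run of newline, tab, or space replaced by one space."
  ;; "\n" and "\t" here are real newline and tab chars in the Lisp string.
  (replace-regexp-in-string "[\n\t ]+" " " text))

(my-collapse-whitespace "a\n\n\tb   c")   ; ⇒ "a b c"
```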

Case Sensitivity

By default, Emacs regex is not case sensitive unless the pattern contains capital letters. That is, dragon will match “dragon” and “Dragon” and “DRAGON” and “draGON”. But Dragon will match only “Dragon”.

Case sensitivity is controlled by the variable case-fold-search. Alt+x toggle-case-fold-search to toggle it. Remember to toggle it back when you are done, because case-fold-search is also used by isearch and basically all search or find/replace commands.
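
In Lisp code, you can bind case-fold-search with let instead of toggling it, so nothing needs to be toggled back. Here is a sketch with a hypothetical helper my-match-p. One thing to watch: the "capital letter makes it case sensitive" rule belongs to the interactive commands. Lisp functions such as string-match look only at case-fold-search.

```elisp
(defun my-match-p (regexp text fold)
  "Return t if REGEXP matches TEXT, else nil.
Ignore letter case when FOLD is non-nil."
  (let ((case-fold-search fold))
    (and (string-match-p regexp text) t)))

(my-match-p "dragon" "DRAGON" t)     ; ⇒ t
(my-match-p "dragon" "DRAGON" nil)   ; ⇒ nil
;; A capital letter in the pattern does not turn folding off here.
(my-match-p "Dragon" "dragon" t)     ; ⇒ t
```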

Do not use [A-z], because that'll match some punctuation chars too. Use [A-Za-z].
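
To see which characters [A-z] lets in, check the six ASCII characters between Z (codepoint 90) and a (codepoint 97). This is a sketch for illustration. The helper my-chars-matching is a made-up name.

```elisp
(defun my-chars-matching (regexp from to)
  "Return a string of the characters with codepoints FROM to TO that match REGEXP."
  (let ((case-fold-search nil)
        (result '()))
    (dotimes (i (1+ (- to from)))
      (let ((s (char-to-string (+ from i))))
        (when (string-match-p regexp s)
          (push s result))))
    (apply #'concat (nreverse result))))

(my-chars-matching "[A-z]"    91 96)   ; ⇒ "[\\]^_`"
(my-chars-matching "[A-Za-z]" 91 96)   ; ⇒ ""
```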

JavaScript vs Emacs Regex

Regex in most languages (JavaScript, Python, Ruby, Perl, etc.) is similar. Emacs regex is different from them.

Here are the major practical differences.

Major differences between emacs regex and other languages' regex
JavaScript        emacs lisp

For example, JavaScript's \d+ is emacs's [[:digit:]]+.
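
A sketch of the \d+ translation in Lisp code, with a hypothetical function my-parse-range. It also shows two differences that go beyond the \d example in the text: in emacs, capture groups are written \(…\) and alternation is written \|. In JavaScript they are (…) and |.

```elisp
;; JavaScript: /(\d+)-(\d+)/
;; Emacs:      \([[:digit:]]+\)-\([[:digit:]]+\)
(defun my-parse-range (text)
  "Return (START END) as numbers if TEXT contains a range like 10-25, else nil."
  (when (string-match "\\([[:digit:]]+\\)-\\([[:digit:]]+\\)" text)
    (list (string-to-number (match-string 1 text))
          (string-to-number (match-string 2 text)))))

(my-parse-range "pages 10-25")   ; ⇒ (10 25)

;; JavaScript: /cat|dog/
;; Emacs:      cat\|dog
(string-match "cat\\|dog" "hot dog")   ; ⇒ 4
```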

Note: the meaning of some character classes in emacs depends on the current buffer's syntax table. For example, what chars are considered “word” in [[:word:]] depends on how it's defined in the syntax table of the current major mode. But practically, it's what you'd expect.

For an example showing the difference, see: Emacs Lisp: Regex Patterns and Syntax Table.

Syntax tables are hard to work with, and regex that depends on them may be unpredictable. It is best to put the chars you want explicitly in your regex. For example, use [A-Za-z] instead of [[:word:]], unless you also need to match Chinese characters, Russian characters, etc.
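
The syntax table effect can be sketched as follows. The function my-word-match is hypothetical. It builds a fresh syntax table, sets the syntax class of the underscore, and then matches [[:word:]]+ under that table. The same regex gives different results depending on whether underscore counts as a word character.

```elisp
(defun my-word-match (text underscore-syntax)
  "Return what [[:word:]]+ matches at the start of TEXT, or nil.
UNDERSCORE-SYNTAX is the syntax descriptor to give the underscore:
\"w\" for word constituent, \"_\" for symbol constituent."
  (let ((table (make-syntax-table)))
    (modify-syntax-entry ?_ underscore-syntax table)
    ;; The regex engine consults the syntax table that is current here.
    (with-syntax-table table
      (when (string-match "\\`[[:word:]]+" text)
        (match-string 0 text)))))

(my-word-match "foo_bar" "_")   ; ⇒ "foo"
(my-word-match "foo_bar" "w")   ; ⇒ "foo_bar"
```

An explicit set such as [_A-Za-z]+ gives the same result under both tables.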

Interactive Emacs Regex Mode

Emacs has an interactive regex mode. It shows matches as you type. To go into the mode, Alt+x regexp-builder. (I don't like this)

Alternatively, Alt+x query-replace-regexp to test your pattern. I prefer this.

Regex in Emacs Lisp Code

Emacs Lisp: Regex Tutorial
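
As a taste of regex in Lisp code, here is a minimal sketch of a find and replace loop over the current buffer. The function name my-replace-regexp-in-buffer is made up, and the sketch assumes the regex never matches the empty string (otherwise the loop would not end).

```elisp
(defun my-replace-regexp-in-buffer (regexp replacement)
  "Replace every match of REGEXP in the current buffer with REPLACEMENT.
REPLACEMENT may refer to capture groups as \\1, \\2, etc.
Return the number of replacements made.
Assumes REGEXP never matches the empty string."
  (save-excursion
    (goto-char (point-min))
    (let ((count 0))
      (while (re-search-forward regexp nil t)
        ;; Second argument t: keep the replacement's letter case as written.
        (replace-match replacement t)
        (setq count (1+ count)))
      count)))

;; Usage: turn year-month-day into day/month/year in a scratch buffer.
(with-temp-buffer
  (insert "2024-01-15")
  (my-replace-regexp-in-buffer
   "\\([0-9]+\\)-\\([0-9]+\\)-\\([0-9]+\\)" "\\3/\\2/\\1")
  (buffer-string))   ; ⇒ "15/01/2024"
```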


(info "(emacs) Regexps")
