Emacs: Regex Tutorial

By Xah Lee.

This page is a tutorial on emacs regex. Regex lets you find text patterns.

Regex Commands

The most commonly used command that uses regex is query-replace-regexp. 〔see Emacs: Find and Replace Commands〕

Other useful ones are:

There are many others. To list them all, call apropos-command, then type “regex”.

Emacs Regex Syntax

Here are commonly used patterns.

Pattern           Matches
.                 Any single character except newline ("\n")
\.                One period
[0-9]+            One or more digits
[^0-9]+           One or more non-digit characters
[A-Za-z]+         One or more letters
[-A-Za-z0-9]+     One or more {letter, digit, hyphen}
[_A-Za-z0-9]+     One or more {letter, digit, underscore}
[-_A-Za-z0-9]+    One or more {letter, digit, hyphen, underscore}
[[:ascii:]]+      One or more ASCII characters (code points 0 to 127, inclusive)
[[:nonascii:]]+   One or more non-ASCII characters (code point 128 and above)
[\n\t ]+          One or more {newline, tab, space}

Pattern           Matches
"\([^"]+\)"       Capture the text between double quotes

Repetition

Pattern           Matches
+                 Previous pattern 1 or more times
*                 Previous pattern 0 or more times
?                 Previous pattern 0 or 1 time
+?                Previous pattern 1 or more times, as few as possible (minimal match, aka non-greedy)

Boundary anchors

Pattern           Matches
^…                Beginning of {line, string, buffer}
…$                End of {line, string, buffer}
\`…               Beginning of {string, buffer}
…\'               End of {string, buffer}
\b                Word boundary
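Here is a minimal sketch, in Emacs Lisp, that tries some patterns from the tables above with string-match. The helper my-first-match is a hypothetical name made up for this example. Note that in a Lisp string every regex backslash must be doubled, so the regex \( is written "\\(" in code, while in query-replace-regexp you type it just once.

```elisp
;; my-first-match is a hypothetical helper written for this example.
(defun my-first-match (regexp string &optional group)
  "Return the text matched by REGEXP (or its capture GROUP) in STRING, or nil."
  (when (string-match regexp string)
    (match-string (or group 0) string)))

;; In a Lisp string each regex backslash is doubled: \( is written "\\(".
(my-first-match "[0-9]+" "version 27.1")               ; => "27"
(my-first-match "[^0-9]+" "abc123")                    ; => "abc"
(my-first-match "[-_A-Za-z0-9]+" "my_var-2 = 5")       ; => "my_var-2"
(my-first-match "[[:nonascii:]]+" "cost: 5\u20AC")     ; => the euro sign, "\u20AC"
(my-first-match "\"\\([^\"]+\\)\"" "say \"hi\" now" 1) ; => "hi" (capture group 1)

;; Greedy versus minimal (non-greedy) repetition.
(my-first-match "<.+>" "<a><b>")                       ; => "<a><b>"
(my-first-match "<.+?>" "<a><b>")                      ; => "<a>"

;; Anchors and word boundaries: string-match returns the match position.
(string-match "^b" "a\nb")              ; => 2, ^ also matches right after a newline
(string-match "\\`b" "a\nb")            ; => nil, \` matches only at the very start
(string-match "cat" "concat cat")       ; => 3, the "cat" inside "concat"
(string-match "\\bcat\\b" "concat cat") ; => 7, the standalone word
```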

Unicode characters can be used literally. For example, → will find the right arrow character.

In Emacs Lisp strings, you can also represent any character by a coded syntax such as \u2192. 〔see Emacs Lisp: Unicode Representation in String〕
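A small sketch of the \u2192 syntax in Lisp code. The Lisp reader turns "\u2192" into the single character U+2192 before the regex engine ever sees it, so it behaves just like typing the arrow literally.

```elisp
;; The reader converts \u2192 into one character, U+2192 (RIGHTWARDS ARROW).
(length "\u2192")                                      ; => 1
(aref "\u2192" 0)                                      ; => 8594, which is #x2192
(string-match "\u2192" "a \u2192 b")                   ; => 2
(replace-regexp-in-string "\u2192" "->" "a \u2192 b")  ; => "a -> b"
```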

For the complete list of regex syntax, see: (info "(elisp) Syntax of Regexps")

Newline and Tab

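When using interactive commands, emacs won't understand \n or \t. To enter a literal newline in the minibuffer, type Ctrl+q Ctrl+j. To enter a literal tab, type Ctrl+q Tab.

(For an explanation, see: Emacs's Key Syntax Explained.)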
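In Emacs Lisp code the situation is different: the Lisp reader turns "\n" and "\t" inside a string into real newline and tab characters, so the [\n\t ]+ pattern from the table works as written. A minimal sketch using replace-regexp-in-string to collapse whitespace runs:

```elisp
;; Inside a Lisp string, \n and \t are already a newline and a tab
;; by the time the regex engine sees them.
(replace-regexp-in-string "[\n\t ]+" " " "one\n\ttwo   three")
;; => "one two three"
```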

Case Sensitivity

By default, [a-z] is not case sensitive, so it also matches uppercase letters. Case sensitivity is controlled by the variable case-fold-search. Call toggle-case-fold-search to toggle it. Remember to toggle it back when you are done, because case-fold-search is also used by isearch and basically all search and find/replace commands.

Do not use [A-z]. In ASCII, the six characters [ \ ] ^ _ ` sit between Z and a, so [A-z] matches them too. Use [A-Za-z].
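Both points can be checked from Lisp code. This sketch let-binds case-fold-search instead of toggling it; a let binding only affects the forms inside it, so there is nothing to toggle back afterward.

```elisp
;; string-match obeys case-fold-search.  The let binding changes it only
;; for the code inside the let, so nothing needs to be restored.
(let ((case-fold-search t))   (string-match "[a-z]+" "ABC"))  ; => 0
(let ((case-fold-search nil)) (string-match "[a-z]+" "ABC"))  ; => nil

;; [A-z] spans ASCII 65 to 122, which includes [ \ ] ^ _ ` (91 to 96).
(let ((case-fold-search nil)) (string-match "[A-z]" "_"))     ; => 0
(let ((case-fold-search nil)) (string-match "[A-Za-z]" "_"))  ; => nil
```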

JavaScript vs Emacs Regex

Regex syntax in most languages, such as JavaScript, Python, Ruby, and Perl, is similar. Emacs regex is different from them.

Here are the major practical differences.

              JavaScript    Emacs Lisp
Capture       (…)           \(…\)
Digit         \d            [[:digit:]]
Word          \w            [[:word:]]
Whitespace    \s            [[:space:]]

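For example, JavaScript's \d+ is emacs's [[:digit:]]+. (Emacs also accepts \w, but like [[:word:]] it follows the syntax table described below, so it can differ from JavaScript's \w.)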
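As a sketch, here is a JavaScript pattern with captures and \d translated to Emacs regex, then tried in Emacs Lisp. Remember that in Lisp code each backslash of the Emacs regex is written twice.

```elisp
;; JavaScript:  /(\d+)-(\d+)/
;; Emacs regex: \([[:digit:]]+\)-\([[:digit:]]+\)
;; The same regex as a Lisp string, with backslashes doubled:
(let ((s "pages 12-34"))
  (when (string-match "\\([[:digit:]]+\\)-\\([[:digit:]]+\\)" s)
    (list (match-string 1 s) (match-string 2 s))))
;; => ("12" "34")

;; JavaScript's \s+ becomes [[:space:]]+
(replace-regexp-in-string "[[:space:]]+" " " "a \t b")
;; => "a b"
```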

Warning: the meaning of some character classes in emacs, such as [[:word:]] and [[:space:]], depends on the current major mode's syntax table. For example, which characters count as “word” characters in [[:word:]] depends on how they are defined in the syntax table of the current major mode. In practice, though, it's usually what you'd expect.

For an example showing the difference, see: Emacs Lisp: Regex Patterns and Syntax Table.

Syntax tables are hard to work with, and a regex that depends on them may match differently from one major mode to another. It's best to put the characters you want explicitly in your regex, for example, [A-Za-z].
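A minimal sketch of that syntax-table dependence. my-first-word is a hypothetical helper for this example. with-temp-buffer starts in fundamental-mode, which uses the standard syntax table, where the hyphen is not a word character; a custom table that declares the hyphen a word character changes what [[:word:]] matches, while an explicit set like [-A-Za-z] does not care.

```elisp
;; my-first-word is a hypothetical helper: first run of [[:word:]] chars.
(defun my-first-word (string)
  (when (string-match "[[:word:]]+" string)
    (match-string 0 string)))

;; Standard syntax table: the hyphen is not a word character.
(with-temp-buffer
  (my-first-word "foo-bar"))            ; => "foo"

;; Same regex, but with a syntax table that treats the hyphen as a word char.
(with-temp-buffer
  (let ((table (make-syntax-table)))
    (modify-syntax-entry ?- "w" table)
    (with-syntax-table table
      (my-first-word "foo-bar"))))      ; => "foo-bar"

;; An explicit set gives the same answer whatever the syntax table says.
(when (string-match "[-A-Za-z]+" "foo-bar")
  (match-string 0 "foo-bar"))           ; => "foo-bar"
```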

Interactive Emacs Regex Mode

Emacs has an interactive regex mode that shows matches as you type. To start it, call regexp-builder (also available as re-builder). (I don't use this.)

Alternatively, call query-replace-regexp to test your pattern. I prefer this.

Regex in Emacs Lisp Code

Emacs Lisp: Regex Tutorial
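As a taste, here is a minimal sketch of a common Lisp counterpart to replace-regexp: find each match with re-search-forward, then rewrite it with replace-match. It works on a temporary buffer so it can be tried anywhere.

```elisp
;; Non-interactive find and replace in a buffer.
(with-temp-buffer
  (insert "cat hat bat")
  (goto-char (point-min))
  ;; nil = no search bound, t = return nil instead of signaling an error
  (while (re-search-forward "\\([a-z]\\)at" nil t)
    (replace-match "\\1og"))            ; \1 is the text of capture group 1
  (buffer-string))
;; => "cog hog bog"
```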

Reference

(info "(emacs) Regexps")
