Emacs: Text Pattern Matching (regex) tutorial

Buy Xah Emacs Tutorial. Master emacs benefits for life.

This page is a tutorial on emacs regex.

Regex Commands

The most commonly used command that uses regex is query-replace-regexp. 〔➤ Emacs: Find/Replace Tutorial〕

Other useful ones are:

There are many others. You can list them all by calling apropos-command and then typing “regex”.
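The commands above are meant for interactive use. For Lisp code, the docstring of replace-regexp recommends a loop of re-search-forward and replace-match instead. Here is a minimal sketch of that idiom. The function name my-replace-all-regexp is hypothetical, and unlike query-replace-regexp it replaces without prompting.

```elisp
(defun my-replace-all-regexp (regexp to-string)
  "Replace every match of REGEXP after point with TO-STRING, without prompting.
Return the number of replacements made.
This is a hypothetical helper, not a built-in command."
  (let ((count 0))
    ;; REGEXP should not match the empty string, or the loop never advances.
    (while (re-search-forward regexp nil t)
      ;; FIXEDCASE = t: insert TO-STRING without any case conversion.
      (replace-match to-string t)
      (setq count (1+ count)))
    count))

;; Usage: replace each run of digits with "N" in a temporary buffer.
(with-temp-buffer
  (insert "a1 b22 c333")
  (goto-char (point-min))
  (my-replace-all-regexp "[0-9]+" "N")
  (buffer-string))   ; => "aN bN cN"
```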

Emacs Regex Syntax

Here are commonly used patterns.

Pattern            Matches
.                  Any single character except newline ("\n").
\.                 One period (a literal dot)
[0-9]+             one or more digits
[A-Za-z]+          one or more letters
[-A-Za-z0-9]+      one or more {letter, digit, hyphen}
[_A-Za-z0-9]+      one or more {letter, digit, underscore}
[-_A-Za-z0-9]+     one or more {letter, digit, hyphen, underscore}
[[:ascii:]]+       one or more ASCII chars (codepoint 0 to 127, inclusive)
[[:nonascii:]]+    one or more non-ASCII characters (⁖ Unicode characters)
[\n\t ]+           one or more {newline character, tab, space} (in Lisp code; for interactive use, see “Matching Newline & Tab”)
capture
Pattern            Matches
"\([^"]+\)"        capture text between double quotes
repetition
Pattern            Matches
+                  match previous pattern 1 or more times
*                  match previous pattern 0 or more times
?                  match previous pattern 0 or 1 time
+?                 match previous pattern 1 or more times, but with minimal match (aka non-greedy)
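To see the difference between + and +?, here is a small sketch using string-match on the same input. The helper name my-first-match is hypothetical.

```elisp
(defun my-first-match (regexp string)
  "Return the text of the first match of REGEXP in STRING, or nil.
This is a hypothetical helper, not a built-in function."
  (when (string-match regexp string)
    (match-string 0 string)))

(my-first-match "<.+>"  "<a><b>")   ; greedy     => "<a><b>"
(my-first-match "<.+?>" "<a><b>")   ; non-greedy => "<a>"
```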
boundary anchors
Pattern            Matches
^…                 Beginning of {line, string, buffer}
…$                 End of {line, string, buffer}
\`…                Beginning of {string, buffer}
…\'                End of {string, buffer}
\b                 word boundary marker
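The difference between ^ and \` (and between $ and \') shows up on a string that contains a newline. This sketch checks it with string-match. Note that inside a Lisp string, \` and \' are written "\\`" and "\\'".

```elisp
;; ^ and $ match at line boundaries, i.e. next to a newline.
(string-match "^b" "a\nb")      ; => 2
(string-match "a$" "a\nb")      ; => 0
;; \` and \' match only at the very start and end of the string.
(string-match "\\`b" "a\nb")    ; => nil
(string-match "a\\'" "a\nb")    ; => nil
;; \b: find "cat" as a whole word, skipping the "cat" inside "concat".
(string-match "\\bcat\\b" "concat cat")   ; => 7
```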

Unicode characters can be used literally. For non-printable ones such as “RIGHT-TO-LEFT MARK”, you can represent them by their code. See: Emacs: Newline Representation ^M ^J ^L
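In an Emacs Lisp string, a character can be written by its code as \uNNNN, so an invisible mark can go into a regex without being typed literally. A minimal sketch, assuming U+200F (the code point of RIGHT-TO-LEFT MARK); my-strip-rlm is a hypothetical name.

```elisp
(defun my-strip-rlm (string)
  "Return STRING with every RIGHT-TO-LEFT MARK (U+200F) removed.
This is a hypothetical helper, not a built-in function."
  ;; "\u200F" is read as the single character U+200F.
  (replace-regexp-in-string "\u200F" "" string))

(length "abc\u200Fdef")          ; => 7, the mark is one character
(my-strip-rlm "abc\u200Fdef")    ; => "abcdef"
```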

For the complete list of regex syntax, see: (info "(elisp) Syntax of Regexps")

Matching Newline & Tab

When using interactive commands, emacs won't understand \n or \t. To enter a newline, type C-q C-j. To enter a tab, type C-q TAB.

(For explanation, see: Emacs's Key Notations Explained (/r ^M C-m RET <return> M- meta)).
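In Lisp code the situation is different: the Lisp reader turns \n and \t inside a string into real newline and tab characters before the regex engine sees them, so the [\n\t ]+ pattern from the table works there as written. A sketch with a hypothetical helper name:

```elisp
(defun my-collapse-whitespace (string)
  "Replace each run of newlines, tabs, and spaces in STRING with one space.
This is a hypothetical helper, not a built-in function."
  ;; The reader turns \n and \t into actual newline and tab characters here.
  (replace-regexp-in-string "[\n\t ]+" " " string))

(my-collapse-whitespace "a\n\tb  c")   ; => "a b c"
```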

Case Sensitivity

By default, [a-z] is not case sensitive: it also matches uppercase letters. Case sensitivity is controlled by the variable case-fold-search. Call toggle-case-fold-search to toggle it.
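In Lisp code, you can let-bind case-fold-search around a search to control this. A small sketch; my-has-lowercase-p is a hypothetical name.

```elisp
;; With case folding on, [a-z] matches uppercase letters too.
(let ((case-fold-search t))
  (string-match-p "[a-z]" "ABC"))   ; => 0

(defun my-has-lowercase-p (string)
  "Return non-nil if STRING contains an ASCII lowercase letter.
Binds `case-fold-search' to nil so that [a-z] is case sensitive.
This is a hypothetical helper, not a built-in function."
  (let ((case-fold-search nil))
    (string-match-p "[a-z]" string)))

(my-has-lowercase-p "ABC")   ; => nil
(my-has-lowercase-p "aBC")   ; => 0
```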

Do not use [A-z], because it also matches the six punctuation chars [ \ ] ^ _ ` that sit between Z and a in ASCII. Use [A-Za-z].
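A quick check of the [A-z] pitfall, with case-fold-search bound to nil so that case folding does not blur the result. The helper my-letters-only-p is hypothetical.

```elisp
(let ((case-fold-search nil))
  ;; "_" (codepoint 95) lies between "Z" (90) and "a" (97).
  (string-match-p "[A-z]" "_"))   ; => 0, it matches

(defun my-letters-only-p (string)
  "Return non-nil if STRING is non-empty and has only ASCII letters.
This is a hypothetical helper, not a built-in function."
  (let ((case-fold-search nil))
    (string-match-p "\\`[A-Za-z]+\\'" string)))

(my-letters-only-p "abcXYZ")   ; => 0 (non-nil)
(my-letters-only-p "ab_c")     ; => nil
```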

Perl Regex vs Emacs Regex

Here are some major practical differences.

             perl     emacs lisp
capture      (…)      \(…\)
digit        \d       [[:digit:]]
word         \w       [[:word:]] (or \w)
whitespace   \s       [[:space:]] (or \s-)

For example, Perl's \d+ is emacs's [[:digit:]]+.
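As a slightly larger sketch, Perl's (\d+)-(\d+) becomes the Lisp string below, using \(…\) for capture, [[:digit:]] for \d, and a doubled backslash wherever the regex has one. The helper my-parse-range is hypothetical.

```elisp
(defun my-parse-range (string)
  "Parse the first \"N-M\" in STRING into a list of two integers, or nil.
This is a hypothetical helper, not a built-in function."
  ;; Perl:  (\d+)-(\d+)
  ;; Emacs: \([[:digit:]]+\)-\([[:digit:]]+\)
  (when (string-match "\\([[:digit:]]+\\)-\\([[:digit:]]+\\)" string)
    (list (string-to-number (match-string 1 string))
          (string-to-number (match-string 2 string)))))

(my-parse-range "pages 12-34")   ; => (12 34)
```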

Warning: the meaning of some character classes in emacs, such as [[:word:]] and [[:space:]], depends on the current major mode's syntax table. For example, which chars count as “word” in [[:word:]] depends on how that mode's syntax table classifies them. Syntax tables are hard to work with. It's best to put the chars you want explicitly in your regex, ⁖ [A-Za-z].
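To see this warning in action, the sketch below runs the same [[:word:]]+ search on foo_bar under two syntax tables. It assumes that a fresh with-temp-buffer uses the standard syntax table, where _ is a symbol constituent rather than a word constituent. The helper my-first-word is hypothetical.

```elisp
(defun my-first-word (string)
  "Return the first run of word-syntax characters in STRING, or nil.
The result depends on the current buffer's syntax table.
This is a hypothetical helper, not a built-in function."
  (when (string-match "[[:word:]]+" string)
    (match-string 0 string)))

;; Standard syntax table: "_" is not a word constituent.
(with-temp-buffer
  (my-first-word "foo_bar"))   ; => "foo"

;; Modified copy of the table: "_" is now a word constituent.
(with-temp-buffer
  (with-syntax-table (copy-syntax-table)
    (modify-syntax-entry ?_ "w")
    (my-first-word "foo_bar")))   ; => "foo_bar"
```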

Interactive Emacs Regex Mode

Emacs has an interactive regex mode that shows matches as you type. To go into the mode, call regexp-builder. (I don't use this.)

Alternatively, call query-replace-regexp to test your pattern. I prefer this.
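If you'd rather test a pattern from Lisp, a sketch like this collects every match in a string. The name my-all-matches is hypothetical, not a built-in.

```elisp
(defun my-all-matches (regexp string)
  "Return a list of all non-overlapping matches of REGEXP in STRING.
This is a hypothetical helper for trying out a pattern."
  (let ((start 0)
        (matches '()))
    (while (and (<= start (length string))
                (string-match regexp string start))
      (push (match-string 0 string) matches)
      ;; Continue after the match; on an empty match, step one
      ;; character forward so the loop always makes progress.
      (setq start (if (= (match-beginning 0) (match-end 0))
                      (1+ (match-end 0))
                    (match-end 0))))
    (nreverse matches)))

(my-all-matches "[0-9]+" "a1 b22 c333")       ; => ("1" "22" "333")
(my-all-matches "[A-Za-z]+" "foo, bar-baz")   ; => ("foo" "bar" "baz")
```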

Regex in Emacs Lisp Code

See: Emacs Lisp: Regex Tutorial
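The main thing to remember when moving a pattern from an interactive prompt into Lisp code is that a Lisp string needs every regex backslash written twice and every double quote escaped. A minimal sketch using the capture pattern from the table above; my-quoted-text is a hypothetical name, while regexp-quote is a built-in that escapes a literal string for use as a regex.

```elisp
;; Interactive pattern:   "\([^"]+\)"
;; Same pattern in Lisp:  each backslash doubled, each " escaped.
(defun my-quoted-text (string)
  "Return the text between the first pair of double quotes in STRING, or nil.
This is a hypothetical helper, not a built-in function."
  (when (string-match "\"\\([^\"]+\\)\"" string)
    ;; Group 1 is the text captured by \(...\).
    (match-string 1 string)))

(my-quoted-text "say \"hello\" twice")   ; => "hello"

;; regexp-quote escapes special chars, here the "+".
(regexp-quote "1+1=2")                               ; => "1\\+1=2"
(string-match (regexp-quote "1+1=2") "so 1+1=2.")    ; => 3
```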

Reference

(info "(emacs) Regexps")
