This page is a tutorial on emacs regex.
Emacs's regex syntax is not based on Perl's or Python's, but it is very similar. In emacs regex, the parenthesis characters () are
literal. If you want to capture a pattern, you need to escape the
parens like this: \(myPattern\).
Here are some common patterns:
| Pattern | Matches |
|---|---|
| . | Any single character except newline |
| \. | One period |
| [0-9]+ | Sequence of digits |
| [A-Za-z]+ | Sequence of letters |
| [-A-Za-z0-9]+ | Sequence of letters, digits, hyphens |
| [_A-Za-z0-9]+ | Sequence of letters, digits, underscores |
| [-_A-Za-z0-9]+ | Sequence of letters, digits, hyphens, underscores |
| [[:ascii:]]+ | Sequence of ASCII chars |
| [[:nonascii:]]+ | Sequence of non-ASCII chars (e.g. Unicode chars) |
| [\t\n ]+ | Sequence of {tab, newline, space}. (In elisp code only. For interactive regex commands, type the literal character; insert a newline with 【Ctrl+q Ctrl+j】) |

| Pattern | Matches |
|---|---|
| "\([^"]+\)" | Capture text between double quotes (+ is greedy, but [^"] cannot match a quote, so the capture ends at the next double quote) |
| “\([^”]+\)” | Capture text between curly double quotes |
| (\([^)]+\)) | Capture text between parentheses |
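
As a minimal sketch of the double-quote pattern used from elisp code: the function name my-first-quoted is made up for illustration. Note that inside an elisp string each regex backslash is doubled and each double quote is written \".

```elisp
(defun my-first-quoted (text)
  "Return the text between the first pair of double quotes in TEXT, or nil."
  (when (string-match "\"\\([^\"]+\\)\"" text)
    ;; Group 1 is whatever matched inside \( ... \).
    (match-string 1 text)))

(my-first-quoted "He said \"hello world\" and left.")
;; ⇒ "hello world"
```
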

| Pattern | Matches |
|---|---|
| + | Match the previous pattern 1 or more times |
| * | Match the previous pattern 0 or more times |
| ? | Match the previous pattern 0 or 1 time |
| +? | Match the previous pattern 1 or more times, but with minimal match (aka non-greedy) |
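
To make the greedy versus non-greedy difference concrete, here is a small sketch. The function name my-bold-contents and the sample text are invented for this example.

```elisp
(defun my-bold-contents (text)
  "Return a list (GREEDY NON-GREEDY) of what follows the first <b> in TEXT."
  (list
   ;; Greedy .+ extends to the LAST </b> it can reach on the line.
   (and (string-match "<b>\\(.+\\)</b>" text) (match-string 1 text))
   ;; Non-greedy .+? stops at the FIRST </b>.
   (and (string-match "<b>\\(.+?\\)</b>" text) (match-string 1 text))))

(my-bold-contents "<b>one</b> and <b>two</b>")
;; ⇒ ("one</b> and <b>two" "one")
```
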

| Pattern | Matches |
|---|---|
| ^… | Beginning of {line, string, buffer} |
| …$ | End of {line, string, buffer} |
| \`… | Beginning of {string, buffer} |
| …\' | End of {string, buffer} |
| \b | Word boundary marker |
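
The difference between the line anchors and the string anchors shows up as soon as the text contains a newline. The following sketch uses string-match, which returns the index (counting from 0) where the match starts, or nil. The sample text is made up.

```elisp
(let ((text "first line\nsecond line"))
  (list
   (string-match "^second" text)     ; ^ also matches right after a newline
   (string-match "\\`second" text)   ; \` matches only at the very start
   (string-match "line$" text)       ; $ also matches right before a newline
   (string-match "line\\'" text)))   ; \' matches only at the very end
;; ⇒ (11 nil 6 18)
```
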

• When using interactive commands for find/replace, emacs won't understand \t, \n. To enter a Tab character, press 【Ctrl+q Tab ↹】. To enter a new line, press 【Ctrl+q Ctrl+j】. (For explanation, see: Emacs's Key Notations Explained (/r, ^M, C-m, RET, <return>, M-, meta)).
• When a file is opened in Emacs, newline is always \n, regardless of whether your file is from Unix, Windows, or Mac. Do NOT manually find/replace newline chars to change a file's newline convention. 〔☛ Emacs: Newline Representations ^M ^J ^L〕
By default, [a-z] is not case sensitive: it also matches uppercase letters. Case sensitivity is controlled by the variable “case-fold-search”. Call toggle-case-fold-search to toggle it.
Do not use [A-z]. It'll match punctuation chars between ASCII “Z” and “a”. Use [A-Za-z].
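
In elisp code, one way to control this is to let-bind case-fold-search around the search. The sketch below also shows why [A-z] is risky; the test strings are arbitrary.

```elisp
(list
 ;; Case folding on: [a-z] also matches uppercase letters.
 (let ((case-fold-search t))   (string-match "[a-z]+" "HELLO"))
 ;; Case folding off: no lowercase letter in "HELLO", so no match.
 (let ((case-fold-search nil)) (string-match "[a-z]+" "HELLO"))
 ;; [A-z] covers the punctuation between Z and a, such as _ and ^.
 (let ((case-fold-search nil)) (string-match "[A-z]+" "_^"))
 ;; [A-Za-z] does not.
 (let ((case-fold-search nil)) (string-match "[A-Za-z]+" "_^")))
;; ⇒ (0 nil 0 nil)
```
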
Here are the major practical differences between Perl regex and emacs regex.

| | Perl | Emacs |
|---|---|---|
| Capture | (…) | \(…\) |
| Digit | \d | [[:digit:]] |
| Word | \w | [[:word:]] |
| Space | \s | [[:space:]] |
| Newline | \n | \n in elisp string, or 【Ctrl+q Ctrl+j】 in interactive command. |

For example, Perl's \d+ is emacs's
[[:digit:]]+.
Also, the meaning of a character class in emacs may
depend on the current major mode's syntax table. For example, which
chars are considered “word” in [[:word:]] depends on how
the syntax table of the current major mode defines them. Syntax tables are
hard to work with. It is best to put the chars you want explicitly
in your regex, e.g. [A-Za-z].
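
As an illustration of using [[:digit:]]+ where Perl would use \d+, here is a sketch that collects every run of digits in a string. The helper name my-all-numbers is hypothetical.

```elisp
(defun my-all-numbers (text)
  "Return a list of the digit runs found in TEXT, in order."
  (let ((start 0)
        (found '()))
    ;; The third argument tells string-match where to resume searching.
    (while (string-match "[[:digit:]]+" text start)
      (push (match-string 0 text) found)
      (setq start (match-end 0)))
    (nreverse found)))

(my-all-numbers "width=795 height=183")
;; ⇒ ("795" "183")
```
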
Emacs has an interactive regex mode that shows matches as you type. To enter it, call regexp-builder. (I don't use this.)
Alternatively, call query-replace-regexp to test your pattern. I prefer this.
Regex is used in elisp code too, just as it is in Perl code.
To test a regex in your elisp code, open an empty file and place the regex function at the top and the text you want to match below it, like this:
(search-forward-regexp "yourRegex")
whatever text here
Then, put your cursor to the right of the closing parenthesis,
then call eval-last-sexp 【Ctrl+x Ctrl+e】. If your regex matches, it'll move the cursor to the end
of the matched text. If you get a lisp error saying search failed, then
your regex didn't match. If you get a lisp syntax error, then you probably
screwed up on the backslashes.
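
The same kind of test can be done without moving the cursor in your own file, by searching in a temporary buffer. This is only a sketch; the helper name my-regex-end-position is invented here. Passing t as the third argument of search-forward-regexp makes a failed search return nil instead of signaling an error.

```elisp
(defun my-regex-end-position (regex text)
  "Search TEXT for REGEX; return the buffer position after the match, or nil."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    ;; On success, point is left at the end of the matched text.
    (when (search-forward-regexp regex nil t)
      (point))))

(my-regex-end-position "[0-9]+" "abc 123 def")
;; ⇒ 8   (buffer positions start at 1)
```
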
In a lisp function that takes a regex string
(e.g. search-forward-regexp), you need to double each regex backslash. This is
because, in an elisp string, a literal backslash is written as two backslashes.
The string is interpreted first, and the result is passed to emacs's regex engine.
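
A quick way to see the two layers is to look at what the elisp string actually contains. In this sketch the sample strings are arbitrary.

```elisp
(list
 ;; Typed as 7 characters, but the string holds 5: \(a\)
 (length "\\(a\\)")
 ;; The regex engine receives \. which matches a literal period.
 (string-match "\\." "a.b")
 ;; A bare . matches any character, so it matches at the start.
 (string-match "." "a.b"))
;; ⇒ (5 1 0)
```
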
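Here's the general rule:
• A backslash that belongs to the regex needs to be doubled, e.g. \\(…\\).
• A double quote " needs to have a backslash in front, i.e. \".
• String escapes keep a single backslash: {\n, \t, …}.

For example, suppose you have this text: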
<img src="cat.jpg" alt="my cat" width="795" height="183" />
When you call a command such as query-replace-regexp, you can type the regex in the prompt. Example:
<img src="\([^"]+?\)" alt="\([^"]+?\)" width="\([0-9]+\)" height="\([0-9]+\)" />
But in lisp code, the same regex needs to have many backslash escapes, like this:
(search-forward-regexp "<img src=\"\\([^\"]+?\\)\" alt=\"\\([^\"]+?\\)\" width=\"\\([0-9]+\\)\" height=\"\\([0-9]+\\)\" />" )
The following should have a single backslash only, because they are elisp string escapes and not regex syntax: {\n, \t, \"}.
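
Once the pattern matches, the captured groups can be read back with match-string. The sketch below applies the same regex to the sample img tag as a string; the function name my-img-attributes is made up for illustration.

```elisp
(defun my-img-attributes (text)
  "Return (SRC ALT WIDTH HEIGHT) captured from an img tag in TEXT, or nil."
  (when (string-match
         "<img src=\"\\([^\"]+?\\)\" alt=\"\\([^\"]+?\\)\" width=\"\\([0-9]+\\)\" height=\"\\([0-9]+\\)\" />"
         text)
    ;; Groups are numbered by the order of their opening \( .
    (list (match-string 1 text)
          (match-string 2 text)
          (match-string 3 text)
          (match-string 4 text))))

(my-img-attributes
 "<img src=\"cat.jpg\" alt=\"my cat\" width=\"795\" height=\"183\" />")
;; ⇒ ("cat.jpg" "my cat" "795" "183")
```
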