Elisp: Unicode Escape Sequence

By Xah Lee. Date: . Last updated: .

In emacs lisp string, you can have Unicode characters directly (For example, "I β™₯ 😸"), or, you can represent Unicode char by the following syntax:

Note: the syntax is a bit ugly. Which one to use depends on whether the Unicode is in the range of 0 to 4 hex digits. (Each Unicode char is given a integer id, called its β€œcodepoint”. [see Unicode Basics])

;; examples of Unicode char representation in string

;; lower case β€œa”
(search-forward "\u0061" )

;; β™₯ BLACK HEART SUIT codepoint 9829, #x2665
(search-forward "\u2665" )

;; 😸 GRINNING CAT FACE WITH SMILING EYES codepoint 128568, #x1f638
(search-forward "\U0001f638" )

;; β™₯ 😸

in the above example, the letter a's Unicode hex is just β€œ61”, so you need to pad it with β€œ00”.

in the above example, the grinning cat 😸's codepoint in hex is 5 digits. So, you need to use the "\U00xxxxxx" form, and because it's less than 6 digits, so you need to pad it with β€œ0”, resulting β€œ000” there.

Note: you can find a Unicode char's codepoint by Alt+x describe-char.

[see Emacs: Unicode Tutorial]

Why is Encoded Unicode Char Useful?

The use of encoded representation is useful when you want to represent non-printable chars, such as {RIGHT-TO-LEFT MARK, ZERO WIDTH NO-BREAK SPACE, NO-BREAK SPACE}. Example:

(defun replace-BOM-mark-etc ()
  "Query replace some invisible Unicode chars.
The chars to be searched are:
 RIGHT-TO-LEFT MARK 8207 x200f

start on cursor position to end."
  (query-replace-regexp "\u200f\\|\ufeff" ""))

(info "(elisp) General Escape Syntax")

Unicode Topic

  1. Unicode Basics
  2. Unicode Tutorial
  3. Emacs File Encoding FAQ
  4. Best Unicode Fonts for Programer
  5. Elisp: Unicode Escape Sequence
  6. Xah Math Input Mode
  7. Xah Unicode Browser Mode

If you have a question, put $5 at patreon and message me.
Or Buy Xah Emacs Tutorial
Or buy a nice keyboard: Best Keyboards for Emacs


Emacs Lisp