Elisp: Unicode Representation in String

By Xah Lee. Date: . Last updated: .

In emacs lisp string, you can have Unicode characters directly (For example, "I β™₯ 😸"), or, you can represent Unicode char by the following syntax:

Note: the syntax is a bit ugly. Which one to use depends on whether the Unicode is in the range of 0 to 4 hex digits. (Each Unicode char is given a integer id, called its β€œcodepoint”. [see Unicode Basics: What's Character Set, Character Encoding, UTF-8?])

;; examples of Unicode char representation in string

;; lower case β€œa”
(search-forward "\u0061" )

;; β™₯ BLACK HEART SUIT codepoint 9829, #x2665
(search-forward "\u2665" )

;; 😸 GRINNING CAT FACE WITH SMILING EYES codepoint 128568, #x1f638
(search-forward "\U0001f638" )

;; β™₯ 😸

in the above example, the letter a's Unicode hex is just β€œ61”, so you need to pad it with β€œ00”.

in the above example, the grinning cat 😸's codepoint in hex is 5 digits. So, you need to use the "\U00xxxxxx" form, and because it's less than 6 digits, so you need to pad it with β€œ0”, resulting β€œ000” there.

Note: you can find a Unicode char's codepoint by Alt+x describe-char. [see Emacs: Unicode Tutorial]

Why is Encoded Unicode Char Useful?

The use of encoded representation is useful when you want to represent non-printable chars, such as {RIGHT-TO-LEFT MARK, ZERO WIDTH NO-BREAK SPACE, NO-BREAK SPACE}. Example:

(defun replace-BOM-mark-etc ()
  "Query replace some invisible Unicode chars.
The chars to be searched are:
 RIGHT-TO-LEFT MARK 8207 x200f
 ZERO WIDTH NO-BREAK SPACE 65279 xfeff

start on cursor position to end."
  (interactive)
  (query-replace-regexp "\u200f\\|\ufeff" ""))

see

(info "(elisp) General Escape Syntax")

Font/Unicode Topic

  1. Unicode Tutorial
  2. Emacs File Encoding FAQ
  3. Unicode Basics: What's Character Set, Character Encoding, UTF-8?
  4. Xah Math Input Mode
  5. Xah Unicode Browser Mode

  1. Best Unicode Fonts for Programer
  2. Font Setup
  3. Proportional Font
  4. Cycle Fonts by Command
  5. Set Line Spacing
  6. toggle-word-wrap
  7. Elisp: Unicode Representation in String
Patreon me $5 patreon

Or Buy Xah Emacs Tutorial

Or buy a nice keyboard: Best Keyboards for Emacs

If you have a question, put $5 at patreon and message me.