Emacs: Remove Accent Marks

By Xah Lee. Date: . Last updated: .

Here's a emacs command that removes accent marks, or, convert some Unicode characters into ASCII. (aka Zap Gremlins)

For example:

(defun xah-asciify-text (&optional @begin @end)
  "Remove accents in some letters and some
Change European language characters into equivalent ASCII ones, e.g. “café” ⇒ “cafe”.
When called interactively, work on current line or text selection.

URL `http://ergoemacs.org/emacs/emacs_zap_gremlins.html'
Version 2018-11-12"
  (interactive)
  (let (($charMap
         [
          ["ß" "ss"]
          ["á\\|à\\|â\\|ä\\|ā\\|ǎ\\|ã\\|å\\|ą\\|ă\\|ạ\\|ả\\|ả\\|ấ\\|ầ\\|ẩ\\|ẫ\\|ậ\\|ắ\\|ằ\\|ẳ\\|ặ" "a"]
          ["æ" "ae"]
          ["ç\\|č\\|ć" "c"]
          ["é\\|è\\|ê\\|ë\\|ē\\|ě\\|ę\\|ẹ\\|ẻ\\|ẽ\\|ế\\|ề\\|ể\\|ễ\\|ệ" "e"]
          ["í\\|ì\\|î\\|ï\\|ī\\|ǐ\\|ỉ\\|ị" "i"]
          ["ñ\\|ň\\|ń" "n"]
          ["ó\\|ò\\|ô\\|ö\\|õ\\|ǒ\\|ø\\|ō\\|ồ\\|ơ\\|ọ\\|ỏ\\|ố\\|ổ\\|ỗ\\|ộ\\|ớ\\|ờ\\|ở\\|ợ" "o"]
          ["ú\\|ù\\|û\\|ü\\|ū\\|ũ\\|ư\\|ụ\\|ủ\\|ứ\\|ừ\\|ử\\|ữ\\|ự"     "u"]
          ["ý\\|ÿ\\|ỳ\\|ỷ\\|ỹ"     "y"]
          ["þ" "th"]
          ["ď\\|ð\\|đ" "d"]
          ["ĩ" "i"]
          ["ľ\\|ĺ\\|ł" "l"]
          ["ř\\|ŕ" "r"]
          ["š\\|ś" "s"]
          ["ť" "t"]
          ["ž\\|ź\\|ż" "z"]
          [" " " "]       ; thin space etc
          ["–" "-"]       ; dash
          ["—\\|一" "--"] ; em dash etc
          ])
        $begin $end
        )
    (if (null @begin)
        (if (use-region-p)
            (setq $begin (region-beginning) $end (region-end))
          (setq $begin (line-beginning-position) $end (line-end-position)))
      (setq $begin @begin $end @end))
    (let ((case-fold-search t))
      (save-restriction
        (narrow-to-region $begin $end)
        (mapc
         (lambda ($pair)
           (goto-char (point-min))
           (while (search-forward-regexp (elt $pair 0) (point-max) t)
             (replace-match (elt $pair 1))))
         $charMap)))))
(defun xah-asciify-string (@string)
  "Returns a new string. European language chars are changed ot ASCII ones e.g. “café” ⇒ “cafe”.
See `xah-asciify-text'
Version 2015-06-08"
  (with-temp-buffer
      (insert @string)
      (xah-asciify-text (point-min) (point-max))
      (buffer-string)))

[see Accent Marks: Trema, Umlaut, Macron, Circumflex, and All That]

( thanks to robert_nagy for adding chars)

Accumulator vs Parallel Programing

This problem makes a good parallel programing exercise. See: Parallel Programing Exercise: asciify-string.

Alternative Solution with “iconv” or perl

Yuri Khan and Teemu Likonen suggested using the “iconv” shell command. See man iconv. Here's Teemu's code.

(defun asciify-string (string)
"Convert STRING to ASCII string.
For example:
“passé” becomes “passe”"
;; Code originally by Teemu Likonen
  (with-temp-buffer
    (insert string)
    (call-process-region (point-min) (point-max) "iconv" t t nil "--to-code=ASCII//TRANSLIT")
    (buffer-substring-no-properties (point-min) (point-max))))

Julian Bradfield suggested Perl. Here's his one-liner, it removes chars with accent marks.

perl -e 'use encoding utf8; use Unicode::Normalize; while ( <> ) { $_ = NFKD($_); s/\pM//g; print; }'

http://groups.google.com/group/comp.emacs/msg/8d58b6e9b2bd07fd

Though, it would be nice to have a pure elisp solution, because “iconv” is not in Windows or Mac OS X as of .

Emacs Text Transform Under Cursor

  1. Toggle Letter Case
  2. Title Case
  3. Upcase Sentences
  4. Cycle Space Hyphen Underscore
  5. Escape Quotes
  6. Quote Lines
  7. Spaces to New Lines
  8. Change Brackets/Quotes
  9. Remove Accent Marks
  10. Convert Straight/Curly Quotes
  11. Convert English/Chinese Punctuations
  12. Color Conversion (RGB, HSL, HSV)
  13. Decimal to Hexadecimal
  14. Replace Greek Letter Names to Unicode
  15. Twitterfy Text
  16. Toggle line wrap
  17. Clean White Space

If you have a question, put $5 at patreon and message me.
Or Buy Xah Emacs Tutorial
Or buy a nice keyboard: Best Keyboards for Emacs

Emacs

Emacs Lisp

Misc