Emacs: Remove Accent Marks

By Xah Lee.

Here's an Emacs command that removes accent marks, that is, it converts some Unicode characters into their closest ASCII equivalents. (This is also known as “zap gremlins”.)

For example:

(defun xah-asciify-text (&optional @begin @end)
  "Change European language characters into equivalent ASCII ones, e.g. “café” ⇒ “cafe”.
When called interactively, work on current line or text selection.

URL `http://ergoemacs.org/emacs/emacs_zap_gremlins.html'
Version 2016-07-12"
  (interactive)
  (let (($charMap
         [
          ["á\\|à\\|â\\|ä\\|ā\\|ǎ\\|ã\\|å\\|ą" "a"]
          ["é\\|è\\|ê\\|ë\\|ē\\|ě\\|ę" "e"]
          ["í\\|ì\\|î\\|ï\\|ī\\|ǐ" "i"]
          ["ó\\|ò\\|ô\\|ö\\|õ\\|ǒ\\|ø\\|ō" "o"]
          ["ú\\|ù\\|û\\|ü\\|ū"     "u"]
          ["Ý\\|ý\\|ÿ"     "y"]
          ["ç\\|č\\|ć" "c"]
          ["ď\\|ð" "d"]
          ["ľ\\|ĺ\\|ł" "l"]
          ["ñ\\|ň\\|ń" "n"]
          ["þ" "th"]
          ["ß" "ss"]
          ["æ" "ae"]
          ["š\\|ś" "s"]
          ["ť" "t"]
          ["ř\\|ŕ" "r"]
          ["ž\\|ź\\|ż" "z"]
          ])
        $begin $end
        )
    (if (null @begin)
        (if (use-region-p)
            (progn
              (setq $begin (region-beginning))
              (setq $end (region-end)))
          (progn
            (setq $begin (line-beginning-position))
            (setq $end (line-end-position))))
      (progn
        (setq $begin @begin)
        (setq $end @end)))
    (let ((case-fold-search t))
      (save-restriction
        (narrow-to-region $begin $end)
        (mapc
         (lambda ($pair)
           (goto-char (point-min))
           (while (re-search-forward (elt $pair 0) (point-max) t)
             (replace-match (elt $pair 1))))
         $charMap)))))

(defun xah-asciify-string (@string)
  "Returns a new string. European language chars are changed to ASCII ones e.g. “café” ⇒ “cafe”.
See `xah-asciify-text'
Version 2015-06-08"
  (with-temp-buffer
    (insert @string)
    (xah-asciify-text (point-min) (point-max))
    (buffer-string)))

[see Accent Marks: Trema, Umlaut, Macron, Circumflex, and All That]

(Thanks to robert_nagy for adding chars.)

Accumulator vs Parallel Programing

This problem makes a good parallel programing exercise. See: Parallel Programing Exercise: asciify-string.
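
One way to read the accumulator versus parallel distinction is sketched below. This is only an illustration under stated assumptions, not the code from the linked exercise: the names `my-asciify-table`, `my-asciify-accumulate` and `my-asciify-per-char` are hypothetical, and the table is a small subset of the character map above. The first function feeds the result of each replacement into the next one; the second maps every character on its own, so no step depends on another.

```elisp
;; Hypothetical, reduced character map used only for this sketch.
(defvar my-asciify-table
  '((?á . "a") (?é . "e") (?í . "i") (?ó . "o") (?ú . "u")
    (?ñ . "n") (?ç . "c") (?ß . "ss") (?æ . "ae"))
  "Alist mapping a few accented characters to ASCII strings.")

(defun my-asciify-accumulate (string)
  "Asciify STRING by applying one replacement after another.
Each step works on the result of the previous step."
  (let ((result string)
        (case-fold-search nil))     ; match lowercase entries only
    (dolist (pair my-asciify-table result)
      (setq result
            (replace-regexp-in-string
             (regexp-quote (string (car pair)))
             (cdr pair)
             result
             t                      ; FIXEDCASE: keep replacement as is
             t)))))                 ; LITERAL: no backslash processing

(defun my-asciify-per-char (string)
  "Asciify STRING by looking up every character independently.
Characters missing from `my-asciify-table' are kept unchanged."
  (mapconcat
   (lambda (ch)
     (or (cdr (assq ch my-asciify-table))
         (string ch)))
   string
   ""))
```

Both functions give the same result for this table, for example on “señor café”. Because the per-character version never looks at a neighboring character, the input could in principle be split into chunks and processed separately.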

Alternative Solution with “iconv” or perl

Yuri Khan and Teemu Likonen suggested using the “iconv” shell command. See man iconv. Here's Teemu's code.

(defun asciify-string (string)
  "Convert STRING to ASCII string.
For example:
“passé” becomes “passe”"
  ;; Code originally by Teemu Likonen
  (with-temp-buffer
    (insert string)
    (call-process-region (point-min) (point-max) "iconv" t t nil "--to-code=ASCII//TRANSLIT")
    (buffer-substring-no-properties (point-min) (point-max))))
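
Since “iconv” is an external program, it may be missing or may fail on some systems. The following is a minimal sketch of a more defensive variant, not Teemu's code: the name `my-asciify-string-iconv` is hypothetical, and forcing UTF-8 on both sides of the pipe is an assumption of this sketch. It uses the short options “-f” and “-t”. The “//TRANSLIT” suffix is an extension, and the exact transliteration depends on the iconv implementation installed.

```elisp
(defun my-asciify-string-iconv (string)
  "Convert STRING to ASCII with the external iconv program.
Hypothetical variant for illustration: signal an error when iconv
is not installed or exits with a non-zero status."
  (unless (executable-find "iconv")
    (user-error "Cannot find the iconv program on this system"))
  (with-temp-buffer
    (insert string)
    (let* ((coding-system-for-write 'utf-8) ; encoding sent to iconv
           (coding-system-for-read 'utf-8)  ; encoding read back
           (status (call-process-region
                    (point-min) (point-max) "iconv"
                    t   ; delete the input text
                    t   ; insert output in the current buffer
                    nil ; no redisplay while running
                    "-f" "UTF-8" "-t" "ASCII//TRANSLIT")))
      ;; STATUS is 0 on success; it can also be a signal description.
      (unless (eq status 0)
        (error "iconv failed with status %s" status))
      (buffer-string))))
```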

Julian Bradfield suggested Perl. Here's his one-liner, which decomposes each character (NFKD) and then deletes the combining marks.

perl -e 'use encoding utf8; use Unicode::Normalize; while ( <> ) { $_ = NFKD($_); s/\pM//g; print; }'

Source groups.google.com

Still, it would be nice to have a pure elisp solution, because “iconv” was not bundled with Windows or Mac OS X when this was written.
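
The idea behind the Perl one-liner can also be sketched in pure Emacs Lisp with the bundled `ucs-normalize` library. This is an illustration under assumptions, not the author's command: the name `my-strip-accents` is hypothetical. It decomposes the string with NFKD, then drops every character whose Unicode general category is a mark (Mn, Mc or Me).

```elisp
(require 'ucs-normalize)

(defun my-strip-accents (string)
  "Return STRING with combining marks removed after NFKD decomposition.
Hypothetical helper for illustration only."
  (let ((decomposed (ucs-normalize-NFKD-string string)))
    (concat
     (delq nil
           (mapcar
            (lambda (ch)
              ;; Keep CH unless it is a combining mark.
              (unless (memq (get-char-code-property ch 'general-category)
                            '(Mn Mc Me))
                ch))
            decomposed)))))
```

For example, `(my-strip-accents "café")` gives “cafe”. Letters that have no Unicode decomposition, such as “ø”, “ł”, “ß” and “æ”, pass through unchanged, so a lookup table like the one in `xah-asciify-text' is still needed for them.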

Text Transform Topic

  1. Emacs: Toggle Letter Case
  2. Emacs: Change to Title Case
  3. Emacs: Upcase Sentences
  4. Emacs: Cycle Replace Space Hyphen Underscore
  5. Emacs: Remove Accent Marks
  6. Emacs: Escape Quotes Command
  7. Emacs: Spaces to New Lines
  8. Emacs: Quote Lines
  9. Emacs: Change Brackets and Quotes
  10. Emacs: CSS Compressor
  11. Emacs: Replace Greek Letter Names to Unicode
  12. Emacs: Convert Straight/Curly Quotes
  13. Emacs: Convert English/Chinese Punctuations
  14. Emacs: Lines to HTML Table