Emacs Lisp: Convert Unicode Chars to ASCII (Zap Gremlins)


This page shows an Emacs Lisp command that converts accented European characters into their plain ASCII equivalents, ⁖ “café” ⇒ “cafe”, “naïve” ⇒ “naive”.

(defun xah-asciify-region (&optional φfrom φto)
  "Change European language characters into equivalent ASCII ones, ⁖ “café” ⇒ “cafe”.
When called interactively, work on current line or text selection.

URL `http://ergoemacs.org/emacs/emacs_zap_gremlins.html'
Version 2015-01-19"
  (interactive
   (if (use-region-p)
       (list (region-beginning) (region-end))
     (list (line-beginning-position) (line-end-position))))
  ;; Each entry is [REGEXP REPLACEMENT]; the regexp lists the accented forms.
  (let ((ξcharMap [
                   ["á\\|à\\|â\\|ä\\|ã\\|å\\|ā" "a"]
                   ["é\\|è\\|ê\\|ë\\|ē" "e"]
                   ["í\\|ì\\|î\\|ï\\|ī" "i"]
                   ["ó\\|ò\\|ô\\|ö\\|õ\\|ø\\|ō" "o"]
                   ["ú\\|ù\\|û\\|ü\\|ū"     "u"]
                   ["Ý\\|ý\\|ÿ"     "y"]
                   ["ñ" "n"]
                   ["ç" "c"]
                   ["ð" "d"]
                   ["þ" "th"]
                   ["ß" "ss"]
                   ["æ" "ae"]
                   ])
        ;; Also match uppercase forms; replace-match then adjusts the
        ;; case of the replacement to follow the matched text.
        (case-fold-search t))
    (save-excursion
      (save-restriction
        (narrow-to-region φfrom φto)
        ;; One pass over the region per entry of the map.
        (mapc
         (lambda (ξpair)
           (goto-char (point-min))
           (while (search-forward-regexp (elt ξpair 0) (point-max) t)
             (replace-match (elt ξpair 1))))
         ξcharMap)))))

(defun xah-asciify-string (φstring)
  "Return a new string. European language chars are changed to ASCII ones, ⁖ “café” ⇒ “cafe”.
See `xah-asciify-region'
Version 2014-10-20"
  (with-temp-buffer
    (insert φstring)
    (xah-asciify-region (point-min) (point-max))
    (buffer-string)))


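Accumulator vs Parallel Programming

This problem makes a good parallel programming exercise. The command above works in an accumulator style: it rewrites one buffer in successive passes, one pass per entry of the character map. Yet the replacement for each character depends only on that character, so the conversion could also be done as an independent per-character map. See: Parallel Programing Exercise: asciify-string.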
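For contrast, here is a minimal sketch of the map style: each character is looked up on its own and the results are concatenated, so no shared buffer is rewritten pass after pass, and chunks of the string could in principle be processed independently. The names my-asciify-char-alist, my-asciify-char, and my-asciify-string-map are hypothetical; the table stores lowercase letters only (taken from the map above), and uppercase input is handled by looking up the lowercase form and upcasing the result.

```elisp
;; -*- coding: utf-8 -*-

;; Hypothetical lookup table, built from the character map above.
(defconst my-asciify-char-alist
  '((?á . "a") (?à . "a") (?â . "a") (?ä . "a") (?ã . "a") (?å . "a") (?ā . "a")
    (?é . "e") (?è . "e") (?ê . "e") (?ë . "e") (?ē . "e")
    (?í . "i") (?ì . "i") (?î . "i") (?ï . "i") (?ī . "i")
    (?ó . "o") (?ò . "o") (?ô . "o") (?ö . "o") (?õ . "o") (?ø . "o") (?ō . "o")
    (?ú . "u") (?ù . "u") (?û . "u") (?ü . "u") (?ū . "u")
    (?ý . "y") (?ÿ . "y") (?ñ . "n") (?ç . "c") (?ð . "d")
    (?þ . "th") (?ß . "ss") (?æ . "ae"))
  "Hypothetical table: lowercase accented character => ASCII string.")

(defun my-asciify-char (char)
  "Return an ASCII string for CHAR, or CHAR unchanged as a string."
  (let ((direct (alist-get char my-asciify-char-alist))
        (folded (alist-get (downcase char) my-asciify-char-alist)))
    (cond (direct direct)
          ;; Uppercase input such as É: look up é, then upcase "e" to "E".
          (folded (upcase folded))
          (t (char-to-string char)))))

(defun my-asciify-string-map (string)
  "Return a copy of STRING with each character converted independently."
  ;; mapconcat walks the characters of STRING; no character's result
  ;; depends on any other character.
  (mapconcat #'my-asciify-char string ""))
```

Usage: (my-asciify-string-map "naïve") gives "naive", and "Émile" gives "Emile".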

Alternative Solution with “iconv” or Perl

Yuri Khan and Teemu Likonen suggested using the “iconv” shell command. See man iconv. Here's Teemu's code.

(defun asciify-string (string)
  "Convert STRING to ASCII string.
For example:
“passé” becomes “passe”"
  ;; Code originally by Teemu Likonen
  (with-temp-buffer
    (insert string)
    ;; Replace the buffer text with iconv's transliterated output.
    (call-process-region (point-min) (point-max) "iconv" t t nil "--to-code=ASCII//TRANSLIT")
    (buffer-substring-no-properties (point-min) (point-max))))
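The function above assumes iconv is installed and succeeds. Since call-process-region returns the program's exit status, a slightly more defensive sketch can check for the program first and report failure instead of returning an error message as text. The name my-asciify-string-iconv is hypothetical. It also assumes UTF-8 in both directions (the coding-system-for-write and coding-system-for-read bindings plus iconv's --from-code option) rather than relying on the locale. The transliteration itself still depends on the iconv implementation; with glibc it also depends on the current locale, and characters it cannot transliterate may come out as “?”.

```elisp
(defun my-asciify-string-iconv (string)
  "Return STRING transliterated to ASCII by the external iconv program.
Return nil if iconv is not installed or exits with an error.
Hypothetical variant of `asciify-string'."
  (when (executable-find "iconv")
    (with-temp-buffer
      (insert string)
      ;; Send and receive UTF-8 explicitly instead of relying on the locale.
      (let* ((coding-system-for-write 'utf-8)
             (coding-system-for-read 'utf-8)
             ;; DELETE t replaces the region; BUFFER (t nil) inserts stdout
             ;; into the current buffer and discards stderr.
             (status (call-process-region
                      (point-min) (point-max) "iconv"
                      t '(t nil) nil
                      "--from-code=UTF-8" "--to-code=ASCII//TRANSLIT")))
        ;; call-process-region returns the exit status; 0 means success.
        (when (eql status 0)
          (buffer-substring-no-properties (point-min) (point-max)))))))
```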

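Julian Bradfield suggested Perl. Here is his one-liner. It decomposes each character into a base letter plus combining marks (Unicode normalization form NFKD), then deletes the marks (\pM):

perl -e 'use encoding utf8; use Unicode::Normalize; while ( <> ) { $_ = NFKD($_); s/\pM//g; print; }'

Perl 5.26 and later no longer support the “use encoding” pragma used here. On those versions, the -CSD switch provides UTF-8 input and output instead:

perl -CSD -MUnicode::Normalize -pe '$_ = NFKD($_); s/\pM//g'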

Source: groups.google.com

Still, it would be nice to have a pure elisp solution, because “iconv” is not available by default on Windows as of .
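Emacs ships the ucs-normalize library, so Julian's Perl approach can be reproduced without external programs. This is a minimal sketch; the name my-asciify-string-nfkd is hypothetical. Like the Perl version, it only strips combining marks: letters with no decomposition, such as ø, ß, æ, ð, and þ, pass through unchanged, so the result is not guaranteed to be pure ASCII. The explicit map in xah-asciify-region covers those letters.

```elisp
(require 'ucs-normalize)

(defun my-asciify-string-nfkd (string)
  "Return STRING in NFKD form with all combining marks removed.
Hypothetical pure-elisp counterpart of the Perl one-liner."
  (mapconcat
   (lambda (char)
     ;; Perl's \pM matches the general categories Mn, Mc and Me.
     (if (memq (get-char-code-property char 'general-category) '(Mn Mc Me))
         ""
       (char-to-string char)))
   ;; NFKD splits é into e followed by U+0301 COMBINING ACUTE ACCENT.
   (ucs-normalize-NFKD-string string)
   ""))
```

Usage: (my-asciify-string-nfkd "café") gives "cafe", while "straße" is returned unchanged.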
