Emacs Lisp: Convert Unicode Chars to ASCII (Zap Gremlins)

This page shows an Emacs Lisp command that converts some Unicode characters into ASCII. ⁖ “café” ⇒ “cafe”, “naïve” ⇒ “naive”.

(defun xah-asciify-region (&optional φfrom φto)
  "Change European language characters into equivalent ASCII ones, ⁖ “café” ⇒ “cafe”.
When called interactively, work on current line or text selection.

URL `http://ergoemacs.org/emacs/emacs_zap_gremlins.html'
Version 2015-05-01"
  (interactive
   (if (use-region-p)
       (list (region-beginning) (region-end))
     (list (line-beginning-position) (line-end-position))))
  ;; Each element of the map is [REGEXP REPLACEMENT].
  (let ((ξcharMap
         [
          ["á\\|à\\|â\\|ä\\|ā\\|ǎ\\|ã\\|å" "a"]
          ["é\\|è\\|ê\\|ë\\|ē\\|ě" "e"]
          ["í\\|ì\\|î\\|ï\\|ī\\|ǐ" "i"]
          ["ó\\|ò\\|ô\\|ö\\|õ\\|ǒ\\|ø\\|ō" "o"]
          ["ú\\|ù\\|û\\|ü\\|ū"     "u"]
          ["Ý\\|ý\\|ÿ"     "y"]
          ["ñ" "n"]
          ["ç" "c"]
          ["ð" "d"]
          ["þ" "th"]
          ["ß" "ss"]
          ["æ" "ae"]
          ]))
    ;; Search case-insensitively, so uppercase variants match the same patterns.
    (let ((case-fold-search t))
      (save-restriction
        (narrow-to-region φfrom φto)
        ;; Apply each pair in turn over the whole narrowed region.
        (mapc
         (lambda (ξpair)
           (goto-char (point-min))
           (while (search-forward-regexp (elt ξpair 0) (point-max) t)
             (replace-match (elt ξpair 1))))
         ξcharMap)))))

(defun xah-asciify-string (φstring)
  "Return a new string. European language chars are changed to ASCII ones ⁖ “café” ⇒ “cafe”.
See `xah-asciify-region'
Version 2014-10-20"
  ;; Do the work in a temporary buffer so the region command can be reused.
  (with-temp-buffer
    (insert φstring)
    (xah-asciify-region (point-min) (point-max))
    (buffer-string)))

Accumulator vs Parallel Programming

This problem makes a good parallel programming exercise. See: Parallel Programing Exercise: asciify-string.
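The linked exercise is not reproduced here, so the following is only one possible reading of the contrast. The command above walks the replacement pairs one after another, each pass rescanning text already changed by earlier passes. A per-character lookup, where every character is translated independently of its neighbors, is the kind of shape that could be split across workers. A minimal sketch, assuming a hypothetical and abbreviated map `my-ascii-char-alist` and a hypothetical function `my-asciify-string-by-char` (neither is from the original page):

```elisp
;; Hypothetical, abbreviated map: character → ASCII replacement string.
(defvar my-ascii-char-alist
  '((?á . "a") (?é . "e") (?ï . "i") (?ö . "o") (?ü . "u")
    (?ñ . "n") (?ç . "c") (?ß . "ss") (?æ . "ae"))
  "Illustrative map from a character to its ASCII replacement.")

(defun my-asciify-string-by-char (string)
  "Return STRING with each char found in `my-ascii-char-alist' replaced.
Each character is translated on its own, with no shared state between steps."
  (mapconcat
   (lambda (ch)
     ;; Use the mapped replacement if there is one, else keep the char.
     (or (cdr (assq ch my-ascii-char-alist))
         (char-to-string ch)))
   string
   ""))
```

Unlike the region command, this sketch is case-sensitive and covers only the characters listed in the alist.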

Alternative Solution with “iconv” or perl

Yuri Khan and Teemu Likonen suggested using the “iconv” shell command. See man iconv. Here's Teemu's code.

(defun asciify-string (string)
"Convert STRING to ASCII string.
For example:
“passé” becomes “passe”"
;; Code originally by Teemu Likonen
  ;; Pipe the text through the external iconv program in a temporary buffer.
  (with-temp-buffer
    (insert string)
    (call-process-region (point-min) (point-max) "iconv" t t nil "--to-code=ASCII//TRANSLIT")
    (buffer-substring-no-properties (point-min) (point-max))))

Julian Bradfield suggested Perl. Here's his one-liner. It decomposes each character and then strips the accent marks.

perl -e 'use encoding utf8; use Unicode::Normalize; while ( <> ) { $_ = NFKD($_); s/\pM//g; print; }'

Source groups.google.com

Though, it would be nice to have a pure elisp solution, because “iconv” is not in Windows or Mac OS X as of .
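One pure elisp route is to mirror the Perl one-liner: decompose with NFKD, then drop the combining marks. The sketch below assumes the `ucs-normalize` library bundled with Emacs and uses a hypothetical function name, `my-strip-accent-marks`. It is not the method from this page, only an illustration of the idea.

```elisp
(require 'cl-lib)
(require 'ucs-normalize)

(defun my-strip-accent-marks (string)
  "Return STRING in NFKD form with all combining marks removed.
Hypothetical helper, a sketch of the normalize-then-strip approach."
  (let ((decomposed (ucs-normalize-NFKD-string string)))
    (concat
     (cl-remove-if
      (lambda (ch)
        ;; Unicode general categories Mn, Mc, Me are the mark categories.
        (memq (get-char-code-property ch 'general-category) '(Mn Mc Me)))
      (string-to-list decomposed)))))
```

Characters that have no Unicode decomposition, such as “ß”, “ø”, “þ”, and “æ”, pass through unchanged, so an explicit map like the one in `xah-asciify-region' is still needed for those.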
