Emacs Lisp: Convert Unicode Chars to ASCII (Zap Gremlins)


This page shows an Emacs Lisp command that converts some Unicode characters into ASCII. ⁖ “café” ⇒ “cafe”, “naïve” ⇒ “naive”.

(defun xah-asciify-region (&optional φfrom φto)
  "Change European language characters into equivalent ASCII ones, ⁖ “café” ⇒ “cafe”.

This command does not transcode all Unicode chars. Others, such as Greek letters and math symbols, are left unchanged.

When called interactively, work on text selection or current line.
URL `http://ergoemacs.org/emacs/emacs_zap_gremlins.html'
Version 2014-10-20"
  (interactive
   (if (use-region-p)
       (list (region-beginning) (region-end))
     (list (line-beginning-position) (line-end-position))))
  (let ((ξcharMap [
                         ["á\\|à\\|â\\|ä\\|ã\\|å" "a"]
                         ["é\\|è\\|ê\\|ë" "e"]
                         ["í\\|ì\\|î\\|ï" "i"]
                         ["ó\\|ò\\|ô\\|ö\\|õ\\|ø" "o"]
                         ["ú\\|ù\\|û\\|ü"     "u"]
                         ["Ý\\|ý\\|ÿ"     "y"]
                         ["ñ" "n"]
                         ["ç" "c"]
                         ["ð" "d"]
                         ["þ" "th"]
                         ["ß" "ss"]
                         ["æ" "ae"]
                         ]))
    (let ((case-fold-search t))
        (save-restriction
          (narrow-to-region φfrom φto)
          (mapc
           (lambda (ξpair)
             (goto-char (point-min))
             (while (search-forward-regexp (elt ξpair 0) (point-max) t)
               (replace-match (elt ξpair 1))))
           ξcharMap)))))

(defun xah-asciify-string (φstring)
  "Return a new string with European language chars changed to ASCII ones, ⁖ “café” ⇒ “cafe”.
See `xah-asciify-region'.
Version 2014-10-20"
  (with-temp-buffer
    (insert φstring)
    (xah-asciify-region (point-min) (point-max))
    (buffer-string)))

TODO

Accumulator vs Parallel Programming

This problem makes a good parallel programming exercise. See: Parallel Programing Exercise: asciify-string.
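One possible reading of the distinction: the region command above runs one replacement pass after another over a shared buffer (accumulator style), whereas each character could instead be mapped on its own with no shared state. Here is a minimal sketch of the second style. The names `my-asciify-pairs`, `my-asciify-char` and `my-asciify-string` are hypothetical and not part of the original code. Unlike the command above, this sketch handles lowercase letters only.

```elisp
;; Hypothetical table for illustration: each entry maps a set of
;; accented lowercase letters to one ASCII replacement string.
(defconst my-asciify-pairs
  '(("áàâäãå" . "a") ("éèêë" . "e") ("íìîï" . "i") ("óòôöõø" . "o")
    ("úùûü" . "u") ("ýÿ" . "y") ("ñ" . "n") ("ç" . "c") ("ð" . "d")
    ("þ" . "th") ("ß" . "ss") ("æ" . "ae"))
  "Alist of (ACCENTED-CHARS . ASCII-REPLACEMENT) strings.")

(defun my-asciify-char (char)
  "Return the ASCII replacement string for CHAR.
If CHAR is not in `my-asciify-pairs', return CHAR as a one-char string."
  (catch 'found
    (dolist (pair my-asciify-pairs)
      ;; Is CHAR one of the accented letters in this entry?
      (when (memq char (string-to-list (car pair)))
        (throw 'found (cdr pair))))
    (char-to-string char)))

(defun my-asciify-string (string)
  "Return a copy of STRING with each char mapped by `my-asciify-char'."
  ;; Every char is converted independently of the others.
  (mapconcat #'my-asciify-char string ""))

;; Usage:
;; (my-asciify-string "café naïve")  ; ⇒ "cafe naive"
```

Because no character's result depends on another's, the per-character calls could in principle run in any order, which is what makes the problem a candidate for a parallel exercise.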

Alternative Solution with “iconv” or perl

Yuri Khan and Teemu Likonen suggested using the “iconv” shell command. See man iconv. Here's Teemu's code.

(defun asciify-string (string)
  "Convert STRING to ASCII string.
For example:
“passé” becomes “passe”"
  ;; Code originally by Teemu Likonen
  (with-temp-buffer
    (insert string)
    (call-process-region (point-min) (point-max) "iconv" t t nil "--to-code=ASCII//TRANSLIT")
    (buffer-substring-no-properties (point-min) (point-max))))

Julian Bradfield suggested Perl. Here's his one-liner. It decomposes each character (NFKD) and then deletes the combining marks, leaving the base letters.

perl -e 'use encoding utf8; use Unicode::Normalize; while ( <> ) { $_ = NFKD($_); s/\pM//g; print; }'

Source groups.google.com

A pure elisp solution would still be nice, though, because “iconv” is not available on Windows or Mac OS X as of this writing.
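The decompose-then-strip idea from the Perl one-liner can also be tried in elisp alone. This is a rough sketch, assuming the `ucs-normalize` library bundled with Emacs is available. The function name `my-strip-combining-marks` is hypothetical. Letters with no Unicode decomposition, such as “ø”, “æ”, “ß”, “ð” and “þ”, pass through unchanged, so this does not replace the table-based command above.

```elisp
(require 'ucs-normalize)

(defun my-strip-combining-marks (string)
  "Return STRING decomposed with NFKD and with combining marks removed.
Hypothetical helper for illustration only."
  (concat
   (delq nil
         (mapcar
          (lambda (char)
            ;; Keep CHAR unless its Unicode general category is a mark:
            ;; Mn (nonspacing), Mc (spacing combining), Me (enclosing).
            (unless (memq (get-char-code-property char 'general-category)
                          '(Mn Mc Me))
              char))
          ;; NFKD splits "é" into "e" followed by U+0301.
          (ucs-normalize-NFKD-string string)))))

;; Usage:
;; (my-strip-combining-marks "café")   ; ⇒ "cafe"
;; (my-strip-combining-marks "naïve")  ; ⇒ "naive"
```

Note that NFKD also applies compatibility decompositions, so some characters other than accented letters may change as well.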
