Emacs Lisp: Convert Unicode String to ASCII (Zap Gremlins)


This page shows an Emacs Lisp command that converts a Unicode string to ASCII. For example, “passé” becomes “passe” and “voilà” becomes “voila”.

Emacs Lisp Solution

Here's a solution.

(defun asciify-text (ξstring &optional ξfrom ξto)
"Change some Unicode characters into equivalent ASCII ones.
For example, “passé” becomes “passe”.

This function works on chars in European languages, and does not transcode arbitrary Unicode chars (such as Greek, math symbols).  Untransformed Unicode chars remain in the string.

When called interactively, work on text selection or current block.

When called in lisp code, if ξfrom is nil, return a changed string; otherwise change text in the region between positions ξfrom and ξto."
  (interactive
   (if (use-region-p)
       (list nil (region-beginning) (region-end))
     (let ((bds (bounds-of-thing-at-point 'paragraph)) )
       (list nil (car bds) (cdr bds)) ) ) )

  (require 'xfrp_find_replace_pairs)

  (let (workOnStringP
        inputStr
        (charChangeMap [
                        ["á\\|à\\|â\\|ä\\|ã\\|å" "a"]
                        ["é\\|è\\|ê\\|ë" "e"]
                        ["í\\|ì\\|î\\|ï" "i"]
                        ["ó\\|ò\\|ô\\|ö\\|õ\\|ø" "o"]
                        ["ú\\|ù\\|û\\|ü"     "u"]
                        ["Ý\\|ý\\|ÿ"     "y"]
                        ["ñ" "n"]
                        ["ç" "c"]
                        ["ð" "d"]
                        ["þ" "th"]
                        ["ß" "ss"]
                        ["æ" "ae"]
                        ]))
    ;; With no ξfrom, work on ξstring; otherwise work on the buffer region.
    (setq workOnStringP (if ξfrom nil t))
    (setq inputStr (if workOnStringP ξstring (buffer-substring-no-properties ξfrom ξto)))
    (if workOnStringP
        (let ((case-fold-search t)) (replace-regexp-pairs-in-string inputStr charChangeMap) )
      (let ((case-fold-search t)) (replace-regexp-pairs-region ξfrom ξto charChangeMap) )) ) )

You'll need xfrp_find_replace_pairs.el
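
The pair-replacement helpers come from xfrp_find_replace_pairs.el, whose source is not shown on this page. To give a rough idea of what applying such a map to a string involves, here is a minimal sketch that uses only the built-in replace-regexp-in-string. The name my-apply-regexp-pairs is hypothetical, and the real replace-regexp-pairs-in-string may differ in details such as case handling.

```elisp
;; Hypothetical stand-in for illustration; NOT the xfrp_find_replace_pairs.el code.
(defun my-apply-regexp-pairs (string pairs)
  "Return STRING after applying each [REGEXP REPLACEMENT] in PAIRS, in order."
  (let ((result string)
        ;; Assumption: match case-sensitively to keep the sketch predictable.
        (case-fold-search nil))
    (mapc (lambda (pair)
            (setq result
                  ;; The two trailing t's mean: keep the replacement's case
                  ;; as written, and treat the replacement as literal text.
                  (replace-regexp-in-string (aref pair 0) (aref pair 1) result t t)))
          pairs)
    result))

(my-apply-regexp-pairs "passé, voilà"
                       [["é\\|è\\|ê\\|ë" "e"]
                        ["á\\|à\\|â\\|ä" "a"]])
;; => "passe, voila"
```

Each pair is applied to the result of the previous one, so the order of the pairs matters only if one replacement can produce text that a later regexp matches.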


Accumulator vs Parallel Programming

This problem makes a good parallel programming exercise. See: Parallel Programing Exercise: asciify-string.

Alternative Solution with “iconv” or perl

Yuri Khan and Teemu Likonen suggested using the “iconv” shell command. See man iconv. Here's Teemu's code.

(defun asciify-string (string)
"Convert STRING to ASCII string.
For example:
“passé” becomes “passe”"
;; Code originally by Teemu Likonen
  ;; Work in a scratch buffer so the current buffer is left untouched.
  (with-temp-buffer
    (insert string)
    (call-process-region (point-min) (point-max) "iconv" t t nil "--to-code=ASCII//TRANSLIT")
    (buffer-substring-no-properties (point-min) (point-max))))

Julian Bradfield suggested Perl. Here's his one-liner. It decomposes each character (NFKD) and then deletes the combining marks, which removes the accents.

perl -e 'use encoding utf8; use Unicode::Normalize; while ( <> ) { $_ = NFKD($_); s/\pM//g; print; }'

Source groups.google.com

Still, it would be nice to have a pure elisp solution, because “iconv” was not available on Windows or Mac OS X at the time of writing.
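
One possible pure-elisp direction is to mirror the Perl one-liner: decompose the string with the ucs-normalize library that ships with Emacs, then drop the combining marks. The sketch below is only an illustration under that assumption. The function name my-strip-accents is hypothetical and is not part of this page's code.

```elisp
(require 'ucs-normalize) ; bundled with Emacs

;; Hypothetical helper, written for illustration only.
(defun my-strip-accents (string)
  "Return STRING with combining marks removed after NFKD decomposition."
  (mapconcat
   (lambda (ch)
     ;; Drop characters whose Unicode general category is a mark (Mn, Mc, Me).
     (if (memq (get-char-code-property ch 'general-category) '(Mn Mc Me))
         ""
       (string ch)))
   (ucs-normalize-NFKD-string string)
   ""))

(my-strip-accents "passé voilà")
;; => "passe voila"
```

Letters that have no Unicode decomposition, such as “ø”, “ß”, “æ”, “ð” and “þ”, pass through unchanged with this approach, so a replacement map like charChangeMap would still be needed for them.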
