Emacs Lisp: Convert Unicode String to ASCII (Zap Gremlins)


This page shows an Emacs Lisp command that converts a Unicode string to ASCII. For example, “passé” becomes “passe” and “voilà” becomes “voila”.

Emacs Lisp Solution

Here's a solution.

(defun asciify-text (ξstring &optional ξfrom ξto)
"Change some Unicode characters into equivalent ASCII ones.
For example, “passé” becomes “passe”.

This function works on chars used in European languages. It does not transcode arbitrary Unicode chars (such as Greek or math symbols); untransformed Unicode chars remain in the string.

When called interactively, work on text selection or current block.

When called in lisp code, if ξfrom is nil, return a changed copy of ξstring; otherwise, change the text in the region between positions ξfrom and ξto."
  (interactive
   (if (use-region-p)
       (list nil (region-beginning) (region-end))
     (let ((bds (bounds-of-thing-at-point 'paragraph)) )
       (list nil (car bds) (cdr bds)) ) ) )

  (require 'xfrp_find_replace_pairs)

  (let (workOnStringP
        inputStr
        (charChangeMap [
                        ["á\\|à\\|â\\|ä\\|ã\\|å" "a"]
                        ["é\\|è\\|ê\\|ë" "e"]
                        ["í\\|ì\\|î\\|ï" "i"]
                        ["ó\\|ò\\|ô\\|ö\\|õ\\|ø" "o"]
                        ["ú\\|ù\\|û\\|ü"     "u"]
                        ["Ý\\|ý\\|ÿ"     "y"]
                        ["ñ" "n"]
                        ["ç" "c"]
                        ["ð" "d"]
                        ["þ" "th"]
                        ["ß" "ss"]
                        ["æ" "ae"]
                        ])
        )
    (setq workOnStringP (if ξfrom nil t))
    (setq inputStr (if workOnStringP ξstring (buffer-substring-no-properties ξfrom ξto)))
    (if workOnStringP
        (let ((case-fold-search t)) (replace-regexp-pairs-in-string inputStr charChangeMap) )
      (let ((case-fold-search t)) (replace-regexp-pairs-region ξfrom ξto charChangeMap) )) ) )

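You'll need xfrp_find_replace_pairs.el, which provides the “replace-regexp-pairs-in-string” and “replace-regexp-pairs-region” functions used above.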
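If you'd rather not depend on the extra library, the string case can be sketched with the built-in “replace-regexp-in-string”. This is only a minimal sketch, not the author's code: the names “my-asciify-pairs” and “my-asciify-string” are made up for illustration, and the table covers lowercase chars only to keep the behavior predictable.

```elisp
;; Hypothetical self-contained variant; the "my-" names are not from the original.
(defvar my-asciify-pairs
  '(("[áàâäãå]" . "a")
    ("[éèêë]"   . "e")
    ("[íìîï]"   . "i")
    ("[óòôöõø]" . "o")
    ("[úùûü]"   . "u")
    ("[ýÿ]"     . "y")
    ("ñ" . "n")
    ("ç" . "c")
    ("ð" . "d")
    ("þ" . "th")
    ("ß" . "ss")
    ("æ" . "ae"))
  "Alist of (REGEXP . REPLACEMENT) pairs, lowercase chars only.")

(defun my-asciify-string (string)
  "Return a copy of STRING with some accented lowercase chars made ASCII."
  (let ((case-fold-search nil)   ; assumption: match lowercase only
        (result string))
    ;; Apply each pair in turn, feeding the result into the next replacement.
    (dolist (pair my-asciify-pairs result)
      (setq result
            ;; FIXEDCASE and LITERAL are both t: insert the replacement as-is.
            (replace-regexp-in-string (car pair) (cdr pair) result t t)))))
```

For example, (my-asciify-string "passé") returns "passe". Chars not in the table, including uppercase accented chars, are left unchanged.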

TODO

Accumulator vs Parallel Programming

This problem makes a good parallel programming exercise. See: Parallel Programing Exercise: asciify-string.
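One way to read the distinction (this is an interpretation, not taken from the linked page): the pair-based approach accumulates a result by running one replacement after another over the whole text, while a per-char mapping treats every char independently, so the work could in principle be split up in any order. Here's a minimal sketch of the per-char style. The names “my-asciify-char-alist” and “my-asciify-string-per-char” are hypothetical, and the table is deliberately partial.

```elisp
;; Hypothetical per-char table; partial on purpose, for illustration only.
(defvar my-asciify-char-alist
  '((?á . "a") (?à . "a") (?â . "a") (?ä . "a")
    (?é . "e") (?è . "e") (?ê . "e") (?ë . "e")
    (?ñ . "n") (?ç . "c")
    (?ß . "ss") (?æ . "ae"))
  "Alist mapping a single char to its ASCII replacement string.")

(defun my-asciify-string-per-char (string)
  "Return a copy of STRING with each char mapped independently."
  (mapconcat
   (lambda (ch)
     ;; Each char is looked up on its own; no result is carried
     ;; over from one char to the next.
     (or (cdr (assq ch my-asciify-char-alist))
         (char-to-string ch)))
   string
   ""))
```

Because the lookup for one char never depends on another, this version makes a single pass over the string instead of one pass per replacement pair.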

Alternative Solution with “iconv” or perl

Yuri Khan and Teemu Likonen suggested using the “iconv” shell command. See man iconv. Here's Teemu's code.

(defun asciify-string (string)
"Convert STRING to ASCII string.
For example:
“passé” becomes “passe”"
;; Code originally by Teemu Likonen
  (with-temp-buffer
    (insert string)
    (call-process-region (point-min) (point-max) "iconv" t t nil "--to-code=ASCII//TRANSLIT")
    (buffer-substring-no-properties (point-min) (point-max))))

Julian Bradfield suggested Perl. Here's his one-liner. It decomposes each char (NFKD normalization), then removes the combining accent marks.

perl -e 'use encoding utf8; use Unicode::Normalize; while ( <> ) { $_ = NFKD($_); s/\pM//g; print; }'

Source: groups.google.com

Still, it would be nice to have a pure elisp solution, because “iconv” is not in Windows or Mac OS X as of this writing.
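One possible pure elisp route mirrors the Perl one-liner above: decompose with the “ucs-normalize” library that is bundled with Emacs, then drop the combining marks. This is a sketch under that assumption, not a solution from the author or the commenters, and the name “my-strip-accents” is hypothetical.

```elisp
(require 'ucs-normalize)

(defun my-strip-accents (string)
  "Return STRING in NFKD form with combining marks removed.
Hypothetical helper, for illustration only."
  (mapconcat
   (lambda (ch)
     ;; Unicode general categories Mn, Mc and Me are the mark
     ;; categories, roughly what \\pM matches in Perl.
     (if (memq (get-char-code-property ch 'general-category) '(Mn Mc Me))
         ""
       (char-to-string ch)))
   (ucs-normalize-NFKD-string string)
   ""))
```

For example, (my-strip-accents "passé") returns "passe". Chars that have no Unicode decomposition, such as “ø”, “ß” and “æ”, pass through unchanged, so a replacement table is still needed for those.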
