This page shows an Emacs Lisp command that converts accented Unicode characters in a string to their ASCII equivalents. For example, “passé” becomes “passe” and “voilà” becomes “voila”.
Here's a solution.
(defun asciify-text (ξstring &optional ξfrom ξto)
  "Change some Unicode characters into equivalent ASCII ones.
For example, “passé” becomes “passe”.

This function works on chars in European languages, and does not transcode arbitrary Unicode chars (such as Greek, math symbols). Un-transformed unicode char remains in the string.

When called interactively, work on text selection or current block.

When called in lisp code, if ξfrom is nil, returns a changed string, else, change text in the region between positions ξfrom ξto."
  (interactive
   (if (region-active-p)
       (list nil (region-beginning) (region-end))
     (let ((bds (bounds-of-thing-at-point 'paragraph)))
       (list nil (car bds) (cdr bds)))))
  (require 'xfrp_find_replace_pairs)
  (let (workOnStringP
        inputStr
        (charChangeMap
         [
          ["á\\|à\\|â\\|ä\\|ã\\|å" "a"]
          ["é\\|è\\|ê\\|ë" "e"]
          ["í\\|ì\\|î\\|ï" "i"]
          ["ó\\|ò\\|ô\\|ö\\|õ\\|ø" "o"]
          ["ú\\|ù\\|û\\|ü" "u"]
          ["Ý\\|ý\\|ÿ" "y"]
          ["ñ" "n"]
          ["ç" "c"]
          ["ð" "d"]
          ["þ" "th"]
          ["ß" "ss"]
          ["æ" "ae"]
          ]))
    (setq workOnStringP (if ξfrom nil t))
    (setq inputStr (if workOnStringP ξstring (buffer-substring-no-properties ξfrom ξto)))
    (if workOnStringP
        (let ((case-fold-search t))
          (replace-regexp-pairs-in-string inputStr charChangeMap))
      (let ((case-fold-search t))
        (replace-regexp-pairs-region ξfrom ξto charChangeMap)))))
You'll need xfrp_find_replace_pairs.el
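If you just want to try the idea without installing that package, here is a minimal sketch of the string case using only the built-in “replace-regexp-in-string”. The function name “my-asciify-string-simple” is made up for this example, and it covers lowercase letters only. It is not a replacement for the command above.

```elisp
(defun my-asciify-string-simple (string)
  "Return STRING with some accented lowercase Latin letters changed to ASCII.
Hypothetical helper for illustration. Chars not in the table are left as is."
  (let ((case-fold-search nil)          ; match lowercase letters only
        (pairs '(("[áàâäãå]" . "a")
                 ("[éèêë]" . "e")
                 ("[íìîï]" . "i")
                 ("[óòôöõø]" . "o")
                 ("[úùûü]" . "u")
                 ("[ýÿ]" . "y")
                 ("ñ" . "n")
                 ("ç" . "c")
                 ("ð" . "d")
                 ("þ" . "th")
                 ("ß" . "ss")
                 ("æ" . "ae"))))
    ;; Apply each pair in turn. The last two args (FIXEDCASE and LITERAL)
    ;; make the replacement text get inserted exactly as written.
    (dolist (pair pairs string)
      (setq string
            (replace-regexp-in-string (car pair) (cdr pair) string t t)))))

;; sample call
(my-asciify-string-simple "passé, voilà")
```

Each pair is applied one after another, so the string is scanned once per table entry. That is fine for short text.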
[:nonascii:]) Or, leave untranslated Unicode chars as is.
This problem makes a good parallel programming exercise. See: Parallel Programing Exercise: asciify-string.
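The “[:nonascii:]” character class mentioned above is one way to see which chars were left untranslated. Here is a small sketch. The function name “my-list-nonascii-chars” is made up for this example.

```elisp
(defun my-list-nonascii-chars (string)
  "Return a list of the distinct non-ASCII chars in STRING.
Chars are listed in order of first appearance.
Hypothetical helper for illustration."
  (let ((pos 0)
        (found '()))
    ;; [[:nonascii:]] matches any single char outside the ASCII range
    (while (string-match "[[:nonascii:]]" string pos)
      (let ((ch (aref string (match-beginning 0))))
        (unless (memq ch found)
          (push ch found)))
      (setq pos (match-end 0)))
    (nreverse found)))

;; sample call: a string that still contains Greek letters
(my-list-nonascii-chars "passe αβ α")
```

Running this on the output of an asciify function shows what the table missed. An empty result means the text is all ASCII.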
Yuri Khan and Teemu Likonen suggested using the “iconv” shell command. Here's Teemu's code.
(defun asciify-string (string)
  "Convert STRING to ASCII string.
For example:
“passé” becomes “passe”
Code originally by Teemu Likonen."
  (with-temp-buffer
    (insert string)
    (call-process-region (point-min) (point-max) "iconv" t t nil "--to-code=ASCII//TRANSLIT")
    (buffer-substring-no-properties (point-min) (point-max))))
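Julian Bradfield suggested Perl. Here's his one-liner. It decomposes each char into a base letter plus combining marks (Unicode normalization form NFKD), then deletes the marks, so only the base letters remain.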
perl -e 'use encoding utf8; use Unicode::Normalize; while ( <> ) { $_ = NFKD($_); s/\pM//g; print; }'
Though, it would be nice to have a pure elisp solution, because “iconv” is not in Windows or Mac OS X as of .
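One possible direction for a pure elisp version is to do what the Perl one-liner does: decompose, then strip the marks. Here is a sketch, assuming the “ucs-normalize” library bundled with GNU Emacs is available. The function name “my-asciify-by-decomposition” is made up for this example.

```elisp
(require 'ucs-normalize)

(defun my-asciify-by-decomposition (string)
  "Return STRING with combining marks removed after NFKD decomposition.
Hypothetical helper for illustration, not a complete asciify solution."
  (let ((decomposed (ucs-normalize-NFKD-string string)))
    ;; Keep each char unless its Unicode general category is a mark
    ;; (Mn, Mc, Me). This plays the role of \pM in the Perl code.
    (concat
     (delq nil
           (mapcar (lambda (ch)
                     (unless (memq (get-char-code-property ch 'general-category)
                                   '(Mn Mc Me))
                       ch))
                   decomposed)))))

;; sample call
(my-asciify-by-decomposition "passé, voilà")
```

This only handles chars that have a Unicode decomposition. Letters such as “ø”, “ß”, “æ”, “þ”, “ð” have none and come out unchanged, so a lookup table like the one in “asciify-text” would still be needed for them.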