Emacs Lisp: Convert Punctuation Between English/Chinese Forms

This page shows Emacs Lisp code to convert punctuation between its English and Chinese forms.

If you type Chinese or Japanese mixed with English, you'll often end up with mixed Asian/Western punctuation, and it is laborious to fix manually. Here's code that will help fix it.

The command converts punctuation between the English form and the Asian full-width form. 〔➤ Unicode Full-Width Characters〕 Here's the code:

(require 'xfrp_find_replace_pairs)
(require 'xeu_elisp_util)

(defun xah-convert-english-chinese-punctuation (p1 p2 &optional φto-direction)
  "Convert punctuation from/to English/Chinese characters.

When called interactively, do current text block or selection. The conversion direction is automatically determined.

If `universal-argument' is called, ask user for change direction.

When called in lisp code, p1 p2 are region begin/end positions. φto-direction must be any of the following values: 「\"chinese\"」, 「\"english\"」, 「\"auto\"」.

See also: `xah-remove-punctuation-trailing-redundant-space'.

URL `http://ergoemacs.org/emacs/elisp_convert_chinese_punctuation.html'
Version 2015-02-04
"
  (interactive
   (let ( (ξboundary (get-selection-or-unit 'block)))
     (list (elt ξboundary 1) (elt ξboundary 2)
           (if current-prefix-arg
               (ido-completing-read
                "Change to: "
                '( "english"  "chinese")
                "PREDICATE"
                "REQUIRE-MATCH")
             "auto"
             ))))
  (let (
        (ξinput-str (buffer-substring-no-properties p1 p2))
        (ξenglish-chinese-punctuation-map
         [
          [". " "。"]
          [".\n" "。\n"]
          [", " ","]
          [",\n" ",\n"]
          [": " ":"]
          ["; " ";"]
          ["? " "?"] ; no space after
          ["! " "!"]

          ;; for inside HTML
          [".</" "。</"]
          ["?</" "?</"]
          [":</" ":</"]
          ]
         ))

    (when (string= φto-direction "auto")
      (setq
       φto-direction
       (if
           (or (string-match "。" ξinput-str)
               (string-match "," ξinput-str)
               (string-match "?" ξinput-str)
               (string-match "!" ξinput-str))
           "english"
         "chinese")))

    (replace-pairs-region
     p1 p2
     (cond
      ((string= φto-direction "chinese") ξenglish-chinese-punctuation-map)
      ((string= φto-direction "english") (mapcar (lambda (ξpair) (vector (elt ξpair 1) (elt ξpair 0))) ξenglish-chinese-punctuation-map))
      (t (user-error "Your 3rd argument 「%s」 isn't valid" φto-direction))))))
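
The two small ideas inside this command, guessing the direction and reversing the pair map, can be tried on their own. Here's a minimal sketch using only built-in functions. The names `my-guess-punctuation-direction` and `my-reverse-pairs` are made up for illustration and are not part of ErgoEmacs.

```elisp
;; Hypothetical helpers for illustration only.
(defun my-guess-punctuation-direction (text)
  "Return \"english\" if TEXT contains full-width punctuation, else \"chinese\"."
  ;; Inside a bracket expression these characters are matched literally.
  (if (string-match-p "[。,?!]" text)
      "english"
    "chinese"))

(defun my-reverse-pairs (pairs)
  "Return a list of PAIRS with each [A B] vector swapped to [B A]."
  (mapcar (lambda (pair) (vector (elt pair 1) (elt pair 0))) pairs))

(my-guess-punctuation-direction "你好,世界。")   ; "english"
(my-guess-punctuation-direction "Hello, world.") ; "chinese"
(my-reverse-pairs [[". " "。"] [", " ","]])      ; (["。" ". "] ["," ", "])
```

So text that already has full-width punctuation is converted to English, and everything else is converted to Chinese.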

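This requires 2 elisp utility packages from ErgoEmacs, the ones loaded by the `require` lines above: xfrp_find_replace_pairs and xeu_elisp_util.

But you can rewrite the code to do without them.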
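
For example, here is a minimal sketch of a literal multi-pair replace built only on `search-forward` and `replace-match`. The name `my-replace-pairs-in-region` is hypothetical, and the sketch simply applies the pairs one after another, which may not match every detail of the ErgoEmacs function.

```elisp
;; Hypothetical helper, not part of ErgoEmacs: apply literal find/replace
;; pairs to the region between BEGIN and END using only built-in functions.
(defun my-replace-pairs-in-region (begin end pairs)
  "Replace literal strings between BEGIN and END.
PAIRS is a sequence of [FIND REPLACEMENT] vectors, applied in order."
  (save-excursion
    (save-restriction
      (narrow-to-region begin end)
      (let ((case-fold-search nil))     ; match case exactly
        (mapc
         (lambda (pair)
           (goto-char (point-min))
           (while (search-forward (elt pair 0) nil t)
             ;; FIXEDCASE and LITERAL are both t: insert the text verbatim.
             (replace-match (elt pair 1) t t)))
         pairs)))))

;; Usage: convert a short string in a temporary buffer.
(with-temp-buffer
  (insert "Hello, world. ")
  (my-replace-pairs-in-region (point-min) (point-max)
                              [[". " "。"] [", " ","]])
  (buffer-string))                      ; "Hello,world。"
```

Because the pairs run in sequence, a later pair can see the output of an earlier one. That is harmless for the punctuation map here, but keep it in mind if you reuse the helper with other pairs.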

Multi-pair find/replace is tremendously useful. For many other examples, see: Emacs Lisp Multi-Pair Find/Replace Applications.

Download Latest Version

You can download the latest version from https://code.google.com/p/ergoemacs/source/browse/packages/xah-misc-commands.el

Remove Punctuation Trailing Redundant Spaces

Here's a helpful command to remove redundant spaces after punctuation.

(require 'xfrp_find_replace_pairs)
(require 'xeu_elisp_util)

(defun xah-remove-punctuation-trailing-redundant-space (φp1 φp2)
  "Remove redundant whitespace after punctuation.
Works on current block or text selection.

When called in emacs lisp code, the φp1 φp2 are cursor positions for region.

See also `xah-convert-english-chinese-punctuation'.

URL `http://ergoemacs.org/emacs/elisp_convert_chinese_punctuation.html'
version 2015-02-04
"
  (interactive
   (let ((ξboundary (get-selection-or-unit 'block)))
     (list (elt ξboundary 1) (elt ξboundary 2))))
  (replace-regexp-pairs-region φp1 φp2
                               [
                                ;; clean up. Remove extra space.
                                [" +," ","]
                                [",  +" ", "]
                                ["?  +" "? "]
                                ["!  +" "! "]
                                ["\\.  +" ". "]

                                ;; fullwidth punctuations
                                [", +" ","]
                                ["。 +" "。"]
                                [": +" ":"]
                                ["? +" "?"]
                                ["; +" ";"]
                                ["! +" "!"]
                                ["、 +" "、"]
                                ]
                               "FIXEDCASE" "LITERAL"))
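
To see what these rules do without loading the ErgoEmacs packages, here is a condensed sketch that works on a string with the built-in `replace-regexp-in-string`. The function name `my-squeeze-space-after-punctuation` is made up, and the rules are folded into three regexps with a capture group, so this is a variant of the list above, not a copy of it.

```elisp
;; Hypothetical helper for illustration only.
(defun my-squeeze-space-after-punctuation (text)
  "Return TEXT with redundant spaces around punctuation removed."
  (let ((pairs '((" +," . ",")                      ; no space before a comma
                 ("\\([,?!.]\\)  +" . "\\1 ")       ; ASCII: keep one space
                 ("\\([,。:?;!、]\\) +" . "\\1")))) ; full-width: keep none
    (dolist (pair pairs text)
      ;; The 4th argument t is FIXEDCASE: do not alter letter case.
      (setq text (replace-regexp-in-string (car pair) (cdr pair) text t)))))

(my-squeeze-space-after-punctuation "Hi ,  there!   你好,  世界。 ")
;; "Hi, there! 你好,世界。"
```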

These commands are useful for Twitter too, for saving a few characters against Twitter's character limit: English punctuation takes 2 characters each (the symbol plus the space after it), while the Chinese version needs just 1, because the space is built into the full-width symbol.
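
A quick way to check the saving is to compare string lengths. The helper name `my-length-saved` is made up for this sketch, and it counts characters as Emacs sees them. How Twitter itself weights full-width characters is not checked here.

```elisp
;; Hypothetical helper for illustration only.
(defun my-length-saved (english chinese)
  "Return how many characters CHINESE saves compared to ENGLISH."
  (- (length english) (length chinese)))

(length "Hello, world. Bye! ")                           ; 19
(length "Hello,world。Bye!")                           ; 16
(my-length-saved "Hello, world. Bye! " "Hello,world。Bye!") ; 3
```

Three punctuation marks, three characters saved.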
