Emacs Lisp: Convert Punctuation Between English/Chinese Forms

This page shows Emacs Lisp code to convert punctuation between Chinese and English forms.

If you type Chinese or Japanese mixed with English, you often end up with a mix of Asian and Western punctuation, which is laborious to fix manually. The command below helps fix it.

It converts punctuation between the English form and the Asian full-width form. 〔➤ Unicode Full-Width Characters〕 Here's the code:

(defun xah-convert-english-chinese-punctuation (p1 p2 &optional ε-to-direction)
  "Convert punctuation from/to English/Chinese Unicode symbols.

When called interactively, do current text block (paragraph) or text selection. The conversion direction is automatically determined.

If `universal-argument' is called:

 no C-u → Automatic.
 C-u → to English
 C-u 1 → to English
 C-u 2 → to Chinese

When called in lisp code, p1 p2 are region begin/end positions. ε-to-direction must be one of the following values: 「\"chinese\"」, 「\"english\"」, 「\"auto\"」.

See also: `xah-remove-punctuation-trailing-redundant-space'."
  (interactive
   (let ( (bds (get-selection-or-unit 'block)))
     (list (elt bds 1) (elt bds 2)
           (cond
            ((equal current-prefix-arg nil) "auto")
            ((equal current-prefix-arg '(4)) "english")
            ((equal current-prefix-arg 1) "english")
            ((equal current-prefix-arg 2) "chinese")
            (t "chinese")
            )
           ) ) )
  (let (
        (inputStr (buffer-substring-no-properties p1 p2))
        (ξ-english-chinese-punctuation-map
         [
          [". " "。"]
          [".\n" "。\n"]
          [", " ","]
          [": " ":"]
          ["; " ";"]
          ["? " "?"] ; no space after
          ["! " "!"]

          ;; for inside HTML
          [".</" "。</"]
          ["?</" "?</"]
          [":</" ":</"]
          ]
         ))

    (when (string= ε-to-direction "auto")
      (if
          ;; Any Chinese punctuation in the text means: convert to English.
          (or (string-match "。" inputStr)
              (string-match "," inputStr)
              (string-match "?" inputStr)
              (string-match "!" inputStr))
          ;; (or (string-match ", " inputStr)
          ;;     (string-match ".  " inputStr)
          ;;     (string-match "! " inputStr)
          ;;     (string-match "? " inputStr)
          ;;     (string-match ". " inputStr)
          ;;     )
          (setq ε-to-direction "english")
        (setq ε-to-direction "chinese")))

    (replace-pairs-region
     p1 p2
     (cond
      ((string= ε-to-direction "chinese") ξ-english-chinese-punctuation-map)
      ((string= ε-to-direction "english") (mapcar (lambda (ξpair) (vector (elt ξpair 1) (elt ξpair 0))) ξ-english-chinese-punctuation-map))
      (t (user-error "Your 3rd argument 「%s」 isn't valid" ε-to-direction)) )
     ) ) )

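This requires 2 elisp utilities from ErgoEmacs, which provide the get-selection-or-unit and replace-pairs-region functions that the command calls.

But you can rewrite the command to do without them.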
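
One way to do without them, as a minimal sketch using only functions that ship with Emacs. The names `my-selection-or-block-bounds` and `my-replace-pairs-region` are hypothetical, and their behavior is an assumption inferred from how the command above calls the originals; this is not the ErgoEmacs implementation. In particular, the pairs are applied one after another, so a replacement produced by one pair could be matched again by a later pair, which the original utility may handle differently.

```elisp
;; Sketch only: hypothetical stand-ins, not the ErgoEmacs originals.

(defun my-selection-or-block-bounds ()
  "Return [TEXT BEGIN END] for the active region or current text block.
A block is the text around point delimited by blank lines.  The vector
shape is an assumption, chosen to fit (elt bds 1) and (elt bds 2)."
  (let (begin end)
    (if (use-region-p)
        (setq begin (region-beginning)
              end (region-end))
      ;; No selection: look for the nearest blank line in each direction.
      (save-excursion
        (setq begin (if (re-search-backward "\n[ \t]*\n" nil t)
                        (match-end 0)
                      (point-min))))
      (save-excursion
        (setq end (if (re-search-forward "\n[ \t]*\n" nil t)
                      (match-beginning 0)
                    (point-max)))))
    (vector (buffer-substring-no-properties begin end) begin end)))

(defun my-replace-pairs-region (begin end pairs)
  "Replace literal text between BEGIN and END using PAIRS.
PAIRS is a sequence of [FROM TO] vectors, applied in order."
  (save-excursion
    (save-restriction
      (narrow-to-region begin end)
      (let ((case-fold-search nil))
        (mapc (lambda (pair)
                (goto-char (point-min))
                (while (search-forward (elt pair 0) nil t)
                  ;; t t: keep case as written, insert replacement literally.
                  (replace-match (elt pair 1) t t)))
              pairs)))))
```

A quick check in a scratch buffer:

```elisp
(with-temp-buffer
  (insert "Hello, world. Done? Yes!\n")
  (my-replace-pairs-region (point-min) (point-max)
                           [[". " "。"] [", " ","] ["? " "?"]])
  (buffer-string))
;; → "Hello,world。Done?Yes!\n"
```
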

Multi-pair find/replace is tremendously useful. For many other examples, see: Emacs Lisp Multi-Pair Find/Replace Applications.

Download Latest Version

You can download the latest version from https://code.google.com/p/ergoemacs/source/browse/packages/xah-misc-commands.el

Remove Punctuation Trailing Redundant Spaces

Here's a helpful command to remove redundant spaces after punctuation.

(defun xah-remove-punctuation-trailing-redundant-space (p1 p2)
  "Remove redundant whitespace after punctuation.
Works on current block or text selection.

When called in emacs lisp code, p1 p2 are the region begin/end positions.

See also `xah-convert-english-chinese-punctuation'."
  (interactive
   (let ( (bds (get-selection-or-unit 'block)))
     (list (elt bds 1) (elt bds 2) ) ) )
  (replace-regexp-pairs-region p1 p2
                               [
                                ;; clean up. Remove extra space.
                                [" +," ","]
                                [",  +" ", "]
                                ["?  +" "? "]
                                ["!  +" "! "]
                                ["\\.  +" ". "]

                                ;; fullwidth punctuations
                                [", +" ","]
                                ["。 +" "。"]
                                [": +" ":"]
                                ["? +" "?"]
                                ["; +" ";"]
                                ["! +" "!"]
                                ["、 +" "、"]
                                ]
                               "FIXEDCASE" "LITERAL") )

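The command above calls replace-regexp-pairs-region, which is not built into Emacs. As a minimal sketch of what such a helper might look like, here is a stand-in written with built-in functions only. The name `my-replace-regexp-pairs-region` is hypothetical, and treating the two extra arguments as "match case exactly" and "insert the replacement literally" is an assumption based on their names, not a description of the original utility.

```elisp
;; Sketch only: hypothetical stand-in, not the ErgoEmacs original.
(defun my-replace-regexp-pairs-region (begin end pairs)
  "Replace regexp matches between BEGIN and END using PAIRS.
PAIRS is a sequence of [REGEXP REPLACEMENT] vectors, applied in order.
Case is matched exactly and REPLACEMENT is inserted literally."
  (save-excursion
    (save-restriction
      (narrow-to-region begin end)
      (let ((case-fold-search nil))
        (mapc (lambda (pair)
                (goto-char (point-min))
                (while (re-search-forward (elt pair 0) nil t)
                  (replace-match (elt pair 1) t t)))
              pairs)))))
```

Trying it on a few of the pairs used above:

```elisp
(with-temp-buffer
  (insert "One,   two.   Three。  four")
  (my-replace-regexp-pairs-region (point-min) (point-max)
                                  [[",  +" ", "] ["\\.  +" ". "] ["。 +" "。"]])
  (buffer-string))
;; → "One, two. Three。four"
```
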
These commands are useful for Twitter too, for saving a few characters against Twitter's character limit. Each English punctuation mark takes 2 characters (the mark plus the space after it), while the Chinese version needs just one, because the space is built into the full-width symbol.
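
The saving can be checked with a small sketch. The helper name `my-char-savings` is hypothetical, and it counts characters the way the Emacs `length` function does; how a particular service counts them is outside this sketch.

```elisp
;; Sketch only: hypothetical helper for comparing string lengths.
(defun my-char-savings (english chinese)
  "Return how many characters CHINESE saves compared with ENGLISH."
  (- (length english) (length chinese)))

;; Three punctuation marks converted, one character saved for each.
(my-char-savings "Yes, I agree. Do you? " "Yes,I agree。Do you?")
;; → 3
```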
