Emacs Lisp: Convert Punctuation Between English/Chinese Forms

By Xah Lee.

This page shows Emacs Lisp code to convert punctuation between Chinese and English forms.

If you type Chinese or Japanese mixed with English, you often end up with a mix of Asian and Western punctuation, which is laborious to fix manually. Here's code that helps fix it.

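The command converts punctuation between the English form and the Asian full-width form. 〔➤see Unicode Full-Width Characters〕 For example, 「. 」 (a period followed by a space) becomes 「。」, and the reverse direction turns it back. Here's the command: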

(defun xah-convert-english-chinese-punctuation (*begin *end &optional *to-direction)
  "Convert punctuation from/to English/Chinese characters.

When called interactively, do current line or selection. The conversion direction is automatically determined.

If `universal-argument' is called, ask user for change direction.

When called in lisp code, *begin *end are region begin/end positions. *to-direction must be any of the following values: 「\"chinese\"」, 「\"english\"」, 「\"auto\"」.

See also: `xah-remove-punctuation-trailing-redundant-space'.

URL `http://ergoemacs.org/emacs/elisp_convert_chinese_punctuation.html'
Version 2015-10-05"
  (interactive
   (let (-p1 -p2)
     (if (use-region-p)
         (progn
           (setq -p1 (region-beginning))
           (setq -p2 (region-end)))
       (progn
         (setq -p1 (line-beginning-position))
         (setq -p2 (line-end-position))))
     (list
      -p1
      -p2
      (if current-prefix-arg
          (ido-completing-read
           "Change to: "
           '( "english"  "chinese")
           "PREDICATE"
           "REQUIRE-MATCH")
        "auto"
        ))))
  (let (
        (-input-str (buffer-substring-no-properties *begin *end))
        (-replacePairs
         [
          [". " "。"]
          [".\n" "。\n"]
          [", " "，"]
          [",\n" "，\n"]
          [": " "："]
          ["; " "；"]
          ["? " "？"] ; no space after
          ["! " "！"]

          ;; for inside HTML
          [".</" "。</"]
          ["?</" "？</"]
          [":</" "：</"]
          [" " "　"] ; second string is the ideographic space, U+3000
          ]
         ))

    (when (string= *to-direction "auto")
      (setq
       *to-direction
       (if
           (or
            (string-match "　" -input-str) ; ideographic space, U+3000
            (string-match "。" -input-str)
            (string-match "，" -input-str)
            (string-match "？" -input-str)
            (string-match "！" -input-str))
           "english"
         "chinese")))
    (save-excursion
      (save-restriction
        (narrow-to-region *begin *end)
        (mapc
         (lambda (-x)
           (progn
             (goto-char (point-min))
             (while (search-forward (aref -x 0) nil "noerror")
               (replace-match (aref -x 1)))))
         (cond
          ((string= *to-direction "chinese") -replacePairs)
          ((string= *to-direction "english") (mapcar (lambda (x) (vector (elt x 1) (elt x 0))) -replacePairs))
          (t (user-error "Your 3rd argument 「%s」 isn't valid" *to-direction))))))))
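
To see the pair-table technique in isolation, here is a minimal sketch that applies the same kind of replacement to a string in a temporary buffer. The function name `my-demo-convert-punctuation` and the shortened pair table are made up for illustration; they are not part of the command above.

```elisp
(defun my-demo-convert-punctuation (str to-chinese)
  "Return STR with punctuation converted.
If TO-CHINESE is non-nil, convert English forms to Chinese forms,
else do the reverse.  Hypothetical helper for illustration only."
  (let* ((pairs [[". " "。"] [", " "，"] ["? " "？"] ["! " "！"]])
         ;; Swap each pair when converting from Chinese to English.
         (active (if to-chinese
                     (append pairs nil)
                   (mapcar (lambda (x) (vector (aref x 1) (aref x 0)))
                           pairs))))
    (with-temp-buffer
      (insert str)
      (dolist (pair active)
        ;; Restart from the top of the buffer for every pair.
        (goto-char (point-min))
        (while (search-forward (aref pair 0) nil t)
          ;; t t: keep case as-is and insert the replacement literally.
          (replace-match (aref pair 1) t t)))
      (buffer-string))))

(my-demo-convert-punctuation "你好, 世界. " t)
;; returns "你好，世界。"

(my-demo-convert-punctuation "你好，世界。" nil)
;; returns "你好, 世界. "
```

Note that the round trip puts a space after every converted mark, including the final period. That follows directly from the pair table, where the English side of each pair carries the trailing space.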

Remove Punctuation Trailing Redundant Spaces

Here's a helpful command to remove redundant spaces after punctuation.

(defun xah-remove-punctuation-trailing-redundant-space (*begin *end)
  "Remove redundant whitespace after punctuation.
Works on current line or text selection.

When called in emacs lisp code, the *begin *end are cursor positions for region.

See also `xah-convert-english-chinese-punctuation'.

URL `http://ergoemacs.org/emacs/elisp_convert_chinese_punctuation.html'
version 2015-08-22"
  (interactive
   (if (use-region-p)
       (list (region-beginning) (region-end))
     (list (line-beginning-position) (line-end-position))))
  (require 'xah-replace-pairs)
  (xah-replace-regexp-pairs-region
   *begin *end
   [
    ;; clean up. Remove extra space.
    [" +," ","]
    [",  +" ", "]
    ["?  +" "? "]
    ["!  +" "! "]
    ["\\.  +" ". "]

    ;; fullwidth punctuations
    ["， +" "，"]
    ["。 +" "。"]
    ["： +" "："]
    ["？ +" "？"]
    ["； +" "；"]
    ["！ +" "！"]
    ["、 +" "、"]
    ]
   "FIXEDCASE" "LITERAL"))

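The command above depends on `xah-replace-regexp-pairs-region` from the separate `xah-replace-pairs` package. For readers who do not have that package loaded, the following is a rough sketch of the same idea on a string: apply each regexp/replacement pair in order, with case conversion off and the replacement taken literally. The name `my-demo-remove-trailing-space` and the shortened pair table are hypothetical, and the package function's actual implementation may differ.

```elisp
(defun my-demo-remove-trailing-space (str)
  "Return STR with redundant spaces around punctuation removed.
Hypothetical string-based sketch, not the package function."
  (let ((pairs [[" +," ","]        ; no space before a comma
                ["\\.  +" ". "]    ; at most one space after a period
                ["。 +" "。"]       ; no space after fullwidth marks
                ["、 +" "、"]]))
    (with-temp-buffer
      (insert str)
      (mapc
       (lambda (pair)
         (goto-char (point-min))
         (while (re-search-forward (aref pair 0) nil t)
           ;; t t: fixed case, literal replacement text.
           (replace-match (aref pair 1) t t)))
       pairs)
      (buffer-string))))

(my-demo-remove-trailing-space "one ,two.   三。  四、 五")
;; returns "one,two. 三。四、五"
```

The pairs are applied one after another over the whole text, so order matters when one pattern's output could match a later pattern.
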
These commands are also useful for Twitter, to save a few characters against its character limit. English punctuation takes 2 characters each (the mark plus the space after it), while the Chinese version needs just one, because the space is built into the full-width punctuation symbol.
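
The saving can be checked with `length`, which counts characters in a string. This is only a small sketch: the helper name `my-demo-length-saving` is made up for illustration, and it measures Emacs characters, which is not necessarily how a given service counts toward its limit.

```elisp
(defun my-demo-length-saving (english chinese)
  "Return a list (ENGLISH-LENGTH CHINESE-LENGTH SAVED) for the two strings.
Hypothetical helper for illustration only."
  (let ((en (length english))
        (zh (length chinese)))
    (list en zh (- en zh))))

(my-demo-length-saving "Hello, world! How are you? "
                       "Hello，world！How are you？")
;; returns (27 24 3)
```

Each of the three punctuation marks drops its trailing space, so the converted text is three characters shorter.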
