Emacs Lisp: Convert Punctuation Between English/Chinese Forms

This page shows Emacs Lisp code that converts punctuation between Chinese and English forms.

If you type Chinese or Japanese mixed with English, you'll often end up with mixed Asian and Western punctuation, which is laborious to fix manually. Here's a command that helps fix it.

It converts punctuation between the English form and the Asian full-width form. (See: Unicode Full-Width Characters.) The code:

(defun xah-convert-english-chinese-punctuation (φp1 φp2 &optional φto-direction)
  "Convert punctuation from/to English/Chinese characters.

When called interactively, do current text block or selection. The conversion direction is automatically determined.

If `universal-argument' is called, ask user for change direction.

When called in lisp code, φp1 φp2 are region begin/end positions. φto-direction must be any of the following values: 「\"chinese\"」, 「\"english\"」, 「\"auto\"」.

See also: `xah-remove-punctuation-trailing-redundant-space'.

URL `http://ergoemacs.org/emacs/elisp_convert_chinese_punctuation.html'
Version 2015-04-13"
  (interactive
   (let (ξp1 ξp2)
     (if (use-region-p)
         (progn
           (setq ξp1 (region-beginning))
           (setq ξp2 (region-end)))
       ;; No active region: use the current text block, delimited by blank lines.
       (save-excursion
         (if (re-search-backward "\n[ \t]*\n" nil "move")
             (progn (re-search-forward "\n[ \t]*\n")
                    (setq ξp1 (point)))
           (setq ξp1 (point)))
         (if (re-search-forward "\n[ \t]*\n" nil "move")
             (progn (re-search-backward "\n[ \t]*\n")
                    (setq ξp2 (point)))
           (setq ξp2 (point)))))
     (list
      ξp1
      ξp2
      ;; With a prefix argument, ask for the direction; otherwise detect it.
      (if current-prefix-arg
          (completing-read
           "Change to: "
           '("english" "chinese")
           nil t)
        "auto"))))
  (let ((ξinput-str (buffer-substring-no-properties φp1 φp2))
        ;; Each pair is [ENGLISH CHINESE].
        (ξreplacePairs
         [
          [". " "。"]
          [".\n" "。\n"]
          [", " "，"]
          [",\n" "，\n"]
          [": " "："]
          ["; " "；"]
          ["? " "？"] ; no space after
          ["! " "！"]

          ;; for inside HTML
          [".</" "。</"]
          ["?</" "？</"]
          [":</" "：</"]
          ]))

    ;; "auto": if the text already has Chinese punctuation, convert to English.
    (when (string= φto-direction "auto")
      (setq φto-direction
            (if (or (string-match "。" ξinput-str)
                    (string-match "，" ξinput-str)
                    (string-match "？" ξinput-str)
                    (string-match "！" ξinput-str))
                "english"
              "chinese")))
    (save-excursion
      (save-restriction
        (narrow-to-region φp1 φp2)
        (mapc
         (lambda (ξx)
           (goto-char (point-min))
           (while (search-forward (aref ξx 0) nil "noerror")
             (replace-match (aref ξx 1) t t)))
         (cond
          ((string= φto-direction "chinese") ξreplacePairs)
          ((string= φto-direction "english") (mapcar (lambda (ξpair) (vector (elt ξpair 1) (elt ξpair 0))) ξreplacePairs))
          (t (user-error "Your 3rd argument 「%s」 isn't valid" φto-direction))))))))
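
To experiment with the core idea outside of an interactive command, the search-and-replace loop can be sketched as a small string function. This is a minimal sketch, not the code above: the names my-punct-pairs and my-convert-punct-string are hypothetical, only a few of the pairs are included, and any direction other than "chinese" is treated as "english".

```elisp
;; Hypothetical names; a trimmed-down pair table for illustration only.
(defvar my-punct-pairs
  [[". " "。"]
   [", " "，"]
   ["? " "？"]
   ["! " "！"]]
  "Vector of [ENGLISH CHINESE] punctuation pairs (illustrative subset).")

(defun my-convert-punct-string (str direction)
  "Return a copy of STR with punctuation converted.
DIRECTION is \"chinese\"; any other value converts back to English."
  (let ((pairs (if (string= direction "chinese")
                   my-punct-pairs
                 ;; Swap each pair to go from Chinese back to English.
                 (mapcar (lambda (pair) (vector (aref pair 1) (aref pair 0)))
                         my-punct-pairs))))
    (with-temp-buffer
      (insert str)
      (mapc
       (lambda (pair)
         ;; Restart from the top of the buffer for every pair.
         (goto-char (point-min))
         (while (search-forward (aref pair 0) nil t)
           ;; FIXEDCASE and LITERAL are non-nil: insert the text as-is.
           (replace-match (aref pair 1) t t)))
       pairs)
      (buffer-string))))

(my-convert-punct-string "Hi, there. Ready? " "chinese")
;; returns "Hi，there。Ready？"
```

Because each English pair includes the trailing space, converting the result back with "english" restores the spaces as well.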

Remove Punctuation Trailing Redundant Spaces

Here's a helpful command to remove redundant spaces after punctuation.

(require 'xah-replace-pairs)
(require 'xeu_elisp_util)

(defun xah-remove-punctuation-trailing-redundant-space (φp1 φp2)
  "Remove redundant whitespace after punctuation.
Works on current block or text selection.

When called in emacs lisp code, the φp1 φp2 are cursor positions for region.

See also `xah-convert-english-chinese-punctuation'.

URL `http://ergoemacs.org/emacs/elisp_convert_chinese_punctuation.html'
version 2015-02-04"
  (interactive
   (let ((ξboundary (get-selection-or-unit 'block)))
     (list (elt ξboundary 1) (elt ξboundary 2))))
  (replace-regexp-pairs-region φp1 φp2
                               [
                                ;; clean up. Remove extra space.
                                [" +," ","]
                                [",  +" ", "]
                                ["\\?  +" "? "]
                                ["!  +" "! "]
                                ["\\.  +" ". "]

                                ;; fullwidth punctuations
                                ["， +" "，"]
                                ["。 +" "。"]
                                ["： +" "："]
                                ["？ +" "？"]
                                ["； +" "；"]
                                ["！ +" "！"]
                                ["、 +" "、"]
                                ]
                               "FIXEDCASE" "LITERAL"))
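
The same cleanup can be tried on a plain string without the two required packages. The following is a minimal sketch under that assumption, using only the built-in replace-regexp-in-string; the name my-squeeze-space-after-punct is hypothetical, and it covers only some of the pairs listed above.

```elisp
(defun my-squeeze-space-after-punct (str)
  "Return STR with redundant spaces around punctuation removed.
Hypothetical helper that handles only a few punctuation marks."
  (let ((pairs '((" +," . ",")        ; no space before a comma
                 (",  +" . ", ")       ; two or more spaces after a comma become one
                 ("\\.  +" . ". ")     ; same for a period
                 ;; No space at all after a fullwidth mark; \\1 is the mark itself.
                 ("\\([，。？！]\\) +" . "\\1"))))
    ;; Apply each (REGEXP . REPLACEMENT) pair in order, then return STR.
    (dolist (pair pairs str)
      (setq str (replace-regexp-in-string (car pair) (cdr pair) str t)))))

(my-squeeze-space-after-punct "a ,b,   c.  d。 e")
;; returns "a,b, c. d。e"
```

The pairs are applied in order, so a later pattern sees the output of the earlier ones.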

These commands are also useful for Twitter, to save a few characters against its character limit. An English punctuation mark plus the space after it takes 2 characters, while the Chinese version needs just 1, because the spacing is built into the punctuation symbol.
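
The saving can be checked in Emacs itself, since length counts characters in a string. This is a small sketch with a hypothetical function name, my-char-savings; it counts characters as Emacs sees them, and how a particular service counts them is up to that service.

```elisp
(defun my-char-savings (english chinese)
  "Return how many characters CHINESE saves compared with ENGLISH."
  (- (length english) (length chinese)))

;; "Yes, no. " is 9 characters; "Yes，no。" is 7.
(my-char-savings "Yes, no. " "Yes，no。")
;; returns 2
```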
