Emacs Lisp: Convert Punctuation Between English/Chinese Forms

Xah Lee,

The following command converts the punctuation in the text under the cursor between 2 forms: English and Chinese.

(defun convert-english-chinese-punctuation (p1 p2 &optional ξ-to-direction)
  "Replace punctuation from/to English/Chinese Unicode symbols.

When called interactively, work on the current text block (paragraph) or text selection. The conversion direction is automatically determined.

A prefix argument (`universal-argument') picks the direction:

 no C-u → Automatic.
 C-u → to English
 C-u 1 → to English
 C-u 2 → to Chinese

When called in lisp code, p1 p2 are region begin/end positions. ξ-to-direction must be one of the following values: 「\"chinese\"」, 「\"english\"」, 「\"auto\"」.

See also: `remove-punctuation-trailing-redundant-space'."
  (interactive
   (let ( (bds (get-selection-or-unit 'block)))
     (list (elt bds 1) (elt bds 2)
           (cond
            ((equal current-prefix-arg nil) "auto")
            ((equal current-prefix-arg '(4)) "english")
            ((equal current-prefix-arg 1) "english")
            ((equal current-prefix-arg 2) "chinese")
            (t "chinese")
            )
           ) ) )
  (let ((ξ-english-chinese-punctuation-map
         [
          [". " "。"]
          [".\n" "。\n"]
          ["," ","]
          [": " ":"]
          ["; " ";"]
          ["?" "?"] ; no space after
          ["! " "!"]

          ;; for inside HTML
          [".</" "。</"]
          ["?</" "?</"]
          [":</" ":</"]
          ]
         ))

    (replace-pairs-region p1 p2
                              (cond
                               ((string= ξ-to-direction "chinese") ξ-english-chinese-punctuation-map)
                               ((string= ξ-to-direction "english") (mapcar (lambda (ξpair) (vector (elt ξpair 1) (elt ξpair 0))) ξ-english-chinese-punctuation-map))
                               ((string= ξ-to-direction "auto")
                                (if (string-match "[。,:;?!]" (buffer-substring-no-properties p1 p2))
                                    (mapcar (lambda (ξpair) (vector (elt ξpair 1) (elt ξpair 0))) ξ-english-chinese-punctuation-map)
                                  ξ-english-chinese-punctuation-map
                                  ))

                               (t (error "Your 3rd argument 「%s」 isn't valid." ξ-to-direction)) ) ) ) )

Use this to convert punctuation between the English form and the Asian (fullwidth) form.
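The direction guess and the pair reversal inside the command can be tried on their own. The following is a minimal sketch, not the command itself: the `my-` names and the sample pair list are made up for illustration, and the detection regexp assumes the test is "does the text contain any Chinese punctuation mark from the map".

```elisp
;; Sketch only: the `my-' names are hypothetical, not part of ErgoEmacs.
(defvar my-punctuation-pairs
  [[". " "。"] ["," ","] ["! " "!"]]
  "Sample [ENGLISH CHINESE] punctuation pairs.")

(defun my-reverse-pairs (pairs)
  "Return a list of PAIRS with the 2 elements of each pair swapped."
  (mapcar (lambda (pair) (vector (elt pair 1) (elt pair 0))) pairs))

(defun my-guess-direction (text)
  "Return \"english\" if TEXT contains Chinese punctuation, else \"chinese\"."
  (if (string-match-p "[。,:;?!]" text) "english" "chinese"))

(my-guess-direction "你好。")         ; ⇒ "english"
(my-guess-direction "Hello, world.") ; ⇒ "chinese"
(my-reverse-pairs my-punctuation-pairs)
;; ⇒ (["。" ". "] ["," ","] ["!" "! "])
```

Text that already has Chinese punctuation is taken to be headed toward English, and anything else toward Chinese.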

If you type Chinese or Japanese, this command will be useful. Otherwise, you might still want to check out the emacs lisp techniques it uses.

This requires 2 elisp utilities from ErgoEmacs: 〔xfrp_find_replace_pairs.el〕 〔☛ Emacs Lisp: Multi-Pair String Replacement Function〕 and 〔xeu_elisp_util.el〕 〔☛ Emacs Lisp: get-selection-or-unit〕. But you can rewrite the command to do without them.
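As a rough idea of what doing without the dependencies could look like, here is a small sketch that replaces pairs in a string using only built-in functions. `my-replace-pairs-in-string` is a hypothetical helper, not the ErgoEmacs `replace-pairs-region`; it applies the pairs one after another, so a later pair can rewrite the output of an earlier one, and the ErgoEmacs function may handle that case differently.

```elisp
;; Sketch only: `my-replace-pairs-in-string' is a hypothetical helper,
;; not the ErgoEmacs `replace-pairs-region'.
(defun my-replace-pairs-in-string (str pairs)
  "Return STR with each [FROM TO] pair in PAIRS replaced, in order.
FROM is matched literally and case-sensitively, not as a regexp."
  (let ((case-fold-search nil))
    (with-temp-buffer
      (insert str)
      (mapc (lambda (pair)
              (goto-char (point-min))
              (while (search-forward (elt pair 0) nil t)
                ;; FIXEDCASE and LITERAL are both t: insert TO verbatim.
                (replace-match (elt pair 1) t t)))
            pairs)
      (buffer-string))))

(my-replace-pairs-in-string
 "Hello, world. How are you? "
 [[". " "。"] ["," ","] ["? " "?"]])
;; ⇒ "Hello, world。How are you?"
```

Note the space left after 「,」 in the result: the sample 「","」 pair does not include a trailing space, so that space is not consumed.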

Multi-pair find/replace is tremendously useful. For many other examples, see: Emacs Lisp Multi-Pair Find/Replace Applications.

Remove Punctuation Trailing Redundant Spaces

This command is helpful even for just English text. But it's particularly useful after you have converted from English to Chinese punctuation, because by convention Chinese punctuation should not have any trailing space.

(defun remove-punctuation-trailing-redundant-space (p1 p2)
  "Remove redundant whitespace after punctuation.
Works on current block or text selection.

When called in emacs lisp code, p1 p2 are the region begin/end positions.

See also `convert-english-chinese-punctuation'."
  (interactive
   (let ( (bds (get-selection-or-unit 'block)))
     (list (elt bds 1) (elt bds 2) ) ) )
  (replace-regexp-pairs-region p1 p2
                               [
                                ;; clean up. Remove extra space.
                                [",  +" ", "]
                                ["\\.   +" ". "] ; escape the dot so it matches a literal period

                                [", +" ","]
                                ["。 +" "。"]
                                [": +" ":"]
                                ["? +" "?"]
                                ["; +" ";"]
                                ["! +" "!"]
                                ["、 +" "、"]
                                ]
                               "FIXEDCASE" "LITERAL") )
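The Chinese half of this cleanup can also be written as a single regexp with a character class. This is a minimal sketch on a string, using only the built-in `replace-regexp-in-string`; the function name is hypothetical and it is not a drop-in replacement for the command above.

```elisp
;; Sketch only: hypothetical helper, works on a string instead of a region.
(defun my-strip-space-after-chinese-punctuation (str)
  "Return STR with spaces removed after Chinese punctuation marks."
  ;; \\1 puts back the punctuation mark captured by the group;
  ;; the run of spaces that follows it is dropped.
  (replace-regexp-in-string "\\([。,:;?!、]\\) +" "\\1" str t))

(my-strip-space-after-chinese-punctuation "你好, 世界。 再见!  ")
;; ⇒ "你好,世界。再见!"
```

A character class keeps the list of marks in one place, at the cost of not being able to give each mark its own replacement.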

These commands are useful for Twitter too, because each English punctuation mark takes 2 chars (the mark plus a trailing space), while the Chinese version needs just one char.
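The saving is easy to check with `length`, which counts characters in a string. This is only an illustration with a made-up sentence; it assumes one count per character, which is how Emacs counts and may not match how a given service counts.

```elisp
;; English form: each inner mark is followed by a space.
(length "Wait, really? Yes!") ; ⇒ 18

;; Chinese form: no trailing spaces needed.
(length "Wait,really?Yes!") ; ⇒ 16
```

The 2 chars saved are the 2 spaces that followed the comma and the question mark.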
