
Emacs Lisp: Count Lines, Words, Chars

Xah Lee

A little elisp tip. In emacs 23, there's count-lines, but no command to count words or characters. Here's a short elisp command I have been using since about 2006. It reports the number of words and chars in a text selection.

(defun my-count-words-region (posBegin posEnd)
  "Print number of words and chars in region."
  (interactive "r")
  (message "Counting …")
  (save-excursion
    (let (wordCount charCount)
      (setq wordCount 0)
      (setq charCount (- posEnd posBegin))
      (goto-char posBegin)
      (while (and (< (point) posEnd)
                  (re-search-forward "\\w+\\W*" posEnd t))
        (setq wordCount (1+ wordCount)))

      (message "Words: %d. Chars: %d." wordCount charCount)
      )))

This code is largely from An Introduction to Programming in Emacs Lisp by Robert J Chassell, which I was reading sometime in 2005. That tutorial is for people who have never programmed. It was quite frustrating to read, because for every sentence that teaches you something about emacs lisp, you have to scan some 20 pages of things you already know about programming, such as what variables, assignment, and syntax are. In the end, I didn't really read that book. This function is about the only thing I got out of it.

How It Works

Now let's explain how this function works.

The function has this skeleton:

(defun my-count-words-region (pos1 pos2)
  "…"
  (interactive "r")
  ;; function body goes here
  )

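This means that when you call the function with 【Meta+x】, the region's beginning position (an integer) is passed to the parameter “pos1”, and the region's end position is passed to the parameter “pos2”, automatically. This is done by the line (interactive "r"). (The skeleton calls the parameters “pos1” and “pos2”; the full function above names them “posBegin” and “posEnd”.)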
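To make the effect of (interactive "r") concrete, here is a minimal sketch. The command name my-region-size is made up for illustration and is not part of the original code. It shows that the region positions are filled in only for interactive calls; when the function is called from lisp code, you pass the two positions yourself.

```elisp
;; Hypothetical helper for illustration only.
(defun my-region-size (pos1 pos2)
  "Return the number of characters between POS1 and POS2.
When called interactively, POS1 and POS2 are the region boundaries."
  (interactive "r")
  (let ((size (abs (- pos2 pos1))))
    ;; Only report in the echo area when the user ran the command directly.
    (when (called-interactively-p 'interactive)
      (message "Region has %d chars." size))
    size))

;; From lisp code there is no automatic region; pass positions explicitly.
(my-region-size 1 11)
```

With 【Meta+x】 my-region-size the two arguments come from the region; in the last line they are given by hand.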

The next part of the function is this:

(save-excursion
  (let (var1 var2 …)
    (setq var1 …)
    (setq var2 …)
    …))

The let is lisp's way to have a block of local variables. The (save-excursion …) runs its body, then restores the cursor position (and, in emacs versions before 25.1, the mark position too). We need it because the code is going to move the cursor around. When the command is finished, the cursor will remain where the user started the command.
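A small sketch of what save-excursion does to the cursor position. The function name my-save-excursion-demo is an assumption for this example; it works in a temporary buffer so it does not touch your own text.

```elisp
;; Hypothetical demo function, for illustration only.
(defun my-save-excursion-demo ()
  "Return point before, inside, and after a `save-excursion' form."
  (with-temp-buffer
    (insert "one two three")
    (goto-char 5)                   ; put the cursor in the middle of the text
    (let ((before (point))
          inside)
      (save-excursion
        (goto-char (point-min))     ; move the cursor freely inside the body
        (setq inside (point)))
      ;; After the body, the cursor is back where it was.
      (list before inside (point)))))

(my-save-excursion-demo)  ; returns (5 1 5)
```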

Now, the char count is just the difference between the region's ending position and its beginning position. So, it is simple, like this:

(setq charCount (- posEnd posBegin))

Now, we move the cursor to the beginning of the region, like this: (goto-char posBegin). The next part counts the words, like this:

(while (and (< (point) posEnd)
            (re-search-forward "\\w+\\W*" posEnd t))
  (setq wordCount (1+ wordCount)))

The (< (point) posEnd) checks that the cursor hasn't reached the end of the region yet.

The (re-search-forward "\\w+\\W*" posEnd t) moves the cursor forward to the end of the next regex match of a word pattern: one or more word characters, followed by any non-word characters. The “posEnd” argument there means don't search beyond the end of the region. And the “t” there means return nil instead of signaling an error when no more match is found.

search-forward and re-search-forward are very important functions in elisp. I use them in all of my text processing scripts. If you are not familiar with them, look up their inline doc (with describe-function).

So, the above “while” block basically means: keep moving the cursor and counting words, until the cursor is at the end of the region or no more word is found.
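To watch the loop at work outside of a command, here is a sketch that runs the same “while” block over a string in a temporary buffer. The function name my-count-words-in-string is made up for this example.

```elisp
;; Hypothetical helper: same search loop, applied to a string.
(defun my-count-words-in-string (text)
  "Return the number of words in TEXT, using the loop explained above."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let ((wordCount 0)
          (posEnd (point-max)))
      ;; Each successful search leaves the cursor after one word and the
      ;; non-word characters that follow it.
      (while (and (< (point) posEnd)
                  (re-search-forward "\\w+\\W*" posEnd t))
        (setq wordCount (1+ wordCount)))
      wordCount)))

(my-count-words-in-string "The quick brown fox.")  ; returns 4
```

For this sample the cursor stops after “The ”, “quick ”, “brown ” and “fox.”, so the loop body runs four times.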

Finally, the program just prints out the result, by:

(message "Words: %d. Chars: %d." wordCount charCount)

Exercise

Try to write a version that counts the words and chars in the text selection when there is one, and otherwise counts just the current line. You might want to read Emacs Lisp Idioms (for writing interactive commands) to refresh your memory about emacs's technical meaning of “region”, “active region”, and transient-mark-mode.
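If you get stuck, here is one possible approach, not the only one; try the exercise yourself first. The command name my-count-words-region-or-line is made up for this sketch. It assumes that use-region-p is a good enough test for “there is a text selection”, and it replaces the "r" code with a lisp form that computes the two positions.

```elisp
;; One possible solution sketch; the command name is hypothetical.
(defun my-count-words-region-or-line (posBegin posEnd)
  "Print number of words and chars in the active region or current line."
  (interactive
   (if (use-region-p)
       (list (region-beginning) (region-end))
     ;; No active region: fall back to the boundaries of the current line.
     (list (line-beginning-position) (line-end-position))))
  (save-excursion
    (let ((wordCount 0)
          (charCount (- posEnd posBegin)))
      (goto-char posBegin)
      (while (and (< (point) posEnd)
                  (re-search-forward "\\w+\\W*" posEnd t))
        (setq wordCount (1+ wordCount)))
      (message "Words: %d. Chars: %d." wordCount charCount))))
```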

Note

The code shown on this page counts words by emacs's syntax table. That is, the regex for word \\w+ is based on the syntax table. (In emacs, each character is classified into a certain category, defined by each major mode. This is called a syntax table. For example, the English letters are in the “word” class, punctuation characters are in the “punctuation” class, etc.) (info "(elisp) Syntax Tables")

The disadvantage of using the syntax table is that the result is unpredictable, because it depends on the current major mode. For example, this file (at this moment) is 1325 words when in “Fundamental” mode, but 1316 words when in “text-mode”. (863 by the unix “wc” command.)
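The effect can be reproduced on a tiny sample. The sketch below counts \\w+ matches under two syntax tables: the standard one, and a copy where the apostrophe is reclassified as a word character. The function names are made up for illustration, and the modified table is only a stand-in for what a major mode might do, not a copy of any particular mode's table.

```elisp
;; Hypothetical helpers for illustration only.
(defun my-count-words-with-table (text table)
  "Return the number of words in TEXT under syntax TABLE."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (with-syntax-table table
      (let ((count 0))
        (while (re-search-forward "\\w+" nil t)
          (setq count (1+ count)))
        count))))

(defun my-syntax-table-demo ()
  "Count words in \"don't stop\" under two different syntax tables."
  (let ((apostrophe-is-word (copy-syntax-table (standard-syntax-table))))
    ;; Assumption for this demo: treat the apostrophe as a word character.
    (modify-syntax-entry ?' "w" apostrophe-is-word)
    (list (my-count-words-with-table "don't stop" (standard-syntax-table))
          (my-count-words-with-table "don't stop" apostrophe-is-word))))

(my-syntax-table-demo)
```

On a stock emacs, where the standard syntax table classes the apostrophe as punctuation, “don't” splits into two words under the first table and stays one word under the second, so the same text gives two different counts.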

Note: in emacs 24.x (alpha as of 2012-03-02), count-words is now built in. It is also based on emacs's syntax table.
