Elisp: Text Processing: HTML Markup Elisp Functions

By Xah Lee. Date: . Last updated: .

In my emacs tutorial, i want to put HTML markup on elisp functions. For example, if i've written:

<p>Call “sort-lines” to sort.</p>

I want it to be like this:

<p>Call <var class="εf">sort-lines</var> to sort.</p>

This way, i can write JavaScript that pops up the function's documentation when the mouse hovers over the function name.
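The core transformation can be tried on a string before touching any files. This is a minimal sketch, not the command itself: the name `my-tag-elisp-functions-in-string` is hypothetical, and unlike the real command it does not check which HTML tag the word sits in. It applies the same word test as the command: a “quoted” name made of a-z, 0-9, or hyphen, longer than 2 chars, for which `fboundp` is true.

```elisp
;;; -*- lexical-binding: t; coding: utf-8 -*-

;; Hypothetical helper, for illustration only: wrap each “name” in
;; <var class="εf">…</var> when name is a defined function.
;; It does not look at the surrounding HTML tags.
(defun my-tag-elisp-functions-in-string (html)
  "Return a copy of HTML with each “name” of a function marked up."
  (let ((case-fold-search nil))          ; [-a-z0-9] must not match A-Z
    (replace-regexp-in-string
     "“\\([-a-z0-9]+\\)”"
     (lambda (match)
       ;; MATCH is the whole “name” text; drop the two curly quotes.
       (let ((name (substring match 1 -1)))
         (if (and (> (length name) 2)
                  (fboundp (intern-soft name)))
             (concat "<var class=\"εf\">" name "</var>")
           match)))                      ; leave other words unchanged
     html t t)))                         ; fixed case, literal replacement

;; Example:
;; (my-tag-elisp-functions-in-string "<p>Call “sort-lines” to sort.</p>")
;; ⇒ "<p>Call <var class=\"εf\">sort-lines</var> to sort.</p>"
```

Using `replace-regexp-in-string` with a function as the replacement keeps the decision (tag or leave alone) next to the match, and returning MATCH unchanged leaves non-functions untouched.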

I already have 300 HTML files of emacs/lisp tutorial. How to fix them?

One way is to write an elisp script that goes thru all files. Another way is to write an interactive command that can be used on a case-by-case basis. I experimented with both, and decided to use the latter approach.
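For comparison, the first approach (a script that goes thru all files) mostly needs a driver that runs one buffer edit over every file. A minimal sketch under stated assumptions: `my-apply-to-html-files` is a hypothetical name, every file is assumed to be UTF-8, and FN is any function that edits the current buffer. A file is written back only if FN changed its text.

```elisp
;;; -*- lexical-binding: t -*-

;; Hypothetical batch driver, for illustration only.
;; Assumes every .html file under DIR is UTF-8 encoded.
(defun my-apply-to-html-files (dir fn)
  "Call FN with no arguments in a buffer holding each .html file under DIR.
Save the file only if FN changed its text.  Return the changed files."
  (let (changed)
    (dolist (file (directory-files-recursively dir "\\.html\\'"))
      (with-temp-buffer
        (let ((coding-system-for-read 'utf-8))
          (insert-file-contents file))
        (let ((before (buffer-string)))
          (goto-char (point-min))
          (funcall fn)
          ;; Only rewrite files whose text actually changed.
          (unless (string= before (buffer-string))
            (let ((coding-system-for-write 'utf-8))
              (write-region nil nil file nil 'silent))
            (push file changed)))))
    (nreverse changed)))
```

For example, `(my-apply-to-html-files "~/web/" (lambda () (curly-quotes-to-emacs-function-tag (point-min) (point-max))))` would run the command on this page over every file under a hypothetical ~/web/ directory. Run unattended, it would also tag words that shouldn't be tagged, which is the accuracy problem this page discusses.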

Here's the command:

(defun curly-quotes-to-emacs-function-tag (p1 p2)
  "Replace “word” with HTML markup for an elisp function, in the text selection or the current buffer (respects `narrow-to-region').

For example, the text:
 <p>Call “sort-lines” to sort.</p>
becomes:
 <p>Call <var class=\"εf\">sort-lines</var> to sort.</p>

Note: a word is changed only if all of the following are true:

① It is enclosed in <p> tag, or <ul>, <ol>, <table>, <figcaption>. (For example, not inside <h1> or <title>, <a>, or other tags.)
② It is enclosed in “double curly quotes”.
③ `fboundp' returns true.

This command assumes that all tags are closed in your HTML. For example: <p> must be closed with </p>.

This command also makes a report of changed items.

Some issues:

• If the Lisp function's name has fewer than 3 chars, it won't be tagged. For example: + - 1+ ….

• Only words containing lowercase a to z, 0-9, or hyphen are checked, even though elisp identifiers allow many other chars. For example: “yas/reload-all”, “Info-copy-current-node-name” (note the capital letter).

• Some words are common in other languages, for example, “while”, “print”, “string”, unix's “find”, “grep”, HTML's “kbd” tag, etc. But they are also built-in elisp symbols. So, you may not want to tag them.

• Personal emacs functions will also be tagged. You may not want them tagged, because they are not standard functions.

• Some functions come from libraries that are not loaded by default, for example, 「'cl」 (bundled with GNU emacs) or 「'htmlize」 (3rd party, not bundled). They may or may not be tagged, depending on whether they've been loaded."
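  (interactive
   (if (use-region-p)
       (list (region-beginning) (region-end))
     (list (point-min) (point-max))))
  (require 'sgml-mode) ; from html-mode, needs sgml-skip-tag-forward
  (let (p3 p4 mStr ($i 0) (case-fold-search nil))
    (save-excursion
      (save-restriction
        (narrow-to-region p1 p2)
        (goto-char (point-min))
        (while (re-search-forward "<p>\\|<ul>\\|<ol>\\|<table\\|<figcaption>" nil t)
          (setq p3 (match-end 0))
          ;; sgml-skip-tag-forward must start on the "<" of the opening tag;
          ;; from inside the element it would skip only the next tag.
          (goto-char (match-beginning 0))
          (sgml-skip-tag-forward 1)
          (setq p4 (point))
          (save-restriction
            (narrow-to-region p3 p4)
            (goto-char (point-min))
            (while (re-search-forward "“\\([-a-z0-9]+\\)”" (point-max) t)
              (setq mStr (match-string 1))
              ;; intern-soft, unlike intern, does not create a new symbol
              ;; for every quoted word it checks.
              (when (and (> (length mStr) 2)
                         (fboundp (intern-soft mStr)))
                (replace-match (concat "<var class=\"εf\">" mStr "</var>") t t)
                (setq $i (1+ $i))))
            ;; Resume the outer search after this element.
            (goto-char (point-max)))))
      ;; Report the tagged names in an *Occur* buffer.
      (when (> $i 0)
        (occur "<var class=\"εf\">[-a-z0-9]+</var>")))))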

With this code, i can just press a button, and the whole buffer gets marked up. Or, i can select a region of text, press a button, and have just that part marked, with a report of the changes (in a second pane).

This still means i have to manually go thru my 300 existing files. The thing is, a batch script that fixes all 300 files would not be accurate. For example, many words are also other languages' keywords. There is no way for a script to really know which meaning is intended unless it has strong AI. For example, on this page: Emacs Lisp Idioms, i have this passage:

Note that, when the thing is a “symbol”, it usually means any alphanumeric sequence with dash “-” or underscore “_” characters. For example, if you are writing PHP reference lookup command, and the cursor is on p in print_r($y);, you want to grab the whole “print_r” not just “print”. The exact meaning of symbol depends on current major mode's Syntax Table.

Notice that the passage contains the word “print”. That “print” refers to PHP's function “print”, and shouldn't be marked as an elisp function.
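A script cannot tell from the word alone whether “print” means PHP's print or elisp's `print`; that takes reading the sentence. What a script can do is refuse to tag the most ambiguous names automatically, as the docstring's caveats suggest. A minimal sketch, assuming a hand-maintained skip list: `my-elisp-tag-skip-words` and `my-tag-candidate-p` are hypothetical names, and the list holds only the examples from the docstring.

```elisp
;;; -*- lexical-binding: t -*-

;; Hypothetical skip list: names that are elisp functions but are also
;; common words in other languages and tools (examples from the docstring).
(defvar my-elisp-tag-skip-words
  '("while" "print" "string" "find" "grep" "kbd")
  "Function names that should never be tagged automatically.")

;; Hypothetical predicate: non-nil if NAME should get elisp markup.
;; Same tests as the command (length > 2, `fboundp'), plus the skip list.
(defun my-tag-candidate-p (name)
  (and (> (length name) 2)
       (not (member name my-elisp-tag-skip-words))
       (fboundp (intern-soft name))))

;; (my-tag-candidate-p "sort-lines") ⇒ t
;; (my-tag-candidate-p "print")      ⇒ nil, it is in the skip list
;; (my-tag-candidate-p "eq")         ⇒ nil, too short
```

In the command, the `(when (and …))` test could call this predicate instead. The trade-off is that a skipped name is never tagged, even where it really is the elisp function, so those cases still need a manual pass.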

Still, there are many ways to help fix all 300 files. What i love about emacs is that everything can be done in an incremental, interactive way. So, i started with this command, which is basically a semi-automatic approach. With it, in 20 minutes i could fix 20 pages, and learn the complexities of the task along the way. (For example, i learned that if an elisp function name appears inside <title> or <h1> tags, it shouldn't be marked.) So, i work and modify the command as i go. By the time the command was in the good shape it is in now, i had already fixed some 100 pages. Then, if i want, i can modify this command into a batch script. (By now i've fixed basically all pages, with maybe 30 left.)
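The lesson about <title> and <h1> is why the command only searches inside <p>, <ul>, <ol>, <table>, and <figcaption> elements. That step can be tried in isolation. A minimal sketch: `my-html-element-bounds` is a hypothetical name; it relies on `sgml-skip-tag-forward` from sgml-mode, called with point on the opening tag, and like the command it assumes every tag is closed.

```elisp
;;; -*- lexical-binding: t; coding: utf-8 -*-
(require 'sgml-mode)                    ; provides `sgml-skip-tag-forward'

;; Hypothetical helper, for illustration only.
(defun my-html-element-bounds ()
  "Return (START . END) for each <p>, <ul>, <ol>, <table> or <figcaption>.
END is just after the matching closing tag.  Elements nested inside
an element already found are not listed separately."
  (let (bounds start)
    (save-excursion
      (goto-char (point-min))
      (while (re-search-forward
              "<p>\\|<ul>\\|<ol>\\|<table\\|<figcaption>" nil t)
        (setq start (match-beginning 0))
        ;; Skip from the "<" of the opening tag, so that point lands
        ;; after the matching closing tag.
        (goto-char start)
        (sgml-skip-tag-forward 1)
        (push (cons start (point)) bounds)))
    (nreverse bounds)))

;; (with-temp-buffer
;;   (insert "<h1>“car”</h1>\n<p>Call <b>now</b> “sort-lines”.</p>")
;;   (mapcar (lambda (b) (buffer-substring-no-properties (car b) (cdr b)))
;;           (my-html-element-bounds)))
;; ⇒ ("<p>Call <b>now</b> “sort-lines”.</p>")
```

The <h1> text is never part of any range, so a “car” there stays untouched, while the inline <b> inside the paragraph does not cut the range short.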

PS: If you are wondering why i use a Greek ε as the starting character in my HTML class attribute: it makes the name more distinctive, a more unique string sequence, so searching for it won't hit unrelated text. [see Programing Style: Variable Naming: English Words Considered Harmful] [see Using Unicode in HTML Attributes]
