Elisp: Syntax Color Source Code in HTML

By Xah Lee.

This page shows you how to write an Emacs Lisp command that syntax colors computer language source code in HTML.


Write a command “htmlize-pre-block”. When called, it syntax colors the computer language source code under the cursor.

For example, here's an elisp code snippet:

(if (< 3 2) (message "yes") )

Here's the raw HTML of the syntax colored version:

(<span class="keyword">if</span> (&lt; 3 2) (message <span class="string">"yes"</span>) )

Here's how it looks in a web browser:

(if (< 3 2) (message "yes") )

There is an Emacs package that transforms any colored text in emacs to HTML form. This is extremely nice. The package is named htmlize.el and is written by Hrvoje Niksic, available at http://fly.srk.fer.hr/~hniksic/emacs/htmlize.el.cgi. This package primarily gives you 3 new commands:

  1. htmlize-region. Output to a new buffer.
  2. htmlize-buffer. Output to a new buffer.
  3. htmlize-file. Takes an input file name, outputs to a new file.

Here's what I need: a command “htmlize-pre-block”. When the cursor is inside a “pre” tag like this:

<pre class="‹lang_name›">

then, after calling “htmlize-pre-block”, the source code inside the tag will be syntax colored, that is: wrapped with appropriate “span” tags on the language's keywords.


There are many ways to solve this problem. Here's one way.

  1. Grab the text inside the <pre class="lang_name">…</pre> tag the cursor is in.
  2. Create a temp buffer. Insert the text in.
  3. Set the new buffer to a major mode corresponding to lang_name, and fontify it.
  4. Alt+x htmlize-buffer.
  5. From the htmlize-buffer output, grab the (htmlized) text inside <pre> tag.
  6. Kill the htmlize output buffer and my temp buffer.
  7. Delete the original text, insert in the htmlized text.

To achieve the above, I decided on 2 steps:

  1. Write a function “htmlize-string” that takes a string and a mode name, and returns the htmlized string.
  2. Write a function “htmlize-pre-block” that grabs the text, calls “htmlize-string”, then replaces the original text with the new.


Here's the code of “htmlize-string” function:

(defun htmlize-string (sourceCodeStr langModeName)
  "Take SOURCECODESTR and return a htmlized version using LANGMODENAME.
This function requires the htmlize.el by Hrvoje Niksic."
  (require 'htmlize)
  (let (htmlizeOutputBuf p1 p2 resultStr)

    ;; put code in a temp buffer, set the mode, fontify
    (with-temp-buffer
      (insert sourceCodeStr)
      (funcall (intern langModeName))
      (font-lock-ensure) ; font-lock is not turned on automatically in temp buffers
      (setq htmlizeOutputBuf (htmlize-buffer)))

    ;; extract the fontified source code in htmlize output
    (with-current-buffer htmlizeOutputBuf
      (goto-char (point-min))
      (setq p1 (search-forward "<pre>"))
      (setq p2 (search-forward "</pre>"))
      ;; skip the newline after <pre>, and stop before </pre> (6 chars)
      (setq resultStr (buffer-substring-no-properties (+ p1 1) (- p2 6))))

    (kill-buffer htmlizeOutputBuf)
    resultStr))

The “htmlize-string” function takes a string and a mode name, and returns a htmlized string.

First it creates a temp buffer with with-temp-buffer, inserts the string, sets the major mode for the language it should be colored with, then calls htmlize-buffer to generate the htmlized text. The return value of htmlize-buffer is the buffer of its output, which we save in “htmlizeOutputBuf”.
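The step of turning a mode name string into a mode call can be tried on its own. Here is a minimal sketch using only built-in modes. The helper name “my-major-mode-after-call” is made up for illustration and is not part of htmlize.el.

```elisp
;; Hypothetical helper (not part of htmlize.el or the code above):
;; switch a temp buffer to the mode named by MODE-NAME (a string)
;; and return the resulting value of `major-mode'.
(defun my-major-mode-after-call (mode-name)
  "Return the major mode active after calling MODE-NAME in a temp buffer."
  (with-temp-buffer
    ;; `intern' turns the string into a symbol; `funcall' calls the
    ;; function bound to that symbol.
    (funcall (intern mode-name))
    major-mode))

(my-major-mode-after-call "emacs-lisp-mode") ; emacs-lisp-mode
(my-major-mode-after-call "text-mode")       ; text-mode
```

The temp buffer is discarded when with-temp-buffer exits, so the current buffer's mode is not touched.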

Now, we have a buffer object “htmlizeOutputBuf”. It contains the htmlized text. It is actually a complete HTML file like this: <html><head>…</head><body>…</body></html>. We want to grab only the part that is the htmlized source code, that is, excluding the usual HTML header and footer.

We call with-current-buffer and extract text between <pre>…</pre> tags. The first argument to with-current-buffer is a buffer object or buffer name. Then, emacs will use that buffer as the current buffer.
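The position arithmetic can be checked without htmlize installed. The sketch below runs the same searches on a hand-written string that stands in for htmlize output. It assumes a newline follows <pre>, which the (+ p1 1) in the code above suggests. The helper name “my-extract-pre-content” is made up for illustration.

```elisp
;; Hypothetical helper for illustration; HTML is any string that
;; contains one <pre>...</pre> pair with a newline right after <pre>.
(defun my-extract-pre-content (html)
  "Return the text between <pre> and </pre> in the string HTML."
  (with-temp-buffer
    (insert html)
    (goto-char (point-min))
    (let* ((p1 (search-forward "<pre>"))    ; position just after "<pre>"
           (p2 (search-forward "</pre>")))  ; position just after "</pre>"
      ;; (+ p1 1) skips the newline after "<pre>".
      ;; (- p2 6) backs up over the 6 characters of "</pre>".
      (buffer-substring-no-properties (+ p1 1) (- p2 6)))))

(my-extract-pre-content
 "<html><body>\n<pre>\n(<span class=\"keyword\">if</span> t 1)\n</pre>\n</body></html>")
;; returns "(<span class=\"keyword\">if</span> t 1)\n"
```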

Emacs's buffer related functions can often take an argument that is either a buffer name (of type “string”) or a buffer object itself (of type “buffer”).

(info "(elisp) Buffers")
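As a small illustration of the name-or-object point, the sketch below writes to the same buffer once through the buffer object and once through its name. The function name “my-buffer-name-or-object-demo” and the buffer name are made up for this example.

```elisp
;; Hypothetical demo function, for illustration only.
(defun my-buffer-name-or-object-demo ()
  "Insert text into one buffer via its object and via its name.
Return the buffer's final content."
  (let* ((buf (generate-new-buffer "my-demo")) ; a buffer object
         (name (buffer-name buf)))             ; its name, a string
    ;; Pass the buffer object.
    (with-current-buffer buf
      (insert "abc"))
    ;; Pass the buffer name; it refers to the same buffer.
    (with-current-buffer name
      (insert "def"))
    (prog1
        (with-current-buffer buf (buffer-string))
      ;; `kill-buffer' also accepts either form.
      (kill-buffer name))))

(my-buffer-name-or-object-demo) ; "abcdef"
```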

We extract text by buffer-substring-no-properties. A string in Emacs can carry information called “text properties”, which hold info such as the coloring of the text. To grab a string from a buffer, you can use buffer-substring (keeps the properties) or buffer-substring-no-properties (drops them). Most emacs functions that take a string as argument accept strings with or without properties.

[see Elisp: Text Properties]
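The difference between the two functions can be seen in a temp buffer. This is a minimal sketch. The function name “my-substring-demo” is made up for illustration.

```elisp
;; Hypothetical demo function, for illustration only.
(defun my-substring-demo ()
  "Return a list of the same buffer text with and without properties."
  (with-temp-buffer
    ;; Insert a word that carries a `face' text property.
    (insert (propertize "hello" 'face 'bold))
    (list (buffer-substring (point-min) (point-max))
          (buffer-substring-no-properties (point-min) (point-max)))))

(let* ((pair (my-substring-demo))
       (with-props (nth 0 pair))
       (no-props (nth 1 pair)))
  (list (get-text-property 0 'face with-props)  ; bold
        (get-text-property 0 'face no-props)    ; nil
        (string= with-props no-props)))         ; t, the characters are the same
```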


Here's the code of “htmlize-pre-block” function:

(defun htmlize-pre-block ()
  "Replace text enclosed by <pre> tag to htmlized code.
For example, if the cursor is somewhere between the pre tags:
 <pre class=\"lang-code\">…▮…</pre>

after calling, the text inside the pre tag will be htmlized.
That is, wrapped with many span tags.

The opening tag must be of the form <pre class=\"lang-code\">.
The “lang-code” determines what emacs mode is used to colorize the
text.

 “lang-code” can be any of {c, elisp, java, javascript, html, xml, css, …}.
 (See source code for a full list)

See also: `dehtmlize-pre-block'.

This function requires htmlize.el by Hrvoje Niksic."
  (interactive)
  (let (inputStr langCode p1 p2 modeName
        ;; map from the class value of the pre tag to a major mode name
        (langModeMap
         '(
           ("ahk" . "ahk-mode")
           ("bash" . "sh-mode")
           ("bbcode" . "xbbcode-mode")
           ("c" . "c-mode")
           ("cl" . "lisp-mode")
           ("clojure" . "clojure-mode")
           ("cmd" . "dos-mode")
           ("css" . "css-mode")
           ("elisp" . "emacs-lisp-mode")
           ("haskell" . "haskell-mode")
           ("html" . "html-mode")
           ("xml" . "sgml-mode")
           ("html6" . "html6-mode")
           ("java" . "java-mode")
           ("javascript" . "js-mode")
           ("js" . "js-mode")
           ("lsl" . "xlsl-mode")
           ("ocaml" . "tuareg-mode")
           ("org" . "org-mode")
           ("perl" . "cperl-mode")
           ("php" . "php-mode")
           ("povray" . "pov-mode")
           ("powershell" . "powershell-mode")
           ("python" . "python-mode")
           ("ruby" . "ruby-mode")
           ("scala" . "scala-mode")
           ("scheme" . "scheme-mode")
           ("vbs" . "visual-basic-mode")
           ("visualbasic" . "visual-basic-mode")
           ) ))

    (save-excursion
      (re-search-backward "<pre class=\"\\([-A-Za-z0-9]+\\)\"") ; tag begin position
      (setq langCode (match-string 1))
      (setq p1 (search-forward ">")) ; lang source code string begin
      (search-forward "</pre>")
      (setq p2 (search-backward "<")) ; lang source code string end
      (search-forward "</pre>") ; tag end position
      (setq inputStr (buffer-substring-no-properties p1 p2))

      ;; unknown lang codes fall back to text-mode
      (setq modeName
            (let ((tempVar (assoc langCode langModeMap) ))
              (if tempVar (cdr tempVar) "text-mode" ) ) )

      (delete-region p1 p2)
      (goto-char p1)
      (insert (htmlize-string inputStr modeName)) ) ) )

The function first sets up a map of langCode to major mode name, like this:

  ("ahk" . "ahk-mode")
  ("bash" . "sh-mode")
  ("bbcode" . "xbbcode-mode")
  ("c" . "c-mode")
  ("cl" . "lisp-mode")
  ("clojure" . "clojure-mode")
  ("cmd" . "dos-mode")

This is called an association list (alist), sometimes also known as a keyed list or dictionary. To get an item, you can use assoc. See: (info "(elisp) Association Lists").
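The lookup with a fallback can be tried in isolation. This is a minimal sketch with a shortened copy of the map. The names “my-lang-mode-map” and “my-lang-to-mode” are made up for illustration.

```elisp
;; A shortened copy of the map, for illustration only.
(defvar my-lang-mode-map
  '(("c" . "c-mode")
    ("elisp" . "emacs-lisp-mode")
    ("js" . "js-mode"))
  "Sample alist from language code to major mode name.")

;; Hypothetical helper, for illustration only.
(defun my-lang-to-mode (lang-code)
  "Return the mode name for LANG-CODE, or \"text-mode\" if unknown."
  ;; `assoc' compares keys with `equal', so string keys work.
  ;; It returns the whole pair, for example ("c" . "c-mode"), or nil.
  (let ((pair (assoc lang-code my-lang-mode-map)))
    (if pair (cdr pair) "text-mode")))

(my-lang-to-mode "elisp")   ; "emacs-lisp-mode"
(my-lang-to-mode "cobol")   ; "text-mode"
```

Note that the comparison is case sensitive, so a key such as "JS" would not match "js" here.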

Then, it grabs the text inside the <pre> block in the current buffer: it uses search functions to find the beginning and end of the “pre” block, stores those positions in p1 and p2, then uses buffer-substring-no-properties to grab the text.

In the above, the langCode is also set from the regex match in re-search-backward.
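The same sequence of searches can be run on a string in a temp buffer, which makes it easy to see what each position ends up being. This is a sketch under the assumption that the cursor sits inside a well-formed pre block. The helper name “my-pre-block-at” and its NEEDLE argument (used to simulate the cursor position) are made up for illustration.

```elisp
;; Hypothetical helper for illustration. TEXT is HTML source, and
;; NEEDLE is a string inside the pre block where the cursor is put.
(defun my-pre-block-at (text needle)
  "Return (LANG-CODE . CONTENT) of the pre block in TEXT containing NEEDLE."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (search-forward needle)             ; simulate the cursor position
    (let (lang-code p1 p2)
      ;; Find the opening tag; group 1 captures the class value.
      (re-search-backward "<pre class=\"\\([-A-Za-z0-9]+\\)\"")
      (setq lang-code (match-string 1))
      (setq p1 (search-forward ">"))    ; just after the opening tag
      (search-forward "</pre>")
      (setq p2 (search-backward "<"))   ; start of the closing tag
      (cons lang-code (buffer-substring-no-properties p1 p2)))))

(my-pre-block-at
 "<p>intro</p>\n<pre class=\"elisp\">(message \"hi\")</pre>\n<p>end</p>"
 "message")
;; returns ("elisp" . "(message \"hi\")")
```

The language code is read with match-string right after re-search-backward, before any other search overwrites the match data.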

Then, we get the major mode name for that langCode, by:

(setq modeName
      (let ((tempVar (assoc langCode langModeMap) ))
        (if tempVar (cdr tempVar) "text-mode" ) ) )

Once we know which major mode to use, we call “htmlize-string” to get the htmlized text, delete the original text, and insert the new text in its place, like this:

(delete-region p1 p2)
(goto-char p1)
(insert (htmlize-string inputStr modeName))


Setting Up htmlize.el and CSS

Note: quote from htmlize.el's header documentation:

htmlize supports three types of HTML output, selected by setting “htmlize-output-type”: “css”, “inline-css”, and “font”. … “css” mode is the default.

My functions “htmlize-pre-block” and “htmlize-string” assume you are using the “css” output type. This means you'll have to do a one-time manual process of taking the CSS code from the htmlized output and placing it in a stylesheet that your own HTML page references. You can use my CSS code here: elisp_htmlize_css_code.css.

If your HTML is in Unicode UTF-8 encoding, you might add the following to your emacs init file:

(setq htmlize-convert-nonascii-to-entities nil)
(setq htmlize-html-charset "utf-8")

They prevent htmlize from creating ugly HTML entities. For example, if you have a bullet char “•” (Unicode U+2022), you will see the character as is instead of a numeric entity such as &#8226;.


Dehtmlize Text

The raw HTML of htmlized language code is usually unreadable. For example, here's 2 lines of OCaml language code:

let myComposition f g = (fun x -> f (g x) );;
myComposition (fun x -> x ^ "c") (fun x -> x ^ "b") "a";;

Here's its htmlized version:

<span class="tuareg-font-lock-governing">let</span> <span class="function-name">myComposition</span><span class="variable-name"> f g </span><span class="tuareg-font-lock-operator">=</span> <span class="tuareg-font-lock-operator">(</span><span class="keyword">fun</span> <span class="variable-name">x </span><span class="tuareg-font-lock-operator">-&gt;</span> f <span class="tuareg-font-lock-operator">(</span>g x<span class="tuareg-font-lock-operator">)</span> <span class="tuareg-font-lock-operator">);;</span>
myComposition <span class="tuareg-font-lock-operator">(</span><span class="keyword">fun</span> <span class="variable-name">x </span><span class="tuareg-font-lock-operator">-&gt;</span> x <span class="tuareg-font-lock-operator">^</span> <span class="string">"c"</span><span class="tuareg-font-lock-operator">)</span> <span class="tuareg-font-lock-operator">(</span><span class="keyword">fun</span> <span class="variable-name">x </span><span class="tuareg-font-lock-operator">-&gt;</span> x <span class="tuareg-font-lock-operator">^</span> <span class="string">"b"</span><span class="tuareg-font-lock-operator">)</span> <span class="string">"a"</span><span class="tuareg-font-lock-operator">;;</span>

Suppose you want to modify the OCaml code in your blog. Usually, you switch to the browser, copy the code, switch back to emacs, create a new buffer, and paste the code to edit it. When done, you copy it, close the temp buffer, delete the htmlized version in your blog's HTML, paste the new code in, then htmlize it again. This process is painful.

It would be nice if you could press a button and have the htmlized source code in your HTML become plain text, so you can modify it, then press a button again to have it htmlized again.

Here's the code of “dehtmlize-pre-block”:

(defun dehtmlize-pre-block (p1 p2)
  "Delete span tags between pre tags.
For example, if the cursor is somewhere between the tags:
<pre class=\"…\">…▮…</pre>

after calling, all span tags inside the block will be removed.
If there's a text selection, dehtmlize that region.

Note: only span tags of the form 「<span class=\"…\">…</span>」 are deleted.

This command does the reverse of `htmlize-pre-block'."
  (interactive
   (if (use-region-p)
       (list (region-beginning) (region-end))
     (save-excursion
       (let (p3 p4)
         (re-search-backward "<pre class=\"\\([-A-Za-z0-9]+\\)\"")
         (setq p3 (re-search-forward ">")) ; code begin position
         (re-search-forward "</pre>")
         (setq p4 (- (point) 6)) ; code end position
         (list p3 p4 )) ) ) )
  (dehtmlize-span-region p1 p2) )

(defun dehtmlize-span-region (p1 p2)
  "Delete HTML “span” tags in region.
Note: only span tags of the form 「<span class=\"…\">…</span>」 are deleted."
  (interactive "r")
  ;; replace-regexp-pairs-region and replace-pairs-region are not
  ;; built into Emacs; they must be loaded separately.
  (save-excursion
    (save-restriction
      (narrow-to-region p1 p2)
      (replace-regexp-pairs-region (point-min) (point-max) '(["<span class=\"[^\"]+\">" ""]))
      (replace-pairs-region (point-min) (point-max) '( ["</span>" ""] ["&amp;" "&"] ["&lt;" "<"] ["&gt;" ">"] ) ) ) ) )
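The two replace functions called above, replace-regexp-pairs-region and replace-pairs-region, are not built into Emacs. For readers who don't have them, here is a rough sketch of the same idea using only built-in search and replace functions. It works on a string instead of a region, and the name “my-dehtmlize-string” is made up for illustration. It is not the author's implementation.

```elisp
;; Hypothetical helper for illustration; it uses only built-in
;; functions and works on a string instead of a buffer region.
(defun my-dehtmlize-string (html)
  "Return HTML with span tags that have a class removed, and entities decoded."
  (with-temp-buffer
    (insert html)
    ;; Remove opening and closing span tags.
    (dolist (regex '("<span class=\"[^\"]+\">" "</span>"))
      (goto-char (point-min))
      (while (re-search-forward regex nil t)
        (replace-match "" t t)))
    ;; Decode entities. &amp; goes last, so that text such as
    ;; "&amp;lt;" becomes "&lt;" and is not decoded a second time.
    (dolist (pair '(("&lt;" . "<") ("&gt;" . ">") ("&amp;" . "&")))
      (goto-char (point-min))
      (while (search-forward (car pair) nil t)
        (replace-match (cdr pair) t t)))
    (buffer-string)))

(my-dehtmlize-string
 "(<span class=\"keyword\">if</span> (&lt; 3 2) (message <span class=\"string\">\"yes\"</span>) )")
;; returns "(if (< 3 2) (message \"yes\") )"
```

Because the replacements here run one after another, the order of the entity pairs matters, which is why &amp; is handled last.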

Set the Code to a File

If you have a pre block:

<pre class="python">

Wouldn't it be nice if, by pressing a button, the plain source code were moved into a temp file 〔xx-temp-‹randomstr›.py〕 shown in a split window?
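One possible starting point for the file part is sketched below. It only writes a string to a temp file and returns the file name. The function name “my-write-code-to-temp-file” and the "xx-temp-" prefix are assumptions based on the file name pattern above, not the author's code.

```elisp
;; Hypothetical helper for illustration; the function name and the
;; "xx-temp-" prefix are assumptions, not the author's implementation.
(defun my-write-code-to-temp-file (code suffix)
  "Write string CODE to a new temp file whose name ends in SUFFIX.
Return the file name."
  ;; `make-temp-file' creates the file in `temporary-file-directory'
  ;; and puts a random part between the prefix and SUFFIX.
  (let ((file (make-temp-file "xx-temp-" nil suffix)))
    (with-temp-file file
      (insert code))
    file))

;; Example: open the file in another window for editing.
;; (find-file-other-window (my-write-code-to-temp-file "print(3)\n" ".py"))
```

A full command would also need to grab the text of the pre block first, and later copy the edited file content back into the HTML.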

For the latest version of this code, see Emacs: Xah HTML Mode.

JavaScript Solution

Google has an open source technology that uses JavaScript to color code in HTML on the fly, instead of using the bulky markup. For details, see: Syntax Coloring with Google-Code-Prettify.
