Elisp: Text Processing, Transforming Page Tag

By Xah Lee.

This page shows an example of using Emacs Lisp for text processing: updating the navigation bar of many HTML pages.

Problem

You have hundreds of HTML pages that have a nav bar like this:

<div class="pages">Goto Page:
<a href="1.html">1</a>,
<a href="2.html">2</a>,
<a href="3.html">3</a>,
…
</div>

It looks like this in browser (with CSS):

[image: page tag 1]

This is the page navigation bar. Note that the page contains a link to itself.

You want to remove the self-link. The result should look like this:

<div class="pages">Goto Page:
1,
<a href="2.html">2</a>,
<a href="3.html">3</a>,
…
</div>

[image: page tag 2]

Solution

Here are the steps we need to do for each file:

  1. open the file.
  2. move cursor to the beginning of the page navigation string.
  3. move cursor to the file's own name inside the navigation bar (the self-link).
  4. call sgml-delete-tag to remove the anchor tag. (sgml-delete-tag is from html-mode)
  5. save file.
  6. close buffer.

We begin by writing code that processes a single file, so we can test it.

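(require 'sgml-mode) ; defines html-mode and sgml-delete-tag

(defun my-process-file-navbar (fPath)
  "Remove the self-link in the page navigation bar of the HTML file at FPATH."
  (let* ((fName (file-name-nondirectory fPath))
         (myBuffer (find-file fPath)))
    (widen) ; in case buffer already open, and narrow-to-region is in effect
    (goto-char (point-min))
    ;; move to the nav bar, then to the link that points to this file.
    ;; search-forward signals an error if the text is not found
    (search-forward "<div class=\"pages\">Goto Page:")
    (search-forward fName)
    ;; point is now inside the opening anchor tag.
    ;; delete that tag and its matching closing tag
    (sgml-delete-tag 1)
    (save-buffer)
    (kill-buffer myBuffer)))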

(my-process-file-navbar "~/test1.html")

To test this code, create files {test1.html, test2.html, test3.html} in a temp directory (the call above assumes test1.html is in your home directory; adjust the path as needed). Place the following content into each file:

<div class="pages">Goto Page: <a href="test1.html">XYZ Overview</a>, <a href="test2.html">Second Page</a>, <a href="test3.html">Summary Z</a></div>

(Note that the link text need not be 1, 2, 3. That is why the code searches for the file name, not the link text.)
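
The test files can also be created from elisp. Here is a minimal sketch. The function name my-make-navbar-test-files is made up for illustration and is not part of the original code; it writes the sample content above into three files in a directory you choose.

```elisp
(defun my-make-navbar-test-files (dir)
  "Create test1.html, test2.html, test3.html in DIR with the sample nav bar.
Return the list of file paths created.
This helper is hypothetical, for illustration only."
  (let ((content (concat
                  "<div class=\"pages\">Goto Page: "
                  "<a href=\"test1.html\">XYZ Overview</a>, "
                  "<a href=\"test2.html\">Second Page</a>, "
                  "<a href=\"test3.html\">Summary Z</a></div>\n")))
    (mapcar (lambda (name)
              (let ((path (expand-file-name name dir)))
                ;; when the first argument is a string,
                ;; write-region writes that string to the file
                (write-region content nil path nil 'silent)
                path))
            '("test1.html" "test2.html" "test3.html"))))
```

For example, (my-make-navbar-test-files temporary-file-directory) writes the three files into the system temp directory and returns their paths. You can then call my-process-file-navbar on one of them.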

The code of my-process-file-navbar above is very basic.

sgml-delete-tag is defined in sgml-mode, the package that provides html-mode (which is automatically loaded when an HTML file is opened).

sgml-delete-tag deletes the tag the cursor is on, together with its matching opening or closing tag.
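
To see what sgml-delete-tag does without touching any file, you can try it in a temp buffer. This is a minimal sketch. The function name my-demo-delete-tag is made up for illustration. It positions the cursor the same way my-process-file-navbar does (just after the file name, inside the opening anchor tag) and returns the resulting text.

```elisp
(require 'sgml-mode) ; defines html-mode and sgml-delete-tag

(defun my-demo-delete-tag (html needle)
  "Return HTML after deleting the tag pair around the first NEEDLE.
Point is placed just after NEEDLE before `sgml-delete-tag' is called.
This helper is hypothetical, for illustration only."
  (with-temp-buffer
    (insert html)
    (html-mode) ; sgml-delete-tag is meant to run in an sgml/html-mode buffer
    (goto-char (point-min))
    (search-forward needle)
    (sgml-delete-tag 1)
    (buffer-string)))

(my-demo-delete-tag
 "<a href=\"test1.html\">XYZ Overview</a>, <a href=\"test2.html\">Second Page</a>"
 "test1.html")
```

The call should return the text with the first anchor tag pair removed, the link text “XYZ Overview” kept, and the second link untouched.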

All we need to do now is to feed my-process-file-navbar a bunch of file paths.

To get the list of files that contain the page-nav tag, we can use Linux's “find” and “grep”, like this:

find . -name "*.html" -exec grep -l '<div class="pages">' {} \;

From the output, we can use string-rectangle and query-replace to construct the following code:

(mapc 'my-process-file-navbar
      [
       "~/web/cat1.html"
       "~/web/dog.html"
       "~/web/something.html"
       "~/web/xyz.html"
       ]
      )

mapc is a Lisp function for looping through a list or vector. Its first argument is a function, which is applied to every element of the list or vector. The single quote in front of the function name is necessary. It prevents the symbol “my-process-file-navbar” from being evaluated as a variable.
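
As an alternative to “find”, “grep”, and building the list by hand, the file list can be computed in elisp. This is a minimal sketch, assuming Emacs 25.1 or later (for directory-files-recursively). The function name my-files-with-navbar is made up for illustration.

```elisp
(defun my-files-with-navbar (dir)
  "Return a list of .html files under DIR that contain the page nav bar.
This helper is hypothetical, for illustration only."
  (let (result)
    ;; directory-files-recursively is available in Emacs 25.1 and later
    (dolist (f (directory-files-recursively dir "\\.html\\'"))
      (with-temp-buffer
        (insert-file-contents f)
        ;; third argument t: return nil instead of signaling an error
        ;; when the string is not found
        (when (search-forward "<div class=\"pages\">" nil t)
          (push f result))))
    (nreverse result)))
```

With this, the whole job becomes (mapc 'my-process-file-navbar (my-files-with-navbar "~/web/")). Reading each file into a temp buffer with insert-file-contents avoids visiting the file, so no major mode is loaded during the search.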

