Elisp: Text Processing, Transforming Page Tag

By Xah Lee.

This page shows an example of using emacs lisp for text processing. The code updates the navigation bar of HTML pages.

Problem

You have hundreds of HTML pages that have a nav bar like this:

<div class="pages">Goto Page:
<a href="1.html">1</a>,
<a href="2.html">2</a>,
<a href="3.html">3</a>,
…
</div>

It looks like this in a browser (with CSS):

[Image: page tag 1]

This is the page navigation bar. Note that the page contains a link to itself.

You want to remove the self-link. The result should look like this:

<div class="pages">Goto Page:
1,
<a href="2.html">2</a>,
<a href="3.html">3</a>,
…
</div>

[Image: page tag 2]

Solution

Here are the steps we need to do for each file:

  1. Open the file.
  2. Move cursor to the beginning of the page navigation string.
  3. Move cursor to the file name (inside the self-link's href).
  4. Call sgml-delete-tag to remove the anchor tag. (sgml-delete-tag is defined in sgml-mode.el, the library that provides html-mode)
  5. Save the file.
  6. Close the buffer.

We begin by writing test code that processes a single file.

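(require 'sgml-mode) ; provides sgml-delete-tag, in case html-mode is not loaded yet

(defun my-process-file-navbar (fPath)
  "Remove the self-link from the page navigation bar of the HTML file at FPATH."
  (let (fName myBuffer)
    (setq fName (file-name-nondirectory fPath))
    (setq myBuffer (find-file fPath))
    (widen) ; in case buffer already open, and narrow-to-region is in effect
    (goto-char (point-min))
    ;; move to the start of the nav bar, then to the file name in the self-link
    (search-forward "<div class=\"pages\">Goto Page:")
    (search-forward fName)
    ;; cursor is now inside the <a …> tag; delete it and its matching </a>
    (sgml-delete-tag 1)
    (save-buffer)
    (kill-buffer myBuffer)))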

(my-process-file-navbar "~/test1.html")

To test this code, create the files test1.html, test2.html, and test3.html in a temporary directory, and adjust the path in the call above to point at test1.html there. Place the following content into each file:

<div class="pages">Goto Page: <a href="test1.html">XYZ Overview</a>, <a href="test2.html">Second Page</a>, <a href="test3.html">Summary Z</a></div>

(Note that the link text is not necessarily 1, 2, 3.)
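The test files can also be created from elisp instead of by hand. The following is a minimal sketch under stated assumptions: the helper name my-make-navbar-test-files is hypothetical (it is not part of the original tutorial), and the content written is the sample nav bar shown above.

```elisp
;; Hypothetical helper for setting up the test files; not from the tutorial.
(defun my-make-navbar-test-files (dir)
  "Create test1.html, test2.html and test3.html in DIR with a sample nav bar.
Return the list of file paths created."
  (let ((content (concat
                  "<div class=\"pages\">Goto Page: "
                  "<a href=\"test1.html\">XYZ Overview</a>, "
                  "<a href=\"test2.html\">Second Page</a>, "
                  "<a href=\"test3.html\">Summary Z</a></div>\n")))
    (mapcar (lambda (name)
              (let ((path (expand-file-name name dir)))
                ;; `with-temp-file' writes the temp buffer's content to PATH.
                (with-temp-file path
                  (insert content))
                path))
            '("test1.html" "test2.html" "test3.html"))))

;; Example: create the files in a fresh temporary directory.
;; (my-make-navbar-test-files (make-temp-file "navbar-test-" t))
```

Passing t as the second argument of make-temp-file makes it create a directory rather than a file, so the test pages do not clutter your home directory.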

The code of my-process-file-navbar is very basic. It uses these functions:

find-file
Open file. (info "(elisp) Files")
search-forward
Move cursor to the end of the next occurrence of a string. (info "(elisp) Searching and Matching")
kill-buffer
Close buffer. (info "(elisp) Buffers")

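sgml-delete-tag is defined in sgml-mode.el, the library that provides html-mode (which is automatically loaded when an HTML file is opened).

sgml-delete-tag deletes the tag the cursor is on or before, together with its matching closing or opening tag. The text between the two tags is kept.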
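To see what sgml-delete-tag does without touching any file, the same search-and-delete steps can be tried on a string in a temporary buffer. This is only a sketch for experimenting: the function name my-demo-delete-self-link is hypothetical, and it assumes the nav bar has the same "Goto Page:" form as in the examples on this page.

```elisp
(require 'sgml-mode) ; defines `html-mode' and `sgml-delete-tag'

;; Hypothetical helper for experimenting; not part of the tutorial's code.
(defun my-demo-delete-self-link (html file-name)
  "Return HTML with the anchor tag that links to FILE-NAME removed.
The link text is kept.  Works in a temporary buffer; no file is modified."
  (with-temp-buffer
    (insert html)
    (html-mode)                  ; set up the syntax table for tags
    (goto-char (point-min))
    (search-forward "<div class=\"pages\">Goto Page:")
    (search-forward file-name)   ; point is now inside the <a ...> tag
    (sgml-delete-tag 1)          ; remove <a ...> and its matching </a>
    (buffer-string)))

;; Example call, using the sample content from above:
;; (my-demo-delete-self-link
;;  "<div class=\"pages\">Goto Page: <a href=\"test1.html\">XYZ Overview</a>, <a href=\"test2.html\">Second Page</a></div>"
;;  "test1.html")
```

The returned string should no longer contain the anchor for test1.html, while the link text "XYZ Overview" and the other links remain.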

All we need to do now is to feed my-process-file-navbar a bunch of file paths.

To get the list of files that contain the page-nav tag, we can simply use the Unix “find” and “grep” commands, like this:

find . -name "*.html" -exec grep -l '<div class="pages">' {} \;

From the output, we can use string-rectangle and query-replace to turn the list of file names into the following code:

(mapc 'my-process-file-navbar
      [
       "~/web/cat1.html"
       "~/web/dog.html"
       "~/web/something.html"
       "~/web/xyz.html"
       ]
      )

mapc is a lisp idiom for looping through a list or vector. Its first argument is a function, which is applied to every element of the sequence in turn. The single quote in front of the function name is necessary: it prevents the symbol “my-process-file-navbar” from being evaluated as a variable.
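The file list can also be collected in elisp itself, as an alternative to the find and grep command. The following is a minimal sketch, not the method used above: the function name my-navbar-files is hypothetical, and directory-files-recursively requires Emacs 25 or later.

```elisp
;; Hypothetical helper; an elisp alternative to the find/grep command.
(defun my-navbar-files (dir)
  "Return a list of .html files under DIR that contain the page nav bar."
  (let (result)
    (dolist (path (directory-files-recursively dir "\\.html\\'"))
      (with-temp-buffer
        ;; Read the file without visiting it, so no major mode is set up.
        (insert-file-contents path)
        (when (search-forward "<div class=\"pages\">" nil t)
          (push path result))))
    (nreverse result)))

;; Usage: mapc accepts a list just as well as a vector.
;; (mapc 'my-process-file-navbar (my-navbar-files "~/web/"))
```

Reading each file into a temporary buffer with insert-file-contents is typically faster than find-file for a check like this, because no major mode or hooks run. The files that match are then opened normally by my-process-file-navbar.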
