Emacs Lisp Batch Text Processing: Grep Find Replace Variations

By Xah Lee.

This page shows Emacs Lisp scripts that do variations of find/replace on strings. For example, i need a script that reports the position of a given string in each of 5 thousand files. Another example: replace the <title> tag text of every HTML page with its <h1> tag text.

Problem: Report String Position

I need to know if a particular string occurs near the beginning of a file or near the end. I need to know this for about 5k files in a directory.

Solution

;; -*- coding: utf-8 -*-
;; 2011-03-21
;; report the line number of each occurrence of a string, for all files in a given dir

(setq inputDir "~/web/ergoemacs_org/emacs/" )

;; add a ending slash if not there
(when (not (string= "/" (substring inputDir -1)))
  (setq inputDir (concat inputDir "/")))

(defun my-process-file (fPath)
  "process the file at fullpath fPath …"
  (let (myBuffer searchStr)

    (when (not (string-match "/xx" fPath)) ; skip dir starting with xx

      (setq myBuffer (get-buffer-create " myTemp"))
      (set-buffer myBuffer)
      (insert-file-contents fPath nil nil nil t)

      (setq case-fold-search nil) ; NOTE: remember to set case sensitivity here

      (setq searchStr "<style>" )

      (goto-char 1)
      (while (search-forward searchStr nil t) ; NOTE: for regex, use re-search-forward
        (princ (format "line %d: %s\n" (line-number-at-pos (point)) fPath)))

      (kill-buffer myBuffer))))

(require 'find-lisp)

(let (outputBuffer)
  (setq outputBuffer "*xah occur output*" )
  (with-output-to-temp-buffer outputBuffer
    (mapc 'my-process-file (find-lisp-find-files inputDir "\\.html$"))
    (princ "Done deal!")))

Modify the “inputDir” and “searchStr” above and test it on your own machine.

For explanation of this code, see: How to Write grep in Emacs Lisp.
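
The script above prints line numbers. To answer the original question more directly (is the string near the beginning or near the end?), one variation is to report the match position as a percentage of the file's length. Here is a minimal sketch under that assumption. The names “my-string-position-percent” and “my-report-string-position” are made up for this example and are not part of the script above.

```elisp
(require 'find-lisp)

;; Hypothetical helper, not from the original script.
(defun my-string-position-percent (file-path search-str)
  "Return how far into FILE-PATH the first SEARCH-STR starts, as a float 0 to 100.
Return nil if SEARCH-STR does not occur. The search is case sensitive."
  (with-temp-buffer
    (insert-file-contents file-path)
    (let ((case-fold-search nil))
      (goto-char (point-min))
      (when (search-forward search-str nil t)
        ;; buffer positions start at 1, so subtract 1 to get a char offset
        (/ (* 100.0 (1- (match-beginning 0)))
           (max 1 (buffer-size)))))))

;; Hypothetical driver: one output line per file that contains the string.
(defun my-report-string-position (dir search-str)
  "Print the position percentage of SEARCH-STR for each HTML file under DIR."
  (dolist (f (find-lisp-find-files dir "\\.html$"))
    (let ((pct (my-string-position-percent f search-str)))
      (when pct
        (princ (format "%5.1f%% %s\n" pct f))))))
```

A small percentage means the string is near the top of the file, a large one means it is near the end. To try it, call (my-report-string-position inputDir "<style>") inside the same “with-output-to-temp-buffer” form used above.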

Problem 2: Fix HTML “TITLE” and “H1” Tags

Today, while i was working on my website, i noticed that some HTML files are missing an “H1” header tag. And in another directory, i wish to replace every “TITLE” tag's content with the text from the “H1” tag.

So, i need a script that fixes the text of these tags.

Solution

Here's a function that gets a file's “title” tag text.

(defun xah-html-get-html-file-title (fname)
  "Return FNAME <title> tag's text.
Assumes that the file contains the string
“<title>…</title>”."
  (with-temp-buffer
    (insert-file-contents fname nil nil nil t)
    (goto-char 1)
    (buffer-substring-no-properties
     (search-forward "<title>") (- (search-forward "</title>") 8))))

I also need to get the “H1” tag text. So i quickly did some copy-paste coding:

(defun xah-html-get-html-file-h1 (fname)
  "Return FNAME <h1> tag's text.
Assumes that the file contains the string
“<h1>…</h1>”."
  (with-temp-buffer
    (insert-file-contents fname nil nil nil t)
    (goto-char 1)
    (buffer-substring-no-properties
     (search-forward "<h1>") (- (search-forward "</h1>") 5))))

It's not efficient to open the file twice to get the “title” and “h1” texts, but that's ok, because the whole script finishes running in a few seconds anyway and this is just one-time use.
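
If opening each file twice ever becomes a concern, both texts can be pulled from a single read of the file. The following is a sketch only, not the code used in the script below. The names “my-html-get-tag-text” and “my-html-get-title-and-h1” are hypothetical. Unlike the two functions above, it returns nil for a missing tag instead of signaling an error.

```elisp
;; Hypothetical helper, not from the original article.
(defun my-html-get-tag-text (tag-name)
  "Return the text between <TAG-NAME> and </TAG-NAME> in the current buffer.
Search from the beginning of the buffer. Return nil if either tag is missing.
Only a bare opening tag with no attributes is matched."
  (save-excursion
    (goto-char (point-min))
    (let ((open-tag (concat "<" tag-name ">"))
          (close-tag (concat "</" tag-name ">")))
      (when (search-forward open-tag nil t)
        (let ((start (point)))
          (when (search-forward close-tag nil t)
            ;; match-beginning 0 is where the closing tag starts
            (buffer-substring-no-properties start (match-beginning 0))))))))

;; Hypothetical wrapper: read the file once, extract both texts.
(defun my-html-get-title-and-h1 (fname)
  "Return a cons (TITLE . H1) of tag texts from file FNAME, reading it once.
Either element is nil when its tag is not found."
  (with-temp-buffer
    (insert-file-contents fname)
    (cons (my-html-get-tag-text "title")
          (my-html-get-tag-text "h1"))))
```

The result can be taken apart with “car” for the title text and “cdr” for the h1 text.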

Now, here's the code i wrote quickly to fix the tags:

;; -*- coding: utf-8 -*-
;; 2011-03-20
;; change title to h1 tag's text in “Time Machine” pages
;; 
;; for each HTML page in 〔~/web/xahlee_org/p/time_machine/〕
;; if the title tag and h1 tag text differ, make the title use h1's text

(setq inputDir "~/web/xahlee_org/p/time_machine/" ) ; dir must end with a slash

(defun my-process-file (fPath)
  "process the file at fullpath fPath …"
  (let (titleText h1Text p1 p2)

    ;; use the two functions defined above
    (setq h1Text (xah-html-get-html-file-h1 fPath))
    (setq titleText (xah-html-get-html-file-title fPath))

    ;; only touch files whose title and h1 text differ
    (unless (equal h1Text titleText)
      (find-file fPath)
      (goto-char 1)
      (search-forward "<title>")
      (setq p1 (point))

      (search-forward "</title>")
      (backward-char 8) ; move back over “</title>”
      (setq p2 (point))

      (delete-region p1 p2)
      (insert h1Text) ; buffer is left unsaved for manual checking
      (print fPath))))

(require 'find-lisp)

(let (outputBuffer)
  (setq outputBuffer "*process time machine output*" )
  (with-output-to-temp-buffer outputBuffer
    (mapc 'my-process-file (find-lisp-find-files inputDir "\\.html$"))
    (princ "Done deal!")))

Again, all the scripts above are variations of find/replace. For code details, see: How to Write grep in Emacs Lisp and Elisp: Find String Inside HTML Tag.

In this script, i didn't include code to save the changed file. This way, i can do some manual verification after the script has run. When i want them all saved, i just call ibuffer and type 3 keys 【* u S】 to have all of them saved, and 【D y】 closes them all.

[see Emacs: List Buffers]
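
The save step can also be done from Lisp instead of ibuffer. Here is a rough sketch, assuming you want to save every modified buffer that visits a file under a given directory. The name “my-save-modified-buffers-under” is hypothetical. It saves without asking, so run it only after the manual check.

```elisp
;; Hypothetical helper, not from the original article.
(defun my-save-modified-buffers-under (dir)
  "Save every modified file-visiting buffer whose file is under DIR.
Return the number of buffers saved."
  (let ((root (file-name-as-directory (expand-file-name dir)))
        (count 0))
    (dolist (buf (buffer-list))
      (let ((file (buffer-file-name buf)))
        (when (and file
                   (buffer-modified-p buf)
                   ;; compare expanded paths so “~/” forms also match
                   (string-prefix-p root (expand-file-name file)))
          (with-current-buffer buf
            (save-buffer))
          (setq count (1+ count)))))
    count))
```

For the script above, the call would be (my-save-modified-buffers-under inputDir). The buffers stay open afterward, so they can still be closed from ibuffer.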
