
Emacs Lisp Batch Text processing: Grep Find Replace Variations

Xah Lee

This page shows emacs lisp scripts that do variations of grep/find/replace on strings. For example, i need a script that reports the position of a given string in each of 5 thousand files. Another example: replace each HTML page's “TITLE” tag text with its “H1” tag text. If you don't know elisp, first take a look at Emacs Lisp Basics.

Problem: Report String Position

I need to know if a particular string occurs near the beginning of a file or near the end. I need to know this for about 5k files in a dir.

Solution

;; -*- coding: utf-8 -*-
;; 2011-03-21
;; report the position (line number) of each occurrence of a string, for all files in a given dir

(setq inputDir "~/web/xahlee_org/" )

;; add an ending slash if not there
(setq inputDir (file-name-as-directory inputDir))

(defun my-process-file (fPath)
  "process the file at fullpath fPath …"
  (let (myBuffer searchStr)

    (when (not (string-match "/xx" fPath))

      (setq myBuffer (get-buffer-create " myTemp"))
      (set-buffer myBuffer)
      (insert-file-contents fPath nil nil nil t)

      (setq case-fold-search nil) ; NOTE: remember to set case sensitivity here

      (setq searchStr "<div class=\"amz728x90\">" )

      (goto-char 1)
      (while (search-forward searchStr nil t) ; NOTE: for regex, use re-search-forward
        ;; print the line number of this occurrence, then the file path
        (princ (format "line %d: %s\n" (line-number-at-pos (point)) fPath))
        )
      
      (kill-buffer myBuffer)
      )
    ))

(require 'find-lisp)

(let (outputBuffer)
  (setq outputBuffer "*xah occur output*" )
  (with-output-to-temp-buffer outputBuffer
    (mapc 'my-process-file (find-lisp-find-files inputDir "\\.html$"))
    (princ "Done deal!")
    )
  )

You can modify the “inputDir” and “searchStr” above and test it on your own machine.

For explanation of this code, see: How to Write grep in Emacs Lisp.
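
If what you want is “near the beginning or near the end” rather than a line number, one option is to report where the match starts as a percentage of the text's length. Below is a minimal sketch of that idea, not part of the script above. The name “my-string-position-percent” is made up for illustration, and it takes a string instead of a file path so it is easy to try out; inside a script like the one above, the same arithmetic could be applied to the buffer holding the file.

```emacs-lisp
;; Hypothetical helper, not part of the original script.
(defun my-string-position-percent (text searchStr)
  "Return where SEARCHSTR first occurs in TEXT, as a percentage from 0 to 100.
0 means the match starts at the very beginning of TEXT.
Return nil if SEARCHSTR does not occur.  The search is case sensitive."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let ((case-fold-search nil)) ; let-bound, so the setting does not leak out
      (when (search-forward searchStr nil t)
        ;; buffer positions start at 1, so subtract 1 to get an offset
        (/ (* 100 (1- (match-beginning 0)))
           (max 1 (buffer-size)))))))

(my-string-position-percent "0123456789" "5") ; returns 50
```

A small number means the string sits near the top of the file, and a number close to 100 means it sits near the end.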

Problem 2: Fix HTML “TITLE” & “H1” Tags

Today, while i was working on my website, i noticed that some HTML files are missing a “H1” header tag. And in another directory, i want to replace each file's “TITLE” tag content with the text from its “H1” tag.

So, i need a script that fixes these tags' text.

Solution

Here's a function that gets a file's “title” tag text. I wrote this about a year ago.

(defun get-html-file-title (fName)
"Return FNAME <title> tag's text.
Assumes that the file contains the string
“<title>…</title>”."
 (let (x1 x2)

   (with-temp-buffer
     (insert-file-contents fName nil nil nil t)
     (goto-char 1) ; start searching from the beginning of the file

     (setq x1 (search-forward "<title>")) ; position right after <title>
     (search-forward "</title>")
     (setq x2 (search-backward "<")) ; position right before </title>
     (buffer-substring-no-properties x1 x2)
     )
   ))

I also need to get the “H1” tag text. So i just quickly did some copy-paste coding:

(defun get-html-file-h1-text (fName)
  "Return FNAME <h1> tag's text.
Assumes that the file contains the string
“<h1>…</h1>”."
  (let (x1 x2)

    (with-temp-buffer
      (insert-file-contents fName nil nil nil t)
      (goto-char 1) ; start searching from the beginning of the file

      (setq x1 (search-forward "<h1>")) ; position right after <h1>
      (search-forward "</h1>")
      (setq x2 (search-backward "<")) ; position right before </h1>
      (buffer-substring-no-properties x1 x2)
      )
    ))

It's not efficient to open each file twice to get the “title” and “h1” texts, but that's ok, because my whole script will finish running in a few seconds anyway and this is just for one-time use.
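
For a bigger job, both texts can be pulled out of one buffer, so each file is read only once. The following is a minimal sketch of that, not the code used in this script. The names “my-get-tag-text-in-buffer” and “my-get-title-and-h1” are hypothetical, and the sketch assumes plain tags with no attributes (it would not match something like <h1 class="x">).

```emacs-lisp
;; Hypothetical helpers, not part of the original script.
(defun my-get-tag-text-in-buffer (tagName)
  "Return the text between the first <TAGNAME> and the following </TAGNAME>.
Searches the current buffer.  Return nil if either tag is missing.
Only matches tags written without attributes."
  (save-excursion
    (goto-char (point-min))
    (let ((case-fold-search nil) p1)
      (when (search-forward (concat "<" tagName ">") nil t)
        (setq p1 (point)) ; right after the opening tag
        (when (search-forward (concat "</" tagName ">") nil t)
          ;; match-beginning 0 is where the closing tag starts
          (buffer-substring-no-properties p1 (match-beginning 0)))))))

(defun my-get-title-and-h1 (fName)
  "Return a cons cell (TITLE . H1) for file FNAME, reading the file once."
  (with-temp-buffer
    (insert-file-contents fName)
    (cons (my-get-tag-text-in-buffer "title")
          (my-get-tag-text-in-buffer "h1"))))
```

Because a missing tag gives nil instead of an error, the caller can check the “car” and “cdr” of the result before deciding what to do with a file.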

Now, here's the code i wrote quickly to fix the tags:

;; -*- coding: utf-8 -*-
;; 2011-03-20
;; change title to h1 tag's text in “Time Machine” pages
;; 
;; for each HTML page in 〔~/web/xahlee_org/p/time_machine/〕
;; if the title tag and h1 tag text differ, make the title use h1's text

(setq inputDir "~/web/xahlee_org/p/time_machine/" ) ; dir must end with a slash

(defun my-process-file (fPath)
  "process the file at fullpath fPath …"
  (let ( titleText h1Text p1 p2)

    (setq h1Text (get-html-file-h1-text fPath))
    (setq titleText (get-html-file-title fPath))

    (if (equal h1Text titleText)
        nil
      (progn 
        (find-file fPath )
        (goto-char 1)
        (search-forward "<title>" )
        (setq p1 (point) )

        (search-forward "</title>" )
        (backward-char 8)
        (setq p2 (point) )

        (delete-region p1 p2 )
        (insert h1Text)
        (print fPath)
        ))
    ))

(require 'find-lisp)

(let (outputBuffer)
  (setq outputBuffer "*process time machine output*" )
  (with-output-to-temp-buffer outputBuffer 
    (mapc 'my-process-file (find-lisp-find-files inputDir "\\.html$"))
    (princ "Done deal!")
    )
  )

Again, all the above scripts are variations of find/replace. For code details, see: How to Write grep in Emacs Lisp and Emacs Lisp: Find String Inside HTML Tag.
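
One thing to watch for in scripts like these: “search-forward” signals an error when the string isn't found, unless it is given the optional NOERROR argument. So a single file that lacks a “title” or “h1” tag would stop the whole “mapc” loop. Below is a minimal sketch of one way around that, assuming you'd rather log the file and keep going. The name “my-call-reporting-errors” is hypothetical.

```emacs-lisp
;; Hypothetical wrapper, not part of the original script.
(defun my-call-reporting-errors (func fPath)
  "Call FUNC with FPATH and return its value.
If FUNC signals an error, print FPATH with the error message and return nil."
  (condition-case err
      (funcall func fPath)
    (error
     (princ (format "skipped %s: %s\n" fPath (error-message-string err)))
     nil)))

;; usage, with the names from the script above:
;; (mapc (lambda (f) (my-call-reporting-errors 'my-process-file f))
;;       (find-lisp-find-files inputDir "\\.html$"))
```

Inside “with-output-to-temp-buffer”, the skipped files end up in the same output buffer as the changed ones, so they can be reviewed together.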

In this script, i didn't include code to save the changed file. This way, i can do some manual verification after the script has run. When i want them all saved, i just call “ibuffer” and type 3 keys 【* u S】 to have all of them saved, and 【D y】 closes them all.
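
If you'd rather do the save step from elisp than from “ibuffer”, something like the following might do. It is a sketch under assumptions: the function name is made up, it uses “string-prefix-p” (available in Emacs 24 and later), and it saves and closes every modified file-visiting buffer under the given directory, so run it only after the changes have been checked.

```emacs-lisp
;; Hypothetical helper, not part of the original script.
(defun my-save-modified-buffers-under (dir)
  "Save, then kill, each modified buffer visiting a file under DIR.
Return the number of buffers saved."
  (let ((root (file-name-as-directory (expand-file-name dir)))
        (count 0))
    (dolist (buf (buffer-list))
      (let ((file (buffer-file-name buf)))
        (when (and file
                   (buffer-modified-p buf)
                   (string-prefix-p root (expand-file-name file)))
          (with-current-buffer buf (save-buffer))
          (kill-buffer buf) ; unmodified after the save, so no prompt
          (setq count (1+ count)))))
    count))

;; usage, with the variable from the script above:
;; (my-save-modified-buffers-under inputDir)
```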

What might you use this script for in your work?
