This page shows a simple practical elisp script for HTML tag transformation.
I want to transform the HTML tag
<span class="w">xyz</span> to
<b>xyz</b>, in over a hundred files, and also print a report of the changes.
This is for my English vocabulary and literature study project. There are a few hundred files.
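Here's an outline of the steps: find every HTML file under the input dir, open each file, replace each matching span tag with a b tag while recording the changed word, then print the list of changed words as a report.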
Here's the code:
;; -*- coding: utf-8 -*-
;; 2011-07-18
;; replace <span class="w">…</span> to <b>…</b>
;;
;; do this for all files in a dir.

(setq inputDir "~/web/xahlee_org/PageTwo_dir/Vocabulary_dir/" ) ; dir should end with a slash

(setq changedItems '())

(defun my-process-file (fPath)
  "Process the file at FPATH …"
  (let (myBuff myWord)
    (setq myBuff (find-file fPath))
    (widen)
    (goto-char 1) ;; in case buffer already open
    (while (search-forward-regexp "<span class=\"w\">\\([^<]+?\\)</span>" nil t)
      (setq myWord (match-string 1))
      (when (< (length myWord) 15) ; a little double check in case of possible mismatched tag
        (replace-match (concat "<b>" myWord "</b>" ) t)
        (setq changedItems (cons (substring-no-properties myWord) changedItems ) ) ) )

    ;; close buffer if there's no change. Else leave it open.
    (when (not (buffer-modified-p myBuff)) (kill-buffer myBuff) ) ) )

(require 'find-lisp)

(setq make-backup-files t)
(setq case-fold-search nil)
(setq case-replace nil)

(let (outputBuffer)
  (setq outputBuffer "*xah span.w to b replace output*" )
  (with-output-to-temp-buffer outputBuffer
    (mapc 'my-process-file (find-lisp-find-files inputDir "\\.html$"))
    (print changedItems)
    (princ "Done deal!") ) )
The above is fairly easy to understand. You might refresh elisp basics at: Text Processing with Emacs Lisp Batch Style and Emacs Lisp Idioms (for writing interactive commands).
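If you want to try the search-and-replace loop of my-process-file without touching any files, here's a minimal sketch that runs the same regex on a single string in a temp buffer. The function name my-span-to-b-in-string is made up for illustration and is not part of the script. Unlike the script, it passes t for the LITERAL argument of replace-match, which is an assumption on my part that the word should be inserted verbatim.

```elisp
;; Hypothetical helper (not part of the script above): apply the same
;; span-to-b replacement to one string instead of to files on disk.
(defun my-span-to-b-in-string (html)
  "Return (NEW-HTML . WORDS) for the string HTML.
NEW-HTML has each short <span class=\"w\">…</span> turned into <b>…</b>.
WORDS is the list of changed words, in order of appearance."
  (let ((case-fold-search nil) ; match the tag case-sensitively, as in the script
        (words '()))
    (with-temp-buffer
      (insert html)
      (goto-char (point-min))
      (while (re-search-forward "<span class=\"w\">\\([^<]+?\\)</span>" nil t)
        (let ((word (match-string 1)))
          ;; same length check as the script: skip anything 15 chars or longer
          (when (< (length word) 15)
            ;; FIXEDCASE t, LITERAL t: insert the replacement text verbatim
            (replace-match (concat "<b>" word "</b>") t t)
            (push word words))))
      (cons (buffer-string) (nreverse words)))))
```

For example, evaluating (my-span-to-b-in-string "a <span class=\"w\">xyz</span> b") returns ("a <b>xyz</b> b" "xyz"): the rewritten text followed by the one changed word.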
Here's the output of the batch run: elisp_batch_html_tag_transform_bold_output.txt.
There are over 1k changes. The output is extremely useful because I can take a few seconds to glance at it and see that there are no errors. Errors are possible because, whenever you use regex to parse HTML, a missing tag or even an unexpected nested tag can mean disaster.
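The report shows what got changed. It does not show what got skipped. Here's a rough sketch of a complementary check that lists the span.w tags the script above would leave alone: ones with a nested tag inside, with empty or overly long content, or with no closing tag at all. The name my-find-suspect-spans is hypothetical, and the limit of 15 chars simply mirrors the length check in the script.

```elisp
;; Hypothetical helper (not part of the script above): list the contents of
;; <span class="w"> tags that the replacement regex would not touch.
(defun my-find-suspect-spans (html)
  "Return a list of suspicious <span class=\"w\"> contents found in HTML.
A span is suspicious if its text up to the next </span> contains \"<\",
is empty, is 15 chars or longer, or if no </span> follows it at all."
  (let ((case-fold-search nil)
        (suspects '()))
    (with-temp-buffer
      (insert html)
      (goto-char (point-min))
      (while (search-forward "<span class=\"w\">" nil t)
        (let ((start (point)))
          (if (search-forward "</span>" nil t)
              (let ((content (buffer-substring-no-properties
                              start (match-beginning 0))))
                (when (or (string-match-p "<" content)
                          (zerop (length content))
                          (>= (length content) 15))
                  (push content suspects))
                ;; go back to just after the opening tag, so that an
                ;; opening tag nested inside this one is examined too
                (goto-char start))
            ;; no closing tag anywhere after this point: record the rest, stop
            (push (buffer-substring-no-properties start (point-max)) suspects)
            (goto-char (point-max))))))
    (nreverse suspects)))
```

For example, given the text <span class="w">ok</span> <span class="w">a <i>b</i></span> <span class="w">unclosed as a string, it returns ("a <i>b</i>" "unclosed"). An empty result for a file means every span.w in it has the simple shape the regex expects.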