Emacs Lisp Idioms for Text Processing in Batch Style


This page shows some common programming patterns in emacs lisp for batch text processing: the kind of tasks one would typically do with unix shell tools or Perl. For example, find/replace on a given list of files or a dir, processing (small) log files, compiling a bunch of files, or generating a report.

If you don't know elisp, see: Emacs Lisp Basics.

Reading & Writing to File

Read-Only Text Processing

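To process thousands of files without modifying them, use with-temp-buffer. It creates a temporary buffer, evaluates its body there, and kills the buffer afterwards.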

(defun my-process-file (fPath)
  "Process the file at path FPATH …"
  (with-temp-buffer
    ;; load the file's text into the temporary buffer
    (insert-file-contents fPath)
    ;; process it …
    ))
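As one possible way to fill in the "process it" step, here is a minimal sketch that counts regex matches in a file without modifying it. The function name my-count-matches-in-file is hypothetical, and the sketch assumes the regex never matches the empty string (otherwise the loop would not advance).

```elisp
;; Hypothetical helper, not part of Emacs.
;; Assumes REGEX never matches the empty string.
(defun my-count-matches-in-file (fPath regex)
  "Return the number of matches of REGEX in the file at FPATH."
  (with-temp-buffer
    (insert-file-contents fPath)
    (goto-char (point-min))
    (let ((count 0))
      ;; each successful search moves point past the match
      (while (re-search-forward regex nil t)
        (setq count (1+ count)))
      count)))
```

You could call it like (my-count-matches-in-file "~/logs/app.log" "ERROR"), where the path is only an illustration.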

Modifying Files

If you want to write to the file ONLY when you actually changed its content, create a flag variable and call write-region, like this:

(defun my-process-file (fPath)
  "Process the file at path FPATH …"
  (let (fileChanged-p)
    (with-temp-buffer
      (insert-file-contents fPath)

      ;; process text …
      ;; set fileChanged-p to true/false

      ;; write the buffer back only if something changed
      (when fileChanged-p (write-region (point-min) (point-max) fPath)))))

If you always need to change every file, use with-temp-file.
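Here is a small sketch of the with-temp-file variant: the body runs in a temporary buffer, and the buffer is written to the file when the body finishes. The function name my-replace-in-file is hypothetical, and the sketch assumes FROM is a non-empty string.

```elisp
;; Hypothetical helper, not part of Emacs.
;; Assumes FROM is a non-empty string.
(defun my-replace-in-file (fPath from to)
  "Replace every occurrence of the string FROM with TO in the file at FPATH."
  (with-temp-file fPath
    ;; read the current content into the temporary buffer
    (insert-file-contents fPath)
    (goto-char (point-min))
    (while (search-forward from nil t)
      ;; t t = keep case as given, treat TO as literal text
      (replace-match to t t))
    ;; when the body ends, with-temp-file writes the buffer to fPath
    ))
```

Note that the file is rewritten on every call, even when nothing matched. If that matters, the flag variable pattern with write-region is the better fit.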

Note: you should not use find-file or write-file, because they have many side effects and are slow. See: Emacs Lisp Text Processing: find-file vs with-temp-buffer.

Read a File into List of Lines

To read a whole file into a list of lines, you can use this code:

(defun read-lines (fPath)
  "Return a list of lines of a file at FPATH."
  (with-temp-buffer
    (insert-file-contents fPath)
    ;; split on newlines; the final t drops empty strings
    (split-string (buffer-string) "\n" t)))

Once you have a list, you can use mapcar to process each element in the list. If you don't need the resulting list, use mapc.
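A short sketch of both forms follows. The names my-read-lines, my-line-lengths and my-print-lines are hypothetical; my-read-lines follows the same idea as the read-lines function above and is repeated here only so the example is self-contained.

```elisp
;; Hypothetical helpers, not part of Emacs.
(defun my-read-lines (fPath)
  "Return the non-empty lines of the file at FPATH as a list of strings."
  (with-temp-buffer
    (insert-file-contents fPath)
    (split-string (buffer-string) "\n" t)))

;; mapcar: build a new list, here the length of each line
(defun my-line-lengths (fPath)
  "Return a list with the length of each non-empty line of the file at FPATH."
  (mapcar #'length (my-read-lines fPath)))

;; mapc: call a function on each line for its side effect only
(defun my-print-lines (fPath)
  "Print each non-empty line of the file at FPATH to standard output."
  (mapc (lambda (line) (princ line) (terpri))
        (my-read-lines fPath)))
```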

Note: in elisp, it's more efficient to process text in a buffer than to do complicated string manipulation with the string data type. But if your lines are all short and you don't need to know the text that comes before or after the current line, then a list of lines can be easier to work with. For an example of line-by-line processing in a buffer, see: Process a File line-by-line in Emacs Lisp.

File & Dir Manipulation

Filename Manipulation

Commonly used functions to manipulate file names.

(file-name-directory f)      ; get dir path
(file-name-nondirectory f)   ; get file name

(file-name-extension f)      ; get suffix
(file-name-sans-extension f) ; remove suffix

(file-relative-name f )      ; get relative path
(expand-file-name f )        ; get full path

default-directory       ; get the current dir (this is a variable)

(info "(elisp) File Names")
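To make the results concrete, here is a small sketch that applies these functions to one sample path. The function name my-file-name-parts and the path /home/joe/web/index.html are made up for illustration, and the results in the comments assume a Unix-like system.

```elisp
;; Hypothetical helper; the sample path does not need to exist.
(defun my-file-name-parts ()
  "Return a list showing what the file name functions do to one sample path."
  (let ((f "/home/joe/web/index.html"))
    (list
     (file-name-directory f)                          ; "/home/joe/web/"
     (file-name-nondirectory f)                       ; "index.html"
     (file-name-extension f)                          ; "html"
     (file-name-sans-extension f)                     ; "/home/joe/web/index"
     (file-relative-name f "/home/joe/")              ; "web/index.html"
     (expand-file-name "index.html" "/home/joe/web/") ; "/home/joe/web/index.html"
     )))
```

When the second argument of file-relative-name or expand-file-name is omitted, the value of default-directory is used instead.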

File & Dir Manipulation

Commonly used functions to manipulate files & dirs.

(file-exists-p FILENAME)

(rename-file FILE NEWNAME &optional OK-IF-ALREADY-EXISTS)


(delete-file FILE)

(set-file-modes FILE MODE)

;; get list of file names
(directory-files DIR &optional FULL MATCH NOSORT)

;; create a dir. Non-existent parent dirs are created too if PARENTS is non-nil
(make-directory DIR &optional PARENTS)

;; copy/delete whole dir
(delete-directory DIRECTORY &optional RECURSIVE) ; RECURSIVE option new in emacs 23.2
(copy-directory DIR NEWNAME &optional KEEP-TIME PARENTS) ; new in emacs 23.2
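The following sketch strings several of these functions together inside a scratch directory, so it does not touch any real files. The function name my-dir-demo and the file names are hypothetical.

```elisp
;; Hypothetical demo, not part of Emacs.
(defun my-dir-demo ()
  "Create, rename, list and delete files inside a scratch directory.
Return the list of file names ending in .bak that were found."
  (let* ((dir (make-temp-file "my-demo" t)) ; t = create a directory
         (sub (expand-file-name "a/b" dir))
         (file (expand-file-name "note.txt" sub)))
    ;; t = also create the missing parent dir "a"
    (make-directory sub t)
    (write-region "hello\n" nil file)
    (rename-file file (expand-file-name "note.bak" sub))
    ;; list names (not full paths) matching the regex, then clean up
    (prog1 (directory-files sub nil "\\.bak\\'")
      (delete-directory dir t))))
```

Calling (my-dir-demo) should return ("note.bak").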

How to find the current elisp script's name programmatically?

(or load-file-name buffer-file-name)

Explanation: if your elisp script needs to know its own file name at run time, use (or load-file-name buffer-file-name). The variable load-file-name is set only while a file is being loaded, so if the user runs your script with eval-buffer, its value is nil, and buffer-file-name supplies the file of the current buffer instead. Using both is a good way to get the script name regardless of whether the script is executed by load or by eval-buffer.

If you want the directory the script is in, call file-name-directory on the result. See also: Emacs Lisp Scripting Quirk: Relative Paths.

(info "(elisp) Files")

Example: make backup file.

(defun make-backup ()
  "Make a backup copy of current buffer's file.
The new file name is the old file name with trailing “~”, in the same dir.
If such a file already exists, append more “~”.
If the current buffer is not associated with a file, it's an error."
  (let (fName backupName)
    (setq fName (buffer-file-name))
    (unless fName
      (error "Current buffer is not associated with a file"))
    (setq backupName (concat fName "~"))

    ;; keep appending “~” until the name is unused
    (while (file-exists-p backupName)
      (setq backupName (concat backupName "~")))

    (copy-file fName backupName t)
    (message "Backup saved as: %s" (file-name-nondirectory backupName))))

Calling a Shell Command

To call a shell command and wait for it to finish before continuing, use shell-command or shell-command-to-string.

; idiom for calling a shell command
(shell-command "cp /somepath/myfile.txt  /somepath")

; idiom for calling a shell command and get its output
(shell-command-to-string "ls")
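Since shell-command-to-string returns the whole output as one string, a common next step is to split it into lines. Here is a minimal sketch; the function name my-shell-lines is hypothetical, and the example command assumes a Unix-like shell.

```elisp
;; Hypothetical helper, not part of Emacs.
(defun my-shell-lines (command)
  "Run the shell command COMMAND, wait for it, and return its output lines.
Empty lines are dropped."
  (split-string (shell-command-to-string command) "\n" t))
```

For example, (my-shell-lines "echo a; echo b") should return ("a" "b").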

To call a shell command without waiting for it to finish, use start-process or start-process-shell-command. Here's an example:

;; open files in Linux desktop
 (lambda (fPath)
   (let ((process-connection-type nil))
     (start-process "" nil "xdg-open" fPath)) )

For detail, see: Emacs Dired: Opening Files in External Apps.

(info "(elisp) Asynchronous Processes")

Traverse a directory

In the following, “my-process-file” is a function that takes a file's full path as input. “find-lisp-find-files” generates a list of full paths of the files under a dir whose names match a regex. mapc applies the function to each element of the list.

; idiom for traversing a directory
(require 'find-lisp)
(mapc 'my-process-file (find-lisp-find-files "~/web/emacs/" "\\.html$"))
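As an alternative sketch, newer versions of Emacs have a built-in function for the same job. This assumes Emacs 25.1 or later, where directory-files-recursively is available; the wrapper name my-process-html-files is hypothetical.

```elisp
;; Hypothetical wrapper, not part of Emacs.
;; Assumes Emacs 25.1 or later for `directory-files-recursively'.
(defun my-process-html-files (dir fn)
  "Call FN on the full path of every .html file under DIR, recursively."
  (mapc fn (directory-files-recursively dir "\\.html\\'")))
```

Usage would look like (my-process-html-files "~/web/emacs/" 'my-process-file). The regex is matched against the file name only, not the whole path.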

Running Elisp in Batch mode

You can run an elisp program from the operating system's command line interface (shell), using the --script option. For example:

emacs --script process_log.el

Emacs has a few other options and variations to control how you run an elisp script. Here's a table of the main options:

Full Name         Short Name   Meaning
--no-init-file    -q           Do not load your init files {~/.emacs, ~/.emacs.el, ~/.emacs.d/init.el} nor site-wide default.el.
--no-site-file                 Do not load the site-wide site-start.el.
--batch                        Do not launch emacs as an editor. Use it together with --load to specify a lisp file. This implies --no-init-file but not --no-site-file.
--load="path"     -l path      Execute the elisp file at path.
--script path                  Run emacs like --batch with --load set to path.

The site-start.el is a site-wide init file: it is loaded for all users of this emacs installation. It may be added by a sys admin, or it may be part of a particular emacs distribution (for example Carbon Emacs, Aquamacs Emacs, ErgoEmacs …). If it exists, you can usually find it in the directory where emacs is installed. Normally, you shouldn't worry about this file. The only time you need to disable it is if you want a pure GNU Emacs experience (without loading any packages added by a third party).

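When you write an elisp script to run in batch, make sure your elisp file is self-contained: it should explicitly load whatever it needs and not rely on anything defined in your init file.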

If you've done a clean job in your elisp script, then all you need is emacs --script followed by the path of your elisp file.

If your elisp program requires functions that you've defined in your emacs init file, then you should explicitly load that file in your script with (load "path to your emacs init file"), or you can add a command line option to load it, like this: --user=xah. (It is best to actually pull the functions you need out into the script itself.)

On Linux, you can also add a shebang line at the top of your elisp script, like this: #!/usr/bin/emacs --script, so you can call your script from the command line by name.

If you are on a Mac with Carbon Emacs or Aquamacs, call it from the command line like this:

/Applications/Emacs.app/Contents/MacOS/Emacs --script=process_log.el

(info "(emacs) Option Index")

Getting Command Line Arguments

To get arguments passed from the command line, use the built-in variable argv. See: Getting Command Line Arguments.
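Here is a minimal sketch of a script that prints its arguments. The file name show_args.el and the function name my-print-args are hypothetical.

```elisp
;; Saved as a hypothetical file show_args.el and run like this:
;;   emacs --script show_args.el foo bar

(defun my-print-args (args)
  "Print each string in the list ARGS on its own line to standard output."
  (dolist (arg args)
    (princ (format "arg: %s\n" arg))))

;; argv holds the command line arguments that come after the script name
(my-print-args argv)
```

Taking the list as a parameter, rather than reading argv inside the function, makes the function easy to try from inside emacs too, for example with (my-print-args '("foo" "bar")). After the script finishes, emacs may still try to interpret the remaining arguments itself, so a script that takes its own arguments may want to end with (kill-emacs 0).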

For some practical examples of batch style text processing, see:

( Thanks to Rubén Berenguel for a correction.) ( thx to Phil Hudson for a tip.)
