Emacs Lisp Text Processing: find-file vs with-temp-buffer

By Xah Lee. Date: . Last updated: .

This page gives a detailed speed comparison of Emacs Lisp's find-file vs with-temp-buffer for processing about 5 thousand files.

Summary

Using find-file to open 5565 files takes 63 seconds.

Using with-temp-buffer takes 16 seconds (about 4 times faster).

Moral: when doing batch text processing of thousands of files, don't use find-file; use with-temp-buffer or with-temp-file instead. (Use with-temp-file when you need to make changes to the file.)

(If you don't know how, see the basics at: Elisp: Writing Elisp Script.)
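For the case where you do need to change files, here is a minimal sketch of the with-temp-file pattern recommended above: read the file into a temporary buffer, edit it there, and let with-temp-file write the buffer back when the body exits. The function name my-replace-in-file is hypothetical, and the sketch assumes a literal, case-sensitive search-and-replace is what you want.

```elisp
(defun my-replace-in-file (fPath fromStr toStr)
  "Replace every literal occurrence of FROMSTR with TOSTR in file FPATH.
Hypothetical helper: the file is read into a temp buffer and written
back by `with-temp-file', so it is never visited with `find-file'."
  (with-temp-file fPath
    (insert-file-contents fPath)
    (goto-char (point-min))
    ;; Exact, case-sensitive match; the replacement is inserted literally.
    (let ((case-fold-search nil))
      (while (search-forward fromStr nil t)
        (replace-match toStr t t)))))
```

Note that with-temp-file always writes the file back, even when nothing was replaced, so you may want to check for a match first if file modification times matter.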

Detail

Here's the test that you can run.

For this test, the input directory is the HTML version of the GNU Emacs Lisp Reference Manual, a total of about 900 files. If you'd like to run the test, you can download it at: http://www.gnu.org/software/emacs/manual/elisp.html (download the “with one web page per node” version).

with-temp-buffer Version

Here's the actual code.

;; 2011-12-20
;; Speed test script
;; 
;; What the script does:
;; Creates a 〔sitemap.xml〕 file.
;; Opens each file in a dir; if the file doesn't contain the word “refresh”, adds an entry for the file to 〔sitemap.xml〕.

;; Must end in a slash. Must not start with ~
(setq webroot "/Users/h3/web/xahlee_org/emacs_manual/elisp/")

;; ------------------------

(defun my-process-file (fPath destBuff)
  "Process the file at fullpath FPATH.
Write result to buffer DESTBUFF."
  (with-temp-buffer
    (insert-file-contents fPath)
    (goto-char 1)
    ;; Only files without the word "refresh" get a sitemap entry.
    (unless (search-forward "refresh" nil "noerror")
      (with-current-buffer destBuff
        (insert "<url><loc>")
        (insert (concat "http://example.org/" (substring fPath (length webroot))))
        (insert "</loc></url>\n")))))

;; ------------------------

(print
 (benchmark-run 1
     ;; create sitemap buffer
     (let (filePath sitemapBuf)
       (setq filePath (concat webroot "sitemap.xml"))
       (setq sitemapBuf (find-file filePath))
       (erase-buffer)
       (set-buffer-file-coding-system 'unix)
       (insert "<?xml version=\"1.0\" encoding=\"UTF-8\"?>
<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">
")

       (require 'find-lisp)
       (mapc
        (lambda ($x) (my-process-file $x sitemapBuf))
        (find-lisp-find-files webroot "\\.html$"))

       (insert "</urlset>")
       (save-buffer)
       )
   ))

(message "%s" "Yay, Done!")
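The script builds each URL by cutting the webroot prefix off the full path with substring, which is presumably why webroot must end in a slash and must not start with ~: the prefix has to match the paths returned by find-lisp-find-files character for character. Here is a sketch of an alternative using file-relative-name, which expands both names before comparing them. my-file-url is a hypothetical helper, and example.org is the placeholder domain from the script.

```elisp
(defun my-file-url (fPath root)
  "Return the sitemap URL for FPATH, a file under directory ROOT.
Hypothetical helper: `file-relative-name' expands both names, so ROOT
may omit the trailing slash or start with ~."
  (concat "http://example.org/" (file-relative-name fPath root)))
```

Inside my-process-file, the concat line could then call (my-file-url fPath webroot) instead of using substring.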

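To run it, set webroot to the directory of your copy of the manual (it must end in a slash and must not start with ~), save the code to a file, and load it, for example with M-x load-file.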

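What the script does is very simple: it creates a sitemap.xml file, then opens each HTML file in the directory, and if the file doesn't contain the word “refresh”, adds an entry for that file to sitemap.xml.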

This version gets file content by using a temp buffer, like this:

(with-temp-buffer
  (insert-file-contents fPath)
  ;; do processing here
)
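The same pattern can be wrapped into a small predicate. This sketch (my-file-contains-p is a hypothetical name) returns non-nil when a file contains a string, which is the test my-process-file does for “refresh”. Like the script, it relies on case-fold-search, so with the default setting the match is case-insensitive.

```elisp
(defun my-file-contains-p (fPath str)
  "Return non-nil if the file at FPATH contains STR (hypothetical helper).
The file is read into a temporary buffer, which is killed automatically
when `with-temp-buffer' exits, so no file-visiting buffer is created."
  (with-temp-buffer
    (insert-file-contents fPath)
    (goto-char (point-min))
    ;; Returns point after the match, or nil if STR is not found.
    (search-forward str nil t)))
```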

The script is a simplified version of a sitemap generator. [see Elisp: Create Sitemap]

find-file Version

The find-file version is identical except for the my-process-file function:

(defun my-process-file (fPath destBuff)
  "Process the file at fullpath FPATH.
Write result to buffer DESTBUFF."
  (let ((myBuffer (find-file fPath)))
    (goto-char 1)
    ;; Only files without the word "refresh" get a sitemap entry.
    (unless (search-forward "refresh" nil "noerror")
      (with-current-buffer destBuff
        (insert "<url><loc>")
        (insert (concat "http://example.org/" (substring fPath (length webroot))))
        (insert "</loc></url>\n")))
    ;; Close the visited file so thousands of buffers don't pile up.
    (kill-buffer myBuffer)))

Speed difference

Here are the test results (all times are in seconds, rounded).

Script Version | Script Running Time | Garbage Collections (count) | Garbage Collection Time | Actual Time
find-file | 8.2 | 80 | 0.8 | 9
with-temp-buffer | 1.46 | 16 | 0.16 | 1.5

Do Not use find-file or write-file

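find-file, write-file, or any function that visits a file has many unwanted side effects, and it can be up to 40 times slower than using a temp buffer (I tested this before). For example, visiting a file picks and runs a major mode (with its mode hooks), runs find-file-hook, processes file-local variables, and checks version-control status; write-file also makes the buffer visit the new file.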
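To see two of these side effects concretely, the following sketch (my-show-visit-side-effects is a hypothetical name) visits a file with find-file-noselect, which is what find-file calls before switching windows, and compares the result with a temp buffer: the visited buffer gets a major mode chosen from the file name and a buffer-file-name, while the temp buffer stays in fundamental-mode with no associated file.

```elisp
(defun my-show-visit-side-effects (fPath)
  "Compare a file-visiting buffer with a temp buffer for FPATH.
Return a plist with the major mode and `buffer-file-name' of each.
Hypothetical demo: it shows only two of the side effects of visiting."
  (let* ((buf (find-file-noselect fPath))
         (visited (with-current-buffer buf
                    (list :mode major-mode :file buffer-file-name))))
    ;; Clean up the file-visiting buffer before building the temp one.
    (kill-buffer buf)
    (list :visited visited
          :temp (with-temp-buffer
                  (insert-file-contents fPath)
                  (list :mode major-mode :file buffer-file-name)))))
```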

Misc Notes

Jon Snader (jcs) suggested using benchmark-run for timing and garbage-collection info (http://irreal.org/blog/?p=400). benchmark-run is tremendously useful. Thanks, Jon.
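benchmark-run returns a list of three numbers: the elapsed time in seconds, the number of garbage collections, and the time spent in garbage collection, which are the kind of figures reported in the results table. Here is a minimal sketch of calling it and unpacking the result; my-time-it is a hypothetical wrapper.

```elisp
(require 'benchmark)

(defun my-time-it (fn)
  "Call FN once under `benchmark-run' and report what it measured.
`benchmark-run' returns (ELAPSED GC-COUNT GC-ELAPSED): total seconds,
number of garbage collections, and seconds spent in them."
  (let ((result (benchmark-run 1 (funcall fn))))
    (message "elapsed %.2fs, %d GCs, %.2fs in GC"
             (nth 0 result) (nth 1 result) (nth 2 result))
    result))
```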

Stefan Monnier suggested that turning off version-control checks (setting vc-handled-backends to nil) might speed up the find-file version slightly. (Source: groups.google.com comp.lang.lisp.)
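A sketch of that suggestion, assuming you keep using find-file: bind vc-handled-backends to nil around the call, so VC does not try to find a version-control backend for the file. my-find-file-no-vc is a hypothetical name, and the speedup is not measured here.

```elisp
(defun my-find-file-no-vc (fPath)
  "Visit FPATH with `find-file' while VC backend detection is disabled.
Hypothetical sketch: with `vc-handled-backends' bound to nil, VC does
not look for a version-control backend for this file."
  (let ((vc-handled-backends nil))
    ;; `find-file' returns the buffer visiting FPATH.
    (find-file fPath)))
```

In the find-file version of my-process-file, the (find-file fPath) call would be replaced by (my-find-file-no-vc fPath).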

Thanks to Trey Jackson for a major correction to this article. In my previous report, the timing difference was a factor of 45, because I had various personal hooks set up.
