Emacs Lisp Text Processing: find-file vs with-temp-buffer

By Xah Lee.

This page gives a detailed speed comparison of Emacs Lisp's find-file vs with-temp-buffer for processing 5 thousand files.

Summary

Using find-file to open 5565 files, takes 63 seconds.

Using with-temp-buffer, 16 seconds. (4 times faster.)

Moral: when doing batch text processing of thousands of files, don't use find-file; use with-temp-buffer or with-temp-file instead. (Use the latter when you need to make changes to the file.)
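
As a rough illustration of the with-temp-file case, here is a minimal sketch of changing a file without visiting it. The helper name my-replace-in-file is hypothetical (it is not part of the test script), and the sketch assumes FROM is a non-empty literal string.

```elisp
;; Sketch only: `my-replace-in-file' is a hypothetical helper for illustration.
(defun my-replace-in-file (fpath from to)
  "In the file at FPATH, replace every literal string FROM with TO.
FROM is assumed to be non-empty. Return the number of replacements made."
  (let ((count 0))
    ;; `with-temp-file' evaluates its body in a temporary buffer,
    ;; then writes that buffer to FPATH. The file is never visited.
    (with-temp-file fpath
      (insert-file-contents fpath)
      (goto-char (point-min))
      (while (search-forward from nil t)
        (replace-match to t t) ; keep case as given, treat TO literally
        (setq count (1+ count))))
    count))
```

Note that with-temp-file writes the file even when nothing was replaced, so a real batch script may want to check for a match in a with-temp-buffer first.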

(if you don't know how, see basics at: Emacs Lisp Idioms for Text Processing in Batch Style.)

Detail

Here's the test that you can run.

For this test, the input dir is the HTML version of the GNU Emacs Lisp Reference Manual, a total of ≈900 files. If you'd like to run the test, you can download it at: http://www.gnu.org/software/emacs/manual/elisp.html (download the “with one web page per node” version).

with-temp-buffer Version

Here's the actual code.

;; 2011-12-20
;; Speed test script
;; 
;; What the script does:
;; Creates a 〔sitemap.xml〕 file.
;; Opens each file in a dir. If the file doesn't contain the word “refresh”, adds an entry for the file to 〔sitemap.xml〕.

;; Must end in a slash. Must not start with ~
(setq webroot "/Users/h3/web/xahlee_org/emacs_manual/elisp/")

;; ------------------------

(defun my-process-file (fPath destBuff)
  "Process the file at fullpath FPATH.
Write result to buffer DESTBUFF."
  (with-temp-buffer
      (insert-file-contents fPath)
      (goto-char 1)
      (when (not (search-forward "refresh" nil "noerror"))
        (with-current-buffer destBuff
          (insert "<url><loc>")
          (insert (concat "http://example.org/" (substring fPath (length webroot))))
          (insert "</loc></url>\n") )) ) )

;; ------------------------

(print 
 (benchmark-run 1
     ;; create sitemap buffer
     (let (filePath sitemapBuf)
       (setq filePath (concat webroot "sitemap.xml"))
       (setq sitemapBuf (find-file filePath))
       (erase-buffer)
       (set-buffer-file-coding-system 'unix)
       (insert "<?xml version=\"1.0\" encoding=\"UTF-8\"?>
<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">
")

       (require 'find-lisp)
       (mapc
        (lambda (ξx) (my-process-file ξx sitemapBuf))
        (find-lisp-find-files webroot "\\.html$"))

       (insert "</urlset>")
       (save-buffer)
       )
   ))

(message "%s" "Yay, Done!")

To run it, open the script file in Emacs and call eval-buffer.

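What the script does is very simple: it creates a sitemap.xml file, opens each HTML file in a dir, and, if the file doesn't contain the word “refresh”, adds an entry for that file to sitemap.xml.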

This version gets file content by using a temp buffer, like this:

(with-temp-buffer
  (insert-file-contents fPath)
  ;; do processing here
)

The script is a simplified version of generating a sitemap. 〔➤see Emacs Lisp: Create Sitemap〕

find-file Version

The find-file version is identical except for the my-process-file function, which looks like this:

(defun my-process-file (fPath destBuff)
  "Process the file at fullpath FPATH.
Write result to buffer DESTBUFF."
  (let (myBuffer)
    (setq myBuffer (find-file fPath))
    (goto-char 1)
    (when (not (search-forward "refresh" nil "noerror"))
      (with-current-buffer destBuff
        (insert "<url><loc>")
        (insert (concat "http://example.org/" (substring fPath (length webroot))))
        (insert "</loc></url>\n") ))
    (kill-buffer myBuffer) ) )

Speed difference

Here are the test results (all timings are in seconds, rounded).

Script Version     Script Running Time   Garbage Collection   Garbage Collection Time   Actual Time
find-file          8.2                   80                   0.8                       9
with-temp-buffer   1.46                  16                   0.16                      1.5
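
For reference, benchmark-run returns a list of three values: total elapsed time, number of garbage collections, and time spent in garbage collection. Here is a minimal sketch of turning that list into a readable line. The helper name my-format-benchmark is hypothetical, and the timed body is a stand-in, not the real test.

```elisp
(require 'benchmark)

;; Sketch only: `my-format-benchmark' is a hypothetical helper for illustration.
(defun my-format-benchmark (result)
  "Format RESULT, a list (ELAPSED GC-COUNT GC-TIME) as returned by `benchmark-run'."
  (format "total %.2fs, %d garbage collections taking %.2fs"
          (nth 0 result) (nth 1 result) (nth 2 result)))

;; Time a stand-in body once and print the formatted report.
(message "%s"
         (my-format-benchmark
          (benchmark-run 1
            (with-temp-buffer
              (insert "some text without the keyword")
              (goto-char (point-min))
              (search-forward "refresh" nil t)))))
```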

Do Not use find-file or write-file

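find-file, write-file, and other functions that visit a file have many unwanted side-effects, and they can be up to 40 times slower (i tested before). Examples of side-effects: visiting a file runs find-file-hook, sets up the major mode and runs its hooks (including syntax coloring), and checks version-control backends.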
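
One such side-effect can be observed directly. The following is a minimal sketch that counts how often find-file-hook runs while some code executes. The helper name my-count-find-file-hook-runs is hypothetical, and the sketch uses find-file-noselect (which find-file calls internally) so that it does not change the selected window.

```elisp
;; Sketch only: `my-count-find-file-hook-runs' is a hypothetical helper.
(defun my-count-find-file-hook-runs (thunk)
  "Call the function THUNK and return how many times `find-file-hook' ran meanwhile."
  (let* ((runs 0)
         (counter (lambda () (setq runs (1+ runs)))))
    ;; Temporarily install a counting function on the hook.
    (add-hook 'find-file-hook counter)
    (unwind-protect
        (funcall thunk)
      ;; Always remove the counter, even if THUNK signals an error.
      (remove-hook 'find-file-hook counter))
    runs))
```

Calling it with a thunk that does (kill-buffer (find-file-noselect f)) should report that the hook ran, while a thunk that does (with-temp-buffer (insert-file-contents f)) should report zero, since insert-file-contents does not visit the file. Here f stands for any existing file path.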

Misc Notes

Jon Snader (jcs) suggested using “benchmark-run” for the timing report and garbage collection info. http://irreal.org/blog/?p=400. “benchmark-run” is tremendously useful. Thanks Jon.

Stefan Monnier suggested that turning off “vc-handled-backends” might speed up the “find-file” version slightly. (Source: groups.google.com, comp.lang.lisp.)
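
A minimal sketch of that suggestion, assuming you only want version-control detection off for the duration of one call. The function name my-visit-without-vc is hypothetical, and whether it gives a measurable speedup was not tested here.

```elisp
;; Sketch only: `my-visit-without-vc' is a hypothetical helper for illustration.
(defun my-visit-without-vc (fpath)
  "Visit FPATH with version-control detection turned off. Return the buffer."
  ;; Binding `vc-handled-backends' to nil just for this call
  ;; leaves the global setting untouched.
  (let ((vc-handled-backends nil))
    (find-file-noselect fpath)))
```

The caller is still responsible for killing the returned buffer, as in the find-file version of my-process-file.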

Thanks to Trey Jackson for a major correction on this article. In my previous report, the timing difference was a factor of 45, because i had various personal hooks.
