Emacs Lisp Text Processing: find-file vs with-temp-buffer
This page gives a detailed speed comparison of emacs lisp's find-file and
with-temp-buffer for processing about 5 thousand files.
Using find-file to open 5565 files takes 63 seconds.
Using with-temp-buffer, it takes 16 seconds (about 4 times faster).
Moral: when doing batch text processing of thousands of files, don't use
find-file or write-file. Use with-temp-buffer or with-temp-file instead (use the latter when you need to make changes to the file).
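For the case where the file does need to change, here is a minimal sketch of the with-temp-file route, assuming the edit is a plain literal search-and-replace. The function name my-replace-in-file is hypothetical. with-temp-file evaluates its body in a fresh temp buffer and then writes that buffer to the file, so the file is never visited.

```elisp
(defun my-replace-in-file (file from to)
  "Replace every literal, case-sensitive occurrence of FROM with TO in FILE.
Hypothetical helper: FILE is read into a temp buffer and written back,
so it is never visited."
  ;; An empty FROM would match without moving point and loop forever.
  (when (string= from "")
    (error "FROM must be a non-empty string"))
  ;; with-temp-file runs its body in a fresh temp buffer, then writes
  ;; that buffer's contents to FILE, replacing the old contents.
  (with-temp-file file
    (insert-file-contents file)
    (goto-char (point-min))
    ;; Bind case-fold-search to nil so "foo" does not also match "FOO".
    (let ((case-fold-search nil))
      ;; search-forward matches FROM literally (no regexp); NOERROR is t.
      (while (search-forward from nil t)
        ;; FIXEDCASE and LITERAL are both t: insert TO exactly as given.
        (replace-match to t t)))))
```

Usage would look like (my-replace-in-file "/tmp/example.txt" "colour" "color"). If the body signals an error, with-temp-file does not write, so the file is left as it was.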
(If you don't know how to write or run an elisp script, see the basics at: Elisp: Writing Elisp Script.)
Here's the test that you can run.
For this test, the input dir is the HTML version of the GNU Emacs Lisp Reference Manual, a total of ~900 files. If you'd like to run the test, you can download it at: http://www.gnu.org/software/emacs/manual/elisp.html (download the “with one web page per node” version).
Here's the actual code.
;; 2011-12-20
;; Speed test script

;; What the script does:
;; Creates a 〔sitemap.xml〕 file.
;; Opens each file in a dir. If the file doesn't contain the word “refresh”,
;; adds an entry for the file to 〔sitemap.xml〕.

;; Must end in a slash. Must not start with ~
(setq webroot "/Users/h3/web/xahlee_org/emacs_manual/elisp/")

;; ------------------------
(defun my-process-file (fPath destBuff)
  "Process the file at fullpath FPATH.
Write result to buffer DESTBUFF."
  (with-temp-buffer
    (insert-file-contents fPath)
    (goto-char 1)
    (when (not (search-forward "refresh" nil "noerror"))
      (with-current-buffer destBuff
        (insert "<url><loc>")
        (insert (concat "http://example.org/" (substring fPath (length webroot))))
        (insert "</loc></url>\n")))))

;; ------------------------
(print
 (benchmark-run 1
   ;; create sitemap buffer
   (let (filePath sitemapBuf)
     (setq filePath (concat webroot "sitemap.xml"))
     (setq sitemapBuf (find-file filePath))
     (erase-buffer)
     (set-buffer-file-coding-system 'unix)
     (insert "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n")
     (require 'find-lisp)
     (mapc
      (lambda ($x) (my-process-file $x sitemapBuf))
      (find-lisp-find-files webroot "\\.html$"))
     (insert "</urlset>")
     (save-buffer))))

(message "%s" "Yay, Done!")
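On Emacs 25.1 and later, the built-in directory-files-recursively can also collect the file list, without requiring find-lisp. Below is a minimal sketch; the wrapper name my-html-files is hypothetical. Note that this sketch anchors the regexp with \\' (end of string) rather than $.

```elisp
(defun my-html-files (dir)
  "Return a list of the .html files under DIR, searched recursively.
Hypothetical wrapper around the built-in directory-files-recursively."
  ;; REGEXP is matched against each file's name without its directory part.
  ;; \\' anchors at the end of the name, so \"x.html.bak\" is not matched.
  (directory-files-recursively dir "\\.html\\'"))
```

Whether its output can replace (find-lisp-find-files webroot "\\.html$") in the script unchanged is worth checking on your Emacs version, because the script's substring call assumes every path begins exactly with webroot.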
To run it:
- Copy and Paste and save this file as
- Change the “webroot” variable to a directory on your computer that has lots of HTML files.
- In a terminal, run it with “--script”, like this:
emacs --script ~/speedtest_temp-buff.el
This won't load your init files. Your init files might contain hooks or other things that affect the speed.
- When the program finishes, it will have created a file named sitemap.xml in the “webroot” directory.
benchmark-run's output will be printed on the screen.
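benchmark-run returns a list of three numbers: the total elapsed time in seconds, the number of garbage collections that ran, and the seconds spent in garbage collection. Here is a minimal sketch of a hypothetical helper, my-report-benchmark, that formats that list. Treating “net” time as elapsed time minus GC time is an assumption, offered as one plausible reading of the “Actual Time” column in the results table.

```elisp
(require 'benchmark)

(defun my-report-benchmark (result)
  "Format RESULT, a list (ELAPSED GC-COUNT GC-ELAPSED) from benchmark-run.
Hypothetical helper; \"net\" here means ELAPSED minus GC-ELAPSED."
  (let ((elapsed (nth 0 result))
        (gc-count (nth 1 result))
        (gc-time (nth 2 result)))
    (format "total %.2fs, %d GCs taking %.2fs, net %.2fs"
            elapsed gc-count gc-time (- elapsed gc-time))))

;; Example: time 1000 evaluations of a trivial form and print the report.
(message "%s" (my-report-benchmark (benchmark-run 1000 (make-string 100 ?x))))
```

In the script, wrapping the existing (benchmark-run 1 ...) call with my-report-benchmark instead of print would give the same numbers in labeled form.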
What the script does is very simple:
- Open each HTML file in a directory
- If the file contains the string “refresh”, then do nothing.
- Else, add an entry for the file (its URL) to the file sitemap.xml. (This file is created by the script.)
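The script builds each entry's URL with (substring fPath (length webroot)), which is why webroot must end in a slash and must not start with ~. A possible alternative, sketched here under the assumption that the same http://example.org/ prefix is wanted, is file-relative-name, which expands both arguments first. The function name my-sitemap-entry is hypothetical.

```elisp
(defun my-sitemap-entry (file-path root)
  "Return a sitemap <url> line for FILE-PATH, a file under directory ROOT.
Hypothetical helper: file-relative-name expands both names first, so ROOT
may start with ~ and may omit the trailing slash."
  (concat "<url><loc>http://example.org/"
          ;; Path of FILE-PATH relative to ROOT, e.g. \"a/b.html\".
          (file-relative-name file-path root)
          "</loc></url>\n"))
```

For example, (my-sitemap-entry "/tmp/site/a/b.html" "/tmp/site") and the same call with "/tmp/site/" both give the line for http://example.org/a/b.html.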
This version gets the file content by using a temp buffer, like this:
(with-temp-buffer
  (insert-file-contents fPath)
  ;; do processing here
  )
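Filled in for this script's test, the temp-buffer pattern could look like the minimal sketch below; the name my-file-contains-p is hypothetical. One deliberate difference: the script's search-forward uses the default case-fold-search, which is t, so it also matches “Refresh”; the sketch binds case-fold-search to nil for a strict, case-sensitive match.

```elisp
(defun my-file-contains-p (file-path string)
  "Return non-nil if the file at FILE-PATH contains STRING (case-sensitive).
Hypothetical helper built on the with-temp-buffer pattern."
  (with-temp-buffer
    ;; Insert the decoded file contents; the file itself is not visited.
    (insert-file-contents file-path)
    (goto-char (point-min))
    ;; Case-sensitive literal search; returns point after the match, or nil.
    (let ((case-fold-search nil))
      (search-forward string nil t))))
```

The temp buffer is killed automatically when with-temp-buffer exits, so nothing needs cleaning up per file.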
The script is a simplified version of generating a sitemap. [see Elisp: Create Sitemap]
The find-file version is identical except for the my-process-file function. Like this:
(defun my-process-file (fPath destBuff)
  "Process the file at fullpath FPATH.
Write result to buffer DESTBUFF."
  (let (myBuffer)
    (setq myBuffer (find-file fPath))
    (goto-char 1)
    (when (not (search-forward "refresh" nil "noerror"))
      (with-current-buffer destBuff
        (insert "<url><loc>")
        (insert (concat "http://example.org/" (substring fPath (length webroot))))
        (insert "</loc></url>\n")))
    (kill-buffer myBuffer)))
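If you must visit files anyway, a somewhat more defensive variant is sketched below; the name my-visit-and-check is hypothetical. find-file-noselect returns the visiting buffer without switching to it, and unwind-protect kills that buffer even if the search signals an error. This does not remove the cost of visiting (mode setup and hooks still run); it only makes cleanup robust, and it leaves the caller's current buffer alone instead of depending on which buffer kill-buffer falls back to. The main loop later inserts “</urlset>” into whatever buffer is current, so that can matter.

```elisp
(defun my-visit-and-check (file-path string)
  "Visit FILE-PATH and return t if it contains STRING, else nil.
Hypothetical variant of the find-file approach: the file is still visited,
but its buffer is never made the selected buffer and is killed afterwards."
  (let* ((existing (get-file-buffer file-path))
         ;; find-file-noselect returns the visiting buffer without
         ;; switching to it or displaying it.
         (buf (or existing (find-file-noselect file-path))))
    (unwind-protect
        (with-current-buffer buf
          (save-excursion
            (goto-char (point-min))
            (let ((case-fold-search nil))
              (and (search-forward string nil t) t))))
      ;; Runs even on error; a buffer that was already visiting the file
      ;; before this call is left alone.
      (unless existing
        (kill-buffer buf)))))
```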
Here are the test results (all times are in seconds, rounded).
| Script Version | Script Running Time | Garbage Collections (count) | Garbage Collection Time | Actual Time |
Do Not Use find-file or write-file
find-file, write-file, or any other function that visits a file has many unwanted side effects, and it can be up to 40 times slower (I tested this before). Here are some examples of side effects:
- It keeps undo info.
- It syntax-colors the buffer.
- It displays the file. (very slow if you have
- It may have tons of hooks added by others.
- It may make backup files.
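For contrast, a buffer created by with-temp-buffer starts without most of these: its name begins with a space, so undo recording is disabled; it is in fundamental-mode, so no major-mode hooks or syntax coloring are set up; and it visits no file, so there is nothing to back up. The small probe below, with the hypothetical name my-temp-buffer-state, is a sketch that checks those three properties.

```elisp
(defun my-temp-buffer-state ()
  "Return a plist describing a fresh with-temp-buffer buffer.
Hypothetical probe, for contrast with the side effects of visiting a file."
  (with-temp-buffer
    ;; buffer-undo-list is t when undo recording is disabled.
    (list :undo-disabled (eq buffer-undo-list t)
          ;; New buffers start in the default major mode, fundamental-mode.
          :major-mode major-mode
          ;; nil: the temp buffer is not visiting any file.
          :visited-file buffer-file-name)))
```

Evaluating (my-temp-buffer-state) should give (:undo-disabled t :major-mode fundamental-mode :visited-file nil).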
Jon Snader (jcs) suggested using benchmark-run for the timing report and garbage collection info (http://irreal.org/blog/?p=400). benchmark-run is tremendously useful. Thanks, Jon.
Thanks to Trey Jackson for a major correction to this article. In my previous report, the timing difference was a factor of 45, because I had various personal hooks.