Elisp: Create Sitemap

By Xah Lee.

This page shows how to use Emacs Lisp to create a sitemap.

Problem

Write an elisp script to generate a sitemap. That is: create a file in sitemap format that lists all files in a directory.

Detail

A sitemap is an XML file that lists the URLs of all files in a website, for web crawlers to crawl.

A sitemap file looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">

   <url>
      <loc>http://www.example.com/</loc>
      <lastmod>2005-01-01</lastmod>
      <changefreq>monthly</changefreq>
      <priority>0.8</priority>
   </url>

   …

</urlset>

  1. The file can have many <url>…</url> items.
  2. Each <url> container represents one file, plus optional info about it.
  3. The <loc> is the URL of the file.
  4. The <lastmod>, <changefreq>, and <priority> are optional.
  5. A sitemap file can list at most 50,000 URLs.

The purpose of a sitemap file is to let web crawlers easily find all the files that exist on your site.
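
To make the <url> format concrete, here is a minimal sketch of building one entry as a string. The names my-xml-escape and my-sitemap-url-entry are hypothetical helpers made up for this illustration; they are not part of the script on this page, which inserts only <loc> and does no escaping.

```elisp
;; Hypothetical helpers for illustration; not used by the script on this page.

(defun my-xml-escape (str)
  "Return STR with the XML special characters & < > \" ' escaped."
  (replace-regexp-in-string
   "[&<>\"']"
   (lambda (m)
     ;; look up the entity for the matched character
     (cdr (assoc m '(("&" . "&amp;") ("<" . "&lt;") (">" . "&gt;")
                     ("\"" . "&quot;") ("'" . "&apos;")))))
   str t t))

(defun my-sitemap-url-entry (url &optional lastmod)
  "Return a <url> element for URL as a string.
LASTMOD, if non-nil, is a Lisp time value. It is written as YYYY-MM-DD in UTC."
  (concat "<url><loc>" (my-xml-escape url) "</loc>"
          (if lastmod
              (concat "<lastmod>"
                      (format-time-string "%Y-%m-%d" lastmod t)
                      "</lastmod>")
            "")
          "</url>\n"))

;; (my-sitemap-url-entry "http://example.com/?x=1&y=2")
;; returns "<url><loc>http://example.com/?x=1&amp;y=2</loc></url>\n"
```

For a file on disk, a time value for LASTMOD could come from (nth 5 (file-attributes "/path/to/file.html")), which is the file's last modification time.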

Solution

The general plan is very simple. Here's one way to do it.

  1. Create a new file, insert XML header tags.
  2. Traverse the web root dir. For each file, determine whether it should be listed in the sitemap.
  3. If so, generate the proper URL tag and insert it into the new file.
  4. When done visiting files, insert the XML footer tags. Save the file.

;; -*- coding: utf-8; lexical-binding: t; -*-
;; version: 2019-06-11
;; home page: /Users/xah/web/ergoemacs_org/emacs/make_sitemap.html

(require 'seq)

(defvar xah-web-root-path nil "Dir path that contains the doc root dir of each site.")
(setq xah-web-root-path "/Users/xah/web/" )

(defvar xahsite-external-docs nil "A vector of dir paths.")
(setq  xahsite-external-docs
       [
        "ergoemacs_org/emacs_manual/"
        "xahlee_info/REC-SVG11-20110816/"
        "xahlee_info/clojure-doc-1.8/"
        "xahlee_info/css_2.1_spec/"
        "xahlee_info/css_transitions/"
        "xahlee_info/js_es2011/"
        "xahlee_info/js_es2015/"
        "xahlee_info/js_es2015_orig/"
        "xahlee_info/js_es2016/"
        "xahlee_info/js_es2018/"
        "xahlee_info/node_api/"
        ])

(defun xahsite-generate-sitemap (@domain-name)
  "Generate a sitemap.xml file of xahsite at doc root.
@domain-name must match an existing one.
Version 2018-09-17"
  (interactive
   (list (ido-completing-read "choose:" '( "ergoemacs.org" "wordyenglish.com" "xaharts.org" "xahlee.info" "xahlee.org" "xahmusic.org" "xahsl.org" ))))
  (let (
        ($sitemapFileName "sitemap.xml" )
        ($websiteDocRootPath (concat xah-web-root-path (replace-regexp-in-string "\\." "_" @domain-name "FIXEDCASE" "LITERAL") "/")))
    ;; (print (concat "begin: " (format-time-string "%Y-%m-%dT%T")))
    (let (
          ($filePath (concat $websiteDocRootPath $sitemapFileName ))
          ($sitemapBuffer (generate-new-buffer "sitemapbuff")))
      (with-current-buffer $sitemapBuffer
        (set-buffer-file-coding-system 'utf-8-unix) ; match the encoding declared in the XML header
        (insert "<?xml version=\"1.0\" encoding=\"UTF-8\"?>
<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">
"))
      (mapc
       (lambda ($f)
         (let (($pageMoved-p nil)) ; bind locally; a bare setq would create a global variable
           (when (not (or
                       (string-match "/xx" $f) ; dir/file starting with xx are not public
                       (string-match "403error.html" $f)
                       (string-match "404error.html" $f)))
             (with-temp-buffer
               ;; read only the first 100 bytes, then look for the marker string
               (insert-file-contents $f nil 0 100)
               (when (search-forward "page_moved_64598" nil t)
                 (setq $pageMoved-p t)))
             (when (not $pageMoved-p)
               (with-current-buffer $sitemapBuffer
                 (insert "<url><loc>"
                         "http://" @domain-name "/" (substring $f (length $websiteDocRootPath))
                         "</loc></url>\n"
                         ))))))
       (seq-filter
        (lambda (path)
          (not (seq-some
                (lambda (x) (string-match-p (regexp-quote x) path)) ; match dir names literally, not as regex
                xahsite-external-docs
                )))
        (directory-files-recursively $websiteDocRootPath "\\.html$" )))
      (with-current-buffer $sitemapBuffer
        (insert "</urlset>")
        (write-region (point-min) (point-max) $filePath nil 3)
        (kill-buffer ))
      (find-file $filePath)
      )
    ;; (print (concat "done: " (format-time-string "%Y-%m-%dT%T")))
    ))

(defun xahsite-generate-sitemap-all ()
  "Generate the sitemap of every domain.
2016-08-15"
  (interactive)
  (xahsite-generate-sitemap "ergoemacs.org" )
  (xahsite-generate-sitemap "wordyenglish.com" )
  (xahsite-generate-sitemap "xaharts.org" )
  (xahsite-generate-sitemap "xahlee.info" )
  (xahsite-generate-sitemap "xahlee.org" )
  (xahsite-generate-sitemap "xahmusic.org" )
  (xahsite-generate-sitemap "xahsl.org"  ))

On a site with 3515 HTML files (10 times more if counting image files etc.), the script takes 5 seconds to run. (The timing is from running it a second time, so it does not count disk reading time.)
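
One way to take such a timing is sketched below. The name my-time-call is a hypothetical helper made up for this illustration, and the page does not say how the 5 second figure was actually measured.

```elisp
;; Hypothetical helper for illustration; not part of the script on this page.

(defun my-time-call (fn &rest args)
  "Call FN with ARGS once. Return the elapsed wall-clock time in seconds, as a float."
  (let ((start (float-time)))
    (apply fn args)
    (- (float-time) start)))

;; Example, assuming xahsite-generate-sitemap from this page is loaded:
;; (my-time-call #'xahsite-generate-sitemap "xahlee.info")
```

Calling it twice in a row and keeping the second number gives a timing of the kind described above.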

See also: Golang: Generate Sitemap
