Elisp: Command to Extract URL

By Xah Lee.

This page shows a command to extract all URLs from HTML text.

For example, if you have this text:

<a href="../cats.html">cats</a>, <a href="http://en.wikipedia.org/wiki/Idiom">Idiom</a>, <a class="b" href="computing.html"></a>

After calling the command, the extracted URLs are copied to the kill-ring, one URL per line.


If there's no text selection, the current paragraph (the text block delimited by blank lines) is used.

By default, relative URLs are converted to full paths. If universal-argument is called first, relative URLs are left as they are.
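
To illustrate what converting a relative URL to a full path means, here is a minimal sketch built on the built-in `expand-file-name`. The helper name `my-url-to-full-path` and the directory `/home/user/web/animals/` are made up for illustration, and the result shown assumes a Unix-style file system.

```elisp
;; Sketch only. `my-url-to-full-path' is a hypothetical helper,
;; and "/home/user/web/animals/" is a hypothetical directory.
(defun my-url-to-full-path (url dir)
  "Return URL as is if it starts with http: or https:.
Otherwise treat URL as a file path relative to directory DIR and expand it."
  (if (string-match "\\`https?:" url)
      url
    (expand-file-name url dir)))

(my-url-to-full-path "../cats.html" "/home/user/web/animals/")
;; → "/home/user/web/cats.html"

(my-url-to-full-path "http://en.wikipedia.org/wiki/Idiom" "/home/user/web/animals/")
;; → "http://en.wikipedia.org/wiki/Idiom"
```

In the command itself, the directory would be that of the file the current buffer is visiting.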


There are many ways to code this. Here's one:

(defun xah-html-extract-url (@begin @end &optional @not-full-path-p)
  "Extract URLs in current block or region to `kill-ring'.
When called interactively, copy result to `kill-ring'. Each URL in a line.

If the URL is a local file relative path, convert it to full path.

If `universal-argument' is called first, don't convert relative URL to full path.

This command extracts all text of the forms
 <‹letter› … href=‹path› …>
 <‹letter› … src=‹path› …>
that are on a single line, by regex. The quote for ‹path› may be double or single quote.

When called in lisp code, @begin @end are region begin/end positions.
Returns a list of strings.

URL `http://ergoemacs.org/emacs/elisp_extract_url_command.html'
Version 2019-07-02"
  (interactive
   (let ($p1 $p2)
     ;; set region boundary $p1 $p2
     (if (use-region-p)
         (setq $p1 (region-beginning) $p2 (region-end))
       ;; no region: use the current block, delimited by blank lines
       (save-excursion
         (if (re-search-backward "\n[ \t]*\n" nil "NOERROR")
             (progn (re-search-forward "\n[ \t]*\n")
                    (setq $p1 (point)))
           (setq $p1 (point)))
         (if (re-search-forward "\n[ \t]*\n" nil "NOERROR")
             (progn (re-search-backward "\n[ \t]*\n")
                    (setq $p2 (point)))
           (setq $p2 (point)))))
     ;; with a prefix arg, do not convert relative URLs to full paths
     (list $p1 $p2 current-prefix-arg)))

  (let (($regionText (buffer-substring-no-properties @begin @end))
        ($urlList (list)))
    ;; work on a copy of the text, so the current buffer is not modified
    (with-temp-buffer
      (insert $regionText)

      ;; put each tag on its own line, because the regex matches within a single line
      (goto-char 1)
      (while (re-search-forward "<" nil t)
        (replace-match "\n<" "FIXEDCASE" "LITERAL"))

      (goto-char 1)
      (while (re-search-forward
              "<[A-Za-z]+.+?\\(href\\|src\\)[[:blank:]]*?=[[:blank:]]*?\\([\"']\\)\\([^\"']+?\\)\\2" nil t)
        (push (match-string 3) $urlList)))
    (setq $urlList (reverse $urlList))

    ;; convert relative paths to full paths, unless told not to
    (unless @not-full-path-p
      (setq $urlList
            (mapcar
             (lambda ($x)
               (if (string-match "^http:\\|^https:" $x )
                   $x
                 (expand-file-name $x (file-name-directory (buffer-file-name)))))
             $urlList)))

    (when (called-interactively-p 'any)
      (let (($printedResult (mapconcat 'identity $urlList "\n")))
        (kill-new $printedResult)
        (message "%s" $printedResult)))
    $urlList ))
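
To try the regex outside of a buffer region, the following sketch applies the same pattern to a string. It is only an illustration: the function name `my-extract-urls-from-string` is hypothetical and not part of Xah HTML Mode, and it does no conversion of relative paths.

```elisp
;; Sketch only. `my-extract-urls-from-string' is a hypothetical name.
(defun my-extract-urls-from-string (html)
  "Return a list of the href and src values found in string HTML, in order."
  (with-temp-buffer
    (insert html)
    ;; Put each tag on its own line, because the regex matches within one line.
    (goto-char (point-min))
    (while (search-forward "<" nil t)
      (replace-match "\n<" t t))
    (goto-char (point-min))
    (let (urls)
      (while (re-search-forward
              "<[A-Za-z]+.+?\\(href\\|src\\)[[:blank:]]*?=[[:blank:]]*?\\([\"']\\)\\([^\"']+?\\)\\2"
              nil t)
        ;; Group 3 is the text between the quotes.
        (push (match-string 3) urls))
      (nreverse urls))))

(my-extract-urls-from-string
 "<a href=\"../cats.html\">cats</a>, <a class=\"b\" href=\"computing.html\"></a>")
;; → ("../cats.html" "computing.html")
```

Group 2 of the regex captures the opening quote, and the backreference `\\2` requires the same kind of quote to close the value, which is how both single and double quoted attributes are handled.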

For the latest updates of this code, see Emacs: Xah HTML Mode.
