Elisp: Change URL into HTML Link

By Xah Lee.

This page is a Lisp tutorial on writing a command that transforms the URL under the cursor into a specific form of HTML link.

Problem

I want a command that changes the URL under the cursor. For example, this:

http://some.example.com/xyz.html

becomes:

<a class="sorc" href="http://some.example.com/xyz.html" data-accessed="2015-09-12">http://some.example.com/xyz.html</a>

Note that the date is automatically inserted.

Attributes whose names start with “data-” are custom data attributes, introduced in HTML5. [see HTML5 Custom Data Attribute]

Solution

Change URL into a Link

Here's the code.

(defun xah-html-source-url-linkify (@prefixArg)
  "Make URL at cursor point into an HTML link.
If there's a text selection, use the text selection as input.

Example: http://example.com/xyz.htm
becomes
<a class=\"sorc\" href=\"http://example.com/xyz.htm\" data-accessed=\"2008-12-25\">http://example.com/xyz.htm</a>

The anchor text has 4 possibilities, depending on the value of `universal-argument'.

1 → 「‹full url›」
2 or 4 → 「‹domain›…」
3 → 「img src」
anything else, or no argument → smartly decide.

URL `http://ergoemacs.org/emacs/elisp_html-linkify.html'
Version 2015-09-12"
  (interactive "P")
  (let (
         $boundaries
         $p1-input
         $p2-input
         $input-str
         $p1-url $p2-url $p1-tag $p2-tag
         $url $domainName $linkText )

    (if (use-region-p)
        (progn
          (setq $p1-input (region-beginning))
          (setq $p2-input (region-end))
          (setq $input-str (buffer-substring-no-properties $p1-input $p2-input)))
      (progn
        (setq $boundaries (bounds-of-thing-at-point 'url))
        (unless $boundaries (user-error "No URL found at cursor")) ; avoid a type error on nil boundaries
        (setq $p1-input (car $boundaries))
        (setq $p2-input (cdr $boundaries))
        (setq $input-str (buffer-substring-no-properties $p1-input $p2-input))))

    ;; check if it's just plain URL or already in linked form 「<a href=…>…</a>」
    ;; If latter, you need to get the boundaries for the entire link too.
    (if (string-match "href=\"" $input-str)
        (save-excursion
          (search-backward "href=" (- (point) 104)) ; search boundary as extra guard for error
          (forward-char 6)
          (setq $p1-url (point))
          (search-forward "\"" (+ $p1-url 104))
          (setq $p2-url (- (point) 1))

          (goto-char $p1-url)
          (search-backward "<a" (- $p1-url 30))
          (setq $p1-tag (point))
          (goto-char $p2-url)
          (search-forward "</a>" (+ $p2-url 140))
          (setq $p2-tag (point)))
      (progn
        (setq $p1-url $p1-input)
        (setq $p2-url $p2-input)
        (setq $p1-tag $p1-input)
        (setq $p2-tag $p2-input)))

    (setq $url (replace-regexp-in-string "&amp;" "&" (buffer-substring-no-properties $p1-url $p2-url) nil "LITERAL")) ; in case it's already encoded. TODO this is only 99% correct.

    ;; get the domain name (nil if the URL has no 「://」 part)
    (setq $domainName
          (when (string-match "://\\([^/]+\\)" $url)
            (match-string 1 $url)))

    (setq $linkText
          (cond
           ((equal @prefixArg 1) $url) ; full url
           ((or (equal @prefixArg 2) (equal @prefixArg 4) (equal @prefixArg '(4))) (concat $domainName "…")) ; ‹domain›…
           ((equal @prefixArg 3) "img src") ; img src
           (t (if
                  (let ((case-fold-search t)) ; match JPG, PNG, SVG too
                    (string-match "wikipedia\\.org.+\\(jpg\\|png\\|svg\\)$" $url))
                  "img src"
                $url
                )) ; smart
           ))

    (setq $url (replace-regexp-in-string "&" "&amp;" $url))

    ;; delete URL and insert the link
    (delete-region $p1-tag $p2-tag)
    (insert (format
             "<a class=\"sorc\" href=\"%s\" data-accessed=\"%s\">%s</a>"
             $url (format-time-string "%Y-%m-%d") $linkText
             ))))

The code is easy to understand. If you find it difficult, see Elisp: Writing a Wrap-URL Function, which has more explanation.
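
To see the string work apart from the buffer handling, here is a minimal sketch of just the formatting step. The function name my-url-to-sorc-link and its optional arguments are made up for this illustration and are not part of the command above. One difference, labeled as an assumption: the sketch uses the “&amp;” encoded URL for the default link text as well as for the href.

```elisp
(defun my-url-to-sorc-link (url &optional link-text date)
  "Return an HTML link string for URL, in the 「sorc」 format.
LINK-TEXT defaults to the encoded URL.
DATE defaults to today, in yyyy-mm-dd format.
Hypothetical helper, for illustration only."
  (let* (;; decode first, in case URL is already encoded
         (raw (replace-regexp-in-string "&amp;" "&" url t t))
         ;; then encode every ampersand exactly once
         (encoded (replace-regexp-in-string "&" "&amp;" raw t t)))
    (format "<a class=\"sorc\" href=\"%s\" data-accessed=\"%s\">%s</a>"
            encoded
            (or date (format-time-string "%Y-%m-%d"))
            (or link-text encoded))))
```

For example, (my-url-to-sorc-link "http://some.example.com/xyz.html" nil "2015-09-12") returns the link shown in the Problem section. Decoding before encoding means a URL that already contains “&amp;” is not encoded twice.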

Transform HTML Link into Dead Link Markup

I also want a command to transform a link to a dead-link format. For example, a link like this:

<a class="sorc" href="http://some.example.com/" data-accessed="2000-01-01">http://some.example.com/</a>

becomes:

<s data-accessed="2000-01-01" data-defunct-date="2015-09-12">http://some.example.com/</s>

with today's date added to the “data-defunct-date” part.

The following is the code to turn a link into a dead link format.

(defun xah-html-make-link-defunct ()
  "Make the HTML link under cursor into a defunct form.
Example:
If cursor is inside this tag
 <a class=\"sorc\" href=\"http://example.com/\" data-accessed=\"2008-12-26\">…</a>
 (and inside the opening tag.)

It becomes:

 <s data-accessed=\"2008-12-26\" data-defunct-date=\"2014-01-11\">http://example.com/</s>

URL `http://ergoemacs.org/emacs/elisp_html-linkify.html'
Version 2015-09-12"
  (interactive)
  (let ($p1 $p2 $wholeLinkStr $newLinkStr $url $accessedDate)
    (save-excursion
      ;; get the boundaries of the whole link, from 「<a 」 to 「</a>」
      (forward-char 3) ; so the search still finds 「<a 」 when cursor is at its start
      (search-backward "<a " ) (setq $p1 (point))
      (search-forward "</a>") (setq $p2 (point))

      ;; get $wholeLinkStr
      (setq $wholeLinkStr (buffer-substring-no-properties $p1 $p2))

      ;; generate replacement text
      (with-temp-buffer
        (insert $wholeLinkStr)

        (goto-char 1)
        (re-search-forward  "href=\"\\([^\"]+?\\)\"")
        (setq $url (match-string 1))

        (re-search-forward  "data-accessed=\"\\([0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]\\)\"")
        (setq $accessedDate (match-string 1))

        (setq $newLinkStr (format "<s data-accessed=\"%s\" data-defunct-date=\"%s\">%s</s>" $accessedDate (format-time-string "%Y-%m-%d") $url ))))

    (delete-region $p1 $p2)
    (insert $newLinkStr)))

The HTML s tag means strike thru. [see HTML: What's the Difference Between “s” vs “strike” vs “del” Tags]
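
The same transformation can also be written as a string-to-string function, which is easier to test than a buffer command. This is only a sketch: the name my-link-to-defunct and the optional date argument are hypothetical, and it assumes the link has the href and data-accessed attributes in the format shown above.

```elisp
(defun my-link-to-defunct (link &optional defunct-date)
  "Return the dead-link form of LINK, a 「sorc」 link string.
DEFUNCT-DATE defaults to today, in yyyy-mm-dd format.
Return nil if LINK lacks an href or a data-accessed date.
Hypothetical helper, for illustration only."
  (let ((url (and (string-match "href=\"\\([^\"]+\\)\"" link)
                  (match-string 1 link)))
        (accessed (and (string-match
                        "data-accessed=\"\\([0-9]\\{4\\}-[0-9]\\{2\\}-[0-9]\\{2\\}\\)\""
                        link)
                       (match-string 1 link))))
    ;; only build the replacement when both parts were found
    (when (and url accessed)
      (format "<s data-accessed=\"%s\" data-defunct-date=\"%s\">%s</s>"
              accessed
              (or defunct-date (format-time-string "%Y-%m-%d"))
              url))))
```

Unlike the command above, this version returns nil instead of signaling an error when the input is not a link in the expected format, so a caller can decide what to do.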

For the latest version of this code, see: Emacs: Xah HTML Mode.

Notes About Preserving Dead Links in Blogs

Here's some detail about why I want to have this command.

In writing blogs, you often need to cite links. The links may be to other blogs, news sites, or some random site. Many such URLs are ephemeral. They exist today, but may become dead links a few months later. Typically, if the URL doesn't have a dedicated domain, it is more likely to go bad sooner.

Over the years, you may have thousands of links in your blog. When you update your pages years later, you find dead links such as 〔http://someRandomBlog.org/importantToday.html〕, and may not remember what that link was about. No author, no title, no idea when that link became dead. Sometimes, the owner of the link's domain name has changed, so the linked page has become a spam site.

One partial solution is to add access date together with the link, like this:

<p>I found a fantastic <a href="http://some.example.com/xyz.html">emacs blog</a>
(accessed on 2010-12-03) today!</p>

With an access date, at least you know when the link was good. If the link went bad, you or your readers can at least try to see the link thru a web archive site such as the Wayback Machine.

However, this requires manual insertion of the date. Also, the “accessed on” info in your content is very distracting.

It would be better if the access date were embedded in the link.

HTML5 introduced custom data attributes, which make this possible. [see HTML5 Custom Data Attribute]

A uniform format for all your links is good, because if HTML6 or some HTML microformat later adds a way to attach an access date to links, I can write a script that reliably changes all external links to the new format.
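
As a rough illustration of why a uniform format helps, here is a sketch of the first step such a script might take: finding every link in this format and extracting its URL and access date. The function name my-collect-sorc-links is hypothetical, and the regexp assumes the exact attribute order used on this page (class, href, data-accessed).

```elisp
(defun my-collect-sorc-links (html)
  "Return a list of (URL . ACCESSED-DATE) for each 「sorc」 link in HTML.
HTML is a string. Assumes the attribute order class, href, data-accessed.
Hypothetical helper, for illustration only."
  (let ((result '()))
    (with-temp-buffer
      (insert html)
      (goto-char (point-min))
      ;; one fixed pattern is enough because every link has the same shape
      (while (re-search-forward
              "<a class=\"sorc\" href=\"\\([^\"]+\\)\" data-accessed=\"\\([-0-9]+\\)\">"
              nil t)
        (push (cons (match-string 1) (match-string 2)) result)))
    ;; push builds the list in reverse, so restore document order
    (nreverse result)))
```

Links written by hand in other shapes, such as a plain <a href="…">, are skipped by this pattern. That is the point of generating every link with one command: a single regexp can find them all later.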

Change URL into Citation Format

If you need a citation format, such as one with {author, title, date} info, see:

Elisp: Writing a make-citation Command
