
Emacs Lisp: Using thing-at-point


This page shows you how to use emacs lisp's thing-at-point function, discusses some of its problems, and suggests a solution.

Purposes of Elisp Code

In coding emacs lisp, there are 2 major types of purpose.

For text processing, i also see 2 major categories:

Many emacs commands that are used every few minutes are for interactive text processing. ⁖ {comment-dwim, fill-paragraph, query-replace, kill-rectangle, sort-lines, reverse-region, list-matching-lines, delete-trailing-whitespace, indent-region, just-one-space, delete-blank-lines, downcase-region, find-file-at-point, …}.

Writing interactive commands is probably the most useful thing for beginning elisp coders. I have 100+ personal commands for interactive text processing. Typically, i press a key, then the text under cursor changes to another form. 〔☛ Emacs Lisp Power! Transform Text Under Cursor〕

thing-at-point & bounds-of-thing-at-point

When writing interactive commands, one of the most useful functions is thing-at-point. Here's an excerpt from its inline doc:

Return the THING at point. THING is a symbol which specifies the kind of syntactic entity you want. Possibilities include `symbol', `list', `sexp', `defun', `filename', `url', `email', `word', `sentence', `whitespace', `line', `page' and others.

Here's an example.

(defun test ()
  "Show the current word in the echo area."
  (interactive)
  (message "%s" (thing-at-point 'word)))

thing-at-point lets you get the string of the {current word, current line, current sentence, paragraph, file, URL, …}, that's under cursor. Without it, you typically have to code about 5 to 10 lines, using functions such as {search-forward, skip-chars-forward} to find the boundary. Then you need to set the positions to variables. Then call buffer-substring to get the string. Also, you need to wrap the whole thing with save-excursion so that the cursor does not jump to unexpected places when your command is finished.
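For comparison, here is a rough sketch of the manual approach just described. The function name my-word-at-point-manual and the character set (ASCII letters, digits, hyphen) are assumptions chosen for illustration; they are not part of emacs.

```emacs-lisp
(defun my-word-at-point-manual ()
  "Return the run of letters, digits, and hyphens around point.
Hypothetical helper, written without `thing-at-point'."
  (let (p1 p2)
    ;; save-excursion puts the cursor back after we move it around
    (save-excursion
      (skip-chars-backward "A-Za-z0-9-") ; find the left boundary
      (setq p1 (point))
      (skip-chars-forward "A-Za-z0-9-")  ; find the right boundary
      (setq p2 (point)))
    (buffer-substring-no-properties p1 p2)))
```

Even this small version needs two boundary searches, two variables, and a save-excursion. That is the boilerplate thing-at-point saves you.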

bounds-of-thing-at-point

An associated function is bounds-of-thing-at-point. It returns the beginning and end positions of the unit under cursor, as a cons pair, or nil if there is no such unit. This is useful because you often need to know a thing's boundary in order to delete it (using (delete-region ‹position1› ‹position2›)) and replace it with some transformed string.

Here's an example.

(defun test ()
  "Example of using `bounds-of-thing-at-point'."
  (interactive)
  (let* ((boundaries (bounds-of-thing-at-point 'word))
         (pos1 (car boundaries))
         (pos2 (cdr boundaries)))
    ;; boundaries is nil when there is no word under cursor
    (if boundaries
        (message "thing begin at 「%s」, end at 「%s」, thing is 「%s」"
                 pos1 pos2 (buffer-substring-no-properties pos1 pos2))
      (message "no word under cursor"))))
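The delete-and-replace use mentioned above might look like the following minimal sketch. The command name my-upcase-word-at-point is hypothetical, and upcase merely stands in for whatever transformation you need.

```emacs-lisp
(defun my-upcase-word-at-point ()
  "Replace the word under cursor with its upper-case form.
Hypothetical example command."
  (interactive)
  (let ((bounds (bounds-of-thing-at-point 'word)))
    ;; do nothing when there is no word under cursor
    (when bounds
      (let* ((p1 (car bounds))
             (p2 (cdr bounds))
             (new-text (upcase (buffer-substring-no-properties p1 p2))))
        (delete-region p1 p2)
        (goto-char p1)
        (insert new-text)))))
```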

Problems of thing-at-point

After using thing-at-point for several years, i started to get slightly annoyed by some of its problems. Here are the problems i see:

Behavior Depends on Syntax Table

When you call (thing-at-point 'word), what string you get exactly depends on the syntax table of the current mode.

For example, if you always want your “word” to mean any alphanumeric plus hyphen, you can't rely on thing-at-point to give you the right thing, because it may include underscore, or may not include hyphen, or may include apostrophe, depending on the current major mode's syntax table.

A similar problem applies to “'sentence” and “'paragraph”, whose boundaries also depend on the current mode's settings.
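The dependence on the syntax table can be seen with a small experiment. This is only a sketch: the function name my-word-at-point-demo is hypothetical, and it assumes a temp buffer that starts out with the standard syntax table, where underscore has symbol syntax rather than word syntax.

```emacs-lisp
(defun my-word-at-point-demo (underscore-is-word)
  "Return the word at the start of a temp buffer holding \"foo_bar\".
If UNDERSCORE-IS-WORD is non-nil, give underscore word syntax first.
Hypothetical helper, for demonstration only."
  (with-temp-buffer
    ;; use a private table so the standard syntax table is not modified
    (set-syntax-table (make-syntax-table))
    (when underscore-is-word
      (modify-syntax-entry ?_ "w"))
    (insert "foo_bar")
    (goto-char (point-min))
    (thing-at-point 'word)))

(my-word-at-point-demo nil) ; underscore keeps its default syntax
(my-word-at-point-demo t)   ; underscore is now a word character
```

The same text and the same cursor position should give "foo" for the first call and "foo_bar" for the second. A major mode that changes the syntax class of underscore, hyphen, or apostrophe has the same effect on your command.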

Inconsistent Behavior for 「'line」

When you call (thing-at-point 'line), it will return the line with the newline character. However, if the line is at the end of buffer, then no newline is included.

This means you have to write extra code to add or remove the newline at the end of the string.

If you want the line, use this instead: (buffer-substring-no-properties (line-beginning-position) (line-end-position)). 〔☛ All About Processing Lines in Emacs Lisp〕
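A small sketch comparing the two approaches on a buffer whose last line has no trailing newline. The names my-current-line and my-line-demo are hypothetical.

```emacs-lisp
(defun my-current-line ()
  "Return the current line as a string, never including the newline.
Hypothetical helper."
  (buffer-substring-no-properties
   (line-beginning-position) (line-end-position)))

(defun my-line-demo ()
  "Return what `thing-at-point' and `my-current-line' give on two lines."
  (with-temp-buffer
    (insert "first\nlast")            ; no newline after the last line
    (goto-char (point-min))
    (let ((a (thing-at-point 'line))  ; on the first line
          (b (my-current-line)))
      (forward-line 1)                ; move to the last line
      (list a b (thing-at-point 'line) (my-current-line)))))
```

Evaluating (my-line-demo) should give ("first\n" "first" "last" "last"): the thing-at-point result carries a newline only on the first line, while the line-beginning-position form is the same on both.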

Problems with Grabbing 「'url」

What thing-at-point returns is not necessarily the exact text under cursor.

For example, when the URL you want to grab does not start with “http”, it adds it. ⁖ if the text under cursor is xahlee_org/emacs/elisp.html , it'll return http://xahlee_org/emacs/elisp.html. This is annoying.

Sometimes i just want to grab a sequence of chars that may be a file path or a URL, in HTML text such as href="my_cat.html" or href="http://example/my_cat.html". You do not know which in advance, but after you have the string you can test it by checking for “http” or other things. But if you use thing-at-point with 'filename or 'url, it may alter the string in ways you didn't expect.
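One way to grab such a sequence unchanged is to scan outward from the cursor until a delimiter is hit. This is a minimal sketch, not a built-in: the function name my-path-or-url-at-point and the choice of delimiters (whitespace, quotes, angle brackets) are assumptions for illustration.

```emacs-lisp
(defun my-path-or-url-at-point ()
  "Return the text around point, bounded by whitespace, quotes, or angle brackets.
Hypothetical helper.  The buffer text is returned exactly as written."
  (let ((non-delimiters "^ \t\n\"'<>") ; skip anything that is not a delimiter
        p1 p2)
    (save-excursion
      (skip-chars-backward non-delimiters)
      (setq p1 (point))
      (skip-chars-forward non-delimiters)
      (setq p2 (point)))
    (buffer-substring-no-properties p1 p2)))
```

Because nothing is added or removed, the caller can decide afterwards whether the string is a URL, for example with (string-match "\\`https?://" str).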

(thing-at-point 'url) gets confused if the URL contains parentheses. ⁖ http://en.wikipedia.org/wiki/Oz_(programming_language). (fixed in emacs 23.2)

Test Code

Here's some simple test code to see what thing-at-point returns.

(defun xx ()
  "temp function for testing what `thing-at-point' returns"
  (interactive)
  (let (myresult)
    (setq myresult (thing-at-point 'url))
    (message "%s" myresult)
    ))

Get Text Selection or Unit at Current Cursor Position

Starting with emacs 23.x, text selection is highlighted by default. (this means: transient-mark-mode is on by default. 〔☛ Emacs: What's Region, Active Region, transient-mark-mode?〕) There's a new user interface idiom. When there is a text selection, the command will act on the text selection. Otherwise, the command acts on the current word, line, paragraph, buffer, …, whichever is appropriate for the command. This is great because users don't have to think about whether to call the “-region” version of the command. 〔☛ New Features in Emacs 23〕

When you write a command to do this, the code typically looks like this:

;; get current selection or word
(let (bds p1 p2 inputStr resultStr)

  ;; get boundary
  (if (region-active-p)
      (setq bds (cons (region-beginning) (region-end) ))
      (setq bds (bounds-of-thing-at-point 'word)) )
  (setq p1 (car bds) )
  (setq p2 (cdr bds) )

  ;; grab the string
  (setq inputStr (buffer-substring-no-properties p1 p2)  )

  ;; do something with inputStr here, and set resultStr to the new string

  (delete-region p1 p2 ) ; delete the region
  (insert resultStr) ; insert new string
 )

It takes about 6 lines to get the boundary and the string. If you are grabbing a line, then you need a few more lines to handle the newline at the end of the line.

Alternative Solution: “get-selection-or-unit” & “unit-at-cursor”

Because i need to grab the text so often, i got tired of repeatedly writing these 10 or so lines. I wrote a function that does this. See: Emacs Lisp: get-selection-or-unit.
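The linked function is not reproduced here. As a rough idea of what such a wrapper can look like, here is a minimal sketch. The names my-bounds-of-region-or-thing and my-bracket-region-or-word are hypothetical, and this is not the code of get-selection-or-unit.

```emacs-lisp
(defun my-bounds-of-region-or-thing (thing)
  "Return (START . END) of the text selection, else of THING under cursor.
Return nil when there is neither.  Hypothetical helper."
  (if (use-region-p)
      (cons (region-beginning) (region-end))
    (bounds-of-thing-at-point thing)))

(defun my-bracket-region-or-word ()
  "Wrap the text selection, or else the word under cursor, in square brackets.
Hypothetical example command built on the helper above."
  (interactive)
  (let ((bds (my-bounds-of-region-or-thing 'word)))
    (when bds
      (let* ((p1 (car bds))
             (p2 (cdr bds))
             (result (concat "[" (buffer-substring-no-properties p1 p2) "]")))
        (delete-region p1 p2)
        (goto-char p1)
        (insert result)))))
```

With the boundary logic in one place, each new command only has to supply the transformation.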
