This page shows you how to use emacs lisp's thing-at-point function, discusses some of its problems, and suggests a solution.
In coding emacs lisp, there are 2 major types of purpose.
For text processing, i also see 2 major categories:
Many emacs commands that are used every few minutes are for interactive text processing.
Writing interactive commands is probably the most useful thing for beginning elisp coders. I have 100+ personal commands for interactive text processing. Typically, i press a key, then the text under cursor changes to another form. 〔☛ Emacs Lisp Power! Transform Text Under Cursor〕
When writing interactive commands, one of the most useful functions is thing-at-point. Here's an excerpt from its online doc:
Return the THING at point. THING is a symbol which specifies the kind of syntactic entity you want. Possibilities include `symbol', `list', `sexp', `defun', `filename', `url', `email', `word', `sentence', `whitespace', `line', `page' and others.
thing-at-point basically lets you get the string of the current word, current line, current sentence, paragraph, file name, URL, etc, that's under cursor. Without it, you typically have to code about 5 to 10 lines: use skip-chars-backward and skip-chars-forward to find the boundaries, save the positions to variables, then call buffer-substring to get the string. Also, you need to wrap the whole thing with save-excursion so that the cursor does not jump to unexpected places when your command is finished.
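Here is a minimal sketch of that manual approach, for comparison. The function name my-token-at-point and the choice of “letters, digits, hyphen” as the character set are assumptions made for illustration. They are not part of emacs.

```elisp
;; Hypothetical helper, not part of Emacs: grab the run of letters,
;; digits, and hyphens around point without using `thing-at-point'.
(defun my-token-at-point ()
  "Return the run of letters, digits, and hyphens around point as a string.
Return an empty string when no such character is next to point."
  (let ((chars "-A-Za-z0-9") ; a leading hyphen is taken literally
        p1 p2)
    ;; save-excursion restores the cursor position afterwards
    (save-excursion
      (skip-chars-backward chars)
      (setq p1 (point))
      (skip-chars-forward chars)
      (setq p2 (point)))
    (buffer-substring-no-properties p1 p2)))
```

Because the character set is spelled out in the code, the result does not depend on the current major mode.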
A related function is bounds-of-thing-at-point. It returns the thing's start and end positions as a cons pair. This is useful because sometimes you need to delete the thing (using (delete-region ‹position1› ‹position2›)) and replace it with some transformed string.
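The delete-and-replace pattern just described might look like the following sketch. The command name my-upcase-word-at-point and the use of upcase as the transformation are placeholders chosen for illustration.

```elisp
(require 'thingatpt)

;; Hypothetical example command, for illustration only.
(defun my-upcase-word-at-point ()
  "Replace the word at point with its upper-case form.
Do nothing when there is no word at point."
  (interactive)
  (let ((bds (bounds-of-thing-at-point 'word)))
    (when bds
      (let* ((p1 (car bds))
             (p2 (cdr bds))
             (new (upcase (buffer-substring-no-properties p1 p2))))
        (save-excursion
          (delete-region p1 p2) ; remove the old text
          (goto-char p1)
          (insert new))))))     ; put the transformed text in its place
```

The check on bds matters: bounds-of-thing-at-point returns nil when there is no such thing at point.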
After using thing-at-point for several years, i started to get slightly annoyed by some of its problems. Here are the problems i see:
When you call
(thing-at-point 'word), what string you get exactly depends on the syntax table of the current mode.
For example, if you always want your “word” to mean any alphanumeric plus hyphen, you can't rely on
thing-at-point to give you the right thing, because it may include underscore, or may not include hyphen, or may include apostrophe, depending on the current major mode's syntax table.
This problem also applies to 'sentence and 'paragraph.
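A small sketch can make the syntax table dependence visible. The helper my-word-with-syntax is hypothetical. It builds a temporary syntax table in which one character gets a chosen syntax class, then asks for the word at the start of the text.

```elisp
(require 'thingatpt)

;; Hypothetical helper, for illustration only.
(defun my-word-with-syntax (text char class)
  "Return the word at the start of TEXT when CHAR has syntax CLASS.
CLASS is a syntax descriptor string such as \"w\" (word) or \"_\" (symbol)."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let ((table (make-syntax-table))) ; inherits the standard syntax table
      (modify-syntax-entry char class table)
      (with-syntax-table table
        (thing-at-point 'word)))))

(my-word-with-syntax "my_cat" ?_ "w") ; underscore counts as a word char
(my-word-with-syntax "my_cat" ?_ "_") ; underscore counts as a symbol char
```

The first call should give "my_cat" and the second "my", even though the text and cursor position are the same. Major modes differ in just this way.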
When you call
(thing-at-point 'line), it will return the line with the end of line (EOL) character. However, if the line is the last line of the buffer and has no trailing newline, then no EOL is included.
This means you have to write extra code to add or truncate the last char of the line.
If you want the line, use this instead:
(buffer-substring-no-properties (line-beginning-position) (line-end-position)).
〔☛ All About Processing Lines in Emacs Lisp〕
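The EOL difference can be checked in a temp buffer. This is a sketch. The function name my-line-demo is made up, and the buffer content is test data.

```elisp
(require 'thingatpt)

;; Hypothetical demo function, for illustration only.
(defun my-line-demo ()
  "Return a list showing how `thing-at-point' treats EOL on two lines."
  (with-temp-buffer
    (insert "one\ntwo")           ; the last line has no trailing newline
    (goto-char 2)                 ; inside "one"
    (let ((first-with-eol (thing-at-point 'line))
          (first-no-eol (buffer-substring-no-properties
                         (line-beginning-position) (line-end-position))))
      (goto-char 6)               ; inside "two"
      (list first-with-eol first-no-eol (thing-at-point 'line)))))

(my-line-demo) ; ⇒ ("one\n" "one" "two")
```

The first and third items come from the same call form, yet only the first carries the EOL.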
The string that (thing-at-point 'url) returns is not necessarily the exact text under cursor.
For example, when the URL you want to grab does not start with “http”, it adds it.
⁖ if the text under cursor is
xahlee_org/emacs/elisp.html , it'll return
http://xahlee_org/emacs/elisp.html. This is annoying.
Sometimes i just want to grab a sequence of chars that may be a file path or a URL, in HTML text such as
href="http://example/my_cat.html". You do not know which in advance, but after you get the thing you can test it by checking for “http” or other things. But if you use
'url, it does things to the string that you didn't expect.
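One way to get the raw text is to skip over everything that is not a delimiter. The following is a sketch under that assumption. The names my-path-or-url-at-point and my-url-string-p are hypothetical, and the delimiter set (whitespace, quotes, angle brackets) is a guess that suits HTML attributes. Adjust it for other text.

```elisp
;; Hypothetical helpers, not part of Emacs.
(defun my-path-or-url-at-point ()
  "Return the text around point up to whitespace, quotes, or angle brackets.
The text is returned exactly as it appears in the buffer."
  (let ((non-delims "^ \t\n\"'<>") ; leading ^ means: any char except these
        p1 p2)
    (save-excursion
      (skip-chars-backward non-delims)
      (setq p1 (point))
      (skip-chars-forward non-delims)
      (setq p2 (point)))
    (buffer-substring-no-properties p1 p2)))

(defun my-url-string-p (str)
  "Return non-nil if STR starts with http:// or https://."
  (string-match-p "\\`https?://" str))
```

With the cursor inside the quoted value of href="http://example/my_cat.html", the first function should return http://example/my_cat.html unchanged, and the second can then tell a URL from a file path.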
(thing-at-point 'url) gets confused if the URL contains parentheses. (fixed in emacs 23.2)
Here's a simple test code to see what thing-at-point returns:
(defun xx ()
  "temp function for testing what `thing-at-point' returns"
  (interactive)
  (let (myresult)
    (setq myresult (thing-at-point 'url))
    (message "〔%s〕" myresult)))
Starting with emacs 23.x, text selection is highlighted by default. (this means:
transient-mark-mode is on by default. 〔☛ Emacs: What's Region, Active Region, transient-mark-mode?〕) There's a new user interface idiom. When there is a text selection, the command will act on the text selection. Otherwise, the command acts on the current word, line, paragraph, buffer, …, whichever is appropriate for the command. This is great because users don't have to think about whether to call the “-region” version of the command. 〔☛ New Features in Emacs 23〕
When you write a command to do this, the code typically looks like this:
;; get current selection or word
(let (bds p1 p2 inputStr resultStr)
  ;; get boundary
  (if (region-active-p)
      (setq bds (cons (region-beginning) (region-end)))
    (setq bds (bounds-of-thing-at-point 'word)))
  (setq p1 (car bds))
  (setq p2 (cdr bds))
  ;; grab the string
  (setq inputStr (buffer-substring-no-properties p1 p2))
  ;; do something with inputStr here
  (delete-region p1 p2) ; delete the region
  (insert resultStr)    ; insert new string
  )
It takes about 6 lines to get the boundary and the string. If you are grabbing a line, then you need a few more lines to check EOL.
Because i need to grab the text so often, i got tired of repeatedly writing these 10 or so lines. I wrote a function that does this. See: Emacs Lisp: get-selection-or-unit.
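To give a rough idea of what such a helper can look like, here is a minimal sketch. It is not the get-selection-or-unit function itself. The name my-region-or-thing and its return format (a list of string, start, end) are assumptions made for this example.

```elisp
(require 'thingatpt)

;; Hypothetical helper; NOT the get-selection-or-unit function.
(defun my-region-or-thing (thing)
  "Return (STRING BEG END) for the active region, else for THING at point.
THING is a symbol accepted by `bounds-of-thing-at-point', such as `word'.
Return nil when there is no active region and no THING at point."
  (let ((bds (if (region-active-p)
                 (cons (region-beginning) (region-end))
               (bounds-of-thing-at-point thing))))
    (when bds
      (list (buffer-substring-no-properties (car bds) (cdr bds))
            (car bds)
            (cdr bds)))))
```

A command can then call (my-region-or-thing 'word) once and get both the string and the boundary. This sketch does not handle the EOL issue for 'line. That would need the extra check described earlier.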