A Text Editor Feature: Extend Selection by Semantic Unit

By Xah Lee.

This article introduces a feature in the Mathematica IDE that could be useful in any editor and for any language.

In Mathematica, a user can press a key 【Ctrl+.】, and the token the cursor is on will be selected (highlighted). When the key is pressed again, the selection expands to the next larger semantic unit that encloses it. Each further press extends it again.

Example with Mathematica Syntax

Here is an example of Mathematica code with highlights showing its extend-selection behavior, starting at the “n” inside the braces and extending outward to cover each higher-level syntactic unit.

Table[n/(n + 1), {n, 1, 10, 1/2}]

Here are some different scenarios showing the extend behavior for different starting cursor positions:

Table[n/(n + 1), {n, 1, 10, 1/2}]
Table[n/(n + 1), {n, 1, 10, 1/2}]
Table[n/(n + 1), {n, 1, 10, 1/2}]

(If the above does not render well in an older browser, see a rendered image here: syntax_highlight_mma.png.)

Examples for C-Like Syntax

Here are some examples in a language with C-like syntax (C, C++, C#, Java, JavaScript, and others).

class PrintMe {public main(String[] args) {print("Nice!");}}
class PrintMe {public main(String[] args) {print("Nice!");}}
class PrintMe {public main(String[] args) {print("Nice!");}}
/* print out a string */
/* print out a string */


Nested Syntax Examples: XML

For a language with nested syntax, suppose we have this XML example:

<entry>
  <title>Gulliver's Travels</title>
  <summary>Annotated a chapter of Gulliver's Travels</summary>
  <link rel="alternate" href="../p/Gullivers_Travels/gt3ch05.html"/>
</entry>


If the cursor is inside a tag's enclosed content, say, on the letter T in the string “Gulliver's Travels” inside the <title> tag, then the repeated extension is obvious. But suppose the cursor is on the t in “alternate” inside the “link” tag. Then it would first select the whole word “alternate”, then expand to include the double quotes “"alternate"”, then the whole attribute “rel="alternate"”, then the whole link tag, then the whole content of the entry tag, then the <entry> tags themselves.

Lisp Example

For lisp, the syntax is almost purely nested parentheses (exceptions are chars such as ; ' , @ | # that have special syntactic meanings). Here are some examples of how this feature would work in lisp.

(defun insertMe () (interactive) (insert "«»") (backward-char 1))
(defun insertMe () (interactive) (insert "«»") (backward-char 1))
(defun insertMe () (interactive) (insert "«»") (backward-char 1))
(defun insertMe () (interactive) (insert "«»") (backward-char 1))


Note: emacs's lisp mode provides several functions to traverse nested syntax: backward-sexp, forward-sexp, backward-up-list, down-list, backward-list, forward-list, mark-sexp. With these, it is relatively trivial to implement the above extend-selection-semantic-unit function. You just need to call one of the sexp-walking functions to move the cursor to the right place, then call mark-sexp.


In summary, this extend selection feature is a lexical syntax tree walker. Each invocation will go up one level on the syntax tree and select all its branches.
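
To make the tree-walking idea concrete, here is a minimal sketch in Python. It is not how Mathematica does it. It assumes a full parser is available (here Python's standard ast module, parsing a single line of ASCII Python code), and the function name enclosing_spans is made up for this illustration. It collects every syntax node whose span contains the cursor, smallest first, which is the sequence of selections that repeated key presses would walk through.

```python
import ast


def enclosing_spans(source, pos):
    """Return (start, end) spans of syntax nodes containing pos, smallest first.

    Assumption: `source` is one line of ASCII Python code, so the column
    offsets reported by `ast` are the same as string indices.
    """
    spans = set()  # a set, because a node and its parent can share a span
    for node in ast.walk(ast.parse(source)):
        start = getattr(node, "col_offset", None)
        end = getattr(node, "end_col_offset", None)
        if start is not None and end is not None and start <= pos < end:
            spans.add((start, end))
    # Spans that contain the same position are nested, so sorting by
    # length orders them from innermost to outermost.
    return sorted(spans, key=lambda span: span[1] - span[0])


source = "print(f(n + 1))"
cursor = source.index("n +")  # cursor on the variable n
for start, end in enclosing_spans(source, cursor):
    print(source[start:end])
```

Each printed line is one step of the extension: first the token, then the expression around it, then the enclosing function calls.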

Ideally, the editor includes a full parser for the language, and is able to use the parser to read in source code and regenerate it on the fly, for purposes such as reformatting the code. However, that is only the ideal situation. Emacs includes a full parser for some languages, such as elisp, XML (nxml mode), and JavaScript (js2 mode), but not for most languages. Also, parsers often discard comments, which makes them unusable for this purpose. And parsers expect the code to be valid, so they cannot be used on code that is in the middle of being edited.

A more practical solution is to base the algorithm on a text processing approach using a simple lexical scanner (which is in fact the case in all emacs's language modes, even for elisp mode).
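
The text-processing approach can be sketched as follows, in Python. This is only an illustration under stated assumptions, not the code of any emacs mode: the function name expand is made up, it knows only the three bracket pairs ( ) [ ] { }, it does not check that an opening bracket matches the type of its closing bracket, and it does not skip brackets inside strings or comments. Given the current selection, it scans left and right for the nearest unmatched brackets, selects the text inside them, and on the next call includes the brackets themselves.

```python
OPENERS = set("([{")
CLOSERS = set(")]}")


def expand(text, start, end):
    """Grow the selection text[start:end] to the next enclosing bracketed unit.

    Returns a new (start, end) pair, or the same pair if nothing encloses it.
    Assumption: brackets inside strings or comments are NOT treated specially.
    """
    # Scan left for the nearest opening bracket that is not matched.
    depth = 0
    left = start - 1
    while left >= 0:
        ch = text[left]
        if ch in CLOSERS:
            depth += 1
        elif ch in OPENERS:
            if depth == 0:
                break
            depth -= 1
        left -= 1
    if left < 0:
        return (start, end)

    # Scan right for the nearest closing bracket that is not matched.
    depth = 0
    right = end
    while right < len(text):
        ch = text[right]
        if ch in OPENERS:
            depth += 1
        elif ch in CLOSERS:
            if depth == 0:
                break
            depth -= 1
        right += 1
    if right >= len(text):
        return (start, end)

    # First select the content inside the brackets; if that is already
    # selected, include the brackets themselves.
    if (start, end) != (left + 1, right):
        return (left + 1, right)
    return (left, right + 1)


text = '(defun insertMe () (interactive) (insert "«»") (backward-char 1))'
cursor = text.index("backward")
selection = (cursor, cursor)  # empty selection at the cursor
while True:
    grown = expand(text, *selection)
    if grown == selection:
        break
    selection = grown
    print(text[selection[0]:selection[1]])
```

Because it never builds a tree, this kind of scanner keeps working on code that is half written, as long as the brackets around the cursor are balanced. A real implementation would also add a first step that selects the token under the cursor.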

Emacs Implementation

For the lisp case, it's easy to implement. See an implementation at Emacs: Select Line, between Quotes, Extend Selection.

For the case of languages with C syntax, a practical solution that works 99% of the time should be easy. The selection will extend somewhat like the following sequence:

Elisp has many functions that already understand each of these syntactic units. It's not difficult to put them all together.

For the XML case, with its regular nested syntax of start/end tags where the start tag may contain tokens in sequence, one may need a bit more work than the lisp case, but there's already a full XML parser in the nxml mode.

Solution: expand-region mode

Magnar Sveen has written expand-region mode. See: https://github.com/magnars/expand-region.el
