A character property is a named attribute of a character that specifies how the character behaves and how it should be handled during text processing and display. Thus, character properties are an important part of specifying the character's semantics.
On the whole, Emacs follows the Unicode Standard in its implementation of character properties. In particular, Emacs supports the Unicode Character Property Model, and the Emacs character property database is derived from the Unicode Character Database (UCD). See the Character Properties chapter of the Unicode Standard for a detailed description of Unicode character properties and their meaning. This section assumes you are already familiar with that chapter of the Unicode Standard and want to apply that knowledge to Emacs Lisp programs.
In Emacs, each property has a name, which is a symbol, and a set of
possible values, whose types depend on the property; if a character
does not have a certain property, the value is nil. As a
general rule, the names of character properties in Emacs are produced
from the corresponding Unicode properties by downcasing them and
replacing each ‘_’ character with a dash ‘-’. For example,
Canonical_Combining_Class becomes
canonical-combining-class. However, sometimes we shorten the
names to make their use easier.
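The general naming rule can be sketched in a few lines of Emacs Lisp. The helper name my-unicode-property-symbol is hypothetical and not part of Emacs; it applies only the downcase-and-dash rule and knows nothing about the names that Emacs shortens.

```elisp
(defun my-unicode-property-symbol (unicode-name)
  "Return the Emacs-style property symbol for the string UNICODE-NAME.
Hypothetical helper: it downcases the name and replaces each `_'
with `-'.  It does not handle the shortened names."
  (intern (replace-regexp-in-string "_" "-" (downcase unicode-name))))

(my-unicode-property-symbol "Canonical_Combining_Class")
;; ⇒ canonical-combining-class
```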
Here is the full list of value types for all the character properties that Emacs knows about:
name: Corresponds to the Unicode Name property. The value is a string consisting of upper-case Latin letters A to Z, digits, spaces, and hyphen ‘-’ characters.
general-category: Corresponds to the Unicode General_Category property. The value is a symbol whose name is a 2-letter abbreviation of the character's classification.
canonical-combining-class: Corresponds to the Unicode Canonical_Combining_Class property. The value is an integer.
bidi-class: Corresponds to the Unicode Bidi_Class property. The value is a symbol whose name is the Unicode directional type of the character.
decomposition: Corresponds to the Unicode Decomposition_Type and Decomposition_Value properties. The value is a list, whose first element may be a symbol representing a compatibility formatting tag, such as small[1]; the other elements are characters that give the compatibility decomposition sequence of this character.
decimal-digit-value: Corresponds to the Unicode Numeric_Value property for characters whose Numeric_Type is ‘Decimal’. The value is an integer.
digit-value: Corresponds to the Unicode Numeric_Value property for characters whose Numeric_Type is ‘Digit’. The value is an integer. Examples of such characters include compatibility subscript and superscript digits, for which the value is the corresponding number.
numeric-value: Corresponds to the Unicode Numeric_Value property for characters whose Numeric_Type is ‘Numeric’. The value of this property is an integer or a floating-point number. Examples of characters that have this property include fractions, subscripts, superscripts, Roman numerals, currency numerators, and encircled numbers. For example, the value of this property for the character U+2155 (vulgar fraction one fifth) is 0.2.
mirrored: Corresponds to the Unicode Bidi_Mirrored property. The value of this property is a symbol, either Y or N.
old-name: Corresponds to the Unicode Unicode_1_Name property. The value is a string.
iso-10646-comment: Corresponds to the Unicode ISO_Comment property. The value is a string.
uppercase: Corresponds to the Unicode Simple_Uppercase_Mapping property. The value of this property is a single character.
lowercase: Corresponds to the Unicode Simple_Lowercase_Mapping property. The value of this property is a single character.
titlecase: Corresponds to the Unicode Simple_Titlecase_Mapping property. Title case is a special form of a character used when the first character of a word needs to be capitalized. The value of this property is a single character.
Function: get-char-code-property char propname
This function returns the value of char's propname property.
     (get-char-code-property ?\s 'general-category)
          ⇒ Zs
     (get-char-code-property ?1 'general-category)
          ⇒ Nd
     (get-char-code-property ?\u2084 'digit-value) ; subscript 4
          ⇒ 4
     (get-char-code-property ?\u2155 'numeric-value) ; one fifth
          ⇒ 0.2
     (get-char-code-property ?\u2163 'numeric-value) ; Roman IV
          ⇒ 4
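As a further illustration, several properties of one character can be collected in a single pass. This is only a sketch: the helper name my-char-property-summary is hypothetical, and the choice of properties is arbitrary.

```elisp
(defun my-char-property-summary (char)
  "Return an alist of a few Unicode properties of CHAR.
Hypothetical helper for illustration; the property list is arbitrary."
  (mapcar (lambda (prop)
            ;; Pair each property name with its value for CHAR.
            (cons prop (get-char-code-property char prop)))
          '(name general-category bidi-class)))

(my-char-property-summary ?A)
;; ⇒ ((name . "LATIN CAPITAL LETTER A")
;;     (general-category . Lu)
;;     (bidi-class . L))
```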
Function: char-code-property-description prop value
This function returns the description string of property prop's value, or nil if value has no description.
     (char-code-property-description 'general-category 'Zs)
          ⇒ "Separator, Space"
     (char-code-property-description 'general-category 'Nd)
          ⇒ "Number, Decimal Digit"
     (char-code-property-description 'numeric-value '1/5)
          ⇒ nil
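The two functions combine naturally: look up a character's property value, then ask for its description. The following is a minimal sketch; the helper name my-char-category-description is hypothetical.

```elisp
(defun my-char-category-description (char)
  "Return the description of CHAR's general category, or nil.
Hypothetical helper combining the two functions described above."
  (char-code-property-description
   'general-category
   (get-char-code-property char 'general-category)))

(my-char-category-description ?1)
;; ⇒ "Number, Decimal Digit"
```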
Function: put-char-code-property char propname value
This function stores value as the value of the property propname for the character char.
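A minimal sketch of storing a property and reading it back follows. It assumes that put-char-code-property accepts a property name outside the Unicode-derived list above; the name my-example-tag and its value are hypothetical, chosen so that no Unicode data is overwritten.

```elisp
;; Assumption: `my-example-tag' is a hypothetical, application-defined
;; property name, not one of the Unicode-derived properties.
(put-char-code-property ?A 'my-example-tag 'first-letter)

;; Read the stored value back with the accessor described earlier.
(get-char-code-property ?A 'my-example-tag)
;; ⇒ first-letter
```

Using a private property name keeps the example from changing how Emacs treats the character elsewhere.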
Variable: char-script-table
The value of this variable is a char-table (see Char-Tables) that specifies, for each character, a symbol whose name is the script to which the character belongs, according to the Unicode Standard classification of the Unicode code space into script-specific blocks. This char-table has a single extra slot whose value is the list of all script symbols.
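For illustration, here is a sketch that reads the script char-table, assuming it is bound to the variable char-script-table. The helper name my-char-script is hypothetical.

```elisp
(defun my-char-script (char)
  "Return the script symbol recorded for CHAR in `char-script-table'.
Hypothetical helper for illustration."
  (aref char-script-table char))

(my-char-script ?a)        ; ⇒ latin
(my-char-script ?\u03B1)   ; GREEK SMALL LETTER ALPHA ⇒ greek

;; The single extra slot holds the list of all script symbols.
(memq 'latin (char-table-extra-slot char-script-table 0))
;; ⇒ non-nil
```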
Variable: char-width-table
The value of this variable is a char-table that specifies the width of each character, in columns that it will occupy on the screen.
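The following sketch sums table entries to estimate how many columns a string occupies, assuming the table is bound to the variable char-width-table. The helper name my-string-columns is hypothetical, and it deliberately ignores tabs, control characters, and display properties; in real code the built-in function char-width is usually the better choice.

```elisp
(defun my-string-columns (string)
  "Sum the `char-width-table' entries for the characters of STRING.
Hypothetical helper; it ignores tabs and display properties."
  (apply #'+ (mapcar (lambda (c) (aref char-width-table c)) string)))

(my-string-columns "abc")            ; ⇒ 3
(aref char-width-table #x4E2D)       ; a CJK ideograph ⇒ 2
```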
Variable: printable-chars
The value of this variable is a char-table that specifies, for each character, whether it is printable or not. That is, if evaluating (aref printable-chars char) results in t, the character is printable, and if it results in nil, it is not.
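A short sketch of the lookup described above follows; the helper name my-printable-char-p is hypothetical.

```elisp
(defun my-printable-char-p (char)
  "Return non-nil if CHAR is marked printable in `printable-chars'.
Hypothetical helper for illustration."
  (aref printable-chars char))

(my-printable-char-p ?a)      ; ⇒ t
(my-printable-char-p ?\C-a)   ; control character ⇒ nil
```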
[1] Note that the Unicode spec writes these tag names inside ‘<..>’ brackets. The tag names in Emacs do not include the brackets; for example, Unicode specifies ‘<small>’ where Emacs uses ‘small’.