### A Notation for Character Collections for the WWW

“A Notation for Character Collections for the WWW” [9]
(hereafter, the W3C notation) provides an XML syntax for describing subsets. Although
it has not become a W3C Recommendation and has not been implemented, it contains several
interesting ideas.

The W3C notation does not use regular expressions. Rather, it introduces XML elements
(`range` and `enum`) for representing ranges and code points, respectively.
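As a rough illustration of the two building blocks, the following Python sketch models a range and an enumeration as plain sets of code points. It does not reproduce the XML syntax of the W3C notation; the function names `char_range` and `char_enum` are hypothetical and chosen only for this example.

```python
from typing import Iterable, Set


def char_range(first: int, last: int) -> Set[int]:
    """Return every code point from first to last, inclusive (hypothetical helper)."""
    if first > last:
        raise ValueError("range start exceeds range end")
    return set(range(first, last + 1))


def char_enum(code_points: Iterable[int]) -> Set[int]:
    """Return the set of individually listed code points (hypothetical helper)."""
    return set(code_points)


# A subset made of one contiguous range (A-Z) plus two code points listed one by one.
subset = char_range(0x41, 0x5A) | char_enum([0x00C5, 0x00D8])
```

A range is compact for contiguous blocks, while an enumeration suits scattered code points; a subset description can mix both.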

An interesting feature of the W3C notation is its `kernel` and `hull` elements. They are
used to define open collections.

Unlike regular expressions, the W3C notation provides a mechanism for referencing
other subset descriptions or well-known subsets (e.g., collections in ISO/IEC
10646). This notation can thus easily describe subsets defined in terms of other
subsets.

The W3C notation also has set operations (union, inverse, difference, and intersection),
so new subsets can be derived by combining existing ones.
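The combination of references and set operations can be sketched in Python as follows. This is only a model of the idea under stated assumptions, not the notation's XML syntax: the registry, its subset names, and the helper functions are all hypothetical, and the inverse is taken relative to the full code point range U+0000 to U+10FFFF.

```python
from typing import Dict, FrozenSet

# Assumption: the inverse is computed against all code points U+0000..U+10FFFF.
UNIVERSE: FrozenSet[int] = frozenset(range(0x110000))

# Hypothetical registry of named subsets that other descriptions can reference.
REGISTRY: Dict[str, FrozenSet[int]] = {
    "latin-upper": frozenset(range(0x41, 0x5B)),   # A-Z
    "latin-lower": frozenset(range(0x61, 0x7B)),   # a-z
    "latin-vowels": frozenset(map(ord, "AEIOUaeiou")),
}


def ref(name: str) -> FrozenSet[int]:
    """Look up a previously defined subset by name."""
    return REGISTRY[name]


def union(a: FrozenSet[int], b: FrozenSet[int]) -> FrozenSet[int]:
    return a | b


def intersection(a: FrozenSet[int], b: FrozenSet[int]) -> FrozenSet[int]:
    return a & b


def difference(a: FrozenSet[int], b: FrozenSet[int]) -> FrozenSet[int]:
    return a - b


def inverse(a: FrozenSet[int]) -> FrozenSet[int]:
    """Every code point in UNIVERSE that is not in a."""
    return UNIVERSE - a


# A subset defined purely in terms of other, named subsets.
letters = union(ref("latin-upper"), ref("latin-lower"))
consonants = difference(letters, ref("latin-vowels"))
```

Because `consonants` is built only from references and operations, changing a referenced subset in the registry changes every subset derived from it, which is the practical benefit of defining subsets in terms of other subsets.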

However, the W3C notation lacks mechanisms for describing grapheme clusters.