Language Design



CREPDL is intended to combine the best parts of Unicode regular expressions and the W3C notation. Unlike regular expressions, CREPDL can easily handle large collections. Unlike the W3C notation, CREPDL can handle grapheme clusters.

First, CREPDL allows Unicode regular expressions to be used as atomic expressions, through the char element. Because a regular expression can match a sequence of code points, it can also describe grapheme clusters.
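As a rough illustration of how one regular expression can cover both single code points and a multi-code-point grapheme cluster, the following sketch uses Python's built-in re module. It rests on several assumptions: Python's re is not a full Unicode regular expression engine, the pattern and the helper name matches_atom are hypothetical, and nothing here reflects the actual syntax accepted by the char element.

```python
import re

# Hypothetical atomic expression: either one code point in a small
# illustrative range (U+3041..U+3096), or the two-code-point sequence
# U+304B U+309A (a base letter followed by a combining mark).
ATOM = re.compile(r"[\u3041-\u3096]|\u304B\u309A")


def matches_atom(unit: str) -> bool:
    """Return True if the whole unit is matched by the atomic expression."""
    return ATOM.fullmatch(unit) is not None


single_code_point = "\u3042"        # one code point inside the range
cluster = "\u304B\u309A"            # a sequence of two code points
outside = "a"                       # not covered by the expression

results = [matches_atom(u) for u in (single_code_point, cluster, outside)]
```

The second alternative is what lets the expression accept a sequence rather than a single code point, which is the property the paragraph above relies on.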

Second, CREPDL borrows mechanisms of the W3C notation with some modifications.

CREPDL allows references to collections defined in ISO/IEC 10646 and other well-known subsets. The repertoire element of CREPDL represents such references. For example, IICORE (collection 370) can be referenced by <repertoire registry="10646" number="370"/>.
CREPDL allows references to other CREPDL scripts by URIs. The ref element of CREPDL represents such references.
CREPDL provides set operations through the union, intersection, and difference elements.
CREPDL supports both open collections and fixed collections through the kernel and hull elements.
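The set operations in the list above can be sketched by modelling a repertoire as a set of code points. This is only a minimal illustration under stated assumptions: the helper names repertoire_from_ranges and is_valid are hypothetical, the code point ranges are arbitrary examples rather than collections defined in ISO/IEC 10646, and the kernel and hull elements are not modelled.

```python
def repertoire_from_ranges(*ranges):
    """Build a repertoire from inclusive (first, last) code point ranges."""
    return frozenset(
        cp for first, last in ranges for cp in range(first, last + 1)
    )


# Arbitrary example repertoires (assumptions, not registered collections).
printable_ascii = repertoire_from_ranges((0x0020, 0x007E))
ascii_digits = repertoire_from_ranges((0x0030, 0x0039))
kana_sample = repertoire_from_ranges((0x3041, 0x3096))

# Counterparts of the union, intersection, and difference elements.
combined = printable_ascii | kana_sample
shared = printable_ascii & ascii_digits
without_digits = printable_ascii - ascii_digits


def is_valid(text, repertoire):
    """Return True if every code point of text belongs to the repertoire."""
    return all(ord(ch) in repertoire for ch in text)
```

For instance, is_valid("abc", without_digits) holds, while is_valid("a1", without_digits) does not, because the digit was removed by the difference.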

The CREPDL processor has two working modes: character and graphemeCluster. If the mode is character, the CREPDL processor examines each code point in the input text stream. If the mode is graphemeCluster, the CREPDL processor extracts grapheme clusters from the text stream by applying the algorithm defined in [6], and then validates each grapheme cluster.
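The difference between the two modes can be sketched as follows. The segmentation used here is a deliberately simplified stand-in that attaches combining marks to the preceding code point; it is not the algorithm defined in [6]. The function names and the is_member callback are hypothetical.

```python
import unicodedata


def split_clusters_simplified(text):
    """Simplified stand-in for grapheme cluster segmentation.

    Assumption: a cluster is one code point followed by any code points
    whose canonical combining class is non-zero. This is NOT the
    algorithm referenced as [6].
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch) != 0:
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


def validate(text, mode, is_member):
    """Return the units of text that is_member rejects, in input order."""
    if mode == "character":
        units = list(text)  # one unit per code point
    elif mode == "graphemeCluster":
        units = split_clusters_simplified(text)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return [unit for unit in units if not is_member(unit)]


text = "e\u0301a"  # 'e', COMBINING ACUTE ACCENT, 'a'
allowed_clusters = {"e\u0301", "a"}
allowed_characters = {"e", "a"}

rejected_by_cluster = validate(text, "graphemeCluster", allowed_clusters.__contains__)
rejected_by_character = validate(text, "character", allowed_characters.__contains__)
```

In graphemeCluster mode the accented letter is checked as a single unit, whereas in character mode the combining mark is checked on its own and is rejected here because it is not listed separately.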

Large well-known collections referenced by <repertoire> can be implemented as hash-based sets, for which a membership test takes constant time on average. The CREPDL processor can therefore handle such collections efficiently.
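One way such a lookup might be organised is sketched below: each referenced collection is loaded once into a hash-based set, keyed by its registry and number, so that each later membership test is a single hash lookup. The registry name, the collection number, and the collection contents are placeholders invented for this sketch; they do not correspond to any real collection.

```python
# Placeholder table of preloaded collections, keyed by (registry, number).
# Assumption: a real processor would fill this from the collection
# definitions; the single entry here is invented for illustration.
COLLECTIONS = {
    ("example-registry", "1"): frozenset(range(0x4E00, 0x4E10)),
}


def resolve_repertoire(registry, number):
    """Return the preloaded set of code points for a referenced collection."""
    try:
        return COLLECTIONS[(registry, number)]
    except KeyError:
        raise LookupError(f"unknown collection {registry}/{number}") from None


def contains(registry, number, char):
    """Hash-based membership test for a single code point."""
    return ord(char) in resolve_repertoire(registry, number)


inside = contains("example-registry", "1", "\u4E00")
outside = contains("example-registry", "1", "A")
```

Because frozenset membership is hash-based, the cost of each test does not grow with the size of the collection on average, which is the property the paragraph above appeals to.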

This paper does not cover details of the CREPDL language. Interested readers are encouraged to review the CD or upcoming DIS for ISO/IEC 19757-7.