CREPDL is intended to combine the best parts of Unicode regular expressions and the W3C notation. Unlike regular expressions, CREPDL can easily handle large collections. Unlike the W3C notation, CREPDL can handle grapheme clusters.
First, CREPDL allows the use of Unicode regular expressions as atomic expressions. This is done by the char element of CREPDL. Note that sequences of code points, which represent grapheme clusters, can be represented by regular expressions.
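As a rough illustration of this idea (a Python analogy, not CREPDL syntax), the sketch below uses the standard re module to treat one regular expression as an atomic test. The test accepts either a single code point or a multi-code-point sequence. The pattern and the helper name matches_atom are assumptions made for this example.

```python
import re

# Hypothetical atomic expression: either one lowercase ASCII letter, or the
# two-code-point sequence "e" + U+0301 (COMBINING ACUTE ACCENT).
ATOM = re.compile(r"[a-z]|e\u0301")


def matches_atom(unit: str) -> bool:
    """Return True if the whole unit (one or more code points) matches."""
    return ATOM.fullmatch(unit) is not None
```

Here matches_atom("e\u0301") accepts the two-code-point sequence as a single unit, while matches_atom("a\u0301") rejects a sequence the pattern does not list.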
Second, CREPDL borrows mechanisms of the W3C notation with some modifications.
CREPDL allows references to collections defined in ISO/IEC 10646 and other well-known subsets. The repertoire element of CREPDL represents such references. For example, IICORE (collection 370) can be referenced by <repertoire registry="10646" number="370"/>.
CREPDL allows references to other CREPDL scripts by URIs. The ref element of CREPDL represents such references.
CREPDL provides set operations through the union, intersection, and difference elements.
CREPDL allows open collections and fixed collections by the kernel and hull elements.
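The set operations can be pictured with ordinary sets of code points. The following Python sketch is only an analogy and does not show CREPDL syntax. The three sample collections are made up for illustration and are not taken from ISO/IEC 10646.

```python
# Illustrative collections of code points (assumptions for this example).
digits = set(range(0x30, 0x3A))        # '0'..'9'
upper = set(range(0x41, 0x5B))         # 'A'..'Z'
hex_letters = set(range(0x41, 0x47))   # 'A'..'F'

hex_digits = digits | hex_letters      # analogue of union
upper_hex = upper & hex_digits         # analogue of intersection
non_hex_upper = upper - hex_letters    # analogue of difference
```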
The CREPDL processor has two working modes: character and graphemeCluster. If the mode is character, the CREPDL processor examines each code point in the input text stream. If the mode is graphemeCluster, the CREPDL processor extracts grapheme clusters from the text stream by applying the algorithm defined in [6]. It then validates each grapheme cluster.
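The difference between the two modes can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the CREPDL processor itself. In particular, approx_clusters is a deliberately simplified stand-in that only attaches combining marks to the preceding code point; it is not the segmentation algorithm of [6]. The function names and the is_valid callback are hypothetical.

```python
import unicodedata
from typing import Callable, Iterator, List


def code_points(text: str) -> Iterator[str]:
    """character mode: every code point is a unit of validation."""
    yield from text


def approx_clusters(text: str) -> Iterator[str]:
    """graphemeCluster mode, approximated.

    Simplification (assumption): a cluster is a code point followed by any
    code points with a non-zero canonical combining class. The algorithm
    referenced in [6] has more rules than this.
    """
    cluster = ""
    for ch in text:
        if cluster and unicodedata.combining(ch) != 0:
            cluster += ch          # extend the current cluster
        else:
            if cluster:
                yield cluster      # emit the finished cluster
            cluster = ch           # start a new cluster
    if cluster:
        yield cluster


def invalid_units(text: str, mode: str,
                  is_valid: Callable[[str], bool]) -> List[str]:
    """Return the units of `text` that fail validation in the given mode."""
    if mode == "character":
        units = code_points(text)
    elif mode == "graphemeCluster":
        units = approx_clusters(text)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return [unit for unit in units if not is_valid(unit)]
```

With an allowed set of {"n", "e", "e\u0301"}, the text "ne\u0301e" passes in graphemeCluster mode, whereas character mode reports the lone U+0301 because it is examined on its own.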
Huge well-known collections referenced by <repertoire> can be implemented by hash-based sets. Membership tests then take constant time on average, regardless of the size of the collection, so the CREPDL processor can handle such collections efficiently.
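A hash-based collection of this kind might look like the following Python sketch. The helper names and the two sample ranges (printable ASCII and a block of Hiragana letters) are assumptions chosen for illustration. They do not reproduce IICORE or any other collection of ISO/IEC 10646.

```python
from typing import FrozenSet, Iterable, List, Tuple


def build_repertoire(ranges: Iterable[Tuple[int, int]]) -> FrozenSet[int]:
    """Expand inclusive (first, last) code point ranges into a hash-based set."""
    return frozenset(cp for first, last in ranges
                     for cp in range(first, last + 1))


# Stand-in collection (assumption): U+0020..U+007E and U+3041..U+3096.
SAMPLE = build_repertoire([(0x0020, 0x007E), (0x3041, 0x3096)])


def out_of_repertoire(text: str, repertoire: FrozenSet[int]) -> List[str]:
    """Return the characters of `text` whose code points are not in the set.

    Each lookup is a hash probe, so the cost per character does not grow
    with the size of the repertoire.
    """
    return [ch for ch in text if ord(ch) not in repertoire]
```

The set is built once and then shared by every validation, which is the trade-off that suits large, fixed collections: more memory up front in exchange for cheap lookups.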
This paper does not cover details of the CREPDL language. Interested readers are encouraged to review the CD or upcoming DIS for ISO/IEC 19757-7.