CREPDL is intended to combine the best parts of Unicode regular expressions and the W3C notation. Unlike regular expressions, CREPDL can easily handle large collections. Unlike the W3C notation, CREPDL can handle grapheme clusters.

First, CREPDL allows the use of Unicode regular expressions as atomic expressions. This is done by the `char` element of CREPDL. Note that sequences of code points, which represent grapheme clusters, can be represented by regular expressions.
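
To illustrate how a regular expression can stand for a sequence of code points, here is a minimal Python sketch. It uses Python's `re` module, not CREPDL itself. The pattern `ACUTE_LETTER` and the helper `matches_atomic` are hypothetical names chosen for this example.

```python
import re

# Hypothetical atomic expression: a Latin letter followed by
# U+0301 COMBINING ACUTE ACCENT. The two code points together
# form a single grapheme cluster.
ACUTE_LETTER = re.compile("[A-Za-z]\u0301")


def matches_atomic(pattern: re.Pattern, unit: str) -> bool:
    """Return True if the whole unit (one or more code points) matches."""
    return pattern.fullmatch(unit) is not None
```

With this sketch, `matches_atomic(ACUTE_LETTER, "e\u0301")` accepts the two-code-point sequence, while a bare `"e"` is rejected because the pattern requires both code points.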

Second, CREPDL borrows mechanisms of the W3C notation with some modifications.

- CREPDL allows references to collections defined in ISO/IEC 10646 and other well-known subsets. The `repertoire` element of CREPDL represents such references. For example, `IICORE` (collection 370) can be referenced by `<repertoire registry="10646" number="370"/>`.
- CREPDL allows references to other CREPDL scripts by URIs. The `ref` element of CREPDL represents such references.
- CREPDL provides set operations by the `union`, `intersection`, and `difference` elements.
- CREPDL allows open collections and fixed collections by the `kernel` and `hull` elements.
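
The set operations named above can be sketched in Python by modelling each collection as a `frozenset` of code points. This is only an analogy for how `union`, `intersection`, and `difference` combine collections. The helper `code_points` and the three sample collections are assumptions made for illustration and are not taken from CREPDL or ISO/IEC 10646.

```python
def code_points(first: int, last: int) -> frozenset:
    """Build a collection from an inclusive code point range (illustrative helper)."""
    return frozenset(range(first, last + 1))


# Sample collections, chosen only for this example.
printable_ascii = code_points(0x0020, 0x007E)
ascii_digits = code_points(0x0030, 0x0039)
hiragana_letters = code_points(0x3041, 0x3096)

# Analogues of the three set operations.
union = printable_ascii | hiragana_letters
intersection = printable_ascii & ascii_digits
difference = printable_ascii - ascii_digits
```

Here `union` accepts both ASCII and hiragana letters, `intersection` reduces to the digits, and `difference` is printable ASCII without the digits.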

The CREPDL processor has two working modes: `character` and `graphemeCluster`. If the mode is `character`, the CREPDL processor examines each code point in the input text stream. If the mode is `graphemeCluster`, the CREPDL processor extracts grapheme clusters from the text stream by applying the algorithm defined in [6]. It then validates each grapheme cluster.
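
The difference between the two modes can be sketched as follows. This is not the segmentation algorithm of [6]: as a loudly labelled simplification, the sketch only attaches combining marks to the preceding character, which covers simple cases. The function names `split_units` and `validate` and the `is_member` callback are hypothetical.

```python
import unicodedata


def split_units(text: str, mode: str) -> list:
    """Split text into the units that a processor would validate."""
    if mode == "character":
        # One unit per code point.
        return list(text)
    if mode == "graphemeCluster":
        # SIMPLIFICATION (assumption): a cluster is a character plus any
        # following combining marks. The real algorithm has more rules.
        clusters = []
        for ch in text:
            if clusters and unicodedata.combining(ch) != 0:
                clusters[-1] += ch
            else:
                clusters.append(ch)
        return clusters
    raise ValueError(f"unknown mode: {mode!r}")


def validate(text: str, mode: str, is_member) -> list:
    """Return the units of text that is_member rejects."""
    return [unit for unit in split_units(text, mode) if not is_member(unit)]
```

For the input `"e\u0301x"`, `character` mode yields three units and `graphemeCluster` mode yields two, so a collection that contains the cluster `"e\u0301"` but not its individual code points accepts the text only in `graphemeCluster` mode.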

Huge well-known collections referenced by `<repertoire>` can be implemented as hash-based sets. Membership tests then take constant time on average, so the CREPDL processor can handle such collections very efficiently.
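
A minimal sketch of the hash-based approach, assuming Python's built-in `frozenset` as the hash-based set. The collection used here is a stand-in: it is the code point range U+4E00..U+9FFF, not the actual contents of any ISO/IEC 10646 collection, and the names `LARGE_COLLECTION` and `in_collection` are hypothetical.

```python
# Stand-in for a large collection (assumption): the range U+4E00..U+9FFF.
# A real processor would load the members of the referenced collection.
LARGE_COLLECTION = frozenset(range(0x4E00, 0xA000))


def in_collection(ch: str) -> bool:
    """Hash lookup of a single code point; average cost is constant."""
    return ord(ch) in LARGE_COLLECTION
```

The lookup cost does not grow with the size of the collection, which is the property that makes large `<repertoire>` references cheap to validate against.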

This paper does not cover details of the CREPDL language. Interested readers are encouraged to review the CD or upcoming DIS for ISO/IEC 19757-7.