A grapheme cluster [6] is a sequence of code points that represents a “user-perceived character”. A simple example is a base character followed by a combining character.
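The "base character followed by a combining character" case can be sketched in a few lines of Python. This is only a rough approximation, not the full segmentation algorithm of the cited definition: it assumes that a cluster is one code point followed by any number of code points whose general category is a mark (M*). The helper names `simple_clusters` and `as_code_points` are hypothetical.

```python
import unicodedata


def simple_clusters(text):
    """Split text into approximate grapheme clusters.

    Assumption: a cluster is one code point followed by zero or more
    code points whose general category starts with "M" (marks).
    Real grapheme segmentation has more rules than this.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            # A mark attaches to the cluster that is already open.
            clusters[-1] += ch
        else:
            # Any other code point starts a new cluster.
            clusters.append(ch)
    return clusters


def as_code_points(cluster):
    """Render a cluster as a tuple of hexadecimal code point labels."""
    return tuple(f"{ord(ch):04X}" for ch in cluster)
```

For instance, under this simplification, passing the string made of 004A, 0303, 0069, 0307, 0301 to `simple_clusters` groups it into two clusters, one per base letter.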
CONTEMPORARY LITHUANIAN LETTERS (collection 284) is the first collection containing grapheme clusters such as <004A, 0303> and <0069, 0307, 0301>. Note that 0303 is allowed to follow some code points (e.g., 004A), but is not allowed to follow others (e.g., 004B).
MOJI-JOHO-KIBAN IDEOGRAPHS-2016 (collection 390) is a collection applicable to persons' names in Japanese public service. This collection contains grapheme clusters such as <5289, E0101> and <5351, FE00>, where E0101 is an ideographic variation selector and FE00 is a variation selector. Although E0101 is allowed to follow 5289, it is not allowed to follow other characters (5288, for example).
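The point that a combining character or variation selector is allowed only after particular code points can be illustrated by treating collection membership as a test on whole sequences rather than on individual code points. The following is a minimal sketch under that assumption. `TOY_COLLECTIONS` is a hypothetical stand-in that lists only the sequences quoted above, not the actual contents of collections 284 and 390, and `contains` is a hypothetical helper.

```python
# Hypothetical toy data: only the sequences quoted in the text,
# NOT the real membership lists of collections 284 and 390.
TOY_COLLECTIONS = {
    284: {(0x004A, 0x0303), (0x0069, 0x0307, 0x0301)},
    390: {(0x5289, 0xE0101), (0x5351, 0xFE00)},
}


def contains(collection_number, sequence):
    """Return True if the whole sequence is a member of the toy collection.

    Membership is decided for the sequence as a unit, so the presence of
    <004A, 0303> says nothing about <004B, 0303>.
    """
    return tuple(sequence) in TOY_COLLECTIONS[collection_number]
```

With this toy data, `contains(284, (0x004A, 0x0303))` is true while `contains(284, (0x004B, 0x0303))` is false, and likewise `contains(390, (0x5289, 0xE0101))` is true while `contains(390, (0x5288, 0xE0101))` is false.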
CONTEMPORARY LITHUANIAN LETTERS is much smaller than MOJI-JOHO-KIBAN IDEOGRAPHS-2016. The number of code points and grapheme clusters in CONTEMPORARY LITHUANIAN LETTERS (collection 284) is less than 100, whereas MOJI-JOHO-KIBAN IDEOGRAPHS-2016 contains more than 52000 code points and more than 10000 grapheme clusters.