... | ... | @@ -94,4 +94,4 @@ see `tools/wc/ucount.cpp`. |
|
|
Grapheme clusters are sequences of Unicode codepoints that are generally considered together to represent one logical character. For example, a base character such as the letter `a` may be followed by a combining accent character such as ´ to produce the accented character `á`. The task of separating a stream of characters into grapheme clusters is a text segmentation problem known as the grapheme cluster boundary problem. The full Unicode rules for this are documented in
|
|
|
[UAX #29: Unicode Text Segmentation](https://unicode.org/reports/tr29/).
|
|
|
|
|
|
The logic for computing grapheme cluster boundaries with Parabix methods is illustrated by the [gcount](../tools/wc/gcount.cpp) utility. |
|
|
\ No newline at end of file |
|
|
The logic for computing grapheme cluster boundaries with Parabix methods is illustrated by the [gcount](https://cs-git-research.cs.sfu.ca/cameron/parabix-devel/tools/wc/gcount.cpp) utility. |
|
|
\ No newline at end of file |
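
To make the `a` plus accent example concrete, here is a minimal sketch in plain C++. It is not the Parabix method used by `gcount`, and it does not implement the UAX #29 rules: it assumes well-formed UTF-8 input and treats only codepoints in the Combining Diacritical Marks block (U+0300 to U+036F) as extending the preceding cluster. The function names `decode_utf8`, `is_combining_diacritic`, and `approx_grapheme_count` are hypothetical and chosen for illustration only.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Decode a UTF-8 byte string into codepoints.
// Assumption: the input is well-formed UTF-8 (no validation is done).
std::vector<char32_t> decode_utf8(const std::string& s) {
    std::vector<char32_t> cps;
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char b = static_cast<unsigned char>(s[i]);
        char32_t cp;
        std::size_t len;
        if (b < 0x80)      { cp = b;        len = 1; }  // ASCII
        else if (b < 0xE0) { cp = b & 0x1F; len = 2; }  // 2-byte sequence
        else if (b < 0xF0) { cp = b & 0x0F; len = 3; }  // 3-byte sequence
        else               { cp = b & 0x07; len = 4; }  // 4-byte sequence
        // Fold in the 6 payload bits of each continuation byte.
        for (std::size_t k = 1; k < len && i + k < s.size(); ++k) {
            cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
        }
        cps.push_back(cp);
        i += len;
    }
    return cps;
}

// Simplification: only the Combining Diacritical Marks block counts as
// "extending". The full UAX #29 rules cover many more cases.
bool is_combining_diacritic(char32_t cp) {
    return cp >= 0x0300 && cp <= 0x036F;
}

// Count approximate grapheme clusters: a new cluster starts at the first
// codepoint and at every codepoint that is not a combining diacritic.
std::size_t approx_grapheme_count(const std::string& utf8) {
    std::size_t count = 0;
    bool first = true;
    for (char32_t cp : decode_utf8(utf8)) {
        if (first || !is_combining_diacritic(cp)) ++count;
        first = false;
    }
    return count;
}
```

With this sketch, the two-codepoint sequence `"a\xCC\x81"` (the letter `a` followed by U+0301, the combining acute accent) decodes to two codepoints, but `approx_grapheme_count` reports a single cluster.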