UCD: Unicode Property Database and Compilers · Changes

Update UCD: Unicode Property Database and Compilers, authored May 22, 2024 by cameron
Showing with 1 addition and 1 deletion
UCD:-Unicode-Property-Database-and-Compilers.md
View page @ 46d03367
@@ -94,4 +94,4 @@ see `tools/wc/ucount.cpp`.
 Grapheme clusters are sequences of Unicode codepoints that are generally considered together to represent one logical character. For example, a base character such as the letter `a` may be followed by an accent character such as ´ to produce the accented character `á`. The task of separating a stream of characters into grapheme clusters is a text segmentation problem known as the grapheme cluster boundary problem. The full Unicode rules for this are documented in
 [UAX #29: Unicode Text Segmentation](https://unicode.org/reports/tr29/).
-The logic for computing grapheme cluster boundaries with Parabix methods is illustrated by the [gcount](../tools/wc/gcount.cpp) utility.
+The logic for computing grapheme cluster boundaries with Parabix methods is illustrated by the [gcount](https://cs-git-research.cs.sfu.ca/cameron/parabix-devel/tools/wc/gcount.cpp) utility.
\ No newline at end of file
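
The base-plus-accent example in the paragraph above can be illustrated with a small sketch. This is not the Parabix method used by `gcount` and not the full UAX #29 rule set: it is a deliberately simplified approximation that treats a code point as extending the previous cluster when its general category is `Mn` or `Me`. The function name `approx_grapheme_count` is hypothetical, and the accent is assumed to be U+0301 COMBINING ACUTE ACCENT.

```python
import unicodedata


def approx_grapheme_count(text: str) -> int:
    """Roughly count grapheme clusters (simplified; NOT the full UAX #29 rules).

    Assumption: a nonspacing (Mn) or enclosing (Me) mark extends the
    preceding cluster; every other code point starts a new cluster.
    """
    count = 0
    for index, ch in enumerate(text):
        is_mark = unicodedata.category(ch) in ("Mn", "Me")
        # A mark at the very start of the text has no base to attach to,
        # so it is counted as its own cluster.
        if is_mark and index > 0:
            continue
        count += 1
    return count


decomposed = "a\u0301"   # 'a' followed by U+0301 COMBINING ACUTE ACCENT
precomposed = "\u00e1"   # the single code point U+00E1

print(len(decomposed), approx_grapheme_count(decomposed))
print(len(precomposed), approx_grapheme_count(precomposed))
```

Both strings display as the same accented letter, but the decomposed form holds two code points that belong to a single cluster. Real text needs the remaining UAX #29 rules (Hangul syllables, emoji sequences, regional indicators, CR LF handling, and so on), which this sketch ignores.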