UCD: Unicode Property Database and Compilers · Changes

Update UCD: Unicode Property Database and Compilers authored May 22, 2024 by cameron's avatar cameron
Showing with 1 addition and 2 deletions
UCD:-Unicode-Property-Database-and-Compilers.md
View page @ 842c2f39
The Unicode consortium defines a database of character properties for Unicode characters,
as documented in [UAX #44: Unicode Character Database](https://unicode.org/reports/tr44/).
Parabix has built-in support for these properties.
@@ -95,4 +94,4 @@ see `tools/wc/ucount.cpp`.
Grapheme clusters are sequences of Unicode codepoints that are generally considered together to represent one logical character. For example, a base character such as the letter `a` may be followed by an accent character such as ´ to produce the accented character `á`. The task of separating a stream of characters into grapheme clusters is a text segmentation problem known as the grapheme cluster boundary problem. The full Unicode rules for this are documented in
[UAX #29: Unicode Text Segmentation](https://unicode.org/reports/tr29/).
-The logic for computing grapheme cluster boundaries with Parabix methods is illustrated by the [gcount](tools/wc/gcount.cpp).
\ No newline at end of file
+The logic for computing grapheme cluster boundaries with Parabix methods is illustrated by the [gcount](../tools/wc/gcount.cpp) utility.
\ No newline at end of file
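
The distinction between codepoints and grapheme clusters described in the page text can be illustrated with a short Python sketch. This is not how Parabix or `gcount` computes boundaries; it is a deliberately simplified illustration that assumes the accent is the combining acute accent U+0301 and applies only one idea from UAX #29 (a combining mark extends the preceding cluster). The function name `count_simplified_clusters` is hypothetical.

```python
import unicodedata


def count_simplified_clusters(text: str) -> int:
    """Count clusters under a simplified rule: a combining mark never starts
    a new cluster (unless it is the very first codepoint).

    Assumption: this covers only a small part of the UAX #29 rules and is
    meant for illustration, not as a conforming implementation.
    """
    count = 0
    for ch in text:
        # unicodedata.combining() returns the canonical combining class;
        # 0 means the codepoint is not a combining mark.
        if unicodedata.combining(ch) == 0 or count == 0:
            count += 1
    return count


# 'a' followed by U+0301 COMBINING ACUTE ACCENT: two codepoints, one cluster.
decomposed = "a\u0301"
# NFC normalization composes the pair into the single codepoint U+00E1.
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed), count_simplified_clusters(decomposed))
print(len(composed), count_simplified_clusters(composed))
```

Both forms count as one cluster under this rule even though their codepoint counts differ, which is why counting user-perceived characters needs boundary logic rather than a plain codepoint count.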