Last edited by cameron May 22, 2024

StaticCCC

The Python Static Character Class Compiler

The Python character class compiler charset_compiler.py takes a data file of character class definitions as input and produces a set of Pablo equations, that is, bitwise logic equations that compute the given character classes. This compiler is primarily useful as a prototype that demonstrates how code can be generated to build character class bit streams from the basis bit streams. It was originally used to produce the first implementations of important lexical streams for the original Parabix XML parser. The compiler is found in the pybix subdirectory.
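The generated equations take the eight basis bit streams as input. The sketch below shows what those streams are. It assumes the Parabix convention that bit_0 is the most significant bit of each byte, which is the convention the example equations on this page rely on. Each stream is modeled as a Python int whose bit i holds the value for byte position i. The names transpose and show are hypothetical helpers and are not part of pybix.

```python
from typing import List


def transpose(data: bytes) -> List[int]:
    """Split bytes into eight basis bit streams (hypothetical helper).

    Stream k is an int whose bit i equals bit k of data[i], where bit 0 is
    taken to be the most significant bit of each byte (assumed convention).
    """
    streams = [0] * 8
    for pos, byte in enumerate(data):
        for k in range(8):
            if (byte >> (7 - k)) & 1:
                streams[k] |= 1 << pos
    return streams


def show(stream: int, length: int) -> str:
    """Render a stream with position 0 on the left, as in bit stream diagrams."""
    return format(stream, f"0{length}b")[::-1]


if __name__ == "__main__":
    text = b"a1:"
    for k, stream in enumerate(transpose(text)):
        print(f"bit_{k}: {show(stream, len(text))}")
```

For the input b"a1:", the bit_2 stream is 111 because 'a' (0x61), '1' (0x31) and ':' (0x3A) all have bit 2 set under this numbering.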

For example, consider the input file "delim_and_alphanum" consisting of the following definitions:

	delimiters = [:.,;]
	alphanumeric = [A-Za-z0-9]
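Each definition line names a class and lists its members in brackets, with a-z style ranges. As a rough illustration of what these two definitions denote, the following sketch expands them into sets of byte values. It is a hypothetical helper, not the compiler's own input parser. It handles only literal characters and simple ranges, with no escapes or negated classes.

```python
import re
from typing import Dict, Set

# Hypothetical pattern for "name = [class]" lines like the ones above.
DEF_LINE = re.compile(r"^\s*(\w+)\s*=\s*\[(.*)\]\s*$")


def expand_class(body: str) -> Set[int]:
    """Expand a bracket body such as 'A-Za-z0-9' into a set of byte values."""
    codes = set()
    i = 0
    while i < len(body):
        if i + 2 < len(body) and body[i + 1] == "-":
            # A range like 'A-Z': include both endpoints.
            codes.update(range(ord(body[i]), ord(body[i + 2]) + 1))
            i += 3
        else:
            # A single literal character.
            codes.add(ord(body[i]))
            i += 1
    return codes


def parse_definitions(text: str) -> Dict[str, Set[int]]:
    """Map each defined class name to the set of byte values it contains."""
    classes = {}
    for line in text.splitlines():
        m = DEF_LINE.match(line)
        if m:
            classes[m.group(1)] = expand_class(m.group(2))
    return classes


if __name__ == "__main__":
    defs = parse_definitions("delimiters = [:.,;]\nalphanumeric = [A-Za-z0-9]\n")
    for name, codes in defs.items():
        print(name, len(codes))
```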

A set of equations that computes these character classes from the eight basis bit streams can then be produced by running the compiler as follows.

	python charset_compiler.py delim_and_alphanum

The following results are produced.

	temp1 = (basis_bits.bit_0 | basis_bits.bit_1)
	temp2 = (basis_bits.bit_2 &~ basis_bits.bit_3)
	temp3 = (temp2 &~ temp1)
	temp4 = (basis_bits.bit_4 & basis_bits.bit_5)
	temp5 = (basis_bits.bit_6 | basis_bits.bit_7)
	temp6 = (basis_bits.bit_6 &~ basis_bits.bit_7)
	temp7 = (temp5 &~ temp6)
	temp8 = (temp4 &~ temp7)
	temp9 = (temp3 & temp8)
	temp10 = (basis_bits.bit_2 & basis_bits.bit_3)
	temp11 = (temp10 &~ temp1)
	temp12 = (basis_bits.bit_4 &~ basis_bits.bit_5)
	temp13 = (temp12 & basis_bits.bit_6)
	temp14 = (temp11 & temp13)
	delimiters = (temp9 | temp14)
	temp15 = (basis_bits.bit_5 | basis_bits.bit_6)
	temp16 = (basis_bits.bit_4 & temp15)
	temp17 = (temp11 &~ temp16)
	temp18 = (basis_bits.bit_1 &~ basis_bits.bit_0)
	temp19 = (temp18 &~ basis_bits.bit_2)
	temp20 = (basis_bits.bit_6 & basis_bits.bit_7)
	temp21 = (basis_bits.bit_5 | temp20)
	temp22 = (basis_bits.bit_4 & temp21)
	temp23 = (~temp22)
	temp24 = (basis_bits.bit_4 | basis_bits.bit_5)
	temp25 = (temp24 | temp5)
	temp26 = ((basis_bits.bit_3 & temp23)|(~(basis_bits.bit_3) & temp25))
	temp27 = (temp19 & temp26)
	temp28 = (temp17 | temp27)
	temp29 = (temp18 & basis_bits.bit_2)
	temp30 = (temp29 & temp26)
	alphanumeric = (temp28 | temp30)
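Because the generated equations are ordinary bitwise expressions, they can be checked directly in Python. The sketch below models each bit stream as a Python int (bit i holds byte position i) and builds the basis bits with bit_0 as the most significant bit, which is an assumed convention. It then evaluates the equations above verbatim and compares the resulting streams against the intended classes for all 256 byte values. The helper names basis_bits_for, classify and stream_positions are illustrative only.

```python
from types import SimpleNamespace


def basis_bits_for(data):
    """Return basis_bits.bit_0 .. bit_7 as int streams; bit_0 is the MSB (assumed)."""
    streams = {f"bit_{k}": 0 for k in range(8)}
    for pos, byte in enumerate(data):
        for k in range(8):
            if (byte >> (7 - k)) & 1:
                streams[f"bit_{k}"] |= 1 << pos
    return SimpleNamespace(**streams)


def classify(basis_bits):
    """Evaluate the compiler output above, copied verbatim."""
    temp1 = (basis_bits.bit_0 | basis_bits.bit_1)
    temp2 = (basis_bits.bit_2 &~ basis_bits.bit_3)
    temp3 = (temp2 &~ temp1)
    temp4 = (basis_bits.bit_4 & basis_bits.bit_5)
    temp5 = (basis_bits.bit_6 | basis_bits.bit_7)
    temp6 = (basis_bits.bit_6 &~ basis_bits.bit_7)
    temp7 = (temp5 &~ temp6)
    temp8 = (temp4 &~ temp7)
    temp9 = (temp3 & temp8)
    temp10 = (basis_bits.bit_2 & basis_bits.bit_3)
    temp11 = (temp10 &~ temp1)
    temp12 = (basis_bits.bit_4 &~ basis_bits.bit_5)
    temp13 = (temp12 & basis_bits.bit_6)
    temp14 = (temp11 & temp13)
    delimiters = (temp9 | temp14)
    temp15 = (basis_bits.bit_5 | basis_bits.bit_6)
    temp16 = (basis_bits.bit_4 & temp15)
    temp17 = (temp11 &~ temp16)
    temp18 = (basis_bits.bit_1 &~ basis_bits.bit_0)
    temp19 = (temp18 &~ basis_bits.bit_2)
    temp20 = (basis_bits.bit_6 & basis_bits.bit_7)
    temp21 = (basis_bits.bit_5 | temp20)
    temp22 = (basis_bits.bit_4 & temp21)
    temp23 = (~temp22)
    temp24 = (basis_bits.bit_4 | basis_bits.bit_5)
    temp25 = (temp24 | temp5)
    temp26 = ((basis_bits.bit_3 & temp23)|(~(basis_bits.bit_3) & temp25))
    temp27 = (temp19 & temp26)
    temp28 = (temp17 | temp27)
    temp29 = (temp18 & basis_bits.bit_2)
    temp30 = (temp29 & temp26)
    alphanumeric = (temp28 | temp30)
    return delimiters, alphanumeric


def stream_positions(stream):
    """Return the set of positions whose bit is 1 in a non-negative stream."""
    return {i for i in range(stream.bit_length()) if (stream >> i) & 1}


if __name__ == "__main__":
    # Byte value i sits at position i, so marked positions equal byte values.
    delimiters, alphanumeric = classify(basis_bits_for(bytes(range(256))))
    expected_alnum = set(range(0x30, 0x3A)) | set(range(0x41, 0x5B)) | set(range(0x61, 0x7B))
    assert stream_positions(delimiters) == {ord(c) for c in ":.,;"}
    assert stream_positions(alphanumeric) == expected_alnum
    print("both equations match their character classes on all 256 byte values")
```

Python parses `a &~ b` as `a & (~b)`, so the compiler output runs unchanged. Applying `~` to a Python int yields a negative value (for example `temp23`), but each such value is ANDed with a non-negative stream before it reaches a final result, so `delimiters` and `alphanumeric` remain ordinary non-negative streams.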

The Python compiler supports several options, which can be displayed with the following help command.

	python charset_compiler.py -h
