The Python Static Character Class Compiler
The Python character class compiler, charset_compiler.py, takes a data file of character class definitions as input and produces a set of bitwise logic equations (Pablo equations) that implement the given character classes. This compiler is primarily useful as a prototype demonstrating how code can be generated to build character class bit streams from the basis bit streams. Originally, it was used to produce the first implementations of important lexical streams for the original Parabix XML parser.
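To make the notion of basis bit streams concrete, here is a minimal Python sketch of the transposition step that produces them. It assumes the Parabix convention that bit_0 holds the most significant bit of each byte (the equations shown later are consistent with this). Each stream is an arbitrary-precision Python int whose bit i corresponds to byte i of the input. The helper names `transpose` and `show` are illustrative only and are not part of pybix.

```python
def transpose(data: bytes) -> list:
    """Return the eight basis bit streams as Python ints.

    Bit i of stream k is bit k of byte i, where bit_0 is assumed to be
    the most significant bit of each byte.
    """
    streams = [0] * 8
    for pos, byte in enumerate(data):
        for k in range(8):
            if (byte >> (7 - k)) & 1:
                streams[k] |= 1 << pos
    return streams


def show(stream: int, length: int) -> str:
    """Render a stream left to right, one character per input byte."""
    return "".join("1" if (stream >> i) & 1 else "." for i in range(length))


text = b"a:1"
for k, s in enumerate(transpose(text)):
    print(f"bit_{k}: {show(s, len(text))}")
```

Reading one column of the printed streams from bit_0 down to bit_7 recovers that byte. For example, the column for ':' spells 00111010, which is 0x3A.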
This compiler is found in the pybix subdirectory.
For example, consider the input file "delim_and_alphanum", consisting of the following definitions:
delimiters = [:.,;]
alphanumeric = [A-Za-z0-9]
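A compiler for such definitions first has to expand each bracket expression into a set of code points. The sketch below uses hypothetical helpers (`parse_definitions`, `char_term`, `naive_equation`) that are not taken from charset_compiler.py. It handles only literal characters and `x-y` ranges, and it emits the most naive equation possible: an eight-way AND of basis bits (or their complements) for each character, with the results ORed together. As above, bit_0 is assumed to be the most significant bit.

```python
import re

# Matches a definition line of the form: name = [body]
DEF_LINE = re.compile(r"^\s*(\w+)\s*=\s*\[(.*)\]\s*$")


def parse_class(body: str) -> set:
    """Expand a bracket body such as 'A-Za-z0-9' into a set of code points.

    Assumption: only literal characters and lo-hi ranges are supported
    (no escapes, negation, or named classes).
    """
    points = set()
    for lo, hi in re.findall(r"(.)(?:-(.))?", body):
        end = hi if hi else lo  # a single character is a one-element range
        points.update(range(ord(lo), ord(end) + 1))
    return points


def parse_definitions(text: str) -> dict:
    """Map each class name to its set of code points, in file order."""
    defs = {}
    for line in text.splitlines():
        m = DEF_LINE.match(line)
        if m:
            defs[m.group(1)] = parse_class(m.group(2))
    return defs


def char_term(cp: int) -> str:
    """AND of all eight basis bits (bit_0 = MSB) that matches exactly one byte."""
    lits = []
    for k in range(8):
        name = f"basis_bits.bit_{k}"
        lits.append(name if (cp >> (7 - k)) & 1 else f"~{name}")
    return "(" + " & ".join(lits) + ")"


def naive_equation(name: str, points: set) -> str:
    """OR together one term per character: correct but unoptimized."""
    return f"{name} = " + " | ".join(char_term(cp) for cp in sorted(points))


source = "delimiters = [:.,;]\nalphanumeric = [A-Za-z0-9]\n"
for name, points in parse_definitions(source).items():
    print(naive_equation(name, points))
```

For alphanumeric this approach yields 62 terms of eight literals each. The compiler's actual output is far more compact, because it recognizes bit patterns shared by runs of characters and reuses common subexpressions (temp11, for instance, feeds both classes).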
A set of equations to compute these character classes from the eight basis
bit streams can then be produced by running the compiler as follows:
python charset_compiler.py delim_and_alphanum
The following results are produced.
temp1 = (basis_bits.bit_0 | basis_bits.bit_1)
temp2 = (basis_bits.bit_2 &~ basis_bits.bit_3)
temp3 = (temp2 &~ temp1)
temp4 = (basis_bits.bit_4 & basis_bits.bit_5)
temp5 = (basis_bits.bit_6 | basis_bits.bit_7)
temp6 = (basis_bits.bit_6 &~ basis_bits.bit_7)
temp7 = (temp5 &~ temp6)
temp8 = (temp4 &~ temp7)
temp9 = (temp3 & temp8)
temp10 = (basis_bits.bit_2 & basis_bits.bit_3)
temp11 = (temp10 &~ temp1)
temp12 = (basis_bits.bit_4 &~ basis_bits.bit_5)
temp13 = (temp12 & basis_bits.bit_6)
temp14 = (temp11 & temp13)
delimiters = (temp9 | temp14)
temp15 = (basis_bits.bit_5 | basis_bits.bit_6)
temp16 = (basis_bits.bit_4 & temp15)
temp17 = (temp11 &~ temp16)
temp18 = (basis_bits.bit_1 &~ basis_bits.bit_0)
temp19 = (temp18 &~ basis_bits.bit_2)
temp20 = (basis_bits.bit_6 & basis_bits.bit_7)
temp21 = (basis_bits.bit_5 | temp20)
temp22 = (basis_bits.bit_4 & temp21)
temp23 = (~temp22)
temp24 = (basis_bits.bit_4 | basis_bits.bit_5)
temp25 = (temp24 | temp5)
temp26 = ((basis_bits.bit_3 & temp23)|(~(basis_bits.bit_3) & temp25))
temp27 = (temp19 & temp26)
temp28 = (temp17 | temp27)
temp29 = (temp18 & basis_bits.bit_2)
temp30 = (temp29 & temp26)
alphanumeric = (temp28 | temp30)
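Because `a &~ b` also parses as valid Python (`a & (~b)`), the generated equations can be checked directly. The sketch below pastes them unchanged into a function and runs them on basis streams produced by a transposition helper, again assuming that bit_0 is the most significant bit. It then compares the results with a direct byte-by-byte membership test over all 256 byte values. The names `classify` and `direct`, and the use of `SimpleNamespace` as a stand-in for `basis_bits`, are assumptions made for illustration.

```python
import string
from types import SimpleNamespace


def transpose(data: bytes) -> SimpleNamespace:
    """Basis streams as Python ints: bit i of bit_k is bit k of byte i (bit_0 = MSB)."""
    bits = [0] * 8
    for pos, byte in enumerate(data):
        for k in range(8):
            if (byte >> (7 - k)) & 1:
                bits[k] |= 1 << pos
    return SimpleNamespace(**{f"bit_{k}": bits[k] for k in range(8)})


def classify(data: bytes):
    """Evaluate the compiler's equations and return (delimiters, alphanumeric)."""
    basis_bits = transpose(data)
    # Safeguard: ~x on Python ints has unbounded leading ones, so clip results to the input length.
    mask = (1 << len(data)) - 1
    # Equations copied verbatim from the compiler output.
    temp1 = (basis_bits.bit_0 | basis_bits.bit_1)
    temp2 = (basis_bits.bit_2 &~ basis_bits.bit_3)
    temp3 = (temp2 &~ temp1)
    temp4 = (basis_bits.bit_4 & basis_bits.bit_5)
    temp5 = (basis_bits.bit_6 | basis_bits.bit_7)
    temp6 = (basis_bits.bit_6 &~ basis_bits.bit_7)
    temp7 = (temp5 &~ temp6)
    temp8 = (temp4 &~ temp7)
    temp9 = (temp3 & temp8)
    temp10 = (basis_bits.bit_2 & basis_bits.bit_3)
    temp11 = (temp10 &~ temp1)
    temp12 = (basis_bits.bit_4 &~ basis_bits.bit_5)
    temp13 = (temp12 & basis_bits.bit_6)
    temp14 = (temp11 & temp13)
    delimiters = (temp9 | temp14)
    temp15 = (basis_bits.bit_5 | basis_bits.bit_6)
    temp16 = (basis_bits.bit_4 & temp15)
    temp17 = (temp11 &~ temp16)
    temp18 = (basis_bits.bit_1 &~ basis_bits.bit_0)
    temp19 = (temp18 &~ basis_bits.bit_2)
    temp20 = (basis_bits.bit_6 & basis_bits.bit_7)
    temp21 = (basis_bits.bit_5 | temp20)
    temp22 = (basis_bits.bit_4 & temp21)
    temp23 = (~temp22)
    temp24 = (basis_bits.bit_4 | basis_bits.bit_5)
    temp25 = (temp24 | temp5)
    temp26 = ((basis_bits.bit_3 & temp23)|(~(basis_bits.bit_3) & temp25))
    temp27 = (temp19 & temp26)
    temp28 = (temp17 | temp27)
    temp29 = (temp18 & basis_bits.bit_2)
    temp30 = (temp29 & temp26)
    alphanumeric = (temp28 | temp30)
    return delimiters & mask, alphanumeric & mask


def direct(data: bytes, chars: str) -> int:
    """Reference stream: bit i is set iff byte i is one of chars."""
    return sum(1 << i for i, b in enumerate(data) if chr(b) in chars)


all_bytes = bytes(range(256))
delims, alnum = classify(all_bytes)
print(delims == direct(all_bytes, ":.,;"),
      alnum == direct(all_bytes, string.ascii_letters + string.digits))
```

For the input `b"a:1"`, for example, `classify` returns a delimiters stream with only bit 1 set (the ':') and an alphanumeric stream with bits 0 and 2 set.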
The Python compiler supports several options, which can be listed with the following help command:
python charset_compiler.py -h