### The Python Static Character Class Compiler
The Python character class compiler `charset_compiler.py` takes a data file of character class definitions as input and produces a set of bitwise logic equations, expressed as Pablo equations, that implement those classes. It is primarily useful as a prototype that demonstrates how code can be generated to build character class bit streams from the basis bit streams. It was originally used to produce the first implementations of the important lexical streams for the original Parabix XML parser. The compiler is found in the `pybix` subdirectory.

For example, consider the input file `delim_and_alphanum`, consisting of the following definitions:
```
delimiters = [:.,;]
alphanumeric = [A-Za-z0-9]
```
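
To make the meaning of these definitions concrete, the following minimal sketch expands each definition into the set of code points it denotes. It assumes only the simple syntax seen above (single characters and `x-y` ranges inside square brackets). `parse_definition` is a hypothetical helper for illustration and is not the parser used by the compiler itself.

```python
def parse_definition(line):
    """Parse 'name = [items]' into (name, set of code points).

    Hypothetical helper: handles only literal characters and x-y ranges.
    """
    name, _, body = line.partition("=")
    body = body.strip()
    if not (body.startswith("[") and body.endswith("]")):
        raise ValueError("expected a bracketed character class: " + line)
    items = body[1:-1]
    codepoints = set()
    i = 0
    while i < len(items):
        # A range is a character, a hyphen, and another character.
        if i + 2 < len(items) and items[i + 1] == "-":
            codepoints.update(range(ord(items[i]), ord(items[i + 2]) + 1))
            i += 3
        else:
            codepoints.add(ord(items[i]))
            i += 1
    return name.strip(), codepoints


definitions = ["delimiters = [:.,;]", "alphanumeric = [A-Za-z0-9]"]
classes = dict(parse_definition(d) for d in definitions)
print(sorted(chr(c) for c in classes["delimiters"]))  # [',', '.', ':', ';']
print(len(classes["alphanumeric"]))                   # 62
```

Under these assumptions, `delimiters` covers four code points and `alphanumeric` covers 62; these are the sets that the generated equations must recognize.
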

A set of equations to compute these character classes from the eight basis bit streams can then be produced by running the compiler as follows.

`python charset_compiler.py delim_and_alphanum`

The following results are produced.

```
temp1 = (basis_bits.bit_0 | basis_bits.bit_1)
temp2 = (basis_bits.bit_2 &~ basis_bits.bit_3)
temp3 = (temp2 &~ temp1)
temp4 = (basis_bits.bit_4 & basis_bits.bit_5)
temp5 = (basis_bits.bit_6 | basis_bits.bit_7)
temp6 = (basis_bits.bit_6 &~ basis_bits.bit_7)
temp7 = (temp5 &~ temp6)
temp8 = (temp4 &~ temp7)
temp9 = (temp3 & temp8)
temp10 = (basis_bits.bit_2 & basis_bits.bit_3)
temp11 = (temp10 &~ temp1)
temp12 = (basis_bits.bit_4 &~ basis_bits.bit_5)
temp13 = (temp12 & basis_bits.bit_6)
temp14 = (temp11 & temp13)
delimiters = (temp9 | temp14)
temp15 = (basis_bits.bit_5 | basis_bits.bit_6)
temp16 = (basis_bits.bit_4 & temp15)
temp17 = (temp11 &~ temp16)
temp18 = (basis_bits.bit_1 &~ basis_bits.bit_0)
temp19 = (temp18 &~ basis_bits.bit_2)
temp20 = (basis_bits.bit_6 & basis_bits.bit_7)
temp21 = (basis_bits.bit_5 | temp20)
temp22 = (basis_bits.bit_4 & temp21)
temp23 = (~temp22)
temp24 = (basis_bits.bit_4 | basis_bits.bit_5)
temp25 = (temp24 | temp5)
temp26 = ((basis_bits.bit_3 & temp23)|(~(basis_bits.bit_3) & temp25))
temp27 = (temp19 & temp26)
temp28 = (temp17 | temp27)
temp29 = (temp18 & basis_bits.bit_2)
temp30 = (temp29 & temp26)
alphanumeric = (temp28 | temp30)
```
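
One way to sanity-check equations like these is to evaluate them directly. The sketch below represents each bit stream as a Python integer, with bit `i` of the integer standing for input position `i`, and copies the `delimiters` equations verbatim, since they happen to be valid Python expressions. The helpers `transpose` and `marked_positions` are hypothetical names introduced here, and the sketch assumes that `bit_0` is the most significant bit of each byte, which is the numbering under which these equations select `:.,;`.

```python
from types import SimpleNamespace


def transpose(data):
    """Return basis bit streams bit_0..bit_7 as Python ints.

    Bit i of stream k is set when byte i has its k-th bit set, counting
    k from the most significant bit (an assumed numbering).
    """
    streams = [0] * 8
    for i, byte in enumerate(data):
        for k in range(8):
            if (byte >> (7 - k)) & 1:
                streams[k] |= 1 << i
    return SimpleNamespace(**{"bit_%d" % k: s for k, s in enumerate(streams)})


def delimiters_stream(basis_bits, length):
    # Equations copied verbatim from the compiler output shown above.
    temp1 = (basis_bits.bit_0 | basis_bits.bit_1)
    temp2 = (basis_bits.bit_2 &~ basis_bits.bit_3)
    temp3 = (temp2 &~ temp1)
    temp4 = (basis_bits.bit_4 & basis_bits.bit_5)
    temp5 = (basis_bits.bit_6 | basis_bits.bit_7)
    temp6 = (basis_bits.bit_6 &~ basis_bits.bit_7)
    temp7 = (temp5 &~ temp6)
    temp8 = (temp4 &~ temp7)
    temp9 = (temp3 & temp8)
    temp10 = (basis_bits.bit_2 & basis_bits.bit_3)
    temp11 = (temp10 &~ temp1)
    temp12 = (basis_bits.bit_4 &~ basis_bits.bit_5)
    temp13 = (temp12 & basis_bits.bit_6)
    temp14 = (temp11 & temp13)
    delimiters = (temp9 | temp14)
    # Mask to the input length so only real positions are reported.
    return delimiters & ((1 << length) - 1)


def marked_positions(stream):
    """List the positions whose bit is set in the stream."""
    return [i for i in range(stream.bit_length()) if (stream >> i) & 1]


text = b"a:b.c,d;e-f"
positions = marked_positions(delimiters_stream(transpose(text), len(text)))
print(positions)  # [1, 3, 5, 7]
```

Each bitwise operation here processes every input position at once, which is the point of the bit stream formulation. Python integers merely stand in for the wide registers a real implementation would use.
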

The Python compiler supports several options, which can be displayed with the following help command.

`python charset_compiler.py -h`