Welcome to the Parabix Technology Home Page
Parabix technology is a high-performance programming framework for streaming text processing applications, leveraging both SIMD and multicore parallel processing features.
Parabix Transform
The Parabix framework is based on the concept of parallel bit streams, a fundamentally new transform representation of text. Byte-oriented character stream data is first transformed into eight parallel bit streams, each bit stream comprising one bit per character code unit. Code units may be ASCII characters or UTF-8 bytes, for example, with one parallel bit stream defined for each of bit 0 through bit 7 of each code unit. Given such a representation, the 128-bit SIMD (single-instruction multiple-data) registers of SSE (Intel architecture SIMD technology) or AltiVec (PowerPC architecture) may be used to process 128 code unit positions at a time, because each register holds a 128-position segment of a single bit stream.
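The transform can be sketched in plain Python, with arbitrary-precision integers standing in for SIMD registers. This is only an illustration, not the Parabix implementation: the function names are hypothetical, and the bit numbering (bit 0 taken as the most significant bit of each code unit, text position i stored in bit i of each integer) is an assumption made for this sketch.

```python
def transpose_to_bit_streams(data: bytes) -> list[int]:
    """Split a byte stream into eight parallel bit streams.

    Stream k collects bit k of every code unit. Assumption for this
    sketch: bit 0 is the most significant bit of each byte, and text
    position i is stored in bit i of the resulting integer.
    """
    streams = [0] * 8
    for pos, byte in enumerate(data):
        for k in range(8):
            if (byte >> (7 - k)) & 1:
                streams[k] |= 1 << pos
    return streams


def inverse_transpose(streams: list[int], length: int) -> bytes:
    """Rebuild the original bytes from the eight bit streams."""
    out = bytearray(length)
    for pos in range(length):
        byte = 0
        for k in range(8):
            # Stream 0 is shifted up the furthest, so it ends as the MSB.
            byte = (byte << 1) | ((streams[k] >> pos) & 1)
        out[pos] = byte
    return bytes(out)


if __name__ == "__main__":
    text = "Parabix".encode("utf-8")
    streams = transpose_to_bit_streams(text)
    for k, stream in enumerate(streams):
        # Render position 0 on the left, marking 0 bits with periods.
        bits = "".join("1" if (stream >> i) & 1 else "." for i in range(len(text)))
        print(f"bit {k}: {bits}")
    assert inverse_transpose(streams, len(text)) == text
```

A single bitwise operation on one of these integers touches every text position at once, which is the property the SIMD registers exploit 128 positions at a time.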
See the Parabix Transform page for details.
Alphabets, Character Classes, Unicode
The Parabix framework contains many facilities for working with character representations of various kinds.
A fundamental notion is the character class bitstream. This is a stream of bits in one-to-one correspondence with some input character code units, such that 1 bits indicate characters within the class and 0 bits indicate characters outside of the class. Often we use regular-expression notation to identify character classes, such as [abc] for the class containing the three lower-case letters "a", "b", and "c", and [0-9] as the class for decimal digits. The following example shows an input character stream and the corresponding bit streams for the [abc] and [0-9] streams, respectively. We conventionally mark 0 bits with periods (".") to make the 1 bits stand out.
input: This is just 1 abbreviated example of character stream input containing 24 instances of the [abc] class and 6 instances of the [0-9] class.
[abc]: ...............111....1......1........1.1.11........1........1...1.............1.1...........111..1.1...1.........1.1................1.1...
[0-9]: .............1..........................................................11..................................1...................1.1........
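The example streams above can be reproduced with a short Python sketch. It computes each class stream two ways: directly from the definition, and with bitwise logic over the eight bit streams of the transform. The function names, the bit numbering (bit 0 as the most significant bit of each code unit), and the particular Boolean formulas are assumptions made for illustration; they are not the output of the Parabix Character Class Compilers.

```python
INPUT = ("This is just 1 abbreviated example of character stream input "
         "containing 24 instances of the [abc] class and 6 instances of "
         "the [0-9] class.")


def basis_bits(data: bytes) -> list[int]:
    """Eight parallel bit streams; b[0] holds the most significant bits."""
    b = [0] * 8
    for pos, byte in enumerate(data):
        for k in range(8):
            if (byte >> (7 - k)) & 1:
                b[k] |= 1 << pos
    return b


def class_stream_direct(data: bytes, members: bytes) -> int:
    """Definition of a class stream: a 1 bit wherever the byte is a member."""
    stream = 0
    for pos, byte in enumerate(data):
        if byte in members:
            stream |= 1 << pos
    return stream


def class_streams_bitwise(data: bytes) -> tuple[int, int]:
    """Compute the [abc] and [0-9] streams for all positions at once."""
    b = basis_bits(data)
    mask = (1 << len(data)) - 1

    def inv(x: int) -> int:
        return ~x & mask  # complement restricted to the text length

    # [abc] is 0x61..0x63: high nibble 0110, low nibble 0001..0011.
    hi_6 = inv(b[0]) & b[1] & b[2] & inv(b[3])
    abc = hi_6 & inv(b[4]) & inv(b[5]) & (b[6] | b[7])
    # [0-9] is 0x30..0x39: high nibble 0011, low nibble at most 1001.
    hi_3 = inv(b[0]) & inv(b[1]) & b[2] & b[3]
    digit = hi_3 & inv(b[4] & (b[5] | b[6]))
    return abc, digit


def render(stream: int, length: int) -> str:
    """Show position 0 on the left, marking 0 bits with periods."""
    return "".join("1" if (stream >> i) & 1 else "." for i in range(length))


if __name__ == "__main__":
    data = INPUT.encode("ascii")
    abc, digit = class_streams_bitwise(data)
    print("input:", INPUT)
    print("[abc]:", render(abc, len(data)))
    print("[0-9]:", render(digit, len(data)))
```

In the bitwise version the work per class is a fixed handful of logical operations, regardless of how many text positions each integer (or SIMD register) covers.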
Read about the Parabix Character Class Compilers for more information.