Welcome to the Parabix Technology Home Page
Parabix technology is a high-performance programming framework for streaming text processing applications, leveraging both SIMD and multicore parallel processing features.
Parabix Transform
The Parabix framework is based on the concept of parallel bit streams, a fundamentally new transformed representation of text. Byte-oriented character stream data is first transformed into eight parallel bit streams, each comprising one bit per character code unit. Code units may be ASCII characters or UTF-8 bytes, for example, with one parallel bit stream defined for each of bits 0 through 7 of each code unit. Given this representation, the 128-bit SIMD (single-instruction, multiple-data) registers of SSE (Streaming SIMD Extensions, on Intel architecture) or AltiVec (on PowerPC architecture) can process 128 code unit positions at a time.
See the Parabix Transform page for details.
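The data layout of the transform can be modeled in a few lines of scalar Python. This is a minimal sketch, not the Parabix implementation: each bit stream is an arbitrary-precision integer whose bit i corresponds to code unit position i, and it assumes bit 0 is the least significant bit of each code unit (the Parabix literature may number bits differently). The names transpose, inverse_transpose, and blocks are hypothetical.

```python
from typing import Iterator, List


def transpose(data: bytes) -> List[int]:
    """Split byte data into eight basis bit streams.

    Bit `pos` of basis[k] equals bit k of data[pos], where bit 0 is the
    least significant bit (an assumption of this sketch).
    """
    basis = [0] * 8
    for pos, code_unit in enumerate(data):
        for k in range(8):
            if (code_unit >> k) & 1:
                basis[k] |= 1 << pos
    return basis


def inverse_transpose(basis: List[int], length: int) -> bytes:
    """Rebuild the original code units from the eight basis bit streams."""
    out = bytearray(length)
    for pos in range(length):
        out[pos] = sum(((basis[k] >> pos) & 1) << k for k in range(8))
    return bytes(out)


def blocks(stream: int, length: int, width: int = 128) -> Iterator[int]:
    """Yield successive width-bit slices of a stream, mimicking how one
    128-bit SIMD register holds 128 consecutive positions."""
    mask = (1 << width) - 1
    for start in range(0, length, width):
        yield (stream >> start) & mask


# Usage: the transform is lossless.
text = b"Parabix"
basis = transpose(text)
assert inverse_transpose(basis, len(text)) == text
```

For example, transpose(b"Ab") sets bit 6 at both positions of basis[6], because "A" (0x41) and "b" (0x62) both have bit 6 set. An efficient implementation would transpose whole blocks with SIMD pack and shift operations; the per-bit loop here only illustrates the resulting layout.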
Alphabets, Character Classes, Unicode
The Parabix framework contains many facilities for working with character representations of various kinds.
A fundamental notion is the character class bitstream: a stream of bits in one-to-one correspondence with the code units of an input character stream, in which 1 bits mark characters within the class and 0 bits mark characters outside it. We often use regular-expression notation to identify character classes, such as [abc] for the class containing the three lower-case letters "a", "b", and "c", and [0-9] for the class of decimal digits. The following example shows an input character stream and the corresponding bit streams for the [abc] and [0-9] classes, respectively. We conventionally mark 0 bits with periods (".") to make the 1 bits stand out.
input: This is just 1 abbreviated example of character stream input containing 24 instances of the [abc] class and 6 instances of the [0-9] class.
[abc]: ...............111....1......1........1.1.11........1........1...1.............1.1...........111..1.1...1.........1.1................1.1...
[0-9]: .............1..........................................................11..................................1...................1.1........
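The example above can be reproduced with a direct, one-code-unit-at-a-time definition of a character class stream, rendered with the page's period convention. This is a sketch for illustration only: the helpers class_stream, render, in_abc, and in_digits are hypothetical names, and bit i of each stream holds position i of the input.

```python
from typing import Callable

# The example input from this page, as bytes (ASCII code units).
EXAMPLE = (b"This is just 1 abbreviated example of character stream input "
           b"containing 24 instances of the [abc] class and 6 instances of "
           b"the [0-9] class.")


def class_stream(data: bytes, member: Callable[[int], bool]) -> int:
    """Bit i of the result is 1 exactly when data[i] is in the class."""
    stream = 0
    for pos, code_unit in enumerate(data):
        if member(code_unit):
            stream |= 1 << pos
    return stream


def render(stream: int, length: int) -> str:
    """Show position 0 first, writing '1' for 1 bits and '.' for 0 bits."""
    return "".join("1" if (stream >> pos) & 1 else "." for pos in range(length))


def in_abc(c: int) -> bool:
    return c in b"abc"  # membership of a byte value in b"abc"


def in_digits(c: int) -> bool:
    return ord("0") <= c <= ord("9")


if __name__ == "__main__":
    n = len(EXAMPLE)
    # Labels have equal width, so the streams line up under the input.
    print("input:", EXAMPLE.decode("ascii"))
    print("[abc]:", render(class_stream(EXAMPLE, in_abc), n))
    print("[0-9]:", render(class_stream(EXAMPLE, in_digits), n))
```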
Read about the Parabix Character Class Compilers for more information.
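The key idea behind compiling a character class is to replace per-character tests with a few bitwise operations over the eight basis bit streams, each operation covering every position at once. The sketch below derives [0-9] and [abc] by hand; it assumes bit 0 is the least significant bit of each code unit, and compile_digits and compile_abc are hypothetical helpers, not output of the Parabix character class compiler.

```python
from typing import List


def transpose(data: bytes) -> List[int]:
    """Basis bit streams: bit pos of basis[k] is bit k of data[pos]."""
    basis = [0] * 8
    for pos, code_unit in enumerate(data):
        for k in range(8):
            if (code_unit >> k) & 1:
                basis[k] |= 1 << pos
    return basis


def compile_digits(b: List[int], mask: int) -> int:
    """[0-9] is 0x30..0x39: high nibble 0011, low nibble 0000..1001."""
    high = ~b[7] & ~b[6] & b[5] & b[4]
    # The low nibble exceeds 9 exactly when bit 3 is set with bit 2 or bit 1.
    low = ~(b[3] & (b[2] | b[1]))
    return high & low & mask


def compile_abc(b: List[int], mask: int) -> int:
    """[abc] is 0x61..0x63: high nibble 0110, low nibble 0001..0011."""
    high = ~b[7] & b[6] & b[5] & ~b[4]
    low = ~b[3] & ~b[2] & (b[1] | b[0])
    return high & low & mask


if __name__ == "__main__":
    data = bytes(range(256))  # every possible byte value
    b = transpose(data)
    mask = (1 << len(data)) - 1
    digits = compile_digits(b, mask)
    # Lists the byte values that the compiled [0-9] stream marks.
    print([pos for pos in range(256) if (digits >> pos) & 1])
```

Because Python integers have unbounded width, ~ yields infinitely many leading 1 bits, so each result is masked to the input length; a fixed-width SIMD version needs no such mask.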
Build and Test
The Parabix project uses the CMake build system. See the CMake documentation at https://cmake.org/documentation/ for details on CMake.
The parabix-devel/CMakeLists.txt file controls the overall build. Each Parabix library and tool generally has its own subdirectory with its own CMakeLists.txt file.
Tests for Parabix applications are found in the parabix-devel/QA directory.
- For more information on testing icgrep, see greptest.