Last edited by cameron May 10, 2020
This is an old version of this page. You can view the most recent version or browse the history.

Home

Welcome to the Parabix Technology Home Page

Parabix technology is a high-performance programming framework for streaming text processing applications, leveraging both SIMD and multicore parallel processing features.

Programming Model: Kernels + Stream Sets = Programs

Parabix programming is based on the concepts of computational kernels operating on sets of data streams.

Data Streams and Stream Sets

Data streams are streams of data fields all of a given bit width. If the bit width is N, the type of the field is said to be iN, an integer of N bits. Bit streams are streams of type i1.

Stream sets are sets of data streams all of the same type and in one-to-one correspondence.
An 8 x i1 stream set is a set of eight parallel bit streams.
All streams in the set are of the same length and are allocated and processed together by the underlying system.

A 1 x i8 stream is a stream of bytes. Most often, Parabix programs read byte streams from files or other input sources, transform those streams into sets of bit streams, and process those bit streams using bitwise logic and shifting.
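
As a rough illustration of these definitions, the following Python sketch models a stream set as N streams of K-bit fields. The StreamSet class and its attribute names are hypothetical and are not part of the Parabix API (which is C++); the sketch only captures the N x iK typing and the rule that all streams in a set have the same length.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StreamSet:
    """Hypothetical model of an N x iK stream set (not the Parabix API)."""
    field_width: int           # K in iK: number of bits per field
    streams: List[List[int]]   # N streams, each a list of K-bit fields

    def __post_init__(self) -> None:
        if not self.streams:
            raise ValueError("a stream set needs at least one stream")
        # All streams in a set must be of the same length.
        if len({len(s) for s in self.streams}) != 1:
            raise ValueError("all streams in a set must have the same length")
        # Every field must fit in field_width bits.
        limit = 1 << self.field_width
        for stream in self.streams:
            if any(not 0 <= field < limit for field in stream):
                raise ValueError(f"field does not fit in i{self.field_width}")

    @property
    def type_name(self) -> str:
        return f"{len(self.streams)} x i{self.field_width}"

    def __len__(self) -> int:
        return len(self.streams[0])


# A 1 x i8 stream set: a single stream of bytes.
byte_stream = StreamSet(8, [list(b"Parabix")])

# An 8 x i1 stream set: eight parallel bit streams of equal length.
bit_streams = StreamSet(1, [[0, 1, 1, 0, 0, 0, 1] for _ in range(8)])

print(byte_stream.type_name, len(byte_stream))
print(bit_streams.type_name, len(bit_streams))
```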

Kernels: Stream Processing Functions

Parabix programs are assembled as sequences of kernels operating on stream sets. Kernels are generally just functions, taking stream sets as input and producing stream sets as output.

From a programmer's point of view, kernels can often be viewed as pure mathematical functions defined over stream sets. Some kernels are side-effecting, achieving computational results by calling back to standard C++ programs, for example. But many kernels have a pure mathematical result, in which the only effect is the set of output streams produced in response to the given input streams.
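
The idea of a program as a sequence of pure kernels can be sketched in a few lines of Python. The kernel names below (match_byte_kernel, advance_kernel, and_kernel) are invented for this illustration and do not correspond to actual Parabix kernels; each one simply maps input streams to an output stream of the same length, with no other effect.

```python
from typing import List

BitStream = List[int]  # one i1 field per position


def match_byte_kernel(data: bytes, target: int) -> BitStream:
    """Pure kernel: 1 x i8 in, 1 x i1 out. Marks positions equal to target."""
    return [1 if byte == target else 0 for byte in data]


def advance_kernel(bits: BitStream) -> BitStream:
    """Pure kernel: shifts every mark forward by one position."""
    return ([0] + bits)[:len(bits)]


def and_kernel(a: BitStream, b: BitStream) -> BitStream:
    """Pure kernel: bitwise AND of two streams of the same length."""
    return [x & y for x, y in zip(a, b)]


def find_pair(data: bytes, first: int, second: int) -> BitStream:
    """A small program assembled from kernels.

    Marks each position holding `second` that immediately follows `first`.
    """
    firsts = match_byte_kernel(data, first)
    seconds = match_byte_kernel(data, second)
    return and_kernel(advance_kernel(firsts), seconds)


marks = find_pair(b"abcabxab", ord("a"), ord("b"))
print(marks)  # [0, 1, 0, 0, 1, 0, 0, 1]
```

Because each kernel depends only on its inputs, the same composition can be tested kernel by kernel or as a whole.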

Parabix Transform

The Parabix framework is based on the concept of parallel bit streams, a fundamentally new transform representation of text. Byte-oriented character stream data is first transformed into eight parallel bit streams, each bit stream comprising one bit per character code unit. The byte stream is represented as a 1 x i8 stream set.
The transposed parallel bit streams, known as the basis streams, comprise an 8 x i1 stream set of the same length as the byte stream. The code units of the byte stream may be ASCII characters or UTF-8 bytes, for example. The Parabix transform extracts the bits of each byte and produces a separate stream for each bit position. Given such a representation, the 128-bit SIMD (single-instruction multiple-data) registers of SSE (Intel architecture SIMD technology) or AltiVec (PowerPC architecture) may be used to process 128 code unit positions at a time.

The transposition process is implemented by the Parabix S2P kernel (S2P stands for serial-to-parallel). See the Parabix Transform page for details.
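
The transposition itself can be illustrated with a naive Python sketch. This is not the S2P kernel, which uses SIMD operations; here an arbitrary-precision Python integer stands in for a bit stream, so that one bitwise operation acts on all positions at once. The function names and the numbering of streams from the least significant bit are assumptions made for the example and may differ from the conventions used in Parabix.

```python
from typing import List


def s2p(data: bytes) -> List[int]:
    """Transpose a byte stream into 8 parallel bit streams.

    Stream k is an int whose bit i equals bit k of data[i]
    (k = 0 is the least significant bit of each byte).
    """
    basis = [0] * 8
    for i, byte in enumerate(data):
        for k in range(8):
            basis[k] |= ((byte >> k) & 1) << i
    return basis


def p2s(basis: List[int], length: int) -> bytes:
    """Inverse transform: rebuild the byte stream from the 8 bit streams."""
    out = bytearray(length)
    for i in range(length):
        for k in range(8):
            out[i] |= ((basis[k] >> i) & 1) << k
    return bytes(out)


def show(stream: int, length: int) -> str:
    """Render a bit stream with position 0 leftmost and 0 bits as periods."""
    return "".join("1" if (stream >> i) & 1 else "." for i in range(length))


text = b"ab12"
basis = s2p(text)
for k, stream in enumerate(basis):
    print(f"bit {k}: {show(stream, len(text))}")

# The transform loses no information: p2s recovers the original bytes.
assert p2s(basis, len(text)) == text
```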

Alphabets, Character Classes, Unicode

The Parabix framework contains many facilities for working with character representations of various kinds.

A fundamental notion is the character class bitstream. This is a stream of bits in one-to-one correspondence with some input character code units, such that 1 bits indicate characters within the class and 0 bits indicate characters outside of the class. Often we use regular-expression notation to identify character classes, such as [abc] for the class containing the three lower-case letters "a", "b", and "c", and [0-9] as the class for decimal digits. The following example shows an input character stream and the corresponding bit streams for the [abc] and [0-9] streams, respectively. We conventionally mark 0 bits with periods (".") to make the 1 bits stand out.

input:   This is just 1 abbreviated example of character stream input containing 24 instances of the [abc] class and 6 instances of the [0-9] class. 
[abc]:   ...............111....1......1........1.1.11........1........1...1.............1.1...........111..1.1...1.........1.1................1.1...
[0-9]:   .............1..........................................................11..................................1...................1.1........

Read about the Parabix Character Class Compilers for more information.
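
To show how such character class bitstreams can be computed with bitwise logic alone, here is a hand-written Python sketch for [abc] and [0-9]. It uses a naive transposition with streams numbered from the least significant bit, and Python integers as stand-ins for bit streams; both are assumptions of the example. The Boolean equations were derived by hand from the ASCII codes and are only meant to suggest the kind of logic involved, not the output of the Parabix compilers.

```python
from typing import List, Tuple


def s2p(data: bytes) -> List[int]:
    """Naive transposition: bit i of stream k is bit k of data[i]."""
    basis = [0] * 8
    for i, byte in enumerate(data):
        for k in range(8):
            basis[k] |= ((byte >> k) & 1) << i
    return basis


def show(stream: int, length: int) -> str:
    """Render a bit stream with position 0 leftmost and 0 bits as periods."""
    return "".join("1" if (stream >> i) & 1 else "." for i in range(length))


def char_classes(data: bytes) -> Tuple[int, int]:
    """Return the ([abc], [0-9]) bit streams for data."""
    mask = (1 << len(data)) - 1   # limits NOT to the stream length
    b = s2p(data)

    def inv(x: int) -> int:
        return ~x & mask

    # [0-9] is 0x30..0x39: high nibble 0011, low nibble 0000..1001.
    digit_hi = inv(b[7]) & inv(b[6]) & b[5] & b[4]
    digit_lo = inv(b[3]) | (inv(b[2]) & inv(b[1]))
    digits = digit_hi & digit_lo

    # [abc] is 0x61..0x63: high nibble 0110, low nibble 0001..0011.
    abc_hi = inv(b[7]) & b[6] & b[5] & inv(b[4])
    abc_lo = inv(b[3]) & inv(b[2]) & (b[1] | b[0])
    abc = abc_hi & abc_lo

    return abc, digits


sample = b"a1b2 c3d"
abc, digits = char_classes(sample)
print("input:", sample.decode("ascii"))
print("[abc]:", show(abc, len(sample)))     # 1.1..1..
print("[0-9]:", show(digits, len(sample)))  # .1.1..1.
```

Each equation is evaluated once over whole streams, so every position in the input is classified by the same handful of bitwise operations.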

Build and Test

The Parabix project uses the CMake build system. See the CMake documentation at cmake.org (https://cmake.org/documentation/) for details on CMake.

The parabix-devel/CMakeLists.txt file controls the overall build. There are generally subdirectories for each Parabix library and tool, each with their own CMakeLists.txt file.

Tests for Parabix applications are found in the parabix-devel/QA directory.

  1. For more information on testing of icgrep see greptest.