@@ -4,18 +4,6 @@ Parabix technology is a high-performance programming framework for streaming
text processing applications, leveraging both SIMD and multicore
parallel processing features.

* October, 2018: Unicode Level 2 support for regex matching under canonical and compatible (?K flag) equivalence
* September, 2018: ICU 42 presentation on AVX-512/Parabix
* June, 2018: AVX-512 support for Parabix/icgrep
* June, 2018: u32u8 application
* April, 2018: File glob parsing using Parabix methods
* May 2, 2016: Check out the [wiki:ParabixOS ParabixOS] project!
* November 18-20, 2015: Join us in Zhangjiajie, China for our presentation at [http://trust.csu.edu.cn/conference/ICA3PP2015/ ICA3PP 2015].
* October 28-29, 2015: Join us at the [http://llvm.org/devmtg/2015-10/ 2015 LLVM Developers' Meeting] for Parabix-LLVM discussion.
* October 26-27, 2015: Join us at [http://www.unicodeconference.org/ Unicode Conference 39] for our presentation of Unicode regular expression matching in icgrep.
* September 2015: Look at our plans for additional Parabix regular expression facilities in the [ParabixRegexRoadMap Parabix Regular Expression Road Map]
* February 2015: Check out [wiki:ICgrep icGrep] 1.0 offering [GigabytePerSecondGrep Gigabyte Per Second Performance]!

## Parabix Transform

The Parabix framework is based on the concept of parallel bit streams,
@@ -30,3 +18,22 @@ to process 128 code unit positions at a time.

See the [Parabix Transform](ParabixTransform) page for details.
## Alphabets, Character Classes, Unicode

The Parabix framework contains many facilities for working with character representations of various kinds.

A fundamental notion is the character class bitstream. This is a stream of bits in one-to-one correspondence
with some input character code units, such that 1 bits indicate characters within the class and 0 bits indicate
characters outside of the class. Often we use regular-expression notation to identify character classes,
such as `[abc]` for the class containing the three lower-case letters "a", "b", and "c", and `[0-9]`
as the class for decimal digits. The following example shows an input character stream and the corresponding
bit streams for the `[abc]` and `[0-9]` classes, respectively. We conventionally mark 0 bits with
periods (".") to make the 1 bits stand out.

```
input: This is just 1 abbreviated example of character stream input containing 24 instances of the [abc] class and 6 instances of the [0-9] class.
[abc]: ...............111....1......1........1.1.11........1........1...1.............1.1...........111..1.1...1.........1.1................1.1...
[0-9]: .............1..........................................................11..................................1...................1.1........
```
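
The display above can be reproduced with a few lines of Python. This is only an illustrative sketch: it tests one character at a time and builds a printable string, which is not how Parabix itself computes character class bitstreams. The helper name `class_bitstream` is hypothetical and does not come from the Parabix code base.

```python
def class_bitstream(text: str, members: str) -> str:
    """Return one marker per input character.

    '1' marks a character that belongs to the class given by `members`;
    '.' stands for a 0 bit, following the display convention used above.
    (Hypothetical helper, for illustration only.)
    """
    member_set = set(members)
    return "".join("1" if ch in member_set else "." for ch in text)


# The example input from the text, split across lines for readability.
text = (
    "This is just 1 abbreviated example of character stream input "
    "containing 24 instances of the [abc] class and 6 instances of "
    "the [0-9] class."
)

abc_stream = class_bitstream(text, "abc")           # class [abc]
digit_stream = class_bitstream(text, "0123456789")  # class [0-9]

print("input:", text)
print("[abc]:", abc_stream)
print("[0-9]:", digit_stream)
```

Each output stream has exactly one marker per input character, so the three printed lines align column by column, and counting the `1` markers gives the number of class members in the input.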
Read about the [Parabix Character Class Compilers](CharacterClassCompiler) for more information.