Skip to content

GitLab

  • Projects
  • Groups
  • Snippets
  • Help
    • Loading...
  • Help
    • Help
    • Support
    • Community forum
    • Submit feedback
    • Contribute to GitLab
  • Sign in
P parabix-devel
  • Project overview
    • Project overview
    • Details
    • Activity
    • Releases
  • Repository
    • Repository
    • Files
    • Commits
    • Branches
    • Tags
    • Contributors
    • Graph
    • Compare
  • Issues 9
    • Issues 9
    • List
    • Boards
    • Labels
    • Service Desk
    • Milestones
  • Merge requests 2
    • Merge requests 2
  • CI/CD
    • CI/CD
    • Pipelines
    • Jobs
    • Schedules
  • Operations
    • Operations
    • Incidents
    • Environments
  • Packages & Registries
    • Packages & Registries
    • Container Registry
  • Analytics
    • Analytics
    • CI/CD
    • Repository
    • Value Stream
  • Wiki
    • Wiki
  • Snippets
    • Snippets
  • Members
    • Members
  • Activity
  • Graph
  • Create a new issue
  • Jobs
  • Commits
  • Issue Boards
Collapse sidebar
  • cameron
  • parabix-devel
  • Wiki
  • Parallel Deletion

Last edited by cameron Jul 13, 2021
Page history

Parallel Deletion

Parallel Deletion

Basic Problem

Given a stream of data elements and a corresponding deletion mask bit stream, the deletion problem is to compute a compressed bit stream consisting of only the nondeleted elements.

For example, consider the following bit stream and deletion mask:

abcdefghijklmnopqrstuvwxyz
...1....11..1.1111.1......

The compressed bit stream resulting from deletion of positions marked by 1 bits is the following:

abcefghklnsuvwxyz

Problem Variations

  1. Apply a single deletion mask to a single bit stream.
  2. Apply a single deletion mask to each individual bit stream in a set of N parallel bit streams.
  3. Apply a deletion mask to a stream of larger units, for example a stream of bytes.

In the event that N bit streams are processed, it may be possible to reduce the overall cost by sharing preprocessing overhead.

Overview of Parallel Deletion Methods

Intra-Field Deletion vs. Field-by-Field Processing

When implementing parallel deletion operations on a data stream, we first perform deletions within fields of a particular size (intra-field parallel deletion) and then arrange to join together the results of adjacent fields.

For example, working with 128-bit SSE registers with 128-bit fields, we may delete 7 positions in block A, and 23 positions in block B. Given these results, we typically then want to join the partial blocks together to form a continuous data stream.

The field granularity for these tasks may be the full block width, or a smaller field width that is more convenient for processing.

One method is to always maintain a partial block of pending results. When a new partial block is produced, we transfer as many positions as possible into the pending block. If we fill it, then we emit the full block and move the remaining data from the new block to become the new pending block.

In the Parabix framework, two separate kernels are used for these two tasks. The FieldCompress kernel performs intra-field compression, while the StreamCompress kernel perform the field-by-field compression of partial data fields. The FilterByMask routine combines conveniently generates the necessary kernel calls to perform both tasks.

Intra-Field Parallel Deletion

There are several different methods that can be used to implement parallel deletion within SIMD registers. Depending on the problem variation, it is often beneficial to combine methods.

  1. Parallel prefix deletion.
  2. SIMD deletion by left-result induction.
  3. SIMD deletion by central result induction.
  4. Deletion using PEXT bit manipulation instructions.
  5. Deletion by permutation vector computation.

Full Field Deletion

When the ultimate goal is to produce streams of bytes, doublebytes, or other units, then it is often beneficial to limit the bit stream deletion work to smaller field sizes. For example, suppose that a stream of bytes is to be produced after application of parallel deletion to a the corresponding 8 parallel bit streams. Working with 128-bit registers, each register will ultimately hold at most 16 bytes. Then we may proceed as follows:

  1. Perform parallel deletion within 16-bit fields, so that nondeleted bits are leftmost in each such field.
  2. Perform the inverse Parabix transform to byte stream form, producing a squence of 8 register values in which the nondeleted bytes are leftmost in each register.
  3. Sequentially write the register values to the output stream, advancing the output stream pointer by the count of nondeleted bytes for each register in turn.

Summary of Deletion Methods

Parallel Prefix Deletion

This method uses only bitwise logic and shifting to perform deletion. For deletion within 2k bit fields, k steps are required. Each step i involves application of a mask to select of bits to move 2i positions. 4 operations are required per step. However, the calculation of masks requires k2 preprocessing steps. However, when applied to deletion of data from several parallel bit streams, the preprocessing cost is shared across the streams.

Deletion by Left Result Induction

In this method, deletion steps ensure that results are leftmost within fields of size 2i at step i. The process involves SIMD multiplication operations at each step.

Deletion by Central Result Induction

In this method, deletion steps produce results that span or touch the centre position of each 2(i+1) bit field at step i. The process involves SIMD rotate operations at each step.

Deletion by PEXT

Intel introduced the PEXT instruction with its Haswell processors. This operation uses a mask to select nondeleted bits of a 64-bit value and compress them together leftmost in a 64-bit result. Thus it performs parallel deletion in a single step, with a mask that is the inverse of the deletion mask.

Deletion by Permutation Vector Computation

This method is suitable for performing deletion at byte or larger granularities, given a processor supporting shuffle or permutation operations with variable selectors.

Clone repository
  • Bracket Matching
  • CSV Validation
  • CSVediting
  • CSVparsing
  • Character Code Compilers
  • KernelLibrary
  • Pablo
  • ParabixTransform
  • Parallel Deletion
  • Parallel Hashing
  • Performance Testing Script
  • Shuffle Pattern Library
  • StaticCCC
  • String Insertion
  • UCD: Unicode Property Database and Compilers
View All Pages