Skip to content

GitLab

  • Projects
  • Groups
  • Snippets
  • Help
    • Loading...
  • Help
    • Help
    • Support
    • Community forum
    • Submit feedback
    • Contribute to GitLab
  • Sign in
P parabix-devel
  • Project overview
    • Project overview
    • Details
    • Activity
    • Releases
  • Repository
    • Repository
    • Files
    • Commits
    • Branches
    • Tags
    • Contributors
    • Graph
    • Compare
  • Issues 9
    • Issues 9
    • List
    • Boards
    • Labels
    • Service Desk
    • Milestones
  • Merge requests 2
    • Merge requests 2
  • CI/CD
    • CI/CD
    • Pipelines
    • Jobs
    • Schedules
  • Operations
    • Operations
    • Incidents
    • Environments
  • Packages & Registries
    • Packages & Registries
    • Container Registry
  • Analytics
    • Analytics
    • CI/CD
    • Repository
    • Value Stream
  • Wiki
    • Wiki
  • Snippets
    • Snippets
  • Members
    • Members
  • Activity
  • Graph
  • Create a new issue
  • Jobs
  • Commits
  • Issue Boards
Collapse sidebar
  • cameron
  • parabix-devel
  • Wiki
  • String Insertion

Last edited by cameron Mar 03, 2023
Page history

String Insertion

String Insertion

The string insertion problem is the problem of inserting a fixed string at all selected locations in a text stream.

Consider the string insertion problem to insert the string pre into a data stream according to a stream marking insert positions with one bits. For example, consider the data stream below together with a mask stream marking insertion positions.

data:   He pared the apple and then added selected spices.
mask:   ...1..............................1...............

The stream resulting from insertion of the fixed string pre before the marked positions is:

result: He prepared the apple and then added preselected spices.

Insertion Method Overview

To solve the string insertion problem, using Parabix methods, we use a process consisting of the following steps.

  1. Transpose the data stream to Parabix form.
  2. Insert zeroes into the data stream to make room for inserted data.
  3. Create a repeating stream set with fixed string to be inserted.
  4. Spread the repeating stream data to align with the inserted zeroes.
  5. Merge the stream sets together.
  6. Perform inverse Parabix transposition to produce the final output.

Insertion of Zeroes

For insertion of the string pre in marked positions in the file, we first need to create a string insertion bixnum that shows the number of zeroes to insert at each position. A bixnum is a set of bit streams giving the binary representation of the number of bits to insert. In the case of insertion of pre, the number of bits to insert is 3 and the binary representation is 11. So our bixnum B for insertion is:

data:   He pared the apple and then added selected spices.

B[1]    ...1..............................1...............
B[0]    ...1..............................1...............

Note that the number of bits to be inserted at each position is 0 for all positions except those in the mask.

The bixnum B is created as follows:

std::vector<unsigned> insertAmts = {3};
StreamSet * B = E->CreateStreamSet(2, 1);
E->CreateKernelCall<ZeroInsertBixNum>(insertAmts, mask, B);

Using this bixnum, we can now create a spread mask for spreading out the data.

StreamSet * SpreadMask = InsertionSpreadMask(E, InsertBixNum, InsertPosition::Before);

The resulting SpreadMask is

spread: 111...11111111111111111111111111111111...111111111111111

The SpreadByMask operation can then use this mask to spread out the original basis data to produce a new expanded stream set with the zeroes inserted.

StreamSet * ExpandedBasis = E->CreateStreamSet(8);
SpreadByMask(E, SpreadMask, BasisBits, ExpandedBasis);

Repeating Stream Set Creation.

std::vector<uint64_t> insertBytes;
for (auto ch : "pre") {
    insertBytes.push_back(static_cast<uint64_t>(ch));
}
StreamSet * RepeatingBasis = CreateRepeatingBixNum(E, 8, insertBytes);

This will create an infinite set of 8 bit streams with the transposed representation of "pre" repeating indefinitely.

The code for CreateRepeatingBixNum can be found in the csv2json.cpp file.

Spreading the Repeated Stream and Merging

Once we have the repeated stream, we can use SpreadByMask to spread it out to positions associated with the inserted zeroes in our ExpandedBasis. In this case the spread mask used must be the inverse of the spread mask used to spread the data stream. That is:

invmsk: ...111................................111...............

After spreading out the repeated pre bits, we can then use simple bitwise logic to merge this data with ExpandedBasis.

The result will have the correct insertions of the string performed in the resulting transposed data stream.

Using the P2S_Kernel, we can transform this back to a byte stream and generate it to stdout.

Clone repository
  • Bracket Matching
  • CSV Validation
  • CSVediting
  • CSVparsing
  • Character Code Compilers
  • KernelLibrary
  • Pablo
  • ParabixTransform
  • Parallel Deletion
  • Parallel Hashing
  • Performance Testing Script
  • Shuffle Pattern Library
  • StaticCCC
  • String Insertion
  • UCD: Unicode Property Database and Compilers
View All Pages