Skip to content

GitLab

  • Projects
  • Groups
  • Snippets
  • Help
    • Loading...
  • Help
    • Help
    • Support
    • Community forum
    • Submit feedback
    • Contribute to GitLab
  • Sign in
P parabix-devel
  • Project overview
    • Project overview
    • Details
    • Activity
    • Releases
  • Repository
    • Repository
    • Files
    • Commits
    • Branches
    • Tags
    • Contributors
    • Graph
    • Compare
  • Issues 9
    • Issues 9
    • List
    • Boards
    • Labels
    • Service Desk
    • Milestones
  • Merge requests 2
    • Merge requests 2
  • CI/CD
    • CI/CD
    • Pipelines
    • Jobs
    • Schedules
  • Operations
    • Operations
    • Incidents
    • Environments
  • Packages & Registries
    • Packages & Registries
    • Container Registry
  • Analytics
    • Analytics
    • CI/CD
    • Repository
    • Value Stream
  • Wiki
    • Wiki
  • Snippets
    • Snippets
  • Members
    • Members
  • Activity
  • Graph
  • Create a new issue
  • Jobs
  • Commits
  • Issue Boards
Collapse sidebar
  • cameron
  • parabix-devel
  • Wiki
  • String Insertion

String Insertion · Changes

Page history
Create String Insertion authored Mar 03, 2023 by cameron's avatar cameron
Show whitespace changes
Inline Side-by-side
Showing with 108 additions and 0 deletions
+108 -0
  • String-Insertion.md String-Insertion.md +108 -0
  • No files found.
String-Insertion.md 0 → 100644
View page @ 94c42bdb
## String Insertion
The string insertion problem is the problem of inserting a fixed string at
all selected locations in a text stream.
Consider the string insertion problem to insert the string `pre` into a data
stream according to a stream marking insert positions with one bits.
For example, consider the data stream below together with a mask stream marking
insertion positions.
```
data: He pared the apple and then added selected spices.
mask: ...1..............................1...............
```
The stream resulting from insertion of the fixed string `pre` before the marked positions is:
```
result: He prepared the apple and then added preselected spices.
```
## Insertion Method Overview
To solve the string insertion problem, using Parabix methods, we use a
process consisting of the following steps.
1. Transpose the data stream to Parabix form.
1. Insert zeroes into the data stream to make room for inserted data.
1. Create a repeating stream set with fixed string to be inserted.
1. Spread the repeating stream data to align with the inserted zeroes.
1. Merge the stream sets together.
1. Perform inverse Parabix transposition to produce the final output.
## Insertion of Zeroes
For insertion of the string `pre` in marked positions in the file, we
first need to create a string insertion bixnum that shows the number of
zeroes to insert at each position. A bixnum is a set of bit streams giving
the binary representation of the number of bits to insert. In the case
of insertion of `pre`, the number of bits to insert is 3 and the binary
representation is `11`. So our bixnum B for insertion is:
```
data: He pared the apple and then added selected spices.
B[1] ...1..............................1...............
B[0] ...1..............................1...............
```
Note that the number of bits to be inserted at each position is
0 for all positions except those in the mask.
The bixnum B is created as follows:
```
std::vector<unsigned> insertAmts = {3};
StreamSet * B = E->CreateStreamSet(2, 1);
E->CreateKernelCall<ZeroInsertBixNum>(insertAmts, mask, B);
```
Using this bixnum, we can now create a spread mask for spreading out the data.
```
StreamSet * SpreadMask = InsertionSpreadMask(E, InsertBixNum, InsertPosition::Before);
```
The resulting SpreadMask is
```
spread: 111...11111111111111111111111111111111...111111111111111
```
The SpreadByMask operation can then use this mask to spread out the
original basis data to produce a new expanded stream set with the zeroes
inserted.
```
StreamSet * ExpandedBasis = E->CreateStreamSet(8);
SpreadByMask(E, SpreadMask, BasisBits, ExpandedBasis);
```
## Repeating Stream Set Creation.
```
std::vector<uint64_t> insertBytes;
for (auto ch : "pre") {
insertBytes.push_back(static_cast<uint64_t>(ch));
}
StreamSet * RepeatingBasis = CreateRepeatingBixNum(E, 8, insertBytes);
```
This will create an infinite set of 8 bit streams with the transposed representation
of "pre" repeating indefinitely.
The code for `CreateRepeatingBixNum` can be found in the `csv2json.cpp` file.
## Spreading the Repeated Stream and Merging
Once we have the repeated stream, we can use SpreadByMask to spread
it out to positions associated with the inserted zeroes in our `ExpandedBasis`.
In this case the spread mask used must be the inverse of the spread mask
used to spread the data stream. That is:
```
invmsk: ...111................................111...............
```
After spreading out the repeated `pre` bits, we can then use simple
bitwise logic to merge this data with `ExpandedBasis`.
The result will have the correct insertions of the string performed
in the resulting transposed data stream.
Using the `P2S_Kernel`, we can transform this back to a byte stream and generate it to stdout.
\ No newline at end of file
Clone repository
  • Bracket Matching
  • CSV Validation
  • CSVediting
  • CSVparsing
  • Character Code Compilers
  • KernelLibrary
  • Pablo
  • ParabixTransform
  • Parallel Deletion
  • Parallel Hashing
  • Performance Testing Script
  • Shuffle Pattern Library
  • StaticCCC
  • String Insertion
  • UCD: Unicode Property Database and Compilers
View All Pages