|
|
|
## String Insertion
|
|
|
|
|
|
|
|
The string insertion problem is the problem of inserting a fixed string at
|
|
|
|
all selected locations in a text stream.
|
|
|
|
|
|
|
|
Consider the string insertion problem to insert the string `pre` into a data
|
|
|
|
stream according to a stream marking insert positions with one bits.
|
|
|
|
For example, consider the data stream below together with a mask stream marking
|
|
|
|
insertion positions.
|
|
|
|
|
|
|
|
```
|
|
|
|
data: He pared the apple and then added selected spices.
|
|
|
|
mask: ...1..............................1...............
|
|
|
|
```
|
|
|
|
|
|
|
|
The stream resulting from insertion of the fixed string `pre` before the marked positions is:
|
|
|
|
```
|
|
|
|
result: He prepared the apple and then added preselected spices.
|
|
|
|
```
|
|
|
|
|
|
|
|
## Insertion Method Overview
|
|
|
|
|
|
|
|
To solve the string insertion problem, using Parabix methods, we use a
|
|
|
|
process consisting of the following steps.
|
|
|
|
|
|
|
|
1. Transpose the data stream to Parabix form.
|
|
|
|
1. Insert zeroes into the data stream to make room for inserted data.
|
|
|
|
1. Create a repeating stream set with fixed string to be inserted.
|
|
|
|
1. Spread the repeating stream data to align with the inserted zeroes.
|
|
|
|
1. Merge the stream sets together.
|
|
|
|
1. Perform inverse Parabix transposition to produce the final output.
|
|
|
|
|
|
|
|
## Insertion of Zeroes
|
|
|
|
|
|
|
|
For insertion of the string `pre` in marked positions in the file, we
|
|
|
|
first need to create a string insertion bixnum that shows the number of
|
|
|
|
zeroes to insert at each position. A bixnum is a set of bit streams giving
|
|
|
|
the binary representation of the number of bits to insert. In the case
|
|
|
|
of insertion of `pre`, the number of bits to insert is 3 and the binary
|
|
|
|
representation is `11`. So our bixnum B for insertion is:
|
|
|
|
|
|
|
|
```
|
|
|
|
data: He pared the apple and then added selected spices.
|
|
|
|
|
|
|
|
B[1] ...1..............................1...............
|
|
|
|
B[0] ...1..............................1...............
|
|
|
|
```
|
|
|
|
|
|
|
|
Note that the number of bits to be inserted at each position is
|
|
|
|
0 for all positions except those in the mask.
|
|
|
|
|
|
|
|
The bixnum B is created as follows:
|
|
|
|
```
|
|
|
|
std::vector<unsigned> insertAmts = {3};
|
|
|
|
StreamSet * B = E->CreateStreamSet(2, 1);
|
|
|
|
E->CreateKernelCall<ZeroInsertBixNum>(insertAmts, mask, B);
|
|
|
|
```
|
|
|
|
|
|
|
|
Using this bixnum, we can now create a spread mask for spreading out the data.
|
|
|
|
```
|
|
|
|
StreamSet * SpreadMask = InsertionSpreadMask(E, InsertBixNum, InsertPosition::Before);
|
|
|
|
```
|
|
|
|
The resulting SpreadMask is
|
|
|
|
```
|
|
|
|
spread: 111...11111111111111111111111111111111...111111111111111
|
|
|
|
```
|
|
|
|
|
|
|
|
The SpreadByMask operation can then use this mask to spread out the
|
|
|
|
original basis data to produce a new expanded stream set with the zeroes
|
|
|
|
inserted.
|
|
|
|
|
|
|
|
```
|
|
|
|
StreamSet * ExpandedBasis = E->CreateStreamSet(8);
|
|
|
|
SpreadByMask(E, SpreadMask, BasisBits, ExpandedBasis);
|
|
|
|
```
|
|
|
|
|
|
|
|
## Repeating Stream Set Creation.
|
|
|
|
|
|
|
|
```
|
|
|
|
std::vector<uint64_t> insertBytes;
|
|
|
|
for (auto ch : "pre") {
|
|
|
|
insertBytes.push_back(static_cast<uint64_t>(ch));
|
|
|
|
}
|
|
|
|
StreamSet * RepeatingBasis = CreateRepeatingBixNum(E, 8, insertBytes);
|
|
|
|
```
|
|
|
|
|
|
|
|
This will create an infinite set of 8 bit streams with the transposed representation
|
|
|
|
of "pre" repeating indefinitely.
|
|
|
|
|
|
|
|
The code for `CreateRepeatingBixNum` can be found in the `csv2json.cpp` file.
|
|
|
|
|
|
|
|
## Spreading the Repeated Stream and Merging
|
|
|
|
|
|
|
|
Once we have the repeated stream, we can use SpreadByMask to spread
|
|
|
|
it out to positions associated with the inserted zeroes in our `ExpandedBasis`.
|
|
|
|
In this case the spread mask used must be the inverse of the spread mask
|
|
|
|
used to spread the data stream. That is:
|
|
|
|
```
|
|
|
|
invmsk: ...111................................111...............
|
|
|
|
```
|
|
|
|
|
|
|
|
After spreading out the repeated `pre` bits, we can then use simple
|
|
|
|
bitwise logic to merge this data with `ExpandedBasis`.
|
|
|
|
|
|
|
|
The result will have the correct insertions of the string performed
|
|
|
|
in the resulting transposed data stream.
|
|
|
|
|
|
|
|
Using the `P2S_Kernel`, we can transform this back to a byte stream and generate it to stdout. |
|
|
|
\ No newline at end of file |