csv2json · Changes

Update csv2json authored May 15, 2020 by cameron's avatar cameron
Showing with 49 additions and 4 deletions
csv2json.md
View page @ 39be34ce
......@@ -171,11 +171,56 @@ In order to do this, we first need to number the marked positions according to t
We create a BixNum stream set for the positions at which insertions are to be made.
For the positions marking the starts of each field, we assign consecutive numbers starting with 0.
This can be achieved using the pablo `EveryNth` operation.
This can be achieved using the pablo `EveryNth` operation, where N is the number of fields in the CSV records.
The BixNum values for other template strings are calculated using bitwise logic over the `FilteredMarks`.
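To make the numbering concrete, here is a minimal scalar sketch in plain C++. It is not Parabix or pablo code and does not model `EveryNth` itself; it only shows the intended result under the assumption that numbering restarts at 0 for each record. The names `BixNum`, `numberFieldStarts` and `valueAt` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar model (not Parabix code): a BixNum is held as parallel bit streams,
// where stream k holds bit k of the number at each position.
using BixNum = std::vector<std::vector<bool>>;

// Number the marked field starts 0, 1, ..., fieldsPerRecord - 1.
// Assumption: the count restarts at 0 with each new record.
BixNum numberFieldStarts(const std::vector<bool>& fieldStarts,
                         unsigned fieldsPerRecord, unsigned bits) {
    assert(fieldsPerRecord > 0);
    assert(bits > 0 && bits <= 32);
    BixNum num(bits, std::vector<bool>(fieldStarts.size(), false));
    unsigned next = 0;
    for (std::size_t pos = 0; pos < fieldStarts.size(); ++pos) {
        if (!fieldStarts[pos]) continue;
        // Scatter the bits of the current number across the streams.
        for (unsigned k = 0; k < bits; ++k) {
            num[k][pos] = ((next >> k) & 1u) != 0;
        }
        next = (next + 1) % fieldsPerRecord;
    }
    return num;
}

// Read back the number stored at one position.
unsigned valueAt(const BixNum& num, std::size_t pos) {
    unsigned v = 0;
    for (std::size_t k = 0; k < num.size(); ++k) {
        if (num[k][pos]) v |= 1u << k;
    }
    return v;
}
```

With three fields per record, the marked positions of two consecutive records would receive the numbers 0, 1, 2, 0, 1, 2; unmarked positions stay 0 in every stream.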
Given these numbered values, we can use the Character Class Compiler to compute the bits of another BixNum
representing the number of 0 bits to insert.
Given these BixNum values for the template strings, we next want to compute another BixNum representing
the number of 0 bits to insert at each position. This can be achieved with the Parabix utility
`StringInsertBixNum`.
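As a rough illustration of what the insertion-length BixNum holds, the following scalar sketch in plain C++ computes one length per position. It is not the `StringInsertBixNum` kernel. It assumes one mark stream per string (as the icgrep excerpt further down suggests) and at most one string per position; `insertLengths` and `bitsNeeded` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Scalar model (not Parabix code). marks[i][pos] is true when strings[i]
// is to be inserted at position pos.
// Assumption: at most one string is inserted at any position.
std::vector<unsigned> insertLengths(const std::vector<std::string>& strings,
                                    const std::vector<std::vector<bool>>& marks) {
    assert(strings.size() == marks.size());
    const std::size_t n = marks.empty() ? 0 : marks[0].size();
    std::vector<unsigned> lengths(n, 0);
    for (std::size_t i = 0; i < strings.size(); ++i) {
        assert(marks[i].size() == n);
        for (std::size_t pos = 0; pos < n; ++pos) {
            if (marks[i][pos]) {
                lengths[pos] = static_cast<unsigned>(strings[i].size());
            }
        }
    }
    return lengths;
}

// Number of bit streams a BixNum needs to hold the longest insertion length.
unsigned bitsNeeded(const std::vector<std::string>& strings) {
    std::size_t longest = 0;
    for (const std::string& s : strings) {
        if (s.size() > longest) longest = s.size();
    }
    unsigned bits = 0;
    while (longest != 0) {
        ++bits;
        longest >>= 1;
    }
    return bits;
}
```

For the two color escape strings in the icgrep excerpt, the longer one is 11 bytes, so `bitsNeeded` yields 4, matching `insertLengthBits = 4` there.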
`SpreadByMask` then does the expansion.
Given the number of zeroes to insert at selected positions, `InsertionSpreadMask` computes a mask that
actually has 0 bits inserted at all the desired positions and 1 bits everywhere else. This mask can
be used by the `SpreadByMask` operation to compute the `ExpandedBasisBits`.
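The relationship between the insertion counts, the spread mask and the expanded streams can be sketched on a single bit stream in plain C++. This is a scalar model only, not the Parabix `InsertionSpreadMask` or `SpreadByMask` code, and it assumes the zeroes go before each marked position; the function names below are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar model (not Parabix code): for each input position, emit
// lengths[pos] zero bits followed by a single 1 bit.
// Assumption: zeroes are inserted before the position they belong to.
std::vector<bool> insertionSpreadMask(const std::vector<unsigned>& lengths) {
    std::vector<bool> mask;
    for (unsigned len : lengths) {
        mask.insert(mask.end(), len, false);
        mask.push_back(true);
    }
    return mask;
}

// Spread one bit stream: each 1 in the mask receives the next input bit,
// each 0 in the mask becomes a 0 in the output.
std::vector<bool> spreadByMask(const std::vector<bool>& mask,
                               const std::vector<bool>& in) {
    std::vector<bool> out(mask.size(), false);
    std::size_t next = 0;
    for (std::size_t i = 0; i < mask.size(); ++i) {
        if (mask[i]) {
            assert(next < in.size());
            out[i] = in[next++];
        }
    }
    assert(next == in.size());  // every input bit was placed
    return out;
}
```

Applying `spreadByMask` with the same mask to each of the eight basis bit streams gives a model of the expanded basis, with zero bytes at every inserted position.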
### Filling in Template Values
The final step is to use the `StringReplaceKernel` to fill in template values.
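The fill-in step can be pictured with a byte-level scalar sketch in plain C++. It is not the `StringReplaceKernel` implementation; it assumes that each run of 0s in the spread mask ends at the marked position it belongs to, and that the mark at that position selects which string fills the run. `fillInserted` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Scalar model (not Parabix code).
// expanded:      text after spreading, with '\0' at every inserted position.
// spreadMask:    false at inserted positions, true at original positions.
// expandedMarks: expandedMarks[i][pos] is true at the original position
//                before which strings[i] is to appear.
std::string fillInserted(std::string expanded,
                         const std::vector<bool>& spreadMask,
                         const std::vector<std::vector<bool>>& expandedMarks,
                         const std::vector<std::string>& strings) {
    assert(expanded.size() == spreadMask.size());
    assert(expandedMarks.size() == strings.size());
    std::size_t pos = 0;
    while (pos < spreadMask.size()) {
        if (spreadMask[pos]) {
            ++pos;
            continue;
        }
        // Scan the run of 0s; it ends at the marked position it belongs to.
        const std::size_t runStart = pos;
        while (pos < spreadMask.size() && !spreadMask[pos]) ++pos;
        assert(pos < spreadMask.size());
        for (std::size_t i = 0; i < strings.size(); ++i) {
            if (!expandedMarks[i][pos]) continue;
            assert(strings[i].size() == pos - runStart);
            // The index within the run selects the character to write.
            for (std::size_t idx = 0; idx < strings[i].size(); ++idx) {
                expanded[runStart + idx] = strings[i][idx];
            }
        }
    }
    return expanded;
}
```

The index within each run plays the role that `InsertIndex` plays in the icgrep excerpt: it says which character of the selected string lands at each inserted position.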
As a guide to this entire process, it may be useful to refer to the icgrep colorization code, which
does the insertion of color escape sequences for matched strings.
```
std::string ESC = "\x1B";
std::vector<std::string> colorEscapes = {ESC + "[01;31m" + ESC + "[K", ESC + "[m"};
unsigned insertLengthBits = 4;
StreamSet * const InsertBixNum = E->CreateStreamSet(insertLengthBits, 1);
E->CreateKernelCall<StringInsertBixNum>(colorEscapes, InsertMarks, InsertBixNum);
//E->CreateKernelCall<DebugDisplayKernel>("InsertBixNum", InsertBixNum);
StreamSet * const SpreadMask = InsertionSpreadMask(E, InsertBixNum, InsertPosition::Before);
//E->CreateKernelCall<DebugDisplayKernel>("SpreadMask", SpreadMask);
// For each run of 0s marking insert positions, create a parallel
// bixnum sequentially numbering the string insert positions.
StreamSet * const InsertIndex = E->CreateStreamSet(insertLengthBits);
E->CreateKernelCall<RunIndex>(SpreadMask, InsertIndex, nullptr, /*invert = */ true);
//E->CreateKernelCall<DebugDisplayKernel>("InsertIndex", InsertIndex);
StreamSet * FilteredBasis = E->CreateStreamSet(8, 1);
E->CreateKernelCall<S2PKernel>(Filtered, FilteredBasis);
// Basis bit streams expanded with 0 bits for each string to be inserted.
StreamSet * ExpandedBasis = E->CreateStreamSet(8);
SpreadByMask(E, SpreadMask, FilteredBasis, ExpandedBasis);
//E->CreateKernelCall<DebugDisplayKernel>("ExpandedBasis", ExpandedBasis);
// Map the match start/end marks to their positions in the expanded basis.
StreamSet * ExpandedMarks = E->CreateStreamSet(2);
SpreadByMask(E, SpreadMask, InsertMarks, ExpandedMarks);
StreamSet * ColorizedBasis = E->CreateStreamSet(8);
E->CreateKernelCall<StringReplaceKernel>(colorEscapes, ExpandedBasis, SpreadMask, ExpandedMarks, InsertIndex, ColorizedBasis);
StreamSet * ColorizedBytes = E->CreateStreamSet(1, 8);
E->CreateKernelCall<P2SKernel>(ColorizedBasis, ColorizedBytes);
```