CSVediting · Changes

Update CSVediting authored Nov 27, 2021 by cameron
Showing with 38 additions and 1 deletion
CSVediting.md
View page @ e6eb43f0
......@@ -88,7 +88,9 @@ preceding comma.
## Matching and Deleting a Row
Suppose that we have a regular expression R to select CSV rows for deletion.
Suppose that we have a regular expression R to select CSV rows for deletion,
where R has no Unicode properties or other features.
The Parabix regular expression engine can be used to perform the matching and
`FilterByMask` can be used for the deletion.
......@@ -103,6 +105,41 @@ Using the ICGrepKernel to perform the matching may be implemented as follows.
P->CreateKernelCall<ICGrepKernel>(std::move(options));
```
The resulting `MatchResults` stream has 1 bits within each matching CSV row.
To select whole rows, the next step is to move each match to the end position of its line,
assuming that the line ends are given by `mLineBreakStream`.
```
StreamSet * const MovedMatches = P->CreateStreamSet();
P->CreateKernelCall<MatchedLinesKernel>(MatchResults, mLineBreakStream, MovedMatches);
```
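To make the effect of this step concrete, the following is a minimal scalar sketch, not Parabix code. It assumes that `MatchedLinesKernel` marks the line break of every line that contains at least one match; the helper name `moveMatchesToLineEnds` and the representation of a bit stream as a string of `'0'`/`'1'` characters are hypothetical, chosen only for illustration.

```cpp
#include <cassert>
#include <string>

// Scalar model only: each stream is a string of '0'/'1', one character per
// stream position. moveMatchesToLineEnds is a hypothetical helper, not a
// Parabix API; it assumes a match anywhere in a line marks that line's break.
std::string moveMatchesToLineEnds(const std::string & matches,
                                  const std::string & lineBreaks) {
    assert(matches.size() == lineBreaks.size());
    std::string moved(matches.size(), '0');
    bool pending = false;  // true once a match has been seen in the current line
    for (std::size_t i = 0; i < matches.size(); ++i) {
        if (matches[i] == '1') pending = true;
        if (lineBreaks[i] == '1') {
            if (pending) moved[i] = '1';  // record the match at the line end
            pending = false;              // the next line starts with no match
        }
    }
    return moved;
}
```

For the three-row input `ab,c\nde,f\ngh\n` with a single match inside the second row, the only 1 bit in the result sits on the second line break.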
Next, we can compute a stream that is indexed by line number (1 bit per CSV row)
by filtering the moved matches, using the line break stream as the mask.
```
StreamSet * MatchesByLine = P->CreateStreamSet(1, 1);
FilterByMask(P, mLineBreakStream, MovedMatches, MatchesByLine);
```
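The filtering step can be pictured with a small scalar model. This is only a sketch of the idea, assuming that filtering keeps exactly the data bits at positions where the mask is 1, in order; `filterByMask` here is a hypothetical stand-in operating on `'0'`/`'1'` strings, not the Parabix `FilterByMask` function.

```cpp
#include <cassert>
#include <string>

// Scalar model only (hypothetical helper, not the Parabix FilterByMask):
// keep the data bits found at positions where the mask has a 1 bit.
std::string filterByMask(const std::string & mask, const std::string & data) {
    assert(mask.size() == data.size());
    std::string out;
    for (std::size_t i = 0; i < mask.size(); ++i) {
        if (mask[i] == '1') out.push_back(data[i]);
    }
    return out;
}
```

With the line break stream as the mask, the output has one bit per line; in the three-row example with a match on the second row, the result is `010`.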
LineStarts can then be identified as the positions immediately after a line break
or at the beginning of the file. These are computed by the `LineStartsKernel`.
```
StreamSet * LineStarts = P->CreateStreamSet(1, 1);
P->CreateKernelCall<LineStartsKernel>(mLineBreakStream, LineStarts);
```
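As an illustration of the definition above, here is a scalar sketch that derives line starts from line breaks. The helper `lineStarts` is hypothetical and works on `'0'`/`'1'` strings; it is not the `LineStartsKernel` itself, only a model of the rule "position 0, or the position right after a line break".

```cpp
#include <cassert>
#include <string>

// Scalar model only (hypothetical helper, not the LineStartsKernel):
// a line starts at position 0 and at every position following a line break.
std::string lineStarts(const std::string & lineBreaks) {
    std::string starts(lineBreaks.size(), '0');
    for (std::size_t i = 0; i < lineBreaks.size(); ++i) {
        if (i == 0 || lineBreaks[i - 1] == '1') starts[i] = '1';
    }
    return starts;
}
```

Note that in this model a line break at the final position produces no start, since there is no position after it.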
The starts of the matched lines are now computed by a `SpreadByMask`.
```
StreamSet * MatchedLineStarts = P->CreateStreamSet(1, 1);
SpreadByMask(P, LineStarts, MatchesByLine, MatchedLineStarts);
```
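Spreading is the inverse of filtering, which the following scalar sketch tries to show. It assumes that the k-th 1 bit of the mask receives the k-th bit of the compact input and that all other positions are 0; `spreadByMask` is a hypothetical helper on `'0'`/`'1'` strings, not the Parabix `SpreadByMask` function.

```cpp
#include <cassert>
#include <string>

// Scalar model only (hypothetical helper, not the Parabix SpreadByMask):
// place the k-th bit of `data` at the position of the k-th 1 bit in `mask`.
std::string spreadByMask(const std::string & mask, const std::string & data) {
    std::string out(mask.size(), '0');
    std::size_t k = 0;  // index of the next unused bit of `data`
    for (std::size_t i = 0; i < mask.size(); ++i) {
        if (mask[i] == '1') {
            assert(k < data.size());  // one data bit is needed per mask bit
            out[i] = data[k++];
        }
    }
    return out;
}
```

Applied to the line starts of the three-row example and the per-line matches `010`, only the start of the second row remains marked.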
Now a mask for an entire matched row can be computed, using the `LineSpansKernel`.
```
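StreamSet * MatchedLineSpans = P->CreateStreamSet(1, 1);
// MatchedLineEnds is not defined above; MovedMatches (the matches moved to
// line-end positions) is assumed to be the intended stream.
P->CreateKernelCall<LineSpansKernel>(MatchedLineStarts, MovedMatches, MatchedLineSpans);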
```
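A scalar sketch of the span computation may help here. It assumes that a span covers every position from a marked start through the matching marked end, inclusive of the line break; whether the real `LineSpansKernel` includes the end position is not stated in this page, so that choice is an assumption. The helper `lineSpans` is hypothetical and works on `'0'`/`'1'` strings.

```cpp
#include <cassert>
#include <string>

// Scalar model only (hypothetical helper, not the LineSpansKernel):
// mark every position from a start bit through the next end bit, inclusive.
std::string lineSpans(const std::string & starts, const std::string & ends) {
    assert(starts.size() == ends.size());
    std::string spans(starts.size(), '0');
    bool inside = false;  // true while between a start and its end
    for (std::size_t i = 0; i < starts.size(); ++i) {
        if (starts[i] == '1') inside = true;
        if (inside) spans[i] = '1';
        if (ends[i] == '1') inside = false;  // the end position is included
    }
    return spans;
}
```

For the three-row example, the start and end marks of the second row yield a span that covers exactly that row and its line break.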
If `FilterByMask` were applied at this point with `MatchedLineSpans` as the mask, the
output would contain only the matched rows. To delete the matched rows instead,
`MatchedLineSpans` must first be negated (for example, with a Pablo `createNot` operation).
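The final negate-and-filter step can be sketched in scalar form as follows. This is an illustration under stated assumptions, not Parabix code: the text is an ordinary string, the span mask is a `'0'`/`'1'` string with one character per byte of text, and `deleteMatchedRows` is a hypothetical helper that combines the negation with the filtering.

```cpp
#include <cassert>
#include <string>

// Scalar model only (hypothetical helper, not a Parabix API):
// negate the matched-row mask and keep only the bytes that remain selected.
std::string deleteMatchedRows(const std::string & text,
                              const std::string & matchedSpans) {
    assert(text.size() == matchedSpans.size());
    std::string kept;
    for (std::size_t i = 0; i < text.size(); ++i) {
        const bool keep = (matchedSpans[i] != '1');  // negation of the span mask
        if (keep) kept.push_back(text[i]);           // filtering by the negated mask
    }
    return kept;
}
```

Dropping the `!=` negation (keeping bytes where the span bit is 1) would instead return only the matched rows, which mirrors the distinction drawn above.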