CSVediting · Changes

Update CSVediting authored Nov 10, 2021 by cameron
Showing with 35 additions and 5 deletions
CSVediting.md
View page @ b797ac47
@@ -6,10 +6,40 @@ After successfully [parsing](CSVparsing) a CSV file, now let's consider how to e
One of the basic editing operations that we might want to support is deleting a column from all records in a file.
Suppose we want to delete the second column in every row of the following CSV data.
```
Data_stream:       Henderson,Paul,ph@sfu.ca⏎Lin,Qingshan,1234@zju.edu.cn⏎
Field_separators:  .........1....1.........1...1........1...............1
Record_separators: ........................1............................1
```
The Parabix `FilterByMask` operation can do this for us, if we set up a mask stream that selects all of the data except the second column and its following comma.
```
Data stream: Henderson,Paul,ph@sfu.ca⏎Lin,Qingshan,1234@zju.edu.cn⏎
To keep:     1111111111.....11111111111111.........1111111111111111
```
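To make the effect of the mask concrete, here is a minimal sequential sketch of what filtering by a mask does to a byte stream. It is not the Parabix `FilterByMask` kernel: the function name `filterByMaskModel` and the representation of a bit stream as a string of `1` and `.` characters are assumptions made for illustration, and `\n` stands in for the `⏎` shown in the diagrams.

```cpp
#include <cassert>
#include <string>

// Hypothetical model, not the Parabix kernel: a bit stream is written as a
// string with one character per data position, '1' for set and '.' for clear,
// exactly as in the diagrams on this page.
std::string filterByMaskModel(const std::string & data, const std::string & mask) {
    // The mask must supply one bit per data byte.
    assert(data.size() == mask.size());
    std::string out;
    for (std::size_t i = 0; i < data.size(); ++i) {
        // Keep a byte only where the mask bit is set.
        if (mask[i] == '1') out.push_back(data[i]);
    }
    return out;
}
```

Applied to the example data with the `To keep` mask shown above, the model returns the two records with the second field and its following comma removed from each.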
How do we calculate this mask? The following operations, written with a
`PabloBuilder pb`, compute it from the `Field_separators` and `Record_separators` streams.
```
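// Record starts: position 0 and each position right after a record separator.
PabloAST * F1start = pb.createNot(pb.createAdvance(pb.createNot(Record_separators), 1));
// The separator that ends field 1.
PabloAST * F1follow = pb.createScanTo(F1start, Field_separators);
// Field 2 begins one position after that separator.
PabloAST * F2start = pb.createAdvance(F1follow, 1);
PabloAST * F2follow = pb.createScanTo(F2start, Field_separators);
// Field 2 together with its following separator.
PabloAST * toDelete = pb.createIntrinsicCall(pablo::Intrinsic::InclusiveSpan, {F2start, F2follow});
PabloAST * toKeep = pb.createNot(toDelete);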
```
```
Data stream: Henderson,Paul,ph@sfu.ca⏎Lin,Qingshan,1234@zju.edu.cn⏎
F1start:     1........................1............................
F1follow:    .........1..................1.........................
F2start:     ..........1..................1........................
F2follow:    ..............1......................1................
toDelete:    ..........11111..............111111111................
toKeep:      1111111111.....11111111111111.........1111111111111111
```

```
Data stream:   Henderson,Paul,ph@sfu.ca⏎Lin,Qingshan,1234@zju.edu.cn⏎
Field starts:  1.........1....1.........1...1........1...............
Field follows: .........1....1.........1...1........1...............1
```