Last edited by cameron Feb 22, 2023
This is an old version of this page.

CSV Editing

Having successfully parsed a CSV file, let's now consider how to edit it.

Deleting a column

One of the basic editing operations that we might want to support is deleting a column from all records in a file.

Suppose we want to delete the second column in every row of the following CSV data.

Data_stream:         Henderson,Paul,ph@sfu.ca⏎Lin,Qingshan,1234@zju.edu.cn⏎
Field_separators:    .........1....1.........1...1........1...............1
Record_separators:   ........................1............................1

The Parabix FilterByMask operation can do this for us, if we set up a mask stream that selects all of the data except the second column and its following comma.

Data stream:         Henderson,Paul,ph@sfu.ca⏎Lin,Qingshan,1234@zju.edu.cn⏎
To keep:             1111111111.....11111111111111.........1111111111111111
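
To make the effect of the mask concrete, here is a minimal scalar sketch of what filtering by a mask means. It is not the Parabix FilterByMask kernel, which works on parallel bit streams; `filterByMaskSketch` is a hypothetical helper name, and the mask is passed as a string in the notation used on this page ('1' keeps a byte, '.' drops it).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical scalar model of filtering by a mask (not the Parabix kernel).
// Keeps exactly those bytes of `data` whose mask position is '1'.
std::string filterByMaskSketch(const std::string & data, const std::string & mask) {
    assert(data.size() == mask.size());  // one mask position per data byte
    std::string out;
    for (std::size_t i = 0; i < data.size(); ++i) {
        if (mask[i] == '1') out.push_back(data[i]);
    }
    return out;
}
```

Applied to the example data (with '\n' standing in for ⏎) and the "To keep" mask above, this yields the two records with their second column removed: Henderson,ph@sfu.ca and Lin,1234@zju.edu.cn.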

How do we calculate this mask? We can use the following sequence of operations on a PabloBuilder pb, taking the Field_separators and Record_separators streams shown above as inputs.

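// Field 1 starts at position 0 and immediately after each record separator.
PabloAST * F1start = pb.createNot(pb.createAdvance(pb.createNot(Record_separators), 1));
// Scan from each field-1 start to the separator that ends field 1.
PabloAST * F1follow = pb.createScanTo(F1start, Field_separators);
// Field 2 starts one position after the separator that ends field 1.
PabloAST * F2start = pb.createAdvance(F1follow, 1);
// Scan from each field-2 start to the separator that ends field 2.
PabloAST * F2follow = pb.createScanTo(F2start, Field_separators);
// Delete field 2 together with its following separator; keep everything else.
PabloAST * toDelete = pb.createIntrinsicCall(pablo::Intrinsic::InclusiveSpan, {F2start, F2follow});
PabloAST * toKeep = pb.createNot(toDelete);

Applied to the example data, these operations produce the following streams.

Data stream:         Henderson,Paul,ph@sfu.ca⏎Lin,Qingshan,1234@zju.edu.cn⏎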
F1start:             1........................1............................
F1follow:            .........1..................1.........................
F2start:             ..........1..................1........................
F2follow:            ..............1......................1................
toDelete:            ..........11111..............111111111................
toKeep:              1111111111.....11111111111111.........1111111111111111