CSV Editing
Having successfully parsed a CSV file, let's now consider how to edit it.
Deleting a column
One of the basic editing operations that we might want to support is deleting a column from all records in a file.
Suppose we want to delete the second column in every row of the following CSV data.
Data_stream:       Henderson,Paul,ph@sfu.ca⏎Lin,Qingshan,1234@zju.edu.cn⏎
Field_separators:  .........1....1.........1...1........1...............1
Record_separators: ........................1............................1
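Note that in the streams above Field_separators marks the record-ending ⏎ positions as well as the commas, so every field is followed by a marked position. As a rough illustration only, the two marker streams for this example can be reproduced with a small helper. The name mark_any_of is hypothetical and is not part of Parabix; the sketch assumes one byte per position, writes ⏎ as '\n', and ignores quoted fields, which a real CSV parser must handle.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not the Parabix API.
// Returns a stream with '1' wherever data holds one of the bytes in chars,
// and '.' everywhere else.
std::string mark_any_of(const std::string & data, const std::string & chars) {
    std::string out(data.size(), '.');
    for (std::size_t i = 0; i < data.size(); ++i) {
        if (chars.find(data[i]) != std::string::npos) out[i] = '1';
    }
    return out;
}
```

Under these assumptions, mark_any_of(data, ",\n") gives the Field_separators stream and mark_any_of(data, "\n") gives the Record_separators stream for the example data.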
The Parabix FilterByMask operation can do this for us if we set up a mask stream that selects all of the data except the second column and its following comma.
Data stream: Henderson,Paul,ph@sfu.ca⏎Lin,Qingshan,1234@zju.edu.cn⏎
To keep:     1111111111.....11111111111111.........1111111111111111
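To make the effect of the mask concrete, here is a minimal sketch of what filtering by a mask does to a byte stream. It is a plain C++ model and not the Parabix FilterByMask kernel; the function name filter_by_mask and the '1'/'.' string encoding of the mask are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical model, not the Parabix kernel.
// mask uses '1' for a set bit and '.' for a clear bit, one character per data position.
std::string filter_by_mask(const std::string & data, const std::string & mask) {
    assert(data.size() == mask.size());
    std::string out;
    for (std::size_t i = 0; i < data.size(); ++i) {
        if (mask[i] == '1') out.push_back(data[i]);  // keep only the selected positions
    }
    return out;
}
```

Applied to the example data (with ⏎ written as '\n') and the To keep mask, this yields Henderson,ph@sfu.ca⏎Lin,1234@zju.edu.cn⏎, which is the input with the second column removed from each record.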
How do we calculate this mask? With the following set of operations using a PabloBuilder pb.
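// Mark the first position of each record: position 0 and each position after a record separator.
PabloAST * F1start = pb.createNot(pb.createAdvance(pb.createNot(Record_separators), 1));
// Scan from each record start to the separator that ends field 1.
PabloAST * F1follow = pb.createScanTo(F1start, Field_separators);
// Field 2 starts one position after that separator.
PabloAST * F2start = pb.createAdvance(F1follow, 1);
// Scan to the separator that ends field 2.
PabloAST * F2follow = pb.createScanTo(F2start, Field_separators);
// Select field 2 together with its following separator, then invert to get the keep mask.
PabloAST * toDelete = pb.createIntrinsicCall(pablo::Intrinsic::InclusiveSpan, {F2start, F2follow});
PabloAST * toKeep = pb.createNot(toDelete);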
Data stream: Henderson,Paul,ph@sfu.ca⏎Lin,Qingshan,1234@zju.edu.cn⏎
F1start:     1........................1............................
F1follow:    .........1..................1.........................
F2start:     ..........1..................1........................
F2follow:    ..............1......................1................
toDelete:    ..........11111..............111111111................
toKeep:      1111111111.....11111111111111.........1111111111111111