Last edited by cameron Nov 10, 2021

csv2json

CSV to JSON

Introduction

CSV to JSON conversion has three phases.

  1. Determination of the CSV to JSON translation scheme for the particular file to be converted. This scheme identifies the number of fields in each record, the corresponding field names for JSON attributes and any other parameters that govern the translation.
  2. Parsing the CSV input to determine the records and fields.
  3. Transforming the parsed input according to the scheme to produce JSON output.

Example

As a simple running example, we use the following CSV input file.

Family Name,Given Name,email
Henderson,Paul,ph@sfu.ca
Lin,Qingshan,1234@zju.edu.cn

A corresponding JSON output could be as follows.

[
{"Family Name":"Henderson","Given Name":"Paul","email":"ph@sfu.ca"},
{"Family Name":"Lin","Given Name":"Qingshan","email":"1234@zju.edu.cn"}
]

CSV to JSON Scheme

  1. First, determine the number of fields in each record and their field names. Represent these as a C++ string vector FieldNames, whose size is the number of fields. The list of field names can be taken from the first line of the file or supplied as a program parameter. Example: Family Name,Given Name,email

  2. Define a vector of TemplateStrings in which CSV field values will be embedded to produce the JSON output records. For example, in a compact fully-quoted mode, the template strings could be as follows.

    • {"Family Name":"
    • ","Given Name":"
    • ","email":"
    • "}
  3. Define three additional strings for combining JSON records together.

    1. JSON_prefix as a string to be printed at the very beginning of JSON output (e.g., [),
    2. JSON_separator as a string to be printed in between each record (e.g., ,\n), and
    3. JSON_suffix as a string to be printed at the very end of the output (e.g., ]\n).
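
The scheme above can be sketched in C++ as follows. This is a minimal illustration, not the Parabix implementation: the JSONScheme struct and the helpers splitHeader and makeTemplateStrings are hypothetical names introduced here, and the sketch assumes that the header line contains no quoted fields and that field names need no JSON escaping.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical container for the translation scheme; member names follow the text.
struct JSONScheme {
    std::vector<std::string> FieldNames;
    std::vector<std::string> TemplateStrings;  // FieldNames.size() + 1 entries
    std::string JSON_prefix = "[";
    std::string JSON_separator = ",\n";
    std::string JSON_suffix = "]\n";
};

// Split a header line on commas (assumption: no quoted or escaped commas).
std::vector<std::string> splitHeader(const std::string & header) {
    std::vector<std::string> names;
    std::string current;
    for (char c : header) {
        if (c == ',') {
            names.push_back(current);
            current.clear();
        } else {
            current.push_back(c);
        }
    }
    names.push_back(current);
    return names;
}

// Build the compact fully-quoted template strings for the given field names.
std::vector<std::string> makeTemplateStrings(const std::vector<std::string> & FieldNames) {
    assert(!FieldNames.empty());
    std::vector<std::string> templates;
    for (std::size_t i = 0; i < FieldNames.size(); ++i) {
        // The first template opens the JSON object; later ones close the previous value.
        const std::string opener = (i == 0) ? "{\"" : "\",\"";
        templates.push_back(opener + FieldNames[i] + "\":\"");
    }
    templates.push_back("\"}");  // closes the last value and the object
    return templates;
}
```

For the running example, makeTemplateStrings(splitHeader("Family Name,Given Name,email")) yields the four template strings listed above.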

CSV Parsing

CSV parsing is the process of identifying the beginning and ending of each data field in the CSV file. Following Parabix methods, we define bit streams for significant positions.

For each field, let its start position be the position of the first character in the field, and let its follow position be the position immediately after the last character of the field. For our example, we have the following, where newlines are marked by ⏎.

Data stream:     Henderson,Paul,ph@sfu.ca⏎Lin,Qingshan,1234@zju.edu.cn⏎
Field starts:    1.........1....1.........1...1........1...............
Field follows:   .........1....1.........1...1........1...............1
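
As a rough illustration of these definitions, the following sequential C++ sketch computes the two marker streams as strings of '1' and '.' characters. It is not the bit-parallel Parabix method; FieldMarks and markFields are hypothetical names, '\n' stands in for ⏎, and the sketch assumes there are no quoted fields and no empty fields.

```cpp
#include <cstddef>
#include <string>

// Marker streams rendered as text: '1' marks a position, '.' is unmarked.
struct FieldMarks {
    std::string starts;
    std::string follows;
};

FieldMarks markFields(const std::string & data) {
    FieldMarks m{std::string(data.size(), '.'), std::string(data.size(), '.')};
    bool atFieldStart = true;  // the next non-delimiter character begins a field
    for (std::size_t i = 0; i < data.size(); ++i) {
        const bool isDelimiter = (data[i] == ',') || (data[i] == '\n');
        if (isDelimiter) {
            // The delimiter sits immediately after the last character of a field.
            m.follows[i] = '1';
            atFieldStart = true;
        } else if (atFieldStart) {
            m.starts[i] = '1';
            atFieldStart = false;
        }
    }
    return m;
}
```

Applied to the example data stream, the sketch marks starts at positions 0, 10, 15, 25, 29 and 38, and follows at positions 9, 14, 24, 28, 37 and 53.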

We also need the start and follow positions of records.

Data stream:     Henderson,Paul,ph@sfu.ca⏎Lin,Qingshan,1234@zju.edu.cn⏎
Record starts:   1........................1............................
Record follows:  ........................1............................1
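
The record streams can be derived from the newline positions with ordinary bitwise operations. The sketch below is an assumption-laden illustration rather than Parabix code: it holds a stream in a single 64-bit word (so the input must be at most 64 characters), treats position 0 as the least significant bit, and uses the hypothetical names matchChar, RecordMarks and markRecords.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Bit i of the result is set when data[i] == c (position 0 is the least significant bit).
std::uint64_t matchChar(const std::string & data, char c) {
    assert(data.size() <= 64);
    std::uint64_t bits = 0;
    for (std::size_t i = 0; i < data.size(); ++i) {
        if (data[i] == c) bits |= std::uint64_t{1} << i;
    }
    return bits;
}

struct RecordMarks {
    std::uint64_t starts;
    std::uint64_t follows;
};

RecordMarks markRecords(const std::string & data) {
    const std::uint64_t newlines = matchChar(data, '\n');
    // Mask of positions that actually exist in the input.
    const std::uint64_t valid =
        (data.size() == 64) ? ~std::uint64_t{0} : (std::uint64_t{1} << data.size()) - 1;
    // A record is followed by its terminating newline.
    const std::uint64_t follows = newlines;
    // A record starts at position 0 and at each position just after a newline,
    // provided that position exists and is not itself a newline.
    const std::uint64_t starts = ((newlines << 1) | 1) & valid & ~newlines;
    return {starts, follows};
}
```

The shift by one position plays the role of moving each newline marker forward to the first character of the next record.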

String Transformation

In general, the string transformation process requires the modification of the CSV input to insert the required JSON template strings. There are two steps.

  1. Expansion of the input to the correct size, inserting zeroes.
  2. Filling in template values at the inserted positions.

These two steps will be implemented using the SpreadByMask and StringReplaceKernel of the Parabix infrastructure. More information on this process will be provided later.
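
Until that information is available, the two steps can be illustrated with a sequential C++ sketch. It does not use or imitate the SpreadByMask or StringReplaceKernel interfaces; Insertion, Expanded, expandWithZeroes and fillTemplates are hypothetical names. The sketch assumes well-formed input with no quoted fields, no blank lines and the expected number of fields per record. Delimiters are dropped during expansion, and the record-level JSON_prefix, JSON_separator and JSON_suffix strings are left out for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Where a template string goes in the expanded buffer, and which one.
struct Insertion {
    std::size_t offset;
    std::size_t templateIndex;
};

struct Expanded {
    std::string buffer;                 // field data plus zero bytes at inserted positions
    std::vector<Insertion> insertions;
};

// Step 1: expand the input to its final size, inserting zero bytes for each template.
Expanded expandWithZeroes(const std::string & data, const std::vector<std::string> & T) {
    assert(T.size() >= 2);
    const std::size_t fieldsPerRecord = T.size() - 1;
    Expanded e;
    std::size_t field = 0;
    bool atRecordStart = true;
    auto reserve = [&](std::size_t idx) {
        e.insertions.push_back({e.buffer.size(), idx});
        e.buffer.append(T[idx].size(), '\0');
    };
    for (char c : data) {
        if (atRecordStart) {            // room for the opening template of a record
            reserve(0);
            atRecordStart = false;
            field = 0;
        }
        if (c == ',') {                 // field separator: next field's template
            ++field;
            assert(field < fieldsPerRecord);
            reserve(field);
        } else if (c == '\n') {         // record end: closing template
            assert(field == fieldsPerRecord - 1);
            reserve(fieldsPerRecord);
            atRecordStart = true;
        } else {
            e.buffer.push_back(c);      // ordinary field data is copied through
        }
    }
    return e;
}

// Step 2: overwrite the zero bytes with the template strings.
std::string fillTemplates(Expanded e, const std::vector<std::string> & T) {
    for (const Insertion & ins : e.insertions) {
        e.buffer.replace(ins.offset, T[ins.templateIndex].size(), T[ins.templateIndex]);
    }
    return e.buffer;
}
```

Separating the two steps means the size and layout of the output are fixed before any template text is written, so each template can then be filled in independently of the others.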
