Table - Csv Data Structure


1 - About

The CSV (comma-separated values) format is a physical representation of a relation (table).

Tabular formats are often more space-efficient than JSON, which can improve loading times for large datasets.


3 - Syntax

While there are various specifications and implementations for the CSV format, there is no formal specification in existence, which allows for a wide variety of interpretations of CSV files.

CSV Specification - RFC 4180

Summary:

  • Each record is located on a separate line, delimited by a line break (CRLF).
  • The last record in the file may or may not have an ending line break.
  • There may be an optional header line appearing as the first line of the file, with the same format as normal record lines.
  • Within the header and each record, there may be one or more fields, separated by a delimiter character (commas).
  • Each field may or may not be enclosed in double quotes (also known as the text qualifier).
  • Fields containing line breaks (CRLF), double quotes, or commas should be enclosed in double quotes.
  • If double quotes are used to enclose fields, then a double quote appearing inside a field must be escaped by preceding it with another double quote.
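Python's standard `csv` module uses a default `excel` dialect that is close to these rules, so a short writer sketch shows which fields end up quoted. The sample rows are made up for illustration; nothing here is specific to RFC 4180 beyond the defaults noted in the comments.

```python
import csv
import io

# Hypothetical sample rows covering each special case in the list above.
rows = [
    ["id", "comment"],                # optional header line
    ["1", "plain text"],              # needs no quoting
    ["2", "with, comma"],             # contains the delimiter
    ["3", 'You are "top"'],           # contains a double quote
    ["4", "line one\r\nline two"],    # contains a line break
]

buffer = io.StringIO(newline="")
# Default dialect: comma delimiter, '"' quotechar, doublequote=True,
# lineterminator="\r\n" and QUOTE_MINIMAL (quote a field only when needed).
writer = csv.writer(buffer)
writer.writerows(rows)
output = buffer.getvalue()
print(output, end="")
```

Only the fields for ids 2, 3 and 4 come out enclosed in double quotes; the header and the plain-text row are written as-is, and every record ends with CRLF.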

3.1 - Text Qualifier

The text qualifier, better known as the double quote character, is used to enclose text values in the exported data.

This is required when a cell value includes:

  • the delimiter character (generally a comma)
  • a double quote
  • a line break


3.2 - Delimiter

Within the header and each record, there may be one or more fields, separated by a delimiter character (generally commas).

3.3 - Escape character

The escape character for the text qualifier character is by default the double quote:

Example:

You are "top"

in CSV becomes:

"You are ""top"""

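The doubling rule is simple enough to apply by hand. A minimal sketch: `quote_field` and `unquote_field` are hypothetical helper names, and `unquote_field` assumes it receives one complete field that a parser has already split out of its record.

```python
def quote_field(value: str) -> str:
    """Enclose a value in double quotes, doubling every embedded double quote."""
    return '"' + value.replace('"', '""') + '"'


def unquote_field(field: str) -> str:
    """Reverse quote_field: strip the enclosing quotes and undouble inner ones."""
    if len(field) >= 2 and field[0] == '"' and field[-1] == '"':
        return field[1:-1].replace('""', '"')
    return field  # unquoted fields are returned unchanged


print(quote_field('You are "top"'))  # "You are ""top"""
```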
3.4 - New Line

Newlines in CSV follow these rules:

  • Each record is located on a separate line, delimited by a line break.
  • The last record in the file may or may not have an ending line break.
  • A cell that has a line break in its content should be quoted.

Example:

  • Newline as record separator:
"You are ""top""" CRLF
  • A newline in the value of a cell needs to be quoted:
"You are CRLF
top" CRLF
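A reader has to tell these two cases apart: an unquoted CRLF ends the record, while a quoted CRLF belongs to the value. Python's standard `csv` reader handles both, as this small sketch with made-up data shows. Passing `newline=""` keeps `io` from translating line endings so the reader sees the raw characters.

```python
import csv
import io

# A quoted CRLF is part of the value; an unquoted CRLF ends the record.
data = 'id,comment\r\n1,"You are\r\ntop"\r\n'

# newline="" disables line-ending translation (the csv docs recommend
# the same setting when opening CSV files).
rows = list(csv.reader(io.StringIO(data, newline="")))
print(rows)  # [['id', 'comment'], ['1', 'You are\r\ntop']]

# The last record may omit its ending line break.
no_trailing = list(csv.reader(io.StringIO("a,b\r\n1,2", newline="")))
print(no_trailing)  # [['a', 'b'], ['1', '2']]
```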

3.5 - Extended

Extended CSV formats add metadata to the data (such as data types, …).

By order of preference:

4 - Parsing

4.1 - Parser algorithm
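One common way to parse CSV is a character-by-character state machine that tracks whether the cursor is inside a quoted field. The sketch below is an illustration of that approach under the rules listed in the Syntax section, not a reference implementation: `parse_csv` is a hypothetical name, and the parser is lenient (it does not reject a stray quote inside an unquoted field or an unterminated quoted field).

```python
from typing import List


def parse_csv(text: str, delimiter: str = ",", quote: str = '"') -> List[List[str]]:
    """Split CSV text into records and fields with a small state machine."""
    rows: List[List[str]] = []
    row: List[str] = []
    field: List[str] = []
    in_quotes = False   # True while inside a quoted field
    record_start = 0    # index where the current record began
    i, n = 0, len(text)
    while i < n:
        c = text[i]
        if in_quotes:
            if c == quote and i + 1 < n and text[i + 1] == quote:
                field.append(quote)  # doubled quote = literal quote
                i += 1
            elif c == quote:
                in_quotes = False    # closing quote
            else:
                field.append(c)      # delimiters and line breaks are data here
        elif c == quote:
            in_quotes = True         # opening quote
        elif c == delimiter:
            row.append("".join(field))
            field = []
        elif c in "\r\n":
            # An unquoted line break ends the record (CRLF, LF or a lone CR).
            if c == "\r" and i + 1 < n and text[i + 1] == "\n":
                i += 1
            row.append("".join(field))
            rows.append(row)
            row, field = [], []
            record_start = i + 1
        else:
            field.append(c)
        i += 1
    # The last record may not have an ending line break.
    if record_start < n:
        row.append("".join(field))
        rows.append(row)
    return rows


print(parse_csv('id,comment\r\n1,"You are ""top"""\r\n2,"a,b"'))
```

For production use, a tested library (such as Python's `csv` module) is the safer choice; the sketch only makes the quoting and line-break states explicit.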

4.2 - Library

Tools:

  • json2csv (JSON to CSV): converts a stream of newline-separated JSON data to CSV format.
  • csvkit: a suite of utilities for converting to and working with CSV, "the king of tabular file formats".

5 - Row to column storage

  • A simple way to turn a CSV file into a column-oriented (columnar) format is to save each column to a separate file. To load the data back, read one line from each file (column) and 'stitch' the values back together into a row.
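The split-and-stitch idea can be sketched in a few lines of Python. Assumptions: `split_columns`, `stitch_rows` and the `col{j}.txt` file names are hypothetical, the table is non-empty and rectangular, and no value contains a line break (otherwise "one line per value" no longer holds).

```python
import csv
import io
import os
import tempfile
from contextlib import ExitStack
from typing import List


def split_columns(rows: List[List[str]], directory: str) -> List[str]:
    """Write column j of every row to its own file, one value per line."""
    paths = []
    for j in range(len(rows[0])):
        path = os.path.join(directory, f"col{j}.txt")
        with open(path, "w", encoding="utf-8") as f:
            for row in rows:
                f.write(row[j] + "\n")
        paths.append(path)
    return paths


def stitch_rows(paths: List[str]) -> List[List[str]]:
    """Read one line from each column file at a time and rebuild the rows."""
    with ExitStack() as stack:
        files = [stack.enter_context(open(p, encoding="utf-8")) for p in paths]
        # zip() advances all column files in lockstep, one line from each.
        return [[line.rstrip("\n") for line in lines] for lines in zip(*files)]


rows = list(csv.reader(io.StringIO("id,name\r\n1,Alice\r\n2,Bob\r\n", newline="")))
with tempfile.TemporaryDirectory() as tmp:
    restored = stitch_rows(split_columns(rows, tmp))
print(restored)  # [['id', 'name'], ['1', 'Alice'], ['2', 'Bob']]
```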
data/type/relation/structure/csv.txt · Last modified: 2019/11/15 12:06 by gerardnico