

TABULAR DATA HOW TO
Most tabular data currently gets exchanged as CSV, tab-separated text, XML, JSON or Excel, and they are all highly sub-optimal for the job. There doesn’t seem to be anything that is reasonably space efficient, simple and quick to parse, and text based (not binary) so you can view and edit it with a standard editor.

CSV is a mess. One quote in the wrong place and the file is invalid. It is difficult to parse efficiently using multiple cores, due to the quoting (you can’t start parsing from part way through a file). Use of separators and line endings is inconsistent (sometimes comma, sometimes semicolon). Writing a parser to handle all the different dialects is not at all trivial. Microsoft Excel and Apple Numbers don’t even agree on how to interpret some edge cases for CSV.
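To make the separator and quoting problems concrete, here is a minimal sketch using Python's standard `csv` module. The sample strings and the `parse` helper are invented for illustration. It only aims to show that the same logical table needs different parser settings depending on the separator, and that a quoted field can contain a line break, so a line boundary is not necessarily a record boundary.

```python
import csv
import io

# Two hypothetical exports of the same table, differing only in separator.
comma_text = 'name,note\r\n"Smith, J","line one\nline two"\r\n'
semicolon_text = 'name;note\r\n"Smith, J";"line one\nline two"\r\n'


def parse(text, delimiter):
    """Parse CSV text with an explicit delimiter and return a list of rows."""
    # newline="" leaves line endings untouched so the csv module can
    # handle line breaks that occur inside quoted fields.
    return list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))


# With the right separator, both exports yield the same two records.
rows_comma = parse(comma_text, ",")
rows_semicolon = parse(semicolon_text, ";")

# Reading the semicolon file with the wrong separator gives a different table.
rows_wrong = parse(semicolon_text, ",")

# Counting physical lines overcounts records, because one quoted field
# contains a line break.
naive_line_count = len(comma_text.splitlines())
```

Here `rows_comma` holds two records while `naive_line_count` is three, which is why a parser cannot safely start from an arbitrary line in the middle of a file.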

Tabular data is an important part of my data transformation software.
TABULAR DATA SOFTWARE
Tabular data is everywhere. I support reading and writing tabular data in various formats in all three of my software applications.

A simple format for describing tabular-style data for publishing and sharing.

The key words MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL in this document are to be interpreted as described in RFC 2119.

# Introduction

Tabular Data Package is a simple container format used for publishing and sharing tabular-style data. The format’s focus is on simplicity and ease of use, especially online. In addition, the format is focused on data that can be presented in a tabular structure and in making it easy to produce (and consume) tabular data packages from spreadsheets and relational databases.

The key features of this format are the following:

- CSV (comma separated variables) for data files.
- Single JSON file (datapackage.json) to describe the dataset including a schema for data files.
- Reuse of existing work including other Frictionless Data specifications.

As suggested by the name, Tabular Data Package extends and specializes the Data Package spec for the specific case where the data is tabular.

We chose CSV as the data format for the Tabular Data Package specification because:

- CSV is very simple – it is possibly the most simple data format.
- Most data structures are either tabular or can be transformed to a tabular structure by some form of normalization.
- It is open and the “standard” is well-known.
- It is widely supported: practically every spreadsheet program, relational database and programming language in existence can handle CSV in some form or other.
- It is text-based and therefore amenable to manipulation and access from a wide range of standard tools (including revision control systems such as git, mercurial and subversion).
- It is line-oriented, which means it can be incrementally processed: you do not need to read an entire file to extract a single row. For similar reasons it means that the format supports streaming.

CSV is the data Kalashnikov: not pretty, but many wars have been …

CSV is the ultimate simple, standard data format - streamable, text-based, no need for proprietary tools etc (Rufus …

Tabular Data Package builds directly on the Data Package specification. Thus a Tabular Data Package MUST be a Data Package and conform to the Data Package specification.
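As a rough illustration of the layout described here (CSV data files plus a single datapackage.json with a schema), the sketch below writes a tiny package to a temporary folder and then reads it back one row at a time. The file name `items.csv`, the package name, the field names and the sample values are all made up for the example, and the descriptor is a minimal guess at a reasonable shape rather than something validated against the full specification.

```python
import csv
import json
import tempfile
from pathlib import Path


def write_package(folder):
    """Write a small CSV data file and a datapackage.json describing it."""
    folder = Path(folder)
    with open(folder / "items.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["label", "count"])          # header row
        writer.writerows([["alpha", 1], ["beta", 2]])  # made-up sample rows
    descriptor = {
        "profile": "tabular-data-package",
        "name": "example-items",  # hypothetical package name
        "resources": [{
            "name": "items",
            "path": "items.csv",
            "profile": "tabular-data-resource",
            "schema": {"fields": [
                {"name": "label", "type": "string"},
                {"name": "count", "type": "integer"},
            ]},
        }],
    }
    (folder / "datapackage.json").write_text(
        json.dumps(descriptor, indent=2), encoding="utf-8")


def stream_rows(folder):
    """Yield rows of the first resource one at a time as dicts."""
    folder = Path(folder)
    descriptor = json.loads(
        (folder / "datapackage.json").read_text(encoding="utf-8"))
    resource = descriptor["resources"][0]
    with open(folder / resource["path"], newline="", encoding="utf-8") as f:
        # DictReader pulls lines lazily, so the file is never fully loaded.
        for row in csv.DictReader(f):
            yield row


with tempfile.TemporaryDirectory() as tmp:
    write_package(tmp)
    rows = stream_rows(tmp)
    first_row = next(rows)  # values are strings; the schema is not applied here
    rows.close()            # stop early and close the underlying file
```

Because the data file is line-oriented, `first_row` is available without reading the rest of the file. Casting `count` to an integer according to the schema is left out to keep the sketch short.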
