
Data formats

Leon du Toit edited this page Mar 3, 2014 · 9 revisions

This section is not about data types such as integers, floating-point numbers, strings and dates. It is about common data storage and interchange formats.

Delimited files

The workhorses of data analysis are delimited files: tab-separated and comma-separated files, usually called TSV and CSV files. These should be familiar to almost anyone who has read this far, so I won't spend much time on them. Both Python and R have good tools for reading and writing such files.
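As a minimal sketch of what "delimited" means in practice, Python's standard-library `csv` module can write the same rows with either a comma or a tab as the separator. The rows and the helper names `to_delimited` and `from_delimited` are made up for illustration.

```python
import csv
import io

# Hypothetical sample rows, invented for illustration only.
rows = [["day", "sleep"], ["mon", "7.5"], ["tue", "6"]]

def to_delimited(rows, delimiter):
    """Serialise rows to a delimited string using the given delimiter."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()

def from_delimited(text, delimiter):
    """Parse a delimited string back into a list of rows (all values are strings)."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))

csv_text = to_delimited(rows, ",")   # comma-separated
tsv_text = to_delimited(rows, "\t")  # tab-separated
```

The only difference between the two outputs is the separator character; parsing either one back with the matching delimiter should give the original rows.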

File IO with Python pandas

Let's look at reading and writing CSV files with pandas. Suppose we have a small CSV file:

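$ touch myfile.csv
$ echo "day,sleep" >> myfile.csv
$ python
>>> import pandas as pd
>>> df = pd.read_csv("myfile.csv")  # read the CSV into a DataFrame
>>> df.to_csv("myfile_copy.csv", index=False)  # write it back out, without the row index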
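A slightly fuller round trip might look like the sketch below. It assumes pandas is installed, uses an in-memory buffer rather than a file on disk, and the values in the `day` and `sleep` columns are invented for illustration.

```python
import io

import pandas as pd

# Hypothetical data, invented for illustration only.
df = pd.DataFrame({"day": ["mon", "tue", "wed"], "sleep": [7.5, 6.0, 8.0]})

# Write to an in-memory text buffer; index=False leaves out the row index column.
buf = io.StringIO()
df.to_csv(buf, index=False)
csv_text = buf.getvalue()

# Read the text back into a new DataFrame.
# For a TSV file, pass sep="\t" to both to_csv and read_csv.
df2 = pd.read_csv(io.StringIO(csv_text))
```

Passing a filename such as `"myfile.csv"` instead of the buffer reads from and writes to disk in the same way.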

File IO with R

JSON

JSON stands for JavaScript Object Notation.
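As a rough illustration of JSON as an interchange format, the sketch below uses Python's standard-library `json` module to turn a dictionary into JSON text and back. The record and its field names are made up for the example.

```python
import json

# Hypothetical record, invented for illustration only.
record = {"day": "mon", "sleep": 7.5, "tags": ["weekday"], "notes": None}

# Serialise to a JSON string; sort_keys=True makes the key order predictable.
text = json.dumps(record, sort_keys=True)

# Parse the string back into Python objects.
restored = json.loads(text)
```

Note how Python's `None` becomes JSON `null` in the text form and is restored to `None` on the way back.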

JSON-stat is a JSON-based format for tabular data.

Tools: jq (a command-line JSON processor), etc.
