parquet instead of pickle?

Sorry for being nosy - I am curious about all the columnar technologies in an actual analysis context like Hgg :)

I saw that `pickle` is used in several places. You might try Parquet instead (`df.to_parquet()`, `pd.read_parquet()`). It's a columnar format, essentially the industry counterpart of ROOT, so it should be fast to serialize and deserialize. Uncompressed pickle is probably still faster, but if you write compressed pickles, as in `df.to_pickle("blah.pkl.gz")`, I'd expect Parquet to win. Of course, this all depends on how big your files are.
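If it helps, here is a rough sketch of how one could check this on a given machine. It is not a proper benchmark. The DataFrame, its column names, the file names, and the `time_round_trip` helper are all made up for illustration. `to_parquet` needs `pyarrow` or `fastparquet` installed, so the Parquet case is skipped if neither is available.

```python
import os
import tempfile
import time

import numpy as np
import pandas as pd


def time_round_trip(df, path, writer, reader):
    """Write df to path, read it back; return (seconds, file size in bytes, restored frame)."""
    start = time.perf_counter()
    writer(df, path)
    restored = reader(path)
    elapsed = time.perf_counter() - start
    return elapsed, os.path.getsize(path), restored


# Hypothetical stand-in for an analysis DataFrame; the columns are invented.
rng = np.random.default_rng(0)
n_rows = 100_000
df = pd.DataFrame({
    "mass": rng.normal(125.0, 2.0, size=n_rows),
    "pt": rng.exponential(50.0, size=n_rows),
    "category": rng.integers(0, 4, size=n_rows),
})

# pandas infers gzip compression from the ".gz" suffix for pickles.
formats = {
    "pickle": ("events.pkl", lambda d, p: d.to_pickle(p), pd.read_pickle),
    "pickle+gzip": ("events.pkl.gz", lambda d, p: d.to_pickle(p), pd.read_pickle),
    "parquet": ("events.parquet", lambda d, p: d.to_parquet(p), pd.read_parquet),
}

results = {}
with tempfile.TemporaryDirectory() as tmp:
    for name, (fname, writer, reader) in formats.items():
        path = os.path.join(tmp, fname)
        try:
            seconds, size, restored = time_round_trip(df, path, writer, reader)
        except ImportError:
            # Raised by pandas when no Parquet engine is installed.
            print(f"{name}: skipped (no Parquet engine installed)")
            continue
        results[name] = (seconds, size, restored.equals(df))
        print(f"{name}: {seconds:.3f} s, {size / 1e6:.2f} MB")
```

The timings will vary a lot with file size, dtypes, and disk, so it is worth running this on a frame that looks like the real ones before deciding.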

parquet instead of pickle? #5
