Skip to content

Cache ML -- layer on top of joblib to cache parsed datasets, dramatically reducing load time of large data files. Also supports encryption at rest.

Notifications You must be signed in to change notification settings

BenedatLLC/CacheML

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

23 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Cache ML

Cache ML -- layer on top of joblib to cache parsed datasets, dramatically reducing load time of large data files. Also supports encryption at rest. Currently supported backends are local filesystem and S3.

Installation

You can install via pip:

pip install cacheml

Example Usage

Here is an example from a Jupyter notebook:

import pandas as pd
from cacheml.cache import LocalFile, Cache
cache = Cache()
@cache.cache # this function's result will be cached
def read_and_filter_commits(commits_file_obj):
    return pd.read_csv(commits_file_obj.path)
ts_all = read_and_filter_commits(LocalFile(commits.csv.gz))

Performance Test Results

There are from running the unit tests which simulate loading the time series data from datahut.ai, which is in a 216MB compressed csv file. The first case just loads into a dataframe, while the second case does some additional processing (sorting, removing entries outside a time range).

Caching results from unit test, raw dataframes
File location Time for raw df read Time for initial read and caching of file Time for cached read
Local File 134.0 130.9 0.41
S3 153.6 144.6 0.38
Caching results from unit test, procesed dataframes
File location Time for original function Time for initial read and caching of file Time for cached read
Local File 139.6 142.49 1.04
S3 153.4 155.8 0.99

Copyright

Copyright 2021 by Benedat LLC. Available under the Apache 2.0 license.

About

Cache ML -- layer on top of joblib to cache parsed datasets, dramatically reducing load time of large data files. Also supports encryption at rest.

Topics

Resources

Stars

Watchers

Forks

Packages

No packages published