Policy on intake for larger data assets #14
Comments
I think using intake is a good default, but we should probably have a couple of examples that show other ways of getting data files.
I have an approach that I've already been using; see #13 for instance. All the project writer needs to do is add a directory in the test_data dir corresponding to the project.
We should make some of the examples use Intake for its own sake, but I think the anaconda-project.yml handles the typical case.
Just to circle back: I tried to use intake for the 1-billion-point OSM data case in #20, and ended up bailing because the download often takes longer than the 10 minutes before a notebook cell times out. I've more or less decided that for really big downloads (roughly >3 GB) we probably want to just tell the user to download the file rather than handling it behind the scenes. I have set up the infrastructure to use intake on these things generally (#22). The benefit of using intake over anaconda-project download is that the download only happens at the moment the data is needed (see the sketch below), but this ends up being somewhat annoying in AE because the data isn't fetched at deployment time.
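A minimal sketch of the lazy-loading behavior described above, assuming a hypothetical catalog.yml that defines an osm source (the file and source names are illustrative, not the actual catalog from #22):

```python
import intake

# Opening the catalog only parses the YAML; nothing is downloaded yet.
cat = intake.open_catalog("catalog.yml")  # hypothetical catalog file

# The data is fetched only here, at the moment the notebook actually needs it.
# In AE this means a deployment starts without the data already in place.
df = cat.osm.read()  # 'osm' is an illustrative source name
```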
In the end, we are mostly using anaconda-project data loading support. We separately have some intake examples. So far, this has been working well.
The opensky notebook is an example of a topic that requires a large data file (opensky.parq). I would like to suggest using intake in such cases instead of the current approach, which will need updating anyway (right now it relies on datashader's system for getting data files).