Policy on intake for larger data assets #14

Closed
jlstevens opened this issue May 21, 2019 · 5 comments

The opensky notebook is an example of a topic that requires a large data file (opensky.parq). I would like to suggest using intake in such cases instead of the current approach, which will need updating anyway (right now it relies on datashader's system for fetching data files).
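Roughly, the idea would be something like the sketch below; the catalog path and entry name are just placeholders, not anything that exists in the repo yet:

```python
# Minimal sketch, assuming a hypothetical catalog.yml that defines an
# "opensky" entry pointing at opensky.parq via the intake-parquet driver.
import intake

cat = intake.open_catalog("catalog.yml")  # hypothetical catalog file
flights = cat.opensky.to_dask()           # lazy: data is read only when computed
# or: flights = cat.opensky.read()        # eager load into pandas
```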

jbednar (Contributor) commented May 21, 2019

I think using intake is a good default, but we should probably have a couple of examples that show other ways of getting data files.

jsignell (Contributor) commented May 22, 2019

I already have an approach in place; see #13 for instance. All the project writer needs to do is add a directory under the test_data directory corresponding to the project.

jbednar (Contributor) commented May 22, 2019

We should make some of the examples use Intake for its own sake, but I think the anaconda-project.yml handles the typical case.
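For the typical case, something along these lines; the download key, URL, and fallback path are illustrative only, relying on anaconda-project exposing each declared download as an environment variable that holds the local path:

```python
# Sketch of the anaconda-project route. A hypothetical anaconda-project.yml
# downloads stanza might look like:
#
#   downloads:
#     OPENSKY_DATA:
#       url: https://example.com/opensky.parq
#       description: OpenSky flight traces
#
# anaconda-project exposes the download key as an environment variable
# containing the local file path, which the notebook can read directly.
import os
import dask.dataframe as dd

data_path = os.environ.get("OPENSKY_DATA", "data/opensky.parq")  # fallback path is an assumption
flights = dd.read_parquet(data_path)
```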

jsignell (Contributor) commented

Just to circle back: I tried to use intake for the 1-billion-point OSM case in #20 and ended up bailing, because the download often takes longer than the 10 minutes before a notebook cell times out. I decided that for really big downloads (roughly >3 GB) we probably want to just tell the user to download the file rather than handling it behind the scenes. I have set up the infrastructure to use intake for these things generally (#22). The benefit of intake over anaconda-project download is that the download only happens at the moment it is needed; this ends up being somewhat annoying in AE, because the deployment doesn't download the data at deployment time.
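For the really big files, the notebook would just check for the data and point the user at the download rather than fetching it, roughly like this (the local path and URL are placeholders):

```python
# Sketch of the "ask the user to download it" approach for very large files
# (~>3 GB); the path and URL below are placeholders.
from pathlib import Path

DATA_PATH = Path("data/osm-1billion.parq")
DATA_URL = "https://example.com/osm-1billion.parq"  # placeholder

if not DATA_PATH.exists():
    raise RuntimeError(
        f"Large dataset not found at {DATA_PATH}. Please download it from "
        f"{DATA_URL} before running this notebook."
    )
```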

ppwadhwa commented Oct 5, 2020

In the end, we are mostly using anaconda-project's data loading support, and we separately have some intake examples. So far, this has been working well.

ppwadhwa closed this as completed Oct 5, 2020