[WIP]: Add csv datasets from Analyze Boston #43

Open
wants to merge 1 commit into
base: main
from

Conversation

henrykironde
Contributor

Out of 154 datasets, we got about 47 data packages that contain only CSV data.
ref: #34

@henrykironde
Contributor Author

Currently testing the data packages.
Current issues:

  • Some datasets change daily or very frequently, and each update comes with a new, random data file name.
    Proposed solution: check whether the files are archived; if they are, we can work with monthly updates. Users can easily update a data package to get the latest changes on their systems.
  • Encoding or other errors. (The scripts are generated with autocreate and the links are added manually, so I will need to follow up by testing individual packages.) Remove the affected data for now and add it back later.
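
The renamed-file problem above could be sidestepped by resolving the current file from dataset metadata rather than hard-coding a URL. A minimal sketch, assuming the portal returns CKAN-style metadata (a `resources` list whose entries carry `format`, `url`, and `last_modified`); the function name `latest_csv_url` and the sample record are hypothetical and not part of retriever:

```python
from datetime import datetime


def latest_csv_url(package):
    """Return the URL of the most recently modified CSV resource, or None."""
    csv_resources = [
        r for r in package.get("resources", [])
        if r.get("format", "").lower() == "csv" and r.get("last_modified")
    ]
    if not csv_resources:
        return None
    newest = max(
        csv_resources,
        key=lambda r: datetime.fromisoformat(r["last_modified"]),
    )
    return newest["url"]


# Made-up metadata record, for illustration only.
sample_package = {
    "resources": [
        {"format": "CSV", "url": "https://example.org/data_a1b2.csv",
         "last_modified": "2021-01-03T08:00:00"},
        {"format": "PDF", "url": "https://example.org/readme.pdf",
         "last_modified": "2021-02-01T08:00:00"},
        {"format": "CSV", "url": "https://example.org/data_c3d4.csv",
         "last_modified": "2021-02-01T08:00:00"},
    ]
}

print(latest_csv_url(sample_package))
```

With something like this, a monthly update would only need the dataset identifier, not the file name that happens to be current.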

@shubhank-saxena

@henrykironde, for the dynamic datasets that change, we can set up a Celery scheduler that fetches the data periodically. For updating and appending, we can keep updating the last state in the task_scheduler.
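
As a rough sketch of the "last state" idea, independent of any particular scheduler: store a checksum per dataset and treat the dataset as updated only when the checksum changes. The function name `has_changed` and the JSON state file layout are hypothetical; a periodic job (a Celery beat task or a plain cron entry) would call it on each run.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def has_changed(dataset, content, state_file):
    """Compare content against the stored checksum and record the new one.

    Returns True when the dataset is new or differs from the last run.
    """
    state_path = Path(state_file)
    state = json.loads(state_path.read_text()) if state_path.exists() else {}
    digest = hashlib.sha256(content).hexdigest()
    changed = state.get(dataset) != digest
    if changed:
        # Persist the new checksum so the next run compares against it.
        state[dataset] = digest
        state_path.write_text(json.dumps(state, indent=2))
    return changed


# Demo with in-memory bytes standing in for a downloaded CSV file.
with tempfile.TemporaryDirectory() as tmp:
    state_file = Path(tmp) / "state.json"
    first = has_changed("example-dataset", b"id,value\n1,10\n", state_file)
    second = has_changed("example-dataset", b"id,value\n1,10\n", state_file)
    third = has_changed("example-dataset", b"id,value\n1,10\n2,20\n", state_file)
    print(first, second, third)
```

Hashing the content rather than the file name keeps the check stable even when the portal renames the file on every update.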

@henrykironde
Contributor Author

Thanks @shubhank-saxena for the idea. We do have a retriever dashboard, https://github.com/weecology/retrieverdash, that could probably use this feature.
I have not used a Celery scheduler, but it would be nice to discuss the details and design further.

Base automatically changed from master to main February 11, 2021 09:15
2 participants