Migrate datasets from OpenML to Figshare #1217
Also, we should mock downloads in unit tests (less urgent).
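A minimal sketch of what mocking a download could look like, using only the standard library. `download_bytes` is a hypothetical helper, not skrub's actual fetcher. If skrub's real download code makes its HTTP call through a different function or module, the string passed to `mock.patch` would have to point there instead.

```python
import io
import urllib.request
from unittest import mock


def download_bytes(url: str) -> bytes:
    """Hypothetical helper: fetch the raw bytes of a dataset file."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def test_download_is_mocked():
    fake_payload = b"col_a,col_b\n1,2\n"
    # Patch urlopen so the test never touches the network.
    with mock.patch("urllib.request.urlopen") as fake_urlopen:
        # urlopen is used as a context manager, so configure what __enter__ returns.
        fake_urlopen.return_value.__enter__.return_value = io.BytesIO(fake_payload)
        data = download_bytes("https://example.org/dataset.csv")
    assert data == fake_payload
    fake_urlopen.assert_called_once_with("https://example.org/dataset.csv", timeout=30)
```

With downloads mocked this way, unit tests exercise the parsing and caching logic without depending on OpenML or Figshare being reachable from CI.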
OpenML is currently unreachable because of a cyberattack on TU Eindhoven. The service has been very reliable, but this event is sadly out of our control. OpenML itself is not affected, and we're in contact with the university IT team to bring it back soon. We do have redundancy, but unfortunately all of it sits within the TU/e network. In the meantime, we are preparing a secondary deployment at the Dutch supercomputing center so that an outage like this won't happen again.
Hi @joaquinvanschoren, thanks very much for explaining this outage. I hope the attack gets resolved quickly and without major consequences for the university! OpenML has indeed been generally reliable, and we're very grateful for it. However, the skrub datasets don't really need OpenML's great features: they are just a handful of fixed, predefined datasets, so we basically only need a place to store a few small parquet files (some of the datasets are already stored this way). So I still think it makes sense to keep a copy of those datasets on Figshare if their licenses allow it. In any case, skrub depends on scikit-learn, so skrub users will always have easy access to any OpenML dataset through scikit-learn's …
I would be more in favor of adding redundancy rather than migrating. We also experienced issues with scikit-learn in the past: scikit-learn/scikit-learn#28297. That issue took several months to resolve, and the resolution was not easy. On the plus side, whenever we have had to resolve an issue with OpenML, @joaquinvanschoren and the team have done a great job.
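A sketch of the redundancy option: try a primary source first and fall back to a mirror (for example, a Figshare copy), optionally checking a SHA-256 checksum so that every mirror is known to serve the same file. `fetch_with_fallback`, its parameters, and the mirror URLs below are hypothetical illustrations, not existing skrub code.

```python
import hashlib
import urllib.error
import urllib.request
from typing import Callable, Optional, Sequence


def _urlopen_bytes(url: str) -> bytes:
    """Default fetcher: download a URL with the standard library."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def fetch_with_fallback(
    mirrors: Sequence[str],
    sha256: Optional[str] = None,
    fetch: Callable[[str], bytes] = _urlopen_bytes,
) -> bytes:
    """Try each mirror in order; return the first payload that downloads
    successfully and, if `sha256` is given, matches that checksum."""
    errors = []
    for url in mirrors:
        try:
            data = fetch(url)
        except OSError as exc:  # URLError and socket timeouts subclass OSError
            errors.append(f"{url}: {exc}")
            continue
        if sha256 is not None and hashlib.sha256(data).hexdigest() != sha256:
            errors.append(f"{url}: checksum mismatch")
            continue
        return data
    raise RuntimeError("All mirrors failed:\n" + "\n".join(errors))


# Hypothetical usage: OpenML first, Figshare copy as the fallback.
# data = fetch_with_fallback(
#     ["https://openml.example/dataset.parquet",
#      "https://figshare.example/dataset.parquet"],
#     sha256="<known checksum>",
# )
```

Injecting `fetch` keeps the function easy to test without network access, which fits the point above about mocking downloads.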
Problem Description
After in-person discussions with @jeromedockes, we noticed that we are still experiencing frequent CI disruptions and errors when fetching datasets from OpenML. Figshare seems to be a more reliable alternative.
Feature Description
Move all skrub datasets to Figshare.
Alternative Solutions
No response
Additional Context
No response