Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tiny bug : oml_id must be an int in load_openml_task #686

Merged
merged 3 commits into from
Jan 2, 2025

Conversation

SubhadityaMukherjee
Copy link
Contributor

Check if oml_id is a valid integer before trying to load the task. This sometimes does not happen by default. (Only get_suite supports string ids.)

I am not sure if this behavior is different in the new OpenML API.

@codecov-commenter
Copy link

codecov-commenter commented Jan 2, 2025

Codecov Report

Attention: Patch coverage is 85.00000% with 3 lines in your changes missing coverage. Please review.

Project coverage is 68.16%. Comparing base (b719142) to head (ece0a6c).

Files with missing lines Patch % Lines
amlb/benchmarks/openml.py 85.71% 2 Missing ⚠️
amlb/benchmarks/file.py 66.66% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##           master     #686   +/-   ##
=======================================
  Coverage   68.15%   68.16%           
=======================================
  Files          54       54           
  Lines        6730     6741   +11     
=======================================
+ Hits         4587     4595    +8     
- Misses       2143     2146    +3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Collaborator

@PGijsbers PGijsbers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the pointer. However, I wanted to understand why sometimes there was an integer and sometimes a string. Modified/moved the logic a little bit to catch these mistakes earlier in the process to give a faster and more informative error message.

btw. you are using a "too new" version of OpenML. The one defined in requirements.txt still supports identifiers passed as a str, which is why this wasn't raised before :) but it's good future proofing anyway, I want to upgrade someday.

Comment on lines +25 to +30
try:
oml_id = int(oml_id_str)
except ValueError:
raise ValueError(
f"Could not convert OpenML id {oml_id_str!r} in {benchmark!r} to integer."
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason it was sometimes str and sometimes not was because when specified as an openml benchmark (openml/{s,t}/{id} it was passed as str. But when loaded from a file the ids were conventionally defined as integers.

Comment on lines +38 to +42
for task in tasks:
if task["openml_task_id"] is not None and not isinstance(
task["openml_task_id"], int
):
raise TypeError("OpenML task id for task {task.name!r} must be integer.")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While tasks defined in a yaml file should be integer. Nothing stops a user from defining a str instead. So here we can give an early warning.

@PGijsbers PGijsbers merged commit 2c2a93d into openml:master Jan 2, 2025
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants