Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace pre-commit in the CI #1666

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open

Conversation

e10e3
Copy link
Contributor

@e10e3 e10e3 commented Jan 16, 2025

Because pre-commit only checks edited files, side effects can cause errors to appear in other files and go unnoticed. On the contrary, the CI's role is to guarantee the soundness of the entire codebase; only checking what changed is insufficient.

This PR replaces pre-commit with an explicit call to the linters (in CI only, pre-commit is still available as a hook).

This process discovered previously silent typing errors.
If anything, we lose pre-commit's interface to select file types: the files must now be given as command arguments.

Closes #1665

@e10e3
Copy link
Contributor Author

e10e3 commented Jan 16, 2025

@MaxHalford Do you remember why you added kwargs handling in simulate_qa?

river/river/stream/qa.py

Lines 145 to 146 in 5531ae5

for i, (x, y, *kwargs) in enumerate(dataset):
kwargs = kwargs[0] if kwargs else None

I'm a bit puzzled by it, no dataset from River yields any additional argument.

@MaxHalford
Copy link
Member

Hey 👋

I believe it was for progressive validation... But I'd need to check deeper. Do tests pass if you remove it?

@e10e3
Copy link
Contributor Author

e10e3 commented Jan 16, 2025

I may have found why: some datasets like MovieLens100K have an optional parameter to unpack attributes.

if self.unpack_user_and_item:
for x, y in X_y:
user = x.pop("user")
item = x.pop("item")
yield x, y, {"user": user, "item": item}

As you wrote in an example, this mechanism is used by recommender models.

All good then.

@e10e3
Copy link
Contributor Author

e10e3 commented Jan 16, 2025

It looks like the tests timed out while fetching a remote dataset. I didn't touch that!

@MaxHalford
Copy link
Member

Indeed, it was for the user_id and item_id fields in the recsys models. Good catch!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Pre-commit does not report MyPy errors
2 participants