[TPC-H] Simplify Dask queries and avoid .query and .apply #1335
@@ -839,6 +841,7 @@ def query_16(dataset_path, fs):
    supplier = dd.read_parquet(dataset_path + "supplier", filesystem=fs)

    supplier["is_complaint"] = supplier.s_comment.str.contains("Customer.*Complaints")
    # FIXME: We have to compute this early because passing a `dask_expr.Series` to `isin` is not supported
XREF: dask/dask-expr#834
You could rewrite this as a merge, but it's not blocking.
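For comparison, here is a minimal sketch of the two shapes discussed in this thread, written in plain pandas rather than Dask (an assumption, to keep it runnable without a cluster) on small hypothetical `supplier` and `partsupp` tables whose column names follow the TPC-H schema used in `query_16`. The first approach materializes the complaint supplier keys and filters with `isin`, which is what the FIXME works around; the second joins the `is_complaint` flag onto `partsupp` and filters on it, so no eagerly computed key list is needed.

```python
import pandas as pd

# Hypothetical toy data standing in for the TPC-H `supplier` and `partsupp` tables.
supplier = pd.DataFrame({
    "s_suppkey": [1, 2, 3, 4],
    "s_comment": [
        "quick Customer service Complaints noted",
        "regular deposits",
        "Customer praise only",
        "Customer filed Complaints twice",
    ],
})
partsupp = pd.DataFrame({
    "ps_partkey": [10, 10, 11, 12, 12, 13],
    "ps_suppkey": [1, 2, 2, 3, 4, 3],
})

# Same flag as in query_16: suppliers whose comment mentions customer complaints.
supplier["is_complaint"] = supplier.s_comment.str.contains("Customer.*Complaints")

# Approach 1: materialize the complaint keys, then exclude them with isin.
complaint_keys = supplier.loc[supplier.is_complaint, "s_suppkey"]
via_isin = partsupp[~partsupp.ps_suppkey.isin(complaint_keys)]

# Approach 2: left-join the flag onto partsupp and filter on it.
# ne(True) also keeps rows whose supplier is missing (NaN flag), matching isin.
merged = partsupp.merge(
    supplier[["s_suppkey", "is_complaint"]],
    left_on="ps_suppkey",
    right_on="s_suppkey",
    how="left",
)
via_merge = merged[merged["is_complaint"].ne(True)].drop(columns=["s_suppkey", "is_complaint"])
```

Both filters keep the same `partsupp` rows here. The merge form keeps the whole computation lazy when ported to Dask, since Dask DataFrames also expose `merge`, at the cost of a join instead of a membership test.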
Honestly, I'm pretty confused by CI, but I don't touch any of these tests in here, so the failures appear to be unrelated.
Ya, noticed the int errors in the other PR. Agree, unrelated.
Small comments; feel free to tackle them in a follow-up.
No description provided.