We can't properly deal with delayed inputs #777
This introduced some big performance regressions in our framework, which makes heavy use of [...]. We're getting around it for now by using the legacy dataframe directly.
Do you have a reproducer? I suspect that this is actually something different that we can address more easily.
Simple code snippet.py:
Starting from a fresh virtual environment:
Thx, that's helpful. I put up a fix here: #1048
I suspect there is more than one operation that is being slowed by the addition of dask-expr. Using a similar reproducer as before:
3x slowdown, unchanged by #1048
Yeah, this is a known limitation, unfortunately. Going to delayed objects roundtrips through the legacy implementation, which materialises the graph. That causes the slowdown here. Improvements are certainly very welcome.
Every argument could theoretically be a delayed object, similar to how every argument could be a dask-expr collection. We can't deal with this yet, since we never check for them.
I created a very naive implementation of a delayed expression, `_Delayed`, to capture some things, but we should think critically about it before moving ahead with this issue. I suggest that we unpack delayed objects, similar to what we do with collections, in the constructor of `Expr`.
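To make the suggestion concrete, here is a rough, self-contained sketch of what "unpacking delayed objects in the constructor" could look like. It does not use dask or dask-expr at all: `ToyDelayed`, `ToyDelayedExpr`, `ToyExpr`, and `Add` are hypothetical stand-ins invented for illustration (with `ToyDelayedExpr` loosely playing the role of `_Delayed`), so the real implementation may well look different.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class ToyDelayed:
    """Hypothetical stand-in for a delayed object: a key plus a deferred call."""

    key: str
    func: Callable[..., Any]
    args: tuple = ()

    def compute(self):
        return self.func(*self.args)


class ToyDelayedExpr:
    """Hypothetical expression node that wraps a ToyDelayed so it stays lazy."""

    def __init__(self, obj: ToyDelayed):
        self.obj = obj

    def evaluate(self):
        return self.obj.compute()


class ToyExpr:
    """Hypothetical expression base class (not the real dask-expr Expr)."""

    def __init__(self, *operands):
        # Check every operand: delayed inputs are wrapped in an expression
        # node, everything else is kept as a plain operand.
        self.operands = [
            ToyDelayedExpr(op) if isinstance(op, ToyDelayed) else op
            for op in operands
        ]

    def dependencies(self):
        # Operands that are themselves part of the expression graph.
        return [
            op for op in self.operands
            if isinstance(op, (ToyExpr, ToyDelayedExpr))
        ]

    def evaluate(self):
        raise NotImplementedError


def _evaluate(op):
    # Evaluate expression nodes; pass plain values through unchanged.
    return op.evaluate() if isinstance(op, (ToyExpr, ToyDelayedExpr)) else op


class Add(ToyExpr):
    def evaluate(self):
        left, right = (_evaluate(op) for op in self.operands)
        return left + right


# A delayed value passed as an ordinary argument is picked up by the constructor.
expr = Add(ToyDelayed("load-x", lambda: 40), 2)
result = expr.evaluate()
```

The point of the sketch is only that the check happens once, in the base constructor, so any argument of any expression can be a delayed object without each operation handling it separately.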