Eagerly computing results limits potential for optimization #826

Open
hendrikmakait opened this issue Jan 31, 2024 · 5 comments

@hendrikmakait
Member

hendrikmakait commented Jan 31, 2024

Problem

By eagerly computing, e.g., when calling .head(), we limit the potential for optimizations. For example, it shouldn't matter whether I call df.head(n)[["a", "b"]] or df[["a", "b"]].head(n). With eager computations, we lose the opportunity to push the column projection before the head selection.

Apart from that, in my opinion, eager computation comes as a surprise and makes it harder to reason about when Dask actually computes things.
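
To make the projection pushdown described above concrete, here is a minimal sketch using a toy expression tree. The `Source`, `Head`, and `Project` classes and the `optimize` and `evaluate` functions are hypothetical names invented for illustration; they are not Dask's actual expression classes or optimizer. The point is only that the rewrite is possible while `Head` is still a node in a lazy tree, and impossible once `head()` has already returned concrete rows.

```python
from dataclasses import dataclass


# Hypothetical expression nodes for illustration only (not Dask's classes).
@dataclass(frozen=True)
class Source:
    columns: tuple  # column names
    rows: tuple     # tuple of row tuples, aligned with `columns`


@dataclass(frozen=True)
class Head:
    child: object
    n: int


@dataclass(frozen=True)
class Project:
    child: object
    columns: tuple


def optimize(expr):
    """Rewrite Project(Head(x)) into Head(Project(x))."""
    if isinstance(expr, Project):
        child = optimize(expr.child)
        if isinstance(child, Head):
            # Push the column projection below the head selection.
            pushed = optimize(Project(child.child, expr.columns))
            return Head(pushed, child.n)
        return Project(child, expr.columns)
    if isinstance(expr, Head):
        return Head(optimize(expr.child), expr.n)
    return expr


def evaluate(expr):
    """Materialize an expression into a list of dict rows."""
    if isinstance(expr, Source):
        return [dict(zip(expr.columns, row)) for row in expr.rows]
    if isinstance(expr, Head):
        return evaluate(expr.child)[: expr.n]
    if isinstance(expr, Project):
        return [{c: row[c] for c in expr.columns} for row in evaluate(expr.child)]
    raise TypeError(f"unknown expression: {expr!r}")


source = Source(("a", "b", "c"), ((1, 2, 3), (4, 5, 6), (7, 8, 9)))

# Lazy equivalent of df.head(2)[["a", "b"]]
head_then_project = Project(Head(source, 2), ("a", "b"))
# Lazy equivalent of df[["a", "b"]].head(2)
project_then_head = Head(Project(source, ("a", "b")), 2)

optimized = optimize(head_then_project)
```

In this sketch, `optimized` is structurally equal to `project_then_head`, and both forms evaluate to the same rows. If `Head` were evaluated eagerly, the projection would only ever see already-materialized rows, so there would be nothing left to reorder.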

Proposed solution(s)

In general, limit eager computation as much as possible. For this particular example: deprecate compute=True as the default for head() and switch to compute=False in the future.

@phofl
Collaborator

phofl commented Jan 31, 2024

By "eagerly computing", you are referring to things like head, tail, len, ..., correct? Just to avoid confusion with the set_index space.

@hendrikmakait
Member Author

Yes, I loosely refer to everything that calls compute() under the hood to retrieve a result, not to cases where data is computed because it is required to perform the actual delayed computation.

hendrikmakait changed the title from "Eagerly computing limits potential for optimization" to "Eagerly computing results limits potential for optimization" on Jan 31, 2024

@crusaderky
Collaborator

IMHO it depends.
head() is meant as a quick and dirty development tool. I don't see a problem with it being eager.
An eager len(), on the other hand, is catastrophic, as there's a strong user expectation for it to always be trivial.

@hendrikmakait
Member Author

Why would head be considered a "quick and dirty" development tool? IMO, there are plenty of use cases where I just want to compute the top K of something.

Honestly, I don't care too much about len in the context of optimization because I don't see any use cases where it would prevent further optimization, even if it's executed eagerly.

@crusaderky
Collaborator

I guess that's just my coding habits. Yes, if head() is used in production code it should definitely be lazy by default.
