Open
Description
Problem
By eagerly computing, e.g., when calling .head()
, we limit the potential for optimizations. For example, it shouldn't matter whether I call df.head(n)[["a", "b"]]
or df[["a", "b"]].head(n)
. With eager computations, we lose the opportunity to push the column projection before the head selection.
Apart from that, in my opinion that eager computation comes as a surprise and makes it harder to argue about when Dask actually computes things.
Proposed solution(s)
In general, limit eager computation as much as possible. For this particular example: Deprecate compute=True
as default and switch to compute=False
in the future.
Metadata
Metadata
Assignees
Labels
No labels