Skip to content

Eagerly computing results limits potential for optimization #826

Open
@hendrikmakait

Description

@hendrikmakait

Problem

By eagerly computing, e.g., when calling .head(), we limit the potential for optimizations. For example, it shouldn't matter whether I call df.head(n)[["a", "b"]] or df[["a", "b"]].head(n). With eager computations, we lose the opportunity to push the column projection before the head selection.

Apart from that, in my opinion that eager computation comes as a surprise and makes it harder to argue about when Dask actually computes things.

Proposed solution(s)

In general, limit eager computation as much as possible. For this particular example: Deprecate compute=True as default and switch to compute=False in the future.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions