Add --dask-graph and fix prt bp graph #41

Merged: JamesMcClung merged 11 commits into main from dask-graph on May 28, 2026

@JamesMcClung

Owner

Add the `--dask-graph` option, which constructs a pipeline and shows its dask graph instead of running it. This is useful for diagnostics, and it revealed a bug in how prt.bp files were loaded: time-stacking was opaque to the optimizer, so reads of every component couldn't be ruled out.
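
For readers who want the shape of the idea, here is a minimal sketch of a "show the graph instead of running" switch. It is not the code from this PR: `build_parser`, `show_graph`, `main`, the `prog` name, and the output filename are hypothetical, and treating `--dask-graph` as a plain boolean flag is an assumption. The only dask call used is `dask.visualize`, which needs the `graphviz` package to render.

```python
import argparse
from typing import Any, Callable, Sequence


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="example")  # hypothetical program name
    # Assumption: a plain boolean flag; the real option may take a value.
    parser.add_argument(
        "--dask-graph",
        action="store_true",
        help="build the pipeline and show its dask graph instead of running it",
    )
    return parser


def show_graph(collection: Any, filename: str = "pipeline-graph") -> None:
    """Render the task graph of a dask collection without computing it."""
    import dask  # imported lazily so the normal run path does not need it

    # optimize_graph=True asks dask to optimize the graph before rendering.
    dask.visualize(collection, filename=filename, optimize_graph=True)


def main(
    argv: Sequence[str],
    build_pipeline: Callable[[], Any],
    run: Callable[[Any], None],
    show: Callable[[Any], None] = show_graph,
) -> str:
    args = build_parser().parse_args(argv)
    pipeline = build_pipeline()  # construction is shared by both paths
    if args.dask_graph:
        show(pipeline)  # diagnostics only: nothing is computed
        return "graph"
    run(pipeline)
    return "run"
```

Building the pipeline once and branching only at the last step means the rendered graph is for the same pipeline object that a normal run would execute.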

JamesMcClung and others added 11 commits May 22, 2026 11:44
Co-Authored-By: Claude <noreply@anthropic.com>
Factor out of get_animation so the upcoming --dask-graph path
can reuse it.

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Claude <noreply@anthropic.com>
Optional dependency on the graphviz Python package, needed by
dask.visualize for --dask-graph.
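
One way to handle an optional dependency like this at runtime is to check for it before the graph path is taken. The sketch below is an assumption about how such a guard could look, not something taken from this PR; `has_module` and `require_graphviz` are hypothetical helper names.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if the top-level module `name` is importable, without importing it."""
    return importlib.util.find_spec(name) is not None


def require_graphviz() -> None:
    """Fail early with a readable message when the optional package is missing."""
    if not has_module("graphviz"):
        raise RuntimeError(
            "--dask-graph needs the optional 'graphviz' Python package"
        )
```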

Co-Authored-By: Claude <noreply@anthropic.com>
The old approach (xr.open_dataset(chunks=...).to_dask_dataframe() with a
map_partitions lambda to add the t column) had two problems:

1. to_dask_dataframe creates a separate dask Array per variable, so every
   file was read once per column regardless of downstream projection.
2. The opaque map_partitions(lambda) blocked dask-expr's column-projection
   optimizer from pushing the downstream Projection through to the reads,
   so even columns that were projected away were still read.

dd.from_map supplies a `columns=` kwarg to the partition function when the
optimizer wants to project, so unused variables are never read from disk.
Chunking is preserved by iterating (path, time, particle_dim, slice) tuples
sized by CONFIG.dask_chunk_size.
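
To illustrate the `columns=` mechanism without touching real files, here is a dask-free simulation. Everything in it is hypothetical: the file names, the in-memory `FAKE_FILES` store, `READ_LOG`, and the `Chunk`, `iter_chunks`, and `read_partition` names. It shows only the contract described above: the partition function receives one (path, time, particle_dim, slice) tuple plus an optional `columns` list, and it reads nothing that was not asked for.

```python
from typing import Dict, Iterable, Iterator, List, NamedTuple, Optional, Sequence, Tuple

# Hypothetical stand-in for files on disk: path -> {variable: values}.
FAKE_FILES: Dict[str, Dict[str, List[float]]] = {
    "prt.000.bp": {"x": [0.0, 1.0, 2.0], "px": [5.0, 6.0, 7.0]},
    "prt.100.bp": {"x": [3.0, 4.0], "px": [8.0, 9.0]},
}
READ_LOG: List[Tuple[str, str]] = []  # (path, variable) for every simulated read


class Chunk(NamedTuple):
    path: str
    time: float
    particle_dim: str  # dimension to slice in a real dataset; unused in this simulation
    rows: slice


def iter_chunks(
    files: Iterable[Tuple[str, float, str, int]], chunk_size: int
) -> Iterator[Chunk]:
    """Split each (path, time, particle_dim, n_particles) entry into row slices."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for path, time, particle_dim, n in files:
        for start in range(0, n, chunk_size):
            yield Chunk(path, time, particle_dim, slice(start, min(start + chunk_size, n)))


def read_partition(
    chunk: Chunk, columns: Optional[Sequence[str]] = None
) -> Dict[str, List[float]]:
    """Load one chunk, touching only the requested variables."""
    variables = FAKE_FILES[chunk.path]
    wanted = list(columns) if columns is not None else [*variables, "t"]
    n_rows = chunk.rows.stop - chunk.rows.start
    out: Dict[str, List[float]] = {}
    for name in wanted:
        if name == "t":
            # The time column comes from chunk metadata, so it costs no read.
            out[name] = [chunk.time] * n_rows
        else:
            READ_LOG.append((chunk.path, name))
            out[name] = variables[name][chunk.rows]
    return out
```

In a real pipeline a function of this shape would return a pandas DataFrame and be handed to `dd.from_map` together with the list of chunks. Because the time column is added inside the partition function rather than in a separate `map_partitions` step, there is no opaque layer between the projection and the reads.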

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Claude <noreply@anthropic.com>
@JamesMcClung added the bug, enhancement, optimization, and testing labels on May 28, 2026
@JamesMcClung merged commit 0842168 into main on May 28, 2026
2 checks passed
@JamesMcClung deleted the dask-graph branch on May 28, 2026 16:36