[DRAFT] Support zarr 281 #287
Closed
Address #281 (& #239)
Zarr Data Source Support
- Adds `rslearn.data_sources.zarr.ZarrDataSource`/`ZarrItem`, enabling rslearn to treat chunked spatio-temporal Zarr cubes as first-class raster sources. The implementation exposes both ingestion (writing chunks into the dataset tile store) and direct materialization (acting as a read-only `TileStore`) while reusing the existing band-set selection and compositing logic.
- Documents the data source in `docs/DatasetConfig.md`, covering the required keys (`store_uri`, axis mapping, pixel size/origin, dtype, bands, optional chunk hints), plus an example under `docs/examples/ZarrDataSource.md`. The README now lists the dev extra install (`uv pip install -e ".[dev]"`) so integration tests have their fixtures.
- Adds the Zarr dependencies to the `extra` extra, keeping the base install light while ensuring the Zarr source works when the extra is enabled.
Testing
- A unit test (`tests/unit/data_sources/test_zarr.py`) builds an in-memory cube, exercises `get_items` → ingest → `read_raster`, and verifies that serialization round-trips.
Key Technical Notes
- Supports optional `chunk_shape` overrides; time slices default to one cube time-step per item while honoring the dataset `QueryConfig`.
Example config.json:
{
  "tile_store": { "name": "file", "root_dir": "tiles" },
  "layers": {
    "label": { "type": "vector" },
    "output": { "type": "vector" },
    "era5_precip": {
      "type": "raster",
      "band_sets": [
        { "dtype": "float32", "bands": ["precipitation"] }
      ],
      "data_source": {
        "name": "rslearn.data_sources.zarr.ZarrDataSource",
        "store_uri": "s3://..precipitation.zarr",
        "data_variable": "daily_precipitation",
        "crs": "EPSG:4326",
        "pixel_size": { "x": 0.1, "y": -0.1 },
        "origin": [-130.0, -60.0],
        "axis_names": { "x": "x", "y": "y", "time": "time" },
        "bands": ["precipitation"],
        "dtype": "float32",
        "chunk_shape": { "y": 256, "x": 256 },
        "query_config": {
          "space_mode": "MOSAIC",
          "time_mode": "WITHIN",
          "min_matches": 0,
          "max_matches": 1
        },
        "ingest": true
      }
    }
  }
}
Note:
- Set `"ingest": false` to stream directly from the Zarr buckets without caching.
Visual validation of materialized data vs raw