8 changes: 6 additions & 2 deletions README.md
@@ -26,12 +26,16 @@ Quick links:
Setup
-----

- rslearn requires Python 3.10+ (Python 3.12 is recommended).
+ rslearn requires Python 3.11+ (Python 3.12 is recommended).
Contributor Author comment: Since `requires-python = ">=3.11"`, the documented minimum Python version is updated to match.


```
 git clone https://github.com/allenai/rslearn.git
 cd rslearn
-pip install .[extra]
+uv venv --python 3.11
+source .venv/bin/activate
+uv sync
+uv pip install -e ".[extra]"
+uv pip install -e ".[dev]" # If running tests
```


44 changes: 44 additions & 0 deletions docs/DatasetConfig.md
@@ -873,6 +873,50 @@ Available bands:
- B10
- B11

### rslearn.data_sources.zarr.ZarrDataSource

This data source reads spatio-temporal cubes stored in a Zarr hierarchy. It can either
ingest items into the dataset tile store or, when the layer's `ingest` option is set to
false, act as the tile store itself. Reading the cube requires the optional dependencies
installed via `pip install rslearn[extra]`.

```jsonc
{
// Required URI pointing to the root of the Zarr store. Any fsspec-compatible URI is
// supported.
"store_uri": "s3://bucket/path/to/datacube.zarr",
// Optional variable name inside the store. If omitted, the store must contain a
// single data variable.
"data_variable": "reflectance",
// Required CRS of the cube, expressed as an EPSG code or WKT string.
"crs": "EPSG:32633",
// Required pixel size. Provide either a scalar (same resolution along x and y) or an
// object with explicit x and y values.
"pixel_size": 10,
// Required origin of pixel (0, 0) expressed as [min_x, max_y] in CRS units.
"origin": [500000.0, 4200000.0],
// Required mapping from conceptual axes to dimension names in the Zarr array.
"axis_names": {"x": "x", "y": "y", "time": "time", "band": "band"},
// Required list of bands. The length must match the band dimension when present.
"bands": ["B02", "B03", "B04"],
// Required numpy dtype string that matches the underlying Zarr array.
"dtype": "float32",
// Optional nodata value applied when writing tiles and returned during direct reads.
"nodata": 0.0,
// Optional override for how the cube is broken into items. Each value is the number
// of pixels per chunk along that axis.
"chunk_shape": {"y": 1024, "x": 1024},
// Optional fsspec storage options passed to xarray.open_zarr.
"storage_options": {"anon": true},
// Optional flag toggling consolidated metadata support. Defaults to true.
"consolidated": true
}
```
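The rules stated in the comments above (required keys, a scalar-or-object `pixel_size`, an `[min_x, max_y]` origin, a band list matching the band dimension, and the `consolidated` default) can be checked before running a dataset workflow. The sketch below is a hypothetical helper, `normalize_zarr_config`, written only against those documented fields; it is not part of rslearn's API, and the real `ZarrDataSource` may validate differently.

```python
from typing import Any, Optional

# Keys that the configuration comments above mark as required.
REQUIRED_KEYS = ("store_uri", "crs", "pixel_size", "origin", "axis_names", "bands", "dtype")


def normalize_zarr_config(
    config: dict[str, Any], band_dim_size: Optional[int] = None
) -> dict[str, Any]:
    """Hypothetical helper: check documented rules and fill documented defaults.

    band_dim_size is the length of the cube's band dimension, if it has one;
    reading it from the actual Zarr array is left to the caller.
    """
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError(f"missing required keys: {missing}")

    # Work on a shallow copy so the caller's dict is left unchanged.
    result = dict(config)

    # pixel_size is either a scalar (same x and y resolution) or {"x": ..., "y": ...}.
    pixel_size = config["pixel_size"]
    if isinstance(pixel_size, dict):
        result["pixel_size"] = (float(pixel_size["x"]), float(pixel_size["y"]))
    else:
        result["pixel_size"] = (float(pixel_size), float(pixel_size))

    # origin is the [min_x, max_y] corner of pixel (0, 0) in CRS units.
    min_x, max_y = config["origin"]
    result["origin"] = (float(min_x), float(max_y))

    # The band list must match the band dimension when one is present.
    if band_dim_size is not None and len(config["bands"]) != band_dim_size:
        raise ValueError(
            f"{len(config['bands'])} bands configured but band dimension has {band_dim_size}"
        )

    # consolidated defaults to true when not given.
    result.setdefault("consolidated", True)
    return result
```

Normalizing `pixel_size` and `origin` to float tuples up front means later geometry code never has to branch on the scalar-versus-object form.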

The Zarr data source currently creates one item per time step. When you skip ingestion
(`"ingest": false` on the layer), the source acts as a read-only tile store so windows
can be materialized directly from the Zarr cube.
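Materializing a window straight from the cube requires translating the window's CRS bounds into array indices. A minimal sketch, assuming pixel (0, 0) has its top-left corner at `origin` = `[min_x, max_y]`, rows increase as y decreases, and `pixel_size` holds positive x and y magnitudes; `window_to_slices` is a hypothetical helper, not the data source's actual implementation.

```python
import math


def window_to_slices(
    bounds: tuple[float, float, float, float],
    origin: tuple[float, float],
    pixel_size: tuple[float, float],
) -> tuple[slice, slice]:
    """Hypothetical helper: map CRS bounds (min_x, min_y, max_x, max_y) to (row, col) slices."""
    min_x, min_y, max_x, max_y = bounds
    origin_x, origin_y = origin
    size_x, size_y = pixel_size

    # Columns grow eastward from origin_x; floor/ceil keep partially covered pixels.
    col_start = math.floor((min_x - origin_x) / size_x)
    col_stop = math.ceil((max_x - origin_x) / size_x)

    # Rows grow southward from origin_y (the cube's max_y).
    row_start = math.floor((origin_y - max_y) / size_y)
    row_stop = math.ceil((origin_y - min_y) / size_y)

    # Clip at zero so windows west of or north of the origin never produce
    # negative indices, which Python slicing would interpret from the end.
    return (
        slice(max(0, row_start), max(0, row_stop)),
        slice(max(0, col_start), max(0, col_stop)),
    )
```

With xarray, the returned slices could be passed to `DataArray.isel`, keyed by the `y` and `x` dimension names from `axis_names`. Only the lower edge is clipped here; slicing past the far edge of an array already truncates in Python and NumPy.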

### rslearn.data_sources.xyz_tiles.XyzTiles

This data source is for web xyz image tiles (slippy tiles).
54 changes: 54 additions & 0 deletions docs/examples/ZarrDataSource.md
@@ -0,0 +1,54 @@
# Zarr Data Source Example

The snippet below demonstrates how to reference a spatio-temporal Zarr cube from a
raster layer. Install the optional dependencies before running the dataset workflow:

```bash
uv pip install -e ".[extra]"
```

Add a layer similar to the following in your dataset's `config.json`:

```jsonc
"sentinel2": {
"type": "raster",
"bands": [
{
"name": "B02",
"dtype": "float32",
"nodata": 0.0
},
{
"name": "B03",
"dtype": "float32",
"nodata": 0.0
},
{
"name": "B04",
"dtype": "float32",
"nodata": 0.0
}
],
"data_source": {
"name": "rslearn.data_sources.zarr.ZarrDataSource",
"store_uri": "s3://bucket/path/to/datacube.zarr",
"data_variable": "reflectance",
"crs": "EPSG:32633",
"pixel_size": 10,
"origin": [500000.0, 4200000.0],
"axis_names": {"x": "x", "y": "y", "time": "time", "band": "band"},
"bands": ["B02", "B03", "B04"],
"dtype": "float32",
"nodata": 0.0,
"chunk_shape": {"y": 1024, "x": 1024},
"storage_options": {"anon": true}
},
// Set to false to stream directly from the cube instead of ingesting.
"ingest": true
}
```
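The layer above repeats the band names, dtype, and nodata value both in the layer's `bands` list and in the data source's own `bands`, `dtype`, and `nodata` keys. Generating both from a single list is one way to keep them from drifting apart. The sketch below is a hypothetical helper that mirrors the snippet field for field; the snippet does not say whether rslearn requires the two lists to match, so this simply keeps them identical.

```python
import json
from typing import Any


def build_zarr_layer(
    band_names: list[str],
    dtype: str,
    nodata: float,
    source_options: dict[str, Any],
    ingest: bool = True,
) -> dict[str, Any]:
    """Hypothetical helper: build a raster layer dict shaped like the snippet above."""
    return {
        "type": "raster",
        # One entry per band, each carrying the shared dtype and nodata value.
        "bands": [{"name": name, "dtype": dtype, "nodata": nodata} for name in band_names],
        "data_source": {
            "name": "rslearn.data_sources.zarr.ZarrDataSource",
            # Store location, CRS, georeferencing, etc. come from the caller.
            **source_options,
            # Listed last so they override any conflicting keys in source_options.
            "bands": list(band_names),
            "dtype": dtype,
            "nodata": nodata,
        },
        "ingest": ingest,
    }


if __name__ == "__main__":
    layer = build_zarr_layer(
        ["B02", "B03", "B04"],
        "float32",
        0.0,
        {
            "store_uri": "s3://bucket/path/to/datacube.zarr",
            "data_variable": "reflectance",
            "crs": "EPSG:32633",
            "pixel_size": 10,
            "origin": [500000.0, 4200000.0],
            "axis_names": {"x": "x", "y": "y", "time": "time", "band": "band"},
            "chunk_shape": {"y": 1024, "x": 1024},
            "storage_options": {"anon": True},
        },
    )
    # Print the layer under the same "sentinel2" key used in the snippet above.
    print(json.dumps({"sentinel2": layer}, indent=2))
```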

When `ingest` is left at the default `true`, run `rslearn dataset ingest` to cache each
chunk into your tile store. If you flip `ingest` to `false`, `rslearn dataset materialize`
will read the necessary portions directly from the Zarr store instead.

3 changes: 3 additions & 0 deletions pyproject.toml
@@ -47,12 +47,15 @@ extra = [
"planetary_computer>=1.0",
"pycocotools>=2.0",
"pystac_client>=0.9",
"rioxarray>=0.15",
"rtree>=1.4",
"s3fs==2025.3.0",
"satlaspretrain_models>=0.3",
"scipy>=1.16",
"terratorch>=1.0.2",
"transformers>=4.55",
"xarray>=2024.1",
"zarr>=2.17",
"wandb>=0.21",
]

3 changes: 3 additions & 0 deletions rslearn/data_sources/__init__.py
@@ -19,6 +19,7 @@
from rslearn.log_utils import get_logger

from .data_source import DataSource, Item, ItemLookupDataSource, RetrieveItemDataSource
from .zarr import ZarrDataSource, ZarrItem

logger = get_logger(__name__)

@@ -47,5 +48,7 @@ def data_source_from_config(config: LayerConfig, ds_path: UPath) -> DataSource:
"Item",
"ItemLookupDataSource",
"RetrieveItemDataSource",
"ZarrDataSource",
"ZarrItem",
"data_source_from_config",
)