8 changes: 6 additions & 2 deletions README.md
@@ -26,12 +26,16 @@ Quick links:
Setup
-----

-rslearn requires Python 3.10+ (Python 3.12 is recommended).
+rslearn requires Python 3.11+ (Python 3.12 is recommended).
Review comment (author): Since `requires-python = ">=3.11"`


```
git clone https://github.com/allenai/rslearn.git
cd rslearn
-pip install .[extra]
+uv venv --python 3.11
+source .venv/bin/activate
+uv sync
+uv pip install -e ".[extra]"
+uv pip install -e ".[dev]" # If running tests
```


45 changes: 45 additions & 0 deletions docs/DatasetConfig.md
@@ -1163,6 +1163,51 @@ Available bands:
- B10
- B11

### rslearn.data_sources.zarr.ZarrDataSource

This data source reads spatio-temporal cubes that are stored in a Zarr hierarchy. It can
either ingest items into the dataset tile store or act as the tile store itself when
`ingest` is set to false. Access to the underlying cube requires the optional
dependencies installed via `pip install rslearn[extra]`.

```jsonc
{
  // Required URI pointing to the root of the Zarr store. Any fsspec-compatible URI is
  // supported.
  "store_uri": "s3://bucket/path/to/datacube.zarr",
  // Optional variable name inside the store. If omitted, the store must contain a
  // single data variable.
  "data_variable": "reflectance",
  // Required CRS of the cube, expressed as an EPSG code or WKT string.
  "crs": "EPSG:32633",
  // Required pixel size. Provide either a scalar (identical resolutions) or an object
  // with explicit x and y values.
  "pixel_size": 10,
  // Required origin of pixel (0, 0) expressed as [min_x, max_y] in CRS units.
  "origin": [500000.0, 4200000.0],
  // Required mapping from conceptual axes to dimension names in the Zarr array.
  "axis_names": {"x": "x", "y": "y", "time": "time", "band": "band"},
  // Required list of bands. The length must match the band dimension when present.
  "bands": ["B02", "B03", "B04"],
  // Required numpy dtype string that matches the underlying Zarr array.
  "dtype": "float32",
  // Optional nodata value applied when writing tiles and returned during direct reads.
  "nodata": 0.0,
  // Optional override for how the cube is broken into items. Each value is the number
  // of pixels per chunk along that axis.
  "chunk_shape": {"y": 1024, "x": 1024},
  // Optional fsspec storage options passed to xarray.open_zarr.
  "storage_options": {"anon": true},
  // Optional flag toggling consolidated metadata support. Defaults to true.
  "consolidated": true
}
```
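
To make the `origin` and `pixel_size` fields concrete, here is a minimal sketch of how a pixel index might map to CRS coordinates. It assumes a north-up grid in which y decreases as the row index grows, which is consistent with the origin being given as `[min_x, max_y]`. The helper `pixel_to_crs` and the `"x"`/`"y"` keys of the object form of `pixel_size` are assumptions made for illustration; they are not taken from rslearn.

```python
def pixel_to_crs(col, row, origin, pixel_size):
    """Return the CRS coordinates of the top-left corner of pixel (col, row).

    Assumes a north-up grid: x grows with col, y shrinks with row.
    `pixel_size` may be a scalar or a dict with "x" and "y" keys (assumed names).
    """
    if isinstance(pixel_size, dict):
        size_x, size_y = pixel_size["x"], pixel_size["y"]
    else:
        size_x = size_y = pixel_size
    min_x, max_y = origin
    # abs() lets callers pass either a positive or a negative y resolution.
    return (min_x + col * size_x, max_y - row * abs(size_y))


# Values taken from the configuration above.
print(pixel_to_crs(0, 0, [500000.0, 4200000.0], 10))        # (500000.0, 4200000.0)
print(pixel_to_crs(1024, 1024, [500000.0, 4200000.0], 10))  # (510240.0, 4189760.0)
```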

The Zarr data source currently creates one item per time step. When you skip ingestion
(`"ingest": false` on the layer), the source acts as a read-only tile store so windows
can be materialized directly from the Zarr cube.
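
The `chunk_shape` option describes how the cube is split spatially. As an illustration only, the sketch below enumerates the pixel windows that a given chunk shape would produce for a cube of known height and width. `iter_chunk_windows` is a hypothetical helper, and how rslearn combines such windows with time steps when it creates items is not specified here.

```python
from typing import Iterator


def iter_chunk_windows(
    height: int, width: int, chunk_y: int, chunk_x: int
) -> Iterator[tuple[int, int, int, int]]:
    """Yield (row_start, row_stop, col_start, col_stop) for each chunk.

    Edge chunks are clipped so they never extend past the array bounds.
    """
    for row_start in range(0, height, chunk_y):
        for col_start in range(0, width, chunk_x):
            yield (
                row_start,
                min(row_start + chunk_y, height),
                col_start,
                min(col_start + chunk_x, width),
            )


# A hypothetical 2048 x 3000 pixel cube split with chunk_shape {"y": 1024, "x": 1024}.
windows = list(iter_chunk_windows(2048, 3000, 1024, 1024))
print(len(windows))  # 6 (2 chunk rows x 3 chunk columns)
print(windows[-1])   # (1024, 2048, 2048, 3000), clipped at the right edge
```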

### rslearn.data_sources.worldcover.WorldCover

This data source is for the ESA WorldCover 2021 land cover map.
54 changes: 54 additions & 0 deletions docs/examples/ZarrDataSource.md
@@ -0,0 +1,54 @@
# Zarr Data Source Example

The snippet below demonstrates how to reference a spatio-temporal Zarr cube from a
raster layer. Install the optional dependencies before running the dataset workflow:

```bash
uv pip install -e ".[extra]"
```

Add a layer similar to the following in your dataset's `config.json`:

```jsonc
"sentinel2": {
  "type": "raster",
  "bands": [
    {
      "name": "B02",
      "dtype": "float32",
      "nodata": 0.0
    },
    {
      "name": "B03",
      "dtype": "float32",
      "nodata": 0.0
    },
    {
      "name": "B04",
      "dtype": "float32",
      "nodata": 0.0
    }
  ],
  "data_source": {
    "name": "rslearn.data_sources.zarr.ZarrDataSource",
    "store_uri": "s3://bucket/path/to/datacube.zarr",
    "data_variable": "reflectance",
    "crs": "EPSG:32633",
    "pixel_size": 10,
    "origin": [500000.0, 4200000.0],
    "axis_names": {"x": "x", "y": "y", "time": "time", "band": "band"},
    "bands": ["B02", "B03", "B04"],
    "dtype": "float32",
    "nodata": 0.0,
    "chunk_shape": {"y": 1024, "x": 1024},
    "storage_options": {"anon": true}
  },
  // Set to false to stream directly from the cube instead of ingesting.
  "ingest": true
}
```

When `ingest` is left at the default `true`, run `rslearn dataset ingest` to cache each
chunk into your tile store. If you flip `ingest` to `false`, `rslearn dataset
materialize` will read the necessary portions directly from the Zarr store instead.
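
Band names, dtype and nodata appear both on the layer and inside `data_source`, so a quick consistency check can catch mismatches before ingestion. The sketch below is illustrative: `check_layer_config` is a hypothetical helper rather than an rslearn API, it assumes the layer dictionary has the shape shown above, and whether rslearn itself requires these values to agree is not stated here.

```python
def check_layer_config(layer: dict) -> list[str]:
    """Return a list of human-readable mismatches (empty if none are found)."""
    problems = []
    source = layer["data_source"]
    layer_bands = [band["name"] for band in layer["bands"]]
    if layer_bands != source["bands"]:
        problems.append(f"band names differ: {layer_bands} vs {source['bands']}")
    for band in layer["bands"]:
        if band["dtype"] != source["dtype"]:
            problems.append(
                f"{band['name']}: dtype {band['dtype']} != {source['dtype']}"
            )
        if band.get("nodata") != source.get("nodata"):
            problems.append(f"{band['name']}: nodata differs from data source")
    return problems


# A trimmed-down copy of the layer above, written as a Python dict.
layer = {
    "type": "raster",
    "bands": [
        {"name": name, "dtype": "float32", "nodata": 0.0}
        for name in ("B02", "B03", "B04")
    ],
    "data_source": {
        "name": "rslearn.data_sources.zarr.ZarrDataSource",
        "bands": ["B02", "B03", "B04"],
        "dtype": "float32",
        "nodata": 0.0,
    },
    "ingest": True,
}
print(check_layer_config(layer))  # []
```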

3 changes: 3 additions & 0 deletions pyproject.toml
@@ -46,12 +46,15 @@ extra = [
    "planetary_computer>=1.0",
    "pycocotools>=2.0",
    "pystac_client>=0.9",
    "rioxarray>=0.15",
    "rtree>=1.4",
    "s3fs==2025.3.0",
    "satlaspretrain_models>=0.3",
    "scipy>=1.16",
    "terratorch>=1.0.2",
    "transformers>=4.55",
    "xarray>=2024.1",
    "zarr>=2.17",
    "wandb>=0.21",
]
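
Since the Zarr data source depends on packages that are only installed with the `extra` group, it can be useful to confirm they are importable before running a dataset workflow. The following is a small sketch using only the standard library; the import names are assumed to match the three package names added in this hunk, and `missing_modules` is a hypothetical helper.

```python
from importlib.util import find_spec

# Import names assumed to correspond to the packages added in this change.
ZARR_EXTRAS = ("rioxarray", "xarray", "zarr")


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(ZARR_EXTRAS)
if missing:
    print("Missing optional dependencies:", ", ".join(missing))
else:
    print("All Zarr-related optional dependencies are importable.")
```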

3 changes: 3 additions & 0 deletions rslearn/data_sources/__init__.py
@@ -19,6 +19,7 @@
from rslearn.log_utils import get_logger

from .data_source import DataSource, Item, ItemLookupDataSource, RetrieveItemDataSource
from .zarr import ZarrDataSource, ZarrItem

logger = get_logger(__name__)

@@ -47,5 +48,7 @@ def data_source_from_config(config: LayerConfig, ds_path: UPath) -> DataSource:
    "Item",
    "ItemLookupDataSource",
    "RetrieveItemDataSource",
    "ZarrDataSource",
    "ZarrItem",
    "data_source_from_config",
)