Description
A typical GPS workflow might start like this:

```python
paths = ["s3://bucket/one.tiff"]
raster_layer = gps.RasterLayer.from_numpy_rdd(gps.LayerType.SPATIAL, gps.rasterio.get(paths))
tiled_layer = raster_layer.tile_to_layout(gps.GlobalLayout(256), target_crs=3857)
tiled_layer.count()
```
This is convenient for various reasons, but it carries the following costs:

- Conversion from `numpy` to GeoTrellis Tiles on read (low cost)
- Iterate over the full dataset to figure out the data resolution and extent
  - Seen as the `RasterSummary` spark job
- Iterate over the full dataset again to read it
- Spark shuffle to fit input tiles to some layout
- Spark shuffle to generate `BufferedTiles` for the reproject operation
  - The reproject operation is implied by the inclusion of `target_crs`
- Spark shuffle to compute the result of the reproject operation on `BufferedTiles`
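To make the layout-shuffle cost concrete, here is a minimal pure-Python sketch (no Spark; the grid dimensions and world bounds are illustrative, not the actual `GlobalLayout` math) of the keying step that forces a shuffle: each source tile's extent maps to the layout grid cells it covers, and tiles sharing a key must be co-located.

```python
from math import floor, ceil

# Illustrative world bounds: xmin, ymin, xmax, ymax
WORLD = (-180.0, -90.0, 180.0, 90.0)

def layout_keys(extent, cols=4, rows=2, world=WORLD):
    """Return the (col, row) layout-grid keys that `extent` intersects."""
    xmin, ymin, xmax, ymax = world
    tile_w = (xmax - xmin) / cols
    tile_h = (ymax - ymin) / rows
    exmin, eymin, exmax, eymax = extent
    c0 = max(0, floor((exmin - xmin) / tile_w))
    c1 = min(cols - 1, ceil((exmax - xmin) / tile_w) - 1)
    r0 = max(0, floor((eymin - ymin) / tile_h))
    r1 = min(rows - 1, ceil((eymax - ymin) / tile_h) - 1)
    return [(c, r) for c in range(c0, c1 + 1) for r in range(r0, r1 + 1)]

# A tile straddling a grid-column boundary lands under two shuffle keys.
print(layout_keys((-10.0, 0.0, 10.0, 40.0)))  # [(1, 1), (2, 1)]
```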
An alternative workflow that avoids the high cost of Spark shuffles on pixel tiles is prototyped here: https://github.com/geotrellis/geotrellis-contrib/blob/demo/wsel/wse/src/main/scala/Main.scala

This could be encapsulated in a GeoPySpark operation that combines the read and `tile_to_layout` (with optional reprojection) steps to produce a `TiledRasterLayer`.
Ideally the API can be similar to `tile_to_layout`, perhaps:

```python
def read_to_layout(paths,
                   layout=LocalLayout(),
                   target_crs=None,
                   resample_method=ResampleMethod.NEAREST_NEIGHBOR,
                   partition_strategy=None,
                   reader=ReadMethod.GEOTRELLIS):
```
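A pure-Python sketch of what such a combined operation might do internally (all names hypothetical; Spark is stand-in-free here): each input window is read, reprojected if needed, and keyed directly against the target layout in a single map step, so the only remaining shuffle is the final group-by-key.

```python
from collections import defaultdict

def read_to_layout_sketch(windows, key_fn, read_fn, merge_fn):
    """Hypothetical single-pass read + tile-to-layout, no intermediate shuffles."""
    grouped = defaultdict(list)
    for window in windows:            # one pass over the inputs
        tile = read_fn(window)        # read (and reproject) at the source
        for key in key_fn(window):    # key directly against the target layout
            grouped[key].append(tile)
    # In Spark this group-by-key would be the single remaining shuffle.
    return {k: merge_fn(tiles) for k, tiles in grouped.items()}

# Toy usage: windows are (key, value) pairs; merge sums overlapping values.
result = read_to_layout_sketch(
    windows=[("a", 1), ("a", 2), ("b", 5)],
    key_fn=lambda w: [w[0]],
    read_fn=lambda w: w[1],
    merge_fn=sum,
)
print(result)  # {'a': 3, 'b': 5}
```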
There is an outstanding question of what should happen to merge order. I think that currently any overlapping rasters get merged in arbitrary order as a result of the `tile_to_layout` operation, and that would be an easy default to start from here.
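A toy sketch (not GeoPySpark API; tiles modeled as dicts of pixels) of why arbitrary merge order matters: with a last-write-wins merge, the value of an overlapping pixel depends entirely on which source arrives last, so an order-independent default would need an explicit ordering such as sorting by input path.

```python
def merge_last_wins(tiles):
    """Merge dict-of-pixel tiles; later tiles overwrite earlier ones."""
    out = {}
    for tile in tiles:
        out.update(tile)
    return out

a = {"px0": "from_a"}
b = {"px0": "from_b"}

# Arbitrary order: the result flips depending on which source arrives last.
print(merge_last_wins([a, b]))  # {'px0': 'from_b'}
print(merge_last_wins([b, a]))  # {'px0': 'from_a'}
```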