@gabrieltseng (Contributor) commented Sep 9, 2025

Example single yaml file for RSLP, which can be run with the following command:

python -m rslp.rslearn_main model fit --config this_config.yaml

In the decoder definition, in_channels is [{patch_size}, {encoder_embedding_size}] (so [[4, 128]] below, for patch_size 4 and the NANO encoder's 128-dimensional embedding).

model:
  class_path: rslearn.train.lightning_module.RslearnLightningModule
  init_args:
    model:
      class_path: rslearn.models.multitask.MultiTaskModel
      init_args:
        encoder:
          - class_path: rslearn.models.galileo.GalileoModel
            init_args:
              size: NANO
              patch_size: 4
        decoders:
          segment:
            - class_path: rslearn.models.unet.UNetDecoder
              init_args:
                in_channels: [[4, 128]]
                out_channels: 20
                conv_layers_per_resolution: 2
                num_channels: {8: 128, 4: 128, 2: 128, 1: 128}
            - class_path: rslearn.train.tasks.segmentation.SegmentationHead
    lr: 0.0001
    scheduler:
      class_path: rslearn.train.scheduler.PlateauScheduler
      init_args:
        factor: 0.2
        patience: 2
        min_lr: 0
        cooldown: 20
data:
  class_path: rslearn.train.data_module.RslearnDataModule
  init_args:
    path: /weka/dfive-default/rslearn-eai/datasets/pastis/rslearn_dataset/
    inputs:
      sentinel2_0:
        data_type: "raster"
        layers: ["sentinel2"]
        bands: ["B02", "B03", "B04", "B05", "B06", "B07", "B08", "B8A", "B11", "B12"]
        passthrough: true
      sentinel2_1:
        data_type: "raster"
        layers: ["sentinel2.1"]
        bands: ["B02", "B03", "B04", "B05", "B06", "B07", "B08", "B8A", "B11", "B12"]
        passthrough: true
      sentinel2_2:
        data_type: "raster"
        layers: ["sentinel2.2"]
        bands: ["B02", "B03", "B04", "B05", "B06", "B07", "B08", "B8A", "B11", "B12"]
        passthrough: true
      sentinel2_3:
        data_type: "raster"
        layers: ["sentinel2.3"]
        bands: ["B02", "B03", "B04", "B05", "B06", "B07", "B08", "B8A", "B11", "B12"]
        passthrough: true
      sentinel2_4:
        data_type: "raster"
        layers: ["sentinel2.4"]
        bands: ["B02", "B03", "B04", "B05", "B06", "B07", "B08", "B8A", "B11", "B12"]
        passthrough: true
      sentinel2_5:
        data_type: "raster"
        layers: ["sentinel2.5"]
        bands: ["B02", "B03", "B04", "B05", "B06", "B07", "B08", "B8A", "B11", "B12"]
        passthrough: true
      sentinel2_6:
        data_type: "raster"
        layers: ["sentinel2.6"]
        bands: ["B02", "B03", "B04", "B05", "B06", "B07", "B08", "B8A", "B11", "B12"]
        passthrough: true
      sentinel2_7:
        data_type: "raster"
        layers: ["sentinel2.7"]
        bands: ["B02", "B03", "B04", "B05", "B06", "B07", "B08", "B8A", "B11", "B12"]
        passthrough: true
      sentinel2_8:
        data_type: "raster"
        layers: ["sentinel2.8"]
        bands: ["B02", "B03", "B04", "B05", "B06", "B07", "B08", "B8A", "B11", "B12"]
        passthrough: true
      sentinel2_9:
        data_type: "raster"
        layers: ["sentinel2.9"]
        bands: ["B02", "B03", "B04", "B05", "B06", "B07", "B08", "B8A", "B11", "B12"]
        passthrough: true
      sentinel2_10:
        data_type: "raster"
        layers: ["sentinel2.10"]
        bands: ["B02", "B03", "B04", "B05", "B06", "B07", "B08", "B8A", "B11", "B12"]
        passthrough: true
      sentinel2_11:
        data_type: "raster"
        layers: ["sentinel2.11"]
        bands: ["B02", "B03", "B04", "B05", "B06", "B07", "B08", "B8A", "B11", "B12"]
        passthrough: true
      targets:
        data_type: "raster"
        layers: ["label"]
        bands: ["class"]
        is_target: true
    default_config:
      transforms:
        - class_path: rslearn.train.transforms.concatenate.Concatenate
          init_args:
            selections:
              sentinel2_0: []
              sentinel2_1: []
              sentinel2_2: []
              sentinel2_3: []
              sentinel2_4: []
              sentinel2_5: []
              sentinel2_6: []
              sentinel2_7: []
              sentinel2_8: []
              sentinel2_9: []
              sentinel2_10: []
              sentinel2_11: []
            output_selector: s2
    task:
      class_path: rslearn.train.tasks.multi_task.MultiTask
      init_args:
        tasks:
          segment:
            class_path: rslearn.train.tasks.segmentation.SegmentationTask
            init_args:
              num_classes: 20
              remap_values: [[0, 1], [0, 255]]
              zero_is_invalid: true
              metric_kwargs:
                average: "micro"
              enable_miou_metric: true
        input_mapping:
          segment:
            targets: "targets"
    train_config:
      groups: ["fold1", "fold2", "fold3"]
      transforms:
        - class_path: rslearn.train.transforms.concatenate.Concatenate
          init_args:
            selections:
              sentinel2_0: []
              sentinel2_1: []
              sentinel2_2: []
              sentinel2_3: []
              sentinel2_4: []
              sentinel2_5: []
              sentinel2_6: []
              sentinel2_7: []
              sentinel2_8: []
              sentinel2_9: []
              sentinel2_10: []
              sentinel2_11: []
            output_selector: s2
        - class_path: rslearn.train.transforms.flip.Flip
          init_args:
            image_selectors: ["s2", "target/segment/classes", "target/segment/valid"]
    batch_size: 8
    num_workers: 16
    val_config:
      groups: ["fold4"]
    test_config:
      groups: ["fold5"]
trainer:
  max_epochs: 500
  callbacks:
    - class_path: lightning.pytorch.callbacks.LearningRateMonitor
      init_args:
        logging_interval: "epoch"
    - class_path: lightning.pytorch.callbacks.ModelCheckpoint
      init_args:
        save_top_k: 1
        save_last: true
        monitor: val_segment/accuracy
        mode: max
rslp_project: placeholder
rslp_experiment: placeholder

@APatrickJ self-requested a review September 11, 2025 16:28
@robmarkcole (Contributor) commented:

Also requires changes from nasaharvest/galileo#17

@gabrieltseng (Contributor, Author) commented Sep 15, 2025

> Also requires changes from nasaharvest/galileo#17

Thanks @robmarkcole ! Updated in 8c00f16

@robmarkcole (Contributor) commented Sep 22, 2025

@gabrieltseng could you also add an example train config compatible with the dataset from https://github.com/allenai/rslearn_projects/blob/master/data/lfmc/config.json? Normalisation being included would be useful too. Thanks

@robmarkcole (Contributor) commented Sep 25, 2025

I've used ERA5 monthly as a feature, and this results in:

einops.EinopsError:  Error while processing rearrange-reduction pattern "b (t c) -> b t c".
 Input tensor shape: torch.Size([8, 2, 32, 32]). Additional info: {'t': 1}.
 Wrong shape: expected 2 dims. Received 4-dim tensor.
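
For context, here is a minimal standalone reproduction of the shape mismatch (only torch and einops; the tensor shape matches the one in the error above):

import torch
from einops import rearrange

x = torch.zeros(8, 2, 32, 32)  # the (batch, channels, height, width) ERA5 raster from the error above

# The rearrange pattern in the error expects a flat 2-dim (batch, timesteps * channels)
# tensor, so the 4-dim raster raises the same EinopsError.
rearrange(x, "b (t c) -> b t c", t=1)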

This is because it is created as a 2D raster:

        "era5": {
            "type": "raster",
            "band_sets": [
                {
                    "dtype": "float32",
                    "bands": [
                        "2m-temperature",
                        "total-precipitation"
                    ]
                }
            ],
            "data_source": {
                "name": "rslearn.data_sources.climate_data_store.ERA5LandMonthlyMeans",
                "api_key": "xxx"
            }
        }

Perhaps it will be necessary to support another type (in addition to raster and vector), as I suggested in #275, or just handle the aggregation in the code?

I've also noted that if there are NaN values in the ERA5 data, they feed through and the loss becomes NaN; the aggregation needs to be NaN-aware. Not sure if this is already supported somehow via config.
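
As an illustration of what a NaN-aware spatial aggregation could look like (a minimal sketch using torch.nanmean, not the rslearn implementation; the layout matches the (batch, channels, height, width) raster above):

import torch

# (batch, channels, height, width) ERA5 raster with some NaN pixels, as described above.
era5 = torch.full((8, 2, 32, 32), float("nan"))
era5[:, :, :16, :] = 1.0

# Aggregate spatially while ignoring NaNs, giving a (batch, channels) tensor that
# matches the flat "b (t c)" layout with t=1 expected in the error above.
era5_agg = torch.nanmean(era5, dim=(2, 3))
print(era5_agg.shape)  # torch.Size([8, 2])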

@robmarkcole (Contributor) commented Sep 26, 2025

Pulled in the latest changes, get

  File "/Users/robin.cole/gitlab/rslearn-galileo-soil-moisture-fine-tune/rslearn/rslearn/models/galileo/galileo.py", line 446, in forward
    galileo_input = self.construct_galileo_input(**stacked_inputs, normalize=True)

  File "/Users/robin.cole/gitlab/rslearn-galileo-soil-moisture-fine-tune/rslearn/rslearn/models/galileo/galileo.py", line 263, in construct_galileo_input
    raise ValueError("Inconsistent heights per input")
ValueError: Inconsistent heights per input

@gabrieltseng (Contributor, Author) commented:

> Pulled in the latest changes, get

@robmarkcole do you have example data against which I can reproduce this?

@robmarkcole (Contributor) commented Sep 26, 2025

@gabrieltseng you could create from:

{
    "layers": {
        "label": {
            "type": "vector"
        },
        "output": {
            "type": "vector"
        },
        "sentinel1": {
            "band_sets": [
                {
                    "bands": [
                        "vv",
                        "vh"
                    ],
                    "dtype": "float32"
                }
            ],
            "data_source": {
                "cache_dir": "cache/planetary_computer",
                "ingest": false,
                "name": "rslearn.data_sources.planetary_computer.Sentinel1",
                "query": {
                    "sar:instrument_mode": {
                        "eq": "IW"
                    },
                    "sar:polarizations": {
                        "eq": [
                            "VV",
                            "VH"
                        ]
                    }
                },
                "query_config": {
                    "max_matches": 1,
                    "min_matches": 1,
                    "space_mode": "INTERSECTS"
                },
                "time_offset": "0d"
            },
            "type": "raster"
        },
        "sentinel2": {
            "band_sets": [
                {
                    "bands": [
                        "B02",
                        "B03",
                        "B04",
                        "B08"
                    ],
                    "dtype": "uint16"
                },
                {
                    "bands": [
                        "B05",
                        "B06",
                        "B07",
                        "B8A",
                        "B11",
                        "B12"
                    ],
                    "dtype": "uint16",
                    "zoom_offset": -1
                },
                {
                    "bands": [
                        "B01",
                        "B09"
                    ],
                    "dtype": "uint16",
                    "zoom_offset": -2
                }
            ],
            "data_source": {
                "cache_dir": "cache/planetary_computer",
                "duration": "366d",
                "harmonize": true,
                "ingest": false,
                "max_cloud_cover": 50,
                "name": "rslearn.data_sources.planetary_computer.Sentinel2",
                "query_config": {
                    "max_matches": 1,
                    "period_duration": "120d",
                    "space_mode": "PER_PERIOD_MOSAIC"
                },
                "sort_by": "eo:cloud_cover",
                "time_offset": "-120d"
            },
            "type": "raster"
        },
        "srtm": {
            "band_sets": [
                {
                    "bands": [
                        "srtm"
                    ],
                    "dtype": "int32"
                }
            ],
            "data_source": {
                "name": "rslearn.data_sources.earthdata_srtm.SRTM"
            },
            "resampling_method": "nearest",
            "type": "raster"
        },
        "era5": {
            "type": "raster",
            "band_sets": [
                {
                    "dtype": "float32",
                    "bands": [
                        "2m-temperature",
                        "total-precipitation"
                    ]
                }
            ],
            "data_source": {
                "name": "rslearn.data_sources.climate_data_store.ERA5LandMonthlyMeans",
                "api_key": "xxx"
            }
        }
    },
    "tile_store": {
        "name": "file",
        "root_dir": "tiles"
    }
}

@robmarkcole (Contributor) commented Sep 29, 2025

@gabrieltseng construct_galileo_input mislabels the spatial dimensions when validating space-only tensors (SRTM/DW/WC): it takes x.shape[0] and x.shape[1] as height/width, but those indices actually correspond to batch and height for tensors shaped (batch, height, width, channels). When a space-only tensor is passed alongside a space-time tensor (where height lives at index 1), the combined height_list contains [correct_height, batch_size], so the consistency check raises “Inconsistent heights per input” even though both tensors share the same grid.

import pytest
import torch

from rslearn.models.galileo.galileo import GalileoModel, S1_BANDS, SRTM_BANDS


def test_construct_galileo_input_raises_for_inconsistent_height_with_space_inputs():
    batch = 2
    height = 8
    width = 8
    timesteps = 3
    s1_channels = len(S1_BANDS)
    srtm_channels = len(SRTM_BANDS)

    # Space-time input: (batch, height, width, timesteps, channels).
    s1 = torch.zeros((batch, height, width, timesteps, s1_channels))
    # Space-only input: (batch, height, width, channels).
    srtm = torch.zeros((batch, height, width, srtm_channels))

    with pytest.raises(ValueError, match="Inconsistent heights per input"):
        GalileoModel.construct_galileo_input(s1=s1, srtm=srtm)

Resolution: update the shape checks so space-only tensors contribute their spatial axes instead of their batch axis.
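
For reference, a minimal sketch of the kind of check this implies (illustrative names only, not the actual rslearn code; it assumes space-time inputs are (batch, height, width, timesteps, channels) and space-only inputs are (batch, height, width, channels)):

import torch

def check_spatial_dims(space_time: dict[str, torch.Tensor], space_only: dict[str, torch.Tensor]) -> None:
    """Collect height/width from the spatial axes of every input and require them to match."""
    heights: list[int] = []
    widths: list[int] = []
    for x in space_time.values():
        # (batch, height, width, timesteps, channels): spatial axes are 1 and 2.
        heights.append(x.shape[1])
        widths.append(x.shape[2])
    for x in space_only.values():
        # (batch, height, width, channels): spatial axes are also 1 and 2,
        # so the batch axis (index 0) must not be used here.
        heights.append(x.shape[1])
        widths.append(x.shape[2])
    if len(set(heights)) > 1:
        raise ValueError("Inconsistent heights per input")
    if len(set(widths)) > 1:
        raise ValueError("Inconsistent widths per input")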

@gabrieltseng (Contributor, Author) commented:

Thanks for digging into this @robmarkcole

It should be fixed in the latest commit.

@robmarkcole (Contributor) commented Sep 29, 2025

@gabrieltseng can confirm the error is cleared!

I have some samples where the ERA5 data contains NaNs; should these be discarded or somehow imputed? This causes the loss to become NaN.

Note: a couple of the ERA5 sites are apparently over water, resulting in all-NaN values; I've dropped these now.

@robmarkcole (Contributor) commented Sep 30, 2025

@gabrieltseng In the following, I get RuntimeError: mat1 and mat2 shapes cannot be multiplied (24x192 and 128x1) when switching from NANO to TINY:

model:
  class_path: rslearn.train.lightning_module.RslearnLightningModule
  init_args:
    model:
      class_path: rslearn.models.multitask.MultiTaskModel
      init_args:
        encoder:
          - class_path: rslearn.models.galileo.GalileoModel
            init_args:
              # size: NANO
              size: TINY
              patch_size: 32
        decoders:
          soil_moisture:
            - class_path: rslearn.models.pooling_decoder.PoolingDecoder
              init_args:
                in_channels: 128
                out_channels: 1
            - class_path: rslearn.train.tasks.regression.RegressionHead

Resolved by setting in_channels: 192 for TINY and 768 for BASE. Perhaps this needs documenting, or could be picked up from the config to prevent user error?
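
For reference, a minimal sketch of the mapping implied above (embedding sizes as reported in this thread; the helper is hypothetical, not part of rslearn):

# The decoder's in_channels must match the Galileo encoder's embedding dimension.
GALILEO_EMBEDDING_DIMS = {"NANO": 128, "TINY": 192, "BASE": 768}

def decoder_in_channels(size: str) -> int:
    """Hypothetical helper: look up the decoder in_channels for a given encoder size."""
    return GALILEO_EMBEDDING_DIMS[size]

assert decoder_in_channels("TINY") == 192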

@robmarkcole (Contributor) commented Sep 30, 2025

Another potential issue I've noted is data being discarded. I have 16094 windows tagged train, but these get additionally filtered to 14859 examples in the train split. I understand that check_window enforces that every required DataInput be fully materialized; however, Galileo should be robust to missing inputs, so why are these samples not included?

Another feature I would like to see is the ability to disable latlon being used as a feature; I am seeing poor spatial generalisation and want to test the impact of this feature.

@APatrickJ (Collaborator) left a comment:

One minor comment, otherwise LGTM!

@gabrieltseng merged commit 24b0ef0 into master Oct 3, 2025 (4 checks passed)
@gabrieltseng (Contributor, Author) commented:

@robmarkcole I merged this in so that we could run some of our experiments. I'll make some issues for the comments you brought up. To summarize:

  1. Better document the relation between the decoder shapes and the model size
  2. Potentially allow not-fully-materialized windows to be used for training and just mask them

Does this cover what's still to be done?

@gabrieltseng deleted the galileo branch October 3, 2025 17:42
@robmarkcole (Contributor) commented:

Thanks @gabrieltseng, that plus optional masking of lat/lon covers it.
