Add Galileo #272
Also requires changes from nasaharvest/galileo#17 |
Thanks @robmarkcole ! Updated in 8c00f16 |
@gabrieltseng could you also add an example train config compatible with the dataset from https://github.com/allenai/rslearn_projects/blob/master/data/lfmc/config.json? Including normalisation would be useful too. |
I've used era5 monthly as a feature and this results in
This is because it is created as a 2D raster:
"era5": {
  "type": "raster",
  "band_sets": [
    {
      "dtype": "float32",
      "bands": [
        "2m-temperature",
        "total-precipitation"
      ]
    }
  ],
  "data_source": {
    "name": "rslearn.data_sources.climate_data_store.ERA5LandMonthlyMeans",
    "api_key": "xxx"
  }
}
Perhaps it will be necessary to support another type (in addition to
I've also noted that if there are NaN values in the ERA5 data, they feed into the loss, which then becomes NaN. The aggregation needs to be NaN-aware; I'm not sure if this is already supported somehow via config. |
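A minimal sketch of what "NaN-aware" aggregation could mean here, in plain Python rather than rslearn's own code. `nan_aware_mean` is a hypothetical helper, not an rslearn API, and the pixel values are made up for illustration. For tensors, `torch.nanmean` gives the same behaviour.

```python
import math


def nan_aware_mean(values):
    """Mean of the non-NaN entries; NaN only when every entry is missing."""
    valid = [v for v in values if not math.isnan(v)]
    if not valid:
        # Nothing to average, so the result stays undefined.
        return math.nan
    return sum(valid) / len(valid)


# Hypothetical 2m-temperature pixels for one window, one of them missing.
pixels = [280.0, math.nan, 284.0]
print(nan_aware_mean(pixels))  # 282.0
```

A plain mean over the same pixels would return NaN, which is how a single missing pixel can end up in the loss.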
Pulled in the latest changes, get
|
@robmarkcole do you have example data against which I can reproduce this? |
@gabrieltseng you could create from:
{
  "layers": {
    "label": {
      "type": "vector"
    },
    "output": {
      "type": "vector"
    },
    "sentinel1": {
      "band_sets": [
        {
          "bands": [
            "vv",
            "vh"
          ],
          "dtype": "float32"
        }
      ],
      "data_source": {
        "cache_dir": "cache/planetary_computer",
        "ingest": false,
        "name": "rslearn.data_sources.planetary_computer.Sentinel1",
        "query": {
          "sar:instrument_mode": {
            "eq": "IW"
          },
          "sar:polarizations": {
            "eq": [
              "VV",
              "VH"
            ]
          }
        },
        "query_config": {
          "max_matches": 1,
          "min_matches": 1,
          "space_mode": "INTERSECTS"
        },
        "time_offset": "0d"
      },
      "type": "raster"
    },
    "sentinel2": {
      "band_sets": [
        {
          "bands": [
            "B02",
            "B03",
            "B04",
            "B08"
          ],
          "dtype": "uint16"
        },
        {
          "bands": [
            "B05",
            "B06",
            "B07",
            "B8A",
            "B11",
            "B12"
          ],
          "dtype": "uint16",
          "zoom_offset": -1
        },
        {
          "bands": [
            "B01",
            "B09"
          ],
          "dtype": "uint16",
          "zoom_offset": -2
        }
      ],
      "data_source": {
        "cache_dir": "cache/planetary_computer",
        "duration": "366d",
        "harmonize": true,
        "ingest": false,
        "max_cloud_cover": 50,
        "name": "rslearn.data_sources.planetary_computer.Sentinel2",
        "query_config": {
          "max_matches": 1,
          "period_duration": "120d",
          "space_mode": "PER_PERIOD_MOSAIC"
        },
        "sort_by": "eo:cloud_cover",
        "time_offset": "-120d"
      },
      "type": "raster"
    },
    "srtm": {
      "band_sets": [
        {
          "bands": [
            "srtm"
          ],
          "dtype": "int32"
        }
      ],
      "data_source": {
        "name": "rslearn.data_sources.earthdata_srtm.SRTM"
      },
      "resampling_method": "nearest",
      "type": "raster"
    },
    "era5": {
      "type": "raster",
      "band_sets": [
        {
          "dtype": "float32",
          "bands": [
            "2m-temperature",
            "total-precipitation"
          ]
        }
      ],
      "data_source": {
        "name": "rslearn.data_sources.climate_data_store.ERA5LandMonthlyMeans",
        "api_key": "xxx"
      }
    }
  },
  "tile_store": {
    "name": "file",
    "root_dir": "tiles"
  }
}
|
@gabrieltseng
import pytest
import torch

from rslearn.models.galileo.galileo import GalileoModel, S1_BANDS, SRTM_BANDS


def test_construct_galileo_input_raises_for_inconsistent_height_with_space_inputs():
    batch = 2
    height = 8
    width = 8
    timesteps = 3
    s1_channels = len(S1_BANDS)
    srtm_channels = len(SRTM_BANDS)
    # Space-time input: (batch, height, width, timesteps, channels).
    s1 = torch.zeros((batch, height, width, timesteps, s1_channels))
    # Space-only input: (batch, height, width, channels).
    srtm = torch.zeros((batch, height, width, srtm_channels))
    with pytest.raises(ValueError, match="Inconsistent heights per input"):
        GalileoModel.construct_galileo_input(s1=s1, srtm=srtm)

Resolution: Update the shape checks so space-only tensors contribute their spatial axes instead of their batch axis. |
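A rough sketch of the kind of shape check the resolution describes, written over plain shape tuples so it runs without rslearn. The helper names are hypothetical and this is not the actual `GalileoModel` code; it only assumes the two layouts used in the test above, (B, H, W, T, C) and (B, H, W, C).

```python
def spatial_height(shape):
    """Return H for a (B, H, W, T, C) or (B, H, W, C) shape tuple."""
    if len(shape) not in (4, 5):
        raise ValueError(f"Unsupported input rank: {len(shape)}")
    # Axis 0 is the batch in both layouts, so the height is always axis 1.
    return shape[1]


def check_consistent_height(shapes):
    """Return the shared height, or raise if the inputs disagree."""
    if not shapes:
        raise ValueError("No inputs given")
    heights = {name: spatial_height(shape) for name, shape in shapes.items()}
    if len(set(heights.values())) > 1:
        raise ValueError(f"Inconsistent heights per input: {heights}")
    return next(iter(heights.values()))


# Same batch/height/width/timesteps as the test above; channel counts are placeholders.
shapes = {"s1": (2, 8, 8, 3, 2), "srtm": (2, 8, 8, 1)}
print(check_consistent_height(shapes))  # 8
```

Reading axis 0 of the space-only tensor instead would compare the batch size (2) against the height (8), which matches the spurious error in the test.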
Thanks for digging into this @robmarkcole It should be fixed in the latest commit. |
@gabrieltseng I can confirm the error is cleared! I have some samples where the ERA5 data contains NaNs, which causes the loss to become NaN. Should these be discarded, or somehow imputed?
Note: a couple of the ERA5 sites are apparently over water, so all of their values are NaN. I've dropped all of these now. |
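One way to combine the two options raised above, as a sketch only: drop samples that are entirely NaN and fill the remaining gaps with the sample's own mean. `clean_era5_samples` is a hypothetical helper, and mean imputation is an assumption made for illustration, not something this thread settled on.

```python
import math


def clean_era5_samples(samples):
    """Drop all-NaN samples; fill remaining NaNs with the sample's own mean."""
    cleaned = []
    for values in samples:
        valid = [v for v in values if not math.isnan(v)]
        if not valid:
            # e.g. a site over water: there is nothing to impute from.
            continue
        fill = sum(valid) / len(valid)
        cleaned.append([fill if math.isnan(v) else v for v in values])
    return cleaned


# Hypothetical samples: one with a gap, one that is entirely missing.
samples = [[1.0, math.nan, 3.0], [math.nan, math.nan]]
print(clean_era5_samples(samples))  # [[1.0, 2.0, 3.0]]
```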
@gabrieltseng In the following I get an error
model:
  class_path: rslearn.train.lightning_module.RslearnLightningModule
  init_args:
    model:
      class_path: rslearn.models.multitask.MultiTaskModel
      init_args:
        encoder:
          - class_path: rslearn.models.galileo.GalileoModel
            init_args:
              # size: NANO
              size: TINY
              patch_size: 32
        decoders:
          soil_moisture:
            - class_path: rslearn.models.pooling_decoder.PoolingDecoder
              init_args:
                in_channels: 128
                out_channels: 1
            - class_path: rslearn.train.tasks.regression.RegressionHead
Resolved by setting |
Another potential issue I've noted is data being discarded. I have 16094 windows tagged
Another feature I would like to see is the ability to disable lat/lon being used as a feature. I am seeing poor spatial generalisation and want to test the impact of this feature. |
One minor comment, otherwise LGTM!
@robmarkcole I merged this in so that we could run some of our experiments. I'll make some issues for the comments you brought up. To summarize:
Does this cover what's still to be done? |
Thanks @gabrieltseng that plus optional masking of lat/lon |
Example single yaml file for RSLP, which can be run with the following command:
python -m rslp.rslearn_main model fit --config this_config.yaml
In the decoder definition, in_channels is [{patch_size}, {encoder_embedding_size}]