Commits (47)
8cb4415  add mozambique runs (yawenzzzz, Sep 18, 2025)
fbb9267  change to descending (yawenzzzz, Sep 18, 2025)
de56e2e  linting (gabrieltseng, Sep 25, 2025)
78fd6fb  tmp (gabrieltseng, Sep 25, 2025)
2920e99  add mozambique es run configs (gabrieltseng, Sep 30, 2025)
b9ee3b1  class to crop_type_classification (gabrieltseng, Sep 30, 2025)
dd939f4  Merge branch 'master' into yawenz/20250917_mozambique (gabrieltseng, Oct 1, 2025)
6bc59d1  new beaker image means this is no longer necessary (gabrieltseng, Oct 1, 2025)
2090fd8  create label raster for mozambique (gabrieltseng, Oct 1, 2025)
5fa632b  use segmentation for finetuning (gabrieltseng, Oct 1, 2025)
86532c1  actually its not in the newest config yet (gabrieltseng, Oct 1, 2025)
4710c0c  Add more metrics (gabrieltseng, Oct 1, 2025)
ed24d07  :facepalm: (gabrieltseng, Oct 1, 2025)
4d56262  lets ignore this for now (gabrieltseng, Oct 1, 2025)
4228443  Update label in yaml (gabrieltseng, Oct 1, 2025)
4f55680  incremental fixes (gabrieltseng, Oct 1, 2025)
6849d40  tmp (gabrieltseng, Oct 1, 2025)
e1cfa71  Have to crop the labels too (gabrieltseng, Oct 1, 2025)
e2b3829  its just label (gabrieltseng, Oct 1, 2025)
474578e  cargo culting (gabrieltseng, Oct 1, 2025)
7e147fe  update es run model config to reflect new changes (gabrieltseng, Oct 1, 2025)
96d566c  fix typo (gabrieltseng, Oct 1, 2025)
ef8a1b6  I don't think these keys are necessary in the es run config? (gabrieltseng, Oct 1, 2025)
3527240  Add comment about necessary upload (gabrieltseng, Oct 1, 2025)
d4e74d7  We need to encode zeros as invalid (gabrieltseng, Oct 2, 2025)
f648484  We are missing a comma (gabrieltseng, Oct 2, 2025)
fa48b31  zero is invalid with the new rasters (gabrieltseng, Oct 2, 2025)
9954dde  we have +1 classes (gabrieltseng, Oct 2, 2025)
a5db79b  oops (gabrieltseng, Oct 2, 2025)
ebc7a05  update es run yamls for mozambique (gabrieltseng, Oct 2, 2025)
939776b  BIGGER (gabrieltseng, Oct 2, 2025)
f726b84  the new images need an extra / (gabrieltseng, Oct 2, 2025)
10e92b3  SMALLER (gabrieltseng, Oct 3, 2025)
8cc6cc8  smaller still (gabrieltseng, Oct 6, 2025)
41fac26  Merge branch 'master' into yawenz/20250917_mozambique (gabrieltseng, Oct 8, 2025)
b7ccffb  Finetune s2 only (gabrieltseng, Oct 8, 2025)
ea7c1be  Reduce grid size (gabrieltseng, Oct 8, 2025)
e8a68b1  Remove sentinel1 from yaml (gabrieltseng, Oct 8, 2025)
ae09f9e  the beaker image has been updated (gabrieltseng, Oct 8, 2025)
0692d8c  Update with main (gabrieltseng, Oct 23, 2025)
a693174  move out of crop folder (gabrieltseng, Oct 23, 2025)
55300d5  update path to segmentation pooling decoder (gabrieltseng, Oct 23, 2025)
ee68b73  helios -> olmoearth (gabrieltseng, Oct 23, 2025)
1450205  Add notes, add Gaza geometry (gabrieltseng, Oct 23, 2025)
f0bce22  predict 16x16 at inference time (gabrieltseng, Oct 24, 2025)
b39bada  fix task name (gabrieltseng, Oct 24, 2025)
9e90c37  Reduce batch size (gabrieltseng, Oct 24, 2025)
114 changes: 114 additions & 0 deletions data/helios/v2_mozambique_lulc/finetune_s1_s2.yaml
@@ -0,0 +1,114 @@
model:
  class_path: rslearn.train.lightning_module.RslearnLightningModule
  init_args:
    model:
      class_path: rslearn.models.multitask.MultiTaskModel
      init_args:
        encoder:
          - class_path: rslp.helios.model.Helios
            init_args:
              checkpoint_path: /weka/dfive-default/helios/checkpoints/henryh/base_v6.1_add_chm_cdl_worldcereal/step300000
              selector: ["encoder"]
              forward_kwargs:
                patch_size: 1
        decoders:
          crop_type_classification:
            - class_path: rslearn.models.pooling_decoder.PoolingDecoder
              init_args:
                in_channels: 768
                out_channels: 7
            - class_path: rslearn.train.tasks.classification.ClassificationHead
    lr: 0.0001
    scheduler:
      class_path: rslearn.train.scheduler.PlateauScheduler
      init_args:
        factor: 0.2
        patience: 2
        min_lr: 0
        cooldown: 10
data:
  class_path: rslearn.train.data_module.RslearnDataModule
  init_args:
    path: /weka/dfive-default/rslearn-eai/datasets/crop/mozambique_lulc
    inputs:
      sentinel2_l2a:
        data_type: "raster"
        layers: ["sentinel2"]
        bands: ["B02", "B03", "B04", "B08", "B05", "B06", "B07", "B8A", "B11", "B12", "B01", "B09"]
        passthrough: true
        dtype: FLOAT32
        load_all_item_groups: true
        load_all_layers: true
      sentinel1:
        data_type: "raster"
        layers: ["sentinel1_descending"]
        bands: ["vv", "vh"]
        passthrough: true
        dtype: FLOAT32
        load_all_item_groups: true
        load_all_layers: true
      label:
        data_type: "vector"
        layers: ["label"]
        is_target: true
    task:
      class_path: rslearn.train.tasks.multi_task.MultiTask
      init_args:
        tasks:
          crop_type_classification:
            class_path: rslearn.train.tasks.classification.ClassificationTask
            init_args:
              property_name: "category"
              classes: ["Water", "Bare Ground", "Rangeland", "Flooded Vegetation", "Trees", "Cropland", "Buildings"]
              enable_f1_metric: true
              metric_kwargs:
                average: "micro"
        input_mapping:
          crop_type_classification:
            label: "targets"
    batch_size: 32
    num_workers: 32
    default_config:
      transforms:
        - class_path: rslp.helios.norm.HeliosNormalize
          init_args:
            config_fname: "/opt/helios/data/norm_configs/computed.json"
            band_names:
              sentinel2_l2a: ["B02", "B03", "B04", "B08", "B05", "B06", "B07", "B8A", "B11", "B12", "B01", "B09"]
              sentinel1: ["vv", "vh"]
        - class_path: rslearn.train.transforms.pad.Pad
          init_args:
            size: 4
            mode: "center"
            image_selectors: ["sentinel2_l2a", "sentinel1"]
    train_config:
      groups: ["gaza"]
      tags:
        split: "train"
    val_config:
      groups: ["gaza"]
      tags:
        split: "test"
    test_config:
      groups: ["gaza"]
      tags:
        split: "test"
trainer:
  max_epochs: 100
  callbacks:
    - class_path: lightning.pytorch.callbacks.LearningRateMonitor
      init_args:
        logging_interval: "epoch"
    - class_path: lightning.pytorch.callbacks.ModelCheckpoint
      init_args:
        save_top_k: 1
        save_last: true
        monitor: val_loss
        mode: min
    - class_path: rslearn.train.callbacks.freeze_unfreeze.FreezeUnfreeze
      init_args:
        module_selector: ["model", "encoder", 0]
        unfreeze_at_epoch: 20
        unfreeze_lr_factor: 10
rslp_project: 2025_09_18_mozambique_lulc
rslp_experiment: mozambique_lulc_helios_base_S1_S2_ts_ws4_ps1_gaza
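Note: the decoder's `out_channels: 7` and the `classes` list must stay in sync with `CLASS_MAP` in `rslp/crop/mozambique/create_windows_for_lulc.py`, since the label windows store `category` strings drawn from that map; the `Pad` transform with `size: 4` appears to correspond to the `ws4_ps1` suffix of the experiment name (4x4-pixel inputs with patch size 1). Below is a minimal consistency check, a sketch only; it assumes PyYAML is installed and that it is run from the rslearn_projects repo root with the repo environment on the path.

```
# Sketch: verify the finetune YAML stays consistent with the window-creation script.
import yaml

from rslp.crop.mozambique.create_windows_for_lulc import CLASS_MAP

with open("data/helios/v2_mozambique_lulc/finetune_s1_s2.yaml") as f:
    cfg = yaml.safe_load(f)

task = cfg["data"]["init_args"]["task"]["init_args"]["tasks"]["crop_type_classification"]
classes = task["init_args"]["classes"]
decoder = cfg["model"]["init_args"]["model"]["init_args"]["decoders"]["crop_type_classification"][0]

# Class order in the YAML should match the integer ids used when writing labels.
assert classes == [CLASS_MAP[i] for i in range(len(CLASS_MAP))]
assert decoder["init_args"]["out_channels"] == len(classes)
print("config and CLASS_MAP are consistent")
```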
24 changes: 24 additions & 0 deletions rslp/crop/mozambique/README.md
@@ -0,0 +1,24 @@
# Mozambique LULC and Crop Type Classification

This project has two main tasks:
1. Land Use/Land Cover (LULC) and cropland classification
2. Crop type classification

The annotations come from field surveys across three provinces in Mozambique: Gaza, Zambezia, and Manica.

For LULC classification, the train/test sample counts per province (reproducible from the GPKG files, as sketched below) are:
- Gaza: 2,262 / 970
- Manica: 1,917 / 822
- Zambezia: 1,225 / 525
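
These counts can be reproduced directly from the survey GPKG files before any windows are created. A minimal sketch, assuming the per-province `*_train.gpkg` / `*_test.gpkg` files expected by `create_windows_for_lulc.py` and that geopandas is installed:

```
# Sketch: count labeled features per province and split in the survey GPKG files.
from pathlib import Path

import geopandas as gpd

gpkg_dir = Path("/weka/dfive-default/yawenz/datasets/mozambique/train_test_samples")
for province in ["gaza", "manica", "zambezia"]:
    counts = {}
    for split in ["train", "test"]:
        gdf = gpd.read_file(gpkg_dir / f"{province}_{split}.gpkg")
        counts[split] = len(gdf)
    print(f"{province}: {counts['train']} / {counts['test']}")
```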

## LULC Classification

```
python /weka/dfive-default/yawenz/rslearn_projects/rslp/crop/mozambique/create_windows_for_lulc.py --gpkg_dir /weka/dfive-default/yawenz/datasets/mozambique/train_test_samples --ds_path /weka/dfive-default/rslearn-eai/datasets/crop/mozambique_lulc --window_size 32

export DATASET_PATH=/weka/dfive-default/rslearn-eai/datasets/crop/mozambique_lulc
rslearn dataset prepare --root $DATASET_PATH --workers 64 --no-use-initial-job --retry-max-attempts 8 --retry-backoff-seconds 60
python -m rslp.main common launch_data_materialization_jobs --image favyen/rslp_image --ds_path $DATASET_PATH --clusters+=ai2/neptune-cirrascale --num_jobs 5

python -m rslp.main helios launch_finetune --image_name favyen/rslphelios10 --config_paths+=data/helios/v2_mozambique_lulc/finetune_s1_s2.yaml --cluster+=ai2/neptune --rslp_project 2025_09_18_mozambique_lulc --experiment_id mozambique_lulc_helios_base_S1_S2_ts_ws4_ps1_gaza
```
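
After the window-creation step, a quick sanity check is to count window directories per province group. This is a sketch only; it assumes windows land under `<ds_path>/windows/<group>/<window_name>/` (adjust if `Window.get_window_root` uses a different layout in your rslearn version).

```
# Sketch: count created windows per province group.
from pathlib import Path

ds_path = Path("/weka/dfive-default/rslearn-eai/datasets/crop/mozambique_lulc")
for group_dir in sorted((ds_path / "windows").iterdir()):
    num_windows = sum(1 for p in group_dir.iterdir() if p.is_dir())
    print(f"{group_dir.name}: {num_windows} windows")
```

For Gaza this should total 3,232 windows (2,262 train + 970 test). The finetune config above selects only the `gaza` group in `train_config`/`val_config`/`test_config`, so training on Manica or Zambezia means updating `groups` in the YAML and the experiment id.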
248 changes: 248 additions & 0 deletions rslp/crop/mozambique/create_windows_for_lulc.py
@@ -0,0 +1,248 @@
"""Create windows for crop type mapping from GPKG files (fixed splits)."""

import argparse
import multiprocessing
from collections.abc import Iterable
from datetime import datetime, timezone
from pathlib import Path

import geopandas as gpd
import shapely
import tqdm
from rslearn.const import WGS84_PROJECTION
from rslearn.dataset import Window
from rslearn.utils import Projection, STGeometry, get_utm_ups_crs
from rslearn.utils.feature import Feature
from rslearn.utils.mp import star_imap_unordered
from rslearn.utils.vector_format import GeojsonVectorFormat
from upath import UPath

from rslp.utils.windows import calculate_bounds

WINDOW_RESOLUTION = 10
LABEL_LAYER = "label"

CLASS_MAP = {
    0: "Water",
    1: "Bare Ground",
    2: "Rangeland",
    3: "Flooded Vegetation",
    4: "Trees",
    5: "Cropland",
    6: "Buildings",
}

# Per-province temporal coverage (UTC)
PROVINCE_TIME = {
    "gaza": (
        datetime(2024, 10, 23, tzinfo=timezone.utc),
        datetime(2025, 5, 7, tzinfo=timezone.utc),
    ),
    "manica": (
        datetime(2024, 11, 23, tzinfo=timezone.utc),
        datetime(2025, 6, 7, tzinfo=timezone.utc),
    ),
    "zambezia": (
        datetime(2024, 11, 23, tzinfo=timezone.utc),
        datetime(2025, 6, 7, tzinfo=timezone.utc),
    ),
}


def process_gpkg(gpkg_path: UPath) -> gpd.GeoDataFrame:
    """Load a GPKG, normalize its CRS to WGS84, and require 'class' and 'geometry' columns."""
    gdf = gpd.read_file(str(gpkg_path))

    # Normalize CRS to WGS84
    if gdf.crs is None:
        gdf = gdf.set_crs("EPSG:4326", allow_override=True)
    else:
        gdf = gdf.to_crs("EPSG:4326")

    required_cols = {"class", "geometry"}
    missing = [c for c in required_cols if c not in gdf.columns]
    if missing:
        raise ValueError(f"{gpkg_path}: missing required column(s): {missing}")

    return gdf


def iter_points(gdf: gpd.GeoDataFrame) -> Iterable[tuple[int, float, float, int]]:
    """Yield (fid, latitude, longitude, category) per feature, using the row index as fid and the centroid for non-point geometries."""
    for fid, row in gdf.iterrows():
        geom = row.geometry
        if geom is None or geom.is_empty:
            continue
        if isinstance(geom, shapely.Point):
            pt = geom
        else:
            pt = geom.centroid
        lon, lat = float(pt.x), float(pt.y)
        category = int(row["class"])
        yield fid, lat, lon, category


def create_window(
    rec: tuple[int, float, float, int],
    ds_path: UPath,
    group_name: str,
    split: str,
    window_size: int,
    start_time: datetime,
    end_time: datetime,
) -> None:
    """Create a single window and write its label layer."""
    fid, latitude, longitude, category_id = rec
    category_label = CLASS_MAP.get(category_id, f"Unknown_{category_id}")

    # Geometry/projection
    src_point = shapely.Point(longitude, latitude)
    src_geometry = STGeometry(WGS84_PROJECTION, src_point, None)
    dst_crs = get_utm_ups_crs(longitude, latitude)
    dst_projection = Projection(dst_crs, WINDOW_RESOLUTION, -WINDOW_RESOLUTION)
    dst_geometry = src_geometry.to_projection(dst_projection)
    bounds = calculate_bounds(dst_geometry, window_size)

    # Group = province name; split is taken from the file name (train/test)
    group = group_name
    window_name = f"{fid}_{latitude:.6f}_{longitude:.6f}"

    window = Window(
        path=Window.get_window_root(ds_path, group, window_name),
        group=group,
        name=window_name,
        projection=dst_projection,
        bounds=bounds,
        time_range=(start_time, end_time),
        options={
            "split": split,  # 'train' or 'test' as provided
            "category_id": category_id,
            "category": category_label,
            "fid": fid,
            "source": "gpkg",
        },
    )
    window.save()

    # Write the label layer as a single feature covering the window geometry
    feature = Feature(
        window.get_geometry(),
        {
            "category_id": category_id,
            "category": category_label,
            "fid": fid,
            "split": split,
        },
    )
    layer_dir = window.get_layer_dir(LABEL_LAYER)
    GeojsonVectorFormat().encode_vector(layer_dir, [feature])
    window.mark_layer_completed(LABEL_LAYER)


def create_windows_from_gpkg(
    gpkg_path: UPath,
    ds_path: UPath,
    group_name: str,
    split: str,
    window_size: int,
    max_workers: int,
    start_time: datetime,
    end_time: datetime,
) -> None:
    """Create windows from a single GPKG file."""
    gdf = process_gpkg(gpkg_path)
    records = list(iter_points(gdf))

    jobs = [
        dict(
            rec=rec,
            ds_path=ds_path,
            group_name=group_name,
            split=split,
            window_size=window_size,
            start_time=start_time,
            end_time=end_time,
        )
        for rec in records
    ]

    print(
        f"[{group_name}:{split}] file={gpkg_path.name} features={len(jobs)} "
        f"time={start_time.date()}→{end_time.date()}"
    )

    if max_workers <= 1:
        for kw in tqdm.tqdm(jobs):
            create_window(**kw)
    else:
        p = multiprocessing.Pool(max_workers)
        outputs = star_imap_unordered(p, create_window, jobs)
        for _ in tqdm.tqdm(outputs, total=len(jobs)):
            pass
        p.close()


if __name__ == "__main__":
    multiprocessing.set_start_method("forkserver", force=True)

    parser = argparse.ArgumentParser(description="Create windows from GPKG files")
    parser.add_argument(
        "--gpkg_dir",
        type=str,
        required=True,
        help="Directory containing gaza_[train|test].gpkg, manica_[train|test].gpkg, zambezia_[train|test].gpkg",
    )
    parser.add_argument(
        "--ds_path",
        type=str,
        required=True,
        help="Path to the dataset root",
    )
    parser.add_argument(
        "--window_size",
        type=int,
        default=1,
        help="Window size (pixels per side in projected grid)",
    )
    parser.add_argument(
        "--max_workers",
        type=int,
        default=32,
        help="Worker processes (set 1 for single-process)",
    )
    args = parser.parse_args()

    gpkg_dir = Path(args.gpkg_dir)
    ds_path = UPath(args.ds_path)

    expected = [
        ("gaza", "train", gpkg_dir / "gaza_train.gpkg"),
        ("gaza", "test", gpkg_dir / "gaza_test.gpkg"),
        ("manica", "train", gpkg_dir / "manica_train.gpkg"),
        ("manica", "test", gpkg_dir / "manica_test.gpkg"),
        ("zambezia", "train", gpkg_dir / "zambezia_train.gpkg"),
        ("zambezia", "test", gpkg_dir / "zambezia_test.gpkg"),
    ]

    # Basic checks
    for province, _, path in expected:
        if province not in PROVINCE_TIME:
            raise ValueError(f"Unknown province '{province}'")
        if not path.exists():
            raise FileNotFoundError(f"Missing expected file: {path}")

    # Run per file
    for province, split, path in expected:
        start_time, end_time = PROVINCE_TIME[province]
        create_windows_from_gpkg(
            gpkg_path=UPath(path),
            ds_path=ds_path,
            group_name=province,  # group == province
            split=split,  # honor the provided split
            window_size=args.window_size,
            max_workers=args.max_workers,
            start_time=start_time,
            end_time=end_time,
        )

    print("Done.")