Official PyTorch implementation of Time2Agri, a self-supervised learning framework for agricultural representation learning from satellite imagery. This work has been accepted to the AAAI 2026 Social Impact Track.
Time2Agri introduces agriculture-focused temporal pretext tasks that capture seasonal cycles and temporal patterns unique to agricultural landscapes, addressing the limitations of existing remote sensing foundation models that neglect agricultural temporal dynamics.
We propose three novel temporal pretext tasks designed specifically for agricultural monitoring; a minimal sketch of each task's target follows the list:
- Time-Difference Prediction (TD) - Captures temporal changes between observations to model agricultural dynamics
- Temporal Frequency Prediction (FP) - Analyzes cyclical patterns in agricultural data using frequency-domain representations
- Future-Frame Prediction (FF) - Forecasts upcoming satellite imagery to learn causal temporal dependencies
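The snippet below is a minimal, illustrative sketch of how the three targets can be formed from a chip time series; it is a simplification under assumed tensor shapes, not the authors' exact formulation.

```python
# Illustrative pretext targets for a chip tensor x of shape (T, C, H, W).
import torch

def pretext_targets(x: torch.Tensor, t: int):
    """Return toy TD / FP / FF targets given the frame index t of the input."""
    # Time-Difference (TD): change between consecutive observations.
    td_target = x[t + 1] - x[t]
    # Temporal Frequency Prediction (FP): frequency-domain view of the
    # per-pixel time series (real FFT magnitude over the time axis).
    fp_target = torch.fft.rfft(x, dim=0).abs()
    # Future-Frame Prediction (FF): the next observation itself.
    ff_target = x[t + 1]
    return td_target, fp_target, ff_target
```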
Downstream performance:

- Crop Mapping: 69.6% IoU on crop mapping benchmarks
- Yield Prediction: 30.7% MAPE, outperforming baseline approaches
- Field Delineation: 54.2% IoU on the FTW India dataset for field boundary delineation
Requirements:

- Python 3.11+
- PyTorch 2.5+
- PyTorch Lightning
- timm (PyTorch Image Models)
- einops
- zarr (for data loading)
- tensorboard
- tqdm
- matplotlib
Clone the repository and install the dependencies:

```bash
git clone https://github.com/Geospatial-Computer-Vision-Group/agri-pretext.git
cd agri-pretext

# Install dependencies
pip install torch torchvision lightning timm einops zarr tensorboard tqdm matplotlib
```

The code for training regional models is contained in the regional_ssl folder. Navigate to this folder to reproduce the regional pretraining experiments:

```bash
cd regional_ssl
```

We are actively working on releasing the following components:
- National-scale pretraining code - Training pipeline for larger geographic coverage
- Datasets - Preprocessed satellite imagery datasets used in our experiments
- Evaluation code - Downstream task evaluation scripts for crop mapping, yield prediction, and field delineation
Stay tuned for updates!
The code expects satellite imagery in Zarr format, with each Zarr group representing a chip and containing two arrays: data, a TxCxHxW tensor, and timestamps, the acquisition time corresponding to each temporal instance.
The dataset will be released soon.
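Below is a minimal sketch of that layout, assuming the zarr-python v2-style create_dataset API; the group name chip_0000 and the array shapes are purely illustrative.

```python
# Illustrative Zarr layout: one group per chip with "data" and "timestamps" arrays.
import numpy as np
import zarr

root = zarr.open_group("dataset.zarr", mode="w")
chip = root.create_group("chip_0000")                      # hypothetical chip name
# data: (T, C, H, W) image time series
chip.create_dataset("data", data=np.zeros((12, 4, 224, 224), dtype=np.float32))
# timestamps: one acquisition time per temporal instance
chip.create_dataset("timestamps", data=np.arange(12, dtype=np.int64))

# Reading a chip back
chip = zarr.open_group("dataset.zarr", mode="r")["chip_0000"]
series, times = chip["data"][:], chip["timestamps"][:]
```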
Update the data_dir path in the configuration files to point to your dataset:
```yaml
data:
  data_dir: /path/to/your/dataset.zarr
  batch_size: 512
  num_workers: 40
  split_ratio: 0.8
```

Before training, compute normalization statistics for your dataset.
Note: you need to update the path to the Zarr dataset inside calc_stats.py before running it.
```bash
python regional_ssl/calc_stats.py
```

This will generate a stats.pth file containing the mean and standard deviation values used for normalization.
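The snippet below sketches how these statistics could be consumed; the key names "mean" and "std" are assumptions about the contents of stats.pth and may differ in the released file.

```python
# Hypothetical usage of the generated statistics for channel-wise normalization.
import torch

stats = torch.load("stats.pth", map_location="cpu")
mean, std = stats["mean"], stats["std"]            # assumed keys, shape (C,)

def normalize(x: torch.Tensor) -> torch.Tensor:
    """Normalize a (T, C, H, W) chip tensor channel-wise."""
    return (x - mean.view(1, -1, 1, 1)) / std.view(1, -1, 1, 1)
```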
We provide configuration files for all three pretext tasks in the regional_ssl/configs/ directory.
Launch pretraining for each pretext task (or the MAE baseline) with:

```bash
python regional_ssl/train_ff.py fit --config regional_ssl/configs/vits_ff.yaml
python regional_ssl/train_fp.py fit --config regional_ssl/configs/vits_fp.yaml
python regional_ssl/train_td.py fit --config regional_ssl/configs/vits_td.yaml
python regional_ssl/train_mae.py fit --config regional_ssl/configs/vits_mae.yaml
```

All configuration files use the PyTorch Lightning CLI format. Key parameters you can adjust:
- trainer.max_epochs: Number of training epochs (default: 100)
- trainer.devices: Number of GPUs to use
- model.learning_rate: Learning rate for optimization
- model.warmup_epochs: Number of warmup epochs for the learning rate schedule
- data.batch_size: Training batch size
- data.num_workers: Number of data loading workers
Example configuration structure:
```yaml
seed_everything: true
trainer:
  max_epochs: 100
  accelerator: gpu
  devices: 1
  default_root_dir: logs/ff_vits
model:
  model_name: "vit_small_patch16_224"
  learning_rate: 0.0009
  img_size: 224
  patch_size: 16
data:
  data_dir: /path/to/data.zarr
  batch_size: 512
  num_workers: 40
```

Time2Agri uses a Vision Transformer (ViT) backbone with task-specific components:
- Encoder: ViT-Small (patch size 16, 384 dimensions)
- Time Translator: A two-layer Transformer that predicts the future latent representation
- FreqDecoder/Decoder: Task-specific reconstruction/prediction heads
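The composition below is a minimal sketch of how these pieces fit together; the module names, the stand-in linear decoder, and the wiring are assumptions rather than the exact released implementation.

```python
# Sketch of a Time2Agri-style backbone: ViT-Small encoder, two-layer Transformer
# time translator, and a placeholder task-specific head.
import timm
import torch.nn as nn

class Time2AgriBackbone(nn.Module):
    def __init__(self):
        super().__init__()
        # ViT-Small encoder: patch size 16, 384-dim tokens, no classification head.
        self.encoder = timm.create_model(
            "vit_small_patch16_224", pretrained=False, num_classes=0
        )
        # Time Translator: two-layer Transformer over the token sequence.
        layer = nn.TransformerEncoderLayer(d_model=384, nhead=6, batch_first=True)
        self.time_translator = nn.TransformerEncoder(layer, num_layers=2)
        # Task-specific reconstruction/prediction head (a linear layer stands in here).
        self.decoder = nn.Linear(384, 384)

    def forward(self, x):
        tokens = self.encoder.forward_features(x)   # (B, N, 384) patch tokens
        future = self.time_translator(tokens)       # predicted future latent
        return self.decoder(future)
```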
- vits_ff.yaml - Future-Frame prediction with ViT-Small
- vits_fp.yaml - Frequency prediction with ViT-Small
- vits_td.yaml - Time-difference prediction with ViT-Small
- vits_mae.yaml - Standard MAE baseline
- vits_mae_300.yaml - Standard MAE baseline, trained for 300 epochs
Training logs and checkpoints are saved in the directory specified by trainer.default_root_dir:
```
logs/
├── ff_vits/    # Future-Frame logs
├── fp_vits/    # Frequency Prediction logs
├── td_vits/    # Time-Difference logs
└── mae_vits/   # MAE baseline logs
```
Each run saves:
- best.ckpt - Best model based on validation loss
- last.ckpt - Last checkpoint
- TensorBoard logs for visualization
We use last.ckpt during our evaluation.
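A possible way to reuse the pretrained encoder from last.ckpt downstream is sketched below; the "encoder." key prefix in the Lightning state dict is an assumption and may differ in practice.

```python
# Hypothetical sketch: extract encoder weights from a Lightning checkpoint.
import timm
import torch

ckpt = torch.load("logs/ff_vits/last.ckpt", map_location="cpu")
state_dict = ckpt["state_dict"]

# Keep encoder weights only and strip the (assumed) "encoder." prefix.
encoder_sd = {
    k.removeprefix("encoder."): v
    for k, v in state_dict.items()
    if k.startswith("encoder.")
}

encoder = timm.create_model("vit_small_patch16_224", pretrained=False, num_classes=0)
missing, unexpected = encoder.load_state_dict(encoder_sd, strict=False)
```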
Monitor training with TensorBoard:

```bash
tensorboard --logdir logs/
```

If you find this work useful, please cite our paper:
```bibtex
@misc{gupta2025time2agritemporalpretexttasks,
  title={Time2Agri: Temporal Pretext Tasks for Agricultural Monitoring},
  author={Moti Rattan Gupta and Anupam Sobti},
  year={2025},
  eprint={2507.04366},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2507.04366},
}
```

This project is licensed under the MIT License - see the LICENSE file for details.
This research addresses critical challenges in agricultural monitoring using self-supervised learning on satellite imagery, with applications in crop mapping, yield prediction, and field delineation.
For questions or issues, please open an issue on GitHub or contact the authors.
AAAI 2026 Social Impact Track | Paper