Training code for a deep learning shadow removal model on the ISTD dataset.
The ISTD dataset contains:
- train_A/: Original shadowed images (RGB, 640x480)
- train_B/: Shadow masks (grayscale, values 0 and 255)
- train_C/: Shadow-free target images (RGB, 640x480)
- test_A/, test_B/, test_C/: Test set with same structure
5,610 images in total: 3,990 training images (1,330 triplets) and 1,620 test images (540 triplets).
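The directory layout above maps directly onto a paired dataset loader. Below is a minimal sketch of how one triplet (shadowed image, mask, target) could be loaded; the class name and details are illustrative assumptions, and the actual implementation lives in data_loader.py:

```python
from pathlib import Path
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms

class ISTDTriplets(Dataset):
    """Hypothetical loader sketch; the real loader is in data_loader.py."""
    def __init__(self, root, split="train", image_size=(480, 640)):
        # e.g. ISTD/train/train_A, ISTD/train/train_B, ISTD/train/train_C
        self.dirs = [Path(root) / split / f"{split}_{s}" for s in "ABC"]
        self.names = sorted(p.name for p in self.dirs[0].glob("*.png"))
        self.to_tensor = transforms.Compose(
            [transforms.Resize(image_size), transforms.ToTensor()]
        )

    def __len__(self):
        return len(self.names)

    def __getitem__(self, idx):
        # Shadowed image and target as RGB, mask as single-channel grayscale
        shadow, mask, target = (
            self.to_tensor(Image.open(d / self.names[idx]).convert(mode))
            for d, mode in zip(self.dirs, ("RGB", "L", "RGB"))
        )
        return shadow, mask, target
```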
```bash
# Install dependencies
pip install -r requirements.txt

# Train with default settings
python train.py

# Train with custom settings
python train.py \
    --epochs 50 \
    --batch_size 16 \
    --learning_rate 1e-4 \
    --model_type unet \
    --save_dir ./my_checkpoints

# Evaluate only, loading the best model
python train.py --eval_only --resume ./checkpoints/model_best.pth

# Resume training from a checkpoint
python train.py --resume ./checkpoints/checkpoint_epoch_20.pth
```

| Parameter | Default | Description |
|---|---|---|
| --epochs | 100 | Number of training epochs |
| --batch_size | 8 | Batch size |
| --learning_rate | 1e-4 | Learning rate |
| --model_type | unet | Model type (unet/resnet) |
| --image_size | 480 640 | Image size [height width] |
| --save_every | 10 | Save model every N epochs |
| --device | auto | Training device (auto/cpu/cuda) |
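For reference, an `auto` device setting is conventionally resolved as shown below; this is an illustrative assumption, not necessarily the exact logic in train.py:

```python
import torch

# Typical resolution of an "auto" device flag
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
```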
UNet (--model_type unet):
- Based on the classic U-Net architecture
- 4-channel input: RGB image (3) + shadow mask (1); see the input sketch after this list
- 3-channel output: shadow-free RGB image
- Supports bilinear interpolation upsampling

ResNet (--model_type resnet):
- ResNet-based encoder-decoder architecture
- Uses residual blocks for improved training stability
- Transposed convolution upsampling
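A minimal sketch of the 4-channel input convention; the class name `UNet` and its constructor arguments are assumptions about model.py, not confirmed API:

```python
import torch
from model import UNet  # assumed class name exported by model.py

# Hypothetical constructor signature; the real one may differ
model = UNet(in_channels=4, out_channels=3)

image = torch.rand(1, 3, 480, 640)  # shadowed RGB image, values in [0, 1]
mask = torch.rand(1, 1, 480, 640)   # shadow mask, values in [0, 1]

# RGB + mask are concatenated along the channel axis to form the 4-channel input
pred = model(torch.cat([image, mask], dim=1))  # -> (1, 3, 480, 640) shadow-free output
```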
The composite loss function combines three terms (a sketch follows the list):
- L1 Loss: Pixel-level reconstruction loss
- Perceptual Loss: High-level semantic loss based on VGG features
- Shadow Region Weighting: Higher weight for shadow regions
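A minimal sketch of how such a composite loss can be assembled, assuming a torchvision VGG16 feature extractor; the layer choices and term weights below are illustrative only and may differ from what trainer.py actually uses:

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16, VGG16_Weights

class CompositeLoss(nn.Module):
    """Illustrative composite loss: shadow-weighted L1 + VGG16 perceptual term.

    pred/target: (B, 3, H, W) in [0, 1]; mask: (B, 1, H, W) in [0, 1].
    ImageNet normalization of the VGG input is omitted for brevity.
    """
    def __init__(self, perceptual_weight=0.1, shadow_weight=2.0):
        super().__init__()
        self.vgg = vgg16(weights=VGG16_Weights.IMAGENET1K_V1).features[:16].eval()
        for p in self.vgg.parameters():
            p.requires_grad_(False)
        self.perceptual_weight = perceptual_weight
        self.shadow_weight = shadow_weight

    def forward(self, pred, target, mask):
        # Pixel-level L1, weighted more heavily inside the shadow region
        weight = 1.0 + (self.shadow_weight - 1.0) * mask
        l1 = (weight * (pred - target).abs()).mean()
        # Perceptual loss on intermediate VGG16 feature maps
        perceptual = nn.functional.l1_loss(self.vgg(pred), self.vgg(target))
        return l1 + self.perceptual_weight * perceptual
```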
After training, the following files will be generated in the specified directory:
```
checkpoints/
├── model_best.pth          # Best model
├── checkpoint_latest.pth   # Latest checkpoint
└── checkpoint_epoch_X.pth  # Per-epoch checkpoints
results/
├── training_history.png    # Training history curves
├── final_metrics.txt       # Final evaluation metrics
└── sample_X.png            # Sample result visualizations
```
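Assuming model.py exposes a `UNet` class, a saved checkpoint can then be loaded for inference roughly as follows; the exact checkpoint layout written by trainer.py may differ:

```python
import torch
from model import UNet  # assumed class name from model.py

model = UNet(in_channels=4, out_channels=3)
state = torch.load("./checkpoints/model_best.pth", map_location="cpu")
# Trainer-style checkpoints often nest the weights under a key (assumption);
# otherwise treat the file as a plain state_dict.
if "model_state_dict" in state:
    state = state["model_state_dict"]
model.load_state_dict(state)
model.eval()
```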
Evaluation reports the following metrics (computation sketched after the list):
- MSE: Mean Squared Error
- PSNR: Peak Signal-to-Noise Ratio
- SSIM: Structural Similarity Index
- Shadow_MSE/PSNR: Shadow region metrics
- NonShadow_MSE/PSNR: Non-shadow region metrics
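A minimal sketch of how these per-image metrics can be computed, assuming scikit-image is available and that `pred`/`target` are HxWx3 float arrays in [0, 1] with an HxW boolean shadow mask; the names are illustrative, and utils.py may implement this differently:

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_pair(pred, target, mask):
    """Hypothetical per-image metrics; mask is True inside the shadow region."""
    mse = np.mean((pred - target) ** 2)
    psnr = peak_signal_noise_ratio(target, pred, data_range=1.0)
    ssim = structural_similarity(target, pred, channel_axis=2, data_range=1.0)
    # Region metrics: restrict the error to shadow / non-shadow pixels
    shadow_mse = np.mean((pred[mask] - target[mask]) ** 2)
    nonshadow_mse = np.mean((pred[~mask] - target[~mask]) ** 2)
    shadow_psnr = 10 * np.log10(1.0 / shadow_mse)        # PSNR from MSE, peak = 1.0
    nonshadow_psnr = 10 * np.log10(1.0 / nonshadow_mse)
    return {
        "MSE": mse, "PSNR": psnr, "SSIM": ssim,
        "Shadow_MSE": shadow_mse, "Shadow_PSNR": shadow_psnr,
        "NonShadow_MSE": nonshadow_mse, "NonShadow_PSNR": nonshadow_psnr,
    }
```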
The training script automatically generates visualization results including:
- Shadowed input image
- Shadow mask
- Shadow region annotation
- Target image (ground truth)
- Prediction result
- Difference map
- GPU: A CUDA-compatible GPU is recommended
- Memory: At least 8GB RAM
- Storage: At least 5GB available space
- Batch Size: Adjust to fit GPU memory; 8-16 is recommended
- Learning Rate: Start at 1e-4 and adjust based on training progress
- Data Augmentation: Random cropping, rotation, etc. can be added (see the sketch after this list)
- Early Stopping: Monitor the validation loss to avoid overfitting
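A minimal paired-augmentation sketch; the key point is that the shadowed image, the mask, and the shadow-free target must all receive the same random transform. The helper name and parameters are illustrative and not part of the existing code:

```python
import random
import torchvision.transforms.functional as TF

def paired_augment(shadow_img, mask, target):
    """Apply identical random flips/rotations to image, mask, and target."""
    if random.random() < 0.5:
        shadow_img, mask, target = TF.hflip(shadow_img), TF.hflip(mask), TF.hflip(target)
    angle = random.uniform(-10, 10)
    # Same rotation for all three; the default nearest interpolation keeps the mask binary
    shadow_img = TF.rotate(shadow_img, angle)
    mask = TF.rotate(mask, angle)
    target = TF.rotate(target, angle)
    return shadow_img, mask, target
```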
Out of memory:
- Reduce batch_size
- Reduce image_size
- Train on CPU

Slow training:
- Increase num_workers
- Use GPU training
- Reduce image size

Poor results:
- Adjust the learning rate
- Check data preprocessing
- Try a different model architecture
```
ISTD/
├── train.py              # Main training script
├── data_loader.py        # Data loader
├── model.py              # Model definition
├── trainer.py            # Trainer
├── utils.py              # Utility functions
├── requirements.txt      # Dependencies
├── README.md             # Documentation
├── train/                # Training data
│   ├── train_A/          # Shadowed images
│   ├── train_B/          # Shadow masks
│   └── train_C/          # Target images
└── test/                 # Test data
    ├── test_A/
    ├── test_B/
    └── test_C/
```