Professional tool for downloading ERA5 reanalysis data with parallel processing, optimized for WRF/WPS preprocessing and HPC environments.
- ⚡ Parallel Processing: Download multiple time periods simultaneously using multiprocessing
- 🔧 Flexible Configuration: YAML-based configuration for easy customization
- 📊 Comprehensive Logging: Detailed logs for monitoring and debugging
- 🔄 Resume Capability: Skip already downloaded files automatically
- 🎯 WRF-Optimized: Pre-configured with standard WRF pressure levels and variables
- 🚀 HPC Compatible: Designed for use on high-performance computing clusters
- ⚠️ Error Handling: Robust error handling with detailed error reporting
- 📈 Progress Tracking: Real-time progress updates during downloads
# Core dependencies
pip install cdsapi pyyaml
# Optional but recommended
pip install tqdm  # For enhanced progress bars

- Register at the Climate Data Store: https://cds.climate.copernicus.eu/
- Accept Terms: Go to any ERA5 dataset page and accept the license terms
- Get API Key: Visit https://cds.climate.copernicus.eu/user and copy your UID and API key
- Configure your credentials:
Create ~/.cdsapirc file:
url: https://cds.climate.copernicus.eu/api
key: UID:API_KEY
Replace UID:API_KEY with your actual credentials (e.g., 12345:abcd-efgh-ijkl-mnop).
Important: Set proper permissions:
chmod 600 ~/.cdsapirc

/DATA/datasets/era5/
├── pressure_levels/          # 3D atmospheric data (GRIB files)
├── single_levels/            # 2D surface data (GRIB files)
├── logs/                     # Download logs
└── temp/                     # Temporary files (auto-cleaned)

├── era5_downloader.py        # Main download script
├── config_wrf.yaml           # Configuration for WRF preprocessing
├── config_surface_only.yaml  # Surface data only config
└── README.txt                # This file
Download data for WRF simulation:
python era5_downloader.py -c config_wrf.yaml

For land surface or climate studies:
python era5_downloader.py -c config_surface_only.yaml

Validate your configuration without downloading:
python era5_downloader.py -c config_wrf.yaml --dry-run

For detailed debugging information:
python era5_downloader.py -c config_wrf.yaml --log-level DEBUG

The YAML configuration file has 5 main sections:
temporal:
start_date: "2025-01-01"
end_date: "2025-12-31"
times:
- "00:00"
- "06:00"
- "12:00"
- "18:00"

Tips:
- Use hourly data (00:00-23:00) for high-resolution WRF runs
- Use 6-hourly (00:00, 06:00, 12:00, 18:00) for faster downloads
- Consider your WRF time step when choosing temporal resolution
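If the config loader accepts any list of HH:MM strings (an assumption about the YAML schema), the full hourly list can be generated rather than typed out by hand:

```python
# Generate ERA5 time lists programmatically (illustrative snippet, not part of
# era5_downloader.py; assumes the YAML 'times' field accepts any HH:MM strings).
hourly = [f"{h:02d}:00" for h in range(24)]  # "00:00" ... "23:00"
six_hourly = hourly[::6]                     # every 6th hour

print(six_hourly)  # ['00:00', '06:00', '12:00', '18:00']
```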
spatial:
area: [60, -30, 20, 25] # [North, West, South, East]
grid: null # Native 0.25° or "0.5/0.5", "1.0/1.0"
pressure_levels: ["1000", "925", "850", ...]

Domain Guidelines:
- WRF Domain 1: Should be 3-5° larger than your actual domain for proper boundary conditions
- Nested Domains: Download for outermost domain only; WRF handles nesting
- Grid Resolution:
- null: Native 0.25° (best quality, larger files)
- "0.5/0.5": Half degree (good balance)
- "1.0/1.0": One degree (fastest, suitable for large domains)
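Swapped coordinates in the area box are an easy mistake; a small check can catch them before a request is queued (a hypothetical helper for illustration, not part of era5_downloader.py):

```python
def validate_area(area):
    """Validate an [North, West, South, East] box as used in the spatial config.

    Hypothetical helper for illustration; the downloader may validate differently.
    """
    north, west, south, east = area
    if not (-90 <= south < north <= 90):
        raise ValueError("latitudes must satisfy -90 <= South < North <= 90")
    if not (-180 <= west <= 180 and -180 <= east <= 180):
        raise ValueError("longitudes must lie in [-180, 180]")
    return area

validate_area([44, -10, 36, 4])  # Iberian Peninsula box from the examples below
```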
Common Domains:
# Iberian Peninsula (good for Spanish WRF runs)
area: [44, -10, 36, 4]
# Western Mediterranean
area: [45, -5, 35, 15]
# Europe
area: [72, -25, 34, 40]
# Global
area: [90, -180, -90, 180]

Pressure Levels (3D atmospheric data):
variables:
pressure_levels:
- geopotential # Required for WRF
- temperature # Required for WRF
- u_component_of_wind # Required for WRF
- v_component_of_wind # Required for WRF
- relative_humidity         # Required for WRF

Single Levels (2D surface data):
single_levels:
- 2m_temperature
- surface_pressure # Required for WRF
- soil_temperature_level_1
- volumetric_soil_water_layer_1
# ... etc

Essential WRF Variables:
- Pressure levels: geopotential, temperature, U/V winds, relative humidity
- Surface: 2m temperature, surface pressure, soil temperatures/moisture, SST
All Available Variables: See https://cds.climate.copernicus.eu/datasets/reanalysis-era5-pressure-levels
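For reference, each download chunk reduces to a single CDS API request built from these config sections. A rough sketch of the request shape is below (an assumption for illustration; the actual request construction lives in era5_downloader.py):

```python
# Assumed shape of one pressure-level request assembled from the config above;
# the real construction lives in era5_downloader.py and may differ in detail.
request = {
    "product_type": "reanalysis",
    "variable": ["geopotential", "temperature", "u_component_of_wind",
                 "v_component_of_wind", "relative_humidity"],
    "pressure_level": ["1000", "925", "850", "700", "500"],
    "year": "2025",
    "month": "01",
    "day": [f"{d:02d}" for d in range(1, 32)],
    "time": ["00:00", "06:00", "12:00", "18:00"],
    "area": [44, -10, 36, 4],  # [North, West, South, East]
    "format": "grib",
}
# With credentials configured, this would be submitted along the lines of:
#   import cdsapi
#   cdsapi.Client().retrieve("reanalysis-era5-pressure-levels", request, target)
```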
output:
base_dir: "/DATA/datasets/era5"
format: "grib" # or "netcdf"

Format Recommendations:
- GRIB: Best for WRF (directly compatible with WPS ungrib)
- NetCDF: Better for Python analysis (xarray, pandas)
download:
levels:
- pressure
- surface
chunk_size: "month" # "day", "week", or "month"
max_workers: 4
skip_existing: true

Optimization Tips:
| Period Length | Chunk Size | Workers | Reasoning |
|---|---|---|---|
| < 1 month | day | 2-3 | Fast turnaround, minimal overhead |
| 1-6 months | week | 3-5 | Good balance |
| > 6 months | month | 4-8 | Most efficient for long periods |
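The monthly chunking used for long periods can be sketched as follows (a simplified stand-in for the script's own chunk generator):

```python
from calendar import monthrange
from datetime import date, timedelta

def month_chunks(start, end):
    """Split [start, end] into calendar-month chunks, as with chunk_size: "month".

    Simplified sketch; era5_downloader.py's chunking may differ in detail.
    """
    chunks, cur = [], start
    while cur <= end:
        month_end = date(cur.year, cur.month, monthrange(cur.year, cur.month)[1])
        chunks.append((cur, min(month_end, end)))
        cur = month_end + timedelta(days=1)
    return chunks

chunks = month_chunks(date(2025, 1, 1), date(2025, 12, 31))
print(len(chunks))  # 12 chunks for a full year
```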
Worker Guidelines:
- Local PC: 2-4 workers (avoid overwhelming CDS)
- HPC: 4-8 workers (check cluster policy)
- Note: CDS API has rate limits; more workers doesn't always mean faster
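Conceptually, the parallel download maps date chunks over a worker pool sized by max_workers. The pattern can be sketched as below (illustrated with a thread pool so the snippet is self-contained; the script itself uses multiprocessing, and download_chunk here is a placeholder, not the real function):

```python
from concurrent.futures import ThreadPoolExecutor

def download_chunk(chunk):
    """Placeholder for the real per-chunk CDS request in era5_downloader.py."""
    start, end = chunk
    return f"SUCCESS: {start} to {end}"

chunks = [("2025-01-01", "2025-01-31"), ("2025-02-01", "2025-02-28")]
# max_workers plays the same role as download.max_workers in the config
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(download_chunk, chunks))
print(results)
```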
Approximate times for different configurations (depends on CDS load):
| Configuration | Data Volume | Time (4 workers) |
|---|---|---|
| 3 days, full domain, hourly | ~500 MB | 15-30 min |
| 1 month, region, hourly | ~2 GB | 1-2 hours |
| 1 year, region, hourly | ~25 GB | 8-12 hours |
| 1 year, region, 6-hourly | ~10 GB | 3-6 hours |
For Full Year Downloads:
temporal:
times: ["00:00", "06:00", "12:00", "18:00"] # 6-hourly instead of hourly
spatial:
grid: "0.5/0.5" # Coarser grid
download:
chunk_size: "month"
max_workers: 6

For High-Resolution Short Periods:
temporal:
times: ["00:00", "01:00", ..., "23:00"] # All hours
spatial:
grid: null # Native 0.25°
download:
chunk_size: "day"
max_workers: 3

The script provides detailed logging:
2026-02-11 14:30:15 - INFO - ERA5 PARALLEL DOWNLOAD STARTING
2026-02-11 14:30:15 - INFO - Period: 2025-01-01 to 2025-12-31
2026-02-11 14:30:15 - INFO - Generated 12 date chunks for download
2026-02-11 14:30:15 - INFO - DOWNLOADING PRESSURE LEVELS
2026-02-11 14:32:45 - INFO - [1/12] SUCCESS: 2025-01-01 to 2025-01-31
2026-02-11 14:34:20 - INFO - [2/12] SUCCESS: 2025-02-01 to 2025-02-28
...
All logs are saved in /DATA/datasets/era5/logs/:
era5_download_20260211_143015.log
Each log contains:
- Configuration summary
- Individual download status
- Error messages (if any)
- Final summary with success/failure counts
# View real-time log
tail -f /DATA/datasets/era5/logs/era5_download_*.log
# Check downloaded files
ls -lh /DATA/datasets/era5/pressure_levels/
ls -lh /DATA/datasets/era5/single_levels/
# Count successful downloads
ls /DATA/datasets/era5/pressure_levels/ | wc -l

Download the most recent month for operational WRF:
temporal:
start_date: "2025-02-01"
end_date: "2025-02-28"
times: ["00:00", "03:00", "06:00", "09:00", "12:00", "15:00", "18:00", "21:00"]
spatial:
area: [45, -10, 35, 5] # Your domain + margin
grid: null
download:
chunk_size: "week"
max_workers: 4
levels: [pressure, surface]

Download 10 years at 6-hourly resolution:
temporal:
start_date: "2010-01-01"
end_date: "2019-12-31"
times: ["00:00", "06:00", "12:00", "18:00"]
spatial:
area: [44, -10, 36, 4]
grid: "0.5/0.5" # Reduce size
download:
chunk_size: "month"
max_workers: 8
levels: [surface] # Surface only for climate stats

Download an extreme weather event with maximum detail:
temporal:
start_date: "2025-06-15"
end_date: "2025-06-17"
times: ["00:00", "01:00", ..., "23:00"] # Hourly
spatial:
area: [42, 0, 40, 3] # Small domain
grid: null # Native resolution
pressure_levels: ["1000", "925", "850", "700", "500", "300", "200"] # Key levels
download:
chunk_size: "day"
max_workers: 2
levels: [pressure, surface]

Error: Client is not available
Solutions:
- Check that ~/.cdsapirc exists and has the correct format
- Verify credentials are correct (copy from the CDS website)
- Check file permissions: chmod 600 ~/.cdsapirc
- Accept dataset license terms on the CDS website
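A minimal parser can confirm the file defines both required keys (a hypothetical diagnostic helper; cdsapi does its own parsing):

```python
def check_cdsapirc(text):
    """Confirm a .cdsapirc body defines both 'url' and 'key'.

    Hypothetical diagnostic helper, not part of era5_downloader.py;
    cdsapi performs its own parsing of this file.
    """
    entries = {}
    for line in text.splitlines():
        if ":" in line:
            k, v = line.split(":", 1)
            entries[k.strip()] = v.strip()
    missing = {"url", "key"} - entries.keys()
    if missing:
        raise ValueError(f".cdsapirc is missing: {sorted(missing)}")
    return entries

sample = "url: https://cds.climate.copernicus.eu/api\nkey: 12345:abcd-efgh"
print(check_cdsapirc(sample)["key"])  # 12345:abcd-efgh
```

To check the real file, pass it the contents of ~/.cdsapirc, e.g. `check_cdsapirc(Path.home().joinpath(".cdsapirc").read_text())`.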
Error: Request timeout or stuck in queue
Solutions:
- Reduce max_workers (CDS limits concurrent requests per user)
- Use a larger chunk_size (fewer, larger requests are more efficient)
- Try downloading during off-peak hours (avoid 9am-5pm CET)
- Check CDS status: https://cds.climate.copernicus.eu/live/queue
Error: No space left on device
Solutions:
- Check available space: df -h /DATA
- ERA5 data is large:
- 1 year hourly full domain ≈ 25-50 GB
- GRIB is ~40% smaller than NetCDF
- Use coarser grid or 6-hourly data
- Download in smaller time chunks
Symptoms: Downloads taking much longer than expected
Solutions:
- Check CDS load: High user demand slows everyone
- Reduce grid resolution: requesting native 0.25° data is slower than a regridded request
- Use monthly chunks: More efficient than daily for long periods
- Limit workers: Sometimes 2-3 workers faster than 8 (less queue contention)
- Check network: run speedtest-cli to verify your connection
Error: Invalid variable name
Solutions:
- Check variable names at: https://cds.climate.copernicus.eu/datasets
- Some variables are only available in certain datasets:
  - Wave variables: use reanalysis-era5-single-levels only
  - Model levels: use reanalysis-era5-complete
- Variable names are case-sensitive and use underscores
python era5_downloader.py -c config_wrf.yaml

Your downloaded files are ready for WPS ungrib:
cd /DATA/datasets/era5
# Link to WPS directory
ln -sf pressure_levels/*.grib /path/to/WPS/
ln -sf single_levels/*.grib /path/to/WPS/

In namelist.wps, set:
&ungrib
out_format = 'WPS',
prefix = 'ERA5',
/

cd /path/to/WPS
ln -sf ungrib/Variable_Tables/Vtable.ERA-interim.pl Vtable
# Note: the ERA-Interim Vtable works for ERA5
./link_grib.csh /DATA/datasets/era5/pressure_levels/era5_pl_*.grib
./link_grib.csh /DATA/datasets/era5/single_levels/era5_sl_*.grib
./ungrib.exe

- Dataset Overview: https://cds.climate.copernicus.eu/datasets/reanalysis-era5-pressure-levels
- ERA5 Paper: Hersbach et al. (2020), Q.J.R. Meteorol. Soc.
- Variable List: https://confluence.ecmwf.int/display/CKB/ERA5%3A+data+documentation
- WRF Users Guide: https://www2.mmm.ucar.edu/wrf/users/
- WPS Tutorial: https://www2.mmm.ucar.edu/wrf/OnLineTutorial/
- ERA5 for WRF: Check WRF forum for latest best practices
- API Documentation: https://cds.climate.copernicus.eu/how-to-api
- Python Examples: https://github.com/ecmwf/cdsapi
- Forum: https://forum.ecmwf.int/
If you encounter problems:
- Check the troubleshooting section above
- Review log files in /DATA/datasets/era5/logs/
- Test with the --dry-run flag
- Try a minimal configuration first
The script is designed to be easily customized:
- Modify the variables section for different applications
- Adjust chunk_size and max_workers for your environment
- Create new config files for different projects
- Add custom post-processing functions
For HPC environments, consider:
download:
max_workers: 8 # Increase on HPC
chunk_size: "month" # Efficient for clusters

And run with:
# Submit as SLURM job
sbatch --time=12:00:00 --cpus-per-task=8 download_era5.sh

If you use this tool in research, please cite:
- ERA5: Hersbach et al. (2020), The ERA5 global reanalysis. Q.J.R. Meteorol. Soc., 146: 1999-2049.
- Parallel Processing Approach: Based on methods by Akash Pathaikara (2024)
This tool is provided for research and operational use. ERA5 data is subject to Copernicus License.
Last Updated: February 2026
Version: 1.0
Compatible with: ERA5, ERA5-Land, ERA5.1
Python: 3.8+
Tested on: Linux/Unix HPC systems