ERA5 Parallel Downloader

Professional tool for downloading ERA5 reanalysis data with parallel processing, optimized for WRF/WPS preprocessing and HPC environments.

🌟 Features

  • ⚡ Parallel Processing: Download multiple time periods simultaneously using multiprocessing
  • 🔧 Flexible Configuration: YAML-based configuration for easy customization
  • 📊 Comprehensive Logging: Detailed logs for monitoring and debugging
  • 🔄 Resume Capability: Skip already downloaded files automatically
  • 🎯 WRF-Optimized: Pre-configured with standard WRF pressure levels and variables
  • 🚀 HPC Compatible: Designed for use on high-performance computing clusters
  • ⚠️ Error Handling: Robust error handling with detailed error reporting
  • 📈 Progress Tracking: Real-time progress updates during downloads

📋 Requirements

Python Packages

# Core dependencies
pip install cdsapi pyyaml

# Optional but recommended
pip install tqdm  # For enhanced progress bars

CDS API Setup

  1. Register at Climate Data Store: https://cds.climate.copernicus.eu/
  2. Accept Terms: Go to any ERA5 dataset page and accept the license terms
  3. Get API Key: Visit your CDS profile page at https://cds.climate.copernicus.eu/profile and copy your personal access token
  4. Configure your credentials:

Create ~/.cdsapirc file:

url: https://cds.climate.copernicus.eu/api
key: YOUR_API_TOKEN

Replace YOUR_API_TOKEN with the personal access token shown on your CDS profile page. (The legacy CDS API used key: UID:API_KEY; the current API endpoint at cds.climate.copernicus.eu/api expects the token alone.)

Important: Set proper permissions:

chmod 600 ~/.cdsapirc
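If authentication fails later, a quick local check of the file saves a round-trip to the CDS. The helper below is an illustrative sketch (the function name and checks are not part of the downloader); it only verifies the file's shape and permissions, not that the token itself is valid:

```python
import re
from pathlib import Path

def check_cdsapirc(path="~/.cdsapirc"):
    """Return a list of problems found in a .cdsapirc file (empty list = looks OK)."""
    p = Path(path).expanduser()
    if not p.exists():
        return [f"{p} does not exist"]
    problems = []
    text = p.read_text()
    if not re.search(r"^url:\s*https://\S+", text, re.M):
        problems.append("missing or malformed 'url:' line")
    if not re.search(r"^key:\s*\S+", text, re.M):
        problems.append("missing or malformed 'key:' line")
    # any group/other permission bits set means the file is too open
    if p.stat().st_mode & 0o077:
        problems.append(f"permissions too open; run: chmod 600 {p}")
    return problems
```

An empty result means the file at least has the expected two lines and `600` permissions; a non-empty list names what to fix.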

📁 Directory Structure

/DATA/datasets/era5/
├── pressure_levels/      # 3D atmospheric data (GRIB files)
├── single_levels/        # 2D surface data (GRIB files)
├── logs/                 # Download logs
├── temp/                 # Temporary files (auto-cleaned)
├── era5_downloader.py    # Main download script
├── config_wrf.yaml       # Configuration for WRF preprocessing
├── config_surface_only.yaml  # Surface data only config
└── README.txt            # This file

🚀 Quick Start

1. Basic Usage (WRF Preprocessing)

Download data for WRF simulation:

python era5_downloader.py -c config_wrf.yaml

2. Surface Data Only

For land surface or climate studies:

python era5_downloader.py -c config_surface_only.yaml

3. Dry Run (Test Configuration)

Validate your configuration without downloading:

python era5_downloader.py -c config_wrf.yaml --dry-run

4. Debug Mode

For detailed debugging information:

python era5_downloader.py -c config_wrf.yaml --log-level DEBUG

⚙️ Configuration Guide

Configuration File Structure

The YAML configuration file has 5 main sections:

1. Temporal Settings

temporal:
  start_date: "2025-01-01"
  end_date: "2025-12-31"
  times:
    - "00:00"
    - "06:00"
    - "12:00"
    - "18:00"

Tips:

  • Use hourly data (00:00-23:00) for high-resolution WRF runs
  • Use 6-hourly (00:00, 06:00, 12:00, 18:00) for faster downloads
  • Consider your WRF time step when choosing temporal resolution
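Rather than typing the times list by hand, it can be generated for any step that divides 24. A small sketch (the helper name is ours, not the script's):

```python
def times_list(step_hours):
    """Build the 'times' entries for the YAML config at a given step in hours."""
    if 24 % step_hours:
        raise ValueError("step_hours must divide 24")
    return [f"{h:02d}:00" for h in range(0, 24, step_hours)]

hourly = times_list(1)      # 24 entries, for high-resolution WRF runs
six_hourly = times_list(6)  # ["00:00", "06:00", "12:00", "18:00"]
```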

2. Spatial Settings

spatial:
  area: [60, -30, 20, 25]  # [North, West, South, East]
  grid: null  # Native 0.25° or "0.5/0.5", "1.0/1.0"
  pressure_levels: ["1000", "925", "850", ...]

Domain Guidelines:

  • WRF Domain 1: Should be 3-5° larger than your actual domain for proper boundary conditions
  • Nested Domains: Download for outermost domain only; WRF handles nesting
  • Grid Resolution:
    • null: Native 0.25° (best quality, larger files)
    • "0.5/0.5": Half degree (good balance)
    • "1.0/1.0": One degree (fastest, suitable for large domains)

Common Domains:

# Iberian Peninsula (good for Spanish WRF runs)
area: [44, -10, 36, 4]

# Western Mediterranean
area: [45, -5, 35, 15]

# Europe
area: [72, -25, 34, 40]

# Global
area: [90, -180, -90, 180]
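Because the [North, West, South, East] ordering is easy to get backwards, a quick sanity check before submitting a long download is worthwhile. The two helpers below are illustrative sketches (names and the 4° default margin are our assumptions, not part of the tool):

```python
def check_area(area):
    """Sanity-check a [North, West, South, East] box as used in the spatial config."""
    north, west, south, east = area
    assert -90 <= south < north <= 90, "need South < North within [-90, 90]"
    assert -180 <= west < east <= 180, "need West < East within [-180, 180]"

def with_margin(area, margin_deg=4.0):
    """Expand a WRF domain box for boundary conditions, clamped to the globe."""
    north, west, south, east = area
    return [min(north + margin_deg, 90), max(west - margin_deg, -180),
            max(south - margin_deg, -90), min(east + margin_deg, 180)]
```

For example, `with_margin([44, -10, 36, 4])` yields a box 4° larger on each side, matching the 3-5° guideline above.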

3. Variables

Pressure Levels (3D atmospheric data):

variables:
  pressure_levels:
    - geopotential           # Required for WRF
    - temperature            # Required for WRF
    - u_component_of_wind    # Required for WRF
    - v_component_of_wind    # Required for WRF
    - relative_humidity      # Required for WRF

Single Levels (2D surface data):

  single_levels:
    - 2m_temperature
    - surface_pressure       # Required for WRF
    - soil_temperature_level_1
    - volumetric_soil_water_layer_1
    # ... etc

Essential WRF Variables:

  • Pressure levels: geopotential, temperature, U/V winds, relative humidity
  • Surface: 2m temperature, surface pressure, soil temperatures/moisture, SST

All Available Variables: See https://cds.climate.copernicus.eu/datasets/reanalysis-era5-pressure-levels
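For orientation, a request the downloader might issue for one monthly chunk looks roughly like this. The field names below follow the "Show API request" button on the CDS dataset pages; treat them as a template to be checked against that page, not a verified schema of this script:

```python
# One monthly pressure-level chunk, assembled as a cdsapi request dict
request = {
    "product_type": "reanalysis",
    "variable": ["geopotential", "temperature",
                 "u_component_of_wind", "v_component_of_wind",
                 "relative_humidity"],
    "pressure_level": ["1000", "925", "850", "700", "500"],
    "year": "2025",
    "month": "01",
    "day": [f"{d:02d}" for d in range(1, 32)],
    "time": ["00:00", "06:00", "12:00", "18:00"],
    "area": [60, -30, 20, 25],  # North, West, South, East
    "format": "grib",
}

# With credentials configured, this would be submitted along the lines of:
# import cdsapi
# cdsapi.Client().retrieve("reanalysis-era5-pressure-levels", request,
#                          "era5_pl_202501.grib")
```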

4. Output Settings

output:
  base_dir: "/DATA/datasets/era5"
  format: "grib"  # or "netcdf"

Format Recommendations:

  • GRIB: Best for WRF (directly compatible with WPS ungrib)
  • NetCDF: Better for Python analysis (xarray, pandas)

5. Download Settings

download:
  levels:
    - pressure
    - surface
  chunk_size: "month"  # "day", "week", or "month"
  max_workers: 4
  skip_existing: true

Optimization Tips:

| Period Length | Chunk Size | Workers | Reasoning                          |
|---------------|------------|---------|------------------------------------|
| < 1 month     | day        | 2-3     | Fast turnaround, minimal overhead  |
| 1-6 months    | week       | 3-5     | Good balance                       |
| > 6 months    | month      | 4-8     | Most efficient for long periods    |

Worker Guidelines:

  • Local PC: 2-4 workers (avoid overwhelming CDS)
  • HPC: 4-8 workers (check cluster policy)
  • Note: CDS API has rate limits; more workers doesn't always mean faster
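The interaction between chunk_size and max_workers can be sketched as a chunk generator feeding a process pool. This is a minimal illustration of the pattern, not the script's actual internals; the stub worker just echoes its period where the real code would call cdsapi:

```python
import calendar
from concurrent.futures import ProcessPoolExecutor
from datetime import date, timedelta

def month_chunks(start, end):
    """Split [start, end] into per-month (first_day, last_day) pairs,
    mirroring chunk_size: "month"."""
    chunks, cur = [], start
    while cur <= end:
        last = date(cur.year, cur.month,
                    calendar.monthrange(cur.year, cur.month)[1])
        chunks.append((cur, min(last, end)))
        cur = last + timedelta(days=1)
    return chunks

def download_chunk(chunk):
    first, last = chunk
    # a real worker would build and submit a cdsapi request here
    return f"{first} to {last}"

if __name__ == "__main__":
    chunks = month_chunks(date(2025, 1, 1), date(2025, 12, 31))  # 12 chunks
    with ProcessPoolExecutor(max_workers=4) as pool:
        for msg in pool.map(download_chunk, chunks):
            print("SUCCESS:", msg)
```

A full year at chunk_size "month" yields 12 chunks, matching the "Generated 12 date chunks" log line shown later.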

📊 Understanding Download Performance

Expected Download Times

Approximate times for different configurations (depends on CDS load):

| Configuration               | Data Volume | Time (4 workers) |
|-----------------------------|-------------|------------------|
| 3 days, full domain, hourly | ~500 MB     | 15-30 min        |
| 1 month, region, hourly     | ~2 GB       | 1-2 hours        |
| 1 year, region, hourly      | ~25 GB      | 8-12 hours       |
| 1 year, region, 6-hourly    | ~10 GB      | 3-6 hours        |
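Before committing disk space, a back-of-the-envelope volume estimate can be computed from the request dimensions. This is an order-of-magnitude sketch only (the ~2 bytes/value assumes typical GRIB packing; real sizes depend on packing and variable mix and can differ severalfold):

```python
def rough_size_gb(days, times_per_day, n_levels, n_vars, area,
                  grid_deg=0.25, bytes_per_value=2):
    """Very rough GRIB volume estimate for one request configuration."""
    north, west, south, east = area
    nlat = int((north - south) / grid_deg) + 1
    nlon = int((east - west) / grid_deg) + 1
    n_values = days * times_per_day * n_levels * n_vars * nlat * nlon
    return n_values * bytes_per_value / 1e9
```

The estimate scales linearly in every dimension, so halving temporal resolution or doubling grid spacing gives a predictable reduction.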

Optimization Strategies

For Full Year Downloads:

temporal:
  times: ["00:00", "06:00", "12:00", "18:00"]  # 6-hourly instead of hourly
spatial:
  grid: "0.5/0.5"  # Coarser grid
download:
  chunk_size: "month"
  max_workers: 6

For High-Resolution Short Periods:

temporal:
  times: ["00:00", "01:00", ..., "23:00"]  # All hours
spatial:
  grid: null  # Native 0.25°
download:
  chunk_size: "day"
  max_workers: 3

🔍 Monitoring Downloads

Real-time Progress

The script provides detailed logging:

2026-02-11 14:30:15 - INFO - ERA5 PARALLEL DOWNLOAD STARTING
2026-02-11 14:30:15 - INFO - Period: 2025-01-01 to 2025-12-31
2026-02-11 14:30:15 - INFO - Generated 12 date chunks for download
2026-02-11 14:30:15 - INFO - DOWNLOADING PRESSURE LEVELS
2026-02-11 14:32:45 - INFO - [1/12] SUCCESS: 2025-01-01 to 2025-01-31
2026-02-11 14:34:20 - INFO - [2/12] SUCCESS: 2025-02-01 to 2025-02-28
...

Log Files

All logs are saved in /DATA/datasets/era5/logs/:

era5_download_20260211_143015.log

Each log contains:

  • Configuration summary
  • Individual download status
  • Error messages (if any)
  • Final summary with success/failure counts

Checking Download Status

# View real-time log
tail -f /DATA/datasets/era5/logs/era5_download_*.log

# Check downloaded files
ls -lh /DATA/datasets/era5/pressure_levels/
ls -lh /DATA/datasets/era5/single_levels/

# Count successful downloads
ls /DATA/datasets/era5/pressure_levels/ | wc -l
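For a programmatic summary, the SUCCESS/FAILED lines in the logs can be counted from Python. A small sketch (the helper is ours; adjust the patterns if your log format differs from the example shown above):

```python
import re

def summarize_log(text):
    """Count chunk outcomes in a downloader log."""
    return {
        "success": len(re.findall(r"\bSUCCESS\b", text)),
        "failed": len(re.findall(r"\bFAILED\b", text)),
    }

# e.g. summarize_log(Path("logs/era5_download_20260211_143015.log").read_text())
```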

🛠️ Common Use Cases

Use Case 1: WRF Real-Time Forecasting

Download most recent month for operational WRF:

temporal:
  start_date: "2025-02-01"
  end_date: "2025-02-28"
  times: ["00:00", "03:00", "06:00", "09:00", "12:00", "15:00", "18:00", "21:00"]

spatial:
  area: [45, -10, 35, 5]  # Your domain + margin
  grid: null

download:
  chunk_size: "week"
  max_workers: 4
  levels: [pressure, surface]

Use Case 2: Climate Study (Long Period)

Download 10 years at 6-hourly resolution:

temporal:
  start_date: "2010-01-01"
  end_date: "2019-12-31"
  times: ["00:00", "06:00", "12:00", "18:00"]

spatial:
  area: [44, -10, 36, 4]
  grid: "0.5/0.5"  # Reduce size

download:
  chunk_size: "month"
  max_workers: 8
  levels: [surface]  # Surface only for climate stats

Use Case 3: High-Resolution Event Study

Download extreme weather event with maximum detail:

temporal:
  start_date: "2025-06-15"
  end_date: "2025-06-17"
  times: ["00:00", "01:00", ..., "23:00"]  # Hourly

spatial:
  area: [42, 0, 40, 3]  # Small domain
  grid: null  # Native resolution
  pressure_levels: ["1000", "925", "850", "700", "500", "300", "200"]  # Key levels

download:
  chunk_size: "day"
  max_workers: 2
  levels: [pressure, surface]

⚠️ Troubleshooting

Issue 1: CDS API Authentication Error

Error: Client is not available

Solutions:

  1. Check ~/.cdsapirc exists and has correct format
  2. Verify credentials are correct (copy from CDS website)
  3. Check file permissions: chmod 600 ~/.cdsapirc
  4. Accept dataset license terms on CDS website

Issue 2: Request Queue Timeout

Error: Request timeout or stuck in queue

Solutions:

  1. Reduce max_workers (CDS limits concurrent requests per user)
  2. Use larger chunk_size (fewer, larger requests are more efficient)
  3. Try downloading during off-peak hours (avoid 9am-5pm CET)
  4. Check CDS status: https://cds.climate.copernicus.eu/live/queue

Issue 3: Disk Space

Error: No space left on device

Solutions:

  1. Check available space: df -h /DATA
  2. ERA5 data is large:
    • 1 year hourly full domain ≈ 25-50 GB
    • GRIB is ~40% smaller than NetCDF
  3. Use a coarser grid or 6-hourly data
  4. Download in smaller time chunks
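A pre-flight space check can also be scripted with the standard library. A minimal sketch (function names and the 2x safety factor, which allows for temporary files, are our assumptions):

```python
import shutil

def free_gb(path="/DATA"):
    """Free space in GB at path (roughly the figure df reports)."""
    return shutil.disk_usage(path).free / 1e9

def enough_space(path, needed_gb):
    # keep a safety margin: temp files can briefly inflate the footprint
    return free_gb(path) > 2 * needed_gb
```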

Issue 4: Slow Downloads

Symptoms: Downloads taking much longer than expected

Solutions:

  1. Check CDS load: High user demand slows everyone
  2. Reduce grid resolution: requesting the native 0.25° grid is slower than a regridded 0.5° or 1.0° product
  3. Use monthly chunks: More efficient than daily for long periods
  4. Limit workers: 2-3 workers are sometimes faster than 8 (less queue contention)
  5. Check network: Run speedtest-cli to verify connection

Issue 5: Missing Variables

Error: Invalid variable name

Solutions:

  1. Check variable names at: https://cds.climate.copernicus.eu/datasets
  2. Some variables only available in certain datasets:
    • Wave variables: use reanalysis-era5-single-levels only
    • Model levels: use reanalysis-era5-complete
  3. Variable names are case-sensitive and use underscores

🔄 Integration with WRF/WPS

Step 1: Download ERA5 Data

python era5_downloader.py -c config_wrf.yaml

Step 2: Prepare for WPS

Your downloaded files are ready for WPS ungrib:

cd /DATA/datasets/era5

# Link to WPS directory
ln -sf pressure_levels/*.grib /path/to/WPS/
ln -sf single_levels/*.grib /path/to/WPS/

Step 3: Configure WPS

In namelist.wps, set:

&ungrib
 out_format = 'WPS',
 prefix = 'ERA5',
/

Step 4: Link Vtable

cd /path/to/WPS
ln -sf ungrib/Variable_Tables/Vtable.ERA-interim.pl Vtable
# Note: ERA-interim Vtable works for ERA5

Step 5: Run Ungrib

# Pass both globs in a single call: a second link_grib.csh invocation
# would overwrite the GRIBFILE.* links created by the first
./link_grib.csh /DATA/datasets/era5/pressure_levels/era5_pl_*.grib \
                /DATA/datasets/era5/single_levels/era5_sl_*.grib
./ungrib.exe

📚 Additional Resources

  • ERA5 Documentation
  • WRF Documentation
  • CDS API

🤝 Contributing & Support

Reporting Issues

If you encounter problems:

  1. Check the troubleshooting section above
  2. Review log files in /DATA/datasets/era5/logs/
  3. Test with --dry-run flag
  4. Try a minimal configuration first

Customization

The script is designed to be easily customized:

  • Modify variables section for different applications
  • Adjust chunk_size and max_workers for your environment
  • Create new config files for different projects
  • Add custom post-processing functions

Performance Tuning

For HPC environments, consider:

download:
  max_workers: 8  # Increase on HPC
  chunk_size: "month"  # Efficient for clusters

And run with:

# Submit as SLURM job
sbatch --time=12:00:00 --cpus-per-task=8 download_era5.sh

📝 Citation

If you use this tool in research, please cite:

  • ERA5: Hersbach et al. (2020), The ERA5 global reanalysis. Q.J.R. Meteorol. Soc., 146: 1999-2049.
  • Parallel Processing Approach: Based on methods by Akash Pathaikara (2024)

📄 License

This tool is provided for research and operational use. ERA5 data is subject to Copernicus License.


Last Updated: February 2026
Version: 1.0
Compatible with: ERA5, ERA5-Land, ERA5.1
Python: 3.8+
Tested on: Linux/Unix HPC systems

About

ERA5 download tool of the UB Meteorology Group (Grup Meteorologia UB)
