ERA5 Parallel Downloader

Professional tool for downloading ERA5 reanalysis data with parallel processing, optimized for WRF/WPS preprocessing and HPC environments.

🌟 Features

  • ⚡ Parallel Processing: Download multiple time periods simultaneously using multiprocessing
  • 🔧 Flexible Configuration: YAML-based configuration for easy customization
  • 📊 Comprehensive Logging: Detailed logs for monitoring and debugging
  • 🔄 Resume Capability: Skip already downloaded files automatically
  • 🎯 WRF-Optimized: Pre-configured with standard WRF pressure levels and variables
  • 🚀 HPC Compatible: Designed for use on high-performance computing clusters
  • ⚠️ Error Handling: Robust error handling with detailed error reporting
  • 📈 Progress Tracking: Real-time progress updates during downloads

📋 Requirements

Python Packages

# Core dependencies
pip install cdsapi pyyaml

# Optional but recommended
pip install tqdm  # For enhanced progress bars

CDS API Setup

  1. Register at Climate Data Store: https://cds.climate.copernicus.eu/
  2. Accept Terms: Go to any ERA5 dataset page and accept the license terms
  3. Get API Key: Visit your CDS profile page at https://cds.climate.copernicus.eu/profile and copy your personal access token
  4. Configure your credentials:

Create ~/.cdsapirc file:

url: https://cds.climate.copernicus.eu/api
key: YOUR_API_TOKEN

Replace YOUR_API_TOKEN with the personal access token shown on your CDS profile page. (The legacy CDS API used key: UID:API_KEY; the current API endpoint at cds.climate.copernicus.eu/api expects the token alone.)

Important: Set proper permissions:

chmod 600 ~/.cdsapirc
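If authentication fails later, a quick local check of the file saves a round-trip to the CDS. The helper below is an illustrative sketch (the function name and checks are not part of the downloader); it only verifies the file's shape and permissions, not that the token itself is valid:

```python
import re
from pathlib import Path

def check_cdsapirc(path="~/.cdsapirc"):
    """Return a list of problems found in a .cdsapirc file (empty list = looks OK)."""
    p = Path(path).expanduser()
    if not p.exists():
        return [f"{p} does not exist"]
    problems = []
    text = p.read_text()
    if not re.search(r"^url:\s*https://\S+", text, re.M):
        problems.append("missing or malformed 'url:' line")
    if not re.search(r"^key:\s*\S+", text, re.M):
        problems.append("missing or malformed 'key:' line")
    # any group/other permission bits set means the file is too open
    if p.stat().st_mode & 0o077:
        problems.append(f"permissions too open; run: chmod 600 {p}")
    return problems
```

An empty result means the file at least has the expected two lines and `600` permissions; a non-empty list names what to fix.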

📁 Directory Structure

/DATA/datasets/era5/
├── pressure_levels/      # 3D atmospheric data (GRIB files)
├── single_levels/        # 2D surface data (GRIB files)
├── logs/                 # Download logs
├── temp/                 # Temporary files (auto-cleaned)
├── era5_downloader.py    # Main download script
├── config_wrf.yaml       # Configuration for WRF preprocessing
├── config_surface_only.yaml  # Surface data only config
└── README.txt            # This file

🚀 Quick Start

1. Basic Usage (WRF Preprocessing)

Download data for WRF simulation:

python era5_downloader.py -c config_wrf.yaml

2. Surface Data Only

For land surface or climate studies:

python era5_downloader.py -c config_surface_only.yaml

3. Dry Run (Test Configuration)

Validate your configuration without downloading:

python era5_downloader.py -c config_wrf.yaml --dry-run

4. Debug Mode

For detailed debugging information:

python era5_downloader.py -c config_wrf.yaml --log-level DEBUG

⚙️ Configuration Guide

Configuration File Structure

The YAML configuration file has 5 main sections:

1. Temporal Settings

temporal:
  start_date: "2025-01-01"
  end_date: "2025-12-31"
  times:
    - "00:00"
    - "06:00"
    - "12:00"
    - "18:00"

Tips:

  • Use hourly data (00:00-23:00) for high-resolution WRF runs
  • Use 6-hourly (00:00, 06:00, 12:00, 18:00) for faster downloads
  • Consider your WRF time step when choosing temporal resolution
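Rather than typing the times list by hand, it can be generated for any step that divides 24. A small sketch (the helper name is ours, not the script's):

```python
def times_list(step_hours):
    """Build the 'times' entries for the YAML config at a given step in hours."""
    if 24 % step_hours:
        raise ValueError("step_hours must divide 24")
    return [f"{h:02d}:00" for h in range(0, 24, step_hours)]

hourly = times_list(1)      # 24 entries, for high-resolution WRF runs
six_hourly = times_list(6)  # ["00:00", "06:00", "12:00", "18:00"]
```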

2. Spatial Settings

spatial:
  area: [60, -30, 20, 25]  # [North, West, South, East]
  grid: null  # Native 0.25° or "0.5/0.5", "1.0/1.0"
  pressure_levels: ["1000", "925", "850", ...]

Domain Guidelines:

  • WRF Domain 1: Should be 3-5° larger than your actual domain for proper boundary conditions
  • Nested Domains: Download for outermost domain only; WRF handles nesting
  • Grid Resolution:
    • null: Native 0.25° (best quality, larger files)
    • "0.5/0.5": Half degree (good balance)
    • "1.0/1.0": One degree (fastest, suitable for large domains)

Common Domains:

# Iberian Peninsula (good for Spanish WRF runs)
area: [44, -10, 36, 4]

# Western Mediterranean
area: [45, -5, 35, 15]

# Europe
area: [72, -25, 34, 40]

# Global
area: [90, -180, -90, 180]
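Because the [North, West, South, East] ordering is easy to get backwards, a quick sanity check before submitting a long download is worthwhile. The two helpers below are illustrative sketches (names and the 4° default margin are our assumptions, not part of the tool):

```python
def check_area(area):
    """Sanity-check a [North, West, South, East] box as used in the spatial config."""
    north, west, south, east = area
    assert -90 <= south < north <= 90, "need South < North within [-90, 90]"
    assert -180 <= west < east <= 180, "need West < East within [-180, 180]"

def with_margin(area, margin_deg=4.0):
    """Expand a WRF domain box for boundary conditions, clamped to the globe."""
    north, west, south, east = area
    return [min(north + margin_deg, 90), max(west - margin_deg, -180),
            max(south - margin_deg, -90), min(east + margin_deg, 180)]
```

For example, `with_margin([44, -10, 36, 4])` yields a box 4° larger on each side, matching the 3-5° guideline above.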

3. Variables

Pressure Levels (3D atmospheric data):

variables:
  pressure_levels:
    - geopotential           # Required for WRF
    - temperature            # Required for WRF
    - u_component_of_wind    # Required for WRF
    - v_component_of_wind    # Required for WRF
    - relative_humidity      # Required for WRF

Single Levels (2D surface data):

  single_levels:
    - 2m_temperature
    - surface_pressure       # Required for WRF
    - soil_temperature_level_1
    - volumetric_soil_water_layer_1
    # ... etc

Essential WRF Variables:

  • Pressure levels: geopotential, temperature, U/V winds, relative humidity
  • Surface: 2m temperature, surface pressure, soil temperatures/moisture, SST

All Available Variables: See https://cds.climate.copernicus.eu/datasets/reanalysis-era5-pressure-levels
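For orientation, a request the downloader might issue for one monthly chunk looks roughly like this. The field names below follow the "Show API request" button on the CDS dataset pages; treat them as a template to be checked against that page, not a verified schema of this script:

```python
# One monthly pressure-level chunk, assembled as a cdsapi request dict
request = {
    "product_type": "reanalysis",
    "variable": ["geopotential", "temperature",
                 "u_component_of_wind", "v_component_of_wind",
                 "relative_humidity"],
    "pressure_level": ["1000", "925", "850", "700", "500"],
    "year": "2025",
    "month": "01",
    "day": [f"{d:02d}" for d in range(1, 32)],
    "time": ["00:00", "06:00", "12:00", "18:00"],
    "area": [60, -30, 20, 25],  # North, West, South, East
    "format": "grib",
}

# With credentials configured, this would be submitted along the lines of:
# import cdsapi
# cdsapi.Client().retrieve("reanalysis-era5-pressure-levels", request,
#                          "era5_pl_202501.grib")
```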

4. Output Settings

output:
  base_dir: "/DATA/datasets/era5"
  format: "grib"  # or "netcdf"

Format Recommendations:

  • GRIB: Best for WRF (directly compatible with WPS ungrib)
  • NetCDF: Better for Python analysis (xarray, pandas)

5. Download Settings

download:
  levels:
    - pressure
    - surface
  chunk_size: "month"  # "day", "week", or "month"
  max_workers: 4
  skip_existing: true

Optimization Tips:

| Period Length | Chunk Size | Workers | Reasoning                          |
|---------------|------------|---------|------------------------------------|
| < 1 month     | day        | 2-3     | Fast turnaround, minimal overhead  |
| 1-6 months    | week       | 3-5     | Good balance                       |
| > 6 months    | month      | 4-8     | Most efficient for long periods    |

Worker Guidelines:

  • Local PC: 2-4 workers (avoid overwhelming CDS)
  • HPC: 4-8 workers (check cluster policy)
  • Note: CDS API has rate limits; more workers doesn't always mean faster
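The interaction between chunk_size and max_workers can be sketched as a chunk generator feeding a process pool. This is a minimal illustration of the pattern, not the script's actual internals; the stub worker just echoes its period where the real code would call cdsapi:

```python
import calendar
from concurrent.futures import ProcessPoolExecutor
from datetime import date, timedelta

def month_chunks(start, end):
    """Split [start, end] into per-month (first_day, last_day) pairs,
    mirroring chunk_size: "month"."""
    chunks, cur = [], start
    while cur <= end:
        last = date(cur.year, cur.month,
                    calendar.monthrange(cur.year, cur.month)[1])
        chunks.append((cur, min(last, end)))
        cur = last + timedelta(days=1)
    return chunks

def download_chunk(chunk):
    first, last = chunk
    # a real worker would build and submit a cdsapi request here
    return f"{first} to {last}"

if __name__ == "__main__":
    chunks = month_chunks(date(2025, 1, 1), date(2025, 12, 31))  # 12 chunks
    with ProcessPoolExecutor(max_workers=4) as pool:
        for msg in pool.map(download_chunk, chunks):
            print("SUCCESS:", msg)
```

A full year at chunk_size "month" yields 12 chunks, matching the "Generated 12 date chunks" log line shown later.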

📊 Understanding Download Performance

Expected Download Times

Approximate times for different configurations (depends on CDS load):

| Configuration               | Data Volume | Time (4 workers) |
|-----------------------------|-------------|------------------|
| 3 days, full domain, hourly | ~500 MB     | 15-30 min        |
| 1 month, region, hourly     | ~2 GB       | 1-2 hours        |
| 1 year, region, hourly      | ~25 GB      | 8-12 hours       |
| 1 year, region, 6-hourly    | ~10 GB      | 3-6 hours        |
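Before committing disk space, a back-of-the-envelope volume estimate can be computed from the request dimensions. This is an order-of-magnitude sketch only (the ~2 bytes/value assumes typical GRIB packing; real sizes depend on packing and variable mix and can differ severalfold):

```python
def rough_size_gb(days, times_per_day, n_levels, n_vars, area,
                  grid_deg=0.25, bytes_per_value=2):
    """Very rough GRIB volume estimate for one request configuration."""
    north, west, south, east = area
    nlat = int((north - south) / grid_deg) + 1
    nlon = int((east - west) / grid_deg) + 1
    n_values = days * times_per_day * n_levels * n_vars * nlat * nlon
    return n_values * bytes_per_value / 1e9
```

The estimate scales linearly in every dimension, so halving temporal resolution or doubling grid spacing gives a predictable reduction.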

Optimization Strategies

For Full Year Downloads:

temporal:
  times: ["00:00", "06:00", "12:00", "18:00"]  # 6-hourly instead of hourly
spatial:
  grid: "0.5/0.5"  # Coarser grid
download:
  chunk_size: "month"
  max_workers: 6

For High-Resolution Short Periods:

temporal:
  times: ["00:00", "01:00", ..., "23:00"]  # All hours
spatial:
  grid: null  # Native 0.25°
download:
  chunk_size: "day"
  max_workers: 3

🔍 Monitoring Downloads

Real-time Progress

The script provides detailed logging:

2026-02-11 14:30:15 - INFO - ERA5 PARALLEL DOWNLOAD STARTING
2026-02-11 14:30:15 - INFO - Period: 2025-01-01 to 2025-12-31
2026-02-11 14:30:15 - INFO - Generated 12 date chunks for download
2026-02-11 14:30:15 - INFO - DOWNLOADING PRESSURE LEVELS
2026-02-11 14:32:45 - INFO - [1/12] SUCCESS: 2025-01-01 to 2025-01-31
2026-02-11 14:34:20 - INFO - [2/12] SUCCESS: 2025-02-01 to 2025-02-28
...

Log Files

All logs are saved in /DATA/datasets/era5/logs/:

era5_download_20260211_143015.log

Each log contains:

  • Configuration summary
  • Individual download status
  • Error messages (if any)
  • Final summary with success/failure counts

Checking Download Status

# View real-time log
tail -f /DATA/datasets/era5/logs/era5_download_*.log

# Check downloaded files
ls -lh /DATA/datasets/era5/pressure_levels/
ls -lh /DATA/datasets/era5/single_levels/

# Count successful downloads
ls /DATA/datasets/era5/pressure_levels/ | wc -l
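For a programmatic summary, the SUCCESS/FAILED lines in the logs can be counted from Python. A small sketch (the helper is ours; adjust the patterns if your log format differs from the example shown above):

```python
import re

def summarize_log(text):
    """Count chunk outcomes in a downloader log."""
    return {
        "success": len(re.findall(r"\bSUCCESS\b", text)),
        "failed": len(re.findall(r"\bFAILED\b", text)),
    }

# e.g. summarize_log(Path("logs/era5_download_20260211_143015.log").read_text())
```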

🛠️ Common Use Cases

Use Case 1: WRF Real-Time Forecasting

Download most recent month for operational WRF:

temporal:
  start_date: "2025-02-01"
  end_date: "2025-02-28"
  times: ["00:00", "03:00", "06:00", "09:00", "12:00", "15:00", "18:00", "21:00"]

spatial:
  area: [45, -10, 35, 5]  # Your domain + margin
  grid: null

download:
  chunk_size: "week"
  max_workers: 4
  levels: [pressure, surface]

Use Case 2: Climate Study (Long Period)

Download 10 years at 6-hourly resolution:

temporal:
  start_date: "2010-01-01"
  end_date: "2019-12-31"
  times: ["00:00", "06:00", "12:00", "18:00"]

spatial:
  area: [44, -10, 36, 4]
  grid: "0.5/0.5"  # Reduce size

download:
  chunk_size: "month"
  max_workers: 8
  levels: [surface]  # Surface only for climate stats

Use Case 3: High-Resolution Event Study

Download extreme weather event with maximum detail:

temporal:
  start_date: "2025-06-15"
  end_date: "2025-06-17"
  times: ["00:00", "01:00", ..., "23:00"]  # Hourly

spatial:
  area: [42, 0, 40, 3]  # Small domain
  grid: null  # Native resolution
  pressure_levels: ["1000", "925", "850", "700", "500", "300", "200"]  # Key levels

download:
  chunk_size: "day"
  max_workers: 2
  levels: [pressure, surface]

⚠️ Troubleshooting

Issue 1: CDS API Authentication Error

Error: Client is not available

Solutions:

  1. Check ~/.cdsapirc exists and has correct format
  2. Verify credentials are correct (copy from CDS website)
  3. Check file permissions: chmod 600 ~/.cdsapirc
  4. Accept dataset license terms on CDS website

Issue 2: Request Queue Timeout

Error: Request timeout or stuck in queue

Solutions:

  1. Reduce max_workers (CDS limits concurrent requests per user)
  2. Use larger chunk_size (fewer, larger requests are more efficient)
  3. Try downloading during off-peak hours (avoid 9am-5pm CET)
  4. Check CDS status: https://cds.climate.copernicus.eu/live/queue

Issue 3: Disk Space

Error: No space left on device

Solutions:

  1. Check available space: df -h /DATA
  2. ERA5 data is large:
    • 1 year hourly full domain ≈ 25-50 GB
    • GRIB is ~40% smaller than NetCDF
  3. Use a coarser grid or 6-hourly data
  4. Download in smaller time chunks
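A pre-flight space check can also be scripted with the standard library. A minimal sketch (function names and the 2x safety factor, which allows for temporary files, are our assumptions):

```python
import shutil

def free_gb(path="/DATA"):
    """Free space in GB at path (roughly the figure df reports)."""
    return shutil.disk_usage(path).free / 1e9

def enough_space(path, needed_gb):
    # keep a safety margin: temp files can briefly inflate the footprint
    return free_gb(path) > 2 * needed_gb
```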

Issue 4: Slow Downloads

Symptoms: Downloads taking much longer than expected

Solutions:

  1. Check CDS load: High user demand slows everyone
  2. Reduce grid resolution: requesting the native 0.25° grid is slower than a regridded 0.5° or 1.0° product
  3. Use monthly chunks: More efficient than daily for long periods
  4. Limit workers: 2-3 workers are sometimes faster than 8 (less queue contention)
  5. Check network: Run speedtest-cli to verify connection

Issue 5: Missing Variables

Error: Invalid variable name

Solutions:

  1. Check variable names at: https://cds.climate.copernicus.eu/datasets
  2. Some variables only available in certain datasets:
    • Wave variables: use reanalysis-era5-single-levels only
    • Model levels: use reanalysis-era5-complete
  3. Variable names are case-sensitive and use underscores

🔄 Integration with WRF/WPS

Step 1: Download ERA5 Data

python era5_downloader.py -c config_wrf.yaml

Step 2: Prepare for WPS

Your downloaded files are ready for WPS ungrib:

cd /DATA/datasets/era5

# Link to WPS directory
ln -sf pressure_levels/*.grib /path/to/WPS/
ln -sf single_levels/*.grib /path/to/WPS/

Step 3: Configure WPS

In namelist.wps, set:

&ungrib
 out_format = 'WPS',
 prefix = 'ERA5',
/

Step 4: Link Vtable

cd /path/to/WPS
ln -sf ungrib/Variable_Tables/Vtable.ERA-interim.pl Vtable
# Note: ERA-interim Vtable works for ERA5

Step 5: Run Ungrib

# Pass both globs in a single call: a second link_grib.csh invocation
# would overwrite the GRIBFILE.* links created by the first
./link_grib.csh /DATA/datasets/era5/pressure_levels/era5_pl_*.grib \
                /DATA/datasets/era5/single_levels/era5_sl_*.grib
./ungrib.exe

📚 Additional Resources

  • ERA5 Documentation
  • WRF Documentation
  • CDS API

🤝 Contributing & Support

Reporting Issues

If you encounter problems:

  1. Check the troubleshooting section above
  2. Review log files in /DATA/datasets/era5/logs/
  3. Test with --dry-run flag
  4. Try a minimal configuration first

Customization

The script is designed to be easily customized:

  • Modify variables section for different applications
  • Adjust chunk_size and max_workers for your environment
  • Create new config files for different projects
  • Add custom post-processing functions

Performance Tuning

For HPC environments, consider:

download:
  max_workers: 8  # Increase on HPC
  chunk_size: "month"  # Efficient for clusters

And run with:

# Submit as SLURM job
sbatch --time=12:00:00 --cpus-per-task=8 download_era5.sh

📝 Citation

If you use this tool in research, please cite:

  • ERA5: Hersbach et al. (2020), The ERA5 global reanalysis. Q.J.R. Meteorol. Soc., 146: 1999-2049.
  • Parallel Processing Approach: Based on methods by Akash Pathaikara (2024)

📄 License

This tool is provided for research and operational use. ERA5 data is subject to Copernicus License.


Last Updated: February 2026
Version: 1.0
Compatible with: ERA5, ERA5-Land, ERA5.1
Python: 3.8+
Tested on: Linux/Unix HPC systems

About

ERA5 download tool of the UB Meteorology Group (Grup Meteorologia UB)
