Optimize AIFS deterministic forecast chunk/shard layout and fix metadata #474

Closed
mrshll wants to merge 1 commit into ecmwf-aifs from claude/review-pr-449-fixes-aHhkp

@mrshll commented Feb 27, 2026

Summary

This PR optimizes the Zarr chunk and shard layout for the ECMWF AIFS deterministic forecast dataset and corrects variable metadata to align with CF conventions and dataset naming standards.

Key Changes

Chunk and Shard Optimization:

  • Changed init_time chunks from 4 to 1 (one initialization per shard, per forecast dataset best practices)
  • Increased spatial chunk sizes from 64×64 to 240×240 pixels for better I/O efficiency
  • Updated shard dimensions from 28×384×384 to 1×720×720, reducing compressed shard size from ~192 MB to ~24 MB
  • Reduced the number of chunks per shard along each spatial dimension from 6 to 3 (3 × 240 = 720), which keeps 2 shards across the 721-point latitude dimension
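
The shard arithmetic in the list above can be checked with a short sketch. The helper names shard_edge and shards_needed are hypothetical and not part of the repository. The element-count comparison assumes that any dimension not listed in the shard shapes is unchanged and that the compression ratio stays roughly constant.

```python
import math


def shard_edge(chunk: int, chunks_per_shard: int) -> int:
    """Shard length along one dimension, in grid points."""
    return chunk * chunks_per_shard


def shards_needed(dim_len: int, shard: int) -> int:
    """Number of shards needed to cover a dimension (the last may be partial)."""
    return math.ceil(dim_len / shard)


LATITUDE_POINTS = 721  # latitude dimension length quoted in the PR description

# Old layout: 64-point chunks, 6 chunks per shard along each spatial dimension.
old_edge = shard_edge(chunk=64, chunks_per_shard=6)
# New layout: 240-point chunks, 3 chunks per shard along each spatial dimension.
new_edge = shard_edge(chunk=240, chunks_per_shard=3)

old_lat_shards = shards_needed(LATITUDE_POINTS, old_edge)
new_lat_shards = shards_needed(LATITUDE_POINTS, new_edge)

# Elements per shard over the three dimensions listed in the PR description.
old_elems = 28 * 384 * 384
new_elems = 1 * 720 * 720
ratio = old_elems / new_elems  # compare with the quoted ~192 MB -> ~24 MB

print(old_edge, new_edge, old_lat_shards, new_lat_shards, round(ratio, 2))
```

Both layouts need two latitude shards. In the new layout the second shard holds only the final latitude row (721 − 720 = 1). The element count per shard drops by a factor of about 8, which is in line with the quoted reduction in compressed size.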

Metadata Corrections:

  • Simplified dataset ID from ecmwf-aifs-deterministic-forecast-15-day-0-25-degree to ecmwf-aifs-deterministic-forecast (removes redundant resolution/duration info)
  • Simplified dataset name to ECMWF AIFS deterministic forecast (lowercase, removed resolution/duration)
  • Fixed precipitable_water_atmosphere variable metadata:
    • Changed short_name from tcw (Total Column Water) to pwat (Precipitable Water)
    • Changed long_name from Total column water to Precipitable water
    • This aligns with CF conventions and NOAA dataset naming standards
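
As an illustration of the consistency goal, the following sketch compares per-dataset attributes for precipitable_water_atmosphere against the values named above. The dictionary layout, the mismatches helper, and the "some-other-forecast" dataset ID are all hypothetical; the repository's actual metadata structures are not shown in this PR description.

```python
# Target attribute values named in the PR description.
EXPECTED = {"short_name": "pwat", "long_name": "Precipitable water"}

# Hypothetical stand-in for per-dataset metadata of precipitable_water_atmosphere.
attrs_by_dataset = {
    "ecmwf-aifs-deterministic-forecast": {
        "short_name": "tcw",
        "long_name": "Total column water",
    },
    "some-other-forecast": {  # made-up dataset ID for illustration
        "short_name": "pwat",
        "long_name": "Precipitable water",
    },
}


def mismatches(attrs_by_dataset, expected):
    """Return {dataset_id: {attr: (found, expected)}} for attributes that differ."""
    result = {}
    for dataset_id, attrs in attrs_by_dataset.items():
        diff = {
            key: (attrs.get(key), value)
            for key, value in expected.items()
            if attrs.get(key) != value
        }
        if diff:
            result[dataset_id] = diff
    return result


before = mismatches(attrs_by_dataset, EXPECTED)

# Apply the correction described in the PR, then re-check.
attrs_by_dataset["ecmwf-aifs-deterministic-forecast"].update(EXPECTED)
after = mismatches(attrs_by_dataset, EXPECTED)

print(before)
print(after)
```

Before the correction only the AIFS entry is reported; afterwards the result is empty.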

Test Updates:

  • Updated assertion style in dynamical_dataset_test.py to use np.testing.assert_allclose() for floating-point comparisons
  • Added snapshot assertions for temperature_2m and precipitation_rate_surface at the test point after the operational update
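
A minimal sketch of why np.testing.assert_allclose suits floating-point snapshot checks. The numeric values are placeholders, not snapshot values from the dataset or from dynamical_dataset_test.py.

```python
import numpy as np

# Placeholder values, not taken from the dataset.
expected_temperature = 281.35
computed_temperature = 281.35 + 1e-9  # simulates tiny floating-point drift

# Exact equality is brittle for floats: this comparison is False.
exact_match = computed_temperature == expected_temperature

# Passes when |actual - desired| <= atol + rtol * |desired|.
np.testing.assert_allclose(computed_temperature, expected_temperature, rtol=1e-6)

# A genuinely different value still fails, raising AssertionError.
try:
    np.testing.assert_allclose(282.0, expected_temperature, rtol=1e-6)
    caught = False
except AssertionError:
    caught = True

print(exact_match, caught)
```

The tolerance (rtol=1e-6 here) is an illustrative choice. A snapshot test would pick it based on how much numerical variation is acceptable between runs.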

Documentation:

  • Simplified dataset integration guide by removing redundant variable naming section and consolidating chunk/shard guidance
  • Removed outdated example references in documentation

Dependency Changes:

  • Removed unused ecmwf-api-client dependency from dev requirements
  • Cleaned up linting rule configuration for scripts

Implementation Details

The new chunk/shard layout follows the output of the chunk_shard_size tool, which was used to improve compression ratios while keeping access patterns efficient. The metadata corrections keep variable metadata consistent across the dataset repository and CF-compliant.

https://claude.ai/code/session_01C5o4hepUqwYHeEeYk5rqjy

- Fix chunk/shard sizes: 1 init_time per chunk/shard, larger spatial chunks
  (240x240) per chunk_shard_size tool output
- Make precipitable_water_atmosphere metadata consistent across datasets
  (short_name=pwat, long_name=Precipitable water) and remove stale CF
  compliance exceptions
- Restructure dataset_integration_guide.md: remove ECMWF-specific examples,
  revert chunk/shard docs to simple wording, simplify storage config section,
  remove pre-submission checks (duplicate of AGENTS.md)
- Use np.testing.assert_allclose in integration test and add post-update
  snapshot values for temperature_2m and precipitation_rate_surface
- Remove ecmwf-api-client dev dependency and scripts ruff ignore rules
- Regenerate zarr template

@mrshll closed this Feb 27, 2026

2 participants