glwagner (Member) commented Jan 8, 2026

This PR adds a case for the "idealized GATE" simulation of tropical deep convection over the ocean, which is implemented in the System for Atmospheric Modeling (SAM) and described by Khairoutdinov et al. (2009). Bogenschutz et al. (2025) reported performance results for the doubly-periodic version of SCREAM, achieving 3.6 simulated days per day (SDPD) with 384 A100s.

Using much simpler microphysics and no radiation scheme, we get ~1.2 SDPD at the same resolution on a single H200 at single precision: 24 hours of simulation time takes ~17 hours of wall time on one GPU. Here is an animation from that simulation:

gate.mp4
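
For reference, a back-of-the-envelope sketch of how the throughput figures quoted above relate to each other. It uses only the numbers in this comment; the function names are made up for illustration. Both of our figures are approximate: 24 hours simulated in 17 hours of wall time works out to about 1.4 SDPD, while 1.2 SDPD corresponds to about 20 hours of wall time per simulated day. The GPU-days comparison is not like-for-like, since the two runs differ in physics, precision, and hardware.

```python
HOURS_PER_DAY = 24.0


def sdpd(simulated_hours, wall_hours):
    """Simulated days per wall-clock day."""
    return simulated_hours / wall_hours


def wall_hours_per_simulated_day(sdpd_value):
    """Wall-clock hours needed for one simulated day at a given SDPD."""
    return HOURS_PER_DAY / sdpd_value


def gpu_days_per_simulated_day(n_gpus, sdpd_value):
    """GPU-days spent per simulated day (hypothetical cost metric)."""
    return n_gpus / sdpd_value


# Figures quoted in this comment.
sdpd_from_wall_time = sdpd(24.0, 17.0)                  # 24 h simulated in ~17 h
wall_hours_at_quoted_sdpd = wall_hours_per_simulated_day(1.2)

# Rough cost comparison; NOT like-for-like (different physics and hardware).
scream_cost = gpu_days_per_simulated_day(384, 3.6)      # 384 A100s at 3.6 SDPD
single_gpu_cost = gpu_days_per_simulated_day(1, 1.2)    # one H200 at ~1.2 SDPD

print(f"SDPD implied by 24 h in 17 h: {sdpd_from_wall_time:.2f}")
print(f"Wall hours per simulated day at 1.2 SDPD: {wall_hours_at_quoted_sdpd:.1f}")
print(f"GPU-days per simulated day, 384 A100s: {scream_cost:.1f}")
print(f"GPU-days per simulated day, 1 H200: {single_gpu_cost:.2f}")
```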

We should work towards running this case with P3 microphysics (preliminary draft work on #395) and interactive radiation to refine the benchmark. I believe this can also serve as a canonical case for performance benchmarking.

I am unsure how to contribute this case to the repo. At full resolution the grid looks like

julia> grid
2048×2048×181 RectilinearGrid{Float32, Periodic, Periodic, Bounded} on CUDAGPU with 5×5×5 halo
├── Periodic x  [0.0, 204800.0) regularly spaced with Δx=100.0
├── Periodic y  [0.0, 204800.0) regularly spaced with Δy=100.0
└── Bounded  z  [0.0, 27000.0]  variably spaced with min(Δz)=50.0, max(Δz)=300.0

and the simulation state consumes ~96 GB:

⚡ glw/gate2 ~/Breeze.jl nvidia-smi
Thu Jan  8 14:56:59 2026       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.163.01             Driver Version: 550.163.01     CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA H200                    Off |   00000000:8D:00.0 Off |                    0 |
| N/A   29C    P0            117W /  700W |   95374MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+
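
To put the memory figure in context, here is a rough sketch of what one full-size array costs on this grid. It assumes a cell-centered `Float32` field stored with 5 halo points on each side in every direction, as the grid display suggests; the helper name is hypothetical. The `nvidia-smi` total also includes everything else on the device (CUDA context, solver workspace, temporaries), so the "equivalent fields" number is only an order-of-magnitude guide, not a count of model fields.

```python
MIB = 1024**2
GIB = 1024**3


def field_bytes(nx, ny, nz, halo=5, bytes_per_value=4):
    """Bytes for one cell-centered field including halos (assumed layout)."""
    return (nx + 2 * halo) * (ny + 2 * halo) * (nz + 2 * halo) * bytes_per_value


# Full-resolution grid from the display above: 2048 x 2048 x 181, Float32.
per_field = field_bytes(2048, 2048, 181)

# Memory in use as reported by nvidia-smi above.
used_bytes = 95374 * MIB

# How many field-sized Float32 arrays would fill the reported usage.
equivalent_fields = used_bytes / per_field

print(f"One field: {per_field / GIB:.2f} GiB")
print(f"Reported usage: {used_bytes / GIB:.1f} GiB")
print(f"Equivalent number of field-sized arrays: {equivalent_fields:.1f}")
```

Under these assumptions one field is about 3 GiB, so the reported usage corresponds to roughly 31 field-sized arrays.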

Even at quarter resolution (512×512×181) this is a large simulation. We probably cannot afford to run the full 24 hours of this case in CI, even infrequently. Perhaps the thing to do is to test only that the setup works, by building it at low resolution and running a few time steps. We would then report results by manually uploading figures and movies to the documentation (if that is desired).
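
A rough sketch of what "quarter resolution" means for cost. The estimate assumes the same time step as the full-resolution run and wall time that scales linearly with the number of grid cells. Neither assumption is verified: smaller problems typically use a GPU less efficiently, and a coarser grid may allow a longer time step. Treat the result as an order-of-magnitude guess only.

```python
def cells(nx, ny, nz):
    """Number of interior grid cells."""
    return nx * ny * nz


full = cells(2048, 2048, 181)
quarter = cells(512, 512, 181)

# Quartering both horizontal dimensions reduces the cell count 16-fold.
reduction = full // quarter

# ~17 h of wall time for 24 simulated hours at full resolution (quoted above).
FULL_WALL_HOURS = 17.0

# ASSUMPTION: same time step, wall time proportional to cell count.
quarter_wall_hours = FULL_WALL_HOURS * quarter / full

print(f"Full resolution:    {full:,} cells")
print(f"Quarter resolution: {quarter:,} cells ({reduction}x fewer)")
print(f"Estimated wall time for 24 simulated hours: {quarter_wall_hours:.2f} h")
```

Under these assumptions the quarter-resolution case would still need about an hour of GPU time for the full 24 simulated hours, which is why a smoke test of a few time steps at lower resolution looks like the more realistic option for CI.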

@glwagner glwagner marked this pull request as draft January 8, 2026 15:02
giordano (Collaborator) commented Jan 9, 2026

Do you mind if I do some git history surgery to remove the commits from #369? 🥲

glwagner (Member, Author) commented Jan 9, 2026

> Do you mind if I do some git history surgery to remove the commits from #369? 🥲

No, not at all. I can also just put this file into a clean branch.

giordano (Collaborator) commented Jan 9, 2026

Surgery done 😁

navidcy (Member) commented Jan 10, 2026

Did you mean to include this in the list of examples in docs/make.jl?

giordano (Collaborator) commented

See all the discussion above starting from

> I am unsure how to contribute this case to the repo

Summary: this simulation is particularly expensive; even at reduced resolution it would be impractical to run on CI.

glwagner (Member, Author) commented Jan 14, 2026

> See all the discussion above starting from
>
> > I am unsure how to contribute this case to the repo
>
> Summary: this simulation is particularly expensive; even at reduced resolution it would be impractical to run on CI.

So the question is: do we just place this script, untested, somewhere in the repo (e.g., in validation/)? Or should we put together a pipeline that ensures the script runs, even if we can't evaluate the fidelity of the results?
