@taimoorsohail
Collaborator

@taimoorsohail taimoorsohail commented Mar 7, 2025

This is an attempt to implement a checkpointer for the coupled simulation. The checkpointer should checkpoint all components of the coupled model (ocean, sea ice, atmosphere, radiation) required to restart the coupled simulation.

Still a work in progress, with @navidcy.

Potentially superseding #374.

@navidcy
Member

navidcy commented Mar 7, 2025

@taimoorsohail now I get:

julia> include("checkpointer_mwe.jl")
┌ Warning: Are you totally, 100% sure that you want to build a simulation on
│ 
│ 144×60×40 LatitudeLongitudeGrid{Float64, Periodic, Bounded, Bounded} on CPU with 7×7×7 halo and with precomputed metrics
│ 
│ rather than on an ImmersedBoundaryGrid?
└ @ ClimaOcean.OceanSimulations ~/Library/CloudStorage/OneDrive-TheUniversityofMelbourne/Documents/Research/taimoor-ClimaOcean.jl/src/OceanSimulations/ocean_simulation.jl:141
[ Info: I went in your new method

!

Another way to see that is

julia> using ClimaOcean
[ Info: Oceananigans will use 12 threads

julia> methods(Checkpointer)
# 3 methods for type constructor:
 [1] Checkpointer(coupled_model::OceanSeaIceModel; schedule, dir, prefix, overwrite_existing, verbose, cleanup, properties)
     @ ClimaOcean.OutputWriters ~/Library/CloudStorage/OneDrive-TheUniversityofMelbourne/Documents/Research/taimoor-ClimaOcean.jl/src/OutputWriters.jl:9
 [2] Checkpointer(model; schedule, dir, prefix, overwrite_existing, verbose, cleanup, properties)
     @ ~/.julia/packages/Oceananigans/3CoZp/src/OutputWriters/checkpointer.jl:77
 [3] Checkpointer(schedule::T, dir::String, prefix::String, properties::P, overwrite_existing::Bool, verbose::Bool, cleanup::Bool) where {T, P}
     @ ~/.julia/packages/Oceananigans/3CoZp/src/OutputWriters/checkpointer.jl:11

which shows that now there is a method coming from ClimaOcean!
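The effect in the listing above can be reproduced with a minimal, self-contained sketch. All names here (`ToyOceanModel`, `ToyCoupledModel`, `ToyCheckpointer`) are hypothetical stand-ins and not the Oceananigans or ClimaOcean types; the sketch only assumes that the new method is selected by ordinary multiple dispatch on the model type.

```julia
# Hypothetical stand-ins; these are NOT the Oceananigans or ClimaOcean types.
struct ToyOceanModel end
struct ToyCoupledModel end

struct ToyCheckpointer
    prefix::String
    origin::Symbol   # records which constructor method built the object
end

# Generic fallback, analogous to method [2] in the listing above.
ToyCheckpointer(model; prefix="checkpoint") = ToyCheckpointer(prefix, :generic)

# More specific method, analogous to method [1]: dispatch selects it
# whenever the argument is a ToyCoupledModel.
ToyCheckpointer(model::ToyCoupledModel; prefix="checkpoint") = ToyCheckpointer(prefix, :coupled)

generic = ToyCheckpointer(ToyOceanModel())
coupled = ToyCheckpointer(ToyCoupledModel())
```

Here `coupled.origin` is `:coupled` and `generic.origin` is `:generic`, and calling `methods(ToyCheckpointer)` lists both hand-written methods alongside the default constructors.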

@taimoorsohail
Collaborator Author

Great! I am getting the same. I also fixed the precompilation bug.

@taimoorsohail
Collaborator Author

I've added an atmosphere_time callback to check whether the atmosphere clock misbehaves with our "hacky" checkpoint. Indeed, the simulation time and the atmosphere time become desynced upon picking up the checkpoint:

[ Info: Iter: 10, simulation time: 1.667 minutes, atmosphere time: 1.667 minutes, Δt: 10 seconds, max|u|: (3.21e-02, 1.40e-02, 8.75e-05) m s⁻¹, extrema(T): (0.01, 0.00) ᵒC, wall time: 258.344 ms
[ Info: simulation run for 10 iterations; you should have a checkpointer at 8
┌ Warning: Tendencies for η do not exist in checkpoint and could not be restored.
└ @ Oceananigans.OutputWriters ~/.julia/packages/Oceananigans/3CoZp/src/OutputWriters/checkpointer.jl:276
[ Info: Initializing simulation...
[ Info:     ... simulation initialization complete (77.671 ms)
[ Info: Executing initial time step...
[ Info: Iter: 9, **simulation time: 1.500 minutes, atmosphere time: 1.833 minutes**, Δt: 10 seconds, max|u|: (2.91e-02, 1.27e-02, 7.94e-05) m s⁻¹, extrema(T): (0.01, 0.00) ᵒC, wall time: 46.141 seconds
[ Info:     ... initial time step complete (367.514 ms).
[ Info: Iter: 10, simulation time: 1.667 minutes, atmosphere time: 2 minutes, Δt: 10 seconds, max|u|: (3.21e-02, 1.40e-02, 8.75e-05) m s⁻¹, extrema(T): (0.01, 0.00) ᵒC, wall time: 267.602 ms

And starting from scratch by picking up a checkpoint:

┌ Warning: Tendencies for η do not exist in checkpoint and could not be restored.
└ @ Oceananigans.OutputWriters ~/.julia/packages/Oceananigans/3CoZp/src/OutputWriters/checkpointer.jl:276
[ Info: Initializing simulation...
[ Info:     ... simulation initialization complete (74.165 ms)
[ Info: Executing initial time step...
[ Info: Iter: 9, simulation time: 1.500 minutes, atmosphere time: 10 seconds, Δt: 10 seconds, max|u|: (2.91e-02, 1.27e-02, 7.94e-05) m s⁻¹, extrema(T): (0.01, 0.00) ᵒC, wall time: 44.409 seconds
[ Info:     ... initial time step complete (542.189 ms).
[ Info: Iter: 10, simulation time: 1.667 minutes, atmosphere time: 20 seconds, Δt: 10 seconds, max|u|: (3.21e-02, 1.40e-02, 8.75e-05) m s⁻¹, extrema(T): (0.01, 0.00) ᵒC, wall time: 264.473 ms
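One reading of the two logs above, sketched with toy clocks: the numbers are consistent with only the ocean time being restored from the checkpoint at iteration 8, while the atmosphere clock either keeps its end-of-run value (same session) or starts from zero (fresh session). `ToyClock` and `advance!` are hypothetical names, and the assumption that only the ocean time is saved is an interpretation of the logs, not a statement about the actual implementation.

```julia
# Toy clocks only; hypothetical stand-ins, not the ClimaOcean or Oceananigans types.
mutable struct ToyClock
    time::Float64   # seconds
end

# Advance both clocks by `steps` time steps of size Δt.
function advance!(ocean::ToyClock, atmosphere::ToyClock, Δt, steps)
    for _ in 1:steps
        ocean.time += Δt
        atmosphere.time += Δt
    end
    return nothing
end

Δt = 10.0

# First run: 10 iterations, with the ocean time checkpointed at iteration 8.
ocean, atmosphere = ToyClock(0.0), ToyClock(0.0)
advance!(ocean, atmosphere, Δt, 8)
checkpointed_ocean_time = ocean.time    # assumption: only the ocean time is saved
advance!(ocean, atmosphere, Δt, 2)

# Pickup in the same session: the ocean clock is rewound, the atmosphere clock is not.
ocean.time = checkpointed_ocean_time
advance!(ocean, atmosphere, Δt, 1)
same_session = (ocean.time / 60, atmosphere.time / 60)   # minutes

# Pickup in a fresh session: the atmosphere clock starts again from zero.
ocean, atmosphere = ToyClock(checkpointed_ocean_time), ToyClock(0.0)
advance!(ocean, atmosphere, Δt, 1)
fresh_session = (ocean.time, atmosphere.time)            # seconds
```

Under these assumptions `same_session` is 1.5 and about 1.833 minutes, and `fresh_session` is 90 and 10 seconds, which matches the "Iter: 9" lines in the two logs.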

@taimoorsohail
Collaborator Author

taimoorsohail commented Mar 7, 2025

Hey @navidcy, I tried to implement time syncing with the atmosphere (see my latest commit). It isn't working yet, but if you figure it out, let me know.

@glwagner
Member

glwagner commented Mar 7, 2025

I think this PR might conflict with #355

@taimoorsohail
Collaborator Author

I'm not convinced syncing the atmosphere time with the ocean is the best way to implement the checkpointer anyway, as this would need to be added for any additional components used to drive the ocean in the future.

Also, I haven't yet figured out how the radiation component steps in time; its time would also need to be synced when picking up.
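To illustrate the concern, here is a rough sketch of the alternative: saving and restoring every component's own time generically, so that adding a component needs no extra syncing code. `ToyComponent`, `save_times`, and `restore_times!` are hypothetical names, and nothing here reflects how ClimaOcean stores component clocks.

```julia
# Hypothetical component with its own clock; not a ClimaOcean type.
mutable struct ToyComponent
    time::Float64   # seconds
end

# Record the time of every component, keyed by its name in the NamedTuple.
save_times(components::NamedTuple) =
    Dict(name => component.time for (name, component) in pairs(components))

# Restore every component's time from the saved record.
function restore_times!(components::NamedTuple, saved)
    for (name, component) in pairs(components)
        component.time = saved[name]
    end
    return components
end

components = (ocean = ToyComponent(80.0),
              atmosphere = ToyComponent(80.0),
              radiation = ToyComponent(80.0))

saved = save_times(components)

# Keep running past the checkpoint, then pick up again.
for component in components
    component.time += 20.0
end
restore_times!(components, saved)
```

After `restore_times!`, all three toy components are back at 80 seconds, and a fourth component added to the NamedTuple would be handled without further changes.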

@taimoorsohail
Collaborator Author

taimoorsohail commented Mar 11, 2025

Moved to #401

Labels

output 💾 user interface When humans and machines miscommunicate
