Open
46 commits
573a366
add SplitExplicit tests for checkpointer
navidcy Mar 20, 2025
41cb115
add SplitExplicit tests for checkpointer
navidcy Mar 20, 2025
0a52d4e
expose some checkpointer functionality; don't store properties as a C…
navidcy Mar 20, 2025
a776730
use julia v1.10.9
navidcy Mar 20, 2025
b66ae1b
pass properties to write_output!
navidcy Mar 20, 2025
be22062
bump patch release
navidcy Mar 20, 2025
0b67377
validate_properties -> validate_checkpointed_properties
navidcy Mar 20, 2025
fe8e70a
add default properties kwarg to write_output!
navidcy Mar 20, 2025
83f4ea2
Merge branch 'main' into ncc/checkopointer-shenanigans-2
navidcy Mar 22, 2025
cbf323d
remove stray spaces and add backticks
navidcy Mar 25, 2025
40dccba
update docstring
navidcy Mar 25, 2025
38ef5b6
code alignment
navidcy Mar 25, 2025
e526910
merge main
navidcy Mar 25, 2025
43a644f
Merge branch 'ncc/checkopointer-shenanigans-2' of github.com:CliMA/Oc…
navidcy Mar 25, 2025
3aeeed7
set_clock! + clock.last_stage_Δt, clock.last_Δt in tick!(clock, Δt)
navidcy Mar 25, 2025
c0eb07b
Merge branch 'main' into ncc/checkopointer-shenanigans-2
navidcy Mar 25, 2025
c3c007f
add set_clock! for OceananigansModels
navidcy Mar 25, 2025
a21ff8a
clock.last_Δt = Δt is part of tick!(clock, Δt)
navidcy Mar 25, 2025
3329188
add set_clock!(::Simulation, clock)
navidcy Mar 25, 2025
2f6ef7d
add docs for align_time_step
navidcy Mar 25, 2025
96286b1
wip
navidcy Apr 24, 2025
2686f8c
merge main
navidcy Apr 24, 2025
d5420c7
Merge branch 'main' into ncc/checkopointer-shenanigans-2
navidcy May 23, 2025
7607706
Merge branch 'main' into ncc/checkopointer-shenanigans-2
navidcy Jun 26, 2025
847c953
Update Project.toml
navidcy Jun 26, 2025
635550b
Update src/OutputWriters/checkpointer.jl
navidcy Jun 26, 2025
0ca81dd
Apply suggestions from code review
navidcy Jun 26, 2025
666dfcc
Update checkpointer.jl
navidcy Jun 26, 2025
97e3ede
Update simulation.jl
navidcy Jun 26, 2025
0992038
Update clock.jl
navidcy Jun 26, 2025
a9e6657
Update clock.jl
navidcy Jun 26, 2025
e4e07e0
clock from main
navidcy Jun 26, 2025
3da6865
updates in clock
navidcy Jun 26, 2025
b91aa9b
clock from main
navidcy Jun 26, 2025
0ac4427
updates in clock
navidcy Jun 26, 2025
cb01cf4
updates in clock
navidcy Jun 26, 2025
2ac952f
clock from main
navidcy Jun 26, 2025
a673011
Update clock.jl
navidcy Jun 26, 2025
28f6e39
import AbstractModel
navidcy Jun 27, 2025
9cface9
Merge branch 'main' into ncc/checkopointer-shenanigans-2
navidcy Jul 6, 2025
023df35
Merge branch 'main' into ncc/checkopointer-shenanigans-2
navidcy Jul 9, 2025
db85355
Update runge_kutta_3.jl
navidcy Jul 9, 2025
8c9f32d
Merge branch 'main' into ncc/checkopointer-shenanigans-2
navidcy Jul 9, 2025
ac6d76f
Merge branch 'main' into ncc/checkopointer-shenanigans-2
navidcy Jul 10, 2025
4319b16
Merge branch 'main' into ncc/checkopointer-shenanigans-2
navidcy Jul 16, 2025
05dc781
Merge branch 'main' into ncc/checkopointer-shenanigans-2
navidcy Jul 26, 2025
1 change: 1 addition & 0 deletions src/Models/Models.jl
@@ -21,6 +21,7 @@ using Oceananigans.Utils: Time

import Oceananigans: initialize!
import Oceananigans.Architectures: architecture
import Oceananigans.TimeSteppers: reset!, set_clock!
import Oceananigans.Solvers: iteration
import Oceananigans.Simulations: timestepper
import Oceananigans.TimeSteppers: reset!, set_clock!
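`Models.jl` imports `set_clock!` from `Oceananigans.TimeSteppers` so that model types can add their own methods, and `simulation.jl` forwards `set_clock!(sim::Simulation, new_clock)` to the model. The following is a minimal sketch of that delegation chain, assuming hypothetical `ToyClock`, `ToyModel`, and `ToySimulation` types in place of the real Oceananigans types. The field-by-field, in-place copy is an assumption for illustration, not the library's implementation.

```julia
# Hypothetical stand-ins, NOT Oceananigans types: just enough structure to show
# how a Simulation-level method can forward to the model, and the model to its clock.
mutable struct ToyClock
    time::Float64
    iteration::Int
    last_Δt::Float64
end

mutable struct ToyModel
    clock::ToyClock
end

struct ToySimulation
    model::ToyModel
end

# Assumed behaviour: copy the state of `new_clock` into the existing clock in place.
function set_clock!(clock::ToyClock, new_clock::ToyClock)
    clock.time = new_clock.time
    clock.iteration = new_clock.iteration
    clock.last_Δt = new_clock.last_Δt
    return nothing
end

# Model-level method: delegate to the model's clock.
set_clock!(model::ToyModel, new_clock) = set_clock!(model.clock, new_clock)

# Mirrors the one-line method added in simulation.jl: delegate to the model.
set_clock!(sim::ToySimulation, new_clock) = set_clock!(sim.model, new_clock)

sim = ToySimulation(ToyModel(ToyClock(0.0, 0, Inf)))
set_clock!(sim, ToyClock(3600.0, 120, 30.0))
```

Copying into the existing clock, rather than rebinding `model.clock`, keeps any other reference to the same clock object in sync.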
77 changes: 53 additions & 24 deletions src/OutputWriters/checkpointer.jl
@@ -1,24 +1,34 @@
using Glob

using Oceananigans
using Oceananigans: fields, prognostic_fields
using Oceananigans: AbstractModel, fields, prognostic_fields
using Oceananigans.Fields: offset_data
using Oceananigans.TimeSteppers: QuasiAdamsBashforth2TimeStepper

import Oceananigans.Fields: set!

mutable struct Checkpointer{T, P} <: AbstractOutputWriter
mutable struct Checkpointer{T} <: AbstractOutputWriter
schedule :: T
dir :: String
prefix :: String
properties :: P
overwrite_existing :: Bool
verbose :: Bool
cleanup :: Bool
end

required_checkpoint_properties(model) = [:grid, :clock]

# Certain properties are required for `set!` to pick up from a checkpoint.
function required_checkpointed_properties(model)
properties = [:grid, :clock]

if has_ab2_timestepper(model)
push!(properties, :timestepper)
end

return properties
end

"""
Checkpointer(model;
schedule,
@@ -30,16 +40,18 @@ required_checkpoint_properties(model) = [:grid, :clock]
properties = required_checkpointed_properties(model))

Construct a `Checkpointer` that checkpoints the model to a JLD2 file on `schedule`.
The `model.clock.iteration` is included in the filename to distinguish between multiple checkpoint files.
The `model.clock.iteration` is included in the filename to distinguish between multiple
checkpoint files.

To restart or "pickup" a model from a checkpoint, specify `pickup = true` when calling `run!`, ensuring
that the checkpoint file is in directory `dir`. See [`run!`](@ref) for more details.
To restart or "pickup" a model from a checkpoint, specify `pickup = true` when
calling `run!`, ensuring that the checkpoint file is in directory `dir`.
See [`run!`](@ref) for more details.

Note that extra model `properties` can be specified, but removing crucial properties
such as `:timestepper` will render restoring from the checkpoint impossible.
such as `:timestepper` might render restoring from the checkpoint impossible.

The checkpointer attempts to serialize as much of the model to disk as possible,
but functions or objects containing functions cannot be serialized at this time.
but note that functions or objects containing functions cannot be serialized.

Keyword arguments
=================
@@ -93,7 +105,7 @@ function Checkpointer(model; schedule,

mkpath(dir)

return Checkpointer(schedule, dir, prefix, properties, overwrite_existing, verbose, cleanup)
return Checkpointer(schedule, dir, prefix, overwrite_existing, verbose, cleanup)
end

#####
@@ -158,32 +170,43 @@ end
##### Writing checkpoints
#####

function write_output!(c::Checkpointer, model)
function write_output!(c::Checkpointer, model, addr=checkpointer_address(model))
filepath = checkpoint_path(model.clock.iteration, c)
c.verbose && @info "Checkpointing to file $filepath..."
addr = checkpointer_address(model)

t1 = time_ns()

jldopen(filepath, "w") do file
file["$addr/checkpointed_properties"] = c.properties
serializeproperties!(file, model, c.properties, addr)
model_fields = prognostic_fields(model)
field_names = keys(model_fields)
for name in field_names
full_address = "$addr/$name"
serializeproperty!(file, full_address, model_fields[name])
end
end
write_output!(c, model, filepath, "w")

t2, sz = time_ns(), filesize(filepath)

c.verbose && @info "Checkpointing done: time=$(prettytime((t2 - t1) * 1e-9)), size=$(pretty_filesize(sz))"

c.cleanup && cleanup_checkpoints(c)

return nothing
end

function write_output!(c, model, filepath::AbstractString, mode::AbstractString;
properties = default_checkpointed_properties(model))

properties = validate_checkpointed_properties(model, properties)
addr = checkpointer_address(model)

jldopen(filepath, mode) do file
file["$addr/checkpointed_properties"] = properties
serializeproperties!(file, model, properties, addr)
model_fields = prognostic_fields(model)
field_names = keys(model_fields)
for name in field_names
full_address = "$addr/$name"
serializeproperty!(file, full_address, model_fields[name])
end
end
end

function cleanup_checkpoints(checkpointer)
filepaths = glob(checkpoint_superprefix(checkpointer.prefix) * "*.jld2", checkpointer.dir)
latest_checkpoint_filepath = latest_checkpoint(checkpointer, filepaths)
@@ -197,12 +220,13 @@ end

# Should this go in Models?
"""
set!(model, filepath::AbstractString)
set!(model::AbstractModel, filepath::AbstractString)

Set data in `model.velocities`, `model.tracers`, `model.timestepper.Gⁿ`, and
`model.timestepper.G⁻` to checkpointed data stored at `filepath`.
"""
function set!(model, filepath::AbstractString)
function set!(model::AbstractModel, filepath::AbstractString)

addr = checkpointer_address(model)

jldopen(filepath, "r") do file
@@ -225,7 +249,8 @@ function set!(model, filepath::AbstractString)
end
end

set_time_stepper!(model.timestepper, model.architecture, file, model_fields, addr)
set_time_stepper!(model.timestepper, file, model_fields, addr)

if !isnothing(model.particles)
copyto!(model.particles.properties, file["$addr/particles"])
@@ -251,7 +276,11 @@ function set_time_stepper_tendencies!(timestepper, arch, file, model_fields, add
parent_data = on_architecture(arch, file["$addr/timestepper/Gⁿ/$name/data"])

tendencyⁿ_field = timestepper.Gⁿ[name]

@apply_regionally copyto!(parent(tendencyⁿ_field), parent_data)

# Tendency "n-1"
parent_data = on_architecture(arch, file["$addr/timestepper/G⁻/$name/data"])
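In this diff, `required_checkpointed_properties` asks for `:timestepper` only when the model has an AB2 time stepper, and the new `write_output!` method passes its `properties` through `validate_checkpointed_properties`, whose body is not shown here. A minimal sketch of how such validation could behave, assuming a hypothetical `ToyCheckpointModel` type and assuming that missing required properties are appended with a warning rather than raising an error:

```julia
# Hypothetical stand-in for a model; `uses_ab2` plays the role of whatever
# `has_ab2_timestepper(model)` inspects in the real code.
struct ToyCheckpointModel
    uses_ab2::Bool
end

has_ab2_timestepper(model::ToyCheckpointModel) = model.uses_ab2

# Same logic as `required_checkpointed_properties` in the diff.
function required_checkpointed_properties(model)
    properties = [:grid, :clock]
    if has_ab2_timestepper(model)
        push!(properties, :timestepper)
    end
    return properties
end

# Hypothetical validation: keep the caller's order, append any required
# property that is missing (with a warning), then drop duplicates.
function validate_checkpointed_properties(model, properties)
    validated = collect(Symbol, properties)
    for p in required_checkpointed_properties(model)
        if p ∉ validated
            @warn "Adding required property :$p to the checkpointed properties"
            push!(validated, p)
        end
    end
    return unique(validated)
end

ab2_model = ToyCheckpointModel(true)
rk3_model = ToyCheckpointModel(false)
validate_checkpointed_properties(ab2_model, [:particles])
```

Appending rather than erroring lets a user add extra properties without having to remember the required ones; a stricter design could throw an `ArgumentError` instead.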
13 changes: 6 additions & 7 deletions src/Simulations/run.jl
@@ -132,7 +132,7 @@ function time_step!(sim::Simulation)
sim.Δt
end

initial_time_step = !(sim.initialized)
initial_time_step && initialize!(sim)

if initial_time_step && sim.verbose
@@ -191,13 +191,13 @@ we_want_to_pickup(pickup::String) = true
we_want_to_pickup(pickup) = throw(ArgumentError("Cannot run! with pickup=$pickup"))

"""
initialize!(sim::Simulation, pickup=false)
initialize!(sim::Simulation)

Initialize a simulation:

- Update the auxiliary state of the simulation (filling halo regions, computing auxiliary fields)
- Evaluate all diagnostics, callbacks, and output writers if sim.model.clock.iteration == 0
- Add diagnostics that "depend" on output writers
- Update the auxiliary state of the simulation (filling halo regions, computing auxiliary fields).
- Evaluate all diagnostics, callbacks, and output writers if `sim.model.clock.iteration == 0`.
- Add diagnostics that "depend" on output writers.
"""
function initialize!(sim::Simulation)
if sim.verbose
@@ -207,7 +207,7 @@ function initialize!(sim::Simulation)

model = sim.model
initialize!(model)
update_state!(model)
update_state!(model, compute_tendencies=true)

# Output and diagnostics initialization
[add_dependencies!(sim.diagnostics, writer) for writer in values(sim.output_writers)]
@@ -253,4 +253,3 @@ function initialize!(sim::Simulation)

return nothing
end
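`time_step!` initializes the simulation only on its first call, via `initial_time_step = !(sim.initialized)`, and `initialize!` now calls `update_state!(model, compute_tendencies=true)` before setting up outputs. A minimal sketch of that run-once pattern, assuming a hypothetical `ToySim` type whose `initialize!` sets the `initialized` flag itself (that detail is an assumption, not shown in this diff):

```julia
# Hypothetical toy simulation; only the `initialized` flag mirrors the diff.
mutable struct ToySim
    initialized::Bool
    initialize_calls::Int
    iteration::Int
end

ToySim() = ToySim(false, 0, 0)

function initialize!(sim::ToySim)
    sim.initialize_calls += 1   # stands in for update_state!, output-writer setup, etc.
    sim.initialized = true      # assumed: initialization marks itself as done
    return nothing
end

function time_step!(sim::ToySim)
    initial_time_step = !(sim.initialized)
    initial_time_step && initialize!(sim)
    sim.iteration += 1          # stands in for advancing the model by one step
    return nothing
end

sim = ToySim()
for _ in 1:3
    time_step!(sim)
end
```

Only the first of the three calls triggers `initialize!`; subsequent calls skip straight to the step.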

13 changes: 12 additions & 1 deletion src/Simulations/simulation.jl
Original file line number Diff line number Diff line change
@@ -4,7 +4,7 @@ using Oceananigans.DistributedComputations: Distributed, all_reduce
using Oceananigans.OutputWriters: JLD2Writer, NetCDFWriter

import Oceananigans.Utils: prettytime
import Oceananigans.TimeSteppers: reset!
import Oceananigans.TimeSteppers: reset!, set_clock!
import Oceananigans.OutputWriters: write_output!
import Oceananigans.Solvers: iteration

@@ -34,6 +34,7 @@ end
stop_iteration = Inf,
stop_time = Inf,
wall_time_limit = Inf,
align_time_step = true,
minimum_relative_step = 0)

Construct a `Simulation` for a `model` with time step `Δt`.
@@ -48,6 +49,14 @@ Keyword arguments

- `stop_time`: Stop the simulation once this much model clock time has passed. Default: `Inf`.

- `align_time_step`: When `true`, the simulation automatically adjusts the time step so that it
does not step past times imposed by schedules such as `ScheduledTimes`, `TimeInterval`,
and `AveragedTimeInterval`, or by the `stop_time` criterion.
When `false`, the time step never changes within `time_step!(simulation)`, so the
simulation may step past those times. Default: `true`.

- `wall_time_limit`: Stop the simulation if it's been running for longer than this many
seconds of wall clock time. Default: `Inf`.

@@ -196,6 +205,8 @@ function reset!(sim::Simulation)
return nothing
end

set_clock!(sim::Simulation, new_clock) = set_clock!(sim.model, new_clock)

#####
##### Default stop criteria callback functions
#####
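The `align_time_step` docstring describes shortening the time step so the model does not step past a scheduled time or `stop_time`. A minimal sketch of that idea, assuming a hypothetical helper `aligned_time_step` that receives the upcoming scheduled times as a plain collection (the real code derives them from the schedules, and this sketch ignores the `minimum_relative_step` keyword):

```julia
# Hypothetical helper, not the Oceananigans implementation.
# t: current model time; Δt: requested step; event_times: scheduled times
# (output times, stop_time, ...).
function aligned_time_step(t, Δt, event_times; align_time_step=true)
    align_time_step || return Δt                 # no alignment: Δt is never changed
    upcoming = filter(τ -> τ > t, event_times)   # only events still ahead of t
    isempty(upcoming) && return Δt
    return min(Δt, minimum(upcoming) - t)        # shorten Δt to land on the next event
end

Δt_far   = aligned_time_step(0.0, 0.3, [1.0])    # next event is far away: unchanged
Δt_near  = aligned_time_step(0.9, 0.3, [1.0])    # shortened to reach t = 1.0
Δt_free  = aligned_time_step(0.9, 0.3, [1.0]; align_time_step=false)
Δt_inf   = aligned_time_step(0.0, 0.3, [Inf])    # stop_time = Inf never constrains Δt
```

Taking the minimum over all upcoming events means the nearest constraint wins, which is why `stop_time = Inf` (the default) never affects the step.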