Skip to content
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
75 commits
Select commit Hold shift + click to select a range
751b6bc
Add convenience constructor for TimeAverageOperator
xkykai Oct 14, 2025
3ee1acd
add JLD2
xkykai Oct 14, 2025
eb464a8
Replace time_averaging.jl with fieldtimeseries_averaging.jl and add s…
xkykai Oct 14, 2025
3fbf665
Merge branch 'xk/halfdeg-gm-calibration' of https://github.com/CliMA/…
xkykai Oct 14, 2025
8f5ee6b
change to kwarg
xkykai Oct 14, 2025
f2c7942
Add new example for averaging ECCO data and enhance save function
xkykai Oct 14, 2025
dba1af3
create custom save path for each time period
xkykai Oct 15, 2025
40b7631
dry run for GM calibration
xkykai Oct 16, 2025
167283d
Add Glob, MPI, MPIPreferences, and TOML dependencies to Project.toml
xkykai Oct 16, 2025
26547e6
Export TimeAverageOperator, AveragedFieldTimeSeries, and save_average…
xkykai Oct 16, 2025
4a3bbe8
fix start_date bug in ECCO averaging
xkykai Oct 16, 2025
443db27
Add GM calibration script for single-GPU configuration
xkykai Oct 16, 2025
c6f045c
Add HPC configuration for single-GPU calibration
xkykai Oct 16, 2025
728e7fc
move functions around
xkykai Oct 16, 2025
18bfc6d
Fix typo in include statement for gcloud_configuration.jl
xkykai Oct 16, 2025
1abee8a
Fix model_interface path in GM calibration script
xkykai Oct 16, 2025
e521fec
Add time parameter to HPC configuration in GM calibration script
xkykai Oct 16, 2025
470c538
Fix function reference for backend_worker_kwargs in gcloud_configurat…
xkykai Oct 16, 2025
70cdc82
Add partition parameter to HPC configuration in GM calibration script
xkykai Oct 16, 2025
019176b
Refactor gcloud configuration and calibration script to include sbatc…
xkykai Oct 16, 2025
8ac0035
Fix indexing in extract_field_section and update simulation parameter…
xkykai Oct 16, 2025
8cdcf5a
Add logging for regridding model data in regrid_model_data function
xkykai Oct 16, 2025
cb89ab5
Fix parameter passing in run_gm_calibration_omip_dry_run to use corre…
xkykai Oct 16, 2025
e8a109c
Add output directory logging and fix return value in GM calibration f…
xkykai Oct 16, 2025
26292f5
fix syntac
xkykai Oct 17, 2025
890f78f
Refactor regrid_model_data to load precomputed target grid and regrid…
xkykai Oct 17, 2025
74ead30
Add NaNStatistics dependency to Project.toml
xkykai Oct 17, 2025
d49cc2f
move data processing function around, apply .!isnan to memeber data
xkykai Oct 17, 2025
df7ec5c
Remove logging of section sizes in process_member_data function
xkykai Oct 17, 2025
c542319
Update process_member_data to accept vertical weighting parameter for…
xkykai Oct 17, 2025
59a02d8
add CairoMakie
xkykai Oct 17, 2025
319791e
Add TimeAverageBuoyancyOperator for buoyancy-specific time averaging
xkykai Oct 18, 2025
fe912bc
Refactor include statements for clarity and update data processing to…
xkykai Oct 18, 2025
d28af71
Add SeawaterPolynomials dependency to Project.toml
xkykai Oct 18, 2025
4cab8c6
Fix location retrieval in TimeAverageBuoyancyOperator for accurate bu…
xkykai Oct 18, 2025
c96f15a
Update DataWrangling module exports and rename compute_buoyancy! func…
xkykai Oct 18, 2025
8228f81
Fix grid parameter usage in buoyancy computation functions for consis…
xkykai Oct 18, 2025
38b4984
Add buoyancy model initialization and save averaged buoyancy field ti…
xkykai Oct 18, 2025
9183d7b
change regridding method to avoid bathymetry issues
xkykai Oct 18, 2025
2e1076f
Add data plotting functions and integrate into model analysis workflow
xkykai Oct 18, 2025
453bcdf
fixing a grave error!
xkykai Oct 18, 2025
3a17759
new way of computing b
xkykai Oct 18, 2025
24cd696
gearing up for the actual run
xkykai Oct 18, 2025
e65a703
add diagnostics
xkykai Oct 19, 2025
2239a11
fix flushing to text file for @distributed and allow different sampli…
xkykai Oct 19, 2025
f378308
refactor forward_model function to include simulation and sampling le…
xkykai Oct 19, 2025
7064737
add functionality to compute and save 5-year averaged temperature, sa…
xkykai Oct 19, 2025
3033ed1
add functionality to compute and save 5-year averaged temperature, sa…
xkykai Oct 19, 2025
c6f581d
calibration script for @distributed
xkykai Oct 19, 2025
6b80b21
add ColorSchemes and Distributed dependencies to Project.toml
xkykai Oct 19, 2025
5b0a1a0
remove unused variables and update forward model to actual run!
xkykai Oct 19, 2025
3834d81
refactor: update output directory naming and rename forward model fun…
xkykai Oct 19, 2025
1a22b72
add zonal averaging functionality
xkykai Oct 19, 2025
b7f084e
add ArgParse dependency to Project.toml
xkykai Oct 19, 2025
e2b1024
add command line argument parsing for simulation length and zonal ave…
xkykai Oct 19, 2025
2078f00
add command line argument parsing for simulation length and zonal ave…
xkykai Oct 19, 2025
ec99b56
refactor: update analysis function to correctly handle parameter extr…
xkykai Oct 20, 2025
066f666
fix: update file path retrieval in regrid_model_data function to use …
xkykai Oct 20, 2025
ca78857
calibration script for perfect model
xkykai Oct 20, 2025
ebb5e94
fix: update salinity and temperature dataset from EN4Monthly to ECCO4…
xkykai Oct 21, 2025
d03eef4
calibrate new run with ECCO 4 initialization
xkykai Oct 21, 2025
7b1ff27
functionality for one degree omip calibration
xkykai Oct 21, 2025
9840fea
use the right time for sea ice initialization
xkykai Oct 21, 2025
a6fd004
fix sea ice initialization to correct date for 1 deg omip
xkykai Oct 21, 2025
a9ed510
fix: update output directory path to include simulation length in cal…
xkykai Oct 21, 2025
4019f26
fix: update directory path for forcing data to ECCO data in calibrati…
xkykai Oct 21, 2025
0be5dda
fix: update directory path for forcing data to ECCO data in one degre…
xkykai Oct 21, 2025
ff80e46
fix: add obl_closure parameter to gm_forward_model and run_gm_calibra…
xkykai Oct 21, 2025
3964952
use CATKE
xkykai Oct 21, 2025
cdd7d29
updated calibration: 5 year spin up 1 year avg, CATKE, equator to pole
xkykai Oct 24, 2025
c4238e6
feat: add observation_covariance argument to command line and adjust …
xkykai Oct 24, 2025
0c20b05
fix: update output directory naming to include covariance type
xkykai Oct 24, 2025
ed9d950
add pickup argument for simulation spinup and change initialization time
xkykai Oct 25, 2025
8069fd3
fix: correct pickup argument initialization in calibration script
xkykai Oct 25, 2025
6246e88
fix: update output directory naming to include pickup argument condition
xkykai Oct 25, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
15 changes: 13 additions & 2 deletions Project.toml
Original file line number Diff line number Diff line change
@@ -1,23 +1,34 @@
authors = ["Gregory Wagner <[email protected]>", "Xin Kai Lee <[email protected]>"]
name = "ClimaOceanCalibration"
uuid = "b5b0db0a-afc0-4eaa-813c-9b0f9c9ed209"
authors = ["Gregory Wagner <[email protected]>", "Xin Kai Lee <[email protected]>"]
version = "0.1.0"

[deps]
ArgParse = "c7e460c6-2fb9-53a9-8c5b-16f535851c63"
CFTime = "179af706-886a-5703-950a-314cd64e0468"
CUDA = "052768ef-5323-5732-b1bb-66c8b64840ba"
CairoMakie = "13f3f980-e62b-5c42-98c6-ff1f3baf88f0"
ClimaCalibrate = "4347a170-ebd6-470c-89d3-5c705c0cacc2"
ClimaOcean = "0376089a-ecfe-4b0e-a64f-9c555d74d754"
ClimaSeaIce = "6ba0ff68-24e6-4315-936c-2e99227c95a4"
ColorSchemes = "35d6a980-a343-548e-a6ea-1d62b119f2f4"
Dates = "ade2ca70-3891-5945-98fb-dc099432e06a"
Distributed = "8ba89e20-285c-5b6f-9357-94700520ee1b"
EnsembleKalmanProcesses = "aa8a2aa5-91d8-4396-bcef-d4f2ec43552d"
Glob = "c27321d9-0574-5035-807b-f59d2c89b15c"
JLD2 = "033835bb-8acc-5ee8-8aae-3f567f8a3819"
KernelAbstractions = "63c18a36-062a-441e-b654-da1e3ab1ce7c"
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
MPI = "da04e1cc-30fd-572f-bb4f-1f8673147195"
MPIPreferences = "3da0fdf6-3ccc-4f1b-acd9-58baa6c99267"
NaNStatistics = "b946abbf-3ea7-4610-9019-9858bfdeaf2d"
Oceananigans = "9e8cae18-63c1-5223-a75c-80ca9d6e9a09"
OrderedCollections = "bac558e1-5e72-5ebc-8fee-abe8a469f55d"
PkgDev = "149e707d-584d-56d3-88ec-740c18e106ff"
Printf = "de0858da-6303-5e67-8744-51eddeeeb8d7"
SeawaterPolynomials = "d496a93d-167e-4197-9f49-d3af4ff8fe40"
Statistics = "10745b16-79ce-11e8-11f9-7d13ad32a3b2"
TOML = "fa267f1f-6049-4f14-aa54-33bafae1ed76"
XESMF = "2e0b0046-e7a1-486f-88de-807ee8ffabe5"

[compat]
Expand All @@ -35,4 +46,4 @@ julia = "1.10"

[extras]
CUDA_Runtime_jll = "76a88914-d11a-5bdc-97e0-2f5a05c973a2"
MPIPreferences = "3da0fdf6-3ccc-4f1b-acd9-58baa6c99267"
MPIPreferences = "3da0fdf6-3ccc-4f1b-acd9-58baa6c99267"
62 changes: 62 additions & 0 deletions examples/GM_calibration/average_ECCO_data.jl
Original file line number Diff line number Diff line change
@@ -0,0 +1,62 @@
using ClimaOcean
using Oceananigans
using Oceananigans.Units
using Oceananigans.Architectures: on_architecture
using SeawaterPolynomials.TEOS10
using ClimaOcean.DataWrangling
using Printf
using Dates
using CUDA
using ClimaOceanCalibration.DataWrangling
using JLD2
using XESMF

arch = GPU()

grid = jldopen(joinpath(pwd(), "examples", "GM_calibration", "grids_and_regridder.jld2"), "r") do file
return on_architecture(arch, file["target_grid"])
end

dataset = ECCO4Monthly()

dir = joinpath(homedir(), "ECCO_data")
mkpath(dir)
start_dates = [DateTime(year) for year in 1992:2017]
sampling_length = 1 # year
buoyancy_model = SeawaterBuoyancy(equation_of_state=TEOS10EquationOfState())

for start_date in start_dates
@info "Processing data starting from $(start_date)..."
end_date = start_date + Year(sampling_length) - Month(1)

T = Metadata(:temperature; dataset, dir, start_date, end_date)
S = Metadata(:salinity; dataset, dir, start_date, end_date)

T_data = FieldTimeSeries(T, grid, time_indices_in_memory=12)
S_data = FieldTimeSeries(S, grid, time_indices_in_memory=12)

T_averaging = TimeAverageOperator(T_data)
T_averaged_fts = AveragedFieldTimeSeries(T_averaging(T_data), T_averaging, nothing)

S_averaging = TimeAverageOperator(S_data)
S_averaged_fts = AveragedFieldTimeSeries(S_averaging(S_data), S_averaging, nothing)

b_averaging = TimeAverageBuoyancyOperator(T_data)
b_averaged_fts = AveragedFieldTimeSeries(b_averaging(T_data, S_data, buoyancy_model), b_averaging, nothing)

prefix = "$(sampling_length)yearaverage_2degree"
date_str = replace(string(start_date), ":" => "-")

dirname = prefix * date_str

SAVE_PATH = joinpath(pwd(), "calibration_data", "ECCO4Monthly", dirname)
mkpath(SAVE_PATH)

T_filepath = joinpath(SAVE_PATH, "T.jld2")
S_filepath = joinpath(SAVE_PATH, "S.jld2")
b_filepath = joinpath(SAVE_PATH, "b.jld2")

save_averaged_fieldtimeseries(T_averaged_fts, T, filename=T_filepath, overwrite_existing=true)
save_averaged_fieldtimeseries(S_averaged_fts, S, filename=S_filepath, overwrite_existing=true)
save_averaged_fieldtimeseries(b_averaged_fts, nothing, filename=b_filepath, overwrite_existing=true)
end
111 changes: 111 additions & 0 deletions examples/GM_calibration/average_EN4_data.jl
Original file line number Diff line number Diff line change
@@ -0,0 +1,111 @@
using ClimaOcean
using Oceananigans
using Oceananigans.Units
using Oceananigans.Architectures: on_architecture
using SeawaterPolynomials.TEOS10
using ClimaOcean.DataWrangling
using Printf
using Dates
using CUDA
using ClimaOceanCalibration.DataWrangling
using JLD2
using XESMF

arch = GPU()

grid = jldopen(joinpath(pwd(), "examples", "GM_calibration", "grids_and_regridder.jld2"), "r") do file
return on_architecture(arch, file["target_grid"])
end

dataset = EN4Monthly()

dir = joinpath(homedir(), "EN4_data")
mkpath(dir)
start_dates = [DateTime(1902), DateTime(1912), DateTime(1922), DateTime(1942),
DateTime(1952), DateTime(1972),
DateTime(1992), DateTime(2002), DateTime(2012)]

# seems that T fields for 1939 1971 1985 is problematic

buoyancy_model = SeawaterBuoyancy(equation_of_state=TEOS10EquationOfState())

for start_date in start_dates
@info "Processing data starting from $(start_date)..."
end_date = start_date + Year(10) - Month(1)

T = Metadata(:temperature; dataset, dir, start_date, end_date)
S = Metadata(:salinity; dataset, dir, start_date, end_date)

T_data = FieldTimeSeries(T, grid, time_indices_in_memory=20)
S_data = FieldTimeSeries(S, grid, time_indices_in_memory=20)

T_averaging = TimeAverageOperator(T_data)
T_averaged_fts = AveragedFieldTimeSeries(T_averaging(T_data), T_averaging, nothing)

S_averaging = TimeAverageOperator(S_data)
S_averaged_fts = AveragedFieldTimeSeries(S_averaging(S_data), S_averaging, nothing)

b_averaging = TimeAverageBuoyancyOperator(T_data)
b_averaged_fts = AveragedFieldTimeSeries(b_averaging(T_data, S_data, buoyancy_model), b_averaging, nothing)

prefix = "10yearaverage_2degree"
date_str = replace(string(start_date), ":" => "-")

dirname = prefix * date_str

SAVE_PATH = joinpath(pwd(), "calibration_data", "EN4Monthly", dirname)
mkpath(SAVE_PATH)

T_filepath = joinpath(SAVE_PATH, "T.jld2")
S_filepath = joinpath(SAVE_PATH, "S.jld2")
b_filepath = joinpath(SAVE_PATH, "b.jld2")

save_averaged_fieldtimeseries(T_averaged_fts, T, filename=T_filepath, overwrite_existing=true)
save_averaged_fieldtimeseries(S_averaged_fts, S, filename=S_filepath, overwrite_existing=true)
save_averaged_fieldtimeseries(b_averaged_fts, nothing, filename=b_filepath, overwrite_existing=true)
end

# seems that T fields for 1939 1971 1985 is problematic
start_dates = [DateTime(1902), DateTime(1907), DateTime(1912), DateTime(1917), DateTime(1922),
DateTime(1927), DateTime(1932), DateTime(1942),
DateTime(1947), DateTime(1952), DateTime(1957), DateTime(1962),
DateTime(1972), DateTime(1977),
DateTime(1987), DateTime(1992), DateTime(1997), DateTime(2002),
DateTime(2007), DateTime(2012)]

for start_date in start_dates
@info "Processing data starting from $(start_date)..."
end_date = start_date + Year(5) - Month(1)

T = Metadata(:temperature; dataset, dir, start_date, end_date)
S = Metadata(:salinity; dataset, dir, start_date, end_date)

T_data = FieldTimeSeries(T, grid, time_indices_in_memory=20)
S_data = FieldTimeSeries(S, grid, time_indices_in_memory=20)

T_averaging = TimeAverageOperator(T_data)
T_averaged_fts = AveragedFieldTimeSeries(T_averaging(T_data), T_averaging, nothing)

S_averaging = TimeAverageOperator(S_data)
S_averaged_fts = AveragedFieldTimeSeries(S_averaging(S_data), S_averaging, nothing)

b_averaging = TimeAverageBuoyancyOperator(T_data)
b_averaged_fts = AveragedFieldTimeSeries(b_averaging(T_data, S_data, buoyancy_model), b_averaging, nothing)

prefix = "5yearaverage_2degree"
date_str = replace(string(start_date), ":" => "-")

dirname = prefix * date_str

SAVE_PATH = joinpath(pwd(), "calibration_data", "EN4Monthly", dirname)
mkpath(SAVE_PATH)

T_filepath = joinpath(SAVE_PATH, "T.jld2")
S_filepath = joinpath(SAVE_PATH, "S.jld2")
b_filepath = joinpath(SAVE_PATH, "b.jld2")

save_averaged_fieldtimeseries(T_averaged_fts, T, filename=T_filepath, overwrite_existing=true)
save_averaged_fieldtimeseries(S_averaged_fts, S, filename=S_filepath, overwrite_existing=true)
save_averaged_fieldtimeseries(b_averaged_fts, nothing, filename=b_filepath, overwrite_existing=true)
end

134 changes: 134 additions & 0 deletions examples/GM_calibration/calibrate_gm_distributed.jl
Original file line number Diff line number Diff line change
@@ -0,0 +1,134 @@
const ensemble_size = 5
using Distributed
using ArgParse

function parse_commandline()
s = ArgParseSettings()

@add_arg_table! s begin
"--simulation_length"
help = "Length of calibration simulation in years"
arg_type = Int
default = 6
"--sampling_length"
help = "Length of sampling period in years"
arg_type = Int
default = 1
"--zonal_average"
help = "Whether to perform zonal averaging in loss function"
arg_type = Bool
default = false
"--observation_covariance"
help = "Type of covariance to use (observations vs predetermined)"
arg_type = Bool
default = true
"--pickup"
help = "Pickup files for simulation spinup"
arg_type = Bool
default = false
end
return parse_args(s)
end

args = parse_commandline()

# Add workers with pre-set environment variables
nprocs = ensemble_size
addprocs(nprocs)
@everywhere @info "Worker $(myid())"
@everywhere ENV["CUDA_VISIBLE_DEVICES"] = myid() - 1

# Now load CUDA on all workers
@everywhere using CUDA
# Verify each worker sees exactly one GPU
@everywhere println("Worker $(myid()) sees GPU: $(CUDA.NVML.index(CUDA.NVML.Device(CUDA.uuid(CUDA.device()))))")

@everywhere begin
using ClimaCalibrate
using ClimaOcean
using ClimaOceanCalibration.DataWrangling
using Oceananigans
using EnsembleKalmanProcesses
using EnsembleKalmanProcesses.ParameterDistributions
using LinearAlgebra
using JLD2
using Glob
using Statistics
import ClimaCalibrate: generate_sbatch_script
include(joinpath(pwd(), "examples", "GM_calibration", "data_processing.jl"))
include(joinpath(pwd(), "examples", "GM_calibration", "model_interface.jl"))

args = $args

const simulation_length = args["simulation_length"]
const sampling_length = args["sampling_length"]
const zonal_average = args["zonal_average"]
const observation_covariance = args["observation_covariance"]
const pickup = args["pickup"] ? Dict("ocean" => joinpath(pwd(), "pickups", "ocean_pickup.jld2"), "sea_ice" => joinpath(pwd(), "pickups", "seaice_pickup.jld2")) : nothing

obl_closure = ClimaOcean.OceanSimulations.default_ocean_closure()

if obl_closure isa RiBasedVerticalDiffusivity
obl_str = "RiBased"
else
obl_str = "CATKE"
end

if observation_covariance
cov_str = "obscov"
else
cov_str = "diagcov"
end

const output_dir = joinpath(pwd(), "calibration_runs", "gm_$(simulation_length)yr_$(sampling_length)yravg_ecco_$(obl_str)_$(cov_str)$(zonal_average ? "_zonalavg" : "")$(pickup !== nothing ? "_pickup" : "")")
ClimaCalibrate.forward_model(iteration, member) = gm_forward_model(iteration, member; simulation_length, sampling_length, obl_closure, pickup)
ClimaCalibrate.observation_map(iteration) = gm_construct_g_ensemble(iteration, zonal_average)
end

n_iterations = 10

κ_skew_prior = constrained_gaussian("κ_skew", 5e2, 3e2, 0, Inf)
κ_symmetric_prior = constrained_gaussian("κ_symmetric", 5e2, 3e2, 0, Inf)

priors = combine_distributions([κ_skew_prior, κ_symmetric_prior])

obs_paths = abspath.(glob("$(sampling_length)yearaverage_2degree*", joinpath("calibration_data", "ECCO4Monthly")))

calibration_target_obs_path = abspath(joinpath("calibration_data", "ECCO4Monthly", "$(sampling_length)yearaverage_2degree2007-01-01T00-00-00"))

Y = hcat(process_observation.(obs_paths, no_tapering, zonal_average)...)

const output_dim = size(Y, 1)

n_trials = size(Y, 2)

if observation_covariance
# the noise estimated from the samples (will have rank n_trials-1)
internal_cov = tsvd_cov_from_samples(Y) # SVD object

# the "5%" model error (diagonal)
model_error_frac = 0.05
data_mean = vec(mean(Y,dims=2))
model_error_cov = Diagonal((model_error_frac*data_mean).^2)

# regularize the model error diagonal (in case of zero entries)
model_error_cov += 1e-6*I

# Combine...
covariance = SVDplusD(internal_cov, model_error_cov)
else
T_variance = 0.7^2
S_variance = 0.1^2
N_data = output_dim ÷ 2
covariance = Diagonal(vcat(fill(T_variance, N_data), fill(S_variance, N_data)))
end

Y_obs = Observation(Dict("samples" => process_observation(calibration_target_obs_path, taper_interior_ocean, zonal_average),
"covariances" => covariance,
"names" => basename(calibration_target_obs_path)))

utki = EnsembleKalmanProcess(Y_obs, TransformUnscented(priors))

backend = ClimaCalibrate.WorkerBackend

ClimaCalibrate.calibrate(ClimaCalibrate.WorkerBackend, utki, n_iterations, priors, output_dir)
Loading