@francispoulin francispoulin commented Aug 15, 2024

Following up on #106, this is a first attempt to create a regional model with ECCO-derived restoring at the boundaries. We decided to focus on the ACC in the Southern Ocean.

It does not run yet, but once it does, it would be good to know whether people agree this is a good example to include. If so, we need to turn this into an example.

@francispoulin
Author

The error that I get is copied below.

@simone-silvestri @glwagner

[ Info: In-painting ecco salinity
[ Info: In-painting ecco salinity
ERROR: a bounds error was thrown during kernel execution on thread (65, 1, 1) in block (3, 1, 1).
Stacktrace not available, run Julia on debug level 2 for more details (by passing -g2 to the executable).

ERROR: LoadError: KernelException: exception thrown during kernel execution on device NVIDIA A100-SXM4-40GB
Stacktrace:
  [1] check_exceptions()
    @ CUDA ~/.julia/packages/CUDA/Tl08O/src/compiler/exceptions.jl:39
  [2] device_synchronize(; blocking::Bool, spin::Bool)

@simone-silvestri
Collaborator

What if you use the CPU and start Julia with --check-bounds=yes?
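
A minimal sketch, not part of the PR, of how one might confirm from inside a session that bounds checking really is forced on. It assumes the internal `Base.JLOptions()` struct, which is not documented public API and may change between Julia versions; the helper name `bounds_check_mode` is made up for illustration.

```julia
# Assumption: Base.JLOptions() is internal API; its check_bounds field encodes
# 0 = default (respect @inbounds), 1 = --check-bounds=yes, 2 = --check-bounds=no.
# `bounds_check_mode` is a hypothetical helper name.
function bounds_check_mode()
    flag = Base.JLOptions().check_bounds
    return flag == 1 ? :forced_on : flag == 2 ? :forced_off : :default
end

println("Bounds checking mode: ", bounds_check_mode())
```

If the session was started with `--check-bounds=yes`, this should report `:forced_on`, which means `@inbounds` annotations inside kernels are ignored and an out-of-range index raises a regular `BoundsError` with a full stack trace.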

@francispoulin
Author

Thanks @simone-silvestri for the suggestion. Will try it now.

@francispoulin
Author

francispoulin commented Aug 15, 2024

Things go a lot further, but there is a problem with the line that defines coupled_model. It seems that the matrix it has and the matrix it wants are not the same size.

The failure seems to be in assemble_atmosphere_ocean_fluxes.

julia> include("acc_regional_simulation.jl")
[ Info: Regridding bathymetry from existing file /u/fpoulin/.julia/scratchspaces/0376089a-ecfe-4b0e-a64f-9c555d74d754/Bathymetry/ETOPO_2022_v1_60s_N90W180_surface.nc.
┌ Warning: The westernmost meridian of `target_grid` 0.0 does not coincide with the closest meridian of the bathymetry grid, -1.4210854715202004e-14.
└ @ ClimaOcean.Bathymetry ~/software/ClimaOcean.jl/src/Bathymetry.jl:147
[ Info: In-painting ecco temperature
[ Info: In-painting ecco temperature
[ Info: In-painting ecco salinity
[ Info: In-painting ecco salinity
┌ Warning: This simulation will run forever as stop iteration = stop time = wall time limit = Inf.
└ @ Oceananigans.Simulations ~/.julia/packages/Oceananigans/dvdXO/src/Simulations/simulation.jl:55
[ Info: In-painting ecco temperature
[ Info: In-painting ecco salinity
ERROR: LoadError: BoundsError: attempt to access 21×46 Matrix{Float64} at index [22, 40]
Stacktrace:
  [1] getindex
    @ ./essentials.jl:14 [inlined]
  [2] net_downwelling_radiation
    @ ~/software/ClimaOcean.jl/src/OceanSeaIceModels/CrossRealmFluxes/tabulated_albedo.jl:156 [inlined]
  [3] macro expansion
    @ ~/software/ClimaOcean.jl/src/OceanSeaIceModels/CrossRealmFluxes/atmosphere_ocean_fluxes.jl:275 [inlined]
  [4] cpu__assemble_atmosphere_ocean_fluxes!
    @ ~/.julia/packages/KernelAbstractions/QE5mt/src/macros.jl:287 [inlined]
  [5] cpu__assemble_atmosphere_ocean_fluxes!(__ctx__::KernelAbstractions.CompilerMetadata{…}, centered_velocity_fluxes::@NamedTuple{…}, net_tracer_fluxes::@NamedTuple{…}, grid::ImmersedBoundaryGrid{…}, clock::Clock{…}, ocean_temperature::SubArray{…}, ocean_salinity::SubArray{…}, ocean_temperature_units::ClimaOcean.OceanSeaIceModels.CrossRealmFluxes.DegreesCelsius, similarity_theory_fields::@NamedTuple{…}, downwelling_radiation::@NamedTuple{…}, prescribed_freshwater_flux::@NamedTuple{…}, atmos_grid::Oceananigans.Grids.ZRegularLLG{…}, atmos_times::StepRangeLen{…}, atmos_backend::JRA55NetCDFBackend, atmos_time_indexing::Oceananigans.OutputReaders.Cyclical{…}, runoff_args::Tuple{…}, radiation_properties::Radiation{…}, ocean_reference_density::Float64, ocean_heat_capacity::Float64, freshwater_density::Float64)
    @ ClimaOcean.OceanSeaIceModels.CrossRealmFluxes ./none:0
  [6] __thread_run(tid::Int64, len::Int64, rem::Int64, obj::KernelAbstractions.Kernel{…}, ndrange::Nothing, iterspace::KernelAbstractions.NDIteration.NDRange{…}, args::Tuple{…}, dynamic::KernelAbstractions.NDIteration.DynamicCheck)
    @ KernelAbstractions ~/.julia/packages/KernelAbstractions/QE5mt/src/cpu.jl:140
  [7] __run(obj::KernelAbstractions.Kernel{…}, ndrange::Nothing, iterspace::KernelAbstractions.NDIteration.NDRange{…}, args::Tuple{…}, dynamic::KernelAbstractions.NDIteration.DynamicCheck, static_threads::Bool)
    @ KernelAbstractions ~/.julia/packages/KernelAbstractions/QE5mt/src/cpu.jl:107
  [8] (::KernelAbstractions.Kernel{…})(::@NamedTuple{…}, ::Vararg{…}; ndrange::Nothing, workgroupsize::Nothing)
    @ KernelAbstractions ~/.julia/packages/KernelAbstractions/QE5mt/src/cpu.jl:46
  [9] (::KernelAbstractions.Kernel{…})(::@NamedTuple{…}, ::Vararg{…})
    @ KernelAbstractions ~/.julia/packages/KernelAbstractions/QE5mt/src/cpu.jl:39
 [10] launch!(::CPU, ::ImmersedBoundaryGrid{…}, ::Oceananigans.Utils.KernelParameters{…}, ::typeof(ClimaOcean.OceanSeaIceModels.CrossRealmFluxes._assemble_atmosphere_ocean_fluxes!), ::@NamedTuple{…}, ::Vararg{…}; include_right_boundaries::Bool, reduced_dimensions::Tuple{}, location::Nothing, active_cells_map::Nothing, kwargs::@Kwargs{})
    @ Oceananigans.Utils ~/.julia/packages/Oceananigans/dvdXO/src/Utils/kernel_launching.jl:168
 [11] launch!(::CPU, ::ImmersedBoundaryGrid{…}, ::Oceananigans.Utils.KernelParameters{…}, ::Function, ::@NamedTuple{…}, ::Vararg{…})
    @ Oceananigans.Utils ~/.julia/packages/Oceananigans/dvdXO/src/Utils/kernel_launching.jl:154
 [12] compute_atmosphere_ocean_fluxes!(coupled_model::OceanSeaIceModel{…})
    @ ClimaOcean.OceanSeaIceModels.CrossRealmFluxes ~/software/ClimaOcean.jl/src/OceanSeaIceModels/CrossRealmFluxes/atmosphere_ocean_fluxes.jl:77
 [13] update_state!(coupled_model::OceanSeaIceModel{…}, callbacks::Vector{…}; compute_tendencies::Bool)
    @ ClimaOcean.OceanSeaIceModels ~/software/ClimaOcean.jl/src/OceanSeaIceModels/ocean_only_model.jl:42
 [14] update_state!
    @ ClimaOcean.OceanSeaIceModels ~/software/ClimaOcean.jl/src/OceanSeaIceModels/ocean_only_model.jl:30 [inlined]
 [15] update_state!(coupled_model::OceanSeaIceModel{…})
    @ ClimaOcean.OceanSeaIceModels ~/software/ClimaOcean.jl/src/OceanSeaIceModels/ocean_only_model.jl:30
 [16] OceanSeaIceModel(ocean::Simulation{…}, sea_ice::ClimaOcean.OceanSeaIceModels.MinimumTemperatureSeaIce{…}; atmosphere::ClimaOcean.OceanSeaIceModels.PrescribedAtmospheres.PrescribedAtmosphere{…}, radiation::Radiation{…}, similarity_theory::Nothing, ocean_reference_density::Float64, ocean_heat_capacity::Float64, clock::Clock{…})
    @ ClimaOcean.OceanSeaIceModels ~/software/ClimaOcean.jl/src/OceanSeaIceModels/ocean_sea_ice_model.jl:82
 [17] top-level scope
    @ REPL[15]:1
Some type information was truncated. Use `show(err)` to see complete types.

@francispoulin
Author

@simone-silvestri , any advice on what is going wrong here?

@simone-silvestri
Collaborator

It looks like there is a bug in the TabulatedAlbedo function: there is no check to make sure that we stay within bounds when interpolating in the table.
I will open a PR to fix this issue. In the meantime, if you want to proceed with the implementation without running into this problem, you can use radiation = Radiation(ocean_albedo = LatitudeDependentAlbedo())
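
To illustrate the kind of bounds check meant here, below is a minimal sketch of bilinear table interpolation with clamped indices. It is not ClimaOcean's implementation: the function name `clamped_bilinear`, the coordinate ranges, and the table values are all made up, and only the 21×46 shape is taken from the error message above.

```julia
# Hypothetical helper, NOT the TabulatedAlbedo code: bilinear interpolation in a
# regularly spaced table, with the lower-left index clamped so that i+1 and j+1
# are always valid indices.
function clamped_bilinear(table::AbstractMatrix, xs::AbstractRange, ys::AbstractRange, x, y)
    Nx, Ny = size(table)

    # Clamp the query point to the tabulated range
    x = clamp(x, first(xs), last(xs))
    y = clamp(y, first(ys), last(ys))

    # Fractional (1-based) position in the table
    fi = (x - first(xs)) / step(xs) + 1
    fj = (y - first(ys)) / step(ys) + 1

    # Lower-left corner, clamped so the upper-right corner stays in bounds
    i = clamp(floor(Int, fi), 1, Nx - 1)
    j = clamp(floor(Int, fj), 1, Ny - 1)

    ξ = fi - i
    η = fj - j

    return (1 - ξ) * (1 - η) * table[i, j]     + ξ * (1 - η) * table[i + 1, j] +
           (1 - ξ) * η       * table[i, j + 1] + ξ * η       * table[i + 1, j + 1]
end

# Made-up 21×46 table (same shape as in the BoundsError) holding a linear function
xs = range(0.0, 1.0, length = 21)
ys = range(0.0, 1.0, length = 46)
table = [x + 2y for x in xs, y in ys]

on_edge  = clamped_bilinear(table, xs, ys, 1.0, 0.5)   # query exactly on the upper edge
past_end = clamped_bilinear(table, xs, ys, 1.3, 0.5)   # query outside the table
println((on_edge, past_end))
```

Without the two `clamp` calls on `i` and `j`, a query on the upper edge would give `i = 21` and then read `table[22, j]`, which is the same pattern as the index `[22, 40]` in the trace.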

@francispoulin
Author

Thanks @simone-silvestri , I will give that a try!

Make changes so that it runs
@francispoulin
Author

@simone-silvestri : I tried it and it seems like a function is not defined.

I added this at the beginning and now it seems to be running!

using ClimaOcean.OceanSeaIceModels.CrossRealmFluxes: LatitudeDependentAlbedo

@simone-silvestri
Collaborator

Ah nice. I think we can export that type.

@codecov

codecov bot commented Aug 20, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 0.00%. Comparing base (b3ae3fe) to head (8cc2719).
Report is 5 commits behind head on main.

Additional details and impacted files
@@          Coverage Diff          @@
##            main    #142   +/-   ##
=====================================
  Coverage   0.00%   0.00%           
=====================================
  Files         34      34           
  Lines       1962    1983   +21     
=====================================
- Misses      1962    1983   +21     


@francispoulin
Author

It's running on a CPU (i.e. slow) and still on the initial time step.

I made all these changes on the branch and can revert back to what we had previously as other fixes come along.

Maybe I'll have something to share tomorrow.

@francispoulin
Author

I started the job yesterday and it hasn't updated the output files in over 24 hours. I think something has gone wrong. Below is what is currently displayed. It hasn't stopped and is still running on a CPU. Maybe we need to try it on a GPU, or have more output to see what has gone wrong? Any suggestions?

julia> include("acc_regional_simulation.jl")
[ Info: Regridding bathymetry from existing file /u/fpoulin/.julia/scratchspaces/0376089a-ecfe-4b0e-a64f-9c555d74d754/Bathymetry/ETOPO_2022_v1_60s_N90W180_surface.nc.
┌ Warning: The westernmost meridian of `target_grid` 0.0 does not coincide with the closest meridian of the bathymetry grid, -1.4210854715202004e-14.
└ @ ClimaOcean.Bathymetry ~/software/ClimaOcean.jl/src/Bathymetry.jl:147
[ Info: In-painting ecco temperature
[ Info: In-painting ecco temperature
[ Info: In-painting ecco salinity
[ Info: In-painting ecco salinity
┌ Warning: This simulation will run forever as stop iteration = stop time = wall time limit = Inf.
└ @ Oceananigans.Simulations ~/.julia/packages/Oceananigans/dvdXO/src/Simulations/simulation.jl:55
[ Info: In-painting ecco temperature
[ Info: In-painting ecco salinity
[ Info: Initializing simulation...
[ Info: Time: 0 seconds, Iteration 0, Δt 5 minutes, max(vel): (0.00e+00, 0.00e+00, 0.00e+00), max(T): 29.70, min(T): -1.94, wtime: 2.679 minutes 
[ Info:     ... simulation initialization complete (14.029 seconds)
[ Info: Executing initial time step...
┌ Warning: Simulation stopped during initialization.
└ @ Oceananigans.Simulations ~/.julia/packages/Oceananigans/dvdXO/src/Simulations/run.jl:129

@francispoulin
Author

Correction: it is still running on one CPU. It is at day 4 of the simulation after 7 days of computing. Not a great ratio.

What do we need to do to run this on a GPU? @simone-silvestri

@simone-silvestri
Collaborator

Wow, that seems quite slow! What if you move it to the GPU?

@francispoulin
Author

> Wow, that seems quite slow! What if you move it to the GPU?

Sorry @simone-silvestri for the late reply.

I am happy to try it again on a GPU but last time there was an error. I can try it again and let you know what the error is.

@francispoulin
Author

@simone-silvestri
I ran it on a GPU and found the following error. It suggests I try passing -g2 when I run Julia. I can try that but I believe I tried this before and didn't see much more.

ERROR: a bounds error was thrown during kernel execution on thread (225, 1, 1) in block (7, 1, 1).
Stacktrace not available, run Julia on debug level 2 for more details (by passing -g2 to the executable).

@francispoulin
Author

I stand corrected, there is more information. To me this actually looks quite different.

julia> include("acc_regional_simulation.jl")
Precompiling Oceananigans
  166 dependencies successfully precompiled in 109 seconds
Precompiling ClimaOcean
        Info Given ClimaOcean was explicitly requested, output will be shown live 
WARNING: using Units.day in module ECCO conflicts with an existing identifier.
  204 dependencies successfully precompiled in 143 seconds. 168 already precompiled.
  2 dependencies had output during precompilation:
┌ ClimaOcean
│  [Output was shown above]
└  
┌ Accessors → AccessorsUnitfulExt
│  [pid 760940] waiting for IO to finish:
│   Handle type        uv_handle_t->data
│   fs_event           0x25d5fe0->0x7fdd5f8fbeb0
│   timer              0x2385440->0x7fdd5f8fbee0
│  This means that a package has started a background task or event source that has not finished running. For precompilation to complete successfully, the event source needs to be closed explicitly. See the developer documentation on fixing precompilation hangs for more help.
└  
[ Info: Regridding bathymetry from existing file /u/fpoulin/.julia/scratchspaces/0376089a-ecfe-4b0e-a64f-9c555d74d754/Bathymetry/ETOPO_2022_v1_60s_N90W180_surface.nc.
┌ Warning: The westernmost meridian of `target_grid` 0.0 does not coincide with the closest meridian of the bathymetry grid, -1.4210854715202004e-14.
└ @ ClimaOcean.Bathymetry ~/software/ClimaOcean.jl/src/Bathymetry.jl:147
[ Info: In-painting ecco temperature
[ Info: In-painting ecco temperature
[ Info: In-painting ecco salinity
[ Info: In-painting ecco salinity
ERROR: a bounds error was thrown during kernel execution on thread (1, 1, 1) in block (3, 1, 1).
Stacktrace:
 [1] indexed_iterate at ./tuple.jl:92
 [2] indexed_iterate at ./tuple.jl:92
 [3] stateindex at /u/fpoulin/software/ClimaOcean.jl/src/ClimaOcean.jl:40
 [4] ECCORestoring at /u/fpoulin/software/ClimaOcean.jl/src/DataWrangling/ecco_restoring.jl:210
 [5] DiscreteForcing at /u/fpoulin/.julia/packages/Oceananigans/dvdXO/src/Forcings/discrete_forcing.jl:51
 [6] hydrostatic_free_surface_tracer_tendency at /u/fpoulin/.julia/packages/Oceananigans/dvdXO/src/Models/HydrostaticFreeSurfaceModels/hydrostatic_free_surface_tendency_kernel_functions.jl:133
 [7] macro expansion at /u/fpoulin/.julia/packages/Oceananigans/dvdXO/src/Models/HydrostaticFreeSurfaceModels/compute_hydrostatic_free_surface_tendencies.jl:240
 [8] gpu_compute_hydrostatic_free_surface_Gc! at /u/fpoulin/.julia/packages/KernelAbstractions/QE5mt/src/macros.jl:95
 [9] gpu_compute_hydrostatic_free_surface_Gc! at ./none:0

ERROR: a bounds error was thrown during kernel execution on thread (1, 1, 1) in block (67, 1, 1).
Stacktrace:
 [1] indexed_iterate at ./tuple.jl:92
 [2] indexed_iterate at ./tuple.jl:92
 [3] stateindex at /u/fpoulin/software/ClimaOcean.jl/src/ClimaOcean.jl:40
 [4] ECCORestoring at /u/fpoulin/software/ClimaOcean.jl/src/DataWrangling/ecco_restoring.jl:210
 [5] DiscreteForcing at /u/fpoulin/.julia/packages/Oceananigans/dvdXO/src/Forcings/discrete_forcing.jl:51
 [6] hydrostatic_free_surface_tracer_tendency at /u/fpoulin/.julia/packages/Oceananigans/dvdXO/src/Models/HydrostaticFreeSurfaceModels/hydrostatic_free_surface_tendency_kernel_functions.jl:133
 [7] macro expansion at /u/fpoulin/.julia/packages/Oceananigans/dvdXO/src/Models/HydrostaticFreeSurfaceModels/compute_hydrostatic_free_surface_tendencies.jl:240
Unhandled Task ERROR: KernelException: exception thrown during kernel execution on device NVIDIA A100-SXM4-40GB

@glwagner
Member

Can you make an MWE for this and open an issue?

@francispoulin
Author

> Can you make an MWE for this and open an issue?

I will certainly give it a try and see what part of it is causing the issue. This will likely take me a day or two to get to.

@francispoulin
Author

@simone-silvestri , I realize it's been a few months but I am still keen to get this example up and running.

I can try this all again this week but if you had time to meet for an hour, I wonder if that would help?

@simone-silvestri
Collaborator

Sure, I'll text you on Slack.

Resort to what we have in main, exactly. Surely this must work?
@francispoulin
Author

I changed the make.jl file to be the same as what we currently have on main. From what I understand, this branch is now doing what main does. Even though the tests pass on main, they don't pass here. Below is a copy of the error, which has to do with the data wrangling.

Any ideas how this can happen?

JRA55 and data wrangling utilities: Error During Test at /central/scratch/esm/slurm-buildkite/climaocean-ci/4788/climaocean-ci/test/test_jra55.jl:6
  Got exception outside of a @test
  RequestError: HTTP/1.1 404 Not Found while requesting http://esgf-node.ornl.gov/thredds/fileServer/user_pub_work/input4MIPs/CMIP6/OMIP/MRI/MRI-JRA55-do-1-5-0/atmos/3hrPt/tas/gr/v20200916/tas_input4MIPs_atmosphericState_OMIP_MRI-JRA55-do-1-5-0_gr_195801010000-195812312100.nc
  Stacktrace:
    [1] #3
      @ /central/groups/esm/software/julia/julia-1.11.5/share/julia/stdlib/v1.11/Downloads/src/Downloads.jl:271 [inlined]
    [2] open(f::Downloads.var"#3#4"{Nothing, Vector{Pair{String, String}}, Float64, typeof(ClimaOcean.DataWrangling.download_progress), Bool, Nothing, Nothing, String}, args::String; kwargs::@Kwargs{write::Bool, lock::Bool})
      @ Base ./io.jl:410
    [3] open_nolock
      @ /central/groups/esm/software/julia/julia-1.11.5/share/julia/stdlib/v1.11/ArgTools/src/ArgTools.jl:35 [inlined]
    [4] arg_write(f::Function, arg::String)
      @ ArgTools /central/groups/esm/software/julia/julia-1.11.5/share/julia/stdlib/v1.11/ArgTools/src/ArgTools.jl:103
    [5] #download#2
      @ /central/groups/esm/software/julia/julia-1.11.5/share/julia/stdlib/v1.11/Downloads/src/Downloads.jl:258 [inlined]
    [6] download
      @ /central/groups/esm/software/julia/julia-1.11.5/share/julia/stdlib/v1.11/Downloads/src/Downloads.jl:247 [inlined]
    [7] macro expansion
      @ /central/scratch/esm/slurm-buildkite/climaocean-ci/4788/climaocean-ci/src/DataWrangling/JRA55/JRA55_metadata.jl:204 [inlined]
    [8] macro expansion
      @ /central/scratch/esm/slurm-buildkite/climaocean-ci/4788/depot/default/packages/Oceananigans/WzSJS/src/DistributedComputations/distributed_macros.jl:31 [inlined]
    [9] download_dataset(metadata::Metadata{MultiYearJRA55, StepRange{DateTime, Hour}, Nothing})
      @ ClimaOcean.DataWrangling.JRA55 /central/scratch/esm/slurm-buildkite/climaocean-ci/4788/climaocean-ci/src/DataWrangling/JRA55/JRA55_metadata.jl:198
   [10] JRA55FieldTimeSeries(metadata::Metadata{MultiYearJRA55, StepRange{DateTime, Hour}, Nothing}, architecture::CPU, FT::Type; latitude::Nothing, longitude::Nothing, backend::JRA55NetCDFBackend{Nothing}, time_indexing::Oceananigans.OutputReaders.Cyclical{Nothing})
      @ ClimaOcean.DataWrangling.JRA55 /central/scratch/esm/slurm-buildkite/climaocean-ci/4788/climaocean-ci/src/DataWrangling/JRA55/JRA55_field_time_series.jl:327
   [11] JRA55FieldTimeSeries(variable_name::Symbol, architecture::CPU, FT::Type; dataset::MultiYearJRA55, start_date::DateTime, end_date::DateTime, dir::String, kw::@Kwargs{backend::JRA55NetCDFBackend{Nothing}})
      @ ClimaOcean.DataWrangling.JRA55 /central/scratch/esm/slurm-buildkite/climaocean-ci/4788/climaocean-ci/src/DataWrangling/JRA55/JRA55_field_time_series.jl:311
   [12] JRA55FieldTimeSeries
      @ /central/scratch/esm/slurm-buildkite/climaocean-ci/4788/climaocean-ci/src/DataWrangling/JRA55/JRA55_field_time_series.jl:300 [inlined]
   [13] (::var"#1#2"{CPU, DateTime, MultiYearJRA55})(dir::String)
      @ Main /central/scratch/esm/slurm-buildkite/climaocean-ci/4788/climaocean-ci/test/test_jra55.jl:147
   [14] mktempdir(fn::var"#1#2"{CPU, DateTime, MultiYearJRA55}, parent::String; prefix::String)
      @ Base.Filesystem ./file.jl:819
   [15] mktempdir(fn::Function, parent::String)
      @ Base.Filesystem ./file.jl:815
   [16] macro expansion
      @ /central/scratch/esm/slurm-buildkite/climaocean-ci/4788/climaocean-ci/test/test_jra55.jl:145 [inlined]
   [17] macro expansion
      @ /central/groups/esm/software/julia/julia-1.11.5/share/julia/stdlib/v1.11/Test/src/Test.jl:1704 [inlined]
   [18] top-level scope
      @ /central/scratch/esm/slurm-buildkite/climaocean-ci/4788/climaocean-ci/test/test_jra55.jl:7
   [19] include(fname::String)
      @ Main ./sysimg.jl:38
   [20] top-level scope
      @ /central/scratch/esm/slurm-buildkite/climaocean-ci/4788/climaocean-ci/test/runtests.jl:69
   [21] include(fname::String)
      @ Main ./sysimg.jl:38
   [22] top-level scope
      @ none:6
   [23] eval
      @ ./boot.jl:430 [inlined]
   [24] exec_options(opts::Base.JLOptions)
      @ Base ./client.jl:296
   [25] _start()
      @ Base ./client.jl:531
Test Summary:                      | Pass  Error  Total   Time
JRA55 and data wrangling utilities |   35      1     36  49.6s
ERROR: LoadError: Some tests did not pass: 35 passed, 0 failed, 1 errored, 0 broken.
in expression starting at /central/scratch/esm/slurm-buildkite/climaocean-ci/4788/climaocean-ci/test/test_jra55.jl:6
in expression starting at /central/scratch/esm/slurm-buildkite/climaocean-ci/4788/climaocean-ci/test/runtests.jl:68
ERROR: Package ClimaOcean errored during testing
Stacktrace:
 [1] pkgerror(msg::String)
   @ Pkg.Types /central/groups/esm/software/julia/julia-1.11.5/share/julia/stdlib/v1.11/Pkg/src/Types.jl:68
 [2] test(ctx::Pkg.Types.Context, pkgs::Vector{Pkg.Types.PackageSpec}; coverage::Bool, julia_args::Cmd, test_args::Cmd, test_fn::Nothing, force_latest_compatible_version::Bool, allow_earlier_backwards_compatible_versions::Bool, allow_reresolve::Bool)
   @ Pkg.Operations /central/groups/esm/software/julia/julia-1.11.5/share/julia/stdlib/v1.11/Pkg/src/Operations.jl:2128
 [3] test
   @ /central/groups/esm/software/julia/julia-1.11.5/share/julia/stdlib/v1.11/Pkg/src/Operations.jl:2011 [inlined]
 [4] test(ctx::Pkg.Types.Context, pkgs::Vector{Pkg.Types.PackageSpec}; coverage::Bool, test_fn::Nothing, julia_args::Cmd, test_args::Cmd, force_latest_compatible_version::Bool, allow_earlier_backwards_compatible_versions::Bool, allow_reresolve::Bool, kwargs::@Kwargs{io::IOContext{IO}})
   @ Pkg.API /central/groups/esm/software/julia/julia-1.11.5/share/julia/stdlib/v1.11/Pkg/src/API.jl:481
 [5] test(pkgs::Vector{Pkg.Types.PackageSpec}; io::IOContext{IO}, kwargs::@Kwargs{})
   @ Pkg.API /central/groups/esm/software/julia/julia-1.11.5/share/julia/stdlib/v1.11/Pkg/src/API.jl:159
 [6] test(pkgs::Vector{Pkg.Types.PackageSpec})
   @ Pkg.API /central/groups/esm/software/julia/julia-1.11.5/share/julia/stdlib/v1.11/Pkg/src/API.jl:148
 [7] test(; name::Nothing, uuid::Nothing, version::Nothing, url::Nothing, rev::Nothing, path::Nothing, mode::Pkg.Types.PackageMode, subdir::Nothing, kwargs::@Kwargs{})
   @ Pkg.API /central/groups/esm/software/julia/julia-1.11.5/share/julia/stdlib/v1.11/Pkg/src/API.jl:174
 [8] test()
   @ Pkg.API /central/groups/esm/software/julia/julia-1.11.5/share/julia/stdlib/v1.11/Pkg/src/API.jl:165
 [9] top-level scope
   @ none:1
🚨 Error: The command exited with status 1

@francispoulin
Author

Hello @glwagner, @navidcy and @simone-silvestri ,

I tried running these tests on my laptop and on a server, and on both I obtained the same error as on the CI server above. Note that only one test fails, test_jra55.jl, as you can see below.

Test Summary:                      | Pass  Error  Total  Time
JRA55 and data wrangling utilities |   35      1     36  2.0s
ERROR: LoadError: Some tests did not pass: 35 passed, 0 failed, 1 errored, 0 broken.
in expression starting at /home/fpoulin/Software/ClimaOcean.jl/test/test_jra55.jl:6
in expression starting at /home/fpoulin/Software/ClimaOcean.jl/test/runtests.jl:68

This file was last updated last week. Could it be this is a known problem? I see that Navid and Simone were involved in the recent update.

I want to see if everything passes without this one test.  I will return this afterwards.
Since many tests failed I am trying what we had before.  Very strange!
@francispoulin
Author

To follow up, the file that it's trying to download is the following.

http://esgf-node.ornl.gov/thredds/fileServer/user_pub_work/input4MIPs/CMIP6/OMIP/MRI/MRI-JRA55-do-1-5-0/atmos/3hrPt/tas/gr/v20200916/tas_input4MIPs_atmosphericState_OMIP_MRI-JRA55-do-1-5-0_gr_195801010000-195812312100.nc

I looked for the file and see that the id is below, and does match with what appears after fileServer above.

Any idea why we get a 404 error when we try and download this and other files?

@glwagner
Member

> To follow up, the file that it's trying to download is the following.
>
> http://esgf-node.ornl.gov/thredds/fileServer/user_pub_work/input4MIPs/CMIP6/OMIP/MRI/MRI-JRA55-do-1-5-0/atmos/3hrPt/tas/gr/v20200916/tas_input4MIPs_atmosphericState_OMIP_MRI-JRA55-do-1-5-0_gr_195801010000-195812312100.nc
>
> I looked for the file and see that the id is below, and does match with what appears after fileServer above.
>
> Any idea why we get a 404 error when we try and download this and other files?

Is the link broken?

@francispoulin
Author

> > To follow up, the file that it's trying to download is the following.
> >
> > http://esgf-node.ornl.gov/thredds/fileServer/user_pub_work/input4MIPs/CMIP6/OMIP/MRI/MRI-JRA55-do-1-5-0/atmos/3hrPt/tas/gr/v20200916/tas_input4MIPs_atmosphericState_OMIP_MRI-JRA55-do-1-5-0_gr_195801010000-195812312100.nc
> >
> > I looked for the file and see that the id is below, and does match with what appears after fileServer above.
> > Any idea why we get a 404 error when we try and download this and other files?
>
> Is the link broken?

Hello @glwagner , I guess so. Last week, when I tried it on my machines, I thought I was getting a 404 error, but now it seems to be a 500 error.

[ Info: Downloading ECCO data: temperature in /central/scratch/esm/slurm-buildkite/climaocean-ci/4794/depot/default/scratchspaces/0376089a-ecfe-4b0e-a64f-9c555d74d754/ECCO/v2/monthly...
ERROR: LoadError: RequestError: HTTP/1.1 500 Internal Server Error while requesting https://ecco.jpl.nasa.gov/drive/files/ECCO2/cube92_latlon_quart_90S90N/monthly/THETA/THETA.1440x720x50.199301.nc

@glwagner
Member

It looks like there were some vestigial commented-out things in make.jl which may have led to failures.

francispoulin and others added 2 commits September 15, 2025 09:19
Co-authored-by: Gregory L. Wagner <[email protected]>
Co-authored-by: Gregory L. Wagner <[email protected]>
@francispoulin
Author

Thanks Greg! Sorry, I thought I had returned things to normal, but clearly not the case.

I accepted two of your three suggestions, and want to see if all tests pass without this example being added to the docs. If yes, then I'll add the example in to see what changes.

@francispoulin
Author

@glwagner , I see two checks have failed. Bummer.

When I click on either of them I get a page not found. Any suggestion how I can see the error?

@glwagner
Member

Meh, we might have to adjust buildkite security settings.

The error says:

ERROR: LoadError: UndefVarError: `outputpath` not defined

does the example run for you?

@francispoulin
Author

> Meh, we might have to adjust buildkite security settings.
>
> The error says:
>
> ERROR: LoadError: UndefVarError: `outputpath` not defined
>
> does the example run for you?

It does not run for me on either my laptop or server.

I am happy to test the buildkite settings, if that would help.

@francispoulin
Author

Interestingly, when the new example is completely removed (thank you again @glwagner), the docs still fail. Sadly, I still can't see the errors.

I am trying to build the docs on a server and I will let you know if I come across errors of any kind.

@francispoulin
Author

When I ran the make.jl script on another server to build the docs, I did not get an error. I haven't looked at the outputs, but it seems to have created all of them (see below), including the new example, without a single error.

Question: why does it fail on these two checks on github?

shell> ls -lrt src/literated/
total 90056
-rw-rw-r-- 1 fpoulin fpoulin   276168 Sep 15 16:06 panantarctic_bathymetry.png
-rw-rw-r-- 1 fpoulin fpoulin  1590517 Sep 15 16:54 acc_snapshot.png
-rw-rw-r-- 1 fpoulin fpoulin 18169005 Sep 15 16:54 panantarctic_regional_surface.mp4
-rw-rw-r-- 1 fpoulin fpoulin    18135 Sep 15 16:54 panantarctic_regional_simulation.md
-rw-rw-r-- 1 fpoulin fpoulin  1747402 Sep 15 17:01 single_column_profiles.mp4
-rw-rw-r-- 1 fpoulin fpoulin   596778 Sep 15 17:01 single_column_os_papa_simulation.md
-rw-rw-r-- 1 fpoulin fpoulin   598273 Sep 15 18:05 global_snapshot.png
-rw-rw-r-- 1 fpoulin fpoulin 50084758 Sep 15 18:06 one_degree_global_ocean_surface.mp4
-rw-rw-r-- 1 fpoulin fpoulin   696041 Sep 15 18:06 one_degree_simulation.md
-rw-rw-r-- 1 fpoulin fpoulin   505603 Sep 15 18:07 bathymetry.png
-rw-rw-r-- 1 fpoulin fpoulin  2891474 Sep 15 19:13 snapshot.png
-rw-rw-r-- 1 fpoulin fpoulin 15000775 Sep 15 19:13 near_global_ocean_surface.mp4
-rw-rw-r-- 1 fpoulin fpoulin    17508 Sep 15 19:13 near_global_ocean_simulation.md

@glwagner
Member

It's an out-of-memory error on GPU

To display the buildkite logs publicly we probably need to fiddle with buildkite settings

Let's try a coarser grid to see if this allows us to avoid the memory errors.
@francispoulin
Author

Ah, good to know!

I'm trying a coarser grid, 720x120x40, to see if this avoids the error.
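
As a rough back-of-the-envelope check on what this grid costs in memory, here is a minimal sketch. It ignores halo regions, and the number of 3D arrays (`n_fields`) is a made-up placeholder rather than a count taken from the actual model.

```julia
# Rough estimate only: ignores halos; n_fields is a hypothetical placeholder.
Nx, Ny, Nz = 720, 120, 40                    # coarser grid quoted above
cells = Nx * Ny * Nz                         # number of grid cells
bytes_per_field = cells * sizeof(Float64)    # one 3D Float64 array
mib_per_field = bytes_per_field / 2^20

n_fields = 50                                # assumed number of 3D arrays on the device
total_gib = n_fields * bytes_per_field / 2^30

println("cells = ", cells)
println("MiB per 3D Float64 field ≈ ", round(mib_per_field, digits = 1))
println("GiB for ", n_fields, " such fields ≈ ", round(total_gib, digits = 2))
```

Swapping in the original grid dimensions gives a like-for-like comparison of how much the coarser grid is expected to save.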

@francispoulin
Author

Funny that when I reduce the resolution, and don't do anything else differently, now we have more errors. See below.

I'll restore the original parameters since this clearly did not help.

┌ Warning: Opening file with JLD2.MmapIO failed, falling back to IOStream
└ @ JLD2 /central/scratch/esm/slurm-buildkite/climaocean-ci/4801/depot/default/packages/JLD2/WDhXU/src/JLD2.jl:162
Field utilities: Error During Test at /central/scratch/esm/slurm-buildkite/climaocean-ci/4801/climaocean-ci/test/test_ecco2_daily.jl:51
  Got exception outside of a @test
  EOFError: read end of file


Labels

build docs: Add this label to build the docs in a PR

5 participants