@minghangli-uni
Collaborator

@minghangli-uni minghangli-uni commented May 19, 2025

@minghangli-uni
Collaborator Author

!test repro

@github-actions

❌ The Bitwise Reproducibility Check Failed ❌

When comparing:

  • restart-repro-check (checksums created using commit 4da9e9f), against
  • dev-MC_100km_jra_ryf (checksums in commit 3724a14)
Further information

The experiment can be found on Gadi at /scratch/tm70/repro-ci/experiments/access-om3-configs/4da9e9f5e4aa5476d49d6c3f7f75c9f8b5ba7cfd, and the test results at https://github.com/ACCESS-NRI/access-om3-configs/runs/42453812782.

The checksums generated by this !test command are found in the testing/checksum directory of https://github.com/ACCESS-NRI/access-om3-configs/actions/runs/15105536984/artifacts/3149110395.

The checksums compared against are found here https://github.com/ACCESS-NRI/access-om3-configs/tree/3724a14919e6bf2139fece7382e9acb8a1cd2ff2/testing/checksum

@minghangli-uni
Collaborator Author

!test repro

@github-actions

❌ The Bitwise Reproducibility Check Failed ❌

When comparing:

  • restart-repro-check (checksums created using commit a074d19), against
  • dev-MC_100km_jra_ryf (checksums in commit 3724a14)
Further information

The experiment can be found on Gadi at /scratch/tm70/repro-ci/experiments/access-om3-configs/a074d19e65c5021ec89f058c44c46ea94710bbca, and the test results at https://github.com/ACCESS-NRI/access-om3-configs/runs/42454456313.

The checksums generated by this !test command are found in the testing/checksum directory of https://github.com/ACCESS-NRI/access-om3-configs/actions/runs/15105678216/artifacts/3149176954.

The checksums compared against are found here https://github.com/ACCESS-NRI/access-om3-configs/tree/3724a14919e6bf2139fece7382e9acb8a1cd2ff2/testing/checksum

@minghangli-uni
Collaborator Author

!test repro

@github-actions

❌ The Bitwise Reproducibility Check Failed ❌

When comparing:

  • restart-repro-check (checksums created using commit 8d841fb), against
  • dev-MC_100km_jra_ryf (checksums in commit 3724a14)
Further information

The experiment can be found on Gadi at /scratch/tm70/repro-ci/experiments/access-om3-configs/8d841fba1ef0882c20a597eb86fc3fdd49773481, and the test results at https://github.com/ACCESS-NRI/access-om3-configs/runs/42454975288.

The checksums generated by this !test command are found in the testing/checksum directory of https://github.com/ACCESS-NRI/access-om3-configs/actions/runs/15105956459/artifacts/3149228305.

The checksums compared against are found here https://github.com/ACCESS-NRI/access-om3-configs/tree/3724a14919e6bf2139fece7382e9acb8a1cd2ff2/testing/checksum

@chrisb13
Collaborator

I'm curious, what's being tested here?

@minghangli-uni
Collaborator Author

The restart repro check didn’t pass, so this test PR was created to see which commit caused the issue. I’ve done some local testing and found that the problem doesn’t come from these commits. I’ll go ahead and close this PR now.

 Disable Leith and enable biharmonic Smagorinsky eddy viscosity
@minghangli-uni
Collaborator Author

!test repro

@github-actions

github-actions bot commented May 20, 2025

❌ The Bitwise Reproducibility Check Failed ❌

When comparing:

  • restart-repro-check (checksums created using commit 009619a), against
  • dev-MC_100km_jra_ryf (checksums in commit 3724a14)
Further information

The experiment can be found on Gadi at /scratch/tm70/repro-ci/experiments/access-om3-configs/009619a5f49b8a08677a9e464bb74f02a5ebd5ca, and the test results at https://github.com/ACCESS-NRI/access-om3-configs/runs/42526399720.

The checksums generated by this !test command are found in the testing/checksum directory of https://github.com/ACCESS-NRI/access-om3-configs/actions/runs/15128981080/artifacts/3156889036.

The checksums compared against are found here https://github.com/ACCESS-NRI/access-om3-configs/tree/3724a14919e6bf2139fece7382e9acb8a1cd2ff2/testing/checksum


$ git diff --no-index restart-1d-1-checksum.json restart-2d-0-checksum.json
diff --git a/restart-1d-1-checksum.json b/restart-2d-0-checksum.json
index aeeb670..783ceb6 100644
--- a/restart-1d-1-checksum.json
+++ b/restart-2d-0-checksum.json
@@ -77,7 +77,7 @@
       "50446027CAA9AB98"
     ],
     "v": [
-      "8E526F6EFFC67885"
+      "E526F6EFFC67885"
     ],
     "v2": [
       "517D541A24081CD4"

 Update MEKE
module MOM_thickness_diffuse
 no GM parameterisation
@minghangli-uni
Collaborator Author

!test repro

@github-actions

github-actions bot commented May 20, 2025

❌ The Bitwise Reproducibility Check Failed ❌

When comparing:

  • restart-repro-check (checksums created using commit f183832), against
  • dev-MC_100km_jra_ryf (checksums in commit 3724a14)
Further information

The experiment can be found on Gadi at /scratch/tm70/repro-ci/experiments/access-om3-configs/f183832e00a9d46c2b8f1b7717d2bd8b0f58523e, and the test results at https://github.com/ACCESS-NRI/access-om3-configs/runs/42526840936.

The checksums generated by this !test command are found in the testing/checksum directory of https://github.com/ACCESS-NRI/access-om3-configs/actions/runs/15129148831/artifacts/3156938815.

The checksums compared against are found here https://github.com/ACCESS-NRI/access-om3-configs/tree/3724a14919e6bf2139fece7382e9acb8a1cd2ff2/testing/checksum


$ git diff --no-index restart-1d-1-checksum.json restart-2d-0-checksum.json
diff --git a/restart-1d-1-checksum.json b/restart-2d-0-checksum.json
index 00d5f4a..e916ca9 100644
--- a/restart-1d-1-checksum.json
+++ b/restart-2d-0-checksum.json
@@ -71,7 +71,7 @@
       "5C485F131CCDC800"
     ],
     "v": [
-      "3A27200B41FC9B53"
+      "BA27200B41FC9B53"
     ],
     "v2": [
       "39BD864082977728"

 Update horizontal mixing coefficients
@minghangli-uni
Collaborator Author

!test repro

@github-actions

github-actions bot commented May 20, 2025

❌ The Bitwise Reproducibility Check Failed ❌

When comparing:

  • restart-repro-check (checksums created using commit 206816e), against
  • dev-MC_100km_jra_ryf (checksums in commit 3724a14)
Further information

The experiment can be found on Gadi at /scratch/tm70/repro-ci/experiments/access-om3-configs/206816e66e0c8fcf1c08a41df57373657ea43f53, and the test results at https://github.com/ACCESS-NRI/access-om3-configs/runs/42527207945.

The checksums generated by this !test command are found in the testing/checksum directory of https://github.com/ACCESS-NRI/access-om3-configs/actions/runs/15129267072/artifacts/3156976825.

The checksums compared against are found here https://github.com/ACCESS-NRI/access-om3-configs/tree/3724a14919e6bf2139fece7382e9acb8a1cd2ff2/testing/checksum


206816e66e0c8fcf1c08a41df57373657ea43f53]$ git diff --no-index restart-1d-1-checksum.json restart-2d-0-checksum.json
diff --git a/restart-1d-1-checksum.json b/restart-2d-0-checksum.json
index 00d5f4a..e916ca9 100644
--- a/restart-1d-1-checksum.json
+++ b/restart-2d-0-checksum.json
@@ -71,7 +71,7 @@
       "5C485F131CCDC800"
     ],
     "v": [
-      "3A27200B41FC9B53"
+      "BA27200B41FC9B53"
     ],
     "v2": [
       "39BD864082977728"

 Update background viscosity to 0
@minghangli-uni
Collaborator Author

!test repro

@github-actions

github-actions bot commented May 20, 2025

❌ The Bitwise Reproducibility Check Failed ❌

When comparing:

  • restart-repro-check (checksums created using commit 5bc0a88), against
  • dev-MC_100km_jra_ryf (checksums in commit 3724a14)
Further information

The experiment can be found on Gadi at /scratch/tm70/repro-ci/experiments/access-om3-configs/5bc0a885ee7c396eb41e86d2ff9ed6a564d31847, and the test results at https://github.com/ACCESS-NRI/access-om3-configs/runs/42527895994.

The checksums generated by this !test command are found in the testing/checksum directory of https://github.com/ACCESS-NRI/access-om3-configs/actions/runs/15129524710/artifacts/3157051820.

The checksums compared against are found here https://github.com/ACCESS-NRI/access-om3-configs/tree/3724a14919e6bf2139fece7382e9acb8a1cd2ff2/testing/checksum


5bc0a885ee7c396eb41e86d2ff9ed6a564d31847]$ git diff --no-index restart-1d-1-checksum.json restart-2d-0-checksum.json
diff --git a/restart-1d-1-checksum.json b/restart-2d-0-checksum.json
index dab0ebe..a2c3c87 100644
--- a/restart-1d-1-checksum.json
+++ b/restart-2d-0-checksum.json
@@ -71,7 +71,7 @@
       "DD0E21894A301A0B"
     ],
     "v": [
-      "68598C05F188CC3C"
+      "E8598C05F188CC3C"
     ],
     "v2": [
       "C3FB4B9584ABDF32"

 Change to the Bodner et al. (2023) formulation of the mixed-layer restratification parameterisation.
module MOM_diabatic_driver
 Implicit energetics PBL scheme to determine diffusivity and viscosity in the PBL
module MOM_CVMix_KPP
 Disabled
module MOM_CVMix_conv
 Disabled
module MOM_CVMix_shear
 Disabled
module MOM_CVMix_ddiff
 Disabled
module MOM_kappa_shear
 Jackson's shear-driven turbulence
module MOM_energetic_PBL
 Enable ePBL and disable KPP
@minghangli-uni
Collaborator Author

!test repro

@github-actions

github-actions bot commented May 20, 2025

❌ The Bitwise Reproducibility Check Failed ❌

When comparing:

  • restart-repro-check (checksums created using commit 75c5b46), against
  • dev-MC_100km_jra_ryf (checksums in commit 3724a14)
Further information

The experiment can be found on Gadi at /scratch/tm70/repro-ci/experiments/access-om3-configs/75c5b463aac29004c35a3620e7ca74bb1a3c67fb, and the test results at https://github.com/ACCESS-NRI/access-om3-configs/runs/42528169235.

The checksums generated by this !test command are found in the testing/checksum directory of https://github.com/ACCESS-NRI/access-om3-configs/actions/runs/15129632315/artifacts/3157079335.

The checksums compared against are found here https://github.com/ACCESS-NRI/access-om3-configs/tree/3724a14919e6bf2139fece7382e9acb8a1cd2ff2/testing/checksum


Okay, here comes the difference.

$ git diff --no-index restart-1d-1-checksum.json restart-2d-0-checksum.json
diff --git a/restart-1d-1-checksum.json b/restart-2d-0-checksum.json
index 2c39d08..5aaa4c7 100644
--- a/restart-1d-1-checksum.json
+++ b/restart-2d-0-checksum.json
@@ -2,97 +2,97 @@
   "schema_version": "1-0-0",
   "output": {
     "CAu": [
-      "C7091D9B8BC6A54B"
+      "33451EF77A2A2BAA"
     ],
     "CAv": [
-      "3008DD0559286C00"
+      "3CF5BE5DD88916D8"
     ],
     "DTBT": [
-      "4055967F14CE8D6A"
+      "4055967F14CA35F4"
     ],
     "First_direction": [
       "0"
     ],
     "Kd_shear": [
-      "37B3F02F5A5A29C7"
+      "94ED1C811A550588"
     ],
     "Kv_shear": [
-      "8703190C2A6F6D2C"
+      "F50E3B10B3BC9DEE"
     ],
     "Kv_shear_Bu": [
-      "B46ED37E9271CC7C"
+      "4BFD95586C0CAE17"
     ],
     "MEKE": [
-      "56B6A5FA8E59855A"
+      "56B2433E19247818"
     ],
     "MEKE_Kh": [
-      "9AF6B13C86F26B4E"
+      "9C23EDD213F037F5"
     ],
     "MLD": [
-      "FC73B7DB7BD3EDCB"
+      "FC4C5F41C937C873"
     ],
     "MLD_MLE_filtered": [
-      "FDCE13D9337C772E"
+      "FDCCD4A4EC177BC1"
     ],
     "MLD_MLE_filtered_slow": [
-      "F07212455CF0DC5A"
+      "F07D1BC7F3F994DD"
     ],
     "MLE_Bflux": [
-      "2ADF11F05653F8A2"
+      "2ADEBE506C9B8D26"
     ],
     "SFC_BFLX": [
-      "B42825F75DDD296E"
+      "B444FDA3B6ACC7EC"
     ],
     "Salt": [
-      "8A3F75C54541F3FB"
+      "8A3BAE762B3B9164"
     ],
     "Temp": [
-      "86CEE8ABA3BD9C6C"
+      "86E4A0CF913E6288"
     ],
     "age": [
-      "9A5333D3EEAE02F2"
+      "6801604871396F01"
       "E80BDA97FDFF1C5"
     ],
     "sfc": [
-      "BDF5B71A9A3DF1E4"
+      "BDC208E2323D2E32"
     ],
     "u": [
-      "8833E5E411042495"
+      "74E329C0CFD4D35D"
     ],
     "u2": [
-      "664A836A7F21694E"
+      "6E1E7305FA92D364"
     ],
     "ubtav": [
-      "58768E8A84380DD7"
+      "5864BD2BD327F2D5"
     ],
     "v": [
-      "374C06F1533F5CC3"
+      "38F70AE2CAA88358"
     ],
     "v2": [
-      "1ED656C1D5CE7C9E"
+      "D4633ABF5B89AD8D"
     ],
     "vbtav": [
-      "3027B84901AB965A"
+      "B0376E653BBA130D"
     ]
   }
 }
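
The field-by-field comparison that the git diff above performs can also be scripted. The following is a minimal sketch, assuming only the layout visible in the diffs (a top-level "output" object mapping each field name to a list of checksum strings); the function name differing_fields and the demo file names are hypothetical and not part of the repro CI tooling.

```python
import json
import tempfile
from pathlib import Path


def differing_fields(path_a, path_b):
    """Return {field: (checksums_a, checksums_b)} for every field that differs.

    A field present in only one file is reported with None on the other side.
    """
    out_a = json.loads(Path(path_a).read_text())["output"]
    out_b = json.loads(Path(path_b).read_text())["output"]
    diffs = {}
    for field in sorted(set(out_a) | set(out_b)):
        a, b = out_a.get(field), out_b.get(field)
        if a != b:
            diffs[field] = (a, b)
    return diffs


if __name__ == "__main__":
    # Demo with two small files built from values quoted in this thread.
    tmp = Path(tempfile.mkdtemp())
    first = {"schema_version": "1-0-0",
             "output": {"v": ["3A27200B41FC9B53"], "v2": ["39BD864082977728"]}}
    second = {"schema_version": "1-0-0",
              "output": {"v": ["BA27200B41FC9B53"], "v2": ["39BD864082977728"]}}
    (tmp / "restart-1d-1-checksum.json").write_text(json.dumps(first))
    (tmp / "restart-2d-0-checksum.json").write_text(json.dumps(second))
    for name, (a, b) in differing_fields(
        tmp / "restart-1d-1-checksum.json", tmp / "restart-2d-0-checksum.json"
    ).items():
        print(f"{name}: {a} -> {b}")
```

Reporting fields that exist in only one file matters here because the hunks in this thread do not all list the same set of fields.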

@minghangli-uni
Collaborator Author

!test repro

@github-actions

github-actions bot commented May 20, 2025

❌ The Bitwise Reproducibility Check Failed ❌

When comparing:

  • restart-repro-check (checksums created using commit a95c155), against
  • dev-MC_100km_jra_ryf (checksums in commit 3724a14)
Further information

The experiment can be found on Gadi at /scratch/tm70/repro-ci/experiments/access-om3-configs/a95c155c212bde1605fd74847a82f02e1da04faf, and the test results at https://github.com/ACCESS-NRI/access-om3-configs/runs/42529798313.

The checksums generated by this !test command are found in the testing/checksum directory of https://github.com/ACCESS-NRI/access-om3-configs/actions/runs/15130186769/artifacts/3157249280.

The checksums compared against are found here https://github.com/ACCESS-NRI/access-om3-configs/tree/3724a14919e6bf2139fece7382e9acb8a1cd2ff2/testing/checksum


a95c155c212bde1605fd74847a82f02e1da04faf]$ git diff --no-index restart-1d-1-checksum.json restart-2d-0-checksum.json
diff --git a/restart-1d-1-checksum.json b/restart-2d-0-checksum.json
index 281fb79..3890118 100644
--- a/restart-1d-1-checksum.json
+++ b/restart-2d-0-checksum.json
@@ -71,7 +71,7 @@
       "E059E117C7685596"
     ],
     "v": [
-      "57E8B269851AE4A0"
+      "D7E8B269851AE4A0"
     ],
     "v2": [
       "8A60009DE06D6747"

@minghangli-uni
Collaborator Author

!test repro

@github-actions

github-actions bot commented May 20, 2025

❌ The Bitwise Reproducibility Check Failed ❌

When comparing:

  • restart-repro-check (checksums created using commit 06f5534), against
  • dev-MC_100km_jra_ryf (checksums in commit 3724a14)
Further information

The experiment can be found on Gadi at /scratch/tm70/repro-ci/experiments/access-om3-configs/06f5534d694ce3a795219bc5b847f8fba1487f72, and the test results at https://github.com/ACCESS-NRI/access-om3-configs/runs/42530773479.

The checksums generated by this !test command are found in the testing/checksum directory of https://github.com/ACCESS-NRI/access-om3-configs/actions/runs/15130501693/artifacts/3157350565.

The checksums compared against are found here https://github.com/ACCESS-NRI/access-om3-configs/tree/3724a14919e6bf2139fece7382e9acb8a1cd2ff2/testing/checksum


06f5534d694ce3a795219bc5b847f8fba1487f72]$ git diff --no-index restart-1d-1-checksum.json restart-2d-0-checksum.json
diff --git a/restart-1d-1-checksum.json b/restart-2d-0-checksum.json
index 92f95cb..a8c1d1c 100644
--- a/restart-1d-1-checksum.json
+++ b/restart-2d-0-checksum.json
@@ -2,85 +2,85 @@
   "schema_version": "1-0-0",
   "output": {
     "CAu": [
-      "AC7CC6C8BBB60F5F"
+      "BEB2B0DD5D0BC95E"
     ],
     "CAv": [
-      "18D4343A9EEDD590"
+      "6F1433E4536CBC0D"
     ],
     "DTBT": [
-      "4055967F1358B213"
+      "4055967F134525AC"
     ],
     "First_direction": [
       "0"
     ],
     "Kd_shear": [
-      "76E08CBFD40DA1DD"
+      "94AAFC15538789AC"
     ],
     "Kv_shear": [
-      "3CF025C646396AE3"
+      "8F9B21E71FC860A"
     ],
     "Kv_shear_Bu": [
-      "61485FA5CC821C22"
+      "70499FF0970426B9"
     ],
     "MEKE": [
-      "498CD376334108B0"
+      "498D27FAC56A1DC2"
     ],
     "MEKE_Kh": [
-      "7CC91A88958766B8"
+      "7DBD2E34DBEF927F"
     ],
     "MLD_MLE_filtered": [
-      "B70714DBDEC02AC3"
+      "B7047C45F608C01B"
     ],
     "Salt": [
-      "5C5A1E67DF658C82"
+      "5C564CD1FAE7249E"
     ],
     "Temp": [
-      "833A42B40925A9A1"
+      "11D79EF785AF3EDC"
     ],
     "age": [
-      "D74ACC06E8E1F8B8"
+      "9083FFD46769B932"
     ],
     "ave_ssh": [
-      "20BADDCE692875FB"
+      "20BAE0B99EEF7080"
     ],
     "diffu": [
-      "2620080388EEA0CE"
+      "9373F2443BEB71C"
     ],
     "diffv": [
-      "568539B8EFF50FAA"
+      "FA2AE347E93C00E5"
     ],
     "frazil": [
-      "4CBCB95FD4B514D5"
+      "4CCE05C471EEF36D"
     ],
     "h": [
-      "C9BB230891F3A20B"
+      "C9BB57F8EB74E5FD"
     ],
     "h_ML": [
-      "FFD0A293410A81B"
+      "FFC651385081DEB"
     ],
     "p_surf_EOS": [
       "E80BDA97FDFF1C5"
     ],
     "sfc": [
-      "BBECB62E04BF116C"
+      "BBD916D888ECB2AE"
     ],
     "u": [
-      "12F7BEDA13978EFC"
+      "CFBEEF0FA9B9124D"
     ],
     "u2": [
-      "E70015765889546D"
+      "647DDD69461825A8"
     ],
     "ubtav": [
-      "5030B3A0D31850B9"
+      "D0115D9F53ECD3AA"
     ],
     "v": [
-      "5D378C035745F853"
+      "350743692D99FE24"
     ],
     "v2": [
-      "11CCA838DECDDF6F"
+      "75013545DF362C7C"
     ],
     "vbtav": [
-      "2C718D7CC89DBCEF"
+      "2BEE3BA7FAC13635"
     ]
   }
 }

This reverts commit 06f5534.
kappa shear breaks the restart repro for 1deg config

I'll do a sanity check for Bodner et al 2023
@minghangli-uni
Collaborator Author

!test repro

@github-actions

github-actions bot commented May 20, 2025

❌ The Bitwise Reproducibility Check Failed ❌

When comparing:

  • restart-repro-check (checksums created using commit 5ae86ef), against
  • dev-MC_100km_jra_ryf (checksums in commit 3724a14)
Further information

The experiment can be found on Gadi at /scratch/tm70/repro-ci/experiments/access-om3-configs/5ae86ef907638b2fa5fb944f6e04e9476680a716, and the test results at https://github.com/ACCESS-NRI/access-om3-configs/runs/42531367014.

The checksums generated by this !test command are found in the testing/checksum directory of https://github.com/ACCESS-NRI/access-om3-configs/actions/runs/15130731937/artifacts/3157410557.

The checksums compared against are found here https://github.com/ACCESS-NRI/access-om3-configs/tree/3724a14919e6bf2139fece7382e9acb8a1cd2ff2/testing/checksum


5ae86ef907638b2fa5fb944f6e04e9476680a716]$ git diff --no-index restart-1d-1-checksum.json restart-2d-0-checksum.json
diff --git a/restart-1d-1-checksum.json b/restart-2d-0-checksum.json
index 002741d..4a2d604 100644
--- a/restart-1d-1-checksum.json
+++ b/restart-2d-0-checksum.json
@@ -83,7 +83,7 @@
       "EA6CB98CBB6F0E34"
     ],
     "v": [
-      "C4FA51B95430B482"
+      "44FA51B95430B482"
     ],
     "v2": [
       "6DF9AE600ADDFCBA"
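
In every diff above where "v" is the only changed field, the two checksums look identical apart from their leading hex digit. A quick way to check this is to XOR each pair as integers. The sketch below does so with the values copied from this thread; the helper name differing_bits is hypothetical, and the thread itself does not say what the pattern means, so this is only an arithmetic observation.

```python
# `v` checksum pairs (restart-1d-1, restart-2d-0) copied from the diffs in this
# thread, keyed by the short commit hash of the run that produced them.
V_PAIRS = {
    "009619a": ("8E526F6EFFC67885", "E526F6EFFC67885"),
    "f183832": ("3A27200B41FC9B53", "BA27200B41FC9B53"),
    "206816e": ("3A27200B41FC9B53", "BA27200B41FC9B53"),
    "5bc0a88": ("68598C05F188CC3C", "E8598C05F188CC3C"),
    "a95c155": ("57E8B269851AE4A0", "D7E8B269851AE4A0"),
    "5ae86ef": ("C4FA51B95430B482", "44FA51B95430B482"),
}


def differing_bits(hex_a, hex_b):
    """Return the positions (0 = least significant) of the bits that differ."""
    xor = int(hex_a, 16) ^ int(hex_b, 16)
    return [i for i in range(xor.bit_length()) if (xor >> i) & 1]


for commit, (a, b) in V_PAIRS.items():
    print(commit, differing_bits(a, b))  # each pair prints [63]
```

Each pair differs in bit 63 only, the most significant bit of a 64-bit value. The 15-digit string in the first pair parses to the same value as its 16-digit form with a leading zero, which suggests the checksums are written without zero padding.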

@minghangli-uni
Collaborator Author

minghangli-uni commented May 22, 2025

Okay, the above tests confirm that MOM_kappa_shear breaks the restart repro only for the 100 km configuration, not for the 25 km configuration.

! === module MOM_kappa_shear ===
! Parameterization of shear-driven turbulence following Jackson, Hallberg and Legg, JPO 2008
USE_JACKSON_PARAM = True        !   [Boolean] default = False
                                ! If true, use the Jackson-Hallberg-Legg (JPO 2008) shear mixing
                                ! parameterization.
VERTEX_SHEAR = True             !   [Boolean] default = False
                                ! If true, do the calculations of the shear-driven mixing at the cell vertices
                                ! (i.e., the vorticity points)
MAX_RINO_IT = 25                !   [nondim] default = 50
                                ! The maximum number of iterations that may be used to estimate the Richardson
                                ! number driven mixing.
USE_RESTRICTIVE_TOLERANCE_CHECK = True !   [Boolean] default = False
                                ! If true, uses the more restrictive tolerance check to determine if a timestep
                                ! is acceptable for the KS_it outer iteration loop.  False uses the original
                                ! less restrictive check.
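
To see at a glance which of these settings depart from the MOM defaults, the "NAME = value ! [units] default = value" lines can be parsed directly. This is a minimal sketch under the assumption that every parameter line follows the layout shown above; the regular expression and the function name non_default_params are illustrative and not part of MOM6 or its tooling.

```python
import re

# Parameter lines copied from the MOM_kappa_shear block above (description
# comment lines omitted).
PARAM_LINES = """\
USE_JACKSON_PARAM = True        !   [Boolean] default = False
VERTEX_SHEAR = True             !   [Boolean] default = False
MAX_RINO_IT = 25                !   [nondim] default = 50
USE_RESTRICTIVE_TOLERANCE_CHECK = True !   [Boolean] default = False
"""

# Matches: NAME = value ! [units] default = value
PATTERN = re.compile(
    r"^(?P<name>[A-Z0-9_]+)\s*=\s*(?P<value>\S+)\s*!\s*"
    r"\[(?P<units>[^\]]*)\]\s*default\s*=\s*(?P<default>\S+)"
)


def non_default_params(text):
    """Return {name: (value, default)} for parameters set away from their default."""
    result = {}
    for line in text.splitlines():
        match = PATTERN.match(line)
        if match and match["value"] != match["default"]:
            result[match["name"]] = (match["value"], match["default"])
    return result


for name, (value, default) in non_default_params(PARAM_LINES).items():
    print(f"{name}: {value} (default {default})")
```

All four parameters in the block are reported, since each is set to a value other than its default.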

Below are the three repro tests for the 25 km config, based on #556, with only the module use and module load commands modified to enable symmetric memory for MOM. The full test results are available at /scratch/tm70/ml0072/tmp/test-model-repro_0.25deg_mom_sym_mem for anyone who would like to take a look.

================================== short test summary info ===================================
FAILED ../../packages/python3/py311/lib/python3.11/site-packages/model_config_tests/test_bit_reproducibility.py::TestBitReproducibility::test_bit_repro_historical - AssertionError: There was an error running experiment: exp_default_runtime
============ 1 failed, 2 passed, 55 deselected, 3 warnings in 1284.91s (0:21:24) =============

@access-hive-bot

This pull request has been mentioned on ACCESS Hive Community Forum. There might be relevant details there:

https://forum.access-hive.org.au/t/cosima-twg-announce/401/70

@minghangli-uni
Collaborator Author

I just had another look at the most recent comment: the restart repro test did not fail. The failure was in test_bit_repro_historical, where the exp_default_runtime experiment hit an error. Sorry for the false alarm in today's osit meeting!

@minghangli-uni
Collaborator Author

Closing this as completed.

@dougiesquire dougiesquire deleted the restart-repro-check branch September 24, 2025 22:43

4 participants