Fix serial-MPI non-reproducibility for gswp3 CASA-CNP configuration #567

Merged
SeanBryan51 merged 3 commits into main from fix-serial-mpi-non-reproducibility-for-casa-cnp
Jan 14, 2026

@SeanBryan51 SeanBryan51 commented Mar 13, 2025

Currently, running the gswp3 configuration (see MPI and serial configurations) with CASA-CNP enabled (see footnote 1) produces bitwise differences between serial and MPI runs in the CASA restart file and the CASA NetCDF output file. All other outputs (e.g. the standard CABLE outputs and restarts) are bitwise identical between serial and MPI. This change fixes a few bugs in the MPI master driver and the CASA-CNP code so that the CASA output and restart files are bitwise reproducible between serial and MPI for this configuration.
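
To make "bitwise identical" concrete, here is a minimal sketch in Python of comparing two arrays of floating-point values on their raw bytes rather than by value. It is only an illustration: the function name `bitwise_equal` and the sample data are hypothetical, and it is not how benchcab or CABLE perform the comparison.

```python
import struct


def bitwise_equal(a, b):
    """Return True when two sequences of floats match bit for bit."""
    if len(a) != len(b):
        return False
    # Pack as IEEE 754 doubles so the comparison is on raw bytes, not values.
    return struct.pack(f"{len(a)}d", *a) == struct.pack(f"{len(b)}d", *b)


# Hypothetical values standing in for one variable from a serial and an MPI run.
serial = [0.1 + 0.2, 1.0, 0.0]
mpi_diff = [0.3, 1.0, 0.0]           # differs in the last bits of element 0
mpi_signed = [0.1 + 0.2, 1.0, -0.0]  # equal by value, different sign bit

print(bitwise_equal(serial, list(serial)))  # True
print(bitwise_equal(serial, mpi_diff))      # False
print(serial == mpi_signed)                 # True: 0.0 == -0.0 by value
print(bitwise_equal(serial, mpi_signed))    # False: the bytes differ
```

The last two lines show why a value comparison is weaker than a bitwise one: values that compare equal can still have different bit patterns.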

Type of change

  • Bug fix

Checklist

  • I have checked my code/text and corrected any misspellings

Testing

  • Are the changes bitwise-compatible with the main branch? If working on an optional feature, are the results bitwise-compatible when this feature is off? If yes, copy benchcab output showing successful completion of the bitwise compatibility tests or equivalent results below this line.
2026-01-14 16:14:21,532 - INFO - benchcab.benchcab.py:380 - Running comparison tasks...
2026-01-14 16:14:21,559 - INFO - benchcab.benchcab.py:381 - tasks: 168 (models: 2, sites: 42, science configurations: 4)
2026-01-14 16:17:14,896 - INFO - benchcab.benchcab.py:391 - 0 failed, 168 passed


📚 Documentation preview 📚: https://cable--567.org.readthedocs.build/en/567/

Footnotes

  1. Note: CASA-CNP was enabled without a CASA restart file (i.e. cable_user%CASA_fromZero = .TRUE.).

@SeanBryan51 SeanBryan51 force-pushed the fix-serial-mpi-non-reproducibility-for-casa-cnp branch 2 times, most recently from a0c4c5d to 877e0d9 on March 13, 2025 05:16
@SeanBryan51 SeanBryan51 changed the title from "Fix serial-MPI non-reproducibility for CASA-CNP" to "Fix serial-MPI non-reproducibility for gswp3 CASA-CNP configuration" on Mar 13, 2025
@SeanBryan51 SeanBryan51 marked this pull request as ready for review March 14, 2025 02:57
@SeanBryan51 SeanBryan51 force-pushed the fix-serial-mpi-non-reproducibility-for-casa-cnp branch 2 times, most recently from cf11792 to 6e609f4 on March 18, 2025 02:49
@SeanBryan51
Collaborator Author

Initialisation of CASA types was added in #590. Need to rebase.

Currently the phen type is not initialised properly in the MPI
implementation, which results in uninitialised values being written to
the restart file. This change initialises the phen type on allocation so
that it is initialised for both serial and MPI applications.
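
The idea of initialising on allocation can be sketched as follows. This is an illustration in Python, not the CABLE Fortran code: the `Phen` class, its field names, the `alloc_phen` helper and the zero fill values are all assumptions made for the example. Python has no uninitialised memory, so the sketch only mirrors the structure of the fix, namely a single allocation routine that also fills every field, so no driver can skip the fill.

```python
from array import array
from dataclasses import dataclass


@dataclass
class Phen:
    """Hypothetical stand-in for a phenology type; field names are illustrative."""
    phase: array    # integer phase per land point
    tk_shed: array  # real-valued threshold per land point


def alloc_phen(n_points: int) -> Phen:
    """Allocate and initialise in one step so no caller can skip the fill."""
    return Phen(
        phase=array("i", [0]) * n_points,
        tk_shed=array("d", [0.0]) * n_points,
    )


# Both drivers go through the same routine, so both start from the same state.
serial_phen = alloc_phen(4)
mpi_master_phen = alloc_phen(4)
print(serial_phen == mpi_master_phen)  # True
```

Putting the fill inside the allocation routine, rather than in a driver-specific initialisation step, is what makes the starting state independent of which driver is running.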

Some CASA variables required in the output file are not communicated
back to the master process from the workers. This change communicates
the required variables from worker to master, which is required to
restore bitwise reproducibility in the CASA NetCDF output file between
serial and MPI runs.
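
A minimal sketch of this class of bug, simulated in plain Python without MPI: each worker owns a slice of the land points, and the master only sees the variables that are explicitly copied back. The function `gather_to_master` and the variable names `cplant` and `nplant` are hypothetical and chosen only for the example; the real code uses MPI communication in the CABLE drivers.

```python
def gather_to_master(master, worker_chunks, names):
    """Copy each named variable from the workers' slices into the master's arrays.

    worker_chunks is a list of (offset, fields) pairs, where offset is the
    first land point owned by that worker in the master's global arrays.
    """
    for offset, fields in worker_chunks:
        for name in names:
            values = fields[name]
            master[name][offset:offset + len(values)] = values


# Master-side arrays, as they stand before any worker data arrives.
master = {"cplant": [0.0] * 4, "nplant": [0.0] * 4}

# Two simulated workers, each owning two land points.
workers = [
    (0, {"cplant": [1.0, 2.0], "nplant": [0.1, 0.2]}),
    (2, {"cplant": [3.0, 4.0], "nplant": [0.3, 0.4]}),
]

# Omitting "nplant" reproduces the bug class: the master never sees it.
gather_to_master(master, workers, ["cplant"])
stale = list(master["nplant"])
print(stale)  # [0.0, 0.0, 0.0, 0.0]

# Listing every variable needed for output brings the master up to date.
gather_to_master(master, workers, ["cplant", "nplant"])
print(master["nplant"])  # [0.1, 0.2, 0.3, 0.4]
```

A serial run has no such step, which is why a variable missing from the communicated set shows up only as a serial-MPI difference in the output file.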

Currently the MPI implementation does not output time-averaged pools and
fluxes (#566). This change ports the time-averaging functionality that
exists in the serial driver to the MPI master driver. This is
required to restore bitwise reproducibility in the CASA NetCDF output
file across serial and MPI runs.
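
Time averaging of this kind can be sketched as an accumulator that sums values every timestep and writes the mean at each output interval. The `TimeAverager` class below is a hypothetical illustration in Python; how the serial driver actually accumulates and resets its averages is not shown in this pull request.

```python
class TimeAverager:
    """Accumulate per-timestep values and emit their mean at each output interval."""

    def __init__(self, n_points):
        self.total = [0.0] * n_points
        self.count = 0

    def add(self, values):
        # Accumulate in timestep order; a fixed order keeps the rounding identical.
        for i, v in enumerate(values):
            self.total[i] += v
        self.count += 1

    def flush(self):
        """Return the mean since the last flush and reset the accumulator."""
        if self.count == 0:
            raise ValueError("no timesteps accumulated")
        mean = [t / self.count for t in self.total]
        self.total = [0.0] * len(self.total)
        self.count = 0
        return mean


avg = TimeAverager(2)
for step_values in ([1.0, 10.0], [2.0, 20.0], [3.0, 30.0]):
    avg.add(step_values)
mean = avg.flush()
print(mean)  # [2.0, 20.0]
```

For two drivers to agree bit for bit, both need to accumulate the same values in the same order and divide by the same count, since floating-point addition is not associative.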
@SeanBryan51 SeanBryan51 force-pushed the fix-serial-mpi-non-reproducibility-for-casa-cnp branch from 6e609f4 to bcc3936 on January 14, 2026 03:59

SeanBryan51 commented Jan 14, 2026

Rebased off main. I've confirmed MPI and serial are still bitwise reproducible for CASA and CABLE outputs.

@Whyborn do you mind giving this a review?

@SeanBryan51 SeanBryan51 requested a review from Whyborn January 14, 2026 04:03

@Whyborn Whyborn left a comment

Looks good to me, and I get bitwise compatibility now.

@SeanBryan51
Collaborator Author

Re-ran benchcab runs, all good 🙂

@SeanBryan51 SeanBryan51 merged commit 5a9f830 into main Jan 14, 2026
5 checks passed
@SeanBryan51 SeanBryan51 deleted the fix-serial-mpi-non-reproducibility-for-casa-cnp branch January 14, 2026 06:44