Bring in the answer-changing (for derecho_intel) ccs_config update #3111
base: master
… Derecho for intel, to the one used in cesm3_0_alpha06d
…u with mpi-serial
This might not come in directly to master; it may instead come in as part of a tag with other changes. Having a PR here helps me document what I'm finding about this change, record when answer changes happen, and show that submodule changes after this are bit-for-bit.

Updated ccs_config versions break derecho_gnu with mpi-serial. So I'm using the test SMS_D_Ld1_Mmpi-serial.f45_f45_mg37.I2000Clm50SpRs.derecho_gnu.clm-ptsRLA to search for versions that do or don't work with it, to figure out how to fix the ccs_config problem. For more notes on that, see the ccs_config issue here: ESMCI/ccs_config_cesm#233

Right now I'm working through the ccs_config problems for derecho_gnu. So far, when updating ccs_config, this is what I see in terms of derecho_gnu with mpi-serial working:
Latest testing shows:
Here are the tests that failed:
All but the 2nd, 5th, and 6th tests in the list above failed because they ran out of wallclock time.
The runtimes for these tests with ctsm5.3.041 are much less than the wallclock limits, so they really shouldn't have run out of wallclock time. Jim suggested that I only need to remove -debug from pio and not from esmf, so I'll try that. I could also do the removal from pio only for mpi-serial. For example:
PASS ERP_D_Ld3_P64x2.f10_f10_mg37.I2000Clm50BgcCru.derecho_gnu.clm-default RUN time=112
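The argument above rests on comparing each test's recorded run time with its wallclock limit. A minimal Python sketch of that check for status lines like the one above is below; it assumes the `RUN time=` value is in seconds and that the limit is written as `HH:MM:SS`. The helper names and the 00:20:00 limit in the example are hypothetical and do not come from this PR.

```python
import re
from typing import Optional, Tuple

# Matches status lines such as:
#   PASS <testname> RUN time=112
STATUS_RE = re.compile(r"^(?P<status>\w+)\s+(?P<test>\S+)\s+RUN\s+time=(?P<seconds>\d+)")


def parse_run_time(line: str) -> Optional[Tuple[str, str, int]]:
    """Return (status, test name, run time in seconds), or None if the line has no RUN time."""
    match = STATUS_RE.match(line.strip())
    if match is None:
        return None
    return match["status"], match["test"], int(match["seconds"])


def hms_to_seconds(hms: str) -> int:
    """Convert an HH:MM:SS wallclock string to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return 3600 * hours + 60 * minutes + seconds


def fraction_of_limit(run_seconds: int, wallclock: str) -> float:
    """Fraction of the wallclock limit a run used (above 1.0 means it exceeded the limit)."""
    return run_seconds / hms_to_seconds(wallclock)


if __name__ == "__main__":
    line = ("PASS ERP_D_Ld3_P64x2.f10_f10_mg37.I2000Clm50BgcCru."
            "derecho_gnu.clm-default RUN time=112")
    status, test, seconds = parse_run_time(line)
    # The 00:20:00 limit is a made-up example value, not taken from the PR.
    print(status, seconds, f"{fraction_of_limit(seconds, '00:20:00'):.2f}")
```

A run that used only a small fraction of its limit in a baseline version, but timed out after the update, points at a build or configuration change (such as the pio -debug setting) rather than at a limit that was simply too tight.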
The tests that fail early all die with an error in the UrbanBuilding temperature, where endrun is called, but the error output also shows a segfault. For example, for the last one, ERP_P64x2_D_Ld3.f10_f10_mg37.I1850Clm50BgcCrop.derecho_gnu.clm-extra_outputs, the cesm.log shows:
…ks for the failing tests
OK, I seem to have something working now for the list of failed tests, so I'm running aux_clm again.
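Sorting aux_clm results by machine, compiler, and MPI library (as in the derecho_gnu mpi-serial search and the Intel, gnu, and nvhpc comparisons) is easier once CIME test names are split into their fields. The following is a minimal Python sketch, assuming the usual dot-separated layout `TEST[_MODIFIERS].GRID.COMPSET[.MACHINE_COMPILER][.TESTMODS]` and grid aliases that contain no dots. `CimeTestName`, `parse_test_name`, and `uses_mpi_serial` are hypothetical helpers, not part of CIME.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CimeTestName:
    test_type: str            # e.g. SMS, ERP, ERS
    modifiers: List[str]      # e.g. ["D", "Ld1", "Mmpi-serial"]
    grid: str
    compset: str
    machine: Optional[str]
    compiler: Optional[str]
    testmods: Optional[str]


def parse_test_name(name: str) -> CimeTestName:
    """Split TEST[_MODS].GRID.COMPSET[.MACHINE_COMPILER][.TESTMODS] into fields.

    Assumes the grid is an alias with no dots in it (e.g. f10_f10_mg37).
    """
    parts = name.split(".")
    if len(parts) < 3:
        raise ValueError(f"not a CIME test name: {name!r}")
    test_type, *modifiers = parts[0].split("_")
    machine = compiler = None
    if len(parts) > 3:
        # Machine and compiler are joined by the last underscore, e.g. derecho_gnu.
        head, sep, tail = parts[3].rpartition("_")
        machine, compiler = (head, tail) if sep else (parts[3], None)
    testmods = ".".join(parts[4:]) or None
    return CimeTestName(test_type, modifiers, parts[1], parts[2], machine, compiler, testmods)


def uses_mpi_serial(test: CimeTestName) -> bool:
    """True if the test selects the mpi-serial library through an M modifier."""
    return "Mmpi-serial" in test.modifiers


if __name__ == "__main__":
    t = parse_test_name("SMS_D_Ld1_Mmpi-serial.f45_f45_mg37.I2000Clm50SpRs.derecho_gnu.clm-ptsRLA")
    print(t.machine, t.compiler, t.modifiers, uses_mpi_serial(t))
```

With the names parsed, grouping results by `compiler` separates the derecho_intel answer changes from the gnu and nvhpc results without hand-matching strings.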
OK, testing looks as expected. All tests are b4b on Izumi, and the gnu tests are b4b on Derecho. All Intel tests are NOT b4b, except the FUNIT and PFS tests, because they have no history files to compare:
FUNITCTSM_P1x1.f10_f10_mg37.I2000Clm50Sp.derecho_intel.GC.ctsm5341ccschangeacl_int/TestStatus:PASS FUNITCTSM_P1x1.f10_f10_mg37.I2000Clm50Sp.derecho_intel BASELINE

The nvhpc tests on Derecho also show differences:
SMS.f10_f10_mg37.I2000Clm50BgcCrop.derecho_nvhpc.clm-crop

This is likely expected, because the ccs_config update made several changes to the nvhpc environment, including the compiler version:
diff /glade/derecho/scratch/samrabin/tests_0424-161819de/SMS.f10_f10_mg37.I2000Clm50BgcCrop.derecho_nvhpc.clm-crop.GC.0424-161819de_nvh/.env_mach_specific.sh .
6c6
< module load cesmdev/1.0 ncarenv/23.09
---
> module load cesmdev/1.0 ncarenv/24.12
8c8
< module load conda/latest nco craype nvhpc/24.3 ncarcompilers/1.0.0 cmake cray-mpich/8.1.27 netcdf-mpi/4.9.2 parallel-netcdf/1.12.3 parallelio/2.6.2 esmf/8.6.0
---
> module load conda/latest nco craype cmake nvhpc/24.11 ncarcompilers/1.0.0 cray-mpich/8.1.29 netcdf-mpi/4.9.2 parallel-netcdf/1.14.0 parallelio/2.6.4 esmf/8.8.0

Looking at the differences for some Intel cases, the shorter cases appear to differ at near roundoff level, but in at least 100 history variables. For longer cases, for example the 5-year case ERS_Ly5_P128x1.f10_f10_mg37.IHistClm45BgcCrop.derecho_intel.clm-cropMonthOutput, the differences are large and cover most of the history file. This is all as expected for this change.
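The module `diff` above mixes genuine version bumps with reordering (cmake moves position but keeps the same unversioned entry). A minimal Python sketch that compares two `module load` lines as name-to-version maps, so that only real version changes are reported, is below. It assumes every token is either `name` or `name/version`; `parse_module_load` and `diff_modules` are hypothetical helpers.

```python
from typing import Dict, List, Tuple


def parse_module_load(line: str) -> Dict[str, str]:
    """Map module name -> version for one `module load ...` line.

    Tokens without a version (e.g. `craype`) map to an empty string.
    """
    tokens = line.split()
    if tokens[:2] != ["module", "load"]:
        raise ValueError(f"not a module load line: {line!r}")
    modules: Dict[str, str] = {}
    for token in tokens[2:]:
        name, _, version = token.partition("/")
        modules[name] = version
    return modules


def diff_modules(old_line: str, new_line: str) -> List[Tuple[str, str, str]]:
    """Return (module, old version, new version) for every module whose version changed.

    Load order is ignored, so a module that only moved (like cmake) is not reported.
    """
    old, new = parse_module_load(old_line), parse_module_load(new_line)
    changes = []
    for name in sorted(old.keys() | new.keys()):
        before = old.get(name, "<absent>")
        after = new.get(name, "<absent>")
        if before != after:
            changes.append((name, before, after))
    return changes


if __name__ == "__main__":
    old = ("module load conda/latest nco craype nvhpc/24.3 ncarcompilers/1.0.0 cmake "
           "cray-mpich/8.1.27 netcdf-mpi/4.9.2 parallel-netcdf/1.12.3 parallelio/2.6.2 esmf/8.6.0")
    new = ("module load conda/latest nco craype cmake nvhpc/24.11 ncarcompilers/1.0.0 "
           "cray-mpich/8.1.29 netcdf-mpi/4.9.2 parallel-netcdf/1.14.0 parallelio/2.6.4 esmf/8.8.0")
    for name, before, after in diff_modules(old, new):
        print(f"{name}: {before} -> {after}")
```

Ignoring load order is a deliberate choice here: it isolates the library and compiler versions that could plausibly change answers. If load order itself mattered for a given environment, a plain `diff` would still be the right tool.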
Description of changes
Update of ccs_config to the point where the new intel-oneapi compiler is used for derecho_intel, which changes answers for derecho_intel.
Specific notes
Contributors other than yourself, if any:
CTSM Issues Fixed (include github issue #):
Fixes #2476
Fixes #3120
Are answers expected to change (and if so in what way)? Yes, just for derecho_intel
Any User Interface Changes (namelist or namelist defaults changes)? No
Does this create a need to change or add documentation? No. Did you do so? No.
Testing performed, if any: will run regular testing
Right now just working on getting derecho_gnu to work for mpi-serial