forked from ESCOMP/CTSM
Closed
Description
Brief summary of bug
I have been working with @mvdebolskiy to try to understand a crash that happens at the end of year 1 on Dec. 31 with the following behavior.
code repository:
https://github.com/mvdebolskiy/CTSM.git
code branch:
updt-noresm-to-5.3.084
test case:
SMS_D_Ld366_P1024.ne30pg3_ne30pg3_mtn14.I2000Clm60Fates.betzy_gnu.clm-FatesColdNoComp.matvey/
- The code first dies on Dec. 31 in EDCanopyStructureMod.F90 with an array out-of-bounds error for arealayer(nclmax+5). It turns out that arealayer does not have to be an array, so I have made it a scalar to get past this point.
- Now it aborts when the following check trips:
(patch_area_counter > max_patch_iterations .and. area_not_balanced)
- This happens at numerous gridcells. I have picked just one failing gridcell to look at more carefully: lat, lon = 1.33058, 27.50000, where the crash happens with pft=13. I'm using 1024 tasks, and task 134 contains the problem gridcell.
- I have restart files written at both Dec. 30 and Dec. 31.
- On Dec. 30:
134: DEBUG: lat, lon, year, mon, day, tod = 1.33058 27.50000 2000 12 30 1800
134:
134: DEBUG: area not balanced loop: pft, counter = 13 0
134: DEBUG: z1 = 3
134: DEBUG: canopy_structure i_lyr, arealayer, currentPatch%area, target_area = 1 720.9159737710 714.2857142857 6.6302594853
134: DEBUG: demotion partial: cohort#, pd_area, cohort_area 1 6.6302594853 631.1649323585
134: DEBUG: canopy_structure i_lyr, arealayer, currentPatch%area, target_area = 2 724.1804081137 714.2857142857 9.8946938280
134: DEBUG: demotion partial: cohort#, pd_area, cohort_area 1 9.8946938280 634.4244105726
134: DEBUG: canopy_structure i_lyr, arealayer, currentPatch%area, target_area = 3 726.7865900232 714.2857142857 12.5008757375
134: DEBUG: demotion total : cohort#, pd_area, cohort_area 1 0.6813581681 0.6813581681
134: DEBUG: demotion partial: cohort#, pd_area, cohort_area 2 11.8195175694 716.2105380271
134: DEBUG: z2 = 4
- On Dec. 31:
134: DEBUG: lat, lon, year, mon, day, tod = 1.33058 27.50000 2000 12 31 1800
134:
134: DEBUG: area not balanced loop: pft, counter = 13 0
134: DEBUG: z1 = 3
134: DEBUG: canopy_structure i_lyr, arealayer, currentPatch%area, target_area = 1 8088.8092821351 714.2857142857 7374.5235678494
134: DEBUG: demotion partial: cohort#, pd_area, cohort_area 1 7374.5235678494 7661.6637274354
134: DEBUG: canopy_structure i_lyr, arealayer, currentPatch%area, target_area = 2 12887.8364490678 714.2857142857 12173.5507347821
134: DEBUG: demotion total : cohort#, pd_area, cohort_area 1 37.8177039835 37.8177039835
134: DEBUG: demotion total : cohort#, pd_area, cohort_area 2 5475.4951772349 5475.4951772349
134: DEBUG: demotion partial: cohort#, pd_area, cohort_area 3 6660.2378535637 7374.5235678494
134: DEBUG: canopy_structure i_lyr, arealayer, currentPatch%area, target_area = 3 15777.8579507025 714.2857142857 15063.5722364168
134: DEBUG: demotion total : cohort#, pd_area, cohort_area 1 0.6853056261 0.6853056261
134: DEBUG: demotion total : cohort#, pd_area, cohort_area 2 37.8177039835 37.8177039835
134: DEBUG: demotion total : cohort#, pd_area, cohort_area 3 3603.6219102943 3603.6219102943
134: DEBUG: demotion total : cohort#, pd_area, cohort_area 4 5475.4951772349 5475.4951772349
134: DEBUG: demotion partial: cohort#, pd_area, cohort_area 5 5945.9521392780 6660.2378535637
134: DEBUG: z2 = 4
134:
134:
134: DEBUG: area not balanced loop: pft, counter = 13 1
134: DEBUG: z1 = 4
134: DEBUG: canopy_structure i_lyr, arealayer, currentPatch%area, target_area = 1 714.2857142857 714.2857142857 0.0000000000
134: DEBUG: target_area is less than nearzero - returning
134: DEBUG: canopy_structure i_lyr, arealayer, currentPatch%area, target_area = 2 714.2857142857 714.2857142857 0.0000000000
134: DEBUG: target_area is less than nearzero - returning
134: DEBUG: canopy_structure i_lyr, arealayer, currentPatch%area, target_area = 3 714.2857142857 714.2857142857 0.0000000000
134: DEBUG: demotion partial: cohort#, pd_area, cohort_area 1 0.0000000000 714.2857142857
134: DEBUG: canopy_structure i_lyr, arealayer, currentPatch%area, target_area = 4 15063.5722364168 714.2857142857 14349.2865221311
134: DEBUG: demotion total : cohort#, pd_area, cohort_area 1 0.6853056261 0.6853056261
134: DEBUG: demotion total : cohort#, pd_area, cohort_area 2 37.8177039835 37.8177039835
134: DEBUG: demotion total : cohort#, pd_area, cohort_area 3 3603.6219102943 3603.6219102943
134: DEBUG: demotion total : cohort#, pd_area, cohort_area 4 5475.4951772349 5475.4951772349
134: DEBUG: demotion partial: cohort#, pd_area, cohort_area 5 5231.6664249923 5945.9521392780
134: DEBUG: demotion total : cohort#, pd_area, cohort_area 6 0.0000000000 0.0000000000
134: DEBUG: z2 = 5
134:
134:
134: DEBUG: area not balanced loop: pft, counter = 13 2
134: DEBUG: z1 = 5
134: DEBUG: canopy_structure i_lyr, arealayer, currentPatch%area, target_area = 1 714.2857142857 714.2857142857 0.0000000000
134: DEBUG: target_area is less than nearzero - returning
134: DEBUG: canopy_structure i_lyr, arealayer, currentPatch%area, target_area = 2 714.2857142857 714.2857142857 0.0000000000
134: DEBUG: target_area is less than nearzero - returning
134: DEBUG: canopy_structure i_lyr, arealayer, currentPatch%area, target_area = 3 714.2857142857 714.2857142857 0.0000000000
134: DEBUG: target_area is less than nearzero - returning
134: DEBUG: canopy_structure i_lyr, arealayer, currentPatch%area, target_area = 4 714.2857142857 714.2857142857 0.0000000000
134: DEBUG: target_area is less than nearzero - returning
134: DEBUG: canopy_structure i_lyr, arealayer, currentPatch%area, target_area = 5 14349.2865221311 714.2857142857 13635.0008078454
134: DEBUG: demotion total : cohort#, pd_area, cohort_area 1 0.6853056261 0.6853056261
134: DEBUG: demotion total : cohort#, pd_area, cohort_area 2 37.8177039835 37.8177039835
134: DEBUG: demotion total : cohort#, pd_area, cohort_area 3 3603.6219102943 3603.6219102943
134: DEBUG: demotion total : cohort#, pd_area, cohort_area 4 5475.4951772349 5475.4951772349
134: DEBUG: demotion partial: cohort#, pd_area, cohort_area 5 4517.3807107066 5231.6664249923
134: DEBUG: z2 = 6
......
Demotion continues in the same way until the last iteration of the loop, shown next, which causes the abort:
.......
134:
134: DEBUG: area not balanced loop: pft, counter = 13 10
134: DEBUG: z1 = 13
134: DEBUG: canopy_structure i_lyr, arealayer, currentPatch%area, target_area = 1 714.2857142857 714.2857142857 0.0000000000
134: DEBUG: target_area is less than nearzero - returning
134: DEBUG: canopy_structure i_lyr, arealayer, currentPatch%area, target_area = 2 714.2857142857 714.2857142857 0.0000000000
134: DEBUG: target_area is less than nearzero - returning
134: DEBUG: canopy_structure i_lyr, arealayer, currentPatch%area, target_area = 3 714.2857142857 714.2857142857 0.0000000000
134: DEBUG: target_area is less than nearzero - returning
134: DEBUG: canopy_structure i_lyr, arealayer, currentPatch%area, target_area = 4 714.2857142857 714.2857142857 0.0000000000
134: DEBUG: target_area is less than nearzero - returning
134: DEBUG: canopy_structure i_lyr, arealayer, currentPatch%area, target_area = 5 714.2857142857 714.2857142857 0.0000000000
134: DEBUG: target_area is less than nearzero - returning
134: DEBUG: canopy_structure i_lyr, arealayer, currentPatch%area, target_area = 6 714.2857142857 714.2857142857 0.0000000000
134: DEBUG: target_area is less than nearzero - returning
134: DEBUG: canopy_structure i_lyr, arealayer, currentPatch%area, target_area = 7 714.2857142857 714.2857142857 0.0000000000
134: DEBUG: target_area is less than nearzero - returning
134: DEBUG: canopy_structure i_lyr, arealayer, currentPatch%area, target_area = 8 714.2857142857 714.2857142857 0.0000000000
134: DEBUG: target_area is less than nearzero - returning
134: DEBUG: canopy_structure i_lyr, arealayer, currentPatch%area, target_area = 9 714.2857142857 714.2857142857 0.0000000000
134: DEBUG: target_area is less than nearzero - returning
134: DEBUG: canopy_structure i_lyr, arealayer, currentPatch%area, target_area = 10 714.2857142857 714.2857142857 0.0000000000
134: DEBUG: target_area is less than nearzero - returning
134: DEBUG: canopy_structure i_lyr, arealayer, currentPatch%area, target_area = 11 714.2857142857 714.2857142857 0.0000000000
134: DEBUG: canopy_structure i_lyr, arealayer, currentPatch%area, target_area = 12 714.2857142857 714.2857142857 0.0000000000
134: DEBUG: demotion partial: cohort#, pd_area, cohort_area 1 0.0000000000 482.6192892933
134: DEBUG: canopy_structure i_lyr, arealayer, currentPatch%area, target_area = 13 8635.0008078455 714.2857142857 7920.7150935598
134: DEBUG: demotion total : cohort#, pd_area, cohort_area 1 0.6853056261 0.6853056261
134: DEBUG: demotion total : cohort#, pd_area, cohort_area 2 37.8177039835 37.8177039835
134: DEBUG: demotion total : cohort#, pd_area, cohort_area 3 3603.6219102943 3603.6219102943
134: DEBUG: demotion partial: cohort#, pd_area, cohort_area 4 4278.5901736559 4992.8758879416
134: DEBUG: demotion total : cohort#, pd_area, cohort_area 5 0.0000000000 0.0000000000
134: DEBUG: z2 = 14
134:
134: PATCH AREA CHECK NOT CLOSING
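The problem seems to be that the arealayer for pft 13 coming into this routine on Dec. 31 is huge (8088.81 in layer 1 against a patch area of 714.29, versus 720.92 on Dec. 30), and there are not enough demotion passes to fully resolve it: in the log, each pass removes only one patch area's worth of excess (714.29) from the lowest layer. I'm still getting up to speed on FATES, so maybe I am missing how this is happening. @mvdebolskiy, do you have other data to add since we last talked?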
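To put rough numbers on this, here is a toy model of the demotion passes. It is not FATES code: the function `passes_to_balance` and its one-new-layer-per-pass rule are assumptions read off the `z1`/`z2` pattern in the log, and it ignores all cohort-level detail. The starting layer areas are taken from the log, with layers 2 and 3 recovered by subtracting the area demoted into them on the first pass.

```python
PATCH_AREA = 714.2857142857  # currentPatch%area in every log line


def passes_to_balance(layers, patch_area, tol=1e-8):
    """Toy model (hypothetical, not the FATES routine).

    On each pass, every layer present at the start of the pass is trimmed
    to patch_area and its excess is pushed into the layer below. A new
    lowest layer is created only when the current lowest one overflows,
    so at most one layer is added per pass.
    Returns (number of passes, final number of layers).
    """
    layers = list(layers)
    passes = 0
    while any(area - patch_area > tol for area in layers):
        n_start = len(layers)  # plays the role of z1 in the log
        for i in range(n_start):
            excess = layers[i] - patch_area
            if excess > tol:
                layers[i] = patch_area
                if i + 1 == len(layers):
                    layers.append(0.0)
                layers[i + 1] += excess
        passes += 1
    return passes, len(layers)


# Layer areas on entry, reconstructed from the log lines above.
dec30 = [
    720.9159737710,
    724.1804081137 - 6.6302594853,
    726.7865900232 - 9.8946938280,
]
dec31 = [
    8088.8092821351,
    12887.8364490678 - 7374.5235678494,
    15777.8579507025 - 12173.5507347821,
]

print("Dec. 30 (passes, layers):", passes_to_balance(dec30, PATCH_AREA))
print("Dec. 31 (passes, layers):", passes_to_balance(dec31, PATCH_AREA))
print("Dec. 31 total crown area / patch area:", sum(dec31) / PATCH_AREA)
```

Under these assumptions the Dec. 30 state balances in a single pass with 4 layers, matching `z2 = 4` in the log. The Dec. 31 state holds about 24 patch areas of crown area, so the toy model needs 22 passes and 25 layers, while the logged run aborts after the pass with counter = 10.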