Parallel I/O output module enhancements #655
SeanBryan51 wants to merge 23 commits into parallelio-playground from …
Conversation
Force-pushed from 4b2f171 to 8abda2c.
Hi @Whyborn, I've opened this PR to highlight the output module specific diffs (includes the aggregators and the list data structure for output variables). The implementation still contains a bunch of TODOs, but it would be awesome if I could get your feedback at this stage. I've been testing it only in serial (testing in parallel will only be possible once the met data can be read in via parallel I/O) and it currently gives the same answers as before for Qh. However, I did have to change the NetCDF format in the old output module from classic to NetCDF4/HDF5 to easily compare outputs between new and old. This was a hack as the new NetCDF API has the NetCDF4 format hardcoded whenever creating or opening new files. I'll make that configurable so that both classic and NetCDF4 are supported.
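For illustration, a minimal sketch of how the create mode could be made configurable. This uses the plain netcdf-fortran API rather than the PR's PIO wrapper, and the `output_format` option and helper name are hypothetical:

```fortran
! Hypothetical helper: map a format string (e.g. from the namelist) to a
! netCDF create mode so that both classic and NetCDF4/HDF5 are supported.
function create_mode_from_format(output_format) result(cmode)
  use netcdf, only: NF90_CLOBBER, NF90_NETCDF4
  character(len=*), intent(in) :: output_format
  integer :: cmode

  select case (trim(output_format))
  case ("netcdf4")
    cmode = ior(NF90_CLOBBER, NF90_NETCDF4)
  case ("classic")
    cmode = NF90_CLOBBER
  case default
    error stop "unknown output format"
  end select
end function create_mode_from_format
```

The resulting `cmode` would then be passed to `nf90_create`; with PIO the analogous choice would presumably be the iotype selected when creating the file.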
First pass comments:

```fortran
class(aggregator_t), dimension(:), allocatable :: Qh_aggregators
...
Qh_aggregators = create_aggregators(canopy%fh, ["mean", "max"])
call cable_output_add_variable(
    ...
    aggregator=Qh_aggregators
    ...
)
```

This way it makes the aggregators available for use where necessary (e.g. the …

```fortran
cable_output_shape_land_patch = create_decomposition(["land", "patch"])
cable_output_shape_land_patch_soil = create_decomposition(["land", "patch", "soil"])
```

where the dimensions are defined earlier in initialisation with something like:

```fortran
call define_cable_dimension("soil", 6)
call define_cable_dimension("rad", 2)
```

I think this makes the design more easily extensible? This makes it trivial for someone developing new code which may have new dimensions to output their own stuff.

```fortran
type aggregator_store_t
    class(aggregator_t), allocatable :: aggregators(:)
    integer :: num_aggregators
end type aggregator_store_t
```

where …
Thanks @Whyborn for your thoughts!
I agree that it would be nice if time-related control were handled at a higher level (for example with the relevant science output module as you say). However, it wasn't obvious to me how this approach could allow for driving the intermediate aggregators required for …
> For two reasons: …
Oh yep I remember you mentioning this in our discussions. I think this is something we could definitely add in the future - I was hesitant to introduce this now as I want to avoid diverging in functionality from the current output module which assumes a single aggregation method per variable. As for a list of objects being targets, I got around that problem by working with arrays of type aggregator_handle_t which contain a reference to an aggregator rather than the actual aggregator instance.
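A rough sketch of the handle indirection being described, with placeholder names (the actual types in the branch may differ):

```fortran
! Sketch only: a handle refers to an aggregator instance rather than
! containing it, so an array of handles can be built without the
! aggregator instances themselves sitting in a list of targets.
module aggregator_handle_sketch
  implicit none

  ! Stand-in for the real abstract aggregator type.
  type, abstract :: aggregator_t
  end type aggregator_t

  type :: aggregator_handle_t
    class(aggregator_t), pointer :: ref => null()
  end type aggregator_handle_t
end module aggregator_handle_sketch
```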
I like this suggestion, I agree it's definitely not that obvious what one needs to do to create a new …
I agree, a timing module for wider use in the code would be great. The …
Does this allow aggregators to be driven independently? It seems to me like … specifically doesn't allow this, because every aggregator which accumulates e.g. … There has to be a way to trigger accumulation at specific times, for instances like this. This is why I think the aggregators have to be available to work with as standalone objects.
Yea I can see that, it might be more easily readable to have every call contain a … instead of a construction like … Although I don't see why the latter couldn't also be used to create the same documentation in the same way.
Yea that's what I meant, whether the …
Are you sure? I thought arrays of polymorphic classes were part of the standard; I used them in some of my aggregator testing. You just need …
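For reference, a self-contained sketch of the polymorphic-array pattern being referred to (the type names are placeholders, not the PR's actual types):

```fortran
module poly_array_sketch
  implicit none
  type, abstract :: aggregator_t
  end type aggregator_t
  type, extends(aggregator_t) :: mean_aggregator_t
    real :: total = 0.0
    integer :: n = 0
  end type mean_aggregator_t
end module poly_array_sketch

program demo
  use poly_array_sketch
  implicit none
  ! An allocatable array of CLASS(...) objects is standard Fortran 2003;
  ! note all elements share the one dynamic type chosen at allocation.
  class(aggregator_t), allocatable :: aggs(:)
  allocate(mean_aggregator_t :: aggs(2))
end program demo
```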
There's the datetime-fortran library, which we already have a Spack package for? I haven't actually looked at its features yet though.
Hi @Whyborn, I've made the updates we discussed last week. I think it's looking much better now with your suggestions on how aggregators are organised in the output module, thanks as always for the comments! Please let me know if there is anything else that catches your eye on the design. I'm going to try to add support for specifying non-time-varying data in the output and restart files. For this I'm thinking we could introduce a …
Just a couple of comments:

```fortran
if (output%patch) then
    decomp = decomp_land_real32
    reducer = "patch"
else
    decomp = decomp_land_patch_real32
    reducer = "none"
end if
call cable_output_add_variable( &
    name="Qh", &
    ...
    decomp=decomp, &
    reducer=reducer &
)
```

And then in the … Then you wouldn't have to have all the buffers. The …
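As an illustration of the string-valued reducer idea, a small validation helper the output module could apply when the variable is added (the procedure name and the accepted values are assumptions only):

```fortran
! Hypothetical check of a reducer string passed to cable_output_add_variable;
! the set of accepted names is illustrative only.
subroutine check_reducer(reducer)
  implicit none
  character(len=*), intent(in) :: reducer

  select case (trim(reducer))
  case ("none", "patch")
    ! supported reduction methods; nothing to do
  case default
    error stop "cable_output_add_variable: unknown reducer"
  end select
end subroutine check_reducer
```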
I'm definitely happy to add the frequency limit to …
That sounds good to me, I will rename the module containing the grid cell averaging stuff to be more general (and maybe put this under …
The distinction of types for decompositions is required by PIO via the …
I'm happy to introduce a string argument for reducer instead of the …
I did consider using a function interface instead of a subroutine interface; however, I opted for the subroutine approach with the temporary buffer as this avoids introducing a potentially large allocation when computing the grid cell average. I want to limit the number of unnecessary allocations and copy operations in the write procedures where possible. It might be possible to return a preallocated array from a function. I can look into this a bit more. Thank you again for the feedback on this!
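To make the trade-off concrete, a sketch of the two interface shapes being weighed up (argument lists and names are hypothetical, not the PR's actual API):

```fortran
module grid_average_iface_sketch
  implicit none
  interface
    ! (a) Subroutine form: the caller supplies a preallocated land-sized
    !     buffer, so no allocation happens inside the write path.
    subroutine patch_to_land_average(patch_data, land_buffer)
      real, intent(in)    :: patch_data(:)
      real, intent(inout) :: land_buffer(:)
    end subroutine patch_to_land_average

    ! (b) Function form: tidier call sites, but the allocatable result is
    !     a fresh (potentially large) allocation on every call.
    function patch_to_land_average_f(patch_data) result(land_data)
      real, intent(in)  :: patch_data(:)
      real, allocatable :: land_data(:)
    end function patch_to_land_average_f
  end interface
end module grid_average_iface_sketch
```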
```fortran
! TODO(Sean): this is a hack for determining if the current time step
! is the last of the month. Better way to do this?
IF(ktau == 1) THEN
   !MC - use met%year(1) instead of CABLE_USER%YearStart for non-GSWP forcing and leap years
   IF ( TRIM(cable_user%MetType) .EQ. '' ) THEN
      start_year = met%year(1)
   ELSE
      start_year = CABLE_USER%YearStart
   ENDIF
END IF
```
If we read a met file during the initialisation (outside the time loop), we could then initialise "CABLE_USER%YearStart = met%year(1)" for the MetType = '' case?
It would remove the need for this if condition and it would make sure that cable_user%YearStart is always defined and always has the same meaning.
Not sure that's what you mean in your TODO. It seems to apply more to the code in cable_timing_utils.F90 than here.
Yep I definitely want to get back to this, I'll test out the solution you propose. I mean to try out the site case (which I believe corresponds to MetType == ' ') to confirm I haven't broken things here for this case.
> Not sure that's what you mean in your TODO. It seems to apply more to the code in cable_timing_utils.F90 than here.

You've interpreted that correctly, it was a reminder to myself to look into an alternative algorithm for monthly timings that doesn't require the start year of the simulation.
As an alternative, we could consider an additional "ktau"-like variable that is reset every year (maybe there is one in the code, I don't know). The time loop tells us when we change year (to check if this is true with the site simulations).
What's the reason not to have cable_user%yearstart be something that is unconditionally required in the namelist, and have that be the final truth about the start point of the simulation? Having that able to be overwritten is bound to lead to confusion in the future.
> As an alternative, we could consider an additional "ktau"-like variable that is reset every year (maybe there is one in the code, I don't know). The time loop tells us when we change year (to check if this is true with the site simulations).

I've seen this pattern used in other models (see #656). I agree that tracking information like this throughout the run makes sense.
> What's the reason not to have `cable_user%yearstart` be something that is unconditionally required in the namelist, and have that be the final truth about the start point of the simulation? Having that able to be overwritten is bound to lead to confusion in the future.

That's a great point, it might make sense to instead error if met%year(1) does not match cable_user%yearstart, and rely on cable_user%yearstart for timing logic.
I don't really understand why met%year(1) is required for this case, I'll see what happens when we rely on cable_user%yearstart alone.
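A sketch of what that consistency check could look like at initialisation (the abort mechanism here is a placeholder, not the driver's actual error routine):

```fortran
! Sketch only: validate rather than overwrite.
IF (TRIM(cable_user%MetType) == '' .AND. met%year(1) /= CABLE_USER%YearStart) THEN
   WRITE(*,*) "CABLE_USER%YearStart does not match the first year of the met forcing"
   STOP 1
END IF
```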
Isn't ktau already that? The timestep within the current year, with ktau_tot being the total timestep in the simulation?
I believe ktau can extend beyond a year for site configurations (dependent on kend). For the MetType = 'gswp3' case that I'm testing at the moment, ktau is reset every year. This is probably a symptom of the many met forcing formats / configurations which are supported by the driver.
It looks like ktau loops with `do ktau = kstart, kend` here and kend is set to be the number of steps in a year in almost every instance, except for gswp which determines it based on the length of the time dimension in the met file (this case block).
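For reference, a sketch of the "ktau-like counter reset every year" idea raised above; all variable names here are hypothetical:

```fortran
! Hypothetical per-year step counter maintained in the time loop, so that
! monthly/annual timing logic would not need the simulation start year.
IF (current_year /= previous_year) THEN
   ktau_year = 0
   previous_year = current_year
END IF
ktau_year = ktau_year + 1
```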
Thanks for the comments @ccarouge, I'll add in those suggestions!
Force-pushed from 03eaa9b to fb4febe.
Force-pushed from 4724830 to 9263f1e.
Just an update on this PR! I'm currently adding functionality for restarts to the new output module by adding a …
More to come 🙂
This change allows the `cable_abort_module` to be used in modules where `cable_def_types_mod` or `cable_IO_vars_module` is a dependee of that module, as the removal of `range_abort` avoids introducing cyclic module dependencies. The impact of this change is minimal as `range_abort` is only called from the `cable_abort_module` in the code base.
Co-authored-by: Lachlan Whyborn <lachlan.s.whyborn@gmail.com>
…_daily aggregators to canopy_type
Force-pushed from 9263f1e to 3094a2b.
Hi @Whyborn, apologies it has been a while since I've followed up on this! I had to stop myself from tweaking small things 🙃 The output module now has functionality for writing restarts and has gone through a pretty aggressive restructuring. Since a lot of the previous commits on the output module are no longer relevant with the restructuring, I've blown away most of the commit history and grouped the commits into these categories: …
Please let me know what you think! (at a time that's convenient of course, it's a pretty dense PR 😅 ) The next thing on the plate after this would be to introduce working variables for all diagnostics, and then add all output and restart variables and test for (approximate) bitwise reproducibility.
@SeanBryan51 I'm going to take a look at this tomorrow; where would you recommend I start? Is there a logical place to start?
I would say the commits from and including "Add aggregator implementation" would be relevant for review; commits prior are less relevant IMO. I.e. 3726642~1...parallelio-playground-output-module-enhancements
Whyborn left a comment:
First pass at the review; I think I'll have to go through everything again to get it right in my head.
```fortran
type, public :: cable_output_variable_t
    character(len=64) :: name
    character(len=64) :: units
    character(len=64) :: accumulation_frequency = "all"
    character(len=64) :: reduction_method = "none"
    character(len=64) :: aggregation_method
    character(len=256) :: long_name
    logical :: active
    logical :: parameter = .false.
    logical :: distributed = .true.
    logical :: restart = .false.
    logical :: patchout = .false.
    integer :: var_type
    real, dimension(2) :: range
    type(cable_output_dim_t), allocatable :: data_shape(:)
    class(aggregator_t), allocatable :: aggregator
end type
```
It seems to me that the distinction between `parameter` and `restart` variables would be better handled by having different types, rather than `logical` tags on this single type?
Was `distributed` the convenience used to handle things like the soil layer depth variable?
I think the individual NetCDF attributes shouldn't be components of the derived type, rather a dictionary-like component that contains the key, value pairs (or just two array components that form the keys and values).
I think having something like

```fortran
type, abstract :: cable_variable
    character(len=:), allocatable :: attr_keys(:)
    character(len=:), allocatable :: attr_vals(:)
    ...
end type cable_variable

type, extends(cable_variable) :: cable_output_variable
type, extends(cable_variable) :: cable_parameter_variable
type, extends(cable_variable) :: cable_restart_variable
```

would make things easier to follow.
> It seems to me that the distinction between `parameter` and `restart` variables would be better handled by having different types, rather than `logical` tags on this single type?
Too many if statements? 😆
I'm just wondering if having type extension here would mean restart variables and output variables (both non-parameter and parameter) would be mutually exclusive to each other?
> Was `distributed` the convenience used to handle things like the soil layer depth variable?

Yep. It's used for writing any global fields (same value across all MPI ranks).
> I think the individual NetCDF attributes shouldn't be components of the derived type, rather a dictionary-like component that contains the `key, value` pairs (or just two array components that form the keys and values).

Yeah that's a nice idea. It would be cool if the default constructor could handle it cleanly, I'm thinking something like this:
```fortran
cable_output_variable_t( &
    name="foo", &
    ...
    metadata=[ &
        attribute("units", "<units>"), &
        attribute("long_name", "<long name for variable foo>"), &
        ...
    ] &
)
```

Might be more readable than
```fortran
cable_output_variable_t( &
    name="foo", &
    ...
    attr_keys=["units", "long_name", ...], &
    attr_values=["<units>", "<long name for variable foo>", ...], &
)
```
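For illustration, one way the `attribute(...)` constructor form could be backed; the type name, component lengths and generic name here are arbitrary choices, not anything already in the branch:

```fortran
module attribute_sketch
  implicit none

  ! Fixed-length components keep the array constructor
  ! [attribute(...), attribute(...)] well-formed even when the individual
  ! key/value strings have different lengths.
  type :: attribute_t
    character(len=64)  :: key   = ""
    character(len=256) :: value = ""
  end type attribute_t

  ! Generic name so call sites read attribute("units", "W/m2").
  interface attribute
    module procedure make_attribute
  end interface attribute

contains

  pure function make_attribute(key, value) result(attr)
    character(len=*), intent(in) :: key, value
    type(attribute_t) :: attr
    attr%key   = key
    attr%value = value
  end function make_attribute

end module attribute_sketch
```

By contrast, the `attr_keys=[...]` form needs every literal in an array constructor to have the same length (or an explicit `[character(len=n) :: ...]` type-spec), which is part of why the `attribute(...)` form tends to read more cleanly.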
> I'm just wondering if having type extension here would mean restart variables and output variables (both non-parameter and parameter) would be mutually exclusive to each other?
Is that an issue? I don't see a reason for restart variables to ever be user-configurable, the functionality that needs to be shared with a regular output variable is writing metadata and distributed writes (and you could potentially add the distributed reads to a restart_variable functionality)?
Did you have a plan for how to turn entries read from the output configuration file into cable_output_variables? I was imagining a dictionary of definitions that contains the information that does not change with the output configuration, and then use that combined with the specified options to build the profiles. The dictionary would be accessed like

```fortran
get_var_def(<var_name>)
```

which would return a definition containing a pointer to the target data, fixed metadata (standard_name, long_name, units, maybe others?), the accumulation frequency and the allowed reductions (maybe even allowed_reduction: resultant_decomp pairs?). Then the creation of the output variables can simply be:

```fortran
do i = 1, size(output_variables)
    call create_output_variable(output_variables(i), get_var_def(output_variables(i)%var_name))
end do
```

where output_variables is a list of definitions created from the output configuration file? It would be possible to create variable definitions closer to the science code, and decouple the usage of the output module from the implementation, which is one of my main concerns at the moment (that the usage and implementation are too tightly coupled).
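As a rough sketch, the definition record returned by `get_var_def(...)` might carry something like the following (component names are illustrative only):

```fortran
! Illustrative definition record; the real layout would be decided in the PR.
type :: cable_var_def_t
    character(len=64)              :: name
    character(len=64)              :: units
    character(len=256)             :: long_name
    character(len=64)              :: accumulation_frequency
    character(len=64), allocatable :: allowed_reductions(:)
    real, pointer                  :: data(:) => null()   ! pointer to the science field
end type cable_var_def_t
```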
> I'm just wondering if having type extension here would mean restart variables and output variables (both non-parameter and parameter) would be mutually exclusive to each other?

> Is that an issue? I don't see a reason for restart variables to ever be user-configurable, the functionality that needs to be shared with a regular output variable is writing metadata and distributed writes (and you could potentially add the distributed reads to a `restart_variable` functionality)?
It's not a huge issue, I just thought that was the direction we wanted to take considering all restart variables are a subset of the output variables.
My initial approach here was to actually declare the restart variables separately, as it might be easier to add support for reading restart variables here as well like you suggest.
Can you elaborate on how the variable definition and write subroutines would look with type extension? Would you split these subroutines up so that they are subclass-specific, or would you keep it as is and downcast whenever subclass-specific behaviour is needed?
Also, a pattern that I've noticed is that most of the conditional behaviour is based on the restart optional argument (which indicates whether the current file is a restart or not), rather than the %restart flag on each output variable instance (which indicates whether the variable should be written to the restart file). So I'm thinking the type extension at the cable_output_variable_t level wouldn't actually help things that much, maybe rather a type extension of cable_output_profile_t? It's probably overkill though; we could just remove the optional restart argument and have separate subroutines for each logical case.
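A sketch of that last option, with two thin entry points sharing one worker routine (all procedure names here are hypothetical):

```fortran
! Sketch only: replaces an optional restart argument with two entry points.
subroutine cable_output_write_output_file(profile)
    type(cable_output_profile_t), intent(inout) :: profile
    call write_file(profile, restart=.false.)
end subroutine cable_output_write_output_file

subroutine cable_output_write_restart_file(profile)
    type(cable_output_profile_t), intent(inout) :: profile
    call write_file(profile, restart=.true.)
end subroutine cable_output_write_restart_file
```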
As for distributed reads, the intention for that, with the current approach, was to have a module for reading restart fields which is separate from the output module.
> Did you have a plan for how to turn entries read from the output configuration file into cable_output_variables?
It should be pretty easy I think. All diagnostics/fields are "registered" in memory, so all we have to do is discard the fields that weren't requested for a stream/profile and initialise the ones that were. We can also instantiate multiple copies of each cable_output_variable_t when we have variants of the same field (e.g. different reduction_method or aggregation_method).
> It should be pretty easy I think. All diagnostics/fields are "registered" in memory, so all we have to do is discard the fields that weren't requested for a stream/profile and initialise the ones that were. We can also instantiate multiple copies of each cable_output_variable_t when we have variants of the same field (e.g. different reduction_method or aggregation_method).

What do you mean by registered? I struggle to see how the current formulation (e.g. `Qh` …) would extend to many aggregations, frequencies and reduction methods.
> What do you mean by registered? I struggle to see how the current formulation (e.g. `Qh` …) would extend to many aggregations, frequencies and reduction methods.
See my reply here: #655 (comment)
```fortran
public :: associate_temp_buffer_real32
public :: associate_temp_buffer_real64

! Decomposition pointers for each variable class and data type
```
Why are these defined here? Why don't we just define them when we use them, when we actually define the output variables? This hides information that will be necessary for future developers. If the decompositions are defined in the same location as the output variables, then all the information new developers need is in one place.
I'm not too worried about hiding them in this context, there will be developer documentation on how decompositions work and how to add support for new ones.
A downside of defining and specifying the decompositions alongside the declaration of output variables is that we would need to take into account the configuration options which impact the decomposition (e.g. the reduction_method, or whether we are writing a restart file), which makes the API much more bloated.
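To illustrate the point about the decomposition depending on configuration, a sketch of an internal selector (the type and pointer names are stand-ins for whatever the branch actually defines):

```fortran
! Sketch only: the output module could resolve the decomposition from the
! variable's options, so callers never pass one explicitly.
module decomp_select_sketch
  implicit none
  type :: decomposition_t
  end type decomposition_t
  type(decomposition_t), target :: decomp_land_real32, decomp_land_patch_real32
contains
  function select_decomposition(reduction_method, restart) result(decomp)
    character(len=*), intent(in) :: reduction_method
    logical, intent(in) :: restart
    type(decomposition_t), pointer :: decomp
    if (restart .or. trim(reduction_method) == "none") then
      decomp => decomp_land_patch_real32   ! keep the patch dimension
    else
      decomp => decomp_land_real32         ! patch dimension reduced away
    end if
  end function select_decomposition
end module decomp_select_sketch
```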
```fortran
accumulation_frequency=output_var%accumulation_frequency, &
var_name=output_var%name, &
file_name=global_profile%file_name &
)
```
Why don't we also validate the reduction method here?
It's done further up: CABLE/src/util/io/output/cable_output_core.F90, lines 101 to 103 (at 3094a2b).
Yea I think it would make sense to group the validations that are "additional" to being simply one of the allowed names, so checking that the requested frequency is valid with the variable's accumulation frequency, and whether a given reduction method can be applied to a variable.
Please correct me if I've misunderstood: would moving the validation checks from cable_output_register_output_variables() to cable_output_profiles_init() make more sense?
I don't think I looked hard enough at this, because I'm not sure I understand what's going on. What is cable_output_register_output_variables() for? Is that the definition of all possible output variables? I don't see how that could possibly be the case, due to the possible permutations of frequency, aggregation and reduction. It seems like we have the same loop in cable_output_register_output_variables() and cable_output_profiles_init(), except some of the variables have been culled before the loop in cable_output_profiles_init()? Is that right?
> Is that the definition of all possible output variables? I don't see how that could possibly be the case, due to the possible permutations of frequency, aggregation and reduction.

Yes you're right that it wouldn't be all possible output variables. Just to be clear, I don't want to confuse this work with the output redesign proposal, which would introduce the ability to specify aggregation and reduction methods on a per variable basis. In this work, variables with different aggregation methods are registered independently, as this is not currently configurable.
With the new output redesign proposal, the intention for the registered_output_variables list is to have a 1-to-1 relationship with all "working variables" which are used by the output module. This has a couple of implications, namely: 1. each registered output variable entry would not specify an aggregation and reduction method (left undefined as this comes from the output configuration file), and 2. each registered output variable in registered_output_variables is unique in its name (i.e. no duplicates). When an output variable is requested from the output configuration file, we loop over the registered_output_variables list and find the entry matching the requested name component; once the entry is found, the entry is copied into the list of output variables for the current stream using Fortran's intrinsic assignment, and here we also define the requested reduction and aggregation method. How does that sound?
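A sketch of that lookup-and-copy step (the requested_* and stream_variables arrays are hypothetical; registered_output_variables and the component names follow the discussion above):

```fortran
! Sketch only: match requested names against the registered list, copy the
! entry via intrinsic assignment, then apply the per-stream options.
do j = 1, size(requested_names)
  do i = 1, size(registered_output_variables)
    if (trim(registered_output_variables(i)%name) == trim(requested_names(j))) then
      stream_variables(j) = registered_output_variables(i)
      stream_variables(j)%reduction_method   = requested_reductions(j)
      stream_variables(j)%aggregation_method = requested_aggregations(j)
      exit
    end if
  end do
end do
```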
So the current design is strictly to allow it to work with the current way of managing CABLE output, through the namelist options? And then integrating the new output configuration file would come later?
…dule finalisation
Force-pushed from 876c111 to 64b3cf4.
…tructor syntax for strings
…structor syntax for strings
Force-pushed from 3fe2db2 to 90433a0.
📚 Documentation preview 📚: https://cable--655.org.readthedocs.build/en/655/