Muphys: Remove copies of arrays in false branches #1106
iomaganaris wants to merge 191 commits into main
…merge' into muphys_bug_fix
Clean up the graupel_only driver, and create an integration test to run through pytest. Co-authored-by: Will Sawyer <wsawyer@cscs.ch>
Co-authored-by: Will Sawyer <vectorflux@gmail.com> Co-authored-by: Will Sawyer <wsawyer@cscs.ch>
cscs-ci run default
cscs-ci run distributed
if gtx_transformations.GT4PyAutoOptHook.TopLevelDataFlowPre not in optimization_hooks:
    optimization_hooks[gtx_transformations.GT4PyAutoOptHook.TopLevelDataFlowPre] = (
        dace_hooks.graupel_run_self_copy_removal_inside_scan
    )
We can pass this from muphys_wrapper, no? Then we can move the dace_hooks file to muphys?
Good idea, I am testing it.
cscs-ci run default
cscs-ci run default
cscs-ci run dace
cscs-ci run distributed
It works on GPU, but validation fails on CPU. I also see CPU validation errors in muphys-ppp. We may have missed propagating some strides; I will have a look.
cscs-ci run dace
cscs-ci run dace
cscs-ci run dace
Mandatory Tests Please make sure you run these tests via comment before you merge!
Optional Tests To run benchmarks you can use:
To run tests and benchmarks with the DaCe backend you can use:
To run test levels ignored by the default test suite (mostly simple datatest for static fields computations) you can use:
For more detailed information please look at CI in the EXCLAIM universe.
The graupel SDFG looks like the following:

In both maps there are outputs whose values are determined by if-statements that check whether one or more masks are active. If no mask is active, the outputs of the maps are updated with the inputs without any change.

Since we know that the inputs and outputs are the same pointers, we can improve this pattern by removing the copies in the false branches of the if-statements and replacing the intermediate temporary AccessNodes with the global AccessNodes that are used as outputs of the program.

To be more specific, the AccessNodes where this is applied are:
- q_in_2 -> q_out_2
- q_in_3 -> q_out_3
- q_in_4 -> q_out_4
- q_in_5 -> q_out_5
- te -> t_out

This is the updated SDFG:
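The aliasing argument in this description can be illustrated outside of DaCe with a small NumPy sketch. It is not the transformation itself: the function names and the `* 0.5` update are hypothetical stand-ins for the graupel physics, and a single array stands in for an input/output pair such as q_in_2 and q_out_2 that share the same pointer. The sketch only shows why dropping the copy in the false branch leaves the result unchanged when input and output are the same buffer.

```python
import numpy as np


def update_with_copy(q, mask):
    """Original pattern: build a temporary, copying the input where the mask is off."""
    # False branch: the value of q is copied unchanged into the temporary.
    tmp = np.where(mask, q * 0.5, q)
    # The temporary is then written to the output, which aliases the input.
    q[...] = tmp
    return q


def update_in_place(q, mask):
    """Optimized pattern: only the true branch writes; the false branch is a no-op."""
    q[mask] = q[mask] * 0.5
    return q


rng = np.random.default_rng(0)
q = rng.random(8)
mask = q > 0.5

# Both variants start from identical copies of the same data.
with_copy = update_with_copy(q.copy(), mask)
in_place = update_in_place(q.copy(), mask)
assert np.array_equal(with_copy, in_place)
```

The in-place variant touches only the masked elements, so it avoids both the temporary array and the write-back of unchanged values. This is the effect the hook aims for on the affected AccessNodes.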