Muphys: Remove copies of arrays in false branches#1106

Open
iomaganaris wants to merge 191 commits into main from outer_mask_graupel_rmcopies2

@iomaganaris
Collaborator

@iomaganaris iomaganaris commented Mar 12, 2026

The graupel SDFG looks like the following:
image
In both maps, some outputs are computed inside if-statements that check whether one or more masks are active. When the masks are not active, the outputs are simply set to the corresponding inputs, unchanged.
Since we know that the inputs and outputs are the same pointers, we can improve this pattern by removing the copies in the false branches of the if-statements and replacing the intermediate temporary AccessNodes with the global AccessNodes that are used as outputs of the program.
To be more specific, the AccessNodes where this is applied are:

  • q_in_2 -> q_out_2
  • q_in_3 -> q_out_3
  • q_in_4 -> q_out_4
  • q_in_5 -> q_out_5
  • te -> t_out

This is the updated SDFG:
image
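
To illustrate the idea outside of DaCe, here is a minimal sketch in plain Python. It is not the transformation implemented in this PR: lists stand in for the arrays, and the function names `update_with_copy` and `update_in_place` are hypothetical. It assumes that each output element depends only on the input element at the same index, and that input and output refer to the same buffer.

```python
# Illustrative sketch only: plain lists stand in for the SDFG arrays.
# The function names are hypothetical and do not exist in DaCe or GT4Py.

def update_with_copy(q_in, mask, compute):
    """Pattern before the change: fill a temporary, copying the input where the mask is off."""
    tmp = [0.0] * len(q_in)
    for i, active in enumerate(mask):
        if active:
            tmp[i] = compute(q_in[i])
        else:
            tmp[i] = q_in[i]  # false branch: plain copy of the input
    return tmp


def update_in_place(q, mask, compute):
    """Pattern after the change: q is both input and output, so the false branch does nothing."""
    for i, active in enumerate(mask):
        if active:
            q[i] = compute(q[i])
    return q


def double(x):
    return 2.0 * x


q = [1.0, 2.0, 3.0, 4.0]
mask = [True, False, True, False]

expected = update_with_copy(list(q), mask, double)
result = update_in_place(q, mask, double)

assert result == expected  # same values as the copying version
assert result is q         # no temporary buffer was allocated
```

In this sketch the in-place version is only equivalent because no element reads a neighbour that may already have been overwritten; whether that holds for a given AccessNode has to be checked separately.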

fomics and others added 30 commits September 11, 2025 12:17
Clean up the graupel_only driver, and create an integration test to run through pytest. 

Co-authored-by: Will Sawyer <wsawyer@cscs.ch>
Co-authored-by: Will Sawyer <vectorflux@gmail.com>
@iomaganaris iomaganaris changed the title Outer mask graupel rmcopies2 Remove copies of arrays in false branches Mar 12, 2026
@edopao edopao changed the title Remove copies of arrays in false branches Muphys: Remove copies of arrays in false branches Mar 12, 2026
@iomaganaris iomaganaris changed the base branch from graupel_gpu_opt to main March 12, 2026 15:16
@iomaganaris
Collaborator Author

cscs-ci run default

@iomaganaris
Collaborator Author

cscs-ci run distributed

Comment on lines +61 to +64
    if gtx_transformations.GT4PyAutoOptHook.TopLevelDataFlowPre not in optimization_hooks:
        optimization_hooks[gtx_transformations.GT4PyAutoOptHook.TopLevelDataFlowPre] = (
            dace_hooks.graupel_run_self_copy_removal_inside_scan
        )
Contributor

We can pass this from muphys_wrapper, no? Then we can move the dace_hooks file to muphys?

Contributor

Good idea, I am testing it.
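
For readers following the thread, the difference between the two options can be sketched as below. This is a hedged illustration, not the GT4Py API: `Hook`, `self_copy_removal`, `resolve_hooks_with_default` and `wrapper_hooks` are hypothetical stand-ins for `GT4PyAutoOptHook`, the graupel hook, the generic code path and `muphys_wrapper`.

```python
# Illustrative sketch only: every name here is a hypothetical stand-in.
from enum import Enum, auto


class Hook(Enum):
    TOP_LEVEL_DATAFLOW_PRE = auto()  # stand-in for GT4PyAutoOptHook.TopLevelDataFlowPre


def self_copy_removal(sdfg):
    """Stand-in for the graupel-specific hook; here it only tags its input."""
    return f"self-copy removal applied to {sdfg}"


def resolve_hooks_with_default(user_hooks=None):
    """Option quoted in the diff: generic code installs the hook unless the caller set one."""
    hooks = dict(user_hooks or {})
    hooks.setdefault(Hook.TOP_LEVEL_DATAFLOW_PRE, self_copy_removal)
    return hooks


def wrapper_hooks():
    """Option suggested in review: the model-specific wrapper passes the hook explicitly."""
    return {Hook.TOP_LEVEL_DATAFLOW_PRE: self_copy_removal}


def other_hook(sdfg):
    return sdfg


default_hooks = resolve_hooks_with_default()
custom_hooks = resolve_hooks_with_default({Hook.TOP_LEVEL_DATAFLOW_PRE: other_hook})
explicit_hooks = wrapper_hooks()
```

With the second option the generic code needs no knowledge of graupel, which is what would allow the hook module to live next to the wrapper.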

@edopao
Contributor

edopao commented Mar 18, 2026

cscs-ci run default

@edopao
Contributor

edopao commented Mar 18, 2026

cscs-ci run default

@edopao
Contributor

edopao commented Mar 18, 2026

cscs-ci run dace

@edopao
Contributor

edopao commented Mar 18, 2026

cscs-ci run distributed

@edopao
Contributor

edopao commented Mar 18, 2026

It works on GPU, but validation fails on CPU. I also see CPU validation errors in muphys-ppp. We may have missed propagating some strides; I will have a look.

@edopao
Contributor

edopao commented Mar 19, 2026

cscs-ci run dace

@edopao
Contributor

edopao commented Mar 20, 2026

cscs-ci run dace

@edopao
Contributor

edopao commented Mar 20, 2026

cscs-ci run dace

@github-actions

Mandatory Tests

Please make sure you run these tests via comment before you merge!

  • cscs-ci run default
  • cscs-ci run distributed

Optional Tests

To run benchmarks you can use:

  • cscs-ci run benchmark-bencher

To run tests and benchmarks with the DaCe backend you can use:

  • cscs-ci run dace

To run test levels ignored by the default test suite (mostly simple datatest for static fields computations) you can use:

  • cscs-ci run extra

For more detailed information please look at CI in the EXCLAIM universe.
