Domain decomposition and halo construction #540

Merged
msimberg merged 375 commits into main from halo_construction on Mar 10, 2026

@halungge halungge commented Sep 6, 2024

Decompose (global) grid file:

  • uses pymetis to decompose the global grid (cells) into n patches
  • after decomposition, halos for all dimensions (cell, edge, vertex) are constructed. Halo construction is done in an ICON-like fashion: each halo consists of 2 levels of cells (one upward-pointing and one downward-pointing line) and the corresponding vertices and edges on these lines.
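
As a rough illustration of the two-level cell halo described above, here is a minimal pure-Python sketch. It is not the code from this PR: the function `halo_levels`, the cell-to-cell table `c2c`, and the `owner` list are hypothetical names, the owner assignment is hand-written rather than produced by pymetis, and the sketch only follows edge neighbours on a tiny strip of triangles, which is a simplification of the ICON rule.

```python
def halo_levels(c2c, owner, rank, n_levels=2):
    """Return {cell: halo_level} for cells not owned by `rank`.

    c2c[c] lists the edge neighbours of cell c (-1 marks a missing neighbour),
    owner[c] is the rank owning cell c (a stand-in for a partitioner result).
    Level 1 cells touch owned cells, level 2 cells touch level 1 cells.
    """
    owned = {c for c, r in enumerate(owner) if r == rank}
    seen = set(owned)
    frontier = owned
    levels = {}
    for level in range(1, n_levels + 1):
        # Collect all not-yet-visited neighbours of the current frontier.
        next_frontier = set()
        for cell in frontier:
            for neighbour in c2c[cell]:
                if neighbour >= 0 and neighbour not in seen:
                    next_frontier.add(neighbour)
        for cell in next_frontier:
            levels[cell] = level
        seen |= next_frontier
        frontier = next_frontier
    return levels


# Hypothetical strip of 8 triangles: cell i neighbours i-1 and i+1,
# the third edge of every triangle lies on the strip boundary.
n_cells = 8
c2c = [
    [i - 1 if i > 0 else -1, i + 1 if i < n_cells - 1 else -1, -1]
    for i in range(n_cells)
]
# Hand-written 2-patch decomposition (assumption, not pymetis output).
owner = [0, 0, 0, 0, 1, 1, 1, 1]

halo_rank0 = halo_levels(c2c, owner, rank=0)
halo_rank1 = halo_levels(c2c, owner, rank=1)
print(halo_rank0, halo_rank1)
```

In this sketch the number of halo levels is the `n_levels` argument; halo edges and vertices would then be gathered from the cells on each level, which is left out here.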

Omissions:

  • LAM grids need to be investigated further:

    • tests comparing decomposed vs. single_node computation are only run on the global grid.
    • for the LAM grids, ICON reorders arrays so that halo points on the first boundary layers are arranged together with the boundary layers; it should be investigated whether that is essential in the model.
    • This PR takes this into account only in the computation of start_index and end_index, not in the halo construction.
  • the number of halo lines (in terms of cells) is hardcoded to 2; that could be made a parameter.

  • Not sure it all runs correctly on GPU; most probably there are some numpy/cupy issues to fix.

Magdalena Luz added 26 commits September 18, 2025 11:34
# Conflicts:
#	model/common/src/icon4py/model/common/grid/base.py
#	model/common/src/icon4py/model/common/grid/grid_manager.py
#	model/common/src/icon4py/model/common/grid/horizontal.py
#	model/common/src/icon4py/model/common/grid/refinement.py
#	model/common/tests/common/decomposition/mpi_tests/test_mpi_decomposition.py
#	model/common/tests/common/grid/unit_tests/test_refinement.py
# Conflicts:
#	model/common/src/icon4py/model/common/grid/grid_manager.py
#	model/testing/src/icon4py/model/testing/grid_utils.py
# Conflicts:
#	model/atmosphere/diffusion/tests/diffusion/mpi_tests/test_parallel_diffusion.py
#	model/atmosphere/dycore/tests/dycore/mpi_tests/test_parallel_solve_nonhydro.py
#	model/common/src/icon4py/model/common/decomposition/definitions.py
#	model/common/tests/common/decomposition/fixtures.py
#	model/common/tests/common/decomposition/mpi_tests/test_mpi_decomposition.py
#	model/common/tests/common/decomposition/unit_tests/test_definitions.py
#	model/common/tests/common/grid/mpi_tests/test_parallel_icon.py
#	model/testing/src/icon4py/model/testing/definitions.py
#	model/testing/src/icon4py/model/testing/grid_utils.py
#	model/testing/src/icon4py/model/testing/serialbox.py
#	tools/src/icon4py/tools/py2fgen/wrappers/common.py
# Conflicts:
#	model/common/pyproject.toml
#	model/common/src/icon4py/model/common/decomposition/definitions.py
#	model/common/src/icon4py/model/common/decomposition/mpi_decomposition.py
#	model/common/src/icon4py/model/common/grid/grid_manager.py
#	model/testing/src/icon4py/model/testing/grid_utils.py
#	tach.toml
Comment on lines +263 to +265
my_cell_indices = self._decomposition_info.global_index(dims.CellDim)
my_edge_indices = self._decomposition_info.global_index(dims.EdgeDim)
my_vertex_indices = self._decomposition_info.global_index(dims.VertexDim)

possibly for later cleanup, make these and their friends in _read_coordinates class members?

Yep, definitely for later.

msimberg commented Mar 9, 2026

cscs-ci run default

msimberg commented Mar 9, 2026

cscs-ci run distributed

@msimberg msimberg force-pushed the halo_construction branch from acdf71e to 301c041 Compare March 9, 2026 11:47
@msimberg msimberg force-pushed the halo_construction branch from 8114383 to 6df6af6 Compare March 9, 2026 13:58
@msimberg msimberg force-pushed the halo_construction branch from 914bcbc to f204644 Compare March 9, 2026 14:19

msimberg commented Mar 9, 2026

cscs-ci run default

msimberg commented Mar 9, 2026

cscs-ci run distributed

@jcanton jcanton left a comment

very important

@msimberg

cscs-ci run default

@github-actions

Mandatory Tests

Please make sure you run these tests via comment before you merge!

  • cscs-ci run default
  • cscs-ci run distributed

Optional Tests

To run benchmarks you can use:

  • cscs-ci run benchmark-bencher

To run tests and benchmarks with the DaCe backend you can use:

  • cscs-ci run dace

To run test levels ignored by the default test suite (mostly simple datatest for static fields computations) you can use:

  • cscs-ci run extra

For more detailed information please look at CI in the EXCLAIM universe.

@msimberg

cscs-ci run distributed

@msimberg msimberg merged commit 6fc8c37 into main Mar 10, 2026
48 checks passed
@msimberg msimberg deleted the halo_construction branch March 10, 2026 10:55
jcanton added a commit that referenced this pull request Mar 10, 2026
* main:
  Domain decomposition and halo construction (#540)
msimberg added a commit that referenced this pull request Mar 13, 2026
#540 added mpi4py as required dependency to the typing group. It is in
principle required, but this makes mpi4py be installed by default with a
regular `uv sync` which is suboptimal. Removing it for now from the
typing group. mpi4py is still a required dependency in the distributed
group.
jcanton added a commit that referenced this pull request Mar 18, 2026
* main: (29 commits)
  Scheduled Halo Exchange (#980)
  Add missing metrics fields to `test_parallel_grid_manager.py` test (#1114)
  Muphys: Lowering with single precision (#1101)
  Add single-rank lsq pseudoinv factory test (#1099)
  Cleanup Diffusion config (#1060)
  Fortran bindings: fix numpy allocation and cleanups (#1112)
  fix: fix gt4py metrics extractor in the StencilTest benchmarking (#1111)
  py2fgen: don't recompile if unchanged (#1110)
  CI for standalone_driver (#1070)
  Update mpi4py and pymetis groups to make them optional (#1100)
  Bump mshick/add-pr-comment from 2 to 3 (#1109)
  Use inout fields for full_muphys as well (#1108)
  Update GPU configuration for graupel (#1104)
  Move the mask of _q_t_update outside in graupel (#1093)
  Update gt4py to v1.1.7 (#1105)
  cleanup for ugly if condition of single node default in lsq coeffs (#1103)
  Domain decomposition and halo construction (#540)
  Muphys: Add flag to wait for graupel completion (#1095)
  Give each gt4py program a return type hint (#1087)
  Turn data download off for distributed CI (#1092)
  ...
5 participants