Skip to content

Conversation

@mo-joshuacolclough
Copy link

@mo-joshuacolclough mo-joshuacolclough commented Oct 17, 2025

Add tests to ensure ectrans/transi works with split MPI communicators. Requires no changes to ectrans, but does require an extension to transi to allow for setting of the fiat default MPL communicator.

Added:

  • Test which performs a simple inv trans such as in the usage example.
  • trans_set_mpi_comm method to set the user communicator prior to setup of trans. Includes some error checking to ensure correct usage (prior to trans_init. And if it's not, ignore if it's called with the same communicator that has already been set).
  • A simple test to ensure trans_set_mpi_comm behaves as expected in transi.

Copy link
Author

@mo-joshuacolclough mo-joshuacolclough left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(From discussion with @MarekWlasak)

@samhatfield
Copy link
Collaborator

Looks like a good start Josh, thanks. I should have mentioned before, but all PRs must be from branches started from develop (as well as targeting develop). It looks like you've started from main, hence the apparent conflicts with CMakeLists.txt. I'd recommend rebasing all of your commits on top of develop before continuing to save yourself some headaches later.

@mo-joshuacolclough
Copy link
Author

mo-joshuacolclough commented Oct 17, 2025

Thanks @samhatfield - we are working off of the tagged release we are using currently, to avoid any potential complications with using develop in our codebase... (Happy to deal with the headaches at a later date... :) ) Just while we get things working. I don't intend for this to get merged, just for sharing at the moment.

We are pleased so far that the changes needed seem to be quite minimal!

@samhatfield
Copy link
Collaborator

Thanks @samhatfield - we are working off of the tagged release we are using currently, to avoid any potential complications with using develop in our codebase... (Happy to deal with the headaches at a later date... :) ) Just while we get things working. I don't intend for this to get merged, just for sharing at the moment.

We are pleased so far that the changes needed seem to be quite minimal!

That's fair enough! Okay then, let's discuss what you've done so far in our next call.

Copy link
Collaborator

@wdeconinck wdeconinck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice progress on building up some test cases.
You can keep the transi developments here for now. I find it useful as a discussion point. We can always split the PR in pieces later.

@wdeconinck
Copy link
Collaborator

wdeconinck commented Oct 26, 2025

Thanks @samhatfield - we are working off of the tagged release we are using currently, to avoid any potential complications with using develop in our codebase... (Happy to deal with the headaches at a later date... :) ) Just while we get things working. I don't intend for this to get merged, just for sharing at the moment.

We are pleased so far that the changes needed seem to be quite minimal!

I still believe forking from a develop branch will be less painful, especially because 1.6.2 is quite old. Already this (draft, granted) PR is not mergeable due to merge conflicts. Version 1.7.0 is now out. Perhaps you can test this tagged release in the stack; That feedback is also valuable.
And then if you can, rebase your changes either on 1.7.0 or develop.

@mo-joshuacolclough
Copy link
Author

Thanks @samhatfield @wdeconinck - I have merged develop into the branch to bring it up-to-date, and I've tidied the changes and used MPL where I can. Please can you take another look when you get the chance!

@samhatfield
Copy link
Collaborator

Thanks @mo-joshuacolclough! Yes we will take a look soon.

@wdeconinck
Copy link
Collaborator

Hi @mo-joshuacolclough this looks great to me now, and a good starting point to begin the work of making the MPI communicator configurable per trans handle.

@samhatfield
Copy link
Collaborator

samhatfield commented Nov 10, 2025

Couple of early comments:

  • If you want to link a target (such as the new transi tests) against MPI::MPI_C you'll need to add C to the COMPONENTS of MPI in the REQUIRED_PACKAGES argument of ecbuild_add_option( FEATURE MPI ...
  • I think that the proper way to set a new default communicator with MPL is through MPL_SETDFLT_COMM. This means that instead of setting LMPLUSERCOMM and MPLUSERCOMM, you should simply call MPL_SETDFLT_COMM(MPI_USER_COMM, DUMMY_COMM). This subroutine always returns the last default communicator as the second argument, but you can just pass a dummy argument and then ignore it. This affects trans_set_mpi_comm but also test_split_mpi_comm.
  • I think test_split_mpi_comm can be written without any explicit MPI calls and this is a strong preference for us. Going forward we want all MPI functionality to be wrapped under the MPL layer. For example:
    -call MPI_Init(ierror)
    -call MPI_Comm_rank(MPI_COMM_WORLD, world_rank, ierror)
    -call MPI_Comm_size(MPI_COMM_WORLD, world_num_ranks, ierror)
    +CALL MPL_INIT()
    +WORLD_NUM_RANKS = MPL_NPROC()
    
     split_colour = get_split_group()
    -split_key = world_rank    ! Key used here is the WORLD rank.
    -call MPI_Comm_split(MPI_COMM_WORLD, split_colour, split_key, split_comm, ierror)
    +SPLIT_KEY = MPL_MYRANK() ! Key used here is the WORLD rank.
    
    -print*,"=== Rank ", world_rank, ", Setup on group", split_colour, "==="
    +PRINT *, "=== Rank ", SPLIT_KEY, ", Setup on group", SPLIT_COLOUR, "==="
    
    -! Set MPL comm. Global variables from fiat MPL_DATA_MODULE.
    -! Acts like arguments to MPL_INIT.
    -LMPLUSERCOMM = .true.
    -MPLUSERCOMM = split_comm
    -call MPL_INIT()
    +CALL MPL_COMM_SPLIT(MPL_COMM, SPLIT_COLOUR, SPLIT_KEY, SPLIT_COMM, IERROR)
    +CALL MPL_SETDFLT_COMM(SPLIT_COMM, DUMMY_COMM)
    This requires a bugfix PR to FIAT to be merged: Make sure MPL_RANK is updated with new comm fiat#72.
    Note that the call to MPI_FINALIZE is also unnecessary.
  • get_split_group can be modified to avoid MPI calls as well, just making sure not to add +1 when calculating rank_ratio. Though this function is probably unnecessary; you could just say SPLIT_COLOUR = MERGE(0, 1, MPL_MYRANK() == 1) or something.
  • Style: we really have a messy situation with some upper, some lower case Fortran. transi sort of follows a more modern style, but the core library is still shouty Fortran, so I would say that test_split_mpi_comm.F90 should also be entirely uppercase.
  • As for the transi test, there are again many calls to raw MPI, but I'm not sure what the preferred style is there. @wdeconinck, MPL has no C bindings right, so how should people use transi in an MPI environment?

@samhatfield
Copy link
Collaborator

BTW I think I planted the seed of the LMPLUSERCOMM style in one of our meetings - if so sorry about that 😆 I literally only just discovered that MPL_SETDFLT_COMM exists 5 minutes ago.

@mo-joshuacolclough
Copy link
Author

Thanks Sam, can confirm that the test works with your changes - 0e68acf. Now updating the transi code.

@wdeconinck
Copy link
Collaborator

I'm of the opinion that whatever can be done with MPL is good to do with MPL especially in the Fortran tests.

However! ectrans should also be able to be used within codes that don't use MPL and be able to be set up with regular MPI communicators and be interoperable with MPI routines and MPI comms. MPL should not be the ground truth for how MPI is run in packages that use ectrans. For this reason, I found that also previous implementation using MPI in the driver layer in Fortran was OK in my opinion.

For transi, the use of MPL is really an implementation detail of ectrans, and standard MPI calls are advised. In the context of Atlas these mpi communicators are managed and split via eckit.
transi library itself should not explicitly link with MPI::MPI_C, but unit tests that set up transi for MPI should be able to link with MPI::MPI_C indeed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants