Extend CMSSW to a distributed application over MPI [16.0.x] #49930
base: CMSSW_16_0_X
Let multiple CMSSW processes on the same or different machines coordinate event processing and transfer data products over MPI.

The implementation is based on four CMSSW modules. Two are responsible for setting up the communication channels and coordinating the event processing:
- a "remote controller" called MPIController;
- a "remote source" called MPISource;

and two are responsible for the transfer of data products:
- a "sender" called MPISender;
- a "receiver" called MPIReceiver.

Data products can be serialised and transferred using the trivial serialisation from HeterogeneousCore/TrivialSerialisation - if available - or the ROOT-based serialisation.

Various tests are used to validate the implementation: matching the local and remote event id, transferring SoA products with trivial serialisation, and transferring legacy products with ROOT serialisation.

Co-authored-by: Andrea Bocci <[email protected]>
Co-authored-by: Anna Polova <[email protected]>
Co-authored-by: Mario Gonzalez <[email protected]>
|
A new Pull Request was created by @fwyzard for CMSSW_16_0_X. It involves the following packages:
@cmsbuild, @fwyzard, @makortel can you please review it and eventually sign? Thanks. cms-bot commands are listed here
|
please test with #49882 |
|
+heterogeneous |
|
This pull request is fully signed and it will be integrated in one of the next CMSSW_16_0_X IBs after it passes the integration tests and once validation in the development release cycle CMSSW_16_1_X is complete. This pull request will now be reviewed by the release team before it's merged. @mandrenguyen, @ftenchini, @sextonkennedy (and backports should be raised in the release meeting by the corresponding L2) |
|
type ngt |
|
+1
Size: This PR adds an extra 20KB to the repository
|
|
Oops, looks like I deleted the branch by mistake :-/ |
|
please test |
|
-1
Failed Tests: UnitTests
Failed Unit Tests
I found 1 errors in the following unit tests:
---> test testMPISoATransfer had ERRORS
|
8ba3e94 to c4ae7ce
|
+heterogeneous |
|
This pull request is fully signed and it will be integrated in one of the next CMSSW_16_0_X IBs after it passes the integration tests and once validation in the development release cycle CMSSW_16_1_X is complete. This pull request will now be reviewed by the release team before it's merged. @sextonkennedy, @mandrenguyen, @ftenchini (and backports should be raised in the release meeting by the corresponding L2) |
|
please test with #49882 |
|
-1
Failed Tests: UnitTests
Failed Unit Tests
I found 1 errors in the following unit tests:
---> test testMPISoATransfer had ERRORS
|
|
urgent aimed at CMSSW_16_0_0 |
|
hold Need to merge #49882 first. |
|
Pull request has been put on hold by @fwyzard |
PR description:
Let multiple CMSSW processes on the same or different machines coordinate event processing and transfer data products over MPI.
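The implementation is based on four CMSSW modules. Two are responsible for setting up the communication channels and coordinating the event processing:
- a "remote controller" called MPIController;
- a "remote source" called MPISource;

and two are responsible for the transfer of data products:
- a "sender" called MPISender;
- a "receiver" called MPIReceiver.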
Data products can be serialised and transferred using the trivial serialisation from HeterogeneousCore/TrivialSerialisation - if available - or the ROOT-based serialisation.
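To illustrate the "trivial if available, otherwise generic" choice described above, here is a minimal, self-contained sketch. It is not the CMSSW code: the function names `serialise` and `deserialise` are hypothetical, a flat `array` stands in for a product with a trivially copyable layout, and `pickle` stands in for the ROOT-based serialisation.

```python
import pickle
from array import array


def serialise(product):
    """Return (tag, payload) for a product.

    Hypothetical helper: flat typed buffers are copied byte-for-byte
    (the "trivial" path); anything else falls back to a generic
    serialiser (pickle here, standing in for ROOT).
    """
    if isinstance(product, array):
        # Fixed, contiguous layout: the raw bytes are the payload.
        return "trivial:" + product.typecode, product.tobytes()
    # No trivial layout known: use the generic serialiser.
    return "generic", pickle.dumps(product)


def deserialise(tag, payload):
    """Rebuild a product from the (tag, payload) pair made by serialise()."""
    if tag.startswith("trivial:"):
        typecode = tag.split(":", 1)[1]
        product = array(typecode)
        product.frombytes(payload)
        return product
    return pickle.loads(payload)
```

The tag travels with the payload so that the receiving side can pick the matching path without inspecting the bytes. In this sketch the trivial path assumes both sides share the same element size and byte order.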
Various tests are used to validate the implementation: matching the local and remote event id, transferring SoA products with trivial serialisation, and transferring legacy products with ROOT serialisation.
PR validation:
See #32632.
If this PR is a backport please specify the original PR and why you need to backport that PR.
Backport of #32632 to 16.0.x for testing online in parallel to the 2026 data taking.