Porting Pixel Tracks to Alpaka [Not to Merge] #41117
Conversation
|
-code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-41117/34753 ERROR: Build errors found during clang-tidy run. |
|
-code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-41117/34770 ERROR: Build errors found during clang-tidy run. |
|
-code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-41117/34807 ERROR: Build errors found during clang-tidy run. |
|
@AdrianoDee in case you didn't notice: you'll need to do code checks |
4fb1dd5 to
d83a814
Compare
|
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-41117/37986
|
|
Pull request #41117 was updated. @Martin-Grunewald, @sunilUIET, @fwyzard, @makortel, @mandrenguyen, @consuegs, @nothingface0, @mmusich, @mdhildreth, @jfernan2, @perrotta, @fabiocos, @francescobrivio, @AdrianoDee, @civanch, @srimanob, @syuvivida, @davidlange6, @saumyaphor4252, @rvenditti, @antoniovagnerini, @antoniovilela, @miquork, @cmsbuild, @rappoccio, @tjavaid can you please check and sign again. |
PR description:
Common work with @borzari and @nothingface0.
This PR allows running the Pixel Tracks reconstruction in Alpaka. It's still a work in progress and needs to be properly tested. We are opening it so that it is (more) public and may be reviewed by experts.
We will update the description as the PR evolves.
This includes #40932 with the latest comments received addressed.
This PR is not to be merged; it is here for testing purposes. It has been split into 8 smaller PRs, to be merged in sequence, to ease the review:
(@ericcano)
21st November
Tested with #43064, everything is fine. Some general clean-up renaming:
- `DataFormats` are now in the form `DataFormats/XYZSoA/`;
- `XYZHost`, `XYZDevice`, `XYZsSoACollection`;
- `*GPU` objects in Alpaka code either with `*Device` or nothing (e.g. `GPUAlgo` -> `Algo`);
- `CopyToHost` methods to avoid useless specialization for `Host` to `Host` copy;
- `ASSERT_DEVICE_MATCHES_HOST_COLLECTION` everywhere;
- `SET_PORTABLEHOSTCOLLECTION_READ_RULES`;
- `std::conditional_t` for collection Host/Device definition.

The resolution problem was solved by @borzari spotting this (a great catch!):
15th November
This now includes #43064 up to 5f9c2e6.
19th October
We will use this PR as a proxy for the full development in order to be able to run the integration tests. Changing the status to "Ready to review" to be able to run the bot commands and checks.
Module Naming
For the moment we applied the following rule for the naming:
1. `CUDA` we simply drop the `CUDA` suffix;
2. `Alpaka` to the module name.

Where 2. usually applies to SoA to legacy converters.
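As a rough illustration of the naming rule, here is a minimal Python sketch. It assumes that rule 1 applies to module names ending in `CUDA` and that rule 2 means appending `Alpaka` to every other name; the function `alpaka_module_name` and the example module names are hypothetical and do not come from the PR.

```python
def alpaka_module_name(cuda_name: str) -> str:
    """Map a CUDA-era module name to its Alpaka counterpart.

    Assumption: names ending in "CUDA" lose that suffix (rule 1);
    every other name gets "Alpaka" appended (rule 2).
    """
    suffix = "CUDA"
    if cuda_name.endswith(suffix):
        # Rule 1: drop the CUDA suffix.
        return cuda_name[: -len(suffix)]
    # Rule 2: append Alpaka, e.g. for SoA to legacy converters.
    return cuda_name + "Alpaka"


# Hypothetical module names, used only to exercise both rules.
print(alpaka_module_name("ExampleProducerCUDA"))
print(alpaka_module_name("ExampleFromSoA"))
```

With these assumptions, the first call falls under rule 1 and the second under rule 2.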
Additional workflows
An `alpaka` process modifier is added together with a set of new workflows:

- `*.55` running Pixel only in Alpaka;
- `*.554` running Pixel only in Alpaka for profiling;
- `*.557` running Pixel only in Alpaka for CPU vs GPU validation.

A note: in order to coexist with the CUDA workflows, for the modules providing the conversion to legacy formats, we had to live with the `SwitchProducerCUDA` logic. For example, for the local reco configurations, `siPixelRecHitsPreSplitting` is defined as:

and in order to be able to modify or replace it with `toModify` or `toReplaceWith`, the `alpaka` modifier acts on the `cpu` branch of the `SwitchProducerCUDA`.

This was the only way we found to keep the same naming for the final AoS products.
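The idea of a modifier acting only on the `cpu` branch can be sketched with a small, self-contained Python mock. This is not the CMSSW configuration API: the classes `Module`, `SwitchProducer` and `Modifier`, the method `to_replace_branch`, and all module type names below are hypothetical stand-ins chosen for illustration.

```python
class Module:
    """Hypothetical stand-in for a configured producer."""

    def __init__(self, type_name, **params):
        self.type_name = type_name
        self.params = dict(params)


class SwitchProducer:
    """Hypothetical stand-in holding one module per backend branch."""

    def __init__(self, **branches):
        self.branches = dict(branches)


class Modifier:
    """Hypothetical flag that applies changes only when it is active."""

    def __init__(self, active=False):
        self.active = active

    def to_replace_branch(self, switch, branch, new_module):
        # Replace a single branch and leave the other branches untouched.
        if self.active:
            switch.branches[branch] = new_module


def build_rechits(alpaka):
    # The switch keeps one product name, whichever branch is used.
    switch = SwitchProducer(
        cpu=Module("LegacyConverter", src="hitsCPU"),
        cuda=Module("CudaConverter", src="hitsCUDA"),
    )
    # The modifier acts on the cpu branch only.
    alpaka.to_replace_branch(
        switch, "cpu", Module("AlpakaConverter", src="hitsAlpaka")
    )
    return switch


rechits = build_rechits(Modifier(active=True))
print(rechits.branches["cpu"].type_name, rechits.branches["cuda"].type_name)
```

In this mock the downstream consumers keep referring to the same switch object, which mirrors the stated goal of keeping the same naming for the final AoS products.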
Run3 Physics Results
Find here all the validation plots from MTV for Run3 ttbar.
The results overlap almost perfectly, with the exception of the $d_{xy}$ resolution, which is degraded (see e.g. here). We are investigating this and believe we have spotted the culprit.
Run3 Throughput
Running a profiling workflow on Run3 data (Run 370293) on `fu-c2a02-37-02` we see a degradation in performance (around 20% in throughput).

Note that when running a single EDM stream, the CUDA and Alpaka throughputs are the same.
20th October
Fixed the tests with 66f48f9 (thanks to @ericcano). For the moment the `testOneHistoContainer` tests are commented out, since the issue is solved in #43064.

`RecoTracker/PixelTrackFitting/testEigenGPUNoFit_t` also fails in a clean `CMSSW_13_3_X_2023-10-18-1100`.