CPU vs. GPU for LST in HLT and updates to the offline #49832
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-49832/47487
|
|
A new Pull Request was created by @VourMa for master. It involves the following packages:
@AdrianoDee, @DickyChant, @Martin-Grunewald, @Moanwar, @antoniovagnerini, @cmsbuild, @ctarricone, @davidlange6, @fabiocos, @ftenchini, @gabrielmscampos, @jfernan2, @mandrenguyen, @miquork, @mmusich, @nothingface0, @rseidita, @srimanob can you please review it and eventually sign? Thanks. cms-bot commands are listed here |
|
assign heterogeneous |
numWFIB.extend([prefixDet+34.7521]) # HLTTiming75e33, ticl_v5, ticlv5TrackLinkingGNN
numWFIB.extend([prefixDet+34.753]) # HLTTiming75e33, alpaka,singleIterPatatrack
numWFIB.extend([prefixDet+34.754]) # HLTTiming75e33, alpaka,singleIterPatatrack,trackingLST
numWFIB.extend([prefixDet+34.7541]) # HLTTiming75e33, alpakaValidationLST,singleIterPatatrack,trackingLST
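The diff lines above build each workflow number by adding a decimal suffix to `prefixDet`. A minimal sketch of that arithmetic follows, assuming `prefixDet = 34600` (a value inferred from the workflow number `34634.704` quoted elsewhere in this conversation, not taken from the diff) and using `round` to keep floating-point noise out of the result:

```python
prefixDet = 34600  # assumption: inferred from workflow 34634.704, not from the diff

# Suffixes and modifier lists copied from the diff comments.
hlt_timing_suffixes = {
    34.753: "alpaka,singleIterPatatrack",
    34.754: "alpaka,singleIterPatatrack,trackingLST",
    34.7541: "alpakaValidationLST,singleIterPatatrack,trackingLST",
}

numWFIB = []
# Round to 4 decimals so that float addition does not leave trailing noise.
numWFIB.extend(round(prefixDet + suffix, 4) for suffix in hlt_timing_suffixes)

for number, suffix in zip(numWFIB, hlt_timing_suffixes):
    print(number, "->", hlt_timing_suffixes[suffix])
```

The rounding step is a choice of this sketch; the original lines add the suffix directly.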
Shouldn't this go rather in the gpu matrix? How do I test this from the bot with a GPU backend available?
Oops, my bad. Should be fixed in the last push.
6fd4c49 to 2cf3f6b
|
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-49832/47496
|
|
Pull request #49832 was updated. @AdrianoDee, @DickyChant, @Martin-Grunewald, @Moanwar, @antoniovagnerini, @cmsbuild, @ctarricone, @davidlange6, @fabiocos, @ftenchini, @fwyzard, @gabrielmscampos, @jfernan2, @makortel, @mandrenguyen, @miquork, @mmusich, @nothingface0, @rseidita, @srimanob can you please check and sign again. |
|
enable gpu |
|
test parameters:
|
|
@cmsbuild, please test |
the gpu matrix didn't run, despite #49832 (comment) + #49832 (comment). Not sure what's the right way of configuring it. |
|
test parameters:
|
|
@cmsbuild, please test |
|
Are the tests stuck on this PR? |
|
Yes @VourMa, the tests were stuck. I have just forced a rebuild of the pending tests. |
Thanks a lot, @smuzaffar! |
|
-1
Failed Tests: RelVals-NVIDIA_H100
HLT P2 Timing: chart
Failed RelVals-NVIDIA_H100
ValueError: Undefined workflows: 34634.704
Comparison Summary
|
|
I do not see where I might have missed a |
I think here |
Oh, OK, thanks! |
🤷♂️ |
The relevant PR has been made: cms-sw/cms-bot#2663 |
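The `ValueError: Undefined workflows: 34634.704` failure reported above is the kind of error raised when a requested workflow number is absent from the matrix. A minimal sketch of such a check follows; the helper names (`find_undefined`, `check_requested`) and the set of defined workflows are hypothetical and only illustrate the idea, and the real check in the matrix tooling may differ:

```python
def find_undefined(requested, defined):
    """Return the requested workflow IDs that are not defined, in order."""
    return [wf for wf in requested if wf not in defined]


def check_requested(requested, defined):
    """Raise ValueError if any requested workflow is undefined (hypothetical helper)."""
    missing = find_undefined(requested, defined)
    if missing:
        raise ValueError("Undefined workflows: " + ", ".join(missing))


# Illustrative only: with the offline LST suffixes renumbered to
# .711, .712 and .713, a stale request for .704 no longer resolves.
defined = {"34634.711", "34634.712", "34634.713"}
requested = ["34634.704", "34634.712"]
print(find_undefined(requested, defined))
```

Workflow IDs are kept as strings here to avoid comparing floating-point numbers.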
|
test parameters:
|
|
@cmsbuild, please test |
|
-1
Failed Tests: UnitTests RelVals-NVIDIA_L40S
HLT P2 Timing: chart
Failed Unit Tests
I found 1 errors in the following unit tests:
---> test RecoTrackerLSTCore-standalone-compilation had ERRORS
Failed RelVals-NVIDIA_L40S
Comparison Summary
|
|
The failed RelVals are due to the recent, usual error: while the failed unit test is unrelated and fixed in #49895. |
The goal of this PR is to introduce two HLT workflows to monitor the agreement between LST on CPU and LST on GPU:
- `alpakaValidationLST,singleIterPatatrack,trackingLST`
- `singleIterPatatrack,phase2CAExtension,trackingLST,seedingLST,trackingMkFitCommon,hltTrackingMkFitInitialStep`

The additional CPU reconstruction (`SerialSync`) and the comparison plots are implemented with a new procModifier, `alpakaValidationLST`. This procModifier takes effect only when it is used in the procModifier combinations listed above; otherwise it produces neither the additional products nor the comparison plots. It is also included in the `alpakaValidation` modifier chain.

The analyzer that produces the comparison plots has been improved with a new parameter option to skip luminosity and PU plots.

With the introduction of the `alpakaValidationLST` modifier, the offline workflow testing LST on CPU vs. LST on GPU can be made explicit. The code is changed so that the heterogeneous workflow `0.712` (previously `0.704`) runs the offline reconstruction without any additional CPU reconstruction, while a new workflow, `0.713`, runs the comparison. Workflow `0.703` has also been renamed to `0.711`. The workflow numbering changes are made so that the offline LST workflows follow the numbering conventions for Alpaka workflows.

Some screenshots of the content of the DQM file:


