Remove obsolete CUDA-using modules from DQM/SiPixelHeterogeneous #49697
These modules seem to have been superseded by more generic ones:
- SiPixelCompareVertexSoA -> SiPixelCompareVertices
- SiPixel*CompareRecHitsSoA -> SiPixel*CompareRecHits
- SiPixel*CompareTrackSoA -> SiPixel*CompareTracks
- SiPixel*MonitorRecHitsSoA -> SiPixel*MonitorRecHitsSoAAlpaka
- SiPixel*MonitorTrackSoA -> SiPixel*MonitorTrackSoAAlpaka
- SiPixelMonitorVertexSoA -> SiPixelMonitorVertexSoAAlpaka
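For anyone checking a private configuration against this list, the renames can be written down as a small lookup. This is only a sketch in plain Python: the table is copied from the commit message above, while the helper name `replacement_for` and the handling of `*` as "keep this part of the name" are assumptions made for illustration, not something that exists in CMSSW.

```python
import re

# Renames copied from the commit message; "*" stands for the part of the
# module type name that stays the same (for example "Phase1").
SUPERSEDED = [
    ("SiPixelCompareVertexSoA", "SiPixelCompareVertices"),
    ("SiPixel*CompareRecHitsSoA", "SiPixel*CompareRecHits"),
    ("SiPixel*CompareTrackSoA", "SiPixel*CompareTracks"),
    ("SiPixel*MonitorRecHitsSoA", "SiPixel*MonitorRecHitsSoAAlpaka"),
    ("SiPixel*MonitorTrackSoA", "SiPixel*MonitorTrackSoAAlpaka"),
    ("SiPixelMonitorVertexSoA", "SiPixelMonitorVertexSoAAlpaka"),
]


def replacement_for(name):
    """Return the suggested replacement type name, or None if the name is not in the table.

    Hypothetical helper, not part of CMSSW.
    """
    for old, new in SUPERSEDED:
        # Turn the "*" placeholder into a capture group and match the whole name.
        pattern = "^" + re.escape(old).replace(r"\*", "(.*)") + "$"
        match = re.match(pattern, name)
        if match:
            # Re-insert the captured part (if any) into the new name.
            return new.replace("*", match.group(1)) if match.groups() else new
    return None


if __name__ == "__main__":
    for candidate in ("SiPixelPhase1CompareRecHitsSoA", "SiPixelMonitorVertexSoA"):
        print(candidate, "->", replacement_for(candidate))
```

The patterns are anchored at both ends, so a name that already carries the Alpaka suffix is not reported again.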
|
FYI @cms-sw/heterogeneous-l2 |
|
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-49697/47255
|
|
A new Pull Request was created by @makortel for master. It involves the following packages:
@cmsbuild, @ctarricone, @gabrielmscampos, @nothingface0, @rseidita can you please review it and eventually sign? Thanks. cms-bot commands are listed here |
|
test parameters:
|
|
@cmsbuild, please test |
|
|
# Run-3 sequence
-monitorpixelSoASource = cms.Sequence(siPixelPhase1MonitorRecHitsSoA * siPixelPhase1MonitorTrackSoA * siPixelMonitorVertexSoA)
+monitorpixelSoASource = cms.Sequence()
It is not clear to me whether these empty sequences still serve any purpose other than being placeholders that various modifiers replace below via toReplaceWith().
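For context, the placeholder pattern in question looks roughly like the fragment below. It is a sketch, not the content of the actual _cff.py file: `exampleModifier` and the `ExampleMonitor` plugin name are hypothetical stand-ins, and only `cms.Sequence`, `cms.Modifier` and `toReplaceWith()` are taken from the CMSSW configuration language. It needs a CMSSW environment to be imported.

```python
import FWCore.ParameterSet.Config as cms

# Hypothetical modifier; the real file would import an era or process
# modifier instead of creating one here.
exampleModifier = cms.Modifier()

# Hypothetical monitoring module; the plugin name is illustrative only.
exampleMonitor = cms.EDAnalyzer("ExampleMonitor")

# Empty placeholder: by default the sequence schedules nothing.
monitorpixelSoASource = cms.Sequence()

# When exampleModifier is active for the process, the placeholder is
# replaced in place by a sequence that runs the monitoring module.
# When the modifier is not active, this call does nothing.
exampleModifier.toReplaceWith(monitorpixelSoASource, cms.Sequence(exampleMonitor))
```

If every configuration that uses the sequence goes through such a modifier, the empty default only matters for setups where no modifier is active, which is the case the question above is about.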
monitorpixelSoACompareSource = cms.Sequence(siPixelPhase1MonitorRawDataACPU *
                                            siPixelPhase1MonitorRawDataAGPU *
                                            siPixelPhase1MonitorRecHitsSoACPU *
                                            siPixelPhase1MonitorRecHitsSoAGPU *
                                            siPixelPhase1CompareRecHitsSoA *
                                            siPixelPhase1MonitorTrackSoAGPU *
                                            siPixelPhase1MonitorTrackSoACPU *
                                            siPixelPhase1CompareTrackSoA *
                                            siPixelMonitorVertexSoACPU *
                                            siPixelMonitorVertexSoAGPU *
                                            siPixelCompareVertexSoA *
                                            siPixelPhase1RawDataErrorComparator)
It is not clear to me if the remaining 3 modules in this Sequence would be useful, or if it would be better to remove them as well.
|
-1
Failed Tests: RelVals-NVIDIA_T4
Failed RelVals-NVIDIA_T4
ValueError: Undefined workflows: 29834.751, 29834.404, 29834.402, 29834.704, 29834.403
Comparison Summary
There are some workflows for which there are errors in the baseline:
Summary:
|
I wonder what this means in practice |
I opened an issue #49700 |
|
@cmsbuild, please test
cms-sw/cms-bot#2636 was merged
|
CPU comparison differences are related to #47071
In RelVal-INPUT tests the workflow the input file for
On NVIDIA H100 the runTheMatrix tests fail with that could be an infrastructure problem.
|
ignore tests-rejected with external-failure |
|
please test |
|
-1
Size: This PR adds an extra 16KB to repository
Comparison Summary
There are some workflows for which there are errors in the baseline:
Summary:
AMD_W7900 Comparison Summary
Summary:
NVIDIA_L40S Comparison Summary
Summary:
NVIDIA_T4 Comparison Summary
Summary:
|
Several of the 34634.X workflows continue to fail on NVIDIA H100 as in #49697 (comment). Maybe it's not an infrastructure problem (since other workflows succeed), but something else that would be better investigated separately from this PR. |
Those workflows succeed in IBs though |
|
@cms-sw/dqm-l2 Could you please review and sign? This PR resolves the IB failures in the 4 workflows listed in #49697 (comment). |
|
+dqm
|
|
This pull request is fully signed and it will be integrated in one of the next master IBs (test failures were overridden). This pull request will now be reviewed by the release team before it's merged. @sextonkennedy, @mandrenguyen, @ftenchini (and backports should be raised in the release meeting by the corresponding L2) |
|
@cms-sw/tracking-pog-l2 can you review the DQM sequences highlighted by @makortel's comments?
|
+1 |
PR description:
This PR removes CUDA-dependent modules from DQM/SiPixelHeterogeneous. The inclusion of these modules in runTheMatrix workflow configurations came up in failures following the CUDADataFormats dictionary removal in #49656 (comment). Since all direct CUDA components are slated for removal (#45844), this PR suggests removing them. These components seem to have been superseded by more generic ones in #45206.
Resolves cms-sw/framework-team#1742
PR validation:
Workflows 11634.5 and 34434.5 succeeded.