@leobeltra

PR description:

This PR removes the workaround introduced in #47931 for the issue in GPU relvals found in #47808.

The problem has been addressed by reverting some changes introduced in #47306. In particular, size_ has been deleted from SoAParametersImpl.

Once CUDA 13.1 is released, it should be possible to reintroduce size_ for other purposes, since the bug is expected to be fixed by then (a bug report was opened and has since been closed).

PR validation:

Workflows 12834.402 and 29661.402 run successfully.

@cmsbuild
Contributor

cmsbuild commented Sep 5, 2025

cms-bot internal usage

@cmsbuild
Contributor

cmsbuild commented Sep 5, 2025

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-48854/46002

@cmsbuild
Contributor

cmsbuild commented Sep 5, 2025

A new Pull Request was created by @leobeltra for master.

It involves the following packages:

  • HeterogeneousCore/AlpakaInterface (heterogeneous)
  • RecoTracker/PixelSeeding (reconstruction)
  • RecoVertex/PixelVertexFinding (reconstruction)

@cmsbuild, @fwyzard, @jfernan2, @makortel, @mandrenguyen can you please review it and eventually sign? Thanks.
@GiacomoSguazzoni, @VinInn, @VourMa, @dgulhan, @fabiocos, @felicepantaleo, @gpetruc, @makortel, @martinamalberti, @missirol, @mmusich, @mtosi, @rovere this is something you requested to watch as well.
@ftenchini, @mandrenguyen, @sextonkennedy you are the release manager for this.

cms-bot commands are listed here

@mmusich
Contributor

mmusich commented Sep 5, 2025

allow @leobeltra test rights

@leobeltra
Author

please test

@cmsbuild
Contributor

cmsbuild commented Sep 5, 2025

-1

Failed Tests: Build
Size: This PR adds an extra 32KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-fc624c/47991/summary.html
COMMIT: d33308f
CMSSW: CMSSW_15_1_X_2025-09-05-1100/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/48854/47991/install.sh to create a dev area with all the needed externals and cmssw changes.

Build

I found a compilation error when building:

Copying tmp/el8_amd64_gcc12/src/RecoVertex/PixelVertexFinding/plugins/RecoPixelVertexingPixelVertexFindingPluginsPortableROCmAsync/libRecoPixelVertexingPixelVertexFindingPluginsPortableROCmAsync_rocm.a to productstore area:
Copying tmp/el8_amd64_gcc12/src/RecoVertex/PixelVertexFinding/test/deviceVertexFinderByDensity_tROCmAsync/libdeviceVertexFinderByDensity_tROCmAsync_rocm.a to productstore area:
cp: cannot stat 'tmp/el8_amd64_gcc12/src/RecoVertex/PixelVertexFinding/plugins/RecoPixelVertexingPixelVertexFindingPluginsPortableROCmAsync/libRecoPixelVertexingPixelVertexFindingPluginsPortableROCmAsync_rocm.a': No such file or directory
>> Deleted: tmp/el8_amd64_gcc12/src/RecoVertex/PixelVertexFinding/plugins/RecoPixelVertexingPixelVertexFindingPluginsPortableROCmAsync/libRecoPixelVertexingPixelVertexFindingPluginsPortableROCmAsync_rocm.a
Copying tmp/el8_amd64_gcc12/src/RecoVertex/PixelVertexFinding/test/deviceVertexFinderDBSCAN_tROCmAsync/libdeviceVertexFinderDBSCAN_tROCmAsync_rocm.a to productstore area:
gmake: *** [config/SCRAM/GMake/Makefile.rules:1920: tmp/el8_amd64_gcc12/src/RecoVertex/PixelVertexFinding/plugins/RecoPixelVertexingPixelVertexFindingPluginsPortableROCmAsync/libRecoPixelVertexingPixelVertexFindingPluginsPortableROCmAsync_rocm.a] Error 1
cp: cannot stat 'tmp/el8_amd64_gcc12/src/RecoVertex/PixelVertexFinding/test/deviceVertexFinderByDensity_tROCmAsync/libdeviceVertexFinderByDensity_tROCmAsync_rocm.a': No such file or directory
cp: cannot stat 'tmp/el8_amd64_gcc12/src/RecoVertex/PixelVertexFinding/test/deviceVertexFinderDBSCAN_tROCmAsync/libdeviceVertexFinderDBSCAN_tROCmAsync_rocm.a': No such file or directory
>> Deleted: tmp/el8_amd64_gcc12/src/RecoVertex/PixelVertexFinding/test/deviceVertexFinderByDensity_tROCmAsync/libdeviceVertexFinderByDensity_tROCmAsync_rocm.a
gmake: *** [config/SCRAM/GMake/Makefile.rules:1920: tmp/el8_amd64_gcc12/src/RecoVertex/PixelVertexFinding/test/deviceVertexFinderByDensity_tROCmAsync/libdeviceVertexFinderByDensity_tROCmAsync_rocm.a] Error 1
>> Deleted: tmp/el8_amd64_gcc12/src/RecoVertex/PixelVertexFinding/test/deviceVertexFinderDBSCAN_tROCmAsync/libdeviceVertexFinderDBSCAN_tROCmAsync_rocm.a


Contributor

Please do not remove this file.

@fwyzard
Contributor

fwyzard commented Sep 5, 2025

enable gpu

@cmsbuild
Contributor

cmsbuild commented Sep 6, 2025

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-48854/46008

@leobeltra force-pushed the workaround#47931_removed branch from 44f9b6e to 01bde69 on September 6, 2025 14:42
@cmsbuild
Contributor

cmsbuild commented Oct 9, 2025

Pull request #48854 was updated. @jfernan2, @mandrenguyen can you please check and sign again.

@fwyzard
Contributor

fwyzard commented Oct 9, 2025

+heterogeneous

The changes look OK, assuming all the tests pass.

@fwyzard
Contributor

fwyzard commented Oct 9, 2025

assign heterogeneous

:-)

@cmsbuild
Contributor

cmsbuild commented Oct 9, 2025

New categories assigned: heterogeneous

@fwyzard,@makortel you have been requested to review this Pull request/Issue and eventually sign? Thanks

@cmsbuild
Contributor

+1

Size: This PR adds an extra 28KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-fc624c/48572/summary.html
COMMIT: 82287c0
CMSSW: CMSSW_16_0_X_2025-10-09-1100/el8_amd64_gcc13
Additional Tests: GPU,AMD_MI300X,AMD_W7900,NVIDIA_H100,NVIDIA_L40S,NVIDIA_T4
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/48854/48572/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially added 1 lines to the logs
  • Reco comparison results: 2 differences found in the comparisons
  • DQMHistoTests: Total files compared: 53
  • DQMHistoTests: Total histograms compared: 4172438
  • DQMHistoTests: Total failures: 502
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 4171916
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 52 files compared)
  • Checked 226 log files, 198 edm output root files, 53 DQM output files
  • TriggerResults: no differences found

AMD_MI300X Comparison Summary

Summary:

  • You potentially added 1 lines to the logs
  • Reco comparison results: 261 differences found in the comparisons
  • DQMHistoTests: Total files compared: 11
  • DQMHistoTests: Total histograms compared: 146621
  • DQMHistoTests: Total failures: 27311
  • DQMHistoTests: Total nulls: 14
  • DQMHistoTests: Total successes: 119296
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 10 files compared)
  • Checked 42 log files, 45 edm output root files, 11 DQM output files
  • TriggerResults: found differences in 1 / 10 workflows

AMD_W7900 Comparison Summary

Summary:

NVIDIA_H100 Comparison Summary

Summary:

NVIDIA_L40S Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 218 differences found in the comparisons
  • DQMHistoTests: Total files compared: 11
  • DQMHistoTests: Total histograms compared: 146621
  • DQMHistoTests: Total failures: 34375
  • DQMHistoTests: Total nulls: 9
  • DQMHistoTests: Total successes: 112237
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 10 files compared)
  • Checked 42 log files, 45 edm output root files, 11 DQM output files
  • TriggerResults: found differences in 2 / 10 workflows
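
The DQMHistoTests counts above can be sanity-checked with a few lines of Python. This is only a sketch: the class and field names are hypothetical, and the relation total = failures + nulls + successes + skipped is an assumption inferred from the numbers in this thread, not something cms-bot states. The values are copied from the three non-empty summaries of the run against CMSSW_16_0_X_2025-10-09-1100.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DqmHistoSummary:
    """Counts from one 'DQMHistoTests' block (hypothetical helper type)."""
    label: str
    total: int
    failures: int
    nulls: int
    successes: int
    skipped: int

    def is_consistent(self) -> bool:
        # Assumed relation: each compared histogram falls in exactly one bucket.
        return (self.failures + self.nulls + self.successes + self.skipped
                == self.total)

    def failure_fraction(self) -> float:
        # Share of compared histograms that were reported as failures.
        return self.failures / self.total


# Numbers copied verbatim from the comparison summaries in this comment.
SUMMARIES = [
    DqmHistoSummary("default", 4172438, 502, 0, 4171916, 20),
    DqmHistoSummary("AMD_MI300X", 146621, 27311, 14, 119296, 0),
    DqmHistoSummary("NVIDIA_L40S", 146621, 34375, 9, 112237, 0),
]

if __name__ == "__main__":
    for s in SUMMARIES:
        print(f"{s.label}: consistent={s.is_consistent()}, "
              f"failure fraction={s.failure_fraction():.4%}")
```

Under that assumption all three blocks add up, and the GPU comparisons show a much larger share of failing histograms than the default one.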

NVIDIA_T4 Comparison Summary

Summary:

@jfernan2
Contributor

+1

@leobeltra
Author

kind ping

@fwyzard
Contributor

fwyzard commented Oct 28, 2025

unhold

@fwyzard
Contributor

fwyzard commented Oct 28, 2025

please test

Let's refresh the test results since they are over two weeks old.
Otherwise, the PR should be good to be merged.

@cmsbuild
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @sextonkennedy, @mandrenguyen, @ftenchini (and backports should be raised in the release meeting by the corresponding L2)

@cmsbuild
Contributor

+1

Size: This PR adds an extra 16KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-fc624c/48868/summary.html
COMMIT: 82287c0
CMSSW: CMSSW_16_0_X_2025-10-28-1100/el8_amd64_gcc13
Additional Tests: GPU,AMD_MI300X,AMD_W7900,NVIDIA_H100,NVIDIA_L40S,NVIDIA_T4
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/48854/48868/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially added 2 lines to the logs
  • Reco comparison results: 5 differences found in the comparisons
  • DQMHistoTests: Total files compared: 53
  • DQMHistoTests: Total histograms compared: 4172318
  • DQMHistoTests: Total failures: 49
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 4172249
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 52 files compared)
  • Checked 226 log files, 198 edm output root files, 53 DQM output files
  • TriggerResults: no differences found

AMD_MI300X Comparison Summary

Summary:

AMD_W7900 Comparison Summary

Summary:

NVIDIA_H100 Comparison Summary

Summary:

NVIDIA_L40S Comparison Summary

Summary:

NVIDIA_T4 Comparison Summary

Summary:

@mandrenguyen
Contributor

+1

@cmsbuild merged commit 6a7bc96 into cms-sw:master on Oct 29, 2025
25 checks passed