@fwyzard
Contributor

@fwyzard fwyzard commented May 13, 2025

Drop support for the NVTX2 library.

Update PyTorch and its extensions to find the NVTX3 headers.

@fwyzard
Contributor Author

fwyzard commented May 13, 2025

please test for el8_aarch64_gcc12

@cmsbuild
Contributor

A new Pull Request was created by @fwyzard for branch IB/CMSSW_15_1_X/master.

@iarspider, @smuzaffar can you please review it and eventually sign? Thanks.
@antoniovilela, @mandrenguyen, @rappoccio, @sextonkennedy you are the release manager for this.
cms-bot commands are listed here

@cmsbuild
Contributor

cmsbuild commented May 13, 2025

cms-bot internal usage

@fwyzard fwyzard changed the title from "[do not merge] Update CUDA to version 12.9.0 and cuDNN to version 9.9.0" to "Update CUDA to version 12.9.0 and cuDNN to version 9.9.0" on May 13, 2025
@fwyzard
Contributor Author

fwyzard commented May 13, 2025

please test for el8_amd64_gcc14

@fwyzard
Contributor Author

fwyzard commented May 13, 2025

please test for el10_amd64_gcc14

@fwyzard
Contributor Author

fwyzard commented May 13, 2025

please test

@fwyzard
Contributor Author

fwyzard commented May 13, 2025

GPU tests already passed in #9833 .

@cmsbuild
Contributor

-1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-972068/46080/summary.html
COMMIT: 9f730e3
CMSSW: CMSSW_15_1_X_2025-05-08-2300/el10_amd64_gcc14
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/9856/46080/install.sh to create a dev area with all the needed externals and cmssw changes.

External Build

I found compilation error when building:

Requested to quit.
Requested to quit.
* The action "build-external+rdma-core+57.0-b28ed27528b25e00c98373c700e92f5f" was not completed successfully because Failed to build rdma-core. Log file in /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el10_amd64_gcc14/external/rdma-core/57.0-b28ed27528b25e00c98373c700e92f5f/log. Final lines of the log file:
warning: Macro expanded in comment on line 404: %{pkginstroot}/lib64

error: /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/SPECS/external/rdma-core/57.0-b28ed27528b25e00c98373c700e92f5f/spec: line 418: Patch number not specified: patch -p1
0<  (%patch)

* The action "install-external+rdma-core+57.0-b28ed27528b25e00c98373c700e92f5f" was not completed successfully because The following dependencies could not complete:
build-external+rdma-core+57.0-b28ed27528b25e00c98373c700e92f5f
* The action "download-upload-store-external+rdma-core+57.0-b28ed27528b25e00c98373c700e92f5f-1-1.el10_amd64_gcc14.rpm" was not completed successfully because The following dependencies could not complete:
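
The `Patch number not specified` error in the log above points at a `%patch` directive on line 418 of the generated spec that carries no patch number. As a rough illustration only, the Python sketch below flags such directives in the text of a spec file. The function name `find_unnumbered_patches` and the sample spec are hypothetical, and the check is a heuristic (it treats any bare integer argument or a `-P` option as the patch number); it is not the logic that rpmbuild itself applies.

```python
import re


def find_unnumbered_patches(spec_text):
    """Return (line_number, text) for %patch directives without a patch number."""
    hits = []
    for lineno, line in enumerate(spec_text.splitlines(), start=1):
        stripped = line.strip()
        # Only the bare "%patch" macro is of interest; "%patch0", "%patch12"
        # and similar forms already carry their number in the macro name.
        if not re.match(r"%patch(\s|$)", stripped):
            continue
        args = stripped.split()[1:]
        # Heuristic: a bare integer ("%patch 1") or a -P option ("%patch -P 1")
        # is taken to be the patch number.
        numbered = any(a.isdigit() or a.startswith("-P") for a in args)
        if not numbered:
            hits.append((lineno, stripped))
    return hits


# Hypothetical spec fragment, made up for this example.
sample_spec = """\
%prep
%setup -n rdma-core
%patch0 -p1
%patch -P 1 -p1
%patch -p1
"""

for lineno, text in find_unnumbered_patches(sample_spec):
    print(f"line {lineno}: {text}")
```

Because the check is purely textual, it would miss patches applied through other macros and could be fooled by options that take an integer value, so it is only a first-pass filter.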


@cmsbuild
Contributor

-1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-972068/46077/summary.html
COMMIT: 9f730e3
CMSSW: CMSSW_15_1_X_2025-05-12-2300/el8_aarch64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/9856/46077/install.sh to create a dev area with all the needed externals and cmssw changes.

External Build

I found compilation error when building:

external/onnxruntime_external_deps.cmake:550 (include)
CMakeLists.txt:614 (include)


-- Configuring incomplete, errors occurred!
error: Bad exit status from /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/tmp/rpm-tmp.4kd5xI (%build)


RPM build errors:
Macro expanded in comment on line 404: %{pkginstroot}/${PYTHON3_LIB_SITE_PACKAGES}



@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-972068/46081/summary.html
COMMIT: 9f730e3
CMSSW: CMSSW_15_1_X_2025-05-13-1100/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/9856/46081/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially added 14 lines to the logs
  • Reco comparison results: 4 differences found in the comparisons
  • DQMHistoTests: Total files compared: 50
  • DQMHistoTests: Total histograms compared: 4038163
  • DQMHistoTests: Total failures: 41
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 4038102
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 49 files compared)
  • Checked 215 log files, 184 edm output root files, 50 DQM output files
  • TriggerResults: no differences found

@cmsbuild
Contributor

-1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-972068/46079/summary.html
COMMIT: 9f730e3
CMSSW: CMSSW_15_1_X_2025-05-12-1100/el8_amd64_gcc14
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/9856/46079/install.sh to create a dev area with all the needed externals and cmssw changes.

External Build

I found compilation error when building:

external/onnxruntime_external_deps.cmake:550 (include)
CMakeLists.txt:614 (include)


-- Configuring incomplete, errors occurred!
error: Bad exit status from /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/tmp/rpm-tmp.9ehWHx (%build)

RPM build warnings:
Macro expanded in comment on line 404: %{pkginstroot}/${PYTHON3_LIB_SITE_PACKAGES}




@fwyzard
Contributor Author

fwyzard commented May 14, 2025

The onnxruntime failure should be addressed by #9863 .

@fwyzard fwyzard force-pushed the IB/CMSSW_15_1_X/master_cuda_12.9.0_for_ARM branch from 9f730e3 to 4089780 on May 14, 2025 16:09
@cmsbuild
Contributor

Pull request #9856 was updated.

@fwyzard fwyzard force-pushed the IB/CMSSW_15_1_X/master_cuda_12.9.0_for_ARM branch from 4089780 to a267965 on May 14, 2025 16:11
@fwyzard
Contributor Author

fwyzard commented May 14, 2025

please test

@cmsbuild
Contributor

Pull request #9856 was updated.

@cmsbuild
Contributor

-1

Failed Tests: UnitTests
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-972068/46190/summary.html
COMMIT: d19948e
CMSSW: CMSSW_15_1_X_2025-05-08-2300/el10_amd64_gcc14
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/9856/46190/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-972068/46190/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-972068/46190/git-merge-result

Unit Tests

I found 4 errors in the following unit tests:

---> test test-das-selected-lumis had ERRORS
---> test test_edmPickEvents had ERRORS
---> test testAOTTools had ERRORS
and more ...

@iarspider
Contributor

@fwyzard FYI - this PR has also reached a limit on number of commit statuses.
@smuzaffar can we make bot post a comment if we reach commit status limit + add a label that bot would detect at start and not waste too many api calls on such PR?

@fwyzard
Contributor Author

fwyzard commented May 16, 2025

by the way, what does it mean "number of commit statuses" exactly ?

@fwyzard
Contributor Author

fwyzard commented May 16, 2025

what I mean is, how many "commit statuses" does a result like #9856 (comment) take ?

@fwyzard fwyzard force-pushed the IB/CMSSW_15_1_X/master_cuda_12.9.0_for_ARM branch from d19948e to 8801d64 on May 16, 2025 11:16
@fwyzard
Contributor Author

fwyzard commented May 16, 2025

Rebased to include the latest updates.

@cmsbuild
Contributor

Pull request #9856 was updated.

@fwyzard
Contributor Author

fwyzard commented May 16, 2025

please test for el10_amd64_gcc14

@smuzaffar
Contributor

by the way, what does it mean "number of commit statuses" exactly ?

@fwyzard, the PR checks you see at the bottom of the PR are commit statuses. For example, cms/9856/el10_amd64_gcc14/optional is one status, which might have been updated multiple times. The bot updates each status multiple times during PR tests. The combined commit status limit is 1000 per commit, so with many tests and many architectures we can reach this limit. We are looking into other solutions (e.g. storing the status somewhere else and then updating it once all tests are done) to avoid this.
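
The idea of storing the status elsewhere and publishing it once all tests are done can be sketched roughly as follows. This is an illustration under stated assumptions and not cms-bot code: the class `StatusBuffer`, its methods, and the `publish` callback are hypothetical names, and the callback stands in for whatever would actually create the commit status.

```python
class StatusBuffer:
    """Collect status updates locally and keep only the latest one per context."""

    def __init__(self):
        self._latest = {}       # context -> most recent state and description
        self.updates_seen = 0   # how many intermediate updates were absorbed

    def update(self, context, state, description=""):
        """Record an update without publishing it."""
        self.updates_seen += 1
        self._latest[context] = {"state": state, "description": description}

    def flush(self, publish):
        """Call publish(context, state, description) once per context.

        Returns the number of statuses published.
        """
        for context, status in self._latest.items():
            publish(context, status["state"], status["description"])
        published = len(self._latest)
        self._latest.clear()
        return published


# Usage: three intermediate updates collapse into a single published status.
buffer = StatusBuffer()
context = "cms/9856/el10_amd64_gcc14/optional"  # context name taken from the comment above
buffer.update(context, "pending", "tests queued")
buffer.update(context, "pending", "tests running")
buffer.update(context, "success", "tests finished")

sent = []
count = buffer.flush(lambda ctx, state, desc: sent.append((ctx, state, desc)))
print(f"absorbed {buffer.updates_seen} updates, published {count}")
```

The trade-off of this design is that intermediate progress is no longer visible on the PR while the tests run, which is why the state would have to be kept somewhere else in the meantime.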

@smuzaffar
Contributor

@smuzaffar can we make bot post a comment if we reach commit status limit + add a label that bot would detect at start and not waste too many api calls on such PR?

No, bot might not have rights to push to user repo/branch. We should work on storing commit status at cmssdt server

@fwyzard
Contributor Author

fwyzard commented May 16, 2025

No problem.
I'm not complaining, just trying to understand how quickly the tests would reach the limit.
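
As a back-of-the-envelope way to reason about how quickly the limit could be reached, the sketch below multiplies out the number of status updates per test round and divides the 1000-per-commit limit quoted above by it. Only the limit and the count of four architectures tested in this PR come from the conversation; the number of status contexts per architecture and the number of updates per context are made-up placeholders, and the function names are hypothetical.

```python
STATUS_LIMIT = 1000  # combined per-commit limit quoted in the discussion above


def updates_per_round(n_archs, contexts_per_arch, updates_per_context):
    """Status updates produced by one full test round on a single commit."""
    return n_archs * contexts_per_arch * updates_per_context


def rounds_until_limit(per_round, limit=STATUS_LIMIT):
    """Number of complete test rounds that fit under the limit."""
    return limit // per_round


# 4 architectures were tested in this PR; 20 contexts per architecture and
# 3 updates per context are assumed values chosen only for illustration.
per_round = updates_per_round(n_archs=4, contexts_per_arch=20, updates_per_context=3)
print(f"{per_round} updates per round, {rounds_until_limit(per_round)} rounds fit under the limit")
```

With these placeholder numbers a commit would hit the limit after a handful of repeated test rounds; the real figure depends on how many contexts the bot maintains and how often it refreshes each one.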

@fwyzard
Contributor Author

fwyzard commented May 16, 2025

No, bot might not have rights to push to user repo/branch. We should work on storing commit status at cmssdt server

But how about posting a comment on the GitHub PR ?

@smuzaffar
Contributor

But how about posting a comment on the GitHub PR ?

Yes, the bot can do that.

@cmsbuild
Contributor

-1

Failed Tests: UnitTests
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-972068/46213/summary.html
COMMIT: 8801d64
CMSSW: CMSSW_15_1_X_2025-05-08-2300/el10_amd64_gcc14
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/9856/46213/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-972068/46213/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-972068/46213/git-merge-result

Unit Tests

I found 4 errors in the following unit tests:

---> test test-das-selected-lumis had ERRORS
---> test test_edmPickEvents had ERRORS
---> test testAOTTools had ERRORS
and more ...

Comparison Summary

Summary:

  • You potentially added 1131 lines to the logs
  • ROOTFileChecks: Some differences in event products or their sizes found
  • Reco comparison results: 101635 differences found in the comparisons
  • DQMHistoTests: Total files compared: 50
  • DQMHistoTests: Total histograms compared: 4015624
  • DQMHistoTests: Total failures: 562507
  • DQMHistoTests: Total nulls: 506
  • DQMHistoTests: Total successes: 3452591
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 432.935 KiB( 49 files compared)
  • DQMHistoSizes: changed ( 13034.0 ): 0.879 KiB SiStrip/MechanicalView
  • DQMHistoSizes: changed ( 139.001,... ): 0.001 KiB HLT/Filters
  • DQMHistoSizes: changed ( 140.045,... ): -0.004 KiB JetMET/SUSYDQM
  • DQMHistoSizes: changed ( 141.042 ): 0.035 KiB JetMET/SUSYDQM
  • DQMHistoSizes: changed ( 145.014,... ): 6.954 KiB HLT/Filters
  • DQMHistoSizes: changed ( 145.014,... ): 4.246 KiB L1TEMU/L1TdeStage2EMTF
  • DQMHistoSizes: changed ( 145.014,... ): 0.492 KiB L1T/L1TStage2uGT
  • DQMHistoSizes: changed ( 145.408 ): -0.012 KiB JetMET/SUSYDQM
  • DQMHistoSizes: changed ( 145.5 ): 0.008 KiB JetMET/SUSYDQM
  • DQMHistoSizes: changed ( 145.604 ): 0.152 KiB JetMET/SUSYDQM
  • DQMHistoSizes: changed ( 145.713 ): ...
  • Checked 215 log files, 184 edm output root files, 50 DQM output files
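
To put the DQMHistoTests numbers in perspective, the short sketch below computes the failure fraction for the two comparison summaries reported in this conversation (el8_amd64_gcc12 at commit 9f730e3 and el10_amd64_gcc14 at commit 8801d64). The counts are copied from the summaries; the function name `dqm_failure_rate` is hypothetical, and the consistency check only reflects what holds for these two reports, where failures, nulls, successes and skipped add up to the total.

```python
def dqm_failure_rate(total, failures, nulls, successes, skipped):
    """Fraction of compared histograms that were reported as failures."""
    # In both summaries quoted here the four categories add up to the total.
    assert failures + nulls + successes + skipped == total
    return failures / total


first = dqm_failure_rate(
    total=4038163, failures=41, nulls=0, successes=4038102, skipped=20
)
second = dqm_failure_rate(
    total=4015624, failures=562507, nulls=506, successes=3452591, skipped=20
)

print(f"el8_amd64_gcc12:  {first:.4%} of histograms failed")
print(f"el10_amd64_gcc14: {second:.2%} of histograms failed")
```

The first report corresponds to a failure fraction of about 0.001%, the second to about 14%. The two runs differ in commit, architecture and baseline IB, so the ratio alone does not say which of those changes is responsible.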

@smuzaffar
Contributor

Merging this, as 15.1.0.pre3 and the FASTPU flavor have been uploaded.

@smuzaffar smuzaffar merged commit d684de6 into cms-sw:IB/CMSSW_15_1_X/master May 17, 2025
8 of 9 checks passed
@fwyzard fwyzard deleted the IB/CMSSW_15_1_X/master_cuda_12.9.0_for_ARM branch May 17, 2025 07:31

5 participants