@fwyzard
Contributor

@fwyzard fwyzard commented May 6, 2025

Drop support for the NVTX2 library.

Fix the cmake flags used for PyTorch and its extensions to properly use CUDA, and find the NVTX3 header-only library.

@fwyzard
Contributor Author

fwyzard commented May 6, 2025

please test

@cmsbuild
Contributor

cmsbuild commented May 6, 2025

A new Pull Request was created by @fwyzard for branch IB/CMSSW_15_1_X/master.

@iarspider, @smuzaffar can you please review it and eventually sign? Thanks.
@antoniovilela, @mandrenguyen, @rappoccio, @sextonkennedy you are the release manager for this.
cms-bot commands are listed here

@cmsbuild
Contributor

cmsbuild commented May 6, 2025

cms-bot internal usage

@fwyzard
Contributor Author

fwyzard commented May 6, 2025

enable gpu

@fwyzard
Contributor Author

fwyzard commented May 6, 2025

please test

@cmsbuild
Contributor

cmsbuild commented May 6, 2025

-1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-27775d/45884/summary.html
COMMIT: de90d7c
CMSSW: CMSSW_15_1_X_2025-05-06-1100/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/9833/45884/install.sh to create a dev area with all the needed externals and cmssw changes.

External Build

I found a compilation error when building:

USE_CUDA


CMake Generate step failed.  Build files cannot be regenerated correctly.
error: Bad exit status from /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/tmp/rpm-tmp.LVOdrQ (%build)


RPM build errors:
Bad exit status from /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/tmp/rpm-tmp.LVOdrQ (%build)



@cmsbuild
Contributor

cmsbuild commented May 7, 2025

Pull request #9833 was updated.

@fwyzard
Contributor Author

fwyzard commented May 7, 2025

please test

@fwyzard fwyzard changed the title from "Update CUDA to version 12.9.0" to "Update CUDA to version 12.9.0 and cuDNN to version 9.9.0" May 7, 2025
@fwyzard
Contributor Author

fwyzard commented May 7, 2025

please abort

@fwyzard fwyzard force-pushed the IB/CMSSW_15_1_X/master_cuda_12.9.0 branch from 676a036 to bb0902e May 7, 2025 11:46
@cmsbuild
Contributor

cmsbuild commented May 7, 2025

Pull request #9833 was updated.

@fwyzard
Contributor Author

fwyzard commented May 7, 2025

please test

@fwyzard
Contributor Author

fwyzard commented May 7, 2025

please test for el10_amd64_gcc14

@cmsbuild
Contributor

cmsbuild commented May 7, 2025

-1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-27775d/45920/summary.html
COMMIT: bb0902e
CMSSW: CMSSW_15_1_X_2025-05-06-1100/el10_amd64_gcc14
Additional Tests: CUDA,ROCM
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/9833/45920/install.sh to create a dev area with all the needed externals and cmssw changes.

External Build

I found a compilation error when building:

checking for LIBXML2... yes
checking for libxml/parser.h... no
checking for final LIBXML2 support... no
**** end of libxml2 configuration
configure: WARNING: --enable-libxml2 requested, but libxml2 was not found
configure: error: Cannot continue
error: Bad exit status from /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/tmp/rpm-tmp.0Zx50l (%build)

RPM build errors:
Bad exit status from /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/tmp/rpm-tmp.0Zx50l (%build)



@fwyzard
Contributor Author

fwyzard commented May 7, 2025

@smuzaffar @iarspider is it normal that el10_amd64_gcc14 fails to build?

@cmsbuild
Contributor

-1

Failed Tests: rocmUnitTests
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-27775d/46063/summary.html
COMMIT: e5a2173
CMSSW: CMSSW_15_1_X_2025-05-12-1100/el8_amd64_gcc12
Additional Tests: CUDA,ROCM
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/9833/46063/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-27775d/46063/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-27775d/46063/git-merge-result

ROCm Unit Tests

I found 2 errors in the following unit tests:

---> test testRocmSoALayoutAndView_t had ERRORS
---> test alpakaTestBufferROCmAsync had ERRORS

Comparison Summary

Summary:

  • You potentially added 5 lines to the logs
  • Reco comparison results: 4 differences found in the comparisons
  • DQMHistoTests: Total files compared: 50
  • DQMHistoTests: Total histograms compared: 4038163
  • DQMHistoTests: Total failures: 3
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 4038140
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 49 files compared)
  • Checked 215 log files, 184 edm output root files, 50 DQM output files
  • TriggerResults: no differences found
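
The DQMHistoTests counts in this summary can be cross-checked with a few lines of Python. This is only a sketch: it assumes that failures, nulls, successes and skipped are meant to partition the total number of compared histograms, which the bot output does not state explicitly. The variable names are illustrative and are not part of cms-bot.

```python
# Counts copied from the comparison summary above.
total_histograms = 4038163
outcomes = {
    "failures": 3,
    "nulls": 0,
    "successes": 4038140,
    "skipped": 20,
}

# Assumption: the four outcome categories partition the total.
accounted = sum(outcomes.values())
unaccounted = total_histograms - accounted

# Fraction of compared histograms that were reported as failures.
failure_fraction = outcomes["failures"] / total_histograms

print(f"accounted for: {accounted} of {total_histograms}")
print(f"unaccounted:   {unaccounted}")
print(f"failure fraction: {failure_fraction:.2e}")
```

With the numbers quoted here the four categories add up to the total exactly, so the 3 failures are the only histogram-level differences to look at.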

CUDA Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 1
  • DQMHistoTests: Total histograms compared: 0
  • DQMHistoTests: Total failures: 0
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 0
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0 KiB( 0 files compared)
  • Checked 0 log files, 0 edm output root files, 1 DQM output files

@fwyzard
Contributor Author

fwyzard commented May 13, 2025

ignore tests-rejected with ib-failure

@fwyzard
Contributor Author

fwyzard commented May 13, 2025

please test for el10_amd64_gcc14

@fwyzard
Contributor Author

fwyzard commented May 13, 2025

please test for el8_amd64_gcc14

@fwyzard
Contributor Author

fwyzard commented May 13, 2025

please test for el8_aarch64_gcc12

@fwyzard
Contributor Author

fwyzard commented May 13, 2025

OK, assuming the gcc 14 and arm tests pass, this should be ready to be merged.

iarspider added a commit to cms-sw/cms-bot that referenced this pull request May 13, 2025
@iarspider
Contributor

@fwyzard running many tests (= tests for many architectures) on a single commit SHA causes the bot to fail. Can you push a dummy commit (git commit --allow-empty) to get a new SHA and use it to run the tests for arm (the other architectures are fine, I think)?

@fwyzard
Contributor Author

fwyzard commented May 13, 2025

If I push a new commit it will reset the tests that were already done.
I'd rather open a separate PR for testing on ARM, with the identical content but a different hash.

@smuzaffar
Contributor

@fwyzard, each commit has a limit of 1000 statuses, and the latest commit here has already reached that limit. So for now the bot is not going to update statuses for currently running tests or start new tests. The only thing you can do is create a dummy commit (adding a comment somewhere :-) )
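
To illustrate why a new commit SHA unblocks the bot, here is a toy model of the per-commit status cap described in this comment. It is a sketch under stated assumptions, not the cms-bot or GitHub implementation: the `StatusLog` class, the two placeholder SHAs and the 1200 attempted updates are all hypothetical, and only the limit of 1000 comes from the comment itself.

```python
from collections import defaultdict

STATUS_LIMIT = 1000  # per-commit cap quoted in the comment above


class StatusLog:
    """Toy model: counts statuses per commit SHA and rejects any past the cap."""

    def __init__(self, limit=STATUS_LIMIT):
        self.limit = limit
        self.counts = defaultdict(int)

    def post(self, sha):
        """Record one status for `sha`; return False if the cap is already reached."""
        if self.counts[sha] >= self.limit:
            return False
        self.counts[sha] += 1
        return True


log = StatusLog()
old_sha = "aaaaaaa"  # hypothetical SHA that has accumulated many test statuses
new_sha = "bbbbbbb"  # hypothetical SHA of a dummy (empty) commit

# Many test runs on the same SHA: only the first 1000 updates are accepted.
accepted = sum(log.post(old_sha) for _ in range(1200))

old_still_blocked = not log.post(old_sha)  # further updates are rejected
new_accepted = log.post(new_sha)           # a fresh SHA starts from zero

print(accepted, old_still_blocked, new_accepted)
```

In this model the cap is tracked per SHA, which is why pushing a dummy commit, or opening a new PR with identical content but a different hash, lets status updates resume.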

@fwyzard
Contributor Author

fwyzard commented May 13, 2025

Replaced by #9856 to let the bot run tests again.

@fwyzard fwyzard closed this May 13, 2025
@fwyzard fwyzard deleted the IB/CMSSW_15_1_X/master_cuda_12.9.0 branch May 15, 2025 14:26