Conversation

@fwyzard
Contributor

@fwyzard fwyzard commented Jan 16, 2025

No description provided.

@fwyzard
Contributor Author

fwyzard commented Jan 16, 2025

enable gpu

@cmsbuild
Contributor

A new Pull Request was created by @fwyzard for branch IB/CMSSW_15_0_X/master.

@cmsbuild, @iarspider, @smuzaffar can you please review it and eventually sign? Thanks.
@antoniovilela, @mandrenguyen, @rappoccio, @sextonkennedy you are the release manager for this.
cms-bot commands are listed here

@cmsbuild
Contributor

cmsbuild commented Jan 16, 2025

cms-bot internal usage

@fwyzard
Contributor Author

fwyzard commented Jan 16, 2025

test parameters:

  • full_cmssw = true
  • enable_tests = gpu

@fwyzard
Contributor Author

fwyzard commented Jan 16, 2025

please test

@cmsbuild
Contributor

-1

Failed Tests: Build
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-8b6dff/43802/summary.html
COMMIT: 161e7b9
CMSSW: CMSSW_15_0_X_2025-01-16-2300/el8_amd64_gcc12
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/9619/43802/install.sh to create a dev area with all the needed externals and cmssw changes.

Build

I found a compilation error when building:

>> Package Fireworks/TableWidget built
Copying tmp/el8_amd64_gcc12/src/HeterogeneousCore/AlpakaInterface/test/alpakaCopyBufferToDeviceROCmAsync/libalpakaCopyBufferToDeviceROCmAsync_rocm.a to productstore area:
Copying tmp/el8_amd64_gcc12/src/HeterogeneousCore/AlpakaInterface/test/alpakaMoveToDeviceAsyncROCmAsync/libalpakaMoveToDeviceAsyncROCmAsync_rocm.a to productstore area:
cp: cannot stat 'tmp/el8_amd64_gcc12/src/HeterogeneousCore/AlpakaInterface/test/alpakaMoveToDeviceAsyncROCmAsync/libalpakaMoveToDeviceAsyncROCmAsync_rocm.a': No such file or directory
>> Deleted: tmp/el8_amd64_gcc12/src/HeterogeneousCore/AlpakaInterface/test/alpakaMoveToDeviceAsyncROCmAsync/libalpakaMoveToDeviceAsyncROCmAsync_rocm.a
gmake: *** [config/SCRAM/GMake/Makefile.rules:1884: tmp/el8_amd64_gcc12/src/HeterogeneousCore/AlpakaInterface/test/alpakaMoveToDeviceAsyncROCmAsync/libalpakaMoveToDeviceAsyncROCmAsync_rocm.a] Error 1
Copying tmp/el8_amd64_gcc12/src/HeterogeneousCore/AlpakaInterface/test/alpakaTestAtomicPairCounterROCmAsync/libalpakaTestAtomicPairCounterROCmAsync_rocm.a to productstore area:
cp: cannot stat 'tmp/el8_amd64_gcc12/src/HeterogeneousCore/AlpakaInterface/test/alpakaTestAtomicPairCounterROCmAsync/libalpakaTestAtomicPairCounterROCmAsync_rocm.a': No such file or directory
>> Deleted: tmp/el8_amd64_gcc12/src/HeterogeneousCore/AlpakaInterface/test/alpakaTestAtomicPairCounterROCmAsync/libalpakaTestAtomicPairCounterROCmAsync_rocm.a
gmake: *** [config/SCRAM/GMake/Makefile.rules:1884: tmp/el8_amd64_gcc12/src/HeterogeneousCore/AlpakaInterface/test/alpakaTestAtomicPairCounterROCmAsync/libalpakaTestAtomicPairCounterROCmAsync_rocm.a] Error 1
Copying tmp/el8_amd64_gcc12/src/HeterogeneousCore/AlpakaInterface/test/alpakaTestBufferROCmAsync/libalpakaTestBufferROCmAsync_rocm.a to productstore area:
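
A minimal sketch for triaging a log like the one above: it pulls the failed targets out of the `gmake: *** [...] Error N` lines and reduces each to its test directory name. The regular expression and the helper names (`failed_targets`, `failed_test_names`) are illustrative assumptions based only on the lines quoted here; they are not part of cms-bot or SCRAM.

```python
import re
from pathlib import PurePosixPath

# Excerpt copied from the build log quoted above.
LOG = """\
cp: cannot stat 'tmp/el8_amd64_gcc12/src/HeterogeneousCore/AlpakaInterface/test/alpakaMoveToDeviceAsyncROCmAsync/libalpakaMoveToDeviceAsyncROCmAsync_rocm.a': No such file or directory
gmake: *** [config/SCRAM/GMake/Makefile.rules:1884: tmp/el8_amd64_gcc12/src/HeterogeneousCore/AlpakaInterface/test/alpakaMoveToDeviceAsyncROCmAsync/libalpakaMoveToDeviceAsyncROCmAsync_rocm.a] Error 1
cp: cannot stat 'tmp/el8_amd64_gcc12/src/HeterogeneousCore/AlpakaInterface/test/alpakaTestAtomicPairCounterROCmAsync/libalpakaTestAtomicPairCounterROCmAsync_rocm.a': No such file or directory
gmake: *** [config/SCRAM/GMake/Makefile.rules:1884: tmp/el8_amd64_gcc12/src/HeterogeneousCore/AlpakaInterface/test/alpakaTestAtomicPairCounterROCmAsync/libalpakaTestAtomicPairCounterROCmAsync_rocm.a] Error 1
"""

# Assumed shape of a gmake error line: "gmake: *** [<makefile>:<line>: <target>] Error <status>"
GMAKE_ERROR = re.compile(
    r"^gmake: \*\*\* \[(?P<makefile>[^:\]]+):(?P<line>\d+): (?P<target>[^\]]+)\] "
    r"Error (?P<status>\d+)$"
)


def failed_targets(log: str) -> list[str]:
    """Return the target path of every gmake error line, in log order."""
    targets = []
    for line in log.splitlines():
        match = GMAKE_ERROR.match(line.strip())
        if match:
            targets.append(match.group("target"))
    return targets


def failed_test_names(log: str) -> list[str]:
    """Return the directory that contains each failed target (the test name here)."""
    return [PurePosixPath(target).parent.name for target in failed_targets(log)]


if __name__ == "__main__":
    for name in failed_test_names(LOG):
        print(name)
```

In this excerpt both failures are ROCm test libraries that could not be copied to the productstore area; the log does not show the underlying compile or link error.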


@fwyzard
Contributor Author

fwyzard commented Jan 17, 2025

please test with cms-sw/cmssw#47119

@fwyzard
Contributor Author

fwyzard commented Jan 17, 2025

please test with cms-sw/cmssw#47119, cms-sw/cmssw#47120

@cmsbuild
Contributor

-1

Failed Tests: GpuUnitTests
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-8b6dff/43807/summary.html
COMMIT: 161e7b9
CMSSW: CMSSW_15_0_X_2025-01-16-2300/el8_amd64_gcc12
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/9619/43807/install.sh to create a dev area with all the needed externals and cmssw changes.

GPU Unit Tests

I found 4 errors in the following unit tests:

---> test testCUDAService had ERRORS
---> test testCudaDeviceAdditionKernel had ERRORS
---> test testCudaDeviceAdditionWrapper had ERRORS
and more ...

Comparison Summary

Summary:

GPU Comparison Summary

Summary:

@fwyzard
Contributor Author

fwyzard commented Jan 17, 2025

The error in CUDAService is unrelated to this PR and is addressed by cms-sw/cmssw#47121.
The other errors are already present in the IBs, due to the -Wl,--as-needed option, and are being addressed separately.

@fwyzard
Contributor Author

fwyzard commented Jan 17, 2025

please test with cms-sw/cmssw#47119, cms-sw/cmssw#47120, cms-sw/cmssw#47121

@cmsbuild
Contributor

-1

Failed Tests: GpuUnitTests
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-8b6dff/43819/summary.html
COMMIT: 161e7b9
CMSSW: CMSSW_15_0_X_2025-01-16-2300/el8_amd64_gcc12
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/9619/43819/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-8b6dff/43819/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-8b6dff/43819/git-merge-result

GPU Unit Tests

I found 3 errors in the following unit tests:

---> test testCudaDeviceAdditionKernel had ERRORS
---> test testCudaDeviceAdditionWrapper had ERRORS
---> test testTorchSimpleDnnCUDA had ERRORS

Comparison Summary

Summary:

  • You potentially removed 120 lines from the logs
  • Reco comparison results: 38297 differences found in the comparisons
  • DQMHistoTests: Total files compared: 49
  • DQMHistoTests: Total histograms compared: 3819085
  • DQMHistoTests: Total failures: 95687
  • DQMHistoTests: Total nulls: 35
  • DQMHistoTests: Total successes: 3723343
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: -0.102 KiB (48 files compared)
  • DQMHistoSizes: changed ( 140.045,... ): -0.008 KiB JetMET/SUSYDQM
  • DQMHistoSizes: changed ( 141.042 ): 0.023 KiB JetMET/SUSYDQM
  • DQMHistoSizes: changed ( 145.301 ): 0.004 KiB JetMET/SUSYDQM
  • DQMHistoSizes: changed ( 145.408 ): 0.008 KiB JetMET/SUSYDQM
  • DQMHistoSizes: changed ( 145.5 ): -0.023 KiB JetMET/SUSYDQM
  • DQMHistoSizes: changed ( 145.604 ): -0.098 KiB JetMET/SUSYDQM
  • Checked 214 log files, 184 edm output root files, 49 DQM output files
  • TriggerResults: found differences in 4 / 47 workflows
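
As a quick sanity check on the DQMHistoTests figures above, the sketch below confirms that the per-category counts add up to the reported total and computes the share of failed histograms. It assumes the five categories (failures, nulls, successes, skipped, missing) partition the total, which the summary does not state explicitly; the variable names are illustrative.

```python
# Counts copied from the DQMHistoTests lines of the comparison summary.
counts = {
    "failures": 95687,
    "nulls": 35,
    "successes": 3723343,
    "skipped": 20,
    "missing": 0,
}
total_reported = 3819085  # "Total histograms compared"

# Assumption: the categories are mutually exclusive and cover every histogram.
total_from_categories = sum(counts.values())
consistent = total_from_categories == total_reported

# Share of compared histograms that were flagged as failures.
failure_fraction = counts["failures"] / total_reported

if __name__ == "__main__":
    print(f"categories sum to reported total: {consistent}")
    print(f"failure fraction: {failure_fraction:.2%}")
```

The counts are consistent with each other, and the failure share comes out at roughly 2.5% of the compared histograms.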

GPU Comparison Summary

Summary:

@fwyzard
Contributor Author

fwyzard commented Jan 17, 2025

ignore tests-rejected with ib-failure

@smuzaffar
Contributor

smuzaffar commented Jan 17, 2025

@fwyzard, should this be merged along with the cmssw PRs, or is it safe to integrate it without them?

edited: ah OK, there are build errors without the cmssw PRs.

@fwyzard
Contributor Author

fwyzard commented Jan 17, 2025

cms-sw/cmssw#47120 can go first.
Then cms-sw/cmssw#47119 and this PR need to go together.

Or all three can go together.

@mandrenguyen

cms-sw/cmssw#47120 is merged now.
@smuzaffar I'll let you sign for externals and then I'll merge this together with cms-sw/cmssw#47119.

@smuzaffar
Contributor

+externals

@cmsbuild
Contributor

This pull request is fully signed and it will be integrated in one of the next IB/CMSSW_15_0_X/master IBs (test failures were overridden). This pull request will now be reviewed by the release team before it's merged. @antoniovilela, @mandrenguyen, @rappoccio, @sextonkennedy (and backports should be raised in the release meeting by the corresponding L2)
Notice: This PR was tested with additional Pull Request(s), please also merge them if necessary: cms-sw/cmssw#47119

@mandrenguyen

+1

@cmsbuild cmsbuild merged commit 1c4a819 into cms-sw:IB/CMSSW_15_0_X/master Jan 21, 2025
13 of 14 checks passed
@fwyzard fwyzard deleted the IB/CMSSW_15_0_X/master_alpaka120 branch January 21, 2025 16:31