@fwyzard
Contributor

@fwyzard fwyzard commented Jun 26, 2025

This is primarily a bugfix release with a focus on SYCL changes.
It addresses several SYCL-specific issues, including fixes to the parallel loop patterns for the Intel FPGA SYCL backend, corrections to the SYCL index order, and updates to template arguments of SYCL buffer specializations.
Additionally, the release includes changes to SYCL attributes and protections for __SYCL_TARGET macros.
Overall, this release aims to improve stability and performance, particularly for SYCL-related functionality.

@fwyzard
Contributor Author

fwyzard commented Jun 26, 2025

enable gpu

@cmsbuild
Contributor

A new Pull Request was created by @fwyzard for branch IB/CMSSW_15_1_X/master.

@akritkbehera, @cmsbuild, @iarspider, @smuzaffar can you please review it and eventually sign? Thanks.
@antoniovilela, @mandrenguyen, @rappoccio, @sextonkennedy you are the release manager for this.
cms-bot commands are listed here

@cmsbuild
Contributor

cmsbuild commented Jun 26, 2025

cms-bot internal usage

@fwyzard
Contributor Author

fwyzard commented Jun 26, 2025

test parameters:

  • full = true
  • enable = gpu

@fwyzard
Contributor Author

fwyzard commented Jun 26, 2025

please test

@slava77
Contributor

slava77 commented Jun 26, 2025

I was curious to see the new features since 1.2.0.
It looks like https://github.com/cms-externals/alpaka/blob/branch-1.3.0/CHANGELOG.md is still old and there is no official release yet for 1.3.0
https://github.com/alpaka-group/alpaka/releases

Is this PR a pre-release test?

@fwyzard
Contributor Author

fwyzard commented Jun 26, 2025

Hi @slava77,
this release should come out tomorrow, but I will be away, so I prepared the PR in advance :-D
I can update our fork to match the official release once it's out; the only difference should be in the ChangeLog.

alpaka 1.3.0 is a bug-fix release; there are no new features.
You can see the changes since version 1.2.0 here.

alpaka 2.0.0 is the current feature branch, but I already tested that CMSSW does not build cleanly with it, so I will follow up next month.

@cmsbuild
Contributor

-1

Failed Tests: cudaUnitTests rocmUnitTests
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-0955d5/46932/summary.html
COMMIT: 514f909
CMSSW: CMSSW_15_1_X_2025-06-26-1100/el8_amd64_gcc12
Additional Tests: CUDA,ROCM
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/9944/46932/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-0955d5/46932/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-0955d5/46932/git-merge-result

CUDA Unit Tests

I found 1 errors in the following unit tests:

---> test testHeterogeneousCoreCUDACoreStreamEvent had ERRORS

ROCm Unit Tests

I found 3 errors in the following unit tests:

---> test testRocmSoALayoutAndView_t had ERRORS
---> test alpakaTestBufferROCmAsync had ERRORS
---> test alpakaTestRadixSortROCmAsync had ERRORS

Comparison Summary

Summary:

CUDA Comparison Summary

Summary:

ROCM Comparison Summary

Summary:

  • You potentially removed 24 lines from the logs
  • Reco comparison results: 41 differences found in the comparisons
  • DQMHistoTests: Total files compared: 7
  • DQMHistoTests: Total histograms compared: 53212
  • DQMHistoTests: Total failures: 8614
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 44598
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 6 files compared)
  • Checked 24 log files, 30 edm output root files, 7 DQM output files

@fwyzard
Contributor Author

fwyzard commented Jun 27, 2025

Mhm, the ROCm failures are known, but the CUDA one is unexpected.

@fwyzard fwyzard force-pushed the IB/CMSSW_15_1_X/master_alpaka_1_3_0 branch from 514f909 to cb6addf July 7, 2025 12:54
@fwyzard fwyzard force-pushed the IB/CMSSW_15_1_X/master_alpaka_1_3_0 branch from cb6addf to 06884ba July 7, 2025 12:55

@cmsbuild
Contributor

cmsbuild commented Jul 7, 2025

Pull request #9944 was updated.

@fwyzard fwyzard force-pushed the IB/CMSSW_15_1_X/master_alpaka_1_3_0 branch from 06884ba to 9add34a July 7, 2025 12:56

@cmsbuild
Contributor

cmsbuild commented Jul 7, 2025

Pull request #9944 was updated.

@fwyzard
Contributor Author

fwyzard commented Jul 7, 2025

Alpaka 1.3.0 has been officially released, so I've updated the PR to match.

@fwyzard
Contributor Author

fwyzard commented Jul 7, 2025

please test

@cmsbuild
Contributor

cmsbuild commented Jul 7, 2025

-1

Failed Tests: UnitTests rocmUnitTests
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-0955d5/47112/summary.html
COMMIT: 9add34a
CMSSW: CMSSW_15_1_X_2025-07-07-1100/el8_amd64_gcc12
Additional Tests: CUDA,ROCM
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/9944/47112/install.sh to create a dev area with all the needed externals and cmssw changes.

Unit Tests

I found 1 errors in the following unit tests:

---> test BUFU_TEST had ERRORS

ROCm Unit Tests

I found 0 errors in the following unit tests:


@fwyzard
Contributor Author

fwyzard commented Jul 7, 2025

Both failures seem pretty much unrelated to this PR.

@smuzaffar do you know if this failure

/scratch/cmsbuild/jenkins/workspace/ib-run-pr-unittests/CMSSW_15_1_X_2025-07-07-1100/external/el8_amd64_gcc12/bin/python3: error while loading shared libraries: libcrypt.so.1: cannot open shared object file: No such file or directory

means the NGT image or the CI image needs to be updated?

@smuzaffar
Contributor

please test

@cmsbuild
Contributor

cmsbuild commented Sep 2, 2025

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-0955d5/47937/summary.html
COMMIT: 9add34a
CMSSW: CMSSW_15_1_X_2025-09-01-2300/el8_amd64_gcc12
Additional Tests: GPU,AMD_MI300X,NVIDIA_T4
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/9944/47937/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-0955d5/47937/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-0955d5/47937/git-merge-result

@fwyzard
Contributor Author

fwyzard commented Sep 6, 2025

please test

To check the new GPU architectures, as well.

@fwyzard
Contributor Author

fwyzard commented Sep 6, 2025

@smuzaffar @iarspider @ftenchini @mandrenguyen as I will miss the release planning meeting next week, let me ask here: can we have this merged for CMSSW 15.1.x so that it is in the final release?
Thanks!

@cmsbuild
Contributor

cmsbuild commented Sep 7, 2025

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-0955d5/47996/summary.html
COMMIT: 9add34a
CMSSW: CMSSW_15_1_X_2025-09-06-1100/el8_amd64_gcc12
Additional Tests: GPU,AMD_MI300X,NVIDIA_H100,NVIDIA_L40S,NVIDIA_T4
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/9944/47996/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-0955d5/47996/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-0955d5/47996/git-merge-result

@iarspider iarspider changed the base branch from IB/CMSSW_15_1_X/master to IB/CMSSW_16_0_X/master September 11, 2025 09:19
@fwyzard
Contributor Author

fwyzard commented Sep 11, 2025

Can we merge this in 16.0.x?

@iarspider
Contributor

+externals

@cmsbuild
Contributor

This pull request is fully signed and it will be integrated in one of the next IB/CMSSW_16_0_X/master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @ftenchini, @mandrenguyen, @sextonkennedy (and backports should be raised in the release meeting by the corresponding L2)

@iarspider iarspider merged commit 354d393 into cms-sw:IB/CMSSW_16_0_X/master Sep 11, 2025
33 of 42 checks passed
@fwyzard fwyzard deleted the IB/CMSSW_15_1_X/master_alpaka_1_3_0 branch September 14, 2025 22:19