Add ROCm Support in OpenMPI, + removed deprecated `./configure` options by ghyls · Pull Request #10072 · cms-sw/cmsdist

ghyls · 2025-09-12T11:18:49Z

This is required to allow MPI to access memory in AMD GPUs, and in particular to perform RDMA to/from AMD GPUs.

Also removed the following configure options, which are deprecated in OpenMPI 5:

--without-psm (PSM has been removed and is no longer supported [1])
--without-mxm (MXM has been removed and replaced by UCX support [1])
--with-verbs=$RDMA_CORE_ROOT (verbs support is now provided through UCX [2])
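
As an illustration of the cleanup listed above, here is a minimal sketch that filters these three options out of a configure argument list. It assumes the arguments are available as a Python list, which is not how openmpi.spec works (the PR edits the spec file directly); the function names are hypothetical. Matching is exact or on `option=value`, so that `--without-psm2`, which still appears in the build log later in this thread, is not dropped by accident.

```python
# Options named in this PR description as deprecated in OpenMPI 5.
DEPRECATED_OPTIONS = ("--without-psm", "--without-mxm", "--with-verbs")


def is_deprecated(arg: str) -> bool:
    """True if arg is one of the options above, bare or in 'option=value' form."""
    return any(arg == opt or arg.startswith(opt + "=") for opt in DEPRECATED_OPTIONS)


def drop_deprecated(args: list[str]) -> list[str]:
    """Return the argument list without the deprecated options, order preserved."""
    return [arg for arg in args if not is_deprecated(arg)]
```

A prefix-only match would be simpler, but it would also remove `--without-psm2`, which is a different option.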

cmsbuild · 2025-09-12T11:19:11Z

A new Pull Request was created by @ghyls for branch IB/CMSSW_16_0_X/master.

@akritkbehera, @cmsbuild, @iarspider, @smuzaffar can you please review it and eventually sign? Thanks.
@ftenchini, @mandrenguyen, @sextonkennedy you are the release manager for this.
cms-bot commands are listed here

cmsbuild · 2025-09-12T11:19:12Z

cms-bot internal usage

fwyzard · 2025-09-12T12:03:30Z

enable gpu

fwyzard · 2025-09-12T12:03:35Z

please test

makortel · 2025-09-12T14:24:31Z

How big is the ROCm distribution (that we include) nowadays? Can you tell how much of it gets used by the libraries that OpenMPI ends up depending on?

I'm mildly concerned (to the extent that I'd like to at least understand it) about the impact on the size of the set of libraries used by production jobs. I see that the following components depend on OpenMPI:

boost
- I guess the use of MPI is limited to boost_mpi tool, which is not used in CMSSW, but I didn't try to understand if any externals would make use of MPI through boost
hdf5
- used by rivet, yoda, and highfive; used further by herwig7 and professor2
  - used by `GeneratorInterface/{LHEInterface,RivetInterface,Herwig7Interface}`
pytorch
- not used in production yet
sherpa
- removal of the use of MPI is being discussed in "Sherpa related workflows get stuck due to a problem with opening an openmpi session" (cmssw#45165)

I now wonder why hdf5 and pytorch depend on OpenMPI and if that could be turned off, but we can move that discussion into a separate issue.

smuzaffar · 2025-09-12T14:50:41Z

@makortel, the ROCm distribution is currently around 3 GB.
In https://github.com/cms-sw/cmsdist/pull/10056/files#diff-6c5a60ebe90df4a0319bb73aa4561e90267da6781d8efc1315b3cbc0c929078f @iarspider found that we need to distribute more ROCm libraries for PyTorch with ROCm. I have not yet checked whether we really need those extra ROCm libraries.

makortel · 2025-09-12T15:10:17Z

Thanks @smuzaffar. 3 GB doesn't sound "too bad" (I was remembering O(10 GB)), although it is not small either.

smuzaffar · 2025-09-12T15:19:52Z

@makortel, yes, it was between 20 and 30 GB (with the full distribution), and then @fwyzard trimmed it down to 3 GB to distribute only the minimum.

fwyzard · 2025-09-12T15:20:05Z

> Thanks @smuzaffar. 3 GB doesn't sound "too bad" (I was remembering O(10 GB)), although it is not small either.

We've been adding and removing pieces of ROCm to try and keep the size under control - but if we include the ML libraries needed by PyTorch, it will likely blow up again.

One option that I plan to start experimenting with in the coming weeks is to build ROCm from sources (via https://github.com/ROCm/TheRock) to limit the support to only the GPU architectures that we have a use case for: MI250X (Lumi), MI300X (NGT), Radeon Pro W7800/W7900 (NGT), maybe MI300A if CMS has an allocation on El Capitan.

No idea if and how much it will help, though.

makortel · 2025-09-12T16:53:20Z

> MI300A if CMS has an allocation on El Capitan.

My feeling is that would be unlikely.

cmsbuild · 2025-09-12T22:17:46Z

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-a49e38/48087/summary.html
COMMIT: 03af1c4
CMSSW: CMSSW_16_0_X_2025-09-12-1100/el8_amd64_gcc12
Additional Tests: GPU,AMD_MI300X,NVIDIA_H100,NVIDIA_L40S,NVIDIA_T4
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/10072/48087/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

@cms-sw openmpi: Set needed env to avoid openmpi calling setenv at runtime #10058

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-a49e38/48087/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-a49e38/48087/git-merge-result

Comparison Summary

Summary:

You potentially added 11 lines to the logs
Reco comparison results: 0 differences found in the comparisons
DQMHistoTests: Total files compared: 50
DQMHistoTests: Total histograms compared: 4113751
DQMHistoTests: Total failures: 26
DQMHistoTests: Total nulls: 0
DQMHistoTests: Total successes: 4113705
DQMHistoTests: Total skipped: 20
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 49 files compared)
Checked 215 log files, 184 edm output root files, 50 DQM output files
TriggerResults: no differences found

AMD_MI300X Comparison Summary

Summary:

You potentially added 2 lines to the logs
Reco comparison results: 0 differences found in the comparisons
DQMHistoTests: Total files compared: 7
DQMHistoTests: Total histograms compared: 53486
DQMHistoTests: Total failures: 7411
DQMHistoTests: Total nulls: 0
DQMHistoTests: Total successes: 46075
DQMHistoTests: Total skipped: 0
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 6 files compared)
Checked 24 log files, 30 edm output root files, 7 DQM output files
TriggerResults: no differences found

NVIDIA_H100 Comparison Summary

Summary:

You potentially added 1 lines to the logs
Reco comparison results: 0 differences found in the comparisons
DQMHistoTests: Total files compared: 7
DQMHistoTests: Total histograms compared: 53486
DQMHistoTests: Total failures: 7843
DQMHistoTests: Total nulls: 0
DQMHistoTests: Total successes: 45643
DQMHistoTests: Total skipped: 0
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 6 files compared)
Checked 24 log files, 30 edm output root files, 7 DQM output files
TriggerResults: no differences found

NVIDIA_L40S Comparison Summary

Summary:

No significant changes to the logs found
Reco comparison results: 0 differences found in the comparisons
DQMHistoTests: Total files compared: 7
DQMHistoTests: Total histograms compared: 53486
DQMHistoTests: Total failures: 7089
DQMHistoTests: Total nulls: 0
DQMHistoTests: Total successes: 46397
DQMHistoTests: Total skipped: 0
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 6 files compared)
Checked 24 log files, 30 edm output root files, 7 DQM output files
TriggerResults: no differences found

NVIDIA_T4 Comparison Summary

Summary:

You potentially added 1 lines to the logs
Reco comparison results: 0 differences found in the comparisons
DQMHistoTests: Total files compared: 7
DQMHistoTests: Total histograms compared: 53486
DQMHistoTests: Total failures: 7684
DQMHistoTests: Total nulls: 0
DQMHistoTests: Total successes: 45802
DQMHistoTests: Total skipped: 0
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 6 files compared)
Checked 24 log files, 30 edm output root files, 7 DQM output files
TriggerResults: no differences found
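
The DQMHistoTests counters in the summaries above can be cross-checked mechanically. The sketch below parses the `DQMHistoTests: Total ...` lines and tests whether failures, nulls, successes and skipped add up to the number of histograms compared. That relation holds for every summary quoted in this thread, but it is an observation from these numbers, not a documented invariant of the comparison tool; the function names are hypothetical.

```python
import re


def parse_dqm_counters(summary: str) -> dict[str, int]:
    """Collect 'DQMHistoTests: Total <name>: <number>' lines into a dict."""
    counters = {}
    for line in summary.splitlines():
        match = re.match(r"\s*DQMHistoTests: Total (.+?): (\d+)\s*$", line)
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


def counters_add_up(counters: dict[str, int]) -> bool:
    """Check failures + nulls + successes + skipped == histograms compared."""
    parts = ("failures", "nulls", "successes", "skipped")
    return sum(counters[name] for name in parts) == counters["histograms compared"]


def failure_fraction(counters: dict[str, int]) -> float:
    """Fraction of compared histograms that were reported as failures."""
    return counters["failures"] / counters["histograms compared"]
```

For the AMD_MI300X summary this gives 7411 failures out of 53486 histograms compared.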

mandrenguyen · 2025-09-13T14:43:37Z

+1

smuzaffar · 2025-09-14T20:07:50Z

openmpi.spec

@@ -33,13 +34,11 @@ AUTOMAKE_JOBS=%{compiling_processes} ./autogen.pl
  --disable-mpi-java \
  --with-zlib=$ZLIB_ROOT \
  %{!?without_cuda:--with-cuda=$CUDA_ROOT} \


@ghyls, it looks like our OpenMPI is not built with CUDA [a]. I guess that, since we build on a host without CUDA installed on the system, configure could not find libcuda.so. I suggest changing

%{!?without_cuda:--with-cuda=$CUDA_ROOT} \

to

%{!?without_cuda:--with-cuda=$CUDA_ROOT --with-cuda-libdir=$CUDA_ROOT/lib64/stubs}

This way, configure can pick up libcuda.so from the stubs directory during the build.
@fwyzard , do you have any better suggestion?

[a]

checking for MCA component accelerator:cuda compile mode... dso
checking if --with-cuda is set... found (/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/el8_amd64_gcc12/external/cuda/12.9.1-ed601c3aacdd4f0b0abc31ff95aeff6e/include/cuda.h)
checking for cuda pkg-config name... /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/el8_amd64_gcc12/external/cuda/12.9.1-ed601c3aacdd4f0b0abc31ff95aeff6e/lib/pkgconfig/cuda.pc
checking if cuda pkg-config module exists... no
checking for cuda header at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/el8_amd64_gcc12/external/cuda/12.9.1-ed601c3aacdd4f0b0abc31ff95aeff6e/include... found
checking for cuda library (cuda) in /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/el8_amd64_gcc12/external/cuda/12.9.1-ed601c3aacdd4f0b0abc31ff95aeff6e... not found
checking whether CU_MEM_LOCATION_TYPE_HOST_NUMA is declared... yes
checking if have cuda support... no
checking if MCA component accelerator:cuda can compile... no

fwyzard · 2025-09-14

If using the stubs works, that could be a good workaround.

Another possibility could be --with-cuda-libdir=$CUDA_ROOT/drivers to pick up the compatibility drivers.

Hopefully neither will hard-code the paths in the binary 🤷🏻

@ghyls do you have access to a machine without CUDA, to check if building with either the stub library or the compatibility library works ?
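
To see which of the two directories discussed above actually provides a `libcuda.so` on a given build host, a small check along these lines may help. It is only a sketch: the two subdirectories are the ones proposed in this thread (`lib64/stubs` and `drivers`), the function name is hypothetical, and it assumes the linker needs a file named exactly `libcuda.so` (a directory containing only a versioned `libcuda.so.1` would be reported as not usable).

```python
from pathlib import Path
from typing import Optional

# Candidate locations proposed in the thread, relative to $CUDA_ROOT.
CANDIDATE_SUBDIRS = ("lib64/stubs", "drivers")


def find_libcuda_dir(cuda_root: str, subdirs=CANDIDATE_SUBDIRS) -> Optional[Path]:
    """Return the first candidate directory that contains libcuda.so, or None."""
    root = Path(cuda_root)
    for sub in subdirs:
        candidate = root / sub
        if (candidate / "libcuda.so").exists():
            return candidate
    return None
```

The returned directory is what one would pass to `--with-cuda-libdir`. Whether the resulting binary hard-codes that path still has to be checked separately.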

smuzaffar · 2025-09-14T20:31:56Z

@ghyls, can you please also add the following to the %post section, so that these files do not contain build paths [a]:

%{relocateConfig}/share/pmix/pmixcc-wrapper-data.txt
%{relocateConfig}/include/pmix/src/include/pmix_config.h

[a]

include/pmix/src/include/pmix_config.h:#define PMIX_CONFIGURE_CLI " \'--disable-option-checking\' \'--prefix=/build/muz/d/w/tmp/BUILDROOT/35bea6272e2cbe86c6340f0b67a6dd4f/opt/cmssw/el8_amd64_gcc12/external/openmpi/5.0.8-35bea6272e2cbe86c6340f0b67a6dd4f\' \'--without-tests-examples\'  ......

share/pmix/pmixcc-wrapper-data.txt:preprocessor_flags=-I${includedir} -I${includedir}/pmix  -I/build/muz/d/w/el8_amd64_gcc12/external/hwloc/2.12.2-412235d09f67c300aa559077883316ba/include
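
A quick way to spot such leftovers in an installed file is to search its text for build-area prefixes. The sketch below is a heuristic, not part of the cmsdist tooling: the two prefixes (`/build/` and `/data/cmsbld/`) are simply the ones visible in the logs in this thread, and the function name is hypothetical. It flags any occurrence of those prefixes, so a path that merely contains a `/build/` component would also be reported.

```python
import re

# Build-area prefixes seen in this thread; adjust for other build hosts.
BUILD_PATH_RE = re.compile(r"(?:/build/|/data/cmsbld/)[^\s'\"]*")


def find_build_paths(text: str) -> list[str]:
    """Return every substring of text that looks like a build-area path."""
    return BUILD_PATH_RE.findall(text)
```

Running it over the contents of the two files named above before and after relocation shows whether the %post step removed the build paths.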

cmsbuild · 2025-09-15T11:48:11Z

REMINDER @mandrenguyen, @sextonkennedy, @ftenchini: This PR was tested with cms-sw/cms-bot#2567, please check if they should be merged together

cmsbuild · 2025-10-15T08:39:21Z

-1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-a49e38/48689/summary.html
COMMIT: 7d49d3d
CMSSW: CMSSW_16_0_X_2025-10-14-2300/el8_amd64_gcc13
Additional Tests: GPU,AMD_MI300X,AMD_W7900,NVIDIA_H100,NVIDIA_L40S,NVIDIA_T4
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/10072/48689/install.sh to create a dev area with all the needed externals and cmssw changes.

External Build

I found compilation error when building:

Lustre: no (not found)
PVFS2/OrangeFS: no

+ --with-rocm=/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/el8_amd64_gcc13/external/rocm/6.4.3-8bc52e5de186aa7fa61c7d17f290f0df --with-hwloc=/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/el8_amd64_gcc13/external/hwloc/2.12.2-0e4be55b06015a70e96883ca65eb3e61 --with-ofi=/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/el8_amd64_gcc13/external/libfabric/2.1.0-8de1033f0b20ec964002c4f73fa267b0 --without-portals4 --without-psm2 --with-ucx=/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/el8_amd64_gcc13/external/ucx/1.19.0-68aa36405ac1a03bc7eb47fd8708d9a7 --with-cma --without-knem --with-xpmem=/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/el8_amd64_gcc13/external/xpmem/v2.6.3-20220308-9b40da6112cf24c0bcdb5df4d025e6d1 --with-pic --disable-io-romio --with-gnu-ld --with-pmix=internal
/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/tmp/rpm-tmp.GdilVg: line 61: --with-rocm=/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/el8_amd64_gcc13/external/rocm/6.4.3-8bc52e5de186aa7fa61c7d17f290f0df: No such file or directory
error: Bad exit status from /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/tmp/rpm-tmp.GdilVg (%prep)

RPM build warnings:
Macro expanded in comment on line 406: %{pkginstroot}

Macro expanded in comment on line 407: %{pkginstroot}

smuzaffar · 2025-10-15T08:43:00Z

openmpi.spec

  --disable-mpi-java \
  --with-zlib=$ZLIB_ROOT \
-  %{!?without_cuda:--with-cuda=$CUDA_ROOT} \
+  %{!?without_cuda:--with-cuda=$CUDA_ROOT --with-cuda-libdir=$CUDA_ROOT/lib64/stubs}


@ghyls , please add \ at the end here
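
The failure in the External Build log above (`--with-rocm=...: No such file or directory`) is what a missing line continuation produces: the shell ends the configure command at the line without the backslash and then tries to run the next option as a command. A minimal sketch of a check for this, assuming the configure invocation is available as a list of lines and that option lines start with `--` or `%{` (the function name is hypothetical):

```python
def missing_continuations(lines: list[str]) -> list[int]:
    """Return 0-based indices of lines that lack a trailing backslash
    although the following line is another option of the same command."""
    problems = []
    for i in range(len(lines) - 1):
        current = lines[i].rstrip()
        following = lines[i + 1].lstrip()
        next_is_option = following.startswith("--") or following.startswith("%{")
        if current and next_is_option and not current.endswith("\\"):
            problems.append(i)
    return problems
```

Applied to the lines of the diff above followed by a `--with-rocm=...` line, it reports the `%{!?without_cuda:...}` line.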

cmsbuild · 2025-10-15T08:48:54Z

Pull request #10072 was updated.

smuzaffar · 2025-10-15T08:51:55Z

please test

cmsbuild · 2025-10-18T08:51:32Z

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-a49e38/48690/summary.html
COMMIT: dbe497f
CMSSW: CMSSW_16_0_X_2025-10-14-2300/el8_amd64_gcc13
Additional Tests: GPU,AMD_MI300X,AMD_W7900,NVIDIA_H100,NVIDIA_L40S,NVIDIA_T4
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/10072/48690/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

You potentially added 4 lines to the logs
Reco comparison results: 8 differences found in the comparisons
DQMHistoTests: Total files compared: 51
DQMHistoTests: Total histograms compared: 3940073
DQMHistoTests: Total failures: 33
DQMHistoTests: Total nulls: 0
DQMHistoTests: Total successes: 3940020
DQMHistoTests: Total skipped: 20
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 50 files compared)
Checked 218 log files, 188 edm output root files, 51 DQM output files
TriggerResults: no differences found

AMD_W7900 Comparison Summary

Summary:

You potentially added 12 lines to the logs
Reco comparison results: 237 differences found in the comparisons
DQMHistoTests: Total files compared: 11
DQMHistoTests: Total histograms compared: 147869
DQMHistoTests: Total failures: 34907
DQMHistoTests: Total nulls: 11
DQMHistoTests: Total successes: 112951
DQMHistoTests: Total skipped: 0
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 10 files compared)
Checked 42 log files, 45 edm output root files, 11 DQM output files
TriggerResults: found differences in 1 / 10 workflows

NVIDIA_H100 Comparison Summary

Summary:

You potentially removed 15 lines from the logs
Reco comparison results: 214 differences found in the comparisons
DQMHistoTests: Total files compared: 11
DQMHistoTests: Total histograms compared: 147869
DQMHistoTests: Total failures: 32127
DQMHistoTests: Total nulls: 9
DQMHistoTests: Total successes: 115733
DQMHistoTests: Total skipped: 0
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 10 files compared)
Checked 42 log files, 45 edm output root files, 11 DQM output files
TriggerResults: found differences in 1 / 10 workflows

NVIDIA_L40S Comparison Summary

Summary:

You potentially added 28 lines to the logs
Reco comparison results: 255 differences found in the comparisons
DQMHistoTests: Total files compared: 11
DQMHistoTests: Total histograms compared: 147869
DQMHistoTests: Total failures: 26932
DQMHistoTests: Total nulls: 9
DQMHistoTests: Total successes: 120928
DQMHistoTests: Total skipped: 0
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 10 files compared)
Checked 42 log files, 45 edm output root files, 11 DQM output files
TriggerResults: found differences in 1 / 10 workflows

NVIDIA_T4 Comparison Summary

Summary:

You potentially removed 3 lines from the logs
Reco comparison results: 265 differences found in the comparisons
DQMHistoTests: Total files compared: 11
DQMHistoTests: Total histograms compared: 147869
DQMHistoTests: Total failures: 28752
DQMHistoTests: Total nulls: 8
DQMHistoTests: Total successes: 119109
DQMHistoTests: Total skipped: 0
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 10 files compared)
Checked 42 log files, 45 edm output root files, 11 DQM output files
TriggerResults: no differences found

smuzaffar · 2025-10-20T13:06:45Z

please test for el8_aarch64_gcc13

smuzaffar · 2025-10-20T13:12:00Z

With the latest changes, OpenMPI now has both CUDA and ROCm enabled:

checking if have cuda support... yes (-I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/el8_amd64_gcc13/external/cuda/12.9.1-cff83d5f72da96ebfea8cafd87a05296/include)
checking if MCA component accelerator:cuda can compile... yes
.....
checking for hip/hip_runtime.h... yes
checking for hipFree... yes
checking if rocm requires libnl v1 or v3... none
checking if MCA component accelerator:rocm can compile... yes
...

Accelerators
-----------------------
CUDA support: yes
ROCm support: yes
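
The two `... support: yes` lines of the configure summary above can be checked automatically, for example in a build test. This is a sketch only: it assumes the summary text has been captured as a string and that the lines have the `<name> support: yes|no` form shown here; the function name is hypothetical.

```python
import re


def accelerator_support(summary: str) -> dict[str, bool]:
    """Map each '<name> support: yes|no' line to True or False."""
    support = {}
    for line in summary.splitlines():
        match = re.match(r"\s*(\w+) support:\s*(yes|no)\b", line)
        if match:
            support[match.group(1)] = match.group(2) == "yes"
    return support
```

Lines such as `checking if have cuda support... yes` are ignored, because the pattern requires the line to start with a single word followed by `support:`.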

cmsbuild · 2025-10-20T20:47:43Z

-1

Failed Tests: RelVals
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-a49e38/48729/summary.html
COMMIT: dbe497f
CMSSW: CMSSW_16_0_X_2025-10-19-2300/el8_aarch64_gcc13
Additional Tests: GPU,AMD_W7900,NVIDIA_H100,NVIDIA_L40S,NVIDIA_T4
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/10072/48729/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

@fwyzard Include cmake files in Boost installation #10126

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-a49e38/48729/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-a49e38/48729/git-merge-result

RelVals

17034.0: A fatal system signal has occurred: segmentation violation

makortel · 2025-10-20T21:01:44Z

> 17034.0: A fatal system signal has occurred: segmentation violation

The stack trace looks worrisome

[arm-cmsbuild002:1810326:0:1810326] Caught signal 7 (Bus error: invalid address alignment)

Thread 1 (Thread 0x400018bf63f0 (LWP 1810326) "cmsRun"):
#0  0x0000400019b11a64 in poll () from /lib64/libc.so.6
#1  0x0000400022178f24 in edm::service::InitRootHandlers::stacktraceFromThread() () from /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/10072/48729/CMSSW_16_0_X_2025-10-19-2300/lib/el8_aarch64_gcc13/pluginFWCoreServicesPlugins.so
#2  0x0000400022179154 in sig_dostack_then_abort () from /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/10072/48729/CMSSW_16_0_X_2025-10-19-2300/lib/el8_aarch64_gcc13/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x00004000197118c4 in aarch64_fallback_frame_state (context=0x4000ebf50c20, fs=0x4000ebf50fe0) at ./md-unwind-support.h:74
#5  uw_frame_state_for (context=context@entry=0x4000ebf50c20, fs=fs@entry=0x4000ebf50fe0) at ../../../libgcc/unwind-dw2.c:1013
#6  0x0000400019713188 in _Unwind_Backtrace (trace=0x400019b26fa8 <backtrace_helper>, trace_argument=0x4000ebf513e0) at ../../../libgcc/unwind.inc:303
#7  0x0000400019b27174 in backtrace () from /lib64/libc.so.6
#8  0x00004000eaba1d98 in ucs_debug_backtrace_create (strip=2, bckt=0x4000ebf51470) at debug/debug.c:600
#9  ucs_debug_backtrace_create (bckt=0x4000ebf51470, strip=2) at debug/debug.c:589
#10 0x00004000eaba2028 in ucs_debug_print_backtrace (stream=0x400019bb83f8 <_IO_2_1_stderr_>, strip=strip@entry=2) at debug/debug.c:659
#11 0x00004000eaba4070 in ucs_handle_error (message=0x4000eabc43a0 "invalid address alignment") at debug/debug.c:1092
#12 0x00004000eaba41c8 in ucs_debug_handle_error_signal (signo=signo@entry=7, cause=0x4000eabc43a0 "invalid address alignment", fmt=fmt@entry=0x4000eabc1570 "") at debug/debug.c:1044
#13 0x00004000eaba4528 in ucs_error_signal_handler (signo=7, info=0x4000ebf517a0, context=<optimized out>) at debug/debug.c:1060
#14 <signal handler called>
#15 0x000000007dd9d76d in ?? ()
#16 0x0000400029c2f158 in ?? ()
#17 0x0000fffff5368c70 in ?? ()

Current Modules:
Module: NoBPTXMonitor:hltNoBPTXL2Mu40Monitoring (crashed)

(also connects to cms-sw/cmssw#48940)

fwyzard · 2025-10-21T05:03:20Z

Is it related to these changes ?
There is no ROCm on ARM 🤷🏻‍♂️

fwyzard · 2025-10-21T05:34:46Z

For what it's worth, the workflow passed when I ran it locally on lxplus-arm:

[2025-10-21 07:08:11] fwyzard@lxplus9102:/tmp/fwyzard/CMSSW_16_0_X_2025-10-19-2300/run$ runTheMatrix.py -l 17034.0
...
17034.0_TTbar_14TeV+2025PU Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED  - time date Tue Oct 21 07:28:12 2025-date Tue Oct 21 07:09:01 2025; exit: 0 0 0 0
1 1 1 1 tests passed, 0 0 0 0 failed

[2025-10-21 07:28:15] fwyzard@lxplus9102:/tmp/fwyzard/CMSSW_16_0_X_2025-10-19-2300/run$ uname -a
Linux lxplus9102.cern.ch 5.14.0-570.46.1.el9_6.aarch64 #1 SMP PREEMPT_DYNAMIC Tue Sep 16 08:29:52 UTC 2025 aarch64 aarch64 aarch64 GNU/Linux

fwyzard · 2025-10-21T06:57:47Z

please test for el8_aarch64_gcc13

cmsbuild · 2025-10-21T10:58:23Z

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-a49e38/48747/summary.html
COMMIT: dbe497f
CMSSW: CMSSW_16_0_X_2025-10-20-2300/el8_aarch64_gcc13
Additional Tests: GPU,AMD_W7900,NVIDIA_H100,NVIDIA_L40S,NVIDIA_T4
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/10072/48747/install.sh to create a dev area with all the needed externals and cmssw changes.

makortel · 2025-10-21T18:27:17Z

> Is it related to these changes ?

Hard to say, because there is very little useful information about the problem itself, but I'd guess probably not. That said, the build options appear to have changed by more than just the optional ROCm flags (I don't know what impact those changes have).

makortel · 2025-10-21T18:29:08Z

Actually, on further thought, the stack trace in #10072 (comment) is probably "along expectations" given cms-sw/cmssw#48940.

smuzaffar · 2025-10-21T19:53:30Z

+extrnals

cmsbuild added tests-pending externals-pending pending-signatures orp-pending labels Sep 12, 2025

cmsbuild added tests-started and removed tests-pending labels Sep 12, 2025

cmsbuild added tests-approved and removed tests-started labels Sep 12, 2025

cmsbuild added orp-approved and removed orp-pending labels Sep 13, 2025

smuzaffar reviewed Sep 14, 2025

smuzaffar mentioned this pull request Sep 15, 2025

Fail the run-pr-external_checks if it can not find cmssw-tool-conf cms-sw/cms-bot#2567

Merged

smuzaffar mentioned this pull request Sep 15, 2025

Testing bot if it can not find cmssw-tool-conf cms-sw/cms-bot#2568

Closed

cmsbuild added tests-pending and removed tests-approved orp-approved labels Oct 13, 2025

cmsbuild added the tests-started label Oct 15, 2025

cmsbuild added tests-rejected and removed tests-started labels Oct 15, 2025

smuzaffar reviewed Oct 15, 2025

Added rocm support in openmpi, removed configure options deprecated in openmpi 5.x (dbe497f)

ghyls force-pushed the devel-ompi-rocm branch from 7d49d3d to dbe497f Compare October 15, 2025 08:48

cmsbuild added tests-pending and removed tests-rejected labels Oct 15, 2025

cmsbuild added tests-started and removed tests-pending labels Oct 15, 2025

cmsbuild added tests-approved and removed tests-started labels Oct 18, 2025

makortel mentioned this pull request Oct 21, 2025

libucs signal handler cms-sw/cmssw#48940

Closed

smuzaffar merged commit f9eceec into cms-sw:IB/CMSSW_16_0_X/master Oct 21, 2025
28 checks passed

cmsbuild mentioned this pull request Oct 22, 2025

Remove error/warning flag for missing braces #10141

Merged
