Update CUDA to 12.6.3 and cuDNN to 9.6.0 #9620
Drop support for Power, which is no longer supported in CUDA. Package nvidia-smi along with the compatibility drivers.
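The PR packages nvidia-smi alongside the compatibility drivers. The sketch below shows one way a job script could use it. It is an illustration only and is not part of this PR. It parses the output of `nvidia-smi --query-gpu=driver_version --format=csv,noheader`, which prints one line per GPU, and checks each driver version against a minimum. The helper names and the `MIN_DRIVER` value are hypothetical placeholders. The real minimum driver for a given CUDA release should be taken from NVIDIA's release notes.

```python
import subprocess
from typing import List, Tuple

# Hypothetical placeholder threshold; check NVIDIA's CUDA release notes
# for the actual minimum driver required by a given toolkit.
MIN_DRIVER = "525.60.13"


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted driver version such as '550.54.15' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def parse_driver_versions(csv_output: str) -> List[Tuple[int, ...]]:
    """Parse 'nvidia-smi --query-gpu=driver_version --format=csv,noheader'
    output (one driver version per line, one line per GPU)."""
    return [parse_version(line) for line in csv_output.splitlines() if line.strip()]


def driver_is_recent_enough(csv_output: str, minimum: str = MIN_DRIVER) -> bool:
    """True only if at least one GPU is listed and every GPU reports a
    driver at least as new as `minimum` (tuples compare element by element)."""
    versions = parse_driver_versions(csv_output)
    return bool(versions) and all(v >= parse_version(minimum) for v in versions)


def query_nvidia_smi() -> str:
    """Run nvidia-smi (assumed to be on PATH) and return its raw output."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

On a machine with a GPU, `driver_is_recent_enough(query_nvidia_smi())` would combine the two steps. The parsing is kept separate from the subprocess call so that it can be checked without a GPU.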
|
A new Pull Request was created by @fwyzard for branch IB/CMSSW_15_0_X/master. @cmsbuild, @iarspider, @smuzaffar can you please review it and eventually sign? Thanks. |
|
please test |
|
-1 Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-de0284/43803/summary.html External Build: I found a compilation warning when building. See details on the summary page. |
|
please test |
|
onnxruntime failed. It looks like we need to update onnxruntime: microsoft/onnxruntime#20953 |
|
Ah, thanks. |
|
I am updating https://github.com/cms-externals/onnxruntime to the latest onnxruntime tag, v1.20.1. |
|
please test with #9625 |
|
We also need to update cuDNN; I'll push it here and restart the tests. |
|
Pull request #9620 was updated. |
|
please test with #9625 |
|
please test for el9_amd64_gcc13 |
|
-1 Failed Tests: GpuUnitTests
The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:
You can see more details here:
GPU Unit Tests: I found 2 errors in the following unit tests:
---> test testCudaDeviceAdditionKernel had ERRORS
---> test testCudaDeviceAdditionWrapper had ERRORS
|
|
-1 Failed Tests: Build
The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:
You can see more details here:
Build: I found a compilation error when building src/Geometry/HGCalCommonData/src/HGCalWaferType.cc and linking the shared library tmp/el9_amd64_gcc13/src/Geometry/HGCalCommonData/src/GeometryHGCalCommonData/libGeometryHGCalCommonData.so with gcc 13.2.0 (el9_amd64_gcc13, CMSSW_15_0_X_2025-01-17-2300, flags including -O3 -std=c++20 -march=x86-64-v3 -flto=auto -Werror=array-bounds):
In member function 'operator[]',
inlined from '__ct_base ' at src/Geometry/HGCalCommonData/src/HGCalCellOffset.cc:601:23:
/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/el9_amd64_gcc13/external/gcc/13.2.0-b4f157aad5ba3fefd6a4401833585549/include/c++/13.2.0/array:203:24: error: array subscript 20 is above array bounds of 'struct array[6]' [-Werror=array-bounds=]
203 | return _M_elems[__n];
| ^
/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/el9_amd64_gcc13/external/gcc/13.2.0-b4f157aad5ba3fefd6a4401833585549/include/c++/13.2.0/array: In member function '__ct_base ':
/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/el9_amd64_gcc13/external/gcc/13.2.0-b4f157aad5ba3fefd6a4401833585549/include/c++/13.2.0/array:109:55: note: while referencing '_M_elems'
109 | typename __array_traits<_Tp, _Nm>::_Type _M_elems;
|
Is this a pre-existing error? |
Yes @fwyzard, see https://cmssdt.cern.ch/SDT/cgi-bin/showBuildLogs.py/el9_amd64_gcc13/www/fri/15.0-fri-23/CMSSW_15_0_X_2025-01-17-2300 |
|
+externals |
|
This pull request is fully signed and it will be integrated in one of the next IB/CMSSW_15_0_X/master IBs (test failures were overridden). This pull request will now be reviewed by the release team before it's merged. @antoniovilela, @rappoccio, @mandrenguyen, @sextonkennedy (and backports should be raised in the release meeting by the corresponding L2) |
|
+1 |
Merged commit b94980a into cms-sw:IB/CMSSW_15_0_X/master
|
@fwyzard, @iarspider noticed that this broke … So it looks like more and more projects have stopped supporting CentOS 7. I think it is time for us to drop slc7 support for 15.0.X and above. What do you think, @makortel, @antoniovilela, @mandrenguyen, @sextonkennedy? |
I agree it is time to drop slc7. It just means that the work to improve the memory monitoring and profiling tooling becomes even more important, since we'll lose the remaining capability to run IgProf memory profiling. |
|
Thanks @makortel, I will open a CMSSW GitHub issue so that everyone knows that slc7 support is being dropped. |
|
@fwyzard, do we need to update https://github.com/cms-patatrack/cuda-compatible-runtime/blob/master/Makefile#L15 to have |
|
Mhm, not sure, but I'll check tomorrow. |