ROCm: various updates #9843
enable gpu

please test

A new Pull Request was created by @fwyzard for branch IB/CMSSW_15_1_X/master. @iarspider, @smuzaffar can you please review it and eventually sign? Thanks.
-1

Failed Tests: RelVals-ROCM rocmUnitTests

RelVals-ROCM

ROCm Unit Tests
I found 2 errors in the following unit tests:
---> test testRocmSoALayoutAndView_t had ERRORS
---> test alpakaTestBufferROCmAsync had ERRORS

Comparison Summary
There are some workflows for which there are errors in the baseline:
Summary:

CUDA Comparison Summary
Summary:
ignore tests-rejected with ib-failure

The ROCm failures are a known issue.
38abd56 to 384ba7f
Pull request #9843 was updated.

Rebased after merging #9818.

please test

hold

We may need to change the

Pull request has been put on hold by @fwyzard
-1

Failed Tests: RelVals-ROCM rocmUnitTests

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:

RelVals-ROCM

ROCm Unit Tests
I found 3 errors in the following unit tests:
---> test testRocmSoALayoutAndView_t had ERRORS
---> test alpakaTestBufferROCmAsync had ERRORS
---> test alpakaTestRadixSortROCmAsync had ERRORS

Comparison Summary
Summary:

CUDA Comparison Summary
Summary:
8587e4b to 215e928

Pull request #9843 was updated.

OK, for the time being I've only added here the minimal set of libraries for UCX and MPI.

please test
-1

Failed Tests: rocmUnitTests

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:

ROCm Unit Tests
I found 3 errors in the following unit tests:
---> test testRocmSoALayoutAndView_t had ERRORS
---> test alpakaTestKernelROCmAsync had ERRORS
---> test alpakaTestBufferROCmAsync had ERRORS

Comparison Summary
Summary:

CUDA Comparison Summary
Summary:

ROCM Comparison Summary
Summary:
ignore tests-rejected with ib-failure

@smuzaffar could you merge these changes? They are needed to rebase the MPI PR, and to make progress debugging the issues on LUMI. Thanks!

+externals

Thanks @fwyzard for the cleanup.
2870eb8 into cms-sw:IB/CMSSW_15_1_X/master
This pull request is fully signed and it will be integrated in one of the next IB/CMSSW_15_1_X/master IBs (test failures were overridden). This pull request will now be reviewed by the release team before it's merged. @antoniovilela, @rappoccio, @sextonkennedy, @mandrenguyen (and backports should be raised in the release meeting by the corresponding L2)
Include additional libraries and tools in the ROCm package, needed to build UCX and MPI with ROCm support.
Enable unified memory on Instinct MI100, MI210/250, and MI300 GPUs:
xnack setting, which supports running with xnack enabled or disabled; xnack support setting HSA_XNACK=1.
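The two xnack knobs in the description (a compile-time setting that produces code able to run with xnack either on or off, and the HSA_XNACK=1 runtime variable) can be illustrated with a minimal Python sketch. Everything here beyond HSA_XNACK=1 is an assumption not taken from this PR: the helper names are hypothetical, the GPU-to-target mapping (MI100 → gfx908, MI210/250 → gfx90a, MI300 → gfx942) and the clang `--offload-arch` target-ID suffixes (`:xnack+`, `:xnack-`, none for "any") should be checked against the LLVM AMDGPU and ROCm documentation.

```python
import os
import subprocess
import sys

# Assumed mapping from the GPU models named above to LLVM AMDGPU targets.
GPU_TARGETS = {
    "MI100": "gfx908",
    "MI210": "gfx90a",
    "MI250": "gfx90a",
    "MI300": "gfx942",
}

# Assumed target-ID suffixes: no suffix builds "xnack any" code that runs
# whether xnack is enabled or disabled at runtime.
XNACK_SUFFIX = {"any": "", "on": ":xnack+", "off": ":xnack-"}


def offload_arch(gpu: str, xnack: str = "any") -> str:
    """Hypothetical helper: build an --offload-arch value for a GPU model."""
    return GPU_TARGETS[gpu] + XNACK_SUFFIX[xnack]


def xnack_env(base=None) -> dict:
    """Hypothetical helper: copy of an environment with HSA_XNACK=1 set."""
    env = dict(os.environ if base is None else base)
    env["HSA_XNACK"] = "1"  # ask the ROCm runtime to enable xnack
    return env


if __name__ == "__main__":
    # One flag per distinct target, all in the portable "any" xnack mode.
    archs = sorted({offload_arch(gpu) for gpu in GPU_TARGETS})
    print(" ".join(f"--offload-arch={arch}" for arch in archs))

    # Launch a child process with xnack enabled; it just echoes the variable.
    result = subprocess.run(
        [sys.executable, "-c", "import os; print(os.environ['HSA_XNACK'])"],
        env=xnack_env(),
        capture_output=True,
        text=True,
        check=True,
    )
    print(result.stdout.strip())
```

Building in the "any" mode keeps a single binary usable on nodes with or without xnack enabled, while HSA_XNACK=1 is what actually turns on the page-fault-driven unified memory at run time.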