
Conversation

@fwyzard (Contributor) commented Mar 15, 2022

Implement the changes for building the alpakatest and alpaka applications with support for either CUDA or HIP/ROCm.

Host code changes

  1. Add Cuda and Hip types

Since I hope to be able to enable both CUDA and HIP/ROCm at some point in the future, I've decided to split the relevant Alpaka types already now.
Alpaka itself handles this inconsistently:

  • the "accelerator" types are already available as two distinct types
  • some other types are just aliases of the common type
  • some other types are not available at all

I've added these last ones in src/alpaka/alpaka/alpakaExtra.hpp, with the intention of moving them into Alpaka itself sooner or later.

  2. Replace the use of the UniformCudaHipRt types with the explicit CudaRt types

  3. Add similar code paths and definitions for the HipRt types


I've duplicated all the CUDA-specific code that used either the alpaka_cuda_async namespace or the ALPAKA_ACC_GPU_CUDA_ENABLED macro, adding HIP/ROCm equivalents that use the alpaka_rocm_async namespace and the ALPAKA_ACC_GPU_HIP_ENABLED macro.

  4. Update the command line options in main.cc and the list of plugins

  5. Update the code under the .../alpaka/... folders

Mostly, I've changed

#ifdef ALPAKA_ACC_GPU_CUDA_ASYNC_BACKEND

to

#if defined(ALPAKA_ACC_GPU_CUDA_ASYNC_BACKEND) || defined(ALPAKA_ACC_GPU_HIP_ASYNC_BACKEND)

  6. Unrelated changes

There are also some unrelated changes due to clang complaining about implicitly-deleted default constructors, the inappropriate use of std::move, and some missing casts.

Device code changes

The two places with a lot of changes are src/alpaka/AlpakaCore/prefixScan.h and src/alpaka/AlpakaCore/radixSort.h:

HIP does not support the masked warp instructions like __shfl_up_sync; it still has the pre-CUDA 9 versions like __shfl_up, so I've #ifdef'ed the two variants. Eventually the whole code should be rewritten using the primitives provided by Alpaka, and benchmarked to make sure that this does not introduce any regressions.

I've also added (unconditionally) the memory fence from #210; this too should be benchmarked to check the impact on the CUDA implementation.

@fwyzard fwyzard marked this pull request as draft March 15, 2022 23:49
@fwyzard fwyzard added the alpaka label Mar 16, 2022
@fwyzard (Contributor, Author) commented Mar 16, 2022

$ ./hip --numberOfThreads 20 --numberOfStreams 20 --maxEvents 2000 --validation
Found 1 devices
Processing 2000 events, of which 20 concurrently, with 20 threads.
CountValidator: all 2000 events passed validation
 Average relative track difference 0.000922907 (all within tolerance)
 Average absolute vertex difference 0.0005 (all within tolerance)
Processed 2000 events in 6.039834e+00 seconds, throughput 331.135 events/s, CPU usage per thread: 31.2%

@fwyzard fwyzard force-pushed the alpaka_HIP_support branch from 6e7af4a to a29359b Compare March 16, 2022 09:27
@fwyzard fwyzard force-pushed the alpaka_HIP_support branch 2 times, most recently from d2c175e to f7829b6 Compare March 16, 2022 17:55
@fwyzard fwyzard changed the title [alpaka] Code changes to support ROCm *instead* of CUDA [alpaka] Support either CUDA or ROCm/HIP Mar 16, 2022
@fwyzard fwyzard marked this pull request as ready for review March 16, 2022 17:59
@fwyzard (Contributor, Author) commented Mar 16, 2022

Now supports building either CUDA or ROCm/HIP:

$ make -j`nproc` alpaka ROCM_BASE= CUDA_BASE=/usr/local/cuda-11.5
...

$ source env.sh
$ ./alpaka --cuda --numberOfThreads 20 --numberOfStreams 20 --maxEvents 2000
Found 1 device:
  - NVIDIA GeForce GTX 1080 Ti
Processing 2000 events, of which 20 concurrently, with 20 threads.
Processed 2000 events in 2.096939e+00 seconds, throughput 953.771 events/s, CPU usage per thread: 64.1%

$ make clean
rm -fR /data/user/fwyzard/pixeltrack-standalone/lib /data/user/fwyzard/pixeltrack-standalone/obj /data/user/fwyzard/pixeltrack-standalone/test alpaka alpakatest cuda cudacompat cudadev cudatest cudauvm fwtest hip hiptest kokkos kokkostest serial sycltest
$ rm env.sh 
$ make -j`nproc` alpaka ROCM_BASE=/opt/rocm-5.0.2 CUDA_BASE=
...

$ source env.sh
$ ./alpaka --hip --numberOfThreads 20 --numberOfStreams 20 --maxEvents 2000
Found 1 device:
  - Radeon Pro WX 9100
Processing 2000 events, of which 20 concurrently, with 20 threads.
Processed 2000 events in 4.311149e+00 seconds, throughput 463.913 events/s, CPU usage per thread: 73.1%

@fwyzard fwyzard force-pushed the alpaka_HIP_support branch from f7829b6 to 5fd1098 Compare March 18, 2022 12:46
@fwyzard fwyzard changed the title [alpaka] Support either CUDA or ROCm/HIP Update ROCm support Mar 18, 2022
@fwyzard fwyzard requested a review from makortel March 18, 2022 12:55
@fwyzard (Contributor, Author) commented Mar 18, 2022

@makortel this PR has grown to be quite large... let me know if you would rather have it split into smaller ones.

@fwyzard (Contributor, Author) commented Mar 18, 2022

By the way, I've tested that kokkostest builds and runs, but I could not get kokkos to build, as it would get stuck while compiling or linking some of the tests.

@makortel (Collaborator) commented

By the way, I've tested that kokkostest builds and runs, but I could not get kokkos to build, as it would get stuck while compiling or linking some of the tests.

The compilation of the kokkos program taking outrageously long for HIP is a known problem; see #178 (comment) and the following discussion (it has been reported to Kokkos; the link is at the bottom of the issue).

@fwyzard (Contributor, Author) commented Mar 18, 2022

OK, then I won't worry about it.

@makortel (Collaborator) commented

this PR has grown to be quite large... let me know if you would rather have it split into smaller ones.

Looking at the commits, I think splitting this PR into three could be worth it:

  • Update `Makefile`, `hip`, and `hiptest` (first three commits)
  • Update the Alpaka external and `alpakatest` (next two commits)
  • Add CUDA xor ROCm/HIP support for `alpaka` and `alpakatest` (last two commits)

@fwyzard fwyzard force-pushed the alpaka_HIP_support branch from 5fd1098 to c17df5a Compare March 18, 2022 22:14
@fwyzard fwyzard changed the title Update ROCm support [alpaka] Support CUDA or ROCm/HIP Mar 18, 2022
@fwyzard fwyzard self-assigned this Mar 18, 2022
Comment on lines +177 to +178
// cms-patatrack/pixeltrack-standalone#210
alpaka::mem_fence(acc, alpaka::memory_scope::Grid{});
@fwyzard (Contributor, Author) commented:

We should benchmark the changes on an NVIDIA GPU to see if this has any negative impact on the performance there.


$$($(1)_ROCM_LIB): $$($(1)_ROCM_OBJ) $$(foreach dep,$(EXTERNAL_DEPENDS_H),$$($$(dep)_DEPS)) $$(foreach lib,$$($(1)_DEPENDS),$$($$(lib)_LIB)) $$(foreach lib,$$($(1)_DEPENDS),$$($$(lib)_ROCM_LIB))
@[ -d $$(@D) ] || mkdir -p $$(@D)
$(CXX) $$($(1)_ROCM_OBJ) $(LDFLAGS) -shared $(SO_LDFLAGS) $(LIB_LDFLAGS) $$(foreach lib,$$($(1)_DEPENDS),$$($$(lib)_LDFLAGS)) $$(foreach lib,$$($(1)_DEPENDS),$$($$(lib)_ROCM_LDFLAGS)) $$(foreach dep,$(EXTERNAL_DEPENDS),$$($$(dep)_LDFLAGS)) -o $$@
@makortel (Collaborator) commented:

Oh nice, linking object files with ROCm device code works automatically with the host compiler.

@fwyzard (Contributor, Author) commented:

I guess that's the case as long as we don't use separate compilation (i.e. with `-fno-gpu-rdc`, the default).
If we switch to `-fgpu-rdc` we will probably need some special support.

@fwyzard (Contributor, Author) commented:

On the other hand, with Alpaka (almost) all device code ends up in header files, templated on the accelerator type.
So maybe we can get rid of separable compilation for CUDA as well?

fwyzard added 2 commits March 22, 2022 08:29
Allow building the "alpakatest" application with support for either CUDA or ROCm/HIP.
Allow building the "alpaka" application with support for either CUDA or ROCm/HIP.
@fwyzard fwyzard force-pushed the alpaka_HIP_support branch from c24ea5f to 23428cc Compare March 22, 2022 07:33
@fwyzard (Contributor, Author) commented Mar 22, 2022

Rebased, and squashed the clang-format changes.

@makortel (Collaborator) commented

Here is a comparison on V100 (one set of 1-minute jobs):
[alpaka_cuda_throughput plot]

I'm running another test with longer jobs and repetitions.

@makortel (Collaborator) commented

Here is a comparison on V100 with four 2-minute jobs:
[alpaka_cuda_throughput plot]

The results agree within the statistical uncertainty (a few %).

@fwyzard (Contributor, Author) commented Mar 24, 2022

No differences on a T4, either:
[two throughput comparison plots]

@fwyzard fwyzard merged commit f48b21c into cms-patatrack:master Mar 24, 2022
@fwyzard fwyzard deleted the alpaka_HIP_support branch March 24, 2022 09:52