-
Notifications
You must be signed in to change notification settings - Fork 4.6k
Add a test for Alpaka libraries and build rules #44622
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
cmsbuild
merged 5 commits into
cms-sw:master
from
fwyzard:implement_HeterogeneousTest_Alpaka_141x
Apr 8, 2024
Merged
Changes from all commits
Commits
Show all changes
5 commits
Select commit
Hold shift + click to select a range
9401edd
Fix names in the ROCm packages documentation
fwyzard 1f84811
Fix typos in the ROCm tests
fwyzard 9e3922b
Fix line wrap
fwyzard d516150
Update "private" namespace names to match package names
fwyzard f963c4c
Add a test for Alpaka libraries and build rules
fwyzard File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,2 @@ | ||
| <use name="alpaka"/> | ||
| <use name="HeterogeneousCore/AlpakaInterface"/> |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,49 @@ | ||
| # Introduction | ||
|
|
||
| The packages `HeterogeneousTest/AlpakaDevice`, `HeterogeneousTest/AlpakaKernel`, | ||
| `HeterogeneousTest/AlpakaWrapper` and `HeterogeneousTest/AlpakaOpaque` implement a set of libraries, | ||
| plugins and tests to exercise the build rules for Alpaka. | ||
| In particular, these tests show what is supported and what are the limitations implementing | ||
| Alpaka-based libraries, and using them from multiple plugins. | ||
|
|
||
|
|
||
| # `HeterogeneousTest/AlpakaDevice` | ||
|
|
||
| The package `HeterogeneousTest/AlpakaDevice` implements a library that defines and exports Alpaka | ||
| device-side functions: | ||
| ```c++ | ||
| namespace cms::alpakatest { | ||
|
|
||
| template <typename TAcc> | ||
| ALPAKA_FN_ACC void add_vectors_f(TAcc const& acc, ...); | ||
|
|
||
| template <typename TAcc> | ||
| ALPAKA_FN_ACC void add_vectors_d(TAcc const& acc, ...); | ||
|
|
||
| } // namespace cms::alpakatest | ||
| ``` | ||
|
|
||
| The `plugins` directory implements the `AlpakaTestDeviceAdditionModule` `EDAnalyzer` that launches | ||
| an Alpaka kernel using the functions defined in ths library. As a byproduct this plugin also shows | ||
| how to split an `EDAnalyzer` or other framework plugin into a host-only part (in a `.cc` file) and | ||
| a device part (in a `.dev.cc` file). | ||
|
|
||
| The `test` directory implements the `testAlpakaDeviceAddition` binary that launches a Alpaka kernel | ||
| using these functions. | ||
| It also contains the `testAlpakaTestDeviceAdditionModule.py` python configuration to exercise the | ||
| `AlpakaTestDeviceAdditionModule` plugin. | ||
|
|
||
|
|
||
| # Other packages | ||
|
|
||
| For various ways in which this library and plugin can be tested, see also the other | ||
| `HeterogeneousTest/Alpaka...` packages: | ||
| - [`HeterogeneousTest/AlpakaKernel/README.md`](../../HeterogeneousTest/AlpakaKernel/README.md) | ||
| - [`HeterogeneousTest/AlpakaWrapper/README.md`](../../HeterogeneousTest/AlpakaWrapper/README.md) | ||
| - [`HeterogeneousTest/AlpakaOpaque/README.md`](../../HeterogeneousTest/AlpakaOpaque/README.md) | ||
|
|
||
|
|
||
| # Combining plugins | ||
|
|
||
| `HeterogeneousTest/AlpakaOpaque/test` contains the `testAlpakaTestAdditionModules.py` python | ||
| configuration that exercise all four plugins in a single application. |
36 changes: 36 additions & 0 deletions
36
HeterogeneousTest/AlpakaDevice/interface/alpaka/DeviceAddition.h
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,36 @@ | ||
| #ifndef HeterogeneousTest_AlpakaDevice_interface_alpaka_DeviceAddition_h | ||
| #define HeterogeneousTest_AlpakaDevice_interface_alpaka_DeviceAddition_h | ||
|
|
||
| #include <cstdint> | ||
|
|
||
| #include <alpaka/alpaka.hpp> | ||
|
|
||
| #include "HeterogeneousCore/AlpakaInterface/interface/workdivision.h" | ||
|
|
||
| namespace cms::alpakatest { | ||
|
|
||
| template <typename TAcc> | ||
| ALPAKA_FN_ACC void add_vectors_f(TAcc const& acc, | ||
| float const* __restrict__ in1, | ||
| float const* __restrict__ in2, | ||
| float* __restrict__ out, | ||
| uint32_t size) { | ||
| for (auto i : cms::alpakatools::uniform_elements(acc, size)) { | ||
| out[i] = in1[i] + in2[i]; | ||
| } | ||
| } | ||
|
|
||
| template <typename TAcc> | ||
| ALPAKA_FN_ACC void add_vectors_d(TAcc const& acc, | ||
| double const* __restrict__ in1, | ||
| double const* __restrict__ in2, | ||
| double* __restrict__ out, | ||
| uint32_t size) { | ||
| for (auto i : cms::alpakatools::uniform_elements(acc, size)) { | ||
| out[i] = in1[i] + in2[i]; | ||
| } | ||
| } | ||
|
|
||
| } // namespace cms::alpakatest | ||
|
|
||
| #endif // HeterogeneousTest_AlpakaDevice_interface_alpaka_DeviceAddition_h |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,11 @@ | ||
| <library file="alpaka/*.cc" name="HeterogeneousTestAlpakaDevicePlugins"> | ||
| <use name="alpaka"/> | ||
| <use name="FWCore/Framework"/> | ||
| <use name="FWCore/ParameterSet"/> | ||
| <use name="FWCore/ServiceRegistry"/> | ||
| <use name="HeterogeneousCore/AlpakaInterface"/> | ||
| <use name="HeterogeneousCore/AlpakaServices"/> | ||
| <use name="HeterogeneousTest/AlpakaDevice"/> | ||
| <flags ALPAKA_BACKENDS="1"/> | ||
| <flags EDM_PLUGIN="1"/> | ||
| </library> |
32 changes: 32 additions & 0 deletions
32
HeterogeneousTest/AlpakaDevice/plugins/alpaka/AlpakaTestDeviceAdditionAlgo.dev.cc
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,32 @@ | ||
| #include <cstdint> | ||
|
|
||
| #include <alpaka/alpaka.hpp> | ||
|
|
||
| #include "HeterogeneousCore/AlpakaInterface/interface/config.h" | ||
| #include "HeterogeneousCore/AlpakaInterface/interface/workdivision.h" | ||
| #include "HeterogeneousTest/AlpakaDevice/interface/alpaka/DeviceAddition.h" | ||
|
|
||
| #include "AlpakaTestDeviceAdditionAlgo.h" | ||
|
|
||
| namespace ALPAKA_ACCELERATOR_NAMESPACE::HeterogeneousTestAlpakaDevicePlugins { | ||
|
|
||
| struct KernelAddVectorsF { | ||
| template <typename TAcc> | ||
| ALPAKA_FN_ACC void operator()(TAcc const& acc, | ||
| const float* __restrict__ in1, | ||
| const float* __restrict__ in2, | ||
| float* __restrict__ out, | ||
| uint32_t size) const { | ||
| cms::alpakatest::add_vectors_f(acc, in1, in2, out, size); | ||
| } | ||
| }; | ||
|
|
||
| void wrapper_add_vectors_f(Queue& queue, | ||
| const float* __restrict__ in1, | ||
| const float* __restrict__ in2, | ||
| float* __restrict__ out, | ||
| uint32_t size) { | ||
| alpaka::exec<Acc1D>(queue, cms::alpakatools::make_workdiv<Acc1D>(32, 32), KernelAddVectorsF{}, in1, in2, out, size); | ||
| } | ||
|
|
||
| } // namespace ALPAKA_ACCELERATOR_NAMESPACE::HeterogeneousTestAlpakaDevicePlugins | ||
18 changes: 18 additions & 0 deletions
18
HeterogeneousTest/AlpakaDevice/plugins/alpaka/AlpakaTestDeviceAdditionAlgo.h
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,18 @@ | ||
| #ifndef HeterogeneousTest_AlpakaDevice_plugins_alpaka_AlpakaTestDeviceAdditionAlgo_h | ||
| #define HeterogeneousTest_AlpakaDevice_plugins_alpaka_AlpakaTestDeviceAdditionAlgo_h | ||
|
|
||
| #include <cstdint> | ||
|
|
||
| #include "HeterogeneousCore/AlpakaInterface/interface/config.h" | ||
|
|
||
| namespace ALPAKA_ACCELERATOR_NAMESPACE::HeterogeneousTestAlpakaDevicePlugins { | ||
|
|
||
| void wrapper_add_vectors_f(Queue& queue, | ||
| const float* __restrict__ in1, | ||
| const float* __restrict__ in2, | ||
| float* __restrict__ out, | ||
| uint32_t size); | ||
|
|
||
| } // namespace ALPAKA_ACCELERATOR_NAMESPACE::HeterogeneousTestAlpakaDevicePlugins | ||
|
|
||
| #endif // HeterogeneousTest_AlpakaDevice_plugins_alpaka_AlpakaTestDeviceAdditionAlgo_h |
124 changes: 124 additions & 0 deletions
124
HeterogeneousTest/AlpakaDevice/plugins/alpaka/AlpakaTestDeviceAdditionModule.cc
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,124 @@ | ||
| #include <cstdint> | ||
| #include <iostream> | ||
| #include <random> | ||
| #include <vector> | ||
|
|
||
| #include <alpaka/alpaka.hpp> | ||
|
|
||
| #include "FWCore/Framework/interface/Event.h" | ||
| #include "FWCore/Framework/interface/Frameworkfwd.h" | ||
| #include "FWCore/Framework/interface/global/EDAnalyzer.h" | ||
| #include "FWCore/ParameterSet/interface/ConfigurationDescriptions.h" | ||
| #include "FWCore/ParameterSet/interface/ParameterSet.h" | ||
| #include "FWCore/ParameterSet/interface/ParameterSetDescription.h" | ||
| #include "FWCore/ServiceRegistry/interface/Service.h" | ||
| #include "HeterogeneousCore/AlpakaInterface/interface/config.h" | ||
| #include "HeterogeneousCore/AlpakaInterface/interface/memory.h" | ||
| #include "HeterogeneousCore/AlpakaServices/interface/alpaka/AlpakaService.h" | ||
|
|
||
| #include "AlpakaTestDeviceAdditionAlgo.h" | ||
|
|
||
| namespace ALPAKA_ACCELERATOR_NAMESPACE { | ||
|
|
||
| class AlpakaTestDeviceAdditionModule : public edm::global::EDAnalyzer<> { | ||
| public: | ||
| explicit AlpakaTestDeviceAdditionModule(edm::ParameterSet const& config); | ||
| ~AlpakaTestDeviceAdditionModule() override = default; | ||
|
|
||
| static void fillDescriptions(edm::ConfigurationDescriptions& descriptions); | ||
|
|
||
| void analyze(edm::StreamID, edm::Event const& event, edm::EventSetup const& setup) const override; | ||
|
|
||
| private: | ||
| const uint32_t size_; | ||
| }; | ||
|
|
||
| AlpakaTestDeviceAdditionModule::AlpakaTestDeviceAdditionModule(edm::ParameterSet const& config) | ||
| : size_(config.getParameter<uint32_t>("size")) {} | ||
|
|
||
| void AlpakaTestDeviceAdditionModule::fillDescriptions(edm::ConfigurationDescriptions& descriptions) { | ||
| edm::ParameterSetDescription desc; | ||
| desc.add<uint32_t>("size", 1024 * 1024); | ||
|
|
||
| // ignore the alpaka = cms.untracked.PSet(...) injected by the framework | ||
| edm::ParameterSetDescription alpaka; | ||
| alpaka.setAllowAnything(); | ||
| desc.addUntracked<edm::ParameterSetDescription>("alpaka", alpaka); | ||
|
|
||
| descriptions.addWithDefaultLabel(desc); | ||
| } | ||
|
|
||
| void AlpakaTestDeviceAdditionModule::analyze(edm::StreamID, | ||
| edm::Event const& event, | ||
| edm::EventSetup const& setup) const { | ||
| // require a valid Alpaka backend for running | ||
| edm::Service<ALPAKA_TYPE_ALIAS(AlpakaService)> service; | ||
| if (not service or not service->enabled()) { | ||
| std::cout << "The " << ALPAKA_TYPE_ALIAS_NAME(AlpakaService) | ||
| << " is not available or disabled, the test will be skipped.\n"; | ||
| return; | ||
| } | ||
|
|
||
| // random number generator with a gaussian distribution | ||
| std::random_device rd{}; | ||
| std::default_random_engine rand{rd()}; | ||
| std::normal_distribution<float> dist{0., 1.}; | ||
|
|
||
| // tolerance | ||
| constexpr float epsilon = 0.000001; | ||
|
|
||
| // allocate input and output host buffers | ||
| std::vector<float> in1_h(size_); | ||
| std::vector<float> in2_h(size_); | ||
| std::vector<float> out_h(size_); | ||
|
|
||
| // fill the input buffers with random data, and the output buffer with zeros | ||
| for (uint32_t i = 0; i < size_; ++i) { | ||
| in1_h[i] = dist(rand); | ||
| in2_h[i] = dist(rand); | ||
| out_h[i] = 0.; | ||
| } | ||
|
|
||
| // run the test on all available devices | ||
| for (auto const& device : cms::alpakatools::devices<Platform>()) { | ||
| Queue queue{device}; | ||
|
|
||
| // allocate input and output buffers on the device | ||
| auto in1_d = cms::alpakatools::make_device_buffer<float[]>(queue, size_); | ||
| auto in2_d = cms::alpakatools::make_device_buffer<float[]>(queue, size_); | ||
| auto out_d = cms::alpakatools::make_device_buffer<float[]>(queue, size_); | ||
|
|
||
| // copy the input data to the device | ||
| // FIXME: pass the explicit size of type uint32_t to avoid compilation error | ||
| // The destination view and the extent are required to have compatible index types! | ||
| alpaka::memcpy(queue, in1_d, in1_h, size_); | ||
| alpaka::memcpy(queue, in2_d, in2_h, size_); | ||
|
|
||
| // fill the output buffer with zeros | ||
| alpaka::memset(queue, out_d, 0); | ||
|
|
||
| // launch the 1-dimensional kernel for vector addition | ||
| HeterogeneousTestAlpakaDevicePlugins::wrapper_add_vectors_f( | ||
| queue, in1_d.data(), in2_d.data(), out_d.data(), size_); | ||
|
|
||
| // copy the results from the device to the host | ||
| alpaka::memcpy(queue, out_h, out_d); | ||
|
|
||
| // wait for all the operations to complete | ||
| alpaka::wait(queue); | ||
|
|
||
| // check the results | ||
| for (uint32_t i = 0; i < size_; ++i) { | ||
| float sum = in1_h[i] + in2_h[i]; | ||
| assert(out_h[i] < sum + epsilon); | ||
| assert(out_h[i] > sum - epsilon); | ||
| } | ||
| } | ||
|
|
||
| std::cout << "All tests passed.\n"; | ||
| } | ||
|
|
||
| } // namespace ALPAKA_ACCELERATOR_NAMESPACE | ||
|
|
||
| #include "HeterogeneousCore/AlpakaCore/interface/alpaka/MakerMacros.h" | ||
| DEFINE_FWK_ALPAKA_MODULE(AlpakaTestDeviceAdditionModule); |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,9 @@ | ||
| <bin file="alpaka/testDeviceAddition.dev.cc" name="testAlpakaDeviceAddition"> | ||
| <use name="catch2"/> | ||
| <use name="alpaka"/> | ||
| <use name="HeterogeneousTest/AlpakaDevice"/> | ||
| <use name="HeterogeneousCore/AlpakaInterface"/> | ||
| <flags ALPAKA_BACKENDS="1"/> | ||
| </bin> | ||
|
|
||
| <test name="testAlpakaTestDeviceAdditionModule" command="cmsRun ${LOCALTOP}/src/HeterogeneousTest/AlpakaDevice/test/testAlpakaTestDeviceAdditionModule.py"/> |
102 changes: 102 additions & 0 deletions
102
HeterogeneousTest/AlpakaDevice/test/alpaka/testDeviceAddition.dev.cc
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,102 @@ | ||
| #include <cstdint> | ||
| #include <random> | ||
| #include <vector> | ||
|
|
||
| #define CATCH_CONFIG_MAIN | ||
| #include <catch.hpp> | ||
|
|
||
| #include <alpaka/alpaka.hpp> | ||
|
|
||
| #include "HeterogeneousCore/AlpakaInterface/interface/config.h" | ||
| #include "HeterogeneousCore/AlpakaInterface/interface/devices.h" | ||
| #include "HeterogeneousCore/AlpakaInterface/interface/memory.h" | ||
| #include "HeterogeneousCore/AlpakaInterface/interface/workdivision.h" | ||
| #include "HeterogeneousTest/AlpakaDevice/interface/alpaka/DeviceAddition.h" | ||
|
|
||
| using namespace ALPAKA_ACCELERATOR_NAMESPACE; | ||
|
|
||
| struct KernelAddVectorsF { | ||
| template <typename TAcc> | ||
| ALPAKA_FN_ACC void operator()(TAcc const& acc, | ||
| const float* __restrict__ in1, | ||
| const float* __restrict__ in2, | ||
| float* __restrict__ out, | ||
| uint32_t size) const { | ||
| cms::alpakatest::add_vectors_f(acc, in1, in2, out, size); | ||
| } | ||
| }; | ||
|
|
||
| TEST_CASE("HeterogeneousTest/AlpakaDevice test", "[alpakaTestDeviceAddition]") { | ||
| auto const& devices = cms::alpakatools::devices<Platform>(); | ||
| if (devices.empty()) { | ||
| FAIL("No devices available for the " EDM_STRINGIZE(ALPAKA_ACCELERATOR_NAMESPACE) " backend, " | ||
| "the test will be skipped."); | ||
| } | ||
|
|
||
| // random number generator with a gaussian distribution | ||
| std::random_device rd{}; | ||
| std::default_random_engine rand{rd()}; | ||
| std::normal_distribution<float> dist{0., 1.}; | ||
|
|
||
| // tolerance | ||
| constexpr float epsilon = 0.000001; | ||
|
|
||
| // buffer size | ||
| constexpr uint32_t size = 1024 * 1024; | ||
|
|
||
| // allocate input and output host buffers | ||
| std::vector<float> in1_h(size); | ||
| std::vector<float> in2_h(size); | ||
| std::vector<float> out_h(size); | ||
|
|
||
| // fill the input buffers with random data, and the output buffer with zeros | ||
| for (uint32_t i = 0; i < size; ++i) { | ||
| in1_h[i] = dist(rand); | ||
| in2_h[i] = dist(rand); | ||
| out_h[i] = 0.; | ||
| } | ||
|
|
||
| // run the test on all available devices | ||
| for (auto const& device : cms::alpakatools::devices<Platform>()) { | ||
| SECTION("Test add_vectors_f on " EDM_STRINGIZE(ALPAKA_ACCELERATOR_NAMESPACE) " backend") { | ||
| REQUIRE_NOTHROW([&]() { | ||
| Queue queue{device}; | ||
|
|
||
| // allocate input and output buffers on the device | ||
| auto in1_d = cms::alpakatools::make_device_buffer<float[]>(queue, size); | ||
| auto in2_d = cms::alpakatools::make_device_buffer<float[]>(queue, size); | ||
| auto out_d = cms::alpakatools::make_device_buffer<float[]>(queue, size); | ||
|
|
||
| // copy the input data to the device | ||
| // FIXME: pass the explicit size of type uint32_t to avoid compilation error | ||
| // The destination view and the extent are required to have compatible index types! | ||
| alpaka::memcpy(queue, in1_d, in1_h, size); | ||
| alpaka::memcpy(queue, in2_d, in2_h, size); | ||
|
|
||
| // fill the output buffer with zeros | ||
| alpaka::memset(queue, out_d, 0); | ||
|
|
||
| // launch the 1-dimensional kernel for vector addition | ||
| alpaka::exec<Acc1D>(queue, | ||
| cms::alpakatools::make_workdiv<Acc1D>(32, 32), | ||
| KernelAddVectorsF{}, | ||
| in1_d.data(), | ||
| in2_d.data(), | ||
| out_d.data(), | ||
| size); | ||
|
|
||
| // copy the results from the device to the host | ||
| alpaka::memcpy(queue, out_h, out_d, size); | ||
|
|
||
| // wait for all the operations to complete | ||
| alpaka::wait(queue); | ||
| }()); | ||
|
|
||
| // check the results | ||
| for (uint32_t i = 0; i < size; ++i) { | ||
| float sum = in1_h[i] + in2_h[i]; | ||
| CHECK_THAT(out_h[i], Catch::Matchers::WithinAbs(sum, epsilon)); | ||
| } | ||
| } | ||
| } | ||
| } |
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@makortel in cases like this, where we have to keep the code inside the
ALPAKA_ACCELERATOR_NAMESPACEbecausewrapper_add_vectors_fis not templated on the accelerator type, would it make sense to useAcc1Ddirectly ?Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, this could be a way to avoid templating all device code on the accelerator - which in turn could let us move some code back from header files to .dev.cc files...
Opened #44625 to remind us to look into this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree the explicit
Acc1Dwould be an avenue worth of exploring. I'll comment more in #44625.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK :-)
I'm testing the possibility on top of this PR... the CUDA version built fine, but I'm having problems with the ROCm version :-(
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that we are building CUDA and ROCm code in
__host__ __device__mode: alpaka functions marked asALPAKA_FN_ACCare marked as__host__ __device__when targetting the CUDA or ROCm backends.This doesn't seem to cause any problems to CUDA, but HIP/ROCm is less forgiving, and fails to link host-side libraries that contain kernels that make calls to
blockIdx.x, etc.cms-sw/cmsdist#9121 should improve the situation, and then #44636 tries to use the
Acc1Dapproach with these packages.