Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions HeterogeneousTest/AlpakaDevice/BuildFile.xml
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
<use name="alpaka"/>
<use name="HeterogeneousCore/AlpakaInterface"/>
49 changes: 49 additions & 0 deletions HeterogeneousTest/AlpakaDevice/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,49 @@
# Introduction

The packages `HeterogeneousTest/AlpakaDevice`, `HeterogeneousTest/AlpakaKernel`,
`HeterogeneousTest/AlpakaWrapper` and `HeterogeneousTest/AlpakaOpaque` implement a set of libraries,
plugins and tests to exercise the build rules for Alpaka.
In particular, these tests show what is supported and what are the limitations implementing
Alpaka-based libraries, and using them from multiple plugins.


# `HeterogeneousTest/AlpakaDevice`

The package `HeterogeneousTest/AlpakaDevice` implements a library that defines and exports Alpaka
device-side functions:
```c++
namespace cms::alpakatest {

template <typename TAcc>
ALPAKA_FN_ACC void add_vectors_f(TAcc const& acc, ...);

template <typename TAcc>
ALPAKA_FN_ACC void add_vectors_d(TAcc const& acc, ...);

} // namespace cms::alpakatest
```

The `plugins` directory implements the `AlpakaTestDeviceAdditionModule` `EDAnalyzer` that launches
an Alpaka kernel using the functions defined in ths library. As a byproduct this plugin also shows
how to split an `EDAnalyzer` or other framework plugin into a host-only part (in a `.cc` file) and
a device part (in a `.dev.cc` file).

The `test` directory implements the `testAlpakaDeviceAddition` binary that launches a Alpaka kernel
using these functions.
It also contains the `testAlpakaTestDeviceAdditionModule.py` python configuration to exercise the
`AlpakaTestDeviceAdditionModule` plugin.


# Other packages

For various ways in which this library and plugin can be tested, see also the other
`HeterogeneousTest/Alpaka...` packages:
- [`HeterogeneousTest/AlpakaKernel/README.md`](../../HeterogeneousTest/AlpakaKernel/README.md)
- [`HeterogeneousTest/AlpakaWrapper/README.md`](../../HeterogeneousTest/AlpakaWrapper/README.md)
- [`HeterogeneousTest/AlpakaOpaque/README.md`](../../HeterogeneousTest/AlpakaOpaque/README.md)


# Combining plugins

`HeterogeneousTest/AlpakaOpaque/test` contains the `testAlpakaTestAdditionModules.py` python
configuration that exercise all four plugins in a single application.
36 changes: 36 additions & 0 deletions HeterogeneousTest/AlpakaDevice/interface/alpaka/DeviceAddition.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,36 @@
#ifndef HeterogeneousTest_AlpakaDevice_interface_alpaka_DeviceAddition_h
#define HeterogeneousTest_AlpakaDevice_interface_alpaka_DeviceAddition_h

#include <cstdint>

#include <alpaka/alpaka.hpp>

#include "HeterogeneousCore/AlpakaInterface/interface/workdivision.h"

namespace cms::alpakatest {

template <typename TAcc>
ALPAKA_FN_ACC void add_vectors_f(TAcc const& acc,
float const* __restrict__ in1,
float const* __restrict__ in2,
float* __restrict__ out,
uint32_t size) {
for (auto i : cms::alpakatools::uniform_elements(acc, size)) {
out[i] = in1[i] + in2[i];
}
}

template <typename TAcc>
ALPAKA_FN_ACC void add_vectors_d(TAcc const& acc,
double const* __restrict__ in1,
double const* __restrict__ in2,
double* __restrict__ out,
uint32_t size) {
for (auto i : cms::alpakatools::uniform_elements(acc, size)) {
out[i] = in1[i] + in2[i];
}
}

} // namespace cms::alpakatest

#endif // HeterogeneousTest_AlpakaDevice_interface_alpaka_DeviceAddition_h
11 changes: 11 additions & 0 deletions HeterogeneousTest/AlpakaDevice/plugins/BuildFile.xml
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
<library file="alpaka/*.cc" name="HeterogeneousTestAlpakaDevicePlugins">
<use name="alpaka"/>
<use name="FWCore/Framework"/>
<use name="FWCore/ParameterSet"/>
<use name="FWCore/ServiceRegistry"/>
<use name="HeterogeneousCore/AlpakaInterface"/>
<use name="HeterogeneousCore/AlpakaServices"/>
<use name="HeterogeneousTest/AlpakaDevice"/>
<flags ALPAKA_BACKENDS="1"/>
<flags EDM_PLUGIN="1"/>
</library>
Original file line number Diff line number Diff line change
@@ -0,0 +1,32 @@
#include <cstdint>

#include <alpaka/alpaka.hpp>

#include "HeterogeneousCore/AlpakaInterface/interface/config.h"
#include "HeterogeneousCore/AlpakaInterface/interface/workdivision.h"
#include "HeterogeneousTest/AlpakaDevice/interface/alpaka/DeviceAddition.h"

#include "AlpakaTestDeviceAdditionAlgo.h"

namespace ALPAKA_ACCELERATOR_NAMESPACE::HeterogeneousTestAlpakaDevicePlugins {

struct KernelAddVectorsF {
template <typename TAcc>
ALPAKA_FN_ACC void operator()(TAcc const& acc,
Comment on lines +14 to +15
Copy link
Contributor Author

@fwyzard fwyzard Apr 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@makortel in cases like this, where we have to keep the code inside the ALPAKA_ACCELERATOR_NAMESPACE because wrapper_add_vectors_f is not templated on the accelerator type, would it make sense to use Acc1D directly ?

Suggested change
template <typename TAcc>
ALPAKA_FN_ACC void operator()(TAcc const& acc,
ALPAKA_FN_ACC void operator()(Acc1D const& acc,

Copy link
Contributor Author

@fwyzard fwyzard Apr 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, this could be a way to avoid templating all device code on the accelerator - which in turn could let us move some code back from header files to .dev.cc files...

Opened #44625 to remind us to look into this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree the explicit Acc1D would be an avenue worth of exploring. I'll comment more in #44625.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK :-)
I'm testing the possibility on top of this PR... the CUDA version built fine, but I'm having problems with the ROCm version :-(

Copy link
Contributor Author

@fwyzard fwyzard Apr 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that we are building CUDA and ROCm code in __host__ __device__ mode: alpaka functions marked as ALPAKA_FN_ACC are marked as __host__ __device__ when targetting the CUDA or ROCm backends.

This doesn't seem to cause any problems to CUDA, but HIP/ROCm is less forgiving, and fails to link host-side libraries that contain kernels that make calls to blockIdx.x, etc.

cms-sw/cmsdist#9121 should improve the situation, and then #44636 tries to use the Acc1D approach with these packages.

const float* __restrict__ in1,
const float* __restrict__ in2,
float* __restrict__ out,
uint32_t size) const {
cms::alpakatest::add_vectors_f(acc, in1, in2, out, size);
}
};

void wrapper_add_vectors_f(Queue& queue,
const float* __restrict__ in1,
const float* __restrict__ in2,
float* __restrict__ out,
uint32_t size) {
alpaka::exec<Acc1D>(queue, cms::alpakatools::make_workdiv<Acc1D>(32, 32), KernelAddVectorsF{}, in1, in2, out, size);
}

} // namespace ALPAKA_ACCELERATOR_NAMESPACE::HeterogeneousTestAlpakaDevicePlugins
Original file line number Diff line number Diff line change
@@ -0,0 +1,18 @@
#ifndef HeterogeneousTest_AlpakaDevice_plugins_alpaka_AlpakaTestDeviceAdditionAlgo_h
#define HeterogeneousTest_AlpakaDevice_plugins_alpaka_AlpakaTestDeviceAdditionAlgo_h

#include <cstdint>

#include "HeterogeneousCore/AlpakaInterface/interface/config.h"

namespace ALPAKA_ACCELERATOR_NAMESPACE::HeterogeneousTestAlpakaDevicePlugins {

void wrapper_add_vectors_f(Queue& queue,
const float* __restrict__ in1,
const float* __restrict__ in2,
float* __restrict__ out,
uint32_t size);

} // namespace ALPAKA_ACCELERATOR_NAMESPACE::HeterogeneousTestAlpakaDevicePlugins

#endif // HeterogeneousTest_AlpakaDevice_plugins_alpaka_AlpakaTestDeviceAdditionAlgo_h
Original file line number Diff line number Diff line change
@@ -0,0 +1,124 @@
#include <cstdint>
#include <iostream>
#include <random>
#include <vector>

#include <alpaka/alpaka.hpp>

#include "FWCore/Framework/interface/Event.h"
#include "FWCore/Framework/interface/Frameworkfwd.h"
#include "FWCore/Framework/interface/global/EDAnalyzer.h"
#include "FWCore/ParameterSet/interface/ConfigurationDescriptions.h"
#include "FWCore/ParameterSet/interface/ParameterSet.h"
#include "FWCore/ParameterSet/interface/ParameterSetDescription.h"
#include "FWCore/ServiceRegistry/interface/Service.h"
#include "HeterogeneousCore/AlpakaInterface/interface/config.h"
#include "HeterogeneousCore/AlpakaInterface/interface/memory.h"
#include "HeterogeneousCore/AlpakaServices/interface/alpaka/AlpakaService.h"

#include "AlpakaTestDeviceAdditionAlgo.h"

namespace ALPAKA_ACCELERATOR_NAMESPACE {

class AlpakaTestDeviceAdditionModule : public edm::global::EDAnalyzer<> {
public:
explicit AlpakaTestDeviceAdditionModule(edm::ParameterSet const& config);
~AlpakaTestDeviceAdditionModule() override = default;

static void fillDescriptions(edm::ConfigurationDescriptions& descriptions);

void analyze(edm::StreamID, edm::Event const& event, edm::EventSetup const& setup) const override;

private:
const uint32_t size_;
};

AlpakaTestDeviceAdditionModule::AlpakaTestDeviceAdditionModule(edm::ParameterSet const& config)
: size_(config.getParameter<uint32_t>("size")) {}

void AlpakaTestDeviceAdditionModule::fillDescriptions(edm::ConfigurationDescriptions& descriptions) {
edm::ParameterSetDescription desc;
desc.add<uint32_t>("size", 1024 * 1024);

// ignore the alpaka = cms.untracked.PSet(...) injected by the framework
edm::ParameterSetDescription alpaka;
alpaka.setAllowAnything();
desc.addUntracked<edm::ParameterSetDescription>("alpaka", alpaka);

descriptions.addWithDefaultLabel(desc);
}

void AlpakaTestDeviceAdditionModule::analyze(edm::StreamID,
edm::Event const& event,
edm::EventSetup const& setup) const {
// require a valid Alpaka backend for running
edm::Service<ALPAKA_TYPE_ALIAS(AlpakaService)> service;
if (not service or not service->enabled()) {
std::cout << "The " << ALPAKA_TYPE_ALIAS_NAME(AlpakaService)
<< " is not available or disabled, the test will be skipped.\n";
return;
}

// random number generator with a gaussian distribution
std::random_device rd{};
std::default_random_engine rand{rd()};
std::normal_distribution<float> dist{0., 1.};

// tolerance
constexpr float epsilon = 0.000001;

// allocate input and output host buffers
std::vector<float> in1_h(size_);
std::vector<float> in2_h(size_);
std::vector<float> out_h(size_);

// fill the input buffers with random data, and the output buffer with zeros
for (uint32_t i = 0; i < size_; ++i) {
in1_h[i] = dist(rand);
in2_h[i] = dist(rand);
out_h[i] = 0.;
}

// run the test on all available devices
for (auto const& device : cms::alpakatools::devices<Platform>()) {
Queue queue{device};

// allocate input and output buffers on the device
auto in1_d = cms::alpakatools::make_device_buffer<float[]>(queue, size_);
auto in2_d = cms::alpakatools::make_device_buffer<float[]>(queue, size_);
auto out_d = cms::alpakatools::make_device_buffer<float[]>(queue, size_);

// copy the input data to the device
// FIXME: pass the explicit size of type uint32_t to avoid compilation error
// The destination view and the extent are required to have compatible index types!
alpaka::memcpy(queue, in1_d, in1_h, size_);
alpaka::memcpy(queue, in2_d, in2_h, size_);

// fill the output buffer with zeros
alpaka::memset(queue, out_d, 0);

// launch the 1-dimensional kernel for vector addition
HeterogeneousTestAlpakaDevicePlugins::wrapper_add_vectors_f(
queue, in1_d.data(), in2_d.data(), out_d.data(), size_);

// copy the results from the device to the host
alpaka::memcpy(queue, out_h, out_d);

// wait for all the operations to complete
alpaka::wait(queue);

// check the results
for (uint32_t i = 0; i < size_; ++i) {
float sum = in1_h[i] + in2_h[i];
assert(out_h[i] < sum + epsilon);
assert(out_h[i] > sum - epsilon);
}
}

std::cout << "All tests passed.\n";
}

} // namespace ALPAKA_ACCELERATOR_NAMESPACE

#include "HeterogeneousCore/AlpakaCore/interface/alpaka/MakerMacros.h"
DEFINE_FWK_ALPAKA_MODULE(AlpakaTestDeviceAdditionModule);
9 changes: 9 additions & 0 deletions HeterogeneousTest/AlpakaDevice/test/BuildFile.xml
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
<bin file="alpaka/testDeviceAddition.dev.cc" name="testAlpakaDeviceAddition">
<use name="catch2"/>
<use name="alpaka"/>
<use name="HeterogeneousTest/AlpakaDevice"/>
<use name="HeterogeneousCore/AlpakaInterface"/>
<flags ALPAKA_BACKENDS="1"/>
</bin>

<test name="testAlpakaTestDeviceAdditionModule" command="cmsRun ${LOCALTOP}/src/HeterogeneousTest/AlpakaDevice/test/testAlpakaTestDeviceAdditionModule.py"/>
102 changes: 102 additions & 0 deletions HeterogeneousTest/AlpakaDevice/test/alpaka/testDeviceAddition.dev.cc
Original file line number Diff line number Diff line change
@@ -0,0 +1,102 @@
#include <cstdint>
#include <random>
#include <vector>

#define CATCH_CONFIG_MAIN
#include <catch.hpp>

#include <alpaka/alpaka.hpp>

#include "HeterogeneousCore/AlpakaInterface/interface/config.h"
#include "HeterogeneousCore/AlpakaInterface/interface/devices.h"
#include "HeterogeneousCore/AlpakaInterface/interface/memory.h"
#include "HeterogeneousCore/AlpakaInterface/interface/workdivision.h"
#include "HeterogeneousTest/AlpakaDevice/interface/alpaka/DeviceAddition.h"

using namespace ALPAKA_ACCELERATOR_NAMESPACE;

struct KernelAddVectorsF {
template <typename TAcc>
ALPAKA_FN_ACC void operator()(TAcc const& acc,
const float* __restrict__ in1,
const float* __restrict__ in2,
float* __restrict__ out,
uint32_t size) const {
cms::alpakatest::add_vectors_f(acc, in1, in2, out, size);
}
};

TEST_CASE("HeterogeneousTest/AlpakaDevice test", "[alpakaTestDeviceAddition]") {
auto const& devices = cms::alpakatools::devices<Platform>();
if (devices.empty()) {
FAIL("No devices available for the " EDM_STRINGIZE(ALPAKA_ACCELERATOR_NAMESPACE) " backend, "
"the test will be skipped.");
}

// random number generator with a gaussian distribution
std::random_device rd{};
std::default_random_engine rand{rd()};
std::normal_distribution<float> dist{0., 1.};

// tolerance
constexpr float epsilon = 0.000001;

// buffer size
constexpr uint32_t size = 1024 * 1024;

// allocate input and output host buffers
std::vector<float> in1_h(size);
std::vector<float> in2_h(size);
std::vector<float> out_h(size);

// fill the input buffers with random data, and the output buffer with zeros
for (uint32_t i = 0; i < size; ++i) {
in1_h[i] = dist(rand);
in2_h[i] = dist(rand);
out_h[i] = 0.;
}

// run the test on all available devices
for (auto const& device : cms::alpakatools::devices<Platform>()) {
SECTION("Test add_vectors_f on " EDM_STRINGIZE(ALPAKA_ACCELERATOR_NAMESPACE) " backend") {
REQUIRE_NOTHROW([&]() {
Queue queue{device};

// allocate input and output buffers on the device
auto in1_d = cms::alpakatools::make_device_buffer<float[]>(queue, size);
auto in2_d = cms::alpakatools::make_device_buffer<float[]>(queue, size);
auto out_d = cms::alpakatools::make_device_buffer<float[]>(queue, size);

// copy the input data to the device
// FIXME: pass the explicit size of type uint32_t to avoid compilation error
// The destination view and the extent are required to have compatible index types!
alpaka::memcpy(queue, in1_d, in1_h, size);
alpaka::memcpy(queue, in2_d, in2_h, size);

// fill the output buffer with zeros
alpaka::memset(queue, out_d, 0);

// launch the 1-dimensional kernel for vector addition
alpaka::exec<Acc1D>(queue,
cms::alpakatools::make_workdiv<Acc1D>(32, 32),
KernelAddVectorsF{},
in1_d.data(),
in2_d.data(),
out_d.data(),
size);

// copy the results from the device to the host
alpaka::memcpy(queue, out_h, out_d, size);

// wait for all the operations to complete
alpaka::wait(queue);
}());

// check the results
for (uint32_t i = 0; i < size; ++i) {
float sum = in1_h[i] + in2_h[i];
CHECK_THAT(out_h[i], Catch::Matchers::WithinAbs(sum, epsilon));
}
}
}
}
Loading