
Conversation

@makortel
Collaborator

This PR prototypes the Alpaka EDModule API, taking inspiration from #224 and #256. One major idea tested here was to see how far the system could be implemented with only forward-declared Alpaka device, queue, and event types, in order to minimize the set of source files that need to be compiled with the device compiler (I first crafted this prototype before the ALPAKA_HOST_ONLY macro existed).

The first commit extends the build rules by adding a new category of source files that need to be compiled for each Alpaka backend but can be compiled with the host compiler. This functionality might also be beneficial on a wider scope than this PR alone (so I could open a separate PR with only that part). Here I took the approach of using a new file extension, .acc ("a" for e.g. "accelerated"), for the files that need to be compiled with the device compiler, while the .cc files can be compiled with the host compiler. I'm not advocating for this particular choice as I'm not very fond of it, but I needed something to get on with the prototype.

I don't think we should apply this PR as is, but rather identify the constructs that would be useful, pick those, and improve the rest.

One idea here was to hide the cms::alpakatools::Product<T> from users (having to explicitly interact with the ScopedContext to get the T is annoying). In addition, for the CPU serial backend (synchronous, operates in regular host memory) the Product<T> wrapper is not used (because it is not really needed), so downstream code could use the data products from the Serial backend directly. For developers the setup would look like the following (see the sketch after the list):

  • The data products in the memory space of the current backend are consumed with edm::EDGetTokenT<T> and produced with edm::EDPutTokenT<T> (i.e. they look like normal products)
  • The data products from the "non-portable memory space" are consumed with edm::EDGetTokenT<edm::Host<T>> and produced with edm::EDPutTokenT<edm::Host<T>>
    • The edm::Host<T> is just a "tag", not an actual product type.
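
A minimal sketch of what a developer-facing producer could look like under this scheme. This is not code from this PR; the product types, module name, and the exact constructor/produce() signatures are illustrative assumptions, only the token/tag pattern reflects the description above.

```cpp
// Hypothetical sketch only: InputDigis, InputConditions, OutputHits, makeHits() and the
// exact base-class interface are assumptions, not the actual PR code.
namespace ALPAKA_ACCELERATOR_NAMESPACE {

  class TestHitProducer : public EDProducer {
  public:
    TestHitProducer()
        : conditionsToken_{consumes<edm::Host<InputConditions>>()},  // host-memory ("non-portable") input
          digisToken_{consumes<InputDigis>()},                       // input in this backend's memory space
          hitsToken_{produces<OutputHits>()} {}                      // output in this backend's memory space

    void produce(Event& event, Context& ctx) override {
      auto const& conditions = event.get(conditionsToken_);  // plain T, no Product<T> wrapper visible
      auto const& digis = event.get(digisToken_);            // wrapping/unwrapping handled internally
      OutputHits hits = makeHits(digis, conditions, ctx.queue());  // kernels enqueued on ctx.queue()
      event.put(hitsToken_, std::move(hits));
    }

  private:
    edm::EDGetTokenT<edm::Host<InputConditions>> conditionsToken_;
    edm::EDGetTokenT<InputDigis> digisToken_;
    edm::EDPutTokenT<OutputHits> hitsToken_;
  };

}  // namespace ALPAKA_ACCELERATOR_NAMESPACE
```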

Internally this setup works such that for the CPU Serial backend the edm::Host<...> part is simply ignored, while for the other backends

  • The edm::EDGetTokenT<T> is mapped to edm::EDGetTokenT<edm::Product<T>>
  • The edm::EDGetTokenT<edm::Host<T>> is mapped to edm::EDGetTokenT<T>

For this setup to work, an ALPAKA_ACCELERATOR_NAMESPACE::Event class is defined to be used in the EDModules instead of edm::Event. It wraps the edm::Event and implements the aforementioned mapping logic (on both the getting and putting side) with a set of helper classes that are specialized for the backends. The ALPAKA_ACCELERATOR_NAMESPACE::EDProducer(ExternalWork) class implements the (reverse) mapping logic for the consumes() and produces() side. A rough sketch of the mapping idea follows.
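
Roughly, one can think of the mapping as a backend-specialized trait from the developer-visible type to the type actually stored in the edm::Event. The sketch below is an illustration of that idea with assumed names, not the PR's actual helper classes:

```cpp
// Illustration only: what the developer-visible type maps to in the edm::Event.
namespace edm {
  template <typename T> class Product;   // wrapper holding T plus synchronization metadata
  template <typename T> struct Host {};  // tag type only, never an actual product
}

namespace ALPAKA_ACCELERATOR_NAMESPACE::detail {
  // Asynchronous backends: a plain T is stored wrapped, edm::Host<T> means the bare host product.
  template <typename T>
  struct StoredProduct {
    using type = edm::Product<T>;
  };
  template <typename T>
  struct StoredProduct<edm::Host<T>> {
    using type = T;
  };
  // For the CPU Serial backend both cases would collapse to plain T
  // (e.g. via backend-specific specializations of these helpers).
}
```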

The cms::alpakatools::Product<TQueue, T> is transformed into an edm::Product<T> that can hold arbitrary metadata via type erasure (currently std::any, for demonstration purposes). For Alpaka EDModules an ALPAKA_ACCELERATOR_NAMESPACE::ProductMetadata class is defined for this metadata purpose. This class also took over some of the functionality of ScopedContext that seems to fit better there in this abstraction model (the Kokkos version actually has a similar structure here).
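
A condensed sketch of the type-erased wrapper idea (assumed member names; the real class in the PR carries more synchronization machinery):

```cpp
#include <any>
#include <utility>

namespace edm {
  // Illustration only: the wrapper itself no longer depends on the Alpaka queue type;
  // backend-specific state (e.g. ALPAKA_ACCELERATOR_NAMESPACE::ProductMetadata) hides
  // behind std::any.
  template <typename T>
  class Product {
  public:
    Product(T data, std::any metadata) : data_{std::move(data)}, metadata_{std::move(metadata)} {}

    // A backend-aware consumer casts the metadata back to the concrete type to synchronize.
    template <typename M>
    M const& metadata() const { return std::any_cast<M const&>(metadata_); }

    T const& data() const { return data_; }  // assumes the caller has synchronized appropriately

  private:
    T data_;
    std::any metadata_;
  };
}
```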

The ScopedContext class structure is completely reorganized and is now hidden from the developers. There is now an ALPAKA_ACCELERATOR_NAMESPACE::impl::FwkContextBase base class for the functionality common to ED modules and ES modules (although the latter is not exercised in this prototype, so this is what I believe to be the common functionality). The ALPAKA_ACCELERATOR_NAMESPACE::EDContext class derives from FwkContextBase and adds ED-specific functionality (a skeleton of the hierarchy is sketched below). I guess FwkContextBase and EDContext could also be implemented as templates instead of being placed in ALPAKA_ACCELERATOR_NAMESPACE (they are hidden from developers anyway).
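
A bare skeleton of how the internal hierarchy could be pictured; member names and responsibilities here are guesses for illustration, not the PR's code:

```cpp
#include <memory>

namespace ALPAKA_ACCELERATOR_NAMESPACE {
  class Queue;  // forward declaration, in line with the forward-declaration approach of this PR

  namespace impl {
    // Common part for ED and ES modules: owns (or re-uses) the Queue for this module call.
    class FwkContextBase {
    public:
      Queue& queue();  // lazily created, or re-used from an input product when possible

    protected:
      bool queueGivenToUser_ = false;  // set once the developer-facing Context exposes the queue
      std::shared_ptr<Queue> queue_;
    };
  }

  // ED-specific part: knows about the edm::Event, the token mapping, and attaches the
  // ProductMetadata to the products being put.
  class EDContext : public impl::FwkContextBase {
    // ...
  };
}
```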

A third context class, ALPAKA_ACCELERATOR_NAMESPACE::Context, is defined to be given to the developers (via the EDModule::produce() argument). It gives access to the Queue object. Internally it also signals to the FwkContextBase when the developer has asked for the Queue, so that if the EDModule accesses its input products for the first time after that point, it won't try to re-use the Queue from the input product (because the initially assigned Queue is already being used). This Context class can later be extended, e.g. along the lines of #256.
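
The signaling could be as simple as the following; the names (in particular markQueueAsUsed()) are hypothetical, only the idea is from the description above:

```cpp
namespace ALPAKA_ACCELERATOR_NAMESPACE {
  // Illustration only: the developer-facing Context forwards the queue request to the
  // internal context, which from that point on stops trying to re-use a Queue from
  // input products (work may already be enqueued on the assigned Queue).
  class Context {
  public:
    explicit Context(EDContext& internal) : internal_{internal} {}

    Queue& queue() {
      internal_.markQueueAsUsed();  // hypothetical call disabling queue re-use from inputs
      return internal_.queue();
    }

  private:
    EDContext& internal_;
  };
}
```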

One additional piece that would reduce the number of places where edm::Host<T> appears in user code, but is not prototyped here, would be automating the (mainly device-to-host) transfers. As long as the type T can be arbitrary, the framework needs to be told how to transfer that type between two memory spaces (e.g. something along the lines of a plugin factory for functions), but at least these transfers would not have to be expressed in the configuration anymore.
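
For a flavor of what such a "plugin factory for functions" could look like, consider the following made-up registry; all names and signatures are assumptions, nothing like this is implemented in the PR:

```cpp
#include <any>
#include <functional>
#include <typeindex>
#include <unordered_map>
#include <utility>

namespace ALPAKA_ACCELERATOR_NAMESPACE {
  class Queue;  // forward declaration, as elsewhere in the PR

  // Hypothetical registry: for each device-side product type, a function that copies it
  // into the corresponding host-side product on the given queue. The framework would call
  // this when a module consumes edm::Host<T> of a product that exists only in device memory.
  class ToHostCopierRegistry {
  public:
    using Copier = std::function<std::any(std::any const& deviceProduct, Queue& queue)>;

    void add(std::type_index deviceType, Copier copier) {
      copiers_.emplace(deviceType, std::move(copier));
    }

    std::any copyToHost(std::type_index deviceType, std::any const& deviceProduct, Queue& queue) const {
      return copiers_.at(deviceType)(deviceProduct, queue);
    }

  private:
    std::unordered_map<std::type_index, Copier> copiers_;
  };
}
```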

@makortel
Collaborator Author

@fwyzard This is the prototype I mentioned earlier (and apparently I failed to open it in draft mode...).

@makortel makortel force-pushed the alpakatestFramework_v3 branch from 6b4fa18 to 2275971 on March 2, 2022 21:40
@makortel
Collaborator Author

makortel commented Mar 2, 2022

Rebased on top of master to fix conflicts in src/alpakatest/Makefile.

@makortel makortel added the alpaka label Mar 4, 2022
@fwyzard
Contributor

fwyzard commented Sep 4, 2022

Could this be extended to better handle multiple backends with the same memory space?

Currently we define a backend with

  • the memory space of the "accelerator" (host vs cuda vs rocm)
  • how the accelerator runs a kernel (e.g. CPU serial vs TBB)
  • how the host enqueues the work (blocking/sync vs non-blocking/async)

In principle we should have different execution options for the same memory space: cpu sync vs tbb sync, cuda sync vs cuda async, etc.

Do you think the approach researched here could be used to have a single data product (both in terms of the data format type, and of the underlying memory buffer/SoA) shared among different execution cases?

One concrete example would be having the CPU serial implementation for every module, and the TBB (serial) only for some modules where the extra parallelism makes sense.

@makortel
Collaborator Author

makortel commented Sep 8, 2022

> Could this be extended to better handle multiple backends with the same memory space?
> ...
> Do you think the approach researched here could be used to have a single data product (both in terms of the data format type, and of the underlying memory buffer/SoA) shared among different execution cases?

I think this approach would allow such an extension. There would certainly be many details to work out (like how to make the framework sufficiently aware of memory and execution spaces, including supporting multiple devices of the same type, but in a generic way). But I'd expect the user-facing interfaces to stay mostly the same.

I also have CUDA managed memory / SYCL shared memory in mind (for platforms that have truly unified memory), in which case it would be nice if the downstream, alpaka-independent consumers could directly use the data product wrapped in edm::Product (as it is called here) after a proper synchronization. With the edm::Product<T> class template being part of the framework, we could peek in there (like with edm::View).

Of course, for any of this "using data products of one memory space in many backends" to work at all, the data product the EDProducer appears to produce would have to be exactly the same type in all the backends among which the "sharing" is done (but IIUC you also wrote that).

For the Serial/TBB backends, using the same product types should, in principle, be trivial (and therefore the setup should be straightforward if the TBB backend uses a synchronous queue).

@fwyzard
Contributor

fwyzard commented Sep 8, 2022

OK, so we are thinking about:

  • unified memory / shared memory: different "accelerators" (cpu vs gpu), in the same memory space (unified addressing space accessible from all devices), with different queue types (sync vs async);
  • serial execution vs internal parallelism with TBB: different "accelerators" (cpu serial vs cpu parallel), in the same memory space (host memory), with the same queue type (sync).

At least for debugging, it might be useful to also support:

  • sync GPU queues, async CPU queues: a given "accelerator", with its given memory space, but with both queue types (sync and async).

I'm starting to see why alpaka keeps the three concepts almost orthogonal...

@makortel
Collaborator Author

Made effectively obsolete by cms-sw/cmssw#39428

@makortel makortel closed this Sep 16, 2022