[alpakatest][RFC] Prototype evolution of EDModule API #314
Conversation
@fwyzard This is the prototype I mentioned earlier (and apparently failed to open in draft mode...).
Rebased on top of
Could this be extended to better handle multiple backends with the same memory space? Currently we define a backend with

In principle we should have different execution options for the same memory space: CPU sync vs TBB sync, CUDA sync vs CUDA async, etc. Do you think the approach researched here could be used to have a single data product (both in terms of the dataformat type, and of the underlying memory buffer/SoA) shared among different execution cases? One concrete example would be having the CPU serial implementation for every module, and the TBB (serial) one only for some modules where the extra parallelism makes sense.
I think this approach would allow such an extension. There would certainly be many details to be worked out (like how to make the framework sufficiently aware of memory and execution spaces, including supporting multiple devices of the same type, but in a generic way), but I'd expect the user-facing interfaces to stay mostly the same. I also have the CUDA managed memory / SYCL shared memory in mind (for platforms that have a truly unified memory), in which case it would be nice if the downstream, alpaka-independent consumers could use directly the data product wrapped in

Of course, for any of this "using data products of one memory space in many backends" to work at all, the data product the EDProducer perceives to produce should be exactly the same type in all the backends for which this "sharing" is done (but IIUC you also wrote that). For the Serial/TBB backends, using the same product types should in principle be trivial (and therefore the setup should be straightforward if the TBB backend uses a synchronous queue).
OK, so we are thinking about:

At least for debugging, it might be useful to also support:

I'm starting to see why alpaka keeps the three concepts almost orthogonal...
Made effectively obsolete by cms-sw/cmssw#39428
This PR prototypes the Alpaka EDModule API, taking inspiration from #224 and #256. A major idea tested here was to see how far the system could be implemented with only forward-declared Alpaka device, queue, and event types, in order to minimize the set of source files that need to be compiled with the device compiler (I first crafted this prototype before the `ALPAKA_HOST_ONLY` macro).

The first commit extends the build rules by adding a new category of source files that need to be compiled for each Alpaka backend, but can be compiled with the host compiler. This functionality might be beneficial on a wider scope than this PR alone (so I could open a separate PR with only it). Here I took the approach of using a new file extension, `.acc` ("a" for e.g. "accelerated"), for the files that need to be compiled with the device compiler; the `.cc` files can be compiled with the host compiler. I'm not advocating for this particular choice as I'm not very fond of it, but I needed something to get on with the prototype.

I don't think we should apply this PR as is, but identify the constructs that would be useful, and pick those (and improve the rest).
One idea here was to hide the `cms::alpakatools::Product<T>` from users (having to explicitly interact with the `ScopedContext` to get the `T` is annoying). In addition, for the CPU Serial backend (synchronous, operates in regular host memory) the `Product<T>` wrapper is not used (because it is not really needed). In this way the downstream code could use the data products from the Serial backend directly. For developers the setup would look like the following (see the sketch below):
- products in the backend's memory space are consumed with `edm::EDGetTokenT<T>` and produced with `edm::EDPutTokenT<T>` (i.e. they look like normal products)
- products in host memory are consumed with `edm::EDGetTokenT<edm::Host<T>>` and produced with `edm::EDPutTokenT<edm::Host<T>>`
- `edm::Host<T>` is just a "tag", not an actual product type.
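A minimal, self-contained sketch of how such token declarations might look from the developer's side (the stand-in `edm` declarations and the product/class names are mine, purely for illustration):

```cpp
// Stand-in declarations replacing the real framework headers.
namespace edm {
  template <typename T> class EDGetTokenT {};
  template <typename T> class EDPutTokenT {};
  template <typename T> struct Host {};  // tag type only, never a real product
}

struct DeviceClusters {};  // hypothetical product in the backend's memory space
struct HostSummary {};     // hypothetical product in host memory

class ExampleProducer {
  // looks like a normal product; on asynchronous backends it gets wrapped behind the scenes
  edm::EDGetTokenT<DeviceClusters> clustersToken_;
  edm::EDPutTokenT<DeviceClusters> outputToken_;
  // the edm::Host<T> tag marks a product that lives in host memory
  edm::EDGetTokenT<edm::Host<HostSummary>> summaryInToken_;
  edm::EDPutTokenT<edm::Host<HostSummary>> summaryOutToken_;
};

int main() {
  ExampleProducer producer;
  (void)producer;
}
```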
Internally this setup works such that for the CPU Serial backend the `edm::Host<...>` part is ignored, while for the other backends (sketched below):
- `edm::EDGetTokenT<T>` is mapped to `edm::EDGetTokenT<edm::Product<T>>`
- `edm::EDGetTokenT<edm::Host<T>>` is mapped to `edm::EDGetTokenT<T>`
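One way to picture this mapping is as a compile-time type function selected per backend; the following is my own toy illustration of the idea, not the helper classes used in the PR:

```cpp
// Self-contained toy of the per-backend token mapping; `isSerial` stands for
// "CPU Serial backend". All names here are stand-ins.
#include <type_traits>

namespace edm {
  template <typename T> struct Host {};        // tag only
  template <typename T> struct Product {};     // wrapper used by asynchronous backends
  template <typename T> struct EDGetTokenT {};
}

// Primary template: on asynchronous backends a plain T is fetched as edm::Product<T>.
template <typename T, bool isSerial>
struct MapGetToken {
  using type = edm::EDGetTokenT<edm::Product<T>>;
};
// edm::Host<T> maps to the bare type T.
template <typename T, bool isSerial>
struct MapGetToken<edm::Host<T>, isSerial> {
  using type = edm::EDGetTokenT<T>;
};
// On the Serial backend the wrapper is not used at all...
template <typename T>
struct MapGetToken<T, true> {
  using type = edm::EDGetTokenT<T>;
};
// ...and the Host<T> tag is simply ignored.
template <typename T>
struct MapGetToken<edm::Host<T>, true> {
  using type = edm::EDGetTokenT<T>;
};

int main() {
  struct SoA {};
  static_assert(std::is_same_v<MapGetToken<SoA, false>::type,
                               edm::EDGetTokenT<edm::Product<SoA>>>);
  static_assert(std::is_same_v<MapGetToken<edm::Host<SoA>, false>::type,
                               edm::EDGetTokenT<SoA>>);
  static_assert(std::is_same_v<MapGetToken<SoA, true>::type,
                               edm::EDGetTokenT<SoA>>);
}
```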
For this setup to work, an `ALPAKA_ACCELERATOR_NAMESPACE::Event` class is defined to be used in the EDModules instead of `edm::Event`. It wraps the `edm::Event`, and implements the aforementioned mapping logic (on both the getting and the putting side) with a set of helper classes that are specialized for the backends. The `ALPAKA_ACCELERATOR_NAMESPACE::EDProducer(ExternalWork)` class implements the (reverse) mapping logic for the `consumes()` and `produces()` side.
The `cms::alpakatools::Product<TQueue, T>` is transformed into an `edm::Product<T>` that can hold arbitrary metadata via type erasure (currently `std::any`, for demonstration purposes). For Alpaka EDModules an `ALPAKA_ACCELERATOR_NAMESPACE::ProductMetadata` class is defined for this metadata. These classes also took over some of the functionality of `ScopedContext` that seems to fit better there in this abstraction model (the `kokkos` version actually has a similar structure here).
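A toy, self-contained version of such a type-erased product wrapper, with assumed names and far simpler than the real thing:

```cpp
// Minimal illustration of a product wrapper carrying backend-specific metadata via std::any.
#include <any>
#include <cassert>
#include <string>
#include <utility>

namespace edm {
  template <typename T>
  class Product {
  public:
    Product(T data, std::any metadata) : data_(std::move(data)), metadata_(std::move(metadata)) {}

    T const& data() const { return data_; }
    std::any const& metadata() const { return metadata_; }

  private:
    T data_;
    std::any metadata_;  // e.g. ALPAKA_ACCELERATOR_NAMESPACE::ProductMetadata in the prototype
  };
}

// Stand-in for backend-specific metadata (the real one would hold queue/event state).
struct FakeProductMetadata {
  std::string backend;
};

int main() {
  edm::Product<int> product(42, FakeProductMetadata{"serial"});
  auto const& meta = std::any_cast<FakeProductMetadata const&>(product.metadata());
  assert(meta.backend == "serial");
  assert(product.data() == 42);
}
```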
The `ScopedContext` class structure is completely reorganized, and is now completely hidden from the developers. There is now an `ALPAKA_ACCELERATOR_NAMESPACE::impl::FwkContextBase` base class for the functionality common to ED modules and ES modules (although the latter are not exercised in this prototype, so this is what I believe the common functionality to be). The `ALPAKA_ACCELERATOR_NAMESPACE::EDContext` class derives from `FwkContextBase` and adds ED-specific functionality. I guess `FwkContextBase` and `EDContext` could also be implemented as templates instead of placing them into `ALPAKA_ACCELERATOR_NAMESPACE` (they are hidden from developers anyway).
A third context class, `ALPAKA_ACCELERATOR_NAMESPACE::Context`, is defined to be given to the developers (via an `EDModule::produce()` argument). It gives access to the `Queue` object. Internally it also signals to the `FwkContextBase` when the `Queue` has been asked for by the developer, so that if the EDModule accesses its input products for the first time after that point, it won't try to re-use the `Queue` from the input product (because the initially assigned `Queue` is already being used). This `Context` class can be extended later, e.g. along the lines of #256.
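A rough mock of the developer-facing `Context` idea, with hypothetical names, showing the queue access and the "queue has been handed out" signal:

```cpp
// Self-contained sketch: produce() receives a Context that hands out the Queue and
// records that it has been handed out, so the framework side knows the queue is in use.
#include <cassert>

class Queue {};  // stand-in for the backend's alpaka queue

class Context {
public:
  explicit Context(Queue& queue) : queue_(&queue) {}

  Queue& queue() {
    queueRequested_ = true;  // signal to the framework-side context (FwkContextBase in the PR)
    return *queue_;
  }

  bool queueRequested() const { return queueRequested_; }

private:
  Queue* queue_;
  bool queueRequested_ = false;
};

// A module's produce() would receive the Context as an argument.
void produce(Context& ctx) {
  Queue& q = ctx.queue();  // after this, inputs read later will not donate their queue
  (void)q;                 // ... enqueue kernels on q ...
}

int main() {
  Queue q;
  Context ctx(q);
  produce(ctx);
  assert(ctx.queueRequested());
}
```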
One additional piece that would reduce the number of places where `edm::Host<T>` appears in user code, but is not prototyped here, would be automating the (mainly device-to-host) transfers. As long as the type `T` can be arbitrary, the framework needs to be told how to transfer that type between two memory spaces (e.g. something along the lines of a plugin factory for functions), but at least these transfers would no longer have to be expressed in the configuration.
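One possible shape for such a mechanism, purely as a sketch with assumed names, is a registry of type-erased device-to-host transfer functions keyed by the product type:

```cpp
// Toy registry of per-type transfer functions, conceptually similar to a plugin factory.
#include <any>
#include <cassert>
#include <functional>
#include <typeindex>
#include <typeinfo>
#include <unordered_map>
#include <vector>

class TransferRegistry {
public:
  template <typename T>
  void registerTransfer(std::function<T(T const&)> deviceToHost) {
    transfers_[std::type_index(typeid(T))] = [f = std::move(deviceToHost)](std::any const& in) {
      return std::any(f(std::any_cast<T const&>(in)));
    };
  }

  // The framework would call this when a host copy of a device product is needed.
  std::any deviceToHost(std::any const& deviceProduct) const {
    return transfers_.at(std::type_index(deviceProduct.type()))(deviceProduct);
  }

private:
  std::unordered_map<std::type_index, std::function<std::any(std::any const&)>> transfers_;
};

int main() {
  using Clusters = std::vector<int>;
  TransferRegistry registry;
  // In a real backend this function would issue the actual device-to-host copy.
  registry.registerTransfer<Clusters>([](Clusters const& onDevice) { return onDevice; });

  std::any device = Clusters{1, 2, 3};
  auto host = std::any_cast<Clusters>(registry.deviceToHost(device));
  assert(host.size() == 3);
}
```

In a real implementation the registered functions would presumably enqueue asynchronous copies on the product's queue rather than copy synchronously, but the registration pattern would look similar.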