[alpaka] Caching allocators for host and device #248
abhinavramesh8 wants to merge 1 commit into cms-patatrack:master from
Conversation
Suggested change:
- #ifndef AlpakaDataFormats_SiPixelDigi_interface_SiPixelDigiErrorsCUDA_h
- #define AlpakaDataFormats_SiPixelDigi_interface_SiPixelDigiErrorsCUDA_h
+ #ifndef AlpakaDataFormats_SiPixelDigi_interface_SiPixelDigiErrorsAlpaka_h
+ #define AlpakaDataFormats_SiPixelDigi_interface_SiPixelDigiErrorsAlpaka_h
Suggested change:
- #ifndef AlpakaDataFormats_SiPixelDigi_interface_SiPixelDigisCUDA_h
- #define AlpakaDataFormats_SiPixelDigi_interface_SiPixelDigisCUDA_h
+ #ifndef AlpakaDataFormats_SiPixelDigi_interface_SiPixelDigisAlpaka_h
+ #define AlpakaDataFormats_SiPixelDigi_interface_SiPixelDigisAlpaka_h
/*
 * Descriptor for device memory allocations
 */
struct BlockDescriptor {
  ALPAKA_ACCELERATOR_NAMESPACE::AlpakaDeviceBuf<std::byte> buf;  // Device buffer
These classes should either be templated, or placed in an ALPAKA_ACCELERATOR_NAMESPACE namespace, similar to #236. Anything else violates the ODR.
In the long term, I believe templating would be preferable to the namespace trick.
I have some general thoughts
In light of #260, can this PR be closed then?
While we likely won't gain in performance from the caching allocator for CPU memory, we do need the stream-ordered behaviour for the "device" memory operations when using a non-blocking queue for a CPU backend (e.g. TBB).
Right, but would a non-blocking queue for a CPU backend be useful for a production use case? I'd naively expect a non-blocking queue to mostly add overhead, even in cases where intra-algorithm parallelization would otherwise be useful. That said, for testing purposes I agree it can be useful, and then a caching allocator (or some other mechanism to keep the temporary memory alive) would be needed.
No, indeed: from a performance point of view, we do not want to use a non-blocking queue in production. But so far the non-blocking TBB backend has been very useful for finding synchronisation bugs :-)
Moved to #260.
Caching memory allocators for host and device have been implemented for Alpaka, similar to the CUDA versions. Host and device unique pointers are provided for managing memory allocations. These pointers use the caching allocators by default, but this behaviour can be disabled at compile time. The existing codebase has been updated to make use of these unique pointers.