[alpaka] Caching allocators for host and device #248

Closed
abhinavramesh8 wants to merge 1 commit into cms-patatrack:master from abhinavramesh8:consolidated_alpaka_caching_allocator

Conversation


abhinavramesh8 commented Oct 19, 2021

Caching memory allocators for host and device have been implemented for Alpaka, similar to the CUDA version. Host and device unique pointers are provided for managing memory allocations. These pointers use the caching allocators by default, but this behaviour can be disabled at compile time. The existing codebase has been updated to make use of these unique pointers.
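The mechanism described above can be sketched in plain C++. All names here are illustrative, not the PR's actual API: a small caching allocator keeps freed blocks in a free list, and a unique-pointer alias uses a deleter that returns memory to the cache instead of freeing it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <map>
#include <memory>

// Hypothetical sketch of a caching host allocator: freed blocks are kept
// in a free list, keyed by size, and reused on later allocations.
class CachingAllocator {
public:
  void* allocate(std::size_t bytes) {
    auto it = free_blocks_.lower_bound(bytes);
    if (it != free_blocks_.end()) {  // reuse a cached block of sufficient size
      void* p = it->second;
      sizes_[p] = it->first;
      free_blocks_.erase(it);
      return p;
    }
    void* p = std::malloc(bytes);    // fall back to a real allocation
    sizes_[p] = bytes;
    return p;
  }
  void deallocate(void* p) {         // return the block to the cache
    free_blocks_.emplace(sizes_.at(p), p);
  }
  std::size_t cached() const { return free_blocks_.size(); }
  ~CachingAllocator() {
    for (auto& [size, p] : free_blocks_) std::free(p);
  }

private:
  std::multimap<std::size_t, void*> free_blocks_;
  std::map<void*, std::size_t> sizes_;
};

CachingAllocator& hostAllocator() {
  static CachingAllocator alloc;
  return alloc;
}

// Deleter that hands memory back to the cache instead of freeing it.
struct CachingDeleter {
  void operator()(void* p) const { hostAllocator().deallocate(p); }
};

// Unique-pointer alias in the spirit of the PR's host unique pointers.
template <typename T>
using host_unique_ptr = std::unique_ptr<T, CachingDeleter>;

// Returns uninitialized storage for n trivially-destructible elements.
template <typename T>
host_unique_ptr<T> make_host_unique(std::size_t n) {
  return host_unique_ptr<T>(static_cast<T*>(hostAllocator().allocate(n * sizeof(T))));
}
```

A compile-time switch (as the PR description mentions) could then select between `CachingDeleter` and a plain freeing deleter in the alias.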

Comment on lines +1 to +2
#ifndef AlpakaDataFormats_SiPixelDigi_interface_SiPixelDigiErrorsCUDA_h
#define AlpakaDataFormats_SiPixelDigi_interface_SiPixelDigiErrorsCUDA_h

Suggested change
#ifndef AlpakaDataFormats_SiPixelDigi_interface_SiPixelDigiErrorsCUDA_h
#define AlpakaDataFormats_SiPixelDigi_interface_SiPixelDigiErrorsCUDA_h
#ifndef AlpakaDataFormats_SiPixelDigi_interface_SiPixelDigiErrorsAlpaka_h
#define AlpakaDataFormats_SiPixelDigi_interface_SiPixelDigiErrorsAlpaka_h

Comment on lines +1 to +2
#ifndef AlpakaDataFormats_SiPixelDigi_interface_SiPixelDigisCUDA_h
#define AlpakaDataFormats_SiPixelDigi_interface_SiPixelDigisCUDA_h

Suggested change
#ifndef AlpakaDataFormats_SiPixelDigi_interface_SiPixelDigisCUDA_h
#define AlpakaDataFormats_SiPixelDigi_interface_SiPixelDigisCUDA_h
#ifndef AlpakaDataFormats_SiPixelDigi_interface_SiPixelDigisAlpaka_h
#define AlpakaDataFormats_SiPixelDigi_interface_SiPixelDigisAlpaka_h

/*
 * Descriptor for device memory allocations
 */
struct BlockDescriptor {
ALPAKA_ACCELERATOR_NAMESPACE::AlpakaDeviceBuf<std::byte> buf; // Device buffer

These classes should either be templated, or placed in a ALPAKA_ACCELERATOR_NAMESPACE namespace, similar to #236. Anything else violates ODR.

In the long term templating would be preferred over the namespace trick, I believe.
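The two options named in the comment can be illustrated with a minimal sketch (types and namespace names here are stand-ins, not the actual Alpaka ones):

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Option 1: wrap the class in a per-backend namespace, so each backend
// compiles a distinct type and the one-definition rule is not violated.
// In the real code this would be ALPAKA_ACCELERATOR_NAMESPACE.
namespace alpaka_serial_sync {
  struct BlockDescriptor {
    std::unique_ptr<std::byte[]> buf;  // stand-in for AlpakaDeviceBuf<std::byte>
    std::size_t bytes = 0;
  };
}

// Option 2 (preferred long-term per the comment): template the class on the
// device type, so a single definition serves every backend.
template <typename TDev>
struct BlockDescriptorT {
  TDev device;                       // the device the buffer lives on
  std::unique_ptr<std::byte[]> buf;  // stand-in for a device buffer type
  std::size_t bytes = 0;
};
```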

fwyzard added the alpaka label Oct 19, 2021
@makortel
Collaborator

I have some general thoughts

  • The approach of this PR could be useful for early performance studies (after the ODR is addressed in some way, regardless of what else I write below)
  • In light of a deployment in CMSSW, I think the Event and EventSetup data formats should be independent of Alpaka (at compile time). This would imply a smart-pointer type similar to std::shared_ptr that erases (or hides) the exact type of the deleter.
  • I would imagine caching allocations for non-pinned host memory to only add overhead
  • Which makes me wonder whether, instead of using Alpaka's API in the caching allocator, it would be better to use CUDA directly. Or use Alpaka's API for caching device (and pinned host) memory allocations, but bypass the caching allocator for host backends?
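The second bullet's idea, a data format that does not depend on the backend at compile time, maps naturally onto a type-erased deleter. The following is a minimal sketch (all names hypothetical): `std::shared_ptr` erases its deleter's type at construction, so the product class never names the backend's allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical sketch: a data product that owns backend memory through a
// std::shared_ptr whose deleter type is erased at construction time.
// The data format itself never mentions Alpaka or the caching allocator.
struct DigiData {
  std::shared_ptr<unsigned int[]> adc;  // deleter type is erased here
  std::size_t size = 0;
};

// Backend-specific factory: only this translation unit needs to know how the
// memory was allocated (plain new[] here; a caching allocator in reality).
// The `freed` flag exists only to make the behaviour observable in a test.
inline DigiData makeDigiData(std::size_t n, bool* freed) {
  auto deleter = [freed](unsigned int* p) {
    delete[] p;                 // return memory to the backend allocator
    if (freed) *freed = true;   // record that the deleter ran
  };
  return DigiData{std::shared_ptr<unsigned int[]>(new unsigned int[n], deleter), n};
}
```

A hand-rolled `unique_ptr`-like type with an erased deleter would avoid `shared_ptr`'s reference-counting overhead, but `shared_ptr` shows the principle most compactly.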

@makortel
Collaborator

makortel commented Nov 2, 2021

In light of #260 can this PR be closed then?

@fwyzard
Contributor

fwyzard commented Nov 4, 2021

I would imagine caching allocations for non-pinned host memory to only add overhead

While we likely won't gain in performance from the caching allocator for CPU memory, we do need the stream-ordered behaviour for the "device" memory operations when using a non-blocking queue for a CPU backend (e.g. TBB).
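The stream-ordered behaviour mentioned here can be sketched without any CUDA or Alpaka types (all names illustrative, with `std::shared_future` standing in for a queue event): a freed block is not returned to the pool immediately, but only once an event recorded on the queue at free time has completed, so in-flight work on a non-blocking queue can still safely use it.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <vector>

// Hypothetical sketch of stream-ordered deallocation.
struct CachedBlock {
  void* ptr;
  std::shared_future<void> ready;  // completes when pending queue work is done
};

class StreamOrderedPool {
public:
  // Free p in stream order: the block becomes reusable only after `ready`.
  void freeAsync(void* p, std::shared_future<void> ready) {
    cache_.push_back({p, std::move(ready)});
  }

  // Hand out a cached block only if its event has already completed.
  void* tryReuse() {
    for (auto it = cache_.begin(); it != cache_.end(); ++it) {
      if (it->ready.wait_for(std::chrono::seconds(0)) == std::future_status::ready) {
        void* p = it->ptr;
        cache_.erase(it);
        return p;
      }
    }
    return nullptr;  // nothing is safe to reuse yet
  }

private:
  std::vector<CachedBlock> cache_;
};
```

With a blocking (synchronous) queue every event is already complete at free time, so this degenerates to an ordinary cache; the deferred reuse only matters for non-blocking queues such as the TBB backend mentioned above.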

@makortel
Collaborator

makortel commented Nov 4, 2021

I would imagine caching allocations for non-pinned host memory to only add overhead

While we likely won't gain in performance from the caching allocator for CPU memory, we do need the stream-ordered behaviour for the "device" memory operations when using a non-blocking queue for a CPU backend (e.g. TBB).

Right, but would a non-blocking queue for a CPU backend be useful for the production use case? I'd naively expect a non-blocking queue to mostly add overhead, even in cases where intra-algorithm parallelization would otherwise be useful.

That said, for testing purposes I agree it can be useful, and then a caching allocator (or another mechanism to keep the temporary memory alive) would be needed.

@fwyzard
Contributor

fwyzard commented Nov 4, 2021

Right, but would a non-blocking queue for a CPU backend be useful for the production use case? I'd naively expect a non-blocking queue to mostly add overhead, even in cases where intra-algorithm parallelization would otherwise be useful.

No, indeed - from a performance point of view we do not want to use a non-blocking queue in production... especially with our acquire()/produce() mechanism.

But so far the non-blocking TBB backend has been very useful for finding synchronisation bugs :-)


@fwyzard
Contributor

fwyzard commented Nov 4, 2021

Moved to #260 .

@fwyzard fwyzard closed this Nov 4, 2021
