[cudadev][RFC] Prototype (host|device)_unique_ptr API to use lightweight "Context" object instead of CUDA stream #256
Add runAcquire(), runProduce() functions
… in better-defined way
    class HostAllocatorContext {
    public:
      explicit HostAllocatorContext(cudaStream_t stream) : stream_(stream) {}

      void *allocate_host(size_t nbytes) const { return cms::cuda::allocate_host(nbytes, stream_); }

      void free_host(void *ptr) const { cms::cuda::free_host(ptr); }

    private:
      cudaStream_t stream_;
    };

    class DeviceAllocatorContext {
    public:
      explicit DeviceAllocatorContext(cudaStream_t stream) : stream_(stream) {}

      void *allocate_device(size_t nbytes) const { return cms::cuda::allocate_device(nbytes, stream_); }

      void free_device(void *ptr) const { cms::cuda::free_device(ptr, stream_); }

    private:
      cudaStream_t stream_;
    };
Right now (and possibly forever in cudadev) the HostAllocatorContext and DeviceAllocatorContext look nearly identical, but in the future (in CMSSW) they could hold a pointer to the CachingHostAllocator/CachingDeviceAllocator objects.
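A minimal sketch of what that future variant might look like, under stated assumptions: `ToyHostAllocator` is a hypothetical stand-in that only counts live allocations and does not reproduce the real CachingHostAllocator interface, and `StreamStandIn` replaces `cudaStream_t` so the example builds without the CUDA toolkit. The point is only that the context keeps the same `allocate_host`/`free_host` interface while reaching the allocator through a non-owning pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Stand-in for cudaStream_t (assumption) so the sketch builds without CUDA.
using StreamStandIn = void*;

// Hypothetical stand-in for a caching pinned-host allocator. It uses plain
// malloc/free and counts live allocations; it is NOT the real allocator API.
class ToyHostAllocator {
public:
  void* allocate(std::size_t nbytes, StreamStandIn /*stream*/) {
    ++live_;
    return std::malloc(nbytes);
  }

  void free(void* ptr) {
    assert(live_ > 0);
    --live_;
    std::free(ptr);
  }

  int live() const { return live_; }

private:
  int live_ = 0;
};

// Same member functions as the HostAllocatorContext in the diff, but the
// allocator is reached through a pointer instead of a global function.
class HostAllocatorContext {
public:
  HostAllocatorContext(ToyHostAllocator* allocator, StreamStandIn stream)
      : allocator_(allocator), stream_(stream) {}

  void* allocate_host(std::size_t nbytes) const { return allocator_->allocate(nbytes, stream_); }

  void free_host(void* ptr) const { allocator_->free(ptr); }

private:
  ToyHostAllocator* allocator_;  // non-owning; the owner must outlive the context
  StreamStandIn stream_;
};
```

With this shape, whoever owns the allocator object (in CMSSW, possibly a service) would hand out contexts, and calling code would not need to change.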
…uda::Context objects instead of cudaStream_t
f2054b5 to
c91e0e3
Made effectively obsolete by cms-sw/cmssw#39428 (although this particular development is not part of the CMSSW PR).
This PR builds on top of #224, but because of conflicts in the actual developments between the base commit of #224 and master, the #224 part is rebased as well. The actual developments of this PR are in the last three commits.

The change can be summarized as make_device_unique<T>(stream) changing to make_device_unique(ctx), where ctx can be e.g. the AcquireContext/ProduceContext, or a "lightweight" HostAllocatorContext/DeviceAllocatorContext/Context (the AcquireContext/ProduceContext are convertible to the latter Context objects). (I'm really overusing the term "Context" here, but haven't figured out better wording yet.)

The idea is that
- HostAllocatorContext provides access to the pinned host memory allocator (and only that)
- DeviceAllocatorContext provides access to the device memory allocator (and only that)
- Context provides access to both the pinned host and device memory allocators (via conversions to the two former types), and also whatever is needed to launch asynchronous kernels or memory transfers (in practice the CUDA stream)

This change would allow e.g.
- CachingDeviceAllocator and CachingHostAllocator objects to move from global variables to being owned (again) by CUDAService (in CMSSW only), which would further enable (again) the caching allocator parameters to be configured at run time
- … AcquireContext/ProduceContext for better performance (see discussion in [cudadev] Improve caching allocator performance #218)
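
The relationship between the three context types and a context-based make_device_unique can be sketched as follows. This is an illustration under assumptions, not the code of this PR: `StreamStandIn` replaces `cudaStream_t`, `std::malloc`/`std::free` stand in for `cms::cuda::allocate_device`/`free_device` and the pinned host allocator, and `DeviceDeleter` is a hypothetical name for a deleter that remembers the context it was created from.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <type_traits>

// Stand-in for cudaStream_t (assumption) so the sketch builds without CUDA.
using StreamStandIn = void*;

// Access to the pinned host allocator only (malloc/free as stand-ins).
class HostAllocatorContext {
public:
  explicit HostAllocatorContext(StreamStandIn stream) : stream_(stream) {}
  void* allocate_host(std::size_t nbytes) const { return std::malloc(nbytes); }
  void free_host(void* ptr) const { std::free(ptr); }
  StreamStandIn stream() const { return stream_; }

private:
  StreamStandIn stream_;
};

// Access to the device allocator only (malloc/free as stand-ins).
class DeviceAllocatorContext {
public:
  explicit DeviceAllocatorContext(StreamStandIn stream) : stream_(stream) {}
  void* allocate_device(std::size_t nbytes) const { return std::malloc(nbytes); }
  void free_device(void* ptr) const { std::free(ptr); }
  StreamStandIn stream() const { return stream_; }

private:
  StreamStandIn stream_;
};

// Access to both allocators via implicit conversions, plus the stream
// needed to launch asynchronous work.
class Context {
public:
  explicit Context(StreamStandIn stream) : stream_(stream) {}
  StreamStandIn stream() const { return stream_; }
  operator HostAllocatorContext() const { return HostAllocatorContext(stream_); }
  operator DeviceAllocatorContext() const { return DeviceAllocatorContext(stream_); }

private:
  StreamStandIn stream_;
};

// Hypothetical deleter: returns memory through the context it was built from.
class DeviceDeleter {
public:
  explicit DeviceDeleter(DeviceAllocatorContext ctx) : ctx_(ctx) {}
  void operator()(void* ptr) const { ctx_.free_device(ptr); }

private:
  DeviceAllocatorContext ctx_;
};

template <typename T>
using device_unique_ptr = std::unique_ptr<T, DeviceDeleter>;

// Takes the narrow DeviceAllocatorContext; a Context converts implicitly.
template <typename T>
device_unique_ptr<T> make_device_unique(DeviceAllocatorContext const& ctx) {
  static_assert(std::is_trivially_constructible<T>::value, "T is left uninitialized");
  void* mem = ctx.allocate_device(sizeof(T));
  assert(mem != nullptr);
  return device_unique_ptr<T>(static_cast<T*>(mem), DeviceDeleter(ctx));
}
```

Because the function asks only for a DeviceAllocatorContext, a caller holding the wider Context can pass it directly, while code that should only allocate device memory can be given the narrow type and nothing else.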