Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add async view memory resource bindings to Python. #1864

Merged
merged 5 commits into from
Mar 18, 2025

Conversation

bdice
Copy link
Contributor

@bdice bdice commented Mar 17, 2025

Description

Closes #1611.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

Copy link

copy-pr-bot bot commented Mar 17, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@github-actions github-actions bot added Python Related to RMM Python API cpp Pertains to C++ code labels Mar 17, 2025
@bdice
Copy link
Contributor Author

bdice commented Mar 17, 2025

/ok to test

@bdice
Copy link
Contributor Author

bdice commented Mar 17, 2025

Currently tests are failing locally and I am struggling to identify why. It seems like something is breaking when passing the pool from Python to C++?

@bdice
Copy link
Contributor Author

bdice commented Mar 17, 2025

/ok to test

@bdice bdice marked this pull request as ready for review March 17, 2025 20:27
@bdice bdice requested review from a team as code owners March 17, 2025 20:27
@bdice bdice requested review from harrism and wence- March 17, 2025 20:27
@bdice bdice added feature request New feature or request non-breaking Non-breaking change labels Mar 17, 2025
@bdice bdice self-assigned this Mar 17, 2025
@bdice
Copy link
Contributor Author

bdice commented Mar 18, 2025

This PR is out of date with the commits I've pushed. I will close and reopen to retrigger it.

@bdice bdice closed this Mar 18, 2025
@bdice bdice reopened this Mar 18, 2025
@@ -72,7 +72,8 @@ class cuda_async_view_memory_resource final : public device_memory_resource {
*/
[[nodiscard]] cudaMemPool_t pool_handle() const noexcept { return cuda_pool_handle_; }

cuda_async_view_memory_resource() = default;
cuda_async_view_memory_resource() = default;
~cuda_async_view_memory_resource() = default;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed a missing rule-of-6 destructor declaration here, which raised a warning in my IDE.


Parameters
----------
valid_pool_handle : cudaMemPool_t
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like the name valid_pool_handle but I copied it from C++. Should we consider renaming it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, though please mention somewhere in this docstring that the user is responsible for keeping the mempool alive

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in b7e899d. The new sentence says,

The memory pool passed in must not be destroyed during the lifetime of this memory resource.

raw_pool_handle = int(valid_pool_handle)
c_valid_pool_handle = <cyruntime.cudaMemPool_t><uintptr_t>raw_pool_handle
else:
raw_pool_handle = int(runtime.cudaMemPool_t(valid_pool_handle))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any typing we can apply to the constructor argument that would prevent someone passing something bad in here (or does this constructor do that for hs?)

Copy link
Contributor Author

@bdice bdice Mar 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I narrowed the constructor a bit in b7e899d. I am now forcing users to pass a cuda.bindings.runtime.cudaMemPool_t or cuda.bindings.driver.CUmemoryPool instance. The constructor no longer accepts raw ints as pointers to the corresponding types. I think that is much safer.

props.allocType = runtime.cudaMemAllocationType.cudaMemAllocationTypePinned
props.location.id = rmm._cuda.gpu.getDevice()
props.location.type = runtime.cudaMemLocationType.cudaMemLocationTypeDevice
err, pool = runtime.cudaMemPoolCreate(props)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test should also destroy the pool

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea! Tested in b7e899d. It also makes sure that the memory resource raises a MemoryError when asked to allocate after the pool is destroyed.

def test_cuda_async_view_memory_resource_default_pool(dtype, nelem, alloc):
# Get the default memory pool handle
current_device = rmm._cuda.gpu.getDevice()
err, pool = runtime.cudaDeviceGetDefaultMemPool(current_device)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this pool need to be released? I don't know how the default mempool behaves

Copy link
Contributor Author

@bdice bdice Mar 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so. It cannot be destroyed, per the docs. Attempting to destroy it returns an error code. 😄

Note:
A device's default memory pool cannot be destroyed.

https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__MEMORY__POOLS.html#group__CUDART__MEMORY__POOLS_1g709113128c1c52c3bf170022dc7723dd

Copy link
Contributor

@wence- wence- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had only minor nitpicks, so approving as is.

@bdice
Copy link
Contributor Author

bdice commented Mar 18, 2025

/merge

@rapids-bot rapids-bot bot merged commit c66ded5 into rapidsai:branch-25.04 Mar 18, 2025
64 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cpp Pertains to C++ code feature request New feature or request non-breaking Non-breaking change Python Related to RMM Python API
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

[FEA] Need a Python interface for accessing the C++ cuda_async_view_memory_resource
2 participants