Make the stream module a part of the public API #1775

Open · wants to merge 3 commits into base: branch-25.02
Conversation

@Matt711 Matt711 commented Jan 2, 2025

Description

Closes #1770.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@Matt711 Matt711 self-assigned this Jan 2, 2025

copy-pr-bot bot commented Jan 2, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@github-actions github-actions bot added CMake Python Related to RMM Python API labels Jan 2, 2025
@Matt711 Matt711 added feature request New feature or request non-breaking Non-breaking change CMake Python Related to RMM Python API and removed CMake Python Related to RMM Python API labels Jan 2, 2025
@Matt711 Matt711 marked this pull request as ready for review January 2, 2025 17:56
@Matt711 Matt711 requested review from a team as code owners January 2, 2025 17:56
Comment on lines +69 to +111
# @singledispatchmethod
# def _init_from_stream(self, obj):
#     if obj is None:
#         self._init_with_new_cuda_stream()
#         return
#     try:
#         protocol = getattr(obj, "__cuda_stream__")
#     except AttributeError:
#         raise ValueError(
#             "Argument must be None, a Stream, or implement __cuda_stream__"
#         )
#     if protocol[0] != 0:
#         raise ValueError("Only protocol version 0 is supported")
#
#     # Take the handle from the protocol tuple, not the object itself.
#     self._cuda_stream = <cudaStream_t><uintptr_t>protocol[1]
#     self._owner = obj
#
# @_init_from_stream.register
# def _(self, stream: Stream):
#     self._cuda_stream, self._owner = stream._cuda_stream, stream._owner
#
# try:
#     from numba import cuda
#
#     @_init_from_stream.register
#     def _(self, obj: cuda.cudadrv.driver.Stream):
#         self._cuda_stream = <cudaStream_t><uintptr_t>(int(obj))
#         self._owner = obj
# except ImportError:
#     pass
#
# try:
#     import cupy
#
#     @_init_from_stream.register(cupy.cuda.stream.Stream)
#     def _(self, obj):
#         self._cuda_stream = <cudaStream_t><uintptr_t>(obj.ptr)
#         self._owner = obj
#
#     @_init_from_stream.register(cupy.cuda.stream.ExternalStream)
#     def _(self, obj):
#         self._cuda_stream = <cudaStream_t><uintptr_t>(obj.ptr)
#         self._owner = obj
# except ImportError:
#     pass
Matt711 (Contributor, Author) commented:
For the reviewer: I left this commented code here just to note that registering the dispatch methods in a try-except block is not working. The code compiles but the tests fail with an ImportError: module cuda not available. I'll delete this comment if we can't get this approach to work.
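One hedged alternative to the failing approach above (a pure-Python sketch under assumed names, not RMM's actual code) is to defer optional-dependency registration until first dispatch, so that compiling and importing the module never requires numba or cupy to be present:

```python
# Sketch, assuming a free function rather than RMM's Cython method:
# register singledispatch handlers for optional dependencies lazily,
# on first call, instead of in a module-level try/except.
from functools import singledispatch


@singledispatch
def _stream_pointer(obj):
    raise TypeError(f"Unsupported stream type: {type(obj).__name__}")


@_stream_pointer.register(int)
def _(obj):
    # Raw integer handles pass through unchanged.
    return obj


_registered = False


def stream_pointer(obj):
    # Public entry point: register optional handlers exactly once,
    # then dispatch on the argument's type.
    global _registered
    if not _registered:
        _registered = True
        try:
            from numba import cuda  # optional dependency

            @_stream_pointer.register(cuda.cudadrv.driver.Stream)
            def _(obj):
                return int(obj)
        except ImportError:
            pass
    return _stream_pointer(obj)


print(stream_pointer(7))  # -> 7
```

Because the `from numba import cuda` only runs inside `stream_pointer`, a missing package degrades gracefully at call time rather than failing at import or compile time.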

@bdice (Contributor) left a comment:
Thanks for your work on this! I will need a bit of time to play with this and see what works well in the design (or not).

Do we need both cuda_stream.pyx and stream.pyx? Could/should we combine these files?


import warnings

from rmm.pylibrmm.stream import ( # noqa: F401
Review comment (Contributor):
Rather than using noqa: F401, please define __all__ for this module.

)

warnings.warn(
"The `rmm.pylibrmm.stream` module is deprecated in will be removed in a future release. Use `rmm.pylibrmm.stream` instead.",
Review comment (Contributor):
Suggested change:
-     "The `rmm.pylibrmm.stream` module is deprecated in will be removed in a future release. Use `rmm.pylibrmm.stream` instead.",
+     "The `rmm._cuda.stream` module is deprecated in 25.02 and will be removed in a future release. Use `rmm.pylibrmm.stream` instead.",

leofang commented Jan 2, 2025

What's the rationale of making this a public API now that we have cuda.core...? Couldn't it stay private while we perfect cuda.core and make it production ready? I might have missed something.

bdice commented Jan 2, 2025

> What's the rationale of making this a public API now that we have cuda.core...? Couldn't it stay private while we perfect cuda.core and make it production ready? I might have missed something.

We want to start adding streams to pylibcudf APIs in RAPIDS 25.02. Exposing streams publicly is important for speed-of-light performance on several workloads.

Eventually we will want to move to cuda.core but I don't think we want RAPIDS to be tied to cuda.core's timeline at this moment. I would like cuda.core to take whatever time is needed to have a stable design. Once it's stable, we will be able to deprecate this in RAPIDS and move to cuda.core within ~2-4 months (one release to deprecate the existing design, then make the change in the subsequent release). Having this in RMM/pylibcudf in the short term also gives us a way to test the cuda.core APIs with a full application by changing imports and doing minor refactoring.

leofang commented Jan 3, 2025

Thanks for the explanation!

> > What's the rationale of making this a public API now that we have cuda.core...? Couldn't it stay private while we perfect cuda.core and make it production ready? I might have missed something.
>
> We want to start adding streams to pylibcudf APIs in RAPIDS 25.02. Exposing streams publicly is important for speed-of-light performance on several workloads.
>
> Eventually we will want to move to cuda.core but I don't think we want RAPIDS to be tied to cuda.core's timeline at this moment. I would like cuda.core to take whatever time is needed to have a stable design. Once it's stable, we will be able to deprecate this in RAPIDS and move to cuda.core within ~2-4 months

First of all, I should not have given the impression that cuda.core is not production ready. We just want to have the flexibility to "soft-break" the design (if possible) in case deemed necessary, but we do want to get people to start using it today. For example, we'll start integrating it with numba-cuda very soon, so one way or another RAPIDS users will have a copy of cuda.core installed in their environment and used somewhere in their workloads.

Next, I understand pylibcudf's needs and I am very supportive of having all APIs taking the stream explicitly. But I don't think it needs to be tied to any concrete type (not even cuda.core.Stream) considering that any object having __cuda_stream__ could be consumed by pylibcudf. So perhaps all of these (exposure + later deprecation/removal) are still redundant work?
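The point about not tying APIs to a concrete type can be sketched in plain Python. The protocol shape assumed here (an attribute holding a `(version, handle)` pair, per version 0) and all names are illustrative, not pylibcudf's actual interface:

```python
# Hedged sketch: accept any stream-like object via the __cuda_stream__
# protocol instead of requiring a concrete Stream class. Assumes
# protocol version 0: a (version, handle) pair.
def stream_handle(obj):
    try:
        version, handle = obj.__cuda_stream__
    except AttributeError:
        raise TypeError(
            f"{type(obj).__name__} does not expose __cuda_stream__"
        ) from None
    if version != 0:
        raise ValueError("Only protocol version 0 is supported")
    return int(handle)


class ThirdPartyStream:
    # Any library's stream type can participate without inheriting
    # from a common base class.
    __cuda_stream__ = (0, 42)


print(stream_handle(ThirdPartyStream()))  # -> 42
```

With duck typing like this, a consumer never needs to know whether the object came from RMM, CuPy, Numba, or cuda.core.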

Finally, with regard to __cuda_stream__ please chime in this RFC: NVIDIA/cuda-python#348. Thanks!

bdice commented Jan 5, 2025

@leofang I spent some time reviewing the Stream class in cuda.core. I have some questions before we can adopt it. That said, I think we should probably try to avoid making streams "more public" in the RMM API (several RAPIDS libraries already use these internals) and instead use cuda.core.

  • Will there be a cuda.core Cython API? Currently all our RMM stream-related logic is in Cython classes.
  • What API stability guarantees do we have? Will the cuda.core.experimental namespace only change with minor releases (0.2.0, 0.3.0, ...) or can it also change with point releases (0.1.1, 0.1.2, ...)? We would need to pin cuda.core tightly enough to guard against API breaks if we use it in a shipping RAPIDS product.
  • We will need to figure out how to turn __cuda_stream__ protocol objects into RMM's Cython cuda_stream_view class. I am guessing the best way to do this is to construct cuda.core.experimental.Stream.from_handle(...) and then figure out how to make a cuda_stream_view from the raw stream pointer (cast from the handle pointer int to uintptr_t to cudaStream_t, probably).
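The cast chain described in the last bullet (handle int to `uintptr_t` to `cudaStream_t`) can be illustrated in pure Python with `ctypes`; the handle value below is made up:

```python
# Illustration of the int -> pointer round-trip behind the proposed
# <cudaStream_t><uintptr_t> cast; the handle value is hypothetical.
import ctypes

handle = 0x7F00DEAD0000       # made-up raw stream handle from __cuda_stream__
ptr = ctypes.c_void_p(handle)  # analogous to casting through uintptr_t
assert ptr.value == handle     # the address survives the round trip
print(hex(ptr.value))          # -> 0x7f00dead0000
```

The underlying observation is that a stream handle is just an address-sized integer, so no data conversion is needed, only a type reinterpretation on the Cython side.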

Labels: CMake · feature request (New feature or request) · non-breaking (Non-breaking change) · Python (Related to RMM Python API)

Successfully merging this pull request may close these issues.

[FEA] Make the rmm._cuda.stream.Stream a part of the public API