Set mempool hw_decompress flag if driver supports it #1854
Conversation
@@ -64,6 +64,13 @@ class cuda_async_memory_resource final : public device_memory_resource {
  fabric = 0x8  ///< Allows a fabric handle to be used for exporting. (cudaMemFabricHandle_t)
};

enum class mempool_usage {
  none          = 0x0,  ///< No specific usage.
  hw_decompress = 0x2,  ///< If set indicates that the memory will be
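For illustration, the bitmask semantics implied by the new enum can be sketched as follows. The enumerator names and values come from the diff above; the operator overloads and `has_flag` helper are hypothetical additions, not necessarily part of the PR.

```cpp
#include <cstdint>

// Names and values from the diff; helpers below are illustrative only.
enum class mempool_usage : std::uint16_t {
  none          = 0x0,  ///< No specific usage.
  hw_decompress = 0x2,  ///< Memory from the pool can be used for HW decompression.
};

// Combine usage flags into a single bitmask value.
constexpr mempool_usage operator|(mempool_usage a, mempool_usage b)
{
  return static_cast<mempool_usage>(static_cast<std::uint16_t>(a) |
                                    static_cast<std::uint16_t>(b));
}

// Test whether a combined value includes a given flag.
constexpr bool has_flag(mempool_usage value, mempool_usage flag)
{
  return (static_cast<std::uint16_t>(value) & static_cast<std::uint16_t>(flag)) != 0;
}
```

Because the enumerators are distinct powers of two, flags combine and test independently, matching how the value is eventually handed to the driver as a bitmask.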
Would recommend "If set indicates that the memory can be"
Looks right to me. Should we add a test that, if on driver >=12.8, the async MR's pool has the right flags set?
Even better, you could check that an allocation from the pool is valid for decomp. I can provide an example for how to do this.
This sounds like a good idea.

Did some cargo-cult cmake to link the driver in the async mr test.
tests/CMakeLists.txt
Outdated
CUDART STATIC)
ConfigureTest(CUDA_ASYNC_MR_SHARED_CUDART_TEST mr/device/cuda_async_mr_tests.cpp GPUS 1 PERCENT 60
  CUDART SHARED)
ConfigureTest(CUDA_ASYNC_MR_STATIC_CUDART_TEST mr/device/cuda_async_mr_tests.cpp LINK_DRIVER GPUS 1
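For context, a `LINK_DRIVER`-style option inside `ConfigureTest` could plausibly expand to something like the sketch below. The option and variable names are hypothetical; `CUDA::cuda_driver` is CMake's FindCUDAToolkit target for linking against the driver API (`-lcuda`).

```cmake
# Hypothetical sketch; the real ConfigureTest internals may differ.
if(_RMM_TEST_LINK_DRIVER)
  # CUDA::cuda_driver resolves to the driver API link stub (-lcuda).
  target_link_libraries(${_RMM_TEST_TARGET} PRIVATE CUDA::cuda_driver)
endif()
```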
To fix the overlinking check, you'll need to update conda/recipes/librmm/recipe.yaml and add cuda-driver-dev to the dependencies in cache.requirements.build:

rmm/conda/recipes/librmm/recipe.yaml, line 49 in d8b7dac:
build:
I'm going to need some help from @rapidsai/packaging-codeowners here. AIUI, since one of the test files links directly against libcuda.so.1, I need to advertise a host dependency on cuda-driver-dev for the librmm-tests package. I also should advertise a run dependency on cuda-driver-dev for the same package.
I believe I have done that correctly, but conda build still complains about overlinking. So I am at the end of my ability to cargo cult.
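The host/run change described above would look roughly like the fragment below in the librmm-tests output of recipe.yaml. This is a hedged sketch of the recipe structure, not the actual file contents.

```yaml
# Hedged sketch of the librmm-tests requirements; exact layout may differ.
requirements:
  host:
    - cuda-driver-dev   # provides the libcuda.so link stub for -lcuda at build time
  run:
    - cuda-driver-dev   # advertised runtime dependency for the driver linkage
```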
Hey @wence- -- it looks like cuda-driver-dev has libcuda.so but not the libcuda.so.1 link to whatever driver version I think it links against.
It looks like cuda-compat-impl DOES have the files this might need? I'm unclear on the relationship with the cuda-compat package.
Ahh, ok, it looks like cuda-compat will pull in the appropriate cuda-compat-impl subpackage, so maybe add that to host
Do you mean run? I need -lcuda (cuda-driver-dev) at compile time (host?), but libcuda.so.1 (cuda-compat?) at runtime, I think.
Yes, I did mean run.
Hm, the finalized run dependencies don't show cuda-compat-impl despite it being a dependency of cuda-compat. That is probably sensible, because otherwise the run dependencies would be a transitive explosion of stuff. However, given that libcuda.so.1 is actually in cuda-compat-impl, I wonder if that's why the overlinking check is still failing.
We could try adding cuda-compat-impl as the runtime dependency, and then either leave it that way (assuming the checks pass) or revert that change and add an explicit allowlist entry for libcuda.so.1, since we know that cuda-compat will (transitively) provide it.
I'm generally more for explicit dependencies than transitive ones, but meta-/helper-packages are a weird corner case.
Had one more go with that setup.
I was trying this out locally and also not finding libcuda.so.1 -- it looks like it gets installed not into lib/ within the conda environment but into a separate cuda-compat/ directory:
gforsyth …/miniforge3/envs/cuda-compat-impl 14:55
🐚 ls lib
libatomic.so libgomp.so.1 libquadmath.so.0
libatomic.so.1 libgomp.so.1.0.0 libquadmath.so.0.0.0
libatomic.so.1.2.0 libitm.so libstdc++.so
libgcc_s.so libitm.so.1 libstdc++.so.6
libgcc_s.so.1 libitm.so.1.0.0 libstdc++.so.6.0.33
libgomp.so libquadmath.so
gforsyth …/miniforge3/envs/cuda-compat-impl 14:56
🐚 ls cuda-compat/
libcuda.so libnvidia-nvvm.so
libcuda.so.1 libnvidia-nvvm.so.4
libcuda.so.570.124.06 libnvidia-nvvm.so.570.124.06
libcudadebugger.so.1 libnvidia-ptxjitcompiler.so.1
libcudadebugger.so.570.124.06 libnvidia-ptxjitcompiler.so.570.124.06
Thanks. I'm out of ideas at this point, so feel free to poke at this.
Hope these comments help work out the conda build issues.
conda/recipes/librmm/recipe.yaml
Outdated
@@ -114,6 +115,7 @@ outputs:
  - ${{ pin_compatible("cuda-version", upper_bound="x", lower_bound="x") }}
  - ${{ pin_subpackage("librmm", exact=True) }}
  - rapids-logger =0.1
  - cuda-compat-impl
We should never need compat packages as a RAPIDS dependency. Why was this added?
- cuda-compat-impl
Because I have no clue why conda is complaining about overlinking of the driver in the tests package, and @gforsyth suggested it. It doesn't work, but nothing I tried works, so 🤷
We cannot have load-time linkage to the CUDA driver. We need to replace this with a dlopen so that our packages remain importable on nodes without GPUs (with suitably delayed initialization). If we need driver APIs, we'll have to dlopen libcuda.so.
I only need the driver in the tests.
Co-authored-by: Bradley Dice <[email protected]>
Sadly not :( https://github.com/rapidsai/rmm/actions/runs/13811046966/job/38632706568?pr=1854#step:9:5571
My inclination at this point is to just rip the test out and remove all the conda changes. If anyone has an idea how I can actually solve this overlinking error, I am all ears. If we do want the tests and can solve the linking problem, I hope it is acceptable to link the tests against libcuda.so.1 (rather than dlopening), but if absolutely necessary I suppose I can do that too.
I think we simply have to allowlist the driver in this case. The linkage is intentional, but we also can't get the driver from conda. We have to rely on the test being installed into environments where the driver is already installed on the host (or in the container where the test is being run).
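An allowlist entry along these lines could express that. This is a hedged sketch: the key name assumes rattler-build's dynamic_linking options and should be checked against the recipe format actually in use.

```yaml
# Sketch: declare the intentional driver linkage so the overlinking check
# passes; libcuda.so.1 is provided by the host's NVIDIA driver, not by conda.
build:
  dynamic_linking:
    missing_dso_allowlist:
      - "libcuda.so.1"
```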
@wence- @vyasr Do we want to land this in 25.04? I saw this failure in the logs. I updated the branch in case there's something unexpected. I can help dig into this tomorrow if nobody else has time (but would happily take a quick review on #1864 in exchange).
This now looks good to me. CI passed on this round. @vyasr Can you approve, since you previously requested changes?
Please merge once you're happy with it.
LGTM now, thanks!
/merge
…#1873) This reverts commit 7f0cead. - Closes #1872 Authors: - Lawrence Mitchell (https://github.com/wence-) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) URL: #1873
…ai#1854)" (rapidsai#1873) This reverts commit c6773f2.
Description
If the driver supports the flag, unconditionally set the async memory pool usage property to include a request to support HW decompression.
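The behavior described above can be sketched roughly as follows. This is a hedged sketch, not RMM's actual implementation; it assumes the CUDA 12.8 runtime additions `cudaMemPoolProps::usage` and `cudaMemPoolCreateUsageHwDecompress`, and requires the CUDA toolkit and a driver to build and run.

```cpp
#include <cuda_runtime_api.h>

// Sketch: gate the hw_decompress usage flag on the installed driver's
// reported CUDA version before creating the pool.
cudaMemPool_t create_async_pool(int device_id)
{
  cudaMemPoolProps props{};
  props.allocType     = cudaMemAllocationTypePinned;
  props.location.type = cudaMemLocationTypeDevice;
  props.location.id   = device_id;

  int driver_version = 0;
  cudaDriverGetVersion(&driver_version);  // e.g. 12080 for a 12.8 driver
  if (driver_version >= 12080) {
    // Only drivers >= 12.8 understand the usage field / decompress flag.
    props.usage = cudaMemPoolCreateUsageHwDecompress;
  }

  cudaMemPool_t pool{};
  cudaMemPoolCreate(&pool, &props);
  return pool;
}
```

Gating on `cudaDriverGetVersion` rather than the compile-time toolkit version matches the "if the driver supports the flag" wording: the binary may be built against a 12.8 toolkit yet run on an older driver.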
Checklist