
Set mempool hw_decompress flag if driver supports it #1854

Merged
merged 18 commits on Mar 18, 2025

Conversation

wence-
Contributor

@wence- wence- commented Mar 7, 2025

Description

If the driver supports the flag, unconditionally set the async memory pool usage property to include a request to support HW decompression.
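The change described above amounts to something like the following sketch (not compiled here; the pool-properties `usage` field and the HW-decompress flag are CUDA 12.8 additions, the constant name in the comment is an assumption, and the real rmm internals differ):

```cpp
#include <cuda_runtime_api.h>

// Sketch only: request HW-decompress-capable pool memory when the
// installed driver is new enough (CUDA 12.8 introduced the flag).
cudaMemPoolProps make_pool_props(int device_id)
{
  cudaMemPoolProps props{};
  props.allocType     = cudaMemAllocationTypePinned;
  props.location.type = cudaMemLocationTypeDevice;
  props.location.id   = device_id;

  int driver_version = 0;
  cudaDriverGetVersion(&driver_version);
  if (driver_version >= 12080) {
    // 0x2 matches rmm's mempool_usage::hw_decompress below; the
    // corresponding CUDA constant name is assumed, not quoted.
    props.usage |= 0x2;
  }
  return props;
}
```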

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@wence- wence- requested a review from a team as a code owner March 7, 2025 17:49
@wence- wence- requested review from rongou and bdice March 7, 2025 17:49
@github-actions github-actions bot added the cpp Pertains to C++ code label Mar 7, 2025
@wence- wence- added non-breaking Non-breaking change improvement Improvement / enhancement to an existing function and removed cpp Pertains to C++ code labels Mar 7, 2025
@@ -64,6 +64,13 @@ class cuda_async_memory_resource final : public device_memory_resource {
fabric = 0x8 ///< Allows a fabric handle to be used for exporting. (cudaMemFabricHandle_t)
};

enum class mempool_usage {
none = 0x0, ///< No specific usage.
hw_decompress = 0x2, ///< If set indicates that the memory will be used for hardware-accelerated decompression.


Would recommend "If set indicates that the memory can be"

Contributor

@bdice bdice left a comment


Looks right to me. Should we add a test that, if on driver >=12.8, the async MR’s pool has the right flags set?

@eschmidt-nvidia

Looks right to me. Should we add a test that, if on driver >=12.8, the async MR’s pool has the right flags set?

Even better, you could check that an allocation from the pool is valid for decomp. I can provide an example for how to do this

@wence-
Contributor Author

wence- commented Mar 7, 2025

Even better, you could check that an allocation from the pool is valid for decomp. I can provide an example for how to do this

This sounds like a good idea.

@github-actions github-actions bot added the cpp Pertains to C++ code label Mar 7, 2025
@eschmidt-nvidia

bool ptrIsHwDecompressCapable;
cuPointerGetAttribute(&ptrIsHwDecompressCapable,
                      CU_POINTER_ATTRIBUTE_IS_HW_DECOMPRESS_CAPABLE,
                      ptr);

@wence- wence- requested a review from a team as a code owner March 10, 2025 12:24
@wence- wence- requested a review from vyasr March 10, 2025 12:24
@github-actions github-actions bot added the CMake label Mar 10, 2025
@wence-
Contributor Author

wence- commented Mar 10, 2025

Did some cargo-cult cmake to link the driver in the async mr test.

ConfigureTest(CUDA_ASYNC_MR_STATIC_CUDART_TEST mr/device/cuda_async_mr_tests.cpp LINK_DRIVER GPUS 1
              CUDART STATIC)
ConfigureTest(CUDA_ASYNC_MR_SHARED_CUDART_TEST mr/device/cuda_async_mr_tests.cpp GPUS 1 PERCENT 60
              CUDART SHARED)
Contributor


To fix the overlinking check, you'll need to update conda/recipes/librmm/recipe.yaml and add cuda-driver-dev to the dependencies in cache.requirements.build:

Contributor Author


I'm going to need some help from @rapidsai/packaging-codeowners here. AIUI, since one of the test files links directly against libcuda.so.1, I need to advertise a host dependency on cuda-driver-dev for the librmm-tests package.

I also should advertise a run dependency for cuda-driver-dev for the same package.

I believe I have done that correctly, but conda build still complains about overlinking. So I am at the end of my ability to cargo cult.

Contributor


Hey @wence- -- it looks like cuda-driver-dev ships libcuda.so but not the libcuda.so.1 symlink to the actual driver version it would link against.

It looks like cuda-compat-impl DOES have the files this might need? I'm unclear on the relationship with the cuda-compat package.

Contributor


Ahh, ok, it looks like cuda-compat will pull in the appropriate cuda-compat-impl subpackage, so maybe add that to host

Contributor Author


Do you mean run? I need -lcuda (cuda-driver-dev) at compile-time (host?). But libcuda.so.1 (cuda-compat?) at runtime, I think.
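In recipe.yaml terms, that split would look roughly like the following (a sketch of what is being proposed in this exchange, not the final recipe; the package names are the ones discussed above):

```yaml
# librmm-tests output (sketch): link against the driver stub at build
# time, and expect the real libcuda.so.1 at run time.
requirements:
  host:
    - cuda-driver-dev   # provides the libcuda.so stub needed for -lcuda
  run:
    - cuda-compat       # expected to (transitively) provide libcuda.so.1
```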

Contributor


Yes, I did mean run

Contributor

@gforsyth gforsyth Mar 10, 2025


hm, the finalized run dependencies don't show cuda-compat-impl despite it being a dependency of cuda-compat. That actually is probably sensible because otherwise the run dependencies would be a transitive explosion of stuff. However, given that libcuda.so.1 is actually in cuda-compat-impl, I wonder if that's why the overlinking check is still failing.

We could try adding cuda-compat-impl as the runtime dependency, and then either leave it that way (assuming the checks pass) or revert that change and add an explicit allowlist entry for libcuda.so.1 since we know that cuda-compat will (transitively) provide it.

I'm generally more for explicit dependencies than transitive ones, but meta- / helper-packages are a weird corner case

Contributor Author


Had one more go with that setup.

Contributor


I was trying this out locally and also not finding libcuda.so.1 -- it looks like it gets installed not into lib/ within the conda environment but into a separate cuda-compat/ directory:

$ ls lib
libatomic.so        libgomp.so.1      libquadmath.so.0
libatomic.so.1      libgomp.so.1.0.0  libquadmath.so.0.0.0
libatomic.so.1.2.0  libitm.so         libstdc++.so
libgcc_s.so         libitm.so.1       libstdc++.so.6
libgcc_s.so.1       libitm.so.1.0.0   libstdc++.so.6.0.33
libgomp.so          libquadmath.so    

$ ls cuda-compat/
libcuda.so                     libnvidia-nvvm.so
libcuda.so.1                   libnvidia-nvvm.so.4
libcuda.so.570.124.06          libnvidia-nvvm.so.570.124.06
libcudadebugger.so.1           libnvidia-ptxjitcompiler.so.1
libcudadebugger.so.570.124.06  libnvidia-ptxjitcompiler.so.570.124.06

Contributor Author


Thanks. I'm out of ideas at this point, so feel free to poke at this

@wence- wence- requested a review from a team as a code owner March 10, 2025 13:22
@github-actions github-actions bot added the conda label Mar 10, 2025
Contributor

@bdice bdice left a comment


Hope these comments help work out the conda build issues.

@@ -114,6 +115,7 @@ outputs:
- ${{ pin_compatible("cuda-version", upper_bound="x", lower_bound="x") }}
- ${{ pin_subpackage("librmm", exact=True) }}
- rapids-logger =0.1
- cuda-compat-impl
Contributor


We should never need compat packages as a RAPIDS dependency. Why was this added?

Suggested change
- cuda-compat-impl

Contributor Author


Because I have no clue why conda is complaining about overlinking of the driver in the tests package, and @gforsyth suggested it. It doesn't work, but nothing I tried works, so 🤷

Contributor

@vyasr vyasr left a comment


We cannot have load-time linkage to the CUDA driver. We need to replace this with a dlopen so that our packages remain importable on nodes without GPUs (with suitably delayed initialization). If we need driver APIs, we'll have to dlopen libcuda.so.

@wence-
Contributor Author

wence- commented Mar 12, 2025

We cannot have load-time linkage to the CUDA driver. We need to replace this with a dlopen so that our packages remain importable on nodes without GPUs (with suitably delayed initialization). If we need driver APIs, we'll have to dlopen libcuda.so.

I only need the driver in the tests.

@wence-
Contributor Author

wence- commented Mar 12, 2025

Hope these comments help work out the conda build issues.

Sadly not :( https://github.com/rapidsai/rmm/actions/runs/13811046966/job/38632706568?pr=1854#step:9:5571

@wence-
Contributor Author

wence- commented Mar 13, 2025

My inclination at this point is to just rip the test out and remove all the conda changes. If anyone has an idea how to actually solve this overlinking error, I am all ears.

If we do want the tests and can solve the linking problem, I hope it is acceptable to link the tests to libcuda.so.1 (rather than dlopening), but if absolutely necessary I suppose I can do that too.

@vyasr
Contributor

vyasr commented Mar 15, 2025

I think we simply have to allowlist the driver in this case. The linkage is intentional, but we also can't get the driver from conda. We have to rely on the test being installed into environments where the driver is already installed on the host (or in the container where the test is being run).
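With rattler-build, the allowlisting suggested here would look something like this (hedged: the `dynamic_linking`/`missing_dso_allowlist` keys follow rattler-build's recipe schema, but the exact placement in rmm's recipe.yaml may differ):

```yaml
build:
  dynamic_linking:
    missing_dso_allowlist:
      - "libcuda.so.1"  # provided by the host driver, not by any conda package
```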

@bdice
Contributor

bdice commented Mar 18, 2025

@wence- @vyasr Do we want to land this in 25.04? I saw this failure in the logs. I updated the branch in case there's something unexpected. I can help dig into this tomorrow if nobody else has time (but would happily take a quick review on #1864 in exchange).

>       with pytest.raises(err_catch, match="My alloc error"):
E       AssertionError: Regex pattern did not match.
E        Regex: 'My alloc error'
E        Input: 'std::bad_alloc'

rmm/tests/test_rmm.py:938: AssertionError

Contributor

@bdice bdice left a comment


This now looks good to me. CI passed on this round. @vyasr Can you approve, since you previously requested changes?
Please merge once you're happy with it.

@bdice bdice requested a review from vyasr March 18, 2025 22:39
Contributor

@vyasr vyasr left a comment


LGTM now, thanks!

@bdice
Contributor

bdice commented Mar 18, 2025

/merge

Labels
CMake · conda · cpp (Pertains to C++ code) · improvement (Improvement / enhancement to an existing function) · non-breaking (Non-breaking change)
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

[FEA] Support Blackwell decompression engine with async memory resource
6 participants