
CUBLAS_STATUS_NOT_SUPPORTED when run pytorch==2.5.1+cu118 #53

Open
fbotp opened this issue Dec 12, 2024 · 4 comments
fbotp commented Dec 12, 2024

My graphics card is an AMD Radeon RX 6750 GRE 10GB (gfx1031), with AMD graphics driver 24.12.1 installed. The system is Windows 11 LTSC. I want to run deep learning code on this machine using the PyTorch library, so I did the following:

  1. Installed HIP SDK 6.1.2 and added %HIP_PATH%\bin to PATH.
  2. From ROCmLibs I downloaded rocm.gfx1031.for.hip.sdk.6.1.2.7z and replaced the library folder under %HIP_PATH%\bin\rocblas with the files from the archive (rocblas.dll went in the same location), then restarted the computer.
  3. Installed Miniforge 3 and created a Python environment with mamba create -n ai python=3.10. Installed torch with pip install torch --index-url https://download.pytorch.org/whl/cu118.
  4. Downloaded ROCm-6-ZLUDA, extracted it, and added the zluda directory to the PATH environment variable.
  5. Replaced cublas64_11.dll, cusparse64_11.dll, and nvrtc64_112_0.dll in miniforge3/envs/ai/Lib/site-packages/torch/lib with cublas.dll, cusparse.dll, and nvrtc.dll from zluda, respectively.
  6. Ran python -c "import torch; print(torch.cuda.is_available())", which printed True.
  7. The following content has been added at the beginning of the deep learning code:
torch.backends.cudnn.enabled = False
torch.backends.cuda.enable_flash_sdp(False)
torch.backends.cuda.enable_math_sdp(True)
torch.backends.cuda.enable_mem_efficient_sdp(False)
  8. I loaded a BERT model with transformers and started training, but it fails. The error message is as follows:
Traceback (most recent call last):
File "C:\Users\rouxiaobei\Downloads\address-291b83f2838c047820c9456a3e7250f601b60328\address-291b83f2838c047820c9456a3e7250f601b60328\main.py", line 362, in <module>
main()
File "C:\Users\rouxiaobei\Downloads\address-291b83f2838c047820c9456a3e7250f601b60328\address-291b83f2838c047820c9456a3e7250f601b60328\main.py", line 293, in main
global_step, tr_loss = train(args, train_dataset, model, tokenizer)
File "C:\Users\rouxiaobei\Downloads\address-291b83f2838c047820c9456a3e7250f601b60328\address-291b83f2838c047820c9456a3e7250f601b60328\main.py", line 91, in train
outputs = model(**inputs)
File "C:\ProgramData\miniforge3\envs\temp\lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\ProgramData\miniforge3\envs\temp\lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\rouxiaobei\Downloads\address-291b83f2838c047820c9456a3e7250f601b60328\address-291b83f2838c047820c9456a3e7250f601b60328\models.py", line 16, in forward
outputs = self.bert(input_ids, attention_mask, token_type_ids)
File "C:\ProgramData\miniforge3\envs\temp\lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\ProgramData\miniforge3\envs\temp\lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "C:\ProgramData\miniforge3\envs\temp\lib\site-packages\transformers\models\bert\modeling_bert.py", line 1142, in forward
encoder_outputs = self.encoder(
File "C:\ProgramData\miniforge3\envs\temp\lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\ProgramData\miniforge3\envs\temp\lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "C:\ProgramData\miniforge3\envs\temp\lib\site-packages\transformers\models\bert\modeling_bert.py", line 695, in forward
layer_outputs = layer_module(
File "C:\ProgramData\miniforge3\envs\temp\lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\ProgramData\miniforge3\envs\temp\lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "C:\ProgramData\miniforge3\envs\temp\lib\site-packages\transformers\models\bert\modeling_bert.py", line 585, in forward
self_attention_outputs = self.attention(
File "C:\ProgramData\miniforge3\envs\temp\lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\ProgramData\miniforge3\envs\temp\lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "C:\ProgramData\miniforge3\envs\temp\lib\site-packages\transformers\models\bert\modeling_bert.py", line 515, in forward
self_outputs = self.self(
File "C:\ProgramData\miniforge3\envs\temp\lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\ProgramData\miniforge3\envs\temp\lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "C:\ProgramData\miniforge3\envs\temp\lib\site-packages\transformers\models\bert\modeling_bert.py", line 395, in forward
query_layer = self.transpose_for_scores(self.query(hidden_states))
File "C:\ProgramData\miniforge3\envs\temp\lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\ProgramData\miniforge3\envs\temp\lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "C:\ProgramData\miniforge3\envs\temp\lib\site-packages\torch\nn\modules\linear.py", line 125, in forward
return F.linear(input, self.weight, self.bias)
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling `cublasLtMatmulAlgoGetHeuristic( ltHandle, computeDesc.descriptor(), Adesc.descriptor(), Bdesc.descriptor(), Cdesc.descriptor(), Cdesc.descriptor(), preference.descriptor(), 1, &heuristicResult, &returnedResult)`
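
For anyone retracing steps 4 and 5, they amount to copying ZLUDA's DLLs into torch/lib under the CUDA file names that torch expects. A rough sketch of that step (the helper name and argument-style paths are illustrative, not part of ZLUDA; the file-name mapping is taken from the steps above):

```python
import shutil
from pathlib import Path

# Mapping from ZLUDA's DLL names to the CUDA names torch loads
# (taken from steps 4-5 above).
DLL_MAP = {
    "cublas.dll": "cublas64_11.dll",
    "cusparse.dll": "cusparse64_11.dll",
    "nvrtc.dll": "nvrtc64_112_0.dll",
}

def replace_zluda_dlls(zluda_dir, torch_lib_dir):
    """Hypothetical helper: copy each ZLUDA DLL over its CUDA counterpart."""
    zluda_dir, torch_lib_dir = Path(zluda_dir), Path(torch_lib_dir)
    copied = []
    for src_name, dst_name in DLL_MAP.items():
        src = zluda_dir / src_name
        if src.exists():
            # copy2 preserves timestamps; destination keeps the CUDA name
            shutil.copy2(src, torch_lib_dir / dst_name)
            copied.append(dst_name)
    return copied
```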

I want to know whether I made a mistake somewhere. This has troubled me for days and I have not been able to solve it. I am not sure whether it is related to the Windows 11 LTSC build I am using. When inspecting the DLLs with Dependencies, I noticed that ext-ms-win-oobe-query-l1-1-0.dll is reported missing, and I don't know whether that is why the code fails.

Sorry to bother you. Any suggestions would be greatly appreciated.
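
A note on the Dependencies observation: ext-ms-win-* stubs are often reported as missing even on healthy systems and are usually harmless false positives. A quicker sanity check is to try loading the replaced DLLs directly with ctypes; a minimal sketch (the helper name is made up):

```python
import ctypes

def try_load_dlls(paths):
    """Hypothetical check: attempt to load each shared library,
    mapping path -> True/False for whether it loaded."""
    results = {}
    for p in paths:
        try:
            ctypes.CDLL(str(p))  # raises OSError if the library can't load
            results[str(p)] = True
        except OSError:
            results[str(p)] = False
    return results
```

For example, passing the full paths of cublas64_11.dll and friends from torch/lib would reveal whether the renamed ZLUDA DLLs resolve at all, independent of torch.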

@lshqqytiger
Owner

torch>=2.4 is broken. The official release started to require cuBLASLt, which is not available in ZLUDA right now. It is available only in the Linux build, because we cannot build hipBLASLt on Windows.
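
A workaround in the meantime is to pin torch below 2.4. A small guard that checks a version string before running CUDA-dependent code is sketched below (the 2.4 cutoff comes from this comment; the function itself is hypothetical):

```python
def torch_version_ok_for_zluda(version_string):
    """Return True if this torch version predates the cuBLASLt
    requirement (introduced in 2.4, per the comment above)."""
    base = version_string.split("+")[0]       # drop local tag, e.g. "+cu118"
    major, minor = (int(p) for p in base.split(".")[:2])
    return (major, minor) < (2, 4)
```

So torch_version_ok_for_zluda("2.3.1+cu118") is True, while "2.5.1+cu118" is rejected.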


fbotp commented Dec 12, 2024

OK, after downgrading torch to version 2.3.1 it runs normally, though not as fast as expected. The same code is about 12.5% slower than on Ubuntu with ROCm installed, and there is a "1Torch was not compiled with flash attention" UserWarning, but it does not affect execution. I look forward to this project improving in the future.
Thank you again for your reply!

@lshqqytiger lshqqytiger self-assigned this Jan 3, 2025

Exodeadh commented Jan 9, 2025

Hey :) I see that in these latest releases you are working very hard and are adding hipBLASLt too.
Does this mean that, when the time comes, we'll be able to update PyTorch to 2.5.1+cu118?

Thank you very much for your work. Really.


lshqqytiger commented Jan 9, 2025

cuBLASLt is now on the dev branch. After #61 is merged, I'll upload a nightly build that includes cublasLt.dll. For now, you can download v3.8.5 with cuBLASLt or build from the dev branch. However, because AMD hasn't officially released hipBLASLt for Windows yet, you have to build it yourself or download an unofficial build from the Internet.
Additionally, although torch 2.5.1 now works, its performance is worse than 2.3.1 on powerful cards (gfx1100) because the rocBLAS Tensile libraries are better optimized than BLASLt.
