CUBLAS_STATUS_NOT_SUPPORTED when calling `cublasStrsm` #73

traugdor · 2025-02-17T15:32:23Z

  File "D:\Stable Diffusion\ComfyUI\custom_nodes\ComfyUI-Riffusion\nodes.py", line 58, in waveform_from_spectrogram
    Sxx_torch = mel_inv_scaler(Sxx_torch)
                ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Stable Diffusion\ComfyUI\venv\Lib\site-packages\torch\nn\modules\module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Stable Diffusion\ComfyUI\venv\Lib\site-packages\torch\nn\modules\module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Stable Diffusion\ComfyUI\venv\Lib\site-packages\torchaudio\transforms\_transforms.py", line 498, in forward
    specgram = torch.relu(torch.linalg.lstsq(self.fb.transpose(-1, -2)[None], melspec, driver=self.driver).solution)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling `cublasStrsm( handle, side, uplo, trans, diag, m, n, alpha, A, lda, B, ldb)`

Here is the code that builds the call to `InverseMelScale`:

    def waveform_from_spectrogram(self, Sxx: np.ndarray, n_fft: int, hop_length: int, win_length: int, num_samples: int, 
        sample_rate: int, mel_scale: bool = True, nmels: int = 512, max_mel_iters: int = 200, num_griffin_lim_iters: int = 32,
        device: str = platform.system() == "Darwin" and "cpu" or "cuda"
    ) -> np.ndarray:
        Sxx_torch = torch.from_numpy(Sxx).to(device)
        
        if mel_scale:
            mel_inv_scaler = torchaudio.transforms.InverseMelScale(n_mels=nmels, sample_rate=sample_rate, f_min=0, f_max=10000,
                n_stft=n_fft // 2 + 1, norm=None, mel_scale="htk").to(device)
            Sxx_torch = mel_inv_scaler(Sxx_torch)
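While the `cublasStrsm` path is failing, one possible way to keep the node usable is to run only the inverse mel step on the CPU. The sketch below is a workaround idea, not something confirmed in this thread; `invert_mel_on_cpu` and `return_device` are hypothetical names, and it only assumes that the scaler and the tensor both expose `.to(device)` the way `torch.nn.Module` and `torch.Tensor` do.

```python
def invert_mel_on_cpu(mel_inv_scaler, spec, return_device=None):
    """Run the inverse mel transform on CPU to avoid the GPU lstsq path.

    Hypothetical helper: `mel_inv_scaler` is expected to behave like
    torchaudio.transforms.InverseMelScale and `spec` like a torch.Tensor.
    """
    # Module.to() moves the module's buffers and returns the module itself.
    cpu_scaler = mel_inv_scaler.to("cpu")
    # The least-squares solve now runs through the CPU backend, not cuBLAS.
    out = cpu_scaler(spec.to("cpu"))
    # Optionally move the result back so later steps can stay on the GPU.
    return out if return_device is None else out.to(return_device)
```

In `waveform_from_spectrogram` this would replace the direct call, for example `Sxx_torch = invert_mel_on_cpu(mel_inv_scaler, Sxx_torch, return_device=device)`. It does not fix the ZLUDA side and will be slower than a working GPU solve.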

Versions:

Windows: 10
torch==2.6.0+cu118
torchaudio==2.6.0+cu118

This is after doing the following:

Upgraded PyTorch to 2.6.0+cu118.
Replaced the 3 DLL files of official CUDA (cublas, cusparse, and nvrtc). ComfyUI-Zluda already has the torch backends disabled, as recommended in the ZLUDA README.
Set `$env:DISABLE_ADDM_CUDA_LT=1` in the PowerShell run script before invoking zluda.exe.


traugdor · 2025-02-17T15:37:12Z

Should I also replace

cudart
cufft
cufftw

I'm not sure whether copying these 3 DLLs into the official PyTorch libraries in my venv would fix the issue.

lshqqytiger · 2025-02-18T03:58:20Z

Thank you for the report. Try this build.
~~cublas_dev.zip~~

> Should I also replace
>
> cudart
> cufft
> cufftw

cudart is not needed, but PyTorch sometimes uses the FFT DLLs.

traugdor · 2025-02-18T06:04:49Z

I copied this into my zluda directory and into the torch directory and now I get this error:

rocBLAS error from hip error code: 'hipErrorInvalidDeviceFunction':98
!!! Exception during processing !!! CUDA error: CUBLAS_STATUS_INTERNAL_ERROR when calling `cublasStrsm( handle, side, uplo, trans, diag, m, n, alpha, A, lda, B, ldb)`
Traceback (most recent call last):
  File "D:\Stable Diffusion\ComfyUI\execution.py", line 327, in execute
    output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Stable Diffusion\ComfyUI\execution.py", line 202, in get_output_data
    return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Stable Diffusion\ComfyUI\execution.py", line 174, in _map_node_over_list
    process_inputs(input_dict, i)
  File "D:\Stable Diffusion\ComfyUI\execution.py", line 163, in process_inputs
    results.append(getattr(obj, func)(**inputs))
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Stable Diffusion\ComfyUI\custom_nodes\riffusion\nodes.py", line 149, in Process_Riffusion
    audio, duration = self.get_wave_bytes_from_spectrogram(spec)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Stable Diffusion\ComfyUI\custom_nodes\riffusion\nodes.py", line 93, in get_wave_bytes_from_spectrogram
    samples = self.waveform_from_spectrogram(Sxx=Sxx, n_fft=n_fft, hop_length=hop_length, win_length=win_length,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Stable Diffusion\ComfyUI\custom_nodes\riffusion\nodes.py", line 58, in waveform_from_spectrogram
    Sxx_torch = mel_inv_scaler(Sxx_torch)
                ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Stable Diffusion\ComfyUI\venv\Lib\site-packages\torch\nn\modules\module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Stable Diffusion\ComfyUI\venv\Lib\site-packages\torch\nn\modules\module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Stable Diffusion\ComfyUI\venv\Lib\site-packages\torchaudio\transforms\_transforms.py", line 498, in forward
    specgram = torch.relu(torch.linalg.lstsq(self.fb.transpose(-1, -2)[None], melspec, driver=self.driver).solution)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: CUBLAS_STATUS_INTERNAL_ERROR when calling `cublasStrsm( handle, side, uplo, trans, diag, m, n, alpha, A, lda, B, ldb)`

For reference, I am running on an RX 6600 XT.

lshqqytiger · 2025-02-18T06:11:16Z

Could you try this?
dev.zip

traugdor · 2025-02-18T15:18:19Z

Hi, sorry for the late reply. There are two extra files in this build. Which files do I copy from it into torch?

I'm guessing I need at least:

cublas.dll
cublasLt.dll
cudnn.dll

However, I would like to be sure before I accidentally break the torch install in my venv.

lshqqytiger · 2025-02-19T01:14:39Z

cublasLt and cudnn are excluded by default because they depend on ROCm components, which are not included in official Windows HIP SDK releases.

You'll need

cublas
cusparse
nvrtc
cufft, cufftw
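To lower the risk of breaking the venv when swapping these files, the copy can be scripted so that each original DLL is backed up first. This is a minimal sketch, not part of ZLUDA: `replace_dlls`, its parameters, and the `.bak` suffix are hypothetical, and the caller supplies the mapping from ZLUDA file names to the names PyTorch ships.

```python
import shutil
from pathlib import Path


def replace_dlls(zluda_dir, torch_lib_dir, name_map, backup_suffix=".bak"):
    """Copy ZLUDA DLLs over PyTorch's CUDA DLLs, keeping a backup of each.

    name_map maps a file name in zluda_dir to the file name it should
    replace in torch_lib_dir. Returns the list of replaced target paths.
    """
    zluda_dir, torch_lib_dir = Path(zluda_dir), Path(torch_lib_dir)

    # Check every source first so a missing file aborts before any change.
    missing = [name for name in name_map if not (zluda_dir / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing ZLUDA files: {missing}")

    replaced = []
    for src_name, dst_name in name_map.items():
        dst = torch_lib_dir / dst_name
        backup = dst.with_name(dst.name + backup_suffix)
        # Back up only once, so re-running never overwrites the original DLL.
        if dst.exists() and not backup.exists():
            shutil.copy2(dst, backup)
        shutil.copy2(zluda_dir / src_name, dst)
        replaced.append(dst)
    return replaced
```

A call might look like `replace_dlls(r"D:\zluda", r"D:\Stable Diffusion\ComfyUI\venv\Lib\site-packages\torch\lib", {"cublas.dll": "cublas64_11.dll", "cusparse.dll": "cusparse64_11.dll", "nvrtc.dll": "nvrtc64_112_0.dll"})`. The ZLUDA directory and the target file names here are assumptions for a cu118 wheel; check the actual names in `torch\lib` before running, and add the cufft and cufftw entries the same way.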

lshqqytiger self-assigned this Feb 18, 2025

lshqqytiger added the implementation Unimplemented feature(s) label Feb 18, 2025
