ERROR while running comfyui with zluda-4 + cu12.4 + torch2.5.1 #57

Open

Bluexchan opened this issue Jan 5, 2025 · 9 comments

@Bluexchan

Error while using comfyui with zluda-4.

Hello, I have been using comfyui with zluda-4 for a while. Thanks for the great work.
I am on a 5800X with a 6750 GRE 10 GB; comfyui with cu118 and torch 2.5.1 (torch 2.3 is also OK) worked fine for two or three months.

Then I noticed the new zluda-4 release, and also HIP 6.2.4, so here is what I did:

  1. Installed the new HIP 6.2.4 from AMD (was 6.1, 2024Q3); pointed ROCM_HOME and HIP_HOME at the HIP 6.2.4 install path, and added hip-6.2.4/bin to PATH.

  2. Patched HIP with rocm.gfx1031.for.hip.sdk.6.2.4.littlewu.s.logic(1), from https://github.com/likelovewant/ROCmLibs-for-gfx1103-AMD780M-APU/releases/tag/v0.6.2.4

  3. Created a new conda venv, installed torch 2.5.1 + cuda12.4, and copied cublas64_12.dll, cusparse64_12.dll, and nvrtc64_120_0.dll to ...Lib\site-packages\torch\lib; those files are from <ZLUDA-windows-rocm6-amd64.zip> at https://github.com/lshqqytiger/ZLUDA/releases.

  4. Kept the backend settings that were fine under zluda-3 + cu118 + torch 2.5.1:
    torch.backends.cudnn.enabled = False
    torch.backends.cuda.enable_flash_sdp(False)
    torch.backends.cuda.enable_math_sdp(True)
    torch.backends.cuda.enable_mem_efficient_sdp(False)

  5. Then ran zluda.exe python main.py (a condensed sketch of steps 4-5 follows below).
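
Condensed, steps 4 and 5 amount to something like this (a sketch only, assuming the new venv is active and the script is launched via zluda.exe python):

    # Sketch of steps 4-5: the backend flags plus a quick sanity check.
    import torch

    # The same flags that worked under zluda-3 + cu118 + torch 2.5.1:
    torch.backends.cudnn.enabled = False
    torch.backends.cuda.enable_flash_sdp(False)
    torch.backends.cuda.enable_math_sdp(True)
    torch.backends.cuda.enable_mem_efficient_sdp(False)

    # Sanity check before launching ComfyUI proper:
    print(torch.__version__, torch.version.cuda)   # expect 2.5.1 / 12.4
    print(torch.cuda.get_device_name(0))           # the device as seen through ZLUDA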

Here's the ERROR from comfyui.

    RuntimeError: CUDA error: operation not supported
    CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
    For debugging consider passing CUDA_LAUNCH_BLOCKING=1
    Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
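
As the error text itself suggests, the trace can be made synchronous so it points at the failing call. A minimal sketch (the variable must be set before torch is imported):

    # Debugging sketch: make CUDA kernel launches synchronous, per the error's hint.
    import os
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # must be set before CUDA initializes

    import torch  # imported afterwards so the setting takes effect
    # ...then run the failing workload; the stack trace should now point at the
    # API call that actually raised "operation not supported".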

How can I solve this? Or what should I do to run comfyui with zluda-4 and cu12.4? Thanks a lot.

And by the way, I noticed this in https://github.com/lshqqytiger/ZLUDA/releases:
New environment variable ZLUDA_NVRTC_LIB: our new ZLUDA runtime compiler depends on the original NVIDIA runtime compiler library, so you should specify the path of it unless it is named nvrtc_cuda.dll.

I have no idea how to do this. 

Thanks again.

@lshqqytiger
Owner

This repository is a continuation of vosen's original ZLUDA 3 project.
ZLUDA 3 does not fully support CUDA 12 and mimics the behavior of CUDA 11.8.
torch cu118 should work.
About the runtime compiler: if you hit an nvrtc internal error, you will need ZLUDA_NVRTC_LIB or DLL renaming. If not, it's fine, because nvrtc is not being used by PyTorch in that case.
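
For the renaming route, something along these lines may work (a sketch; the torch/lib location as the place ZLUDA searches is an assumption, based on a typical cu118 venv):

    # Sketch of the "DLL renaming" alternative: copy torch's bundled NVIDIA
    # NVRTC to the default name the ZLUDA runtime compiler looks for.
    import pathlib, shutil

    torch_lib = pathlib.Path("venv/Lib/site-packages/torch/lib")  # adjust to your venv
    shutil.copy2(torch_lib / "nvrtc64_112_0.dll", torch_lib / "nvrtc_cuda.dll")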

@Bluexchan
Author

Got it, thanks for replying.
Will this project support cu12.4 in the future? That would be cool. Thanks for the work.

@lshqqytiger
Owner

According to my investigation and the comments of vosen, the original author of ZLUDA, we may already have everything that is needed to run CUDA 12 applications. However, for unknown reasons, the CUDA runtime (the application side) behaves quite differently between torch+cu118 and torch+cu12x: cu12x invokes cuLibraryLoadData without the PTX assembly that we actually need, while cu118 calls the dark API with PTX. I'm going to dig into the CUDA runtime more, but I'm not sure I can solve this problem.

@lshqqytiger lshqqytiger self-assigned this Jan 9, 2025
@Roninos

Roninos commented Jan 31, 2025

Hi! When will torch 2.4.1 and above be supported? I installed HIP 6.2 and it works, but only on torch 2.3.1. Many applications require a higher version. I tried different versions of ZLUDA, including nightly, but to no avail. For example, Triton is used to speed up generation, but it requires cu124.

@lshqqytiger
Owner

lshqqytiger commented Jan 31, 2025

Currently, torch cu12x is not supported; I described the reason in my previous comment.
However, nightly ZLUDA supports torch 2.4/2.5 cu118 with an unofficial build of hipBLASLt:

  1. Download hipBLASLt.
  2. Unpack hipBLASLt.
  3. Install it on top of HIP SDK 6.2.
  4. Download and unpack the ZLUDA nightly build.
  5. Preload ZLUDA's cublasLt, or replace cublasLt64_[version].dll with ZLUDA's cublasLt.dll (see the sketch after this list).

Note that this is only available on gfx1100/01/02/03/50 because of hipBLASLt limitations.
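
For step 5, the replace variant could look like this (a sketch; both paths are assumptions for a typical layout):

    # Sketch of step 5 ("replace"): overwrite torch's cublasLt DLL with
    # ZLUDA's cublasLt.dll, keeping the original file name.
    import pathlib, shutil

    torch_lib = pathlib.Path("venv/Lib/site-packages/torch/lib")  # adjust to your venv
    zluda_dir = pathlib.Path("zluda")                             # the unpacked nightly build

    for dll in torch_lib.glob("cublasLt64_*.dll"):
        shutil.copy2(zluda_dir / "cublasLt.dll", dll)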

@Roninos

Roninos commented Feb 1, 2025

It doesn't work on gfx1010 (RX 5700 XT) with torch 2.4/2.5.

@lshqqytiger
Owner

lshqqytiger commented Feb 1, 2025

hipBLASLt only supports gfx90a, 94x, and 110x, as you can see here.
Also, PyTorch does not allow us to disable BLASLt on Windows (torch.backends.cuda.preferred_blas_library).
I'm not sure whether the environment variable TORCH_BLAS_PREFER_CUBLASLT=0 can help.

@githubcto

> hipBLASLt only supports gfx90a, 94x, and 110x, as you can see here. Also, PyTorch does not allow us to disable BLASLt on Windows (torch.backends.cuda.preferred_blas_library). I'm not sure whether the environment variable TORCH_BLAS_PREFER_CUBLASLT=0 can help.

RX 6000 and below need this environment variable under Windows, same as on Linux:

ZLUDA/README.md, line 229 (at commit c4994b3):

DISABLE_ADDMM_CUDA_LT=1

I confirmed that A1111-ZLUDA, Forge-ZLUDA, and SD.Next can generate SDXL images using torch 2.4.1, 2.5.1, and 2.6.0.
Radeon RX 6000
Windows 10
HIP SDK 6.2.4
ZLUDA v3.8.7

However, some extensions that use Triton do not work, and neither does bitsandbytes: triton-windows-amdgpu and bitsandbytes-ROCm-Windows require RX 7000 and above, so RX 6000 cannot use them.

For RX 6000 and below:

1st, remove the venv.

2nd, add these two lines to webui-user.bat:
https://github.com/lshqqytiger/stable-diffusion-webui-amdgpu/blob/e3a909943e8cbbabe198a36fa2c80722408751cc/webui-user.bat#L6

set COMMANDLINE_ARGS=--use-zluda
set DISABLE_ADDMM_CUDA_LT=1

3rd, modify modules\zluda_installer.py: change here from 2.3.1 to 2.4.1, or manually install torch 2.4.1-cu118 (not cu124) yourself.
https://github.com/lshqqytiger/stable-diffusion-webui-amdgpu/blob/e3a909943e8cbbabe198a36fa2c80722408751cc/modules/zluda_installer.py#L118

if agent.arch in (rocm.MicroArchitecture.RDNA, rocm.MicroArchitecture.CDNA,):
    return "2.4.1" if hipBLASLt_enabled else "2.3.1"

In addition, OLD versions may need one or two additional environment variables in webui-user.bat:

set TORCH_BLAS_PREFER_HIPBLASLT=0
set ZLUDA_NVRTC_LIB=" ...A1111 path here... .zluda\nvrtc.dll"

The latest A1111-ZLUDA, Forge-ZLUDA, and SD.Next do not need them, because these are configured automatically:

TORCH_BLAS_PREFER_HIPBLASLT is configured by rocm.py:
https://github.com/lshqqytiger/stable-diffusion-webui-amdgpu/blob/e3a909943e8cbbabe198a36fa2c80722408751cc/modules/rocm.py#L199

ZLUDA_NVRTC_LIB is configured by zluda_installer.py:
https://github.com/lshqqytiger/stable-diffusion-webui-amdgpu/blob/e3a909943e8cbbabe198a36fa2c80722408751cc/modules/zluda_installer.py#L87

Other applications that use torch 2.4.1-cu118 may need these three environment variables.
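
Put together, a launcher for such an application might set all three before torch loads (a sketch; the nvrtc path is deliberately left as a placeholder):

    # Sketch: set the three variables discussed above before torch/ZLUDA initialize.
    import os

    os.environ["DISABLE_ADDMM_CUDA_LT"] = "1"        # RX 6000 and below
    os.environ["TORCH_BLAS_PREFER_HIPBLASLT"] = "0"  # older builds only (see above)
    os.environ["ZLUDA_NVRTC_LIB"] = r"...path to nvrtc dll..."  # placeholder; see the owner's note below

    import torch  # import only after the variables are in place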

@lshqqytiger
Owner

Thank you for sharing your discovery. I'll update the ZLUDA installer to set DISABLE_ADDMM_CUDA_LT=1 if hipBLASLt is unavailable.
The environment variable TORCH_BLAS_PREFER_HIPBLASLT will not be used by torch, because we use torch compiled for CUDA, so I think it is not needed.
The environment variable ZLUDA_NVRTC_LIB should target the original NVIDIA runtime compiler library, not the ZLUDA runtime compiler. Therefore, it should be venv/Lib/site-packages/torch/lib/nvrtc64_112_0.dll.
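
Concretely, that would be something like this (a sketch; the venv layout is an assumption):

    # Sketch: point ZLUDA_NVRTC_LIB at torch's bundled NVIDIA NVRTC before
    # anything initializes CUDA in the process.
    import os, pathlib

    nvrtc = pathlib.Path("venv/Lib/site-packages/torch/lib/nvrtc64_112_0.dll")
    os.environ["ZLUDA_NVRTC_LIB"] = str(nvrtc)

    import torch  # imported after the variable is set so ZLUDA can find the library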
