ERROR while running comfyui with zluda-4 + cu12.4 + torch2.5.1 #57

Open

Bluexchan opened this issue Jan 5, 2025 · 9 comments

@Bluexchan

Error while using comfyui with zluda-4.

Hello, I have been using comfyui with zluda-4 for a while. Thanks for the great work.
I am on a 5800X with a 6750 GRE 10 GB; comfyui with cu118 and torch 2.5.1 (torch 2.3 is also OK) worked fine for two or three months.

Then I noticed the new zluda-4 release, and also HIP 6.2.4, so here is what I did:

  1. Installed the new HIP 6.2.4 from AMD (was 6.1, 2024Q3); pointed ROCM_HOME and HIP_HOME at the HIP 6.2.4 install path, and added hip-6.2.4/bin to PATH.

  2. Patched HIP with rocm.gfx1031.for.hip.sdk.6.2.4.littlewu.s.logic(1), from https://github.com/likelovewant/ROCmLibs-for-gfx1103-AMD780M-APU/releases/tag/v0.6.2.4

  3. Created a new conda venv, installed torch 2.5.1 + cuda12.4, and copied cublas64_12.dll, cusparse64_12.dll, and nvrtc64_120_0.dll to ...Lib\site-packages\torch\lib; those files are from <ZLUDA-windows-rocm6-amd64.zip> at https://github.com/lshqqytiger/ZLUDA/releases.

  4. Kept the backend settings that were fine under zluda-3 + cu118 + torch 2.5.1:
    torch.backends.cudnn.enabled = False
    torch.backends.cuda.enable_flash_sdp(False)
    torch.backends.cuda.enable_math_sdp(True)
    torch.backends.cuda.enable_mem_efficient_sdp(False)

  5. Then ran zluda.exe python main.py (a condensed sketch of steps 4-5 follows below).
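
Condensed, steps 4 and 5 amount to something like this (a sketch only, assuming the new venv is active and the script is launched via zluda.exe python):

    # Sketch of steps 4-5: the backend flags plus a quick sanity check.
    import torch

    # The same flags that worked under zluda-3 + cu118 + torch 2.5.1:
    torch.backends.cudnn.enabled = False
    torch.backends.cuda.enable_flash_sdp(False)
    torch.backends.cuda.enable_math_sdp(True)
    torch.backends.cuda.enable_mem_efficient_sdp(False)

    # Sanity check before launching ComfyUI proper:
    print(torch.__version__, torch.version.cuda)   # expect 2.5.1 / 12.4
    print(torch.cuda.get_device_name(0))           # the device as seen through ZLUDA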

Here's the ERROR from comfyui.

    RuntimeError: CUDA error: operation not supported
    CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
    For debugging consider passing CUDA_LAUNCH_BLOCKING=1
    Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
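
As the error text itself suggests, the trace can be made synchronous so it points at the failing call. A minimal sketch (the variable must be set before torch is imported):

    # Debugging sketch: make CUDA kernel launches synchronous, per the error's hint.
    import os
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # must be set before CUDA initializes

    import torch  # imported afterwards so the setting takes effect
    # ...then run the failing workload; the stack trace should now point at the
    # API call that actually raised "operation not supported".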

How can I solve this? Or what should I do to run comfyui with zluda-4 and cu12.4? Thanks a lot.

And by the way, I noticed this in https://github.com/lshqqytiger/ZLUDA/releases:
New environment variable ZLUDA_NVRTC_LIB: our new ZLUDA runtime compiler depends on the original NVIDIA runtime compiler library, so you should specify the path of it unless it is named nvrtc_cuda.dll.

I have no idea how to do this. 

Thanks again.

@lshqqytiger
Owner

This repository is a continuation of vosen's original ZLUDA 3 project.
ZLUDA 3 does not fully support CUDA 12 and mimics the behavior of CUDA 11.8.
torch cu118 should work.
About the runtime compiler: if you hit an nvrtc internal error, you will need ZLUDA_NVRTC_LIB or DLL renaming. If not, it's fine, because nvrtc is not being used by PyTorch in that case.
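
For the renaming route, something along these lines may work (a sketch; the torch/lib location as the place ZLUDA searches is an assumption, based on a typical cu118 venv):

    # Sketch of the "DLL renaming" alternative: copy torch's bundled NVIDIA
    # NVRTC to the default name the ZLUDA runtime compiler looks for.
    import pathlib, shutil

    torch_lib = pathlib.Path("venv/Lib/site-packages/torch/lib")  # adjust to your venv
    shutil.copy2(torch_lib / "nvrtc64_112_0.dll", torch_lib / "nvrtc_cuda.dll")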

@Bluexchan
Author

Got it, thanks for replying.
Will this project support cu12.4 in the future? That would be cool. Thanks for the work.

@lshqqytiger
Owner

According to my investigation and the comments of vosen, the original author of ZLUDA, we may already have everything that is needed to run CUDA 12 applications. However, for unknown reasons, the CUDA runtime (the application side) behaves quite differently between torch+cu118 and torch+cu12x: cu12x invokes cuLibraryLoadData without the PTX assembly that we actually need, while cu118 calls the dark API with PTX. I'm going to dig into the CUDA runtime more, but I'm not sure I can solve this problem.

@lshqqytiger lshqqytiger self-assigned this Jan 9, 2025
@Roninos

Roninos commented Jan 31, 2025

Hi! When will torch 2.4.1 and above be supported? I installed HIP 6.2 and it works, but only on torch 2.3.1. Many applications require a higher version. I tried different versions of ZLUDA, including nightly, but to no avail. For example, Triton is used to speed up generation, but it requires cu124.

@lshqqytiger
Owner

lshqqytiger commented Jan 31, 2025

Currently, torch cu12x is not supported; I described the reason in my previous comment.
However, nightly ZLUDA supports torch 2.4/2.5 cu118 with an unofficial build of hipBLASLt:

  1. Download hipBLASLt.
  2. Unpack hipBLASLt.
  3. Install it on top of HIP SDK 6.2.
  4. Download and unpack the ZLUDA nightly build.
  5. Preload ZLUDA's cublasLt, or replace cublasLt64_[version].dll with ZLUDA's cublasLt.dll (see the sketch after this list).

Note that this is only available on gfx1100/01/02/03/50 because of hipBLASLt limitations.
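
For step 5, the replace variant could look like this (a sketch; both paths are assumptions for a typical layout):

    # Sketch of step 5 ("replace"): overwrite torch's cublasLt DLL with
    # ZLUDA's cublasLt.dll, keeping the original file name.
    import pathlib, shutil

    torch_lib = pathlib.Path("venv/Lib/site-packages/torch/lib")  # adjust to your venv
    zluda_dir = pathlib.Path("zluda")                             # the unpacked nightly build

    for dll in torch_lib.glob("cublasLt64_*.dll"):
        shutil.copy2(zluda_dir / "cublasLt.dll", dll)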

@Roninos

Roninos commented Feb 1, 2025

It doesn't work on gfx1010 (RX 5700 XT) with torch 2.4/2.5.

@lshqqytiger
Owner

lshqqytiger commented Feb 1, 2025

hipBLASLt only supports gfx90a, 94x, and 110x, as you can see here.
Also, PyTorch does not allow us to disable BLASLt on Windows (torch.backends.cuda.preferred_blas_library).
I'm not sure whether the environment variable TORCH_BLAS_PREFER_CUBLASLT=0 can help.

@githubcto

> hipBLASLt only supports gfx90a, 94x, and 110x, as you can see here. Also, PyTorch does not allow us to disable BLASLt on Windows (torch.backends.cuda.preferred_blas_library). I'm not sure whether the environment variable TORCH_BLAS_PREFER_CUBLASLT=0 can help.

RX 6000 and below need this environment variable under Windows, same as on Linux:

ZLUDA/README.md, line 229 (at commit c4994b3):

DISABLE_ADDMM_CUDA_LT=1

I confirmed that A1111-ZLUDA, Forge-ZLUDA, and SD.Next can generate SDXL images using torch 2.4.1, 2.5.1, and 2.6.0.
Radeon RX 6000
Windows 10
HIP SDK 6.2.4
ZLUDA v3.8.7

However, some extensions that use Triton do not work, and neither does bitsandbytes: triton-windows-amdgpu and bitsandbytes-ROCm-Windows require RX 7000 and above, so RX 6000 cannot use them.

For RX 6000 and below:

1st, remove the venv.

2nd, add these two lines to webui-user.bat:
https://github.com/lshqqytiger/stable-diffusion-webui-amdgpu/blob/e3a909943e8cbbabe198a36fa2c80722408751cc/webui-user.bat#L6

set COMMANDLINE_ARGS=--use-zluda
set DISABLE_ADDMM_CUDA_LT=1

3rd, modify modules\zluda_installer.py: change here from 2.3.1 to 2.4.1, or manually install torch 2.4.1-cu118 (not cu124) yourself.
https://github.com/lshqqytiger/stable-diffusion-webui-amdgpu/blob/e3a909943e8cbbabe198a36fa2c80722408751cc/modules/zluda_installer.py#L118

if agent.arch in (rocm.MicroArchitecture.RDNA, rocm.MicroArchitecture.CDNA,):
    return "2.4.1" if hipBLASLt_enabled else "2.3.1"

In addition, OLD versions may need one or two additional environment variables in webui-user.bat:

set TORCH_BLAS_PREFER_HIPBLASLT=0
set ZLUDA_NVRTC_LIB=" ...A1111 path here... .zluda\nvrtc.dll"

The latest A1111-ZLUDA, Forge-ZLUDA, and SD.Next do not need them, because these are configured automatically:

TORCH_BLAS_PREFER_HIPBLASLT is configured by rocm.py:
https://github.com/lshqqytiger/stable-diffusion-webui-amdgpu/blob/e3a909943e8cbbabe198a36fa2c80722408751cc/modules/rocm.py#L199

ZLUDA_NVRTC_LIB is configured by zluda_installer.py:
https://github.com/lshqqytiger/stable-diffusion-webui-amdgpu/blob/e3a909943e8cbbabe198a36fa2c80722408751cc/modules/zluda_installer.py#L87

Other applications that use torch 2.4.1-cu118 may need these three environment variables.
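
Put together, a launcher for such an application might set all three before torch loads (a sketch; the nvrtc path is deliberately left as a placeholder):

    # Sketch: set the three variables discussed above before torch/ZLUDA initialize.
    import os

    os.environ["DISABLE_ADDMM_CUDA_LT"] = "1"        # RX 6000 and below
    os.environ["TORCH_BLAS_PREFER_HIPBLASLT"] = "0"  # older builds only (see above)
    os.environ["ZLUDA_NVRTC_LIB"] = r"...path to nvrtc dll..."  # placeholder; see the owner's note below

    import torch  # import only after the variables are in place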

@lshqqytiger
Owner

Thank you for sharing your discovery. I'll update the ZLUDA installer to set DISABLE_ADDMM_CUDA_LT=1 if hipBLASLt is unavailable.
The environment variable TORCH_BLAS_PREFER_HIPBLASLT will not be used by torch, because we use torch compiled for CUDA, so I think it is not needed.
The environment variable ZLUDA_NVRTC_LIB should target the original NVIDIA runtime compiler library, not the ZLUDA runtime compiler. Therefore, it should be venv/Lib/site-packages/torch/lib/nvrtc64_112_0.dll.
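
Concretely, that would be something like this (a sketch; the venv layout is an assumption):

    # Sketch: point ZLUDA_NVRTC_LIB at torch's bundled NVIDIA NVRTC before
    # anything initializes CUDA in the process.
    import os, pathlib

    nvrtc = pathlib.Path("venv/Lib/site-packages/torch/lib/nvrtc64_112_0.dll")
    os.environ["ZLUDA_NVRTC_LIB"] = str(nvrtc)

    import torch  # imported after the variable is set so ZLUDA can find the library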
