RX 470/480/570/580/590 support #13

Open · 414owen opened this issue Mar 31, 2023 · 24 comments
@414owen commented Mar 31, 2023

👋 Hi. I'm a NixOS user with a Radeon RX 590.

I get the error: "hipErrorNoBinaryForGpu: Unable to find code object for all current devices!".

This GPU isn't exactly top-of-the-line, but people have managed to run Stable Diffusion et al. on it.

The process, documented here, and a bit here, seems to involve building ROCm with the ROC_ENABLE_PRE_VEGA flag.

Then again, according to this issue other OSes have patched the ROCm packages, so maybe this is an issue for nixpkgs.

Any tips/insights welcome. Does anyone else have this issue?

@414owen (Author) commented Mar 31, 2023

Come to think of it, maybe adding the flag to rocm-runtime will help...

@MatthewCroughan (Member) commented:

@414owen For difficult GPUs, there is an imperative step required prior to nix run, documented here: #2 (comment)

However, #7 bypasses this entirely on NixOS. It would be good if you could test #7 to see if it solves your problem. I don't have an AMD GPU, and haven't had the time to research a fundamental fix for this situation by modifying/patching the ROCm libraries, though it's certainly possible.

@414owen (Author) commented Apr 1, 2023

Thanks @MatthewCroughan, I'll be able to test this out properly tomorrow afternoon :)

@414owen (Author) commented Apr 1, 2023

It looks like NixOS already has the ROC_ENABLE_PRE_VEGA patch.

After following #2 (comment), I'm getting segfaults after loading a model with koboldai-amd, and invokeai-amd doesn't seem to load.

Tail of nix run github:nixified-ai/flake#invokeai-amd -- --web:

>> Current VRAM usage:  0.00G
>> Loading diffusers model from runwayml/stable-diffusion-v1-5
  | Using faster float16 precision
  | Loading diffusers VAE from stabilityai/sd-vae-ft-mse
Fetching 15 files: 100%|█████████████| 15/15 [00:00<00:00, 78939.22it/s]

I'll see if I can dive a bit deeper tomorrow.

@414owen (Author) commented Apr 2, 2023

According to this:

> Some other software like InvokeAI may need a venv created with the --system-site-packages switch and removing the torch and torchvision packages from the local venv so that the ones provided by the Arch in system will be used.

The only references to ROCm I can see are in modules/aipython3/overlays. I'm guessing those are packages that actually redistribute a ROCm runtime, which would need to be patched...

I suppose there's a reason we're using those prebuilt packages instead of nixpkgs' torch/torchvision? The Arch people seem to think it works using the system torch[vision], which presumably uses the system ROCm.

@414owen (Author) commented Apr 2, 2023

Okay, a few updates:

Setting:

    torch = pkgs.python310Packages.torch-bin;
    torchvision = pkgs.python310Packages.torchvision-bin;

in the torchRocm override in modules/aipython3/overlays.nix is enough to get InvokeAI working. It's CPU-only though. Still better than nothing. I generated a few images on my Ryzen 3900X. It takes about 6.3 seconds per iteration for a 512x512px image with the default settings. It didn't seem to be using all my cores, so maybe that can be improved.
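
For reference, the override looks roughly like this when written as a plain Python-package overlay; this is a sketch rather than the exact contents of modules/aipython3/overlays.nix, so the surrounding structure is an assumption:

    # Sketch only: swap the source-built torch packages for the prebuilt
    # wheels, which (without a usable ROCm runtime) fall back to the CPU.
    python-final: python-prev: {
      torch = python-prev.torch-bin;
      torchvision = python-prev.torchvision-bin;
    }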

Maybe it would be worth adding an invokeai-cpu derivation, because honestly it was fun to play around with, even with the slow speed.


Back to AMD:

The torch in nixpkgs has optional support for ROCm, which includes the ROC_ENABLE_PRE_VEGA patches. This option isn't in nixos-22.11 yet...

I've checked out nixpkgs master and am currently waiting for nix-build -E '(import ./. {}).python310Packages.torch.override { rocmSupport = true; }' to build. Looks like it's going to take quite a while. After that I suspect I'll be able to run:

$ python
>>> import torch
>>> torch.cuda.is_available()
True

(Apparently torch's ROCm build reuses the CUDA interfaces.)
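
If that works, the same override could presumably be applied flake-side as a nixpkgs overlay instead of a one-off nix-build; a rough sketch (where exactly this would hook into the nixified-ai modules is an assumption):

    # Sketch: enable the ROCm-flavoured torch for the whole python310 package
    # set via packageOverrides, rather than overriding it ad hoc on the CLI.
    final: prev: {
      python310 = prev.python310.override {
        packageOverrides = python-final: python-prev: {
          torch = python-prev.torch.override { rocmSupport = true; };
        };
      };
    }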

After that, I have to figure out why torch-bin works in the torchRocm override, but torch doesn't. With the latter, I get:

>   torch 1.13.1 (/nix/store/8dzx46rnxs1chvcm7qahd75p50wkawfv-python3.10-torch-1.13.1/lib/python3.10/site-packages)
>   torch 1.13.1+cu117 (/nix/store/k7f999ns4h0v0zb3yjnpka3935pydw2w-python3.10-torch-1.13.1/lib/python3.10/site-packages)

And I'm not entirely sure where the other torch is coming from... Maybe it's because torchvision requires torch-bin? Overriding torchvision with the non-bin version has its own problems...

@414owen (Author) commented Apr 3, 2023

I'm struggling to use torch and torchvision instead of torch-bin and torchvision-bin.

After switching, I get:

> ERROR: Could not find a version that satisfies the requirement torchvision>=0.14.1 (from invokeai) (from versions: none)
> ERROR: No matching distribution found for torchvision>=0.14.1

Does anyone know why?

If I do nix-shell -p python3Packages.torchvision, I can import torchvision just fine...

@MatthewCroughan (Member) commented Apr 3, 2023

@414owen Nix implements isolation by patching all the references in programs to explicit /nix/store paths. The programs you're building from this repo are not aware of the existence of the Python that you just obtained by using nix-shell; you need to modify the Nix expressions in this repository to effect that. You cannot just nix-shell -p anything and expect InvokeAI, or any Python program built with Nix, to see it or be aware of its existence. This helps with reproducibility, since it means you're still getting the same results as anyone else who runs InvokeAI, regardless of your environment.

@MatthewCroughan (Member) commented:

> [...] in the torchRocm override in modules/aipython3/overlays.nix is enough to get invokeai working. It's CPU-only though. Still better than nothing.

I was thinking of providing an invokeai-cpu-only package which would not include any of the programs compiled with CUDA support. That would make the closure much smaller to download and would allow us to perform automated testing in CI. Right now, if you want to use only the CPU, you can use the nvidia package, but as you ran into, the AMD libraries cause crashes when they are involved, and getting rid of ROCm fixes that.

It's really a big problem upstream, and I'm aware of all these edge cases. I appreciate you looking into it; now you know about the dirty inner workings just like we do.

@414owen (Author) commented Apr 3, 2023

Oh no, I wasn't expecting the nix-shell command to make the nixified-ai command work. That was just to demonstrate that the torchvision in nixpkgs does in fact work/exist as a Python module.

In my working branch, I've removed the torch and torchvision overrides in torchRocm, so that the non-bin versions are used: https://github.com/414owen/flake/blob/838a6de3821149824093739e625bbe7ace5a8bb7/modules/aipython3/overlays.nix#L112-L114

That's what I want to get working, because it's a small step from there to enabling rocmSupport.

@MatthewCroughan (Member) commented:

> ...demonstrate that the torchvision in nixpkgs does in fact work.

I'm pretty sure I've tried that, and it segfaults. My expectation was that these issues would eventually be fixed upstream and that we could use a later Nixpkgs and everything would just work.

@MatthewCroughan (Member) commented:

If you've tested a different torchvision and it works correctly, please tell me what revision of nixpkgs you're claiming works, because then I can potentially test it. nixified.ai is currently based on:

❯ nix repl
Welcome to Nix 2.14.1. Type :? for help.

nix-repl> :lf .
Added 22 variables.

nix-repl> inputs.nixpkgs.rev
"3c5319ad3aa51551182ac82ea17ab1c6b0f0df89"

@MatthewCroughan (Member) commented:

As mentioned, I don't own an AMD GPU to solve this issue on. Have you got a Discord/Matrix you can reach me on, so I can try solutions with you?

@414owen (Author) commented Apr 3, 2023

I'm happy to have a call in a bit (not free for the next hour or so), but the place I'm stuck at isn't really AMD-specific.

If you can figure out why this diff breaks with dependency errors, then I think getting these AMD cards working will be a tractable problem. I've made the change to the nvidia version for you :)

I'll send you an email about a possible call, too.

@max-privatevoid (Member) commented:

I've done some experimenting and found the file in which PyTorch hardcodes the /opt/amdgpu path. Simply changing the path to one of the same length doesn't appear to work. From what I can see, there are no accidental changes being made to the binary by the substitution process. As I'm not able to spend much time on figuring this out right now, I've pushed the branch with my non-working changes in case someone else wishes to take a look in the meantime: https://github.com/nixified-ai/flake/tree/rocm-libdrm-patch-torch-bin.
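
The general shape of the attempt is something like the following; the library filename and the replacement path are placeholders (the real ones are in the branch above), and the key constraint is that the replacement string must have exactly the same byte length as /opt/amdgpu:

    # Sketch of the equal-length in-place substitution idea, not the actual
    # branch contents.  LIBRARY.so and /xyz/amdgpu are placeholders.
    python310Packages.torch-bin.overrideAttrs (old: {
      postFixup = (old.postFixup or "") + ''
        # Keep the byte length identical, otherwise string offsets inside the
        # prebuilt binary shift and the library breaks.
        sed -i 's|/opt/amdgpu|/xyz/amdgpu|g' \
          $out/lib/python3.10/site-packages/torch/lib/LIBRARY.so
      '';
    })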

@MatthewCroughan (Member) commented:

@deftdawg The point of this repository is to avoid using Podman/Docker or other container technologies, and to avoid running unreproducible steps like the ones you just posted. When something is fixed in this repository, it is fixed for everyone at the same time, and when something is broken, it is broken for everyone. In your example, step 5 is not going to work for everyone depending on the time of day they run it, and step 4 is going to produce different results every single time you perform the task. Step 2 depends on the host kernel version. And suggesting people pull a 38GB Ubuntu container is the antithesis of what this repository and project is about.

Please don't give unreproducible instructions to people.

@deftdawg commented:

@MatthewCroughan deleted per your wishes.

All the best in your efforts to fix this for AMD. As for me, I need something that works today, not something that may work in a perfectly reproducible way at some point in the future. To each his own.

@MatthewCroughan (Member) commented:

@deftdawg As I said, when it works, it will continue to work forever and will not stop working for spooky reasons; that's what Nix does. If you want something that works today, please contribute Nix code. I don't own an AMD GPU, and per reports Nixified.Ai does work on the majority of AMD GPUs, just not your specific one at the moment. Let this fire put itself out, and either wait or figure out the cause. I'm pretty sure this is going to magically fix itself when we upgrade Nixpkgs at some point.

@Eisfunke commented:

> >> Current VRAM usage:  0.00G
> >> Loading diffusers model from runwayml/stable-diffusion-v1-5
>   | Using faster float16 precision
>   | Loading diffusers VAE from stabilityai/sd-vae-ft-mse
> Fetching 15 files: 100%|█████████████| 15/15 [00:00<00:00, 78939.22it/s]
>
> I'll see if I can dive a bit deeper tomorrow.

I have the same issue with a segfault / core dump at the same point, without any other error messages. Although for some reason, the first time running it after a reboot, there's this line as well before the core dump: Memory access fault by GPU node-1 (Agent handle: 0xa747a20) on address (nil). Reason: Page not present or supervisor privilege. I'm on a Vega 56, btw.

> torch = pkgs.python310Packages.torch-bin;
> torchvision = pkgs.python310Packages.torchvision-bin;

Indeed "fixes" that, but as @414owen said, it's CPU-only. On my Ryzen 3600X my first test image took 643 seconds, so that's not really viable for me though :D

> I'm pretty sure I've tried that, and it segfaults. My expectation was that these issues would eventually be fixed upstream and that we could use a later Nixpkgs and everything would just work.

There seem to be newer versions of e.g. ROCm than the versions in the flake. I tried to update the inputs myself, but that seems to generate some problems with finding the right nixpkgs revision with the right versions of pytorch... I don't have time right now to get deeper into it, but might try again some time.

@yboettcher commented:

I feel like I'm soo close (but I'm probably still soo far away...). I got it to compile with the nixpkgs version of ROCm+torch (torch took forever though), and when I enter a develop shell and ask torch.cuda.is_available(), it answers True (for my RX 480), but when I actually try to generate something, I get:

>> ESRGAN Parameters: False
>> Facetool Parameters: False
Generating:   0%|         | 0/1 [00:00<?, ?it/s]
Global seed set to 1437180482

MIOpen(HIP): Error [Compile] 'hiprtcCompileProgram(prog.get(), c_options.size(), c_options.data())' naive_conv.cpp: HIPRTC_ERROR_COMPILATION (6)                       | 0/50 [00:00<?, ?it/s]
MIOpen(HIP): Error [BuildHip] HIPRTC status = HIPRTC_ERROR_COMPILATION (6), source file: naive_conv.cpp
MIOpen(HIP): Warning [BuildHip] hip runtime failed to load.
Error: Please provide architecture for which code is to be generated.

I found this issue, which also talks about naive_conv.cpp, but they actually get a useful compilation error message: ROCm/ROCm#1889.
I've put my changes in a fork here: https://github.com/yboettcher/nixified-ai-flake (I mostly just updated everything to get newer ROCm packages from nixpkgs, especially the ROCm overrides for torch, then used the torch ROCm overrides to override the global torch, and fixed some problems that occurred along the way).

@bendlas commented Jul 23, 2023

@yboettcher nice work, thanks! I'll definitely have to play around with this on my 580

In the meantime: it seems like the compiler needs a flag set for older GPUs: https://rocm.docs.amd.com/projects/HIP/en/latest/user_guide/hip_rtc.html#hiprtc-specific-options

Do you think that flag could be added via TORCH_NVCC_FLAGS?

@yboettcher commented:

Just grepping through the pytorch sources (for nvrtcCompile), it does not look like the options passed to the runtime compiler can be changed. Then again, I'm not familiar with how torch handles all of this.
Additionally, wouldn't we need to set something like nvrtc_flags instead of nvcc?

@yboettcher commented:

I made a small test program to see how to get hiprtc to compile stuff for my RX 480, and it indeed worked with "--gpu-architecture=gfx803", as you suggested. Since I see no way of injecting this argument into an already-built version of pytorch or by using CMake options, I've now made a patch that just adds that argument every time nvrtcCompileProgram is called.
I am currently compiling again (I've also pushed it to the fork), but the last time this took hours. Wish me luck!
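
For reference, wiring such a patch into the nixpkgs torch derivation is roughly a one-liner; gfx803-hiprtc.patch is a placeholder name here, the actual patch lives in the fork:

    # Sketch: add a local patch to the ROCm-enabled torch build.  The patch
    # file name is hypothetical; the real patch is in the fork linked above.
    python310Packages.torch.overrideAttrs (old: {
      patches = (old.patches or [ ]) ++ [ ./gfx803-hiprtc.patch ];
    })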

@yboettcher commented Jul 29, 2023

I'm still getting the same error; there's just a bit more info than before (so there is some effect):

MIOpen(HIP): Error [Compile] 'hiprtcCompileProgram(prog.get(), c_options.size(), c_options.data())' naive_conv.cpp: HIPRTC_ERROR_COMPILATION (6)
MIOpen(HIP): Error [BuildHip] HIPRTC status = HIPRTC_ERROR_COMPILATION (6), source file: naive_conv.cpp
MIOpen(HIP): Warning [BuildHip] hip runtime failed to load.
Error: Please provide architecture for which code is to be generated.
terminate called after throwing an instance of 'miopen::Exception'
  what():  /build/source/src/hipoc/hipoc_program.cpp:304: Code object build failed. Source: naive_conv.cpp

I'll try to dig a bit more later.

Edit:
This one appears to be caused by MIOpen somehow. Running MIOPEN_FIND_MODE=1 MIOpenDriver conv gives me this exact error, with the references to miopen::Exception and hipoc/hipoc_program. On the bright side, this might mean that the original error is "fixed" and we're just on to the next one.

Edit2:
I changed MIOpen to build for gfx803 too and now get a different MIOpen issue:

MIOpen(HIP): Error [Compile] 'hiprtcCompileProgram(prog.get(), c_options.size(), c_options.data())' naive_conv.cpp: HIPRTC_ERROR_COMPILATION (6)
MIOpen(HIP): Error [BuildHip] HIPRTC status = HIPRTC_ERROR_COMPILATION (6), source file: naive_conv.cpp
MIOpen(HIP): Warning [BuildHip] /tmp/comgr-8a85fe/input/naive_conv.cpp:39:10: fatal error: 'limits' file not found
#include <limits> // std::numeric_limits
         ^~~~~~~~
1 error generated when compiling for gfx803.
terminate called after throwing an instance of 'miopen::Exception'
  what():  /build/source/src/hipoc/hipoc_program.cpp:304: Code object build failed. Source: naive_conv.cpp

Looks like I did manage to convince it to build for gfx803, but it fails for another reason. In fact, this appears to be an issue with how MIOpen/ROCm is packaged (same error as in the issue I linked in #13 (comment)).
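
For the record, forcing a gfx803 build of MIOpen can be sketched as an override like the one below; the exact CMake variable MIOpen honours is an assumption here and may differ between ROCm releases:

    # Sketch: ask MIOpen to build kernels for gfx803 as well.  Whether the
    # variable is AMDGPU_TARGETS or GPU_TARGETS depends on the MIOpen release.
    miopen.overrideAttrs (old: {
      cmakeFlags = (old.cmakeFlags or [ ]) ++ [ "-DAMDGPU_TARGETS=gfx803" ];
    })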
