Found no NVIDIA driver on your system #92
@TobTobXX Use the NixOS module in the Flake and report back. Consider donating via GitHub Sponsors if you want documentation; it's one of the goals.
Nah, still doesn't work:

```nix
{ pkgs, ... }:
{
  imports = [
    (builtins.getFlake "github:nixified-ai/flake").nixosModules.invokeai-nvidia
  ];

  nixpkgs.config = {
    allowUnfree = true;
    cudaSupport = true;
  };

  nix.settings.trusted-substituters = [ "https://ai.cachix.org" ];
  nix.settings.trusted-public-keys = [ "ai.cachix.org-1:N9dzRK+alWwoKXQlnn0H6aUx0lU/mspIoz8hMvGvbbc=" ];

  services.invokeai = {
    enable = true;
    settings = {
      host = "[::]";
      port = 9090;
    };
  };
}
```

I'll try to investigate further, but if you have any pointers, I'd be glad. While I would like to contribute, I'm not in a situation to do so financially. However, I could very well work on expanding the documentation for you.
@TobTobXX If you're doing a lot of …
OK, so I did some more tests, and I think the problem is most likely a mismatch between the driver's CUDA version and torch's CUDA version. Torch appears to be compiled with CUDA 11.8, as you hinted.

However, my driver reports CUDA version 12.3, as seen above. I tried downgrading the driver to version 470 (and rebooting, of course), but that gives CUDA version 11.4, which yields the same error. (Which driver version do you use?) Is there a way to upgrade torch instead?
Apparently you really can't run pytorch with mismatched CUDA versions, even if the driver's version is higher: https://stackoverflow.com/a/76726156
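If I read the nixpkgs CUDA docs right, one direction would be to build against a newer CUDA toolkit so torch matches the 12.3-capable driver. A minimal sketch, assuming this nixpkgs revision honors the `cudaVersion` config knob and that the flake's torch follows nixpkgs' default CUDA package set:

```nix
# Hedged sketch: select a CUDA 12.x package set globally.
# Assumptions: this nixpkgs revision supports `cudaVersion`, and torch
# honors it. Everything CUDA-related then rebuilds from source, so
# nothing will come from the ai.cachix.org cache.
{
  nixpkgs.config = {
    allowUnfree = true;
    cudaSupport = true;
    cudaVersion = "12.2"; # assumption: a 12.x toolkit the 12.3 driver can run
  };
}
```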
That's really great to have found out, thank you for the research. Perhaps we can set this up in the nixosModule, to: …

Providing a GPU passthrough VM script or module is also possible, but then you have to run a VM.
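Something like this minimal sketch of the "warn on mismatch" idea (the option paths are standard NixOS; the >= 520 threshold for CUDA 11.8 is my reading of NVIDIA's compatibility table, so treat it as an assumption):

```nix
# Sketch: emit an evaluation-time warning when the configured driver
# branch is likely too old for the CUDA 11.8 userspace torch was built
# with. Assumes the driver is managed via hardware.nvidia.package.
{ config, lib, ... }:
{
  warnings = lib.optional
    (lib.versionOlder config.hardware.nvidia.package.version "520")
    "invokeai: torch is built against CUDA 11.8, which expects an NVIDIA driver >= 520.";
}
```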
(I'm new to Nix, so correct me on anything I get wrong.)

Option A (changing the CUDA driver): …

Option B (changing torch): …

Aside from the build time, Option B appears to be the better option? InvokeAI runs with pytorch==2.0.1 (see log above). Is that specified anywhere? I tried searching this repo and the InvokeAI repo, but didn't find any version information. The latest version would be 2.2.2. pytorch 2.0.1 is only compatible with CUDA 11.7 and 11.8 (ref).
@TobTobXX A third option is to fix the backwards compatibility in PyTorch, if you have the C++/Python skills to do so: https://docs.nvidia.com/deploy/cuda-compatibility/index.html

Yes, the torch version is specified in Nixpkgs.
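If you want to experiment with Option B without waiting, a rough, untested sketch is to pull torch from a newer nixpkgs through an overlay. `pythonPackagesExtensions` is the stock nixpkgs hook for overriding Python packages; whether the flake's Python environment actually picks this up is an assumption, and cross-revision mixes like this often fail to compose:

```nix
# Untested sketch: take torch from nixos-unstable via an overlay.
# Expect a long from-source build and no hits from the binary cache.
{ ... }:
let
  unstable = import (fetchTarball
    "https://channels.nixos.org/nixos-unstable/nixexprs.tar.xz") {
    config = { allowUnfree = true; cudaSupport = true; };
  };
in
{
  nixpkgs.overlays = [
    (final: prev: {
      pythonPackagesExtensions = prev.pythonPackagesExtensions ++ [
        (pyFinal: pyPrev: { torch = unstable.python3Packages.torch; })
      ];
    })
  ];
}
```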
Oof, you mean backporting pytorch? No, I don't think I'm able to do that. However, there is yet another option... waiting. NixOS 24.05 isn't too far off, and in that channel pytorch should be 2.2.1 (it currently is in unstable). The 23.11 channel is weird anyway, because the NVIDIA driver and pytorch are essentially incompatible. I think I'll drop a question about that to the CUDA maintainers.
By the way, which driver do you use? Why doesn't this occur for your GPU?
Pytorch doesn't (directly) link to the driver; instead it uses the impure runpath (…).
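For reference, a sketch of the host side of that mechanism (this is my summary, not this flake's code): on NixOS the driver's userspace libraries are linked into `/run/opengl-driver/lib`, and Nix-built CUDA binaries carry that directory in their runpath, so torch picks up whatever driver the running system provides:

```nix
# Host-side sketch: these standard NixOS options install the NVIDIA
# userspace libraries and expose them under /run/opengl-driver/lib,
# the impure path CUDA-enabled Nix packages search at runtime.
{
  hardware.opengl.enable = true;
  services.xserver.videoDrivers = [ "nvidia" ];
}
```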
Must mean … Start by verifying if …
Thanks a lot for dropping in!
... sigh... I'm terribly sorry for wasting all of your time. Thank you a lot.
I'm trying to run this on a Linux server with an RTX 3060 12 GB.

The server runs NixOS and has the NVIDIA driver configured:

And it seems to work:

However, when I run InvokeAI, it always chooses the CPU. And if I explicitly configure `cuda` or `cuda:1` (what's the difference?), I get this error:

What should I do?