
How to update PyTorch so it works with Ampere / 3XXX-series GPUs? #35

Open
qu0laz opened this issue Aug 23, 2022 · 4 comments
qu0laz commented Aug 23, 2022

Hello,

I just started working with Weaviate and was able to successfully run docker compose for the CPU version; however, I could not get the GPU version to run (link to introduction article).

When attempting to run it, I get errors across the transformer containers, for example:

gpu-ner-transformers-1  | /usr/local/lib/python3.9/site-packages/torch/cuda/__init__.py:106: UserWarning: 
gpu-ner-transformers-1  | NVIDIA GeForce RTX 3060 with CUDA capability sm_86 is not compatible with the current PyTorch installation.
gpu-ner-transformers-1  | The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
gpu-ner-transformers-1  | If you want to use the NVIDIA GeForce RTX 3060 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
gpu-ner-transformers-1  | 
gpu-ner-transformers-1  |   warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))

This is followed by blocks of POST /vectors/ HTTP/1.1" 500 Internal Server Error, and at the very end:
gpu-newspublications-1 | {'error': [{'message': 'fail with status 500: CUDA error: no kernel image is available for execution on the device'}]}

I am able to successfully run other GPU docker containers, such as docker run -it --rm --gpus all ubuntu nvidia-smi, without issue. The important part of the output is NVIDIA-SMI 510.85.02 | Driver Version: 510.85.02 | CUDA Version: 11.6. This is running on a fresh install of Ubuntu 20.04, which comes bundled with Python 3.8; this tells me the error lies with the PyTorch in the image. It is all running on bare metal, with no virtualization outside of the docker containers in the article.

From what I can tell, the issue is that the bundled PyTorch only supports architectures up to sm_70; based on this article, that means it is restricted to older GPUs. In a cloud instance where you can select the GPU and use older ones like a P4, this makes sense. However, in a self-hosted environment this is more difficult.
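To make the mismatch concrete, here is a simplified, hypothetical version of the check behind that warning (this is not PyTorch's actual implementation): a build that ships binaries only for the listed architectures cannot run kernels on a device whose compute capability falls outside them.

```python
def supported(arch_list, device_capability):
    """Return True if a binary built for one of arch_list can run on the device.

    Simplified rule: a cubin built for sm_XY runs only on devices with the
    same major version X and a minor version >= Y (PTX forward compatibility
    is ignored here for clarity).
    """
    major, minor = device_capability
    for arch in arch_list:
        a_major, a_minor = int(arch[3]), int(arch[4:])
        if a_major == major and a_minor <= minor:
            return True
    return False

arch_list = ["sm_37", "sm_50", "sm_60", "sm_70"]  # from the warning above
print(supported(arch_list, (8, 6)))  # RTX 3060 (sm_86): False, no sm_8x binaries shipped
print(supported(arch_list, (7, 0)))  # V100 (sm_70): True
```

This is why the container errors with "no kernel image is available for execution on the device": the wheel simply contains no code compiled for Ampere.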

Have you encountered any issues with newer GPUs and the version of PyTorch that is built into the images in the tutorial? Is there a known workaround?

Thanks in advance!

trengrj (Member) commented Oct 26, 2022

Transferring to transformers module repository

@trengrj trengrj transferred this issue from weaviate/weaviate Oct 26, 2022
trengrj (Member) commented Oct 26, 2022

Can confirm this issue. Using an RTX 3090, I could fix it inside the container by reinstalling torch with --extra-index-url https://download.pytorch.org/whl/cu116:

[email protected]:/app$ ENABLE_CUDA=1 NVIDIA_VISIBLE_DEVICES=all uvicorn app:app --host 0.0.0.0 --port 8080
INFO:     Started server process [382]
INFO:     Waiting for application startup.
INFO:     CUDA_CORE set to cuda:0
/usr/local/lib/python3.9/site-packages/torch/cuda/__init__.py:146: UserWarning:
NVIDIA GeForce RTX 3090 with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the NVIDIA GeForce RTX 3090 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

  warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
^CINFO:     Shutting down
INFO:     Waiting for application shutdown.
INFO:     Application shutdown complete.
INFO:     Finished server process [382]
[email protected]:/app$ pip3 install --upgrade torch==1.12.0 --extra-index-url https://download.pytorch.org/whl/cu116
Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/cu116
Requirement already satisfied: torch==1.12.0 in /usr/local/lib/python3.9/site-packages (1.12.0)
Collecting torch==1.12.0
  Downloading https://download.pytorch.org/whl/cu116/torch-1.12.0%2Bcu116-cp39-cp39-linux_x86_64.whl (1904.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.9/1.9 GB 1.1 MB/s eta 0:00:00
Requirement already satisfied: typing-extensions in /usr/local/lib/python3.9/site-packages (from torch==1.12.0) (4.4.0)
Installing collected packages: torch
  Attempting uninstall: torch
    Found existing installation: torch 1.12.0
    Uninstalling torch-1.12.0:
      Successfully uninstalled torch-1.12.0
Successfully installed torch-1.12.0+cu116
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
WARNING: You are using pip version 22.0.4; however, version 22.3 is available.
You should consider upgrading via the '/usr/local/bin/python -m pip install --upgrade pip' command.
[email protected]:/app$ ENABLE_CUDA=1 NVIDIA_VISIBLE_DEVICES=all uvicorn app:app --host 0.0.0.0 --port 8080
INFO:     Started server process [413]
INFO:     Waiting for application startup.
INFO:     CUDA_CORE set to cuda:0
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
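Since reinstalling inside a running container does not survive a rebuild, the same fix could be baked into a derived image. A minimal sketch (the base image tag is an assumption; substitute whichever module image you actually pull):

```dockerfile
# Hypothetical Dockerfile: extend the module image and swap the bundled
# torch wheel for the CUDA 11.6 build, which includes sm_8x kernels.
FROM semitechnologies/transformers-inference:sentence-transformers-all-MiniLM-L6-v2
RUN pip3 install --upgrade torch==1.12.0 \
    --extra-index-url https://download.pytorch.org/whl/cu116
```

Pointing the docker-compose service at the derived image then avoids patching containers by hand.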


clotodex commented Feb 2, 2023

Is there any progress on this, or an easy way to fix it without forking the repo and pinning individual PyTorch and CUDA versions? I am running into the same issue with my RTX card.


frostronic commented Sep 12, 2024

I'm also having this issue in 2024, except with an RTX 5000-series card. The NVIDIA drivers are set up correctly, and the hardware is detected and used by the Ollama container. I see the topic is still open; is there a recommended solution or workaround?

t2v-transformers | RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx
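Note that "Found no NVIDIA driver" is a different failure from the sm_86 kernel error above: it usually means the container itself was never granted GPU access. A hedged docker-compose sketch (service name, image tag, and environment are assumptions drawn from this thread) showing the device reservation that Compose requires:

```yaml
# Hypothetical compose fragment: without a device reservation (or an
# equivalent `gpus: all`), the container sees no driver at all and torch
# reports exactly this error, even when nvidia-smi works on the host.
services:
  t2v-transformers:
    image: semitechnologies/transformers-inference:sentence-transformers-all-MiniLM-L6-v2
    environment:
      ENABLE_CUDA: "1"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```

Verifying GPU passthrough first with something like the earlier docker run -it --rm --gpus all ubuntu nvidia-smi helps separate driver-visibility problems from the PyTorch build problem.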
