
How to update PyTorch so it works with Ampere / 3XXX-series GPUs? #35

Open
qu0laz opened this issue Aug 23, 2022 · 4 comments
qu0laz commented Aug 23, 2022

Hello,

I just started working with Weaviate and was able to successfully run docker compose for the CPU version; however, I could not get the GPU version to run (link to introduction article).

When attempting to run it, I get errors across the transformer containers, for example:

gpu-ner-transformers-1  | /usr/local/lib/python3.9/site-packages/torch/cuda/__init__.py:106: UserWarning: 
gpu-ner-transformers-1  | NVIDIA GeForce RTX 3060 with CUDA capability sm_86 is not compatible with the current PyTorch installation.
gpu-ner-transformers-1  | The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
gpu-ner-transformers-1  | If you want to use the NVIDIA GeForce RTX 3060 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
gpu-ner-transformers-1  | 
gpu-ner-transformers-1  |   warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))

This is followed by blocks of POST /vectors/ HTTP/1.1" 500 Internal Server Error, and at the very end:
gpu-newspublications-1 | {'error': [{'message': 'fail with status 500: CUDA error: no kernel image is available for execution on the device'}]}

I am able to successfully run other GPU docker containers, such as docker run -it --rm --gpus all ubuntu nvidia-smi, without issue. The important part of the output is NVIDIA-SMI 510.85.02 | Driver Version: 510.85.02 | CUDA Version: 11.6. This is running on a fresh install of Ubuntu 20.04, which comes bundled with Python 3.8; this tells me the error lies with the PyTorch in the image. It is all running on bare metal, with no virtualization outside of the docker containers in the article.

From what I can tell, the issue is that the bundled PyTorch only supports architectures up to sm_70; based on this article, that means it is restricted to older GPUs. In a cloud instance where you can select the GPU and use older ones like a P4, this makes sense. However, in a self-hosted environment this is more difficult.
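To make the mismatch concrete, here is a simplified, hypothetical version of the check behind that warning (this is not PyTorch's actual implementation): a build that ships binaries only for the listed architectures cannot run kernels on a device whose compute capability falls outside them.

```python
def supported(arch_list, device_capability):
    """Return True if a binary built for one of arch_list can run on the device.

    Simplified rule: a cubin built for sm_XY runs only on devices with the
    same major version X and a minor version >= Y (PTX forward compatibility
    is ignored here for clarity).
    """
    major, minor = device_capability
    for arch in arch_list:
        a_major, a_minor = int(arch[3]), int(arch[4:])
        if a_major == major and a_minor <= minor:
            return True
    return False

arch_list = ["sm_37", "sm_50", "sm_60", "sm_70"]  # from the warning above
print(supported(arch_list, (8, 6)))  # RTX 3060 (sm_86): False, no sm_8x binaries shipped
print(supported(arch_list, (7, 0)))  # V100 (sm_70): True
```

This is why the container errors with "no kernel image is available for execution on the device": the wheel simply contains no code compiled for Ampere.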

Have you encountered any issues with newer GPUs and the version of PyTorch that is built into the images in the tutorial? Is there a known workaround?

Thanks in advance!

trengrj (Member) commented Oct 26, 2022

Transferring to transformers module repository

@trengrj trengrj transferred this issue from weaviate/weaviate Oct 26, 2022
trengrj (Member) commented Oct 26, 2022

Can confirm this issue. Using an RTX 3090, I could fix it inside the container by reinstalling torch with --extra-index-url https://download.pytorch.org/whl/cu116:

[email protected]:/app$ ENABLE_CUDA=1 NVIDIA_VISIBLE_DEVICES=all uvicorn app:app --host 0.0.0.0 --port 8080
INFO:     Started server process [382]
INFO:     Waiting for application startup.
INFO:     CUDA_CORE set to cuda:0
/usr/local/lib/python3.9/site-packages/torch/cuda/__init__.py:146: UserWarning:
NVIDIA GeForce RTX 3090 with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the NVIDIA GeForce RTX 3090 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

  warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
^CINFO:     Shutting down
INFO:     Waiting for application shutdown.
INFO:     Application shutdown complete.
INFO:     Finished server process [382]
[email protected]:/app$ pip3 install --upgrade torch==1.12.0 --extra-index-url https://download.pytorch.org/whl/cu116
Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/cu116
Requirement already satisfied: torch==1.12.0 in /usr/local/lib/python3.9/site-packages (1.12.0)
Collecting torch==1.12.0
  Downloading https://download.pytorch.org/whl/cu116/torch-1.12.0%2Bcu116-cp39-cp39-linux_x86_64.whl (1904.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.9/1.9 GB 1.1 MB/s eta 0:00:00
Requirement already satisfied: typing-extensions in /usr/local/lib/python3.9/site-packages (from torch==1.12.0) (4.4.0)
Installing collected packages: torch
  Attempting uninstall: torch
    Found existing installation: torch 1.12.0
    Uninstalling torch-1.12.0:
      Successfully uninstalled torch-1.12.0
Successfully installed torch-1.12.0+cu116
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
WARNING: You are using pip version 22.0.4; however, version 22.3 is available.
You should consider upgrading via the '/usr/local/bin/python -m pip install --upgrade pip' command.
[email protected]:/app$ ENABLE_CUDA=1 NVIDIA_VISIBLE_DEVICES=all uvicorn app:app --host 0.0.0.0 --port 8080
INFO:     Started server process [413]
INFO:     Waiting for application startup.
INFO:     CUDA_CORE set to cuda:0
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
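Since reinstalling inside a running container does not survive a rebuild, the same fix could be baked into a derived image. A minimal sketch (the base image tag is an assumption; substitute whichever module image you actually pull):

```dockerfile
# Hypothetical Dockerfile: extend the module image and swap the bundled
# torch wheel for the CUDA 11.6 build, which includes sm_8x kernels.
FROM semitechnologies/transformers-inference:sentence-transformers-all-MiniLM-L6-v2
RUN pip3 install --upgrade torch==1.12.0 \
    --extra-index-url https://download.pytorch.org/whl/cu116
```

Pointing the docker-compose service at the derived image then avoids patching containers by hand.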


clotodex commented Feb 2, 2023

Is there any progress on this, or an easy way to fix it without forking the repo and pinning individual PyTorch and CUDA versions? I am running into the same issue with my RTX card.


frostronic commented Sep 12, 2024

I'm also having this issue in 2024, except with an RTX 5000-series card. The NVIDIA drivers are set up correctly, and the hardware is detected and used by the Ollama container. I see the topic is still open; is there a recommended solution or workaround?

t2v-transformers | RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx
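Note that "Found no NVIDIA driver" is a different failure from the sm_86 kernel error above: it usually means the container itself was never granted GPU access. A hedged docker-compose sketch (service name, image tag, and environment are assumptions drawn from this thread) showing the device reservation that Compose requires:

```yaml
# Hypothetical compose fragment: without a device reservation (or an
# equivalent `gpus: all`), the container sees no driver at all and torch
# reports exactly this error, even when nvidia-smi works on the host.
services:
  t2v-transformers:
    image: semitechnologies/transformers-inference:sentence-transformers-all-MiniLM-L6-v2
    environment:
      ENABLE_CUDA: "1"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```

Verifying GPU passthrough first with something like the earlier docker run -it --rm --gpus all ubuntu nvidia-smi helps separate driver-visibility problems from the PyTorch build problem.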
