
Cannot load model on the GPU with llama-cpp-python (Windows) #130

Open
silicode17 opened this issue Mar 9, 2024 · 3 comments

@silicode17

This is how I am loading the model in Python, but it only uses the CPU:

from llama_cpp import Llama

# Request the first 50 layers on the GPU; n_ctx is the context window size.
llm = Llama(model_path="./functionary-7b-v2.q4_0.gguf", n_ctx=4096, n_gpu_layers=50)

I have also tried reinstalling llama-cpp-python with the commands below, but that didn't help:

rem Force a from-source rebuild of llama-cpp-python with cuBLAS (CUDA) support.
set CMAKE_ARGS="-DLLAMA_CUBLAS=on"
set FORCE_CMAKE=1
pip install --upgrade --verbose --force-reinstall llama-cpp-python --no-cache-dir

My GPU only has 8 GB of VRAM; could that be the reason? I saw in the readme that this model requires 24 GB of VRAM...
However, other models such as Mistral load on my GPU just fine, so I am assuming that my CUDA installation is correct.
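
As a sanity check, loading with verbose=True makes llama.cpp print how many layers were actually offloaded during loading. A minimal sketch (I'm assuming the log line mentioned in the comment appears in a CUDA-enabled build):

from llama_cpp import Llama

# A CUDA build prints something like "llm_load_tensors: offloaded 33/33 layers to GPU";
# a CPU-only build reports 0 offloaded layers (or no such line at all).
llm = Llama(
    model_path="./functionary-7b-v2.q4_0.gguf",
    n_ctx=4096,
    n_gpu_layers=50,
    verbose=True,
)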

@jeffrey-fong
Contributor

Hi, we have recently integrated our models into llama-cpp-python directly. Here's how you can use it. Can you try it and see if it works now?

I tested it on my end with the following code, and the model loads using 4.835 GB of GPU VRAM.

from llama_cpp import Llama
from llama_cpp.llama_tokenizer import LlamaHFTokenizer

# Download the GGUF from the Hugging Face Hub and offload every layer to the GPU.
llm = Llama.from_pretrained(
    repo_id="meetkai/functionary-7b-v2-GGUF",
    filename="functionary-7b-v2.q4_0.gguf",
    chat_format="functionary-v2",
    tokenizer=LlamaHFTokenizer.from_pretrained("meetkai/functionary-7b-v2-GGUF"),
    n_ctx=4096,
    n_gpu_layers=-1,  # -1 = offload all layers
)
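
Once it loads, you can call it like any other llama-cpp-python chat model. A minimal sketch of a tool call using the llm from above; the get_current_weather tool here is just an illustrative example, not something shipped with the model:

# Ask a question that should make functionary emit a tool call.
response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "What is the weather like in Istanbul?"}],
    tools=[
        {
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "description": "Get the current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
)
print(response["choices"][0]["message"])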

@silicode17
Author


Yes, it works.
Quick question: is there a way to load a local GGUF file instead of downloading it from the Hub?

@jeffreymeetkai
Collaborator

Sorry for the late reply, but yes, you can load a local GGUF file by initializing the Llama class directly. Here's a guide showing how.
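
In short, something like this should work (same chat format and tokenizer as the example above, with model_path pointing at your local file):

from llama_cpp import Llama
from llama_cpp.llama_tokenizer import LlamaHFTokenizer

# Initialize Llama directly with a local GGUF file instead of from_pretrained().
llm = Llama(
    model_path="./functionary-7b-v2.q4_0.gguf",
    chat_format="functionary-v2",
    tokenizer=LlamaHFTokenizer.from_pretrained("meetkai/functionary-7b-v2-GGUF"),
    n_ctx=4096,
    n_gpu_layers=-1,
)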
