Commit 03d1584

authored
Merge branch 'main' into koppor-patch-2
2 parents 98e2f71 + 52f3309 · commit 03d1584

File tree

1 file changed (+8, -5 lines)

en/ai/local-llm.md

Lines changed: 8 additions & 5 deletions
@@ -1,10 +1,13 @@
-# Running a local LLM model
+# Running a local Large Language Model (LLM)

Notice:

-1. This tutorial is intended for expert users
-2. (Local) LLMs require a lot of computational power
-3. Smaller models (in terms of parameter size) typically respond qualitatively worse than bigger ones, but they are faster, need less memory and might already be sufficient for your use case.
+1. LLMs require a lot of computational power and therefore a lot of electricity.
+2. Smaller models typically respond qualitatively worse than bigger ones, but they are faster, need less memory, and might already be sufficient for your use case.
+3. The size of a model can be measured by the number of parameters in its neural network. The "b" in a model name typically stands for **b**illion parameters. Model size can also be measured in terms of the gigabytes required to load the model into your device's RAM/VRAM.
+4. The model should always fit completely into VRAM (fast); otherwise layers are offloaded to RAM (slower), and if it does not fit there either, the SSD is used (abysmally slow).
+5. Hardware recommendation for maximizing prompt processing and token generation speed: a device with high memory *bandwidth*. A modern GPU with lots of VRAM satisfies this requirement best.
+

## High-level explanation

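The added notice above stresses that the model should fit completely into VRAM. As a minimal sketch of how to check this yourself, assuming an NVIDIA GPU with the standard `nvidia-smi` tool (other GPU vendors ship their own equivalents), you can compare free GPU memory against the size `ollama` reports for a downloaded model:

```bash
# Report total and used GPU memory (NVIDIA GPUs only).
nvidia-smi --query-gpu=memory.total,memory.used --format=csv

# List the models ollama has downloaded, including their size on disk,
# which is a rough lower bound for the RAM/VRAM needed to load them.
ollama list
```

As a rough rule of thumb (an approximation, not an exact figure), a model quantized to 4 bits needs on the order of half a gigabyte of memory per billion parameters, plus extra room for the context window.
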
@@ -22,7 +25,7 @@ Voilà! You can use a local LLM right away in JabRef.
The following steps guide you on how to use `ollama` to download and run local LLMs.

1. Install `ollama` from [their website](https://ollama.com/download)
-2. Select a model that you want to run. The `ollama` provides [a large list of models](https://ollama.com/library) to choose from (we recommend trying [`gemma2:2b`](https://ollama.com/library/gemma2:2b), or [`mistral:7b`](https://ollama.com/library/mistral), or [`tinyllama`](https://ollama.com/library/tinyllama))
+2. Select a model that you want to run. `ollama` provides [a large list of models](https://ollama.com/library) to choose from. Some popular models are, for instance, [`qwen3:30b-a3b`](https://ollama.com/library/qwen3), [`granite3.1-moe:3b`](https://ollama.com/library/granite3.1-moe), [`devkit/L1-Qwen-1.5B-Max`](https://ollama.com/devkit/L1-Qwen-1.5B-Max), [`mistral:7b`](https://ollama.com/library/mistral) or [`mistral-small3.1:24b`](https://ollama.com/library/mistral-small3.1).
3. When you have selected your model, type `ollama pull <MODEL>:<PARAMETERS>` in your terminal. `<MODEL>` refers to the model name like `gemma2` or `mistral`, and `<PARAMETERS>` refers to the parameter count like `2b` or `9b`.
4. `ollama` will download the model for you.
5. After that, you can run `ollama serve` to start a local web server. This server will accept requests and respond with LLM output. Note: the `ollama` server may already be running, so do not be alarmed by a "cannot bind" error. If it is not yet running, use the following command: `ollama run <MODEL>:<PARAMETERS>`

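To make the steps above concrete, here is a minimal example session using `mistral:7b`, one of the models listed above (substitute the model and parameter tag you actually chose):

```bash
# Steps 2-4: download the chosen model; the tag after the colon
# selects the variant / parameter count.
ollama pull mistral:7b

# Step 5: start the local server if it is not already running.
# By default it listens on http://localhost:11434.
ollama serve

# Alternatively, load the model and chat with it directly in the terminal
# to verify it works before configuring it in JabRef.
ollama run mistral:7b
```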