
Running the DeepSeek-R1 reasoning model using Ollama on an Ubuntu single board computer.

Tip

Mac users can install Ollama using the download described here.

No user login or registration is needed for the following steps. After installation, the distilled DeepSeek-R1 model runs locally on Ubuntu without an internet connection.

SBC hardware and setup

I tested several single board computers similar to the Raspberry Pi.

Hardware

Board            Retail    CPU      RAM   Disk     Website
OrangePi 5 Plus  ~150 USD  RK3588   16GB  1TB      link
OrangePi 5       ~100 USD  RK3588S  8GB   250GB    link
OrangePi 3B      ~50 USD   RK3566   4GB   microSD  link

Retail estimates do not include the cost of disk or microSD storage, sales tax, or shipping.

Software

  • Ubuntu 22.04.5 LTS download
  • Python 3.10.12

Install Ollama

This step installs Ollama using a script command from link. This was my first time using Ollama to run an ML model. Open a Terminal in Ubuntu and run these commands.

cd
curl -fsSL https://ollama.com/install.sh | sh

Result.

orangepi@orangepi-plus:~$ curl -fsSL https://ollama.com/install.sh | sh
>>> Installing ollama to /usr/local
[sudo] password for orangepi:
>>> Downloading Linux arm64 bundle
#############################################################             85.2%
######################################################################## 100.0%
>>> Creating ollama user...
>>> Adding ollama user to render group...
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
>>> Enabling and starting ollama service...
Created symlink /etc/systemd/system/default.target.wants/ollama.service → /etc/systemd/system/ollama.service.
>>> The Ollama API is now available at 127.0.0.1:11434.
>>> Install complete. Run "ollama" from the command line.
WARNING: No NVIDIA/AMD GPU detected. Ollama will run in CPU-only mode.
orangepi@orangepi-plus:~$

The installer has determined there is no supported GPU, so the model will run on the CPU only.
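The running service can also be checked from the local REST API. Here is a minimal Python sketch (standard library only) that lists the locally installed models via Ollama's /api/tags endpoint:

import json
import urllib.request

# Ollama's REST API listens on 127.0.0.1:11434 by default.
# The /api/tags endpoint returns the locally installed models.
with urllib.request.urlopen("http://127.0.0.1:11434/api/tags") as resp:
    models = json.load(resp)["models"]

for model in models:
    print(model["name"])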

Run the model

The smallest distilled version of the DeepSeek-R1 model can now be run on this single board computer. This step first downloads the model (1.1 GiB).

ollama run deepseek-r1:1.5b

The result is an interactive chat prompt.

orangepi@orangepi-plus:~$ ollama run deepseek-r1:1.5b
pulling manifest
pulling aabd4debf0c8... 100% ▕███████████████████████████████████████████████████████████████████████████████▏ 1.1 GB
pulling 369ca498f347... 100% ▕███████████████████████████████████████████████████████████████████████████████▏  387 B
pulling 6e4c38e1172f... 100% ▕███████████████████████████████████████████████████████████████████████████████▏ 1.1 KB
pulling f4d24e9138dd... 100% ▕███████████████████████████████████████████████████████████████████████████████▏  148 B
pulling a85fe2a2e58e... 100% ▕███████████████████████████████████████████████████████████████████████████████▏  487 B
verifying sha256 digest
writing manifest
success
>>> Send a message (/? for help)

We can see the model information and license by typing /show info and /show license.

  • It is shared under the permissive open-source MIT license.
  • This distilled model is based on the model architecture and trained weights from Alibaba Cloud's Qwen team.
  • It uses 4-bit quantization, the same as the LLM we used on the AI in a Box project at Useful Sensors. At Google I worked on quantizing ML models to 4-bit weight precision, with great results.
>>> /show info
  Model
    architecture        qwen2
    parameters          1.8B
    context length      131072
    embedding length    1536
    quantization        Q4_K_M

  Parameters
    stop    "<|begin▁of▁sentence|>"
    stop    "<|end▁of▁sentence|>"
    stop    "<|User|>"
    stop    "<|Assistant|>"

  License
    MIT License
    Copyright (c) 2023 DeepSeek

>>> /show license
MIT License

Copyright (c) 2023 DeepSeek

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

>>> Send a message (/? for help)

A simple problem

You can now write a natural language prompt and watch the model reason and generate an answer. For example, here is a simple arithmetic problem: the sum of 3 and 2.

>>> What is the sum of 3 + 2 ?
<think>
Okay, let's tackle this problem step by step. I'm trying to find the sum of 3 plus 2.

First, I need to understand what addition means. Addition is like combining two numbers to get a total. So, when I see "3 + 2,"
it means I have three items and I add two more items to them.

Let me visualize this: imagine I have 3 apples. If I take another 2 apples, how many apples do I have in total?

Adding 1 apple to my 3 apples gives me 4 apples.
Adding another apple brings the total to 5 apples.

So, by adding 2 to 3, I end up with **5** apples.

I can also think of it on a number line. Starting at 3 and moving 2 steps forward lands me on 5.

To make sure I'm correct, I'll do the math again:
\[
3 + 2 = 5
\]
Yes, that's right.

Therefore, the sum of 3 and 2 is **5**.
</think>

To solve the addition problem \(3 + 2\), follow these steps:

1. **Identify the numbers being added:**
   - The first number is **3**.
   - The second number is **2**.

2. **Add the two numbers together:**
   \[
   3 + 2 = 5
   \]

3. **Conclusion:**
   - The sum of \(3\) and \(2\) is **5**.

\(\boxed{5}\)

>>> Send a message (/? for help)

The text between <think> and </think> shows the "chain-of-thought" reasoning of the model as it examines the problem. Generation is not deterministic, so the reasoning text will vary from run to run with the same prompt.

The final text after </think> is the model's answer to the prompt, and it produces the expected numerical result 5. The wording of the answer will vary from run to run, but in my testing the numerical result of 5 was consistent.
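If repeatable output is useful for testing, Ollama accepts a seed option (the Next steps below mention running the chat script with "seed": 42). A minimal sketch using Ollama's Python API; the seed value is illustrative:

import ollama

# Setting a fixed seed makes the model generate the same text
# for the same prompt, which helps when comparing runs.
response = ollama.chat(
    model="deepseek-r1:1.5b",
    messages=[{"role": "user", "content": "What is the sum of 3 + 2?"}],
    options={"seed": 42},
)
print(response["message"]["content"])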

The model-generated answer uses several kinds of text formatting (Markdown, LaTeX) which are not rendered in the block above. The answer is duplicated here with an indent to demonstrate the Markdown formatting.

To solve the addition problem (3 + 2), follow these steps:

  1. Identify the numbers being added:
     • The first number is 3.
     • The second number is 2.

  2. Add the two numbers together: [ 3 + 2 = 5 ]

  3. Conclusion:
     • The sum of (3) and (2) is 5.

The session retains information from earlier questions for context. So if you ask a follow-up question, such as "repeat the sum but first add +1 to both numbers", the model will recall the original numbers from the previous question during its reasoning before providing the correct answer 7.

Type Ctrl + D to quit.

Prompt history

Ollama stores a text file containing prompt history in the ~/.ollama folder. This prompt history is available using the cursor keys to scroll back through your earlier prompts when running a model.

On a new run of the command ollama run deepseek-r1:1.5b I did not observe the model retaining any context from earlier runs.

Should you wish to delete this history, delete the file.

cd
rm -f .ollama/history

Examples

This section contains several examples showing usage of a local DeepSeek-R1 model.

Tip

Users looking for desktop and cloud applications to run models might take a look at AnythingLLM.

Temperature (experimental)

DeepSeek documentation recommends changing the temperature parameter based on the use case. This documentation does not state whether the guidance is specific to the V3 model or the R1 model.

DeepSeek R1 model documentation also mentions setting the temperature in these usage recommendations.

To experiment with this parameter, Ollama offers customization of a model using a Modelfile (see documentation). An example Modelfile_r1_1.5b is provided in this repo with a temperature parameter.

To run the customized model, first create r1. The DeepSeek-R1 1.5B model needs to be pulled first if that has not already been done.

ollama pull deepseek-r1:1.5b

ollama create r1 -f ./Modelfile_r1_1.5b

Then run the customized model r1.

ollama run r1

I'm not sure I see a difference in the answers compared with running the model via ollama run deepseek-r1:1.5b. This Modelfile is provided for experimentation and comments are welcome!
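Besides a Modelfile, the temperature can also be set per request through Ollama's Python API. A minimal sketch; the value 0.6 here is only an example for experimentation, not a recommendation from this repo:

import ollama

# The temperature option is passed per request;
# higher values tend to give more varied output.
response = ollama.chat(
    model="deepseek-r1:1.5b",
    messages=[{"role": "user", "content": "What is the sum of 3 + 2?"}],
    options={"temperature": 0.6},
)
print(response["message"]["content"])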

Benchmarking

The speed of this model version on a computer can be quantified by counting the number of tokens generated per second. The ollama application provides the --verbose flag to return timing values.

1.5B model

Command.

ollama run deepseek-r1:1.5b --verbose

Result for the question "What is the sum of 3 + 2?" run on OrangePi 5 Plus.

total duration:       24.601935746s
load duration:        92.272721ms
prompt eval count:    14 token(s)
prompt eval duration: 138ms
prompt eval rate:     101.45 tokens/s
eval count:           193 token(s)
eval duration:        24.369s
eval rate:            7.92 tokens/s

The returned eval rate value for this run was 7.9 tokens per second. Over five runs the rate varied in the range [7.77, 8.03] tokens per second, and the eval count varied in the range [131, 193] tokens.
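The same timing values are available programmatically: Ollama's /api/generate response includes eval_count and eval_duration fields (durations are in nanoseconds). A minimal Python sketch that computes the eval rate:

import json
import urllib.request

# Request a completion from the local Ollama server (non-streaming).
request = urllib.request.Request(
    "http://127.0.0.1:11434/api/generate",
    data=json.dumps({
        "model": "deepseek-r1:1.5b",
        "prompt": "What is the sum of 3 + 2?",
        "stream": False,
    }).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(request) as resp:
    result = json.load(resp)

# eval_duration is in nanoseconds; convert to tokens per second.
rate = result["eval_count"] / (result["eval_duration"] / 1e9)
print(f"eval rate: {rate:.2f} tokens/s")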

I used an earlier, now deprecated, script in this repo to generate the following table data for a single prompt. The DeepSeek-R1 1.5B distilled model running on the OrangePi 5 Plus generated 7.8 tokens per second. I also ran the same test on the lower cost OrangePi 5 board (different CPU, less RAM), which ran about 10% slower.

Model  Board            CPU      Tokens per second  Other
1.5B   OrangePi 5 Plus  RK3588   7.8                16GB RAM
1.5B   OrangePi 5       RK3588S  7.0                8GB RAM
1.5B   OrangePi 3B      RK3566   2.4                4GB RAM

The rates for the first two rows are equivalent to approximately 4-6 words per second, which is faster than human speech (roughly 2 words per second). The lowest cost board (OrangePi 3B) generates more slowly than human speech.
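The words-per-second estimate follows from converting the table rates with a rough rule of thumb of about 0.75 words per token (an assumption, not a measured value):

# Rough conversion from tokens per second to words per second,
# assuming ~0.75 words per token (a common rule of thumb).
WORDS_PER_TOKEN = 0.75

for board, tokens_per_second in [
    ("OrangePi 5 Plus", 7.8),
    ("OrangePi 5", 7.0),
    ("OrangePi 3B", 2.4),
]:
    print(f"{board}: ~{tokens_per_second * WORDS_PER_TOKEN:.1f} words/s")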

7B model

Command.

ollama run deepseek-r1:7b --verbose

The eval rate value for this run on the OrangePi 5 Plus was 2.6 tokens per second. The text updates too slowly at this rate to hold my attention. I think a distilled R1 model size between 1.5B and 7B (say 3B or 4B) could be a good trade-off for this CPU.

Chat script

This section describes a chat example with several stored prompts using Ollama's Python API. It is more convenient for testing than using the ollama run command, and it runs from the Terminal command line.

Installation

In this section Python requirements are installed for running the example.

First check that Ollama is installed as described above.

cd
curl -fsSL https://ollama.com/install.sh | sh

Make DeepSeek model version "1.5b" available.

ollama pull deepseek-r1:1.5b

Clone this repo from GitHub.

cd
git clone git@github.com:guynich/deepseek_opi5plus

Next create a Python virtual environment called venv_ollama and install packages.

sudo apt update
sudo apt install python3.10-venv

cd
python3 -m venv venv_ollama
source ./venv_ollama/bin/activate

python3 -m pip install -r deepseek_opi5plus/requirements.txt

Run

In this example a sequence of stored prompts is passed to the model. This method builds context history for the later answers.

cd
cd deepseek_opi5plus/chat
python3 main.py

The model generates reasoning and answers with context. The model's rate is printed in tokens per second, including the session average rate. See the chat folder README for more information.
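The core pattern is a growing message list passed back to the model on each turn. A minimal sketch of the idea, not the repo's main.py:

import ollama

# Stored prompts are answered in sequence; keeping each reply in the
# message list gives the model context for the later prompts.
prompts = [
    "What is the sum of 3 + 2?",
    "Repeat the sum but first add +1 to both numbers.",
]

messages = []
for prompt in prompts:
    messages.append({"role": "user", "content": prompt})
    response = ollama.chat(model="deepseek-r1:1.5b", messages=messages)
    answer = response["message"]["content"]
    messages.append({"role": "assistant", "content": answer})
    print(answer)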

Web server

This example provides a local HTML page for user input with the DeepSeek-R1 1.5B model.

Run the web server.

cd
source ./venv_ollama/bin/activate

python3 deepseek_opi5plus/browser/server.py

Navigate to http://127.0.0.1:5000 in a browser for the chat session (or to the provided IP address when using the --network option). Tested with the Chromium browser on Ubuntu 22.04 on OrangePi 5, and with the Safari browser on macOS.

Web browser interface

The browser web page updates after the model has finished generating text. Context history is preserved during the session.
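The server's behavior can be approximated in a few lines. A minimal sketch assuming Flask (the repo's browser/server.py is the full implementation; the page layout here is illustrative only):

import ollama
from flask import Flask, request

app = Flask(__name__)
history = []  # context history preserved for the session

@app.route("/", methods=["GET", "POST"])
def chat():
    answer = ""
    if request.method == "POST":
        history.append({"role": "user", "content": request.form["prompt"]})
        response = ollama.chat(model="deepseek-r1:1.5b", messages=history)
        answer = response["message"]["content"]
        history.append({"role": "assistant", "content": answer})
    return (
        '<form method="post"><input name="prompt">'
        "<button>Send</button></form>"
        f"<pre>{answer}</pre>"
    )

if __name__ == "__main__":
    app.run(host="127.0.0.1", port=5000)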

Other models

You can select a different Ollama-supported model using the --model option. First make the model available by pulling it, then run the server.

ollama pull deepseek-r1:7b

python3 deepseek_opi5plus/browser/server.py --model "deepseek-r1:7b"

References

  • Ollama.
    • ollama.com. A platform and framework for running and managing LLMs. Useful for running models without internet connectivity.
    • DeepSeek-R1 model documentation ollama.com/library/deepseek. Includes model quantization.
    • Ollama's Python API: github.com/ollama. I found the example scripts chat-with-history.py and chat-stream.py helpful.
  • Training the R1 1.5B model.
    • DeepSeek-R1 Coldstart youtube.com. Running a DeepSeek-R1 distilled model on Ollama is demonstrated in this video.
  • Ubuntu OS.
  • GitHub Copilot.
    • I used Copilot as a pair programmer to develop the browser script and template.

Next steps

  • Check if all CPU cores are being used.
  • Quantify tokens per second.
  • Try OrangePi 5 single board computer with lower cost RK3588S chip and less memory (~100USD retail with 8GB RAM).
  • Try more reasoning examples.
  • Print chat script session in a stream.
  • Try larger size DeepSeek-R1 "7B" model (4.7GiB download) on the OrangePi 5 Plus.
  • Try OrangePi 3B single board computer (~50 USD retail with 4GB RAM) with microSD card.
  • Add Ollama Modelfile with temperature.
  • Add local web server using the DeepSeek-R1 1.5B model for browser chat session.
  • Update chat/README for OrangePi 5 run with "seed": 42 in chat script.
