@@ -7,7 +7,7 @@ Run local LLMs on iGPU, APU and CPU (AMD , Intel, and Qualcomm (Coming Soon)). E
| Model architectures | Gemma <br /> Llama \* <br /> Mistral + <br />Phi <br /> | | |
| Platform | Linux <br /> Windows | | |
| Architecture | x86 <br /> x64 <br /> | Arm64 | |
- | Hardware Acceleration | CUDA<br />DirectML<br />IpexLLM<br />OpenVINO | QNN <br /> ROCm | |
+ | Hardware Acceleration | CUDA<br />DirectML<br />IpexLLM | QNN <br /> ROCm | OpenVINO |
\* The Llama model architecture supports similar model families such as CodeLlama, Vicuna, Yi, and more.
@@ -21,6 +21,8 @@ Run local LLMs on iGPU, APU and CPU (AMD , Intel, and Qualcomm (Coming Soon)). E
## Table of Contents
- [Supported Models](#supported-models-quick-start)
+ - [Onnxruntime Models](./docs/model/onnxruntime_models.md)
+ - [Ipex-LLM Models](./docs/model/ipex_models.md)
- [Getting Started](#getting-started)
- [Installation From Source](#installation)
- [Launch OpenAI API Compatible Server](#launch-openai-api-compatible-server)
@@ -31,10 +33,22 @@ Run local LLMs on iGPU, APU and CPU (AMD , Intel, and Qualcomm (Coming Soon)). E
- [Acknowledgements](#acknowledgements)
## Supported Models (Quick Start)
- * Onnxruntime DirectML Models [Link](./docs/model/onnxruntime_directml_models.md)
- * Onnxruntime CPU Models [Link](./docs/model/onnxruntime_cpu_models.md)
- * Ipex-LLM Models [Link](./docs/model/ipex_models.md)
- * OpenVINO-LLM Models [Link](./docs/model/openvino_models.md)
+
+ | Models | Parameters | Context Length | Link |
+ | --- | --- | --- | --- |
+ | Gemma-2b-Instruct v1 | 2B | 8192 | [EmbeddedLLM/gemma-2b-it-onnx](https://huggingface.co/EmbeddedLLM/gemma-2b-it-onnx) |
+ | Llama-2-7b-chat | 7B | 4096 | [EmbeddedLLM/llama-2-7b-chat-int4-onnx-directml](https://huggingface.co/EmbeddedLLM/llama-2-7b-chat-int4-onnx-directml) |
+ | Llama-2-13b-chat | 13B | 4096 | [EmbeddedLLM/llama-2-13b-chat-int4-onnx-directml](https://huggingface.co/EmbeddedLLM/llama-2-13b-chat-int4-onnx-directml) |
+ | Llama-3-8b-chat | 8B | 8192 | [EmbeddedLLM/mistral-7b-instruct-v0.3-onnx](https://huggingface.co/EmbeddedLLM/mistral-7b-instruct-v0.3-onnx) |
+ | Mistral-7b-v0.3-instruct | 7B | 32768 | [EmbeddedLLM/mistral-7b-instruct-v0.3-onnx](https://huggingface.co/EmbeddedLLM/mistral-7b-instruct-v0.3-onnx) |
+ | Phi-3-mini-4k-instruct-062024 | 3.8B | 4096 | [EmbeddedLLM/Phi-3-mini-4k-instruct-062024-onnx](https://huggingface.co/EmbeddedLLM/Phi-3-mini-4k-instruct-062024-onnx/tree/main/onnx/directml/Phi-3-mini-4k-instruct-062024-int4) |
+ | Phi3-mini-4k-instruct | 3.8B | 4096 | [microsoft/Phi-3-mini-4k-instruct-onnx](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-onnx) |
+ | Phi3-mini-128k-instruct | 3.8B | 128k | [microsoft/Phi-3-mini-128k-instruct-onnx](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct-onnx) |
+ | Phi3-medium-4k-instruct | 14B | 4096 | [microsoft/Phi-3-medium-4k-instruct-onnx-directml](https://huggingface.co/microsoft/Phi-3-medium-4k-instruct-onnx-directml) |
+ | Phi3-medium-128k-instruct | 14B | 128k | [microsoft/Phi-3-medium-128k-instruct-onnx-directml](https://huggingface.co/microsoft/Phi-3-medium-128k-instruct-onnx-directml) |
+ | Openchat-3.6-8b | 8B | 8192 | [EmbeddedLLM/openchat-3.6-8b-20240522-onnx](https://huggingface.co/EmbeddedLLM/openchat-3.6-8b-20240522-onnx) |
+ | Yi-1.5-6b-chat | 6B | 32k | [EmbeddedLLM/01-ai_Yi-1.5-6B-Chat-onnx](https://huggingface.co/EmbeddedLLM/01-ai_Yi-1.5-6B-Chat-onnx) |
+ | Phi-3-vision-128k-instruct | | 128k | [EmbeddedLLM/Phi-3-vision-128k-instruct-onnx](https://huggingface.co/EmbeddedLLM/Phi-3-vision-128k-instruct-onnx/tree/main/onnx/cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4) |
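The repositories listed in the table above can be fetched ahead of time and the resulting local folder passed to `ellm_server --model_path` (shown later in this diff). A minimal sketch, assuming the `huggingface_hub` package is installed; the repo id is taken from the table and the download directory is illustrative:

```python
# Sketch: pre-download one of the ONNX repos listed above so its local folder
# can be used as --model_path. Assumes `pip install huggingface_hub`.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="EmbeddedLLM/gemma-2b-it-onnx",  # repo id taken from the table above
    local_dir="./gemma-2b-it-onnx",          # illustrative download location
)
print(local_dir)  # pass this path to `ellm_server --model_path`
```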
## Getting Started
@@ -46,7 +60,7 @@ Run local LLMs on iGPU, APU and CPU (AMD , Intel, and Qualcomm (Coming Soon)). E
1. Custom Setup:
- **IPEX(XPU)**: Requires an Anaconda environment. `conda create -n ellm python=3.11 libuv; conda activate ellm`.
+ **IPEX(XPU)**: Requires an Anaconda environment. `conda create -n ellm python=3.10 libuv; conda activate ellm`.
- **DirectML**: If you are using a Conda environment, install the additional dependency: `conda install conda-forge::vs2015_runtime`.
2. Install the embeddedllm package: `$env:ELLM_TARGET_DEVICE='directml'; pip install -e .`. Note: currently supports `cpu`, `directml` and `cuda`.
@@ -67,7 +81,7 @@ Run local LLMs on iGPU, APU and CPU (AMD , Intel, and Qualcomm (Coming Soon)). E
1. Custom Setup:
- **IPEX(XPU)**: Requires an Anaconda environment. `conda create -n ellm python=3.11 libuv; conda activate ellm`.
+ **IPEX(XPU)**: Requires an Anaconda environment. `conda create -n ellm python=3.10 libuv; conda activate ellm`.
- **DirectML**: If you are using a Conda environment, install the additional dependency: `conda install conda-forge::vs2015_runtime`.
2. Install the embeddedllm package: `ELLM_TARGET_DEVICE='directml' pip install -e .`. Note: currently supports `cpu`, `directml` and `cuda`.
@@ -107,7 +121,7 @@ Run local LLMs on iGPU, APU and CPU (AMD , Intel, and Qualcomm (Coming Soon)). E
### Launch Chatbot Web UI
- 1. `ellm_chatbot --port 7788 --host localhost --server_port <ellm_server_port> --server_host localhost --model_name <served_model_name>`. **Note:** To find out more about the supported arguments, run `ellm_chatbot --help`.
+ 1. `ellm_chatbot --port 7788 --host localhost --server_port <ellm_server_port> --server_host localhost`. **Note:** To find out more about the supported arguments, run `ellm_chatbot --help`.

@@ -135,7 +149,7 @@ It is an interface that allows you to download and deploy OpenAI API compatible
ellm_server --model_path <path/to/model/weight>
# DirectML
- ellm_server --model_path 'EmbeddedLLM_Phi-3-mini-4k-instruct-062024-onnx\onnx\directml\Phi-3-mini-4k-instruct-062024-int4' --port 5555
+ ellm_server --model_path 'EmbeddedLLM/Phi-3-mini-4k-instruct-onnx-directml' --port 5555
# IPEX-LLM
ellm_server --model_path '.\meta-llama_Meta-Llama-3.1-8B-Instruct\' --backend 'ipex' --device 'xpu' --port 5555 --served_model_name 'meta-llama_Meta/Llama-3.1-8B-Instruct'
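Once `ellm_server` is running, it exposes an OpenAI-compatible API, so any OpenAI client can be pointed at it. A minimal sketch using the `openai` Python package against the IPEX-LLM example above; it assumes the server listens on port 5555, that the endpoint lives under the usual `/v1` prefix, and that a local server does not validate the API key:

```python
# Sketch: query the OpenAI-compatible server launched above.
# Assumes `pip install openai` and an ellm_server instance on port 5555.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:5555/v1",  # assumed route prefix for the local server
    api_key="not-needed",                 # placeholder; assumed not to be checked locally
)

response = client.chat.completions.create(
    model="meta-llama_Meta/Llama-3.1-8B-Instruct",  # matches --served_model_name above
    messages=[{"role": "user", "content": "Give me a one-sentence summary of what you can do."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```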
0 commit comments