
Commit bafafeb

[FEAT] Add Ipex-llm support for Intel CPU, iGPU and GPU (#7)
# Added

- Documentation:
  - README.md
  - docs\ipex_models.md
  - docs\onnxruntime_models.md
- Ipex-llm support

---------

Co-authored-by: tjtanaa <[email protected]>
1 parent 8b69018 commit bafafeb

20 files changed (+1342 −594 lines)

.gitignore

Lines changed: 2 additions & 1 deletion
@@ -10,4 +10,5 @@ test_phi3*
 scripts/*.ps1
 scripts/*.sh
 **/dist
-**/build
+**/build
+*.log

README.md

Lines changed: 74 additions & 51 deletions
@@ -1,14 +1,13 @@
 # EmbeddedLLM
 
-Run local LLMs on iGPU, APU and CPU (AMD , Intel, and Qualcomm (Coming Soon)).
-Easiest way to launch OpenAI API Compatible Server on Windows, Linux and MacOS
+Run local LLMs on iGPU, APU and CPU (AMD , Intel, and Qualcomm (Coming Soon)). Easiest way to launch OpenAI API Compatible Server on Windows, Linux and MacOS
 
 | Support matrix | Supported now | Under Development | On the roadmap |
 | --------------------- | --------------------------------------------------- | ----------------- | -------------- |
 | Model architectures | Gemma <br/> Llama \* <br/> Mistral + <br/>Phi <br/> | | |
 | Platform | Linux <br/> Windows | | |
 | Architecture | x86 <br/> x64 <br/> | Arm64 | |
-| Hardware Acceleration | CUDA<br/>DirectML<br/> | QNN <br/> ROCm | OpenVINO |
+| Hardware Acceleration | CUDA<br/>DirectML<br/>IpexLLM | QNN <br/> ROCm | OpenVINO |
 
 \* The Llama model architecture supports similar model families such as CodeLlama, Vicuna, Yi, and more.
 
@@ -19,6 +18,19 @@ Easiest way to launch OpenAI API Compatible Server on Windows, Linux and MacOS
 - [2024/06] Support Phi-3 (mini, small, medium), Phi-3-Vision-Mini, Llama-2, Llama-3, Gemma (v1), Mistral v0.3, Starling-LM, Yi-1.5.
 - [2024/06] Support vision/chat inference on iGPU, APU, CPU and CUDA.
 
+## Table Content
+
+- [Supported Models](#supported-models-quick-start)
+- [Onnxruntime Models](./docs/model/onnxruntime_models.md)
+- [Ipex-LLM Models](./docs/model/ipex_models.md)
+- [Getting Started](#getting-started)
+- [Installation From Source](#installation)
+- [Launch OpenAI API Compatible Server](#launch-openai-api-compatible-server)
+- [Launch Chatbot Web UI](#launch-chatbot-web-ui)
+- [Launch Model Management UI](#launch-model-management-ui)
+- [Compile OpenAI-API Compatible Server into Windows Executable](#compile-openai-api-compatible-server-into-windows-executable)
+- [Acknowledgements](#acknowledgements)
+
 ## Supported Models (Quick Start)
 
 | Models | Parameters | Context Length | Link |
@@ -35,83 +47,94 @@ Easiest way to launch OpenAI API Compatible Server on Windows, Linux and MacOS
 | Phi3-medium-128k-instruct | 17B | 128k | [microsoft/Phi-3-medium-128k-instruct-onnx-directml](https://huggingface.co/microsoft/Phi-3-medium-128k-instruct-onnx-directml) |
 | Openchat-3.6-8b | 8B | 8192 | [EmbeddedLLM/openchat-3.6-8b-20240522-onnx](https://huggingface.co/EmbeddedLLM/openchat-3.6-8b-20240522-onnx) |
 | Yi-1.5-6b-chat | 6B | 32k | [EmbeddedLLM/01-ai_Yi-1.5-6B-Chat-onnx](https://huggingface.co/EmbeddedLLM/01-ai_Yi-1.5-6B-Chat-onnx) |
-| Phi-3-vision-128k-instruct | | 128k | [EmbeddedLLM/Phi-3-vision-128k-instruct-onnx](https://huggingface.co/EmbeddedLLM/Phi-3-vision-128k-instruct-onnx/tree/main/onnx/cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4) |
-
+| Phi-3-vision-128k-instruct | | 128k | [EmbeddedLLM/Phi-3-vision-128k-instruct-onnx](https://huggingface.co/EmbeddedLLM/Phi-3-vision-128k-instruct-onnx/tree/main/onnx/cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4) |
 
 ## Getting Started
 
 ### Installation
 
 #### From Source
 
-**Windows**
+- **Windows**
+
+  1. Custom Setup:
+
+     - **XPU**: Requires anaconda environment. `conda create -n ellm python=3.10 libuv; conda activate llm`.
+     - **DirectML**: If you are using Conda Environment. Install additional dependencies: `conda install conda-forge::vs2015_runtime`.
+
+  2. Install embeddedllm package. `$env:ELLM_TARGET_DEVICE='directml'; pip install -e .`. Note: currently support `cpu`, `directml` and `cuda`.
 
-1. Install embeddedllm package. `$env:ELLM_TARGET_DEVICE='directml'; pip install -e .`. Note: currently support `cpu`, `directml` and `cuda`.
-   - **DirectML:** `$env:ELLM_TARGET_DEVICE='directml'; pip install -e .[directml]`
-   - **CPU:** `$env:ELLM_TARGET_DEVICE='cpu'; pip install -e .[cpu]`
-   - **CUDA:** `$env:ELLM_TARGET_DEVICE='cuda'; pip install -e .[cuda]`
-   - **With Web UI**:
-     - **DirectML:** `$env:ELLM_TARGET_DEVICE='directml'; pip install -e .[directml, webui]`
-     - **CPU:** `$env:ELLM_TARGET_DEVICE='cpu'; pip install -e .[cpu, webui]`
-     - **CUDA:** `$env:ELLM_TARGET_DEVICE='cuda'; pip install -e .[cuda, webui]`
+     - **DirectML:** `$env:ELLM_TARGET_DEVICE='directml'; pip install -e .[directml]`
+     - **CPU:** `$env:ELLM_TARGET_DEVICE='cpu'; pip install -e .[cpu]`
+     - **CUDA:** `$env:ELLM_TARGET_DEVICE='cuda'; pip install -e .[cuda]`
+     - **XPU:** `$env:ELLM_TARGET_DEVICE='xpu'; pip install -e .[xpu]`
+     - **With Web UI**:
+       - **DirectML:** `$env:ELLM_TARGET_DEVICE='directml'; pip install -e .[directml,webui]`
+       - **CPU:** `$env:ELLM_TARGET_DEVICE='cpu'; pip install -e .[cpu,webui]`
+       - **CUDA:** `$env:ELLM_TARGET_DEVICE='cuda'; pip install -e .[cuda,webui]`
+       - **XPU:** `$env:ELLM_TARGET_DEVICE='xpu'; pip install -e .[xpu,webui]`
 
-**Linux**
+- **Linux**
 
-1. Install embeddedllm package. `ELLM_TARGET_DEVICE='directml' pip install -e .`. Note: currently support `cpu`, `directml` and `cuda`.
-   - **DirectML:** `ELLM_TARGET_DEVICE='directml' pip install -e .[directml]`
-   - **CPU:** `ELLM_TARGET_DEVICE='cpu' pip install -e .[cpu]`
-   - **CUDA:** `ELLM_TARGET_DEVICE='cuda' pip install -e .[cuda]`
-   - **With Web UI**:
-     - **DirectML:** `ELLM_TARGET_DEVICE='directml' pip install -e .[directml, webui]`
-     - **CPU:** `ELLM_TARGET_DEVICE='cpu' pip install -e .[cpu, webui]`
-     - **CUDA:** `ELLM_TARGET_DEVICE='cuda' pip install -e .[cuda, webui]`
+  1. Custom Setup:
 
-**Note**
-1. If you are using Conda Environment. Install additional dependencies: `conda install conda-forge::vs2015_runtime`.
+     - **XPU**: Requires anaconda environment. `conda create -n ellm python=3.10 libuv; conda activate llm`.
+     - **DirectML**: If you are using Conda Environment. Install additional dependencies: `conda install conda-forge::vs2015_runtime`.
+
+  2. Install embeddedllm package. `ELLM_TARGET_DEVICE='directml' pip install -e .`. Note: currently support `cpu`, `directml` and `cuda`.
+
+     - **DirectML:** `ELLM_TARGET_DEVICE='directml' pip install -e .[directml]`
+     - **CPU:** `ELLM_TARGET_DEVICE='cpu' pip install -e .[cpu]`
+     - **CUDA:** `ELLM_TARGET_DEVICE='cuda' pip install -e .[cuda]`
+     - **XPU:** `ELLM_TARGET_DEVICE='xpu' pip install -e .[xpu]`
+     - **With Web UI**:
+       - **DirectML:** `ELLM_TARGET_DEVICE='directml' pip install -e .[directml,webui]`
+       - **CPU:** `ELLM_TARGET_DEVICE='cpu' pip install -e .[cpu,webui]`
+       - **CUDA:** `ELLM_TARGET_DEVICE='cuda' pip install -e .[cuda,webui]`
+       - **XPU:** `ELLM_TARGET_DEVICE='xpu' pip install -e .[xpu,webui]`
 
 ### Launch OpenAI API Compatible Server
 
-```
-usage: ellm_server.exe [-h] [--port int] [--host str] [--response_role str] [--uvicorn_log_level str]
-                       [--served_model_name str] [--model_path str] [--vision bool]
+1. Custom Setup:
+
+   - **Ipex**
 
-options:
-  -h, --help show this help message and exit
-  --port int Server port. (default: 6979)
-  --host str Server host. (default: 0.0.0.0)
-  --response_role str Server response role. (default: assistant)
-  --uvicorn_log_level str
-    Uvicorn logging level. `debug`, `info`, `trace`, `warning`, `critical` (default: info)
-  --served_model_name str
-    Model name. (default: phi3-mini-int4)
-  --model_path str Path to model weights. (required)
-  --vision bool Enable vision capability, only if model supports vision input. (default: False)
-```
+     - For **Intel iGPU**:
 
-1. `ellm_server --model_path <path/to/model/weight>`.
-2. Example code to connect to the api server can be found in `scripts/python`.
+       ```cmd
+       set SYCL_CACHE_PERSISTENT=1
+       set BIGDL_LLM_XMX_DISABLED=1
+       ```
 
-## Launch Chatbot Web UI
+     - For **Intel Arc™ A-Series Graphics**:
+       ```cmd
+       set SYCL_CACHE_PERSISTENT=1
+       ```
 
-1. `ellm_chatbot --port 7788 --host localhost --server_port <ellm_server_port> --server_host localhost`.
+2. `ellm_server --model_path <path/to/model/weight>`.
+3. Example code to connect to the api server can be found in `scripts/python`. **Note:** To find out more of the supported arguments. `ellm_server --help`.
 
-![Chatbot Web UI](asset/ellm_chatbot_vid.webp)
+### Launch Chatbot Web UI
 
-## Launch Model Management UI
-It is an interface that allows you to download and deploy OpenAI API compatible server.
-You can find out the disk space required to download the model in the UI.
+1. `ellm_chatbot --port 7788 --host localhost --server_port <ellm_server_port> --server_host localhost`. **Note:** To find out more of the supported arguments. `ellm_chatbot --help`.
 
-1. `ellm_modelui --port 6678`
+![Chatbot Web UI](asset/ellm_chatbot_vid.webp)
 
-![Model Management UI](asset/ellm_modelui.png)
+### Launch Model Management UI
 
+It is an interface that allows you to download and deploy OpenAI API compatible server. You can find out the disk space required to download the model in the UI.
+
+1. `ellm_modelui --port 6678`. **Note:** To find out more of the supported arguments. `ellm_modelui --help`.
+
+![Model Management UI](asset/ellm_modelui.png)
 
 ## Compile OpenAI-API Compatible Server into Windows Executable
+
 1. Install `embeddedllm`.
 2. Install PyInstaller: `pip install pyinstaller`.
 3. Compile Windows Executable: `pyinstaller .\ellm_api_server.spec`.
 4. You can find the executable in the `dist\ellm_api_server`.
 
 ## Acknowledgements
 
-- Excellent open-source projects: [vLLM](https://github.com/vllm-project/vllm.git), [onnxruntime-genai](https://github.com/microsoft/onnxruntime-genai.git) and many others.
+- Excellent open-source projects: [vLLM](https://github.com/vllm-project/vllm.git), [onnxruntime-genai](https://github.com/microsoft/onnxruntime-genai.git), [Ipex-LLM](https://github.com/intel-analytics/ipex-llm/tree/main) and many others.
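
The updated launch section above points to example client code in `scripts/python`. As a point of reference, a minimal client sketch is shown below. It assumes the server was started with `ellm_server --model_path <path/to/model/weight>` on its default port 6979 with the default served model name `phi3-mini-int4` (both defaults come from the removed usage text above), and that it exposes the usual OpenAI-style `/v1/chat/completions` route; the actual scripts in `scripts/python` may differ.

```python
# Minimal client sketch; the real examples live in scripts/python.
# Assumes ellm_server is running locally on its default port (6979) and
# serves the default model name "phi3-mini-int4" behind an OpenAI-style /v1 API.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:6979/v1",  # ellm_server default port
    api_key="EMPTY",                      # a local server needs no real key
)

response = client.chat.completions.create(
    model="phi3-mini-int4",  # should match --served_model_name
    messages=[{"role": "user", "content": "Explain what an iGPU is in one sentence."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```

The same client should work unchanged against the DirectML, CUDA, CPU and XPU backends, since they all sit behind the same OpenAI-compatible endpoint.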

docs/model/ipex_models.md

Lines changed: 65 additions & 0 deletions
@@ -0,0 +1,65 @@
+# Model Powered by Ipex-LLM
+
+## Verified Models
+| Model | Model Link |
+| --- | --- |
+| Phi-3 | [link](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) |
+
+## Supported Models by Ipex-LLM
+
+| Model | Model Link |
+| --- | --- |
+| LLaMA _(such as Vicuna, Guanaco, Koala, Baize, WizardLM, etc.)_ | |
+| LLaMA 2 | [link1](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf), [link2](https://huggingface.co/meta-llama/Llama-2-13b-chat-hf) |
+| LLaMA 3 | [link](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) |
+| ChatGLM | |
+| ChatGLM2 | [link](https://huggingface.co/THUDM/chatglm2-6b) |
+| ChatGLM3 | [link](https://huggingface.co/THUDM/chatglm3-6b) |
+| GLM-4 | [link](https://huggingface.co/THUDM/glm-4-9b-chat) |
+| Mistral | [link](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1) |
+| Mixtral | [link](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1) |
+| Falcon | [link](https://huggingface.co/tiiuae/falcon-7b-instruct) |
+| MPT | [link](https://huggingface.co/mosaicml/mpt-7b-chat) |
+| Dolly-v1 | [link](https://huggingface.co/databricks/dolly-v1-6b) |
+| Dolly-v2 | [link](https://huggingface.co/databricks/dolly-v2-12b) |
+| Replit Code | [link](https://huggingface.co/replit/replit-code-v1-3b) |
+| RedPajama | [link](https://huggingface.co/togethercomputer/RedPajama-INCITE-7B-Chat) |
+| Phoenix | [link](https://huggingface.co/FreedomIntelligence/phoenix-inst-chat-7b) |
+| StarCoder | [link](https://huggingface.co/bigcode/starcoder) |
+| Baichuan | [link](https://huggingface.co/baichuan-inc/Baichuan-13B-Chat) |
+| Baichuan2 | [link](https://huggingface.co/baichuan-inc/Baichuan2-13B-Chat) |
+| InternLM | [link](https://huggingface.co/internlm/internlm-chat-7b) |
+| InternLM2 | [link](https://huggingface.co/internlm/internlm2-chat-7b) |
+| Qwen | [link](https://huggingface.co/Qwen/Qwen-7B-Chat) |
+| Qwen1.5 | [link](https://huggingface.co/Qwen/Qwen1.5-7B-Chat) |
+| Qwen2 | [link](https://huggingface.co/Qwen/Qwen2-7B-Instruct) |
+| Aquila | [link](https://huggingface.co/BAAI/AquilaChat-7B) |
+| Aquila2 | [link](https://huggingface.co/BAAI/AquilaChat2-7B) |
+| Phi-1_5 | [link](https://huggingface.co/microsoft/phi-1_5) |
+| Flan-t5 | [link](https://huggingface.co/google/flan-t5-xxl) |
+| CodeLlama | [link](https://huggingface.co/codellama/CodeLlama-7b-hf) |
+| Skywork | [link](https://huggingface.co/Skywork/Skywork-13B-base) |
+| InternLM-XComposer | [link](https://huggingface.co/internlm/internlm-xcomposer-vl-7b) |
+| CodeShell | [link](https://huggingface.co/WisdomShell/CodeShell-7B) |
+| Yi | [link](https://huggingface.co/01-ai/Yi-6B) |
+| BlueLM | [link](https://huggingface.co/vivo-ai/BlueLM-7B-Chat) |
+| Mamba | [link1](https://huggingface.co/state-spaces/mamba-1.4b), [link2](https://huggingface.co/state-spaces/mamba-2.8b) |
+| SOLAR | [link](https://huggingface.co/upstage/SOLAR-10.7B-Instruct-v1.0) |
+| Phixtral | [link](https://huggingface.co/mlabonne/phixtral-4x2_8) |
+| RWKV4 | |
+| RWKV5 | |
+| DeepSeek-MoE | [link](https://huggingface.co/deepseek-ai/deepseek-moe-16b-chat) |
+| Ziya-Coding-34B-v1.0 | [link](https://huggingface.co/IDEA-CCNL/Ziya-Coding-34B-v1.0) |
+| Phi-2 | [link](https://huggingface.co/microsoft/phi-2) |
+| Phi-3 | [link](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) |
+| Yuan2 | [link](https://huggingface.co/IEITYuan/Yuan2-2B-hf) |
+| Gemma | [link1](https://huggingface.co/google/gemma-2b-it), [link2](https://huggingface.co/google/gemma-7b-it) |
+| DeciLM-7B | [link](https://huggingface.co/Deci/DeciLM-7B-instruct) |
+| Deepseek | [link](https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-instruct) |
+| StableLM | [link](https://huggingface.co/stabilityai/stablelm-zephyr-3b) |
+| CodeGemma | [link](https://huggingface.co/google/codegemma-7b-it) |
+| Command-R/cohere | [link](https://huggingface.co/CohereForAI/c4ai-command-r-v01) |
+| CodeGeeX2 | [link](https://huggingface.co/THUDM/codegeex2-6b) |
+| MiniCPM | [link](https://huggingface.co/openbmb/MiniCPM-2B-sft-bf16) |
+
+Resources from: https://github.com/intel-analytics/ipex-llm/
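
The README changes above pair this backend with the `xpu` install target and, for Intel iGPU, the `SYCL_CACHE_PERSISTENT`/`BIGDL_LLM_XMX_DISABLED` variables. A hedged end-to-end sketch of fetching the verified Phi-3 checkpoint and launching the server against it is shown below; whether `--model_path` accepts a plain Hugging Face checkpoint directory for the Ipex-LLM backend is an assumption, so adjust it to your setup.

```python
# Hypothetical end-to-end sketch for the Ipex-LLM backend on an Intel iGPU.
# Assumes the xpu install from the README (pip install -e .[xpu]) and that
# --model_path accepts a locally downloaded Hugging Face checkpoint.
import os
import subprocess

from huggingface_hub import snapshot_download

# Verified model from the table above.
local_dir = snapshot_download(
    repo_id="microsoft/Phi-3-mini-4k-instruct",
    local_dir="./Phi-3-mini-4k-instruct",
)

# Environment variables mirror the README's Intel iGPU cmd snippet.
env = dict(os.environ, SYCL_CACHE_PERSISTENT="1", BIGDL_LLM_XMX_DISABLED="1")
subprocess.run(["ellm_server", "--model_path", local_dir], env=env, check=True)
```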

docs/model/onnxruntime_models.md

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
+# Model Powered by Onnxruntime GenAI
+
+## Supported Models
+
+| Models | Parameters | Context Length | Link |
+| --- | --- | --- | --- |
+| Gemma-2b-Instruct v1 | 2B | 8192 | [EmbeddedLLM/gemma-2b-it-onnx](https://huggingface.co/EmbeddedLLM/gemma-2b-it-onnx) |
+| Llama-2-7b-chat | 7B | 4096 | [EmbeddedLLM/llama-2-7b-chat-int4-onnx-directml](https://huggingface.co/EmbeddedLLM/llama-2-7b-chat-int4-onnx-directml) |
+| Llama-2-13b-chat | 13B | 4096 | [EmbeddedLLM/llama-2-13b-chat-int4-onnx-directml](https://huggingface.co/EmbeddedLLM/llama-2-13b-chat-int4-onnx-directml) |
+| Llama-3-8b-chat | 8B | 8192 | [EmbeddedLLM/mistral-7b-instruct-v0.3-onnx](https://huggingface.co/EmbeddedLLM/mistral-7b-instruct-v0.3-onnx) |
+| Mistral-7b-v0.3-instruct | 7B | 32768 | [EmbeddedLLM/mistral-7b-instruct-v0.3-onnx](https://huggingface.co/EmbeddedLLM/mistral-7b-instruct-v0.3-onnx) |
+| Phi-3-mini-4k-instruct-062024 | 3.8B | 4096 | [EmbeddedLLM/Phi-3-mini-4k-instruct-062024-onnx](https://huggingface.co/EmbeddedLLM/Phi-3-mini-4k-instruct-062024-onnx/tree/main/onnx/directml/Phi-3-mini-4k-instruct-062024-int4) |
+| Phi3-mini-4k-instruct | 3.8B | 4096 | [microsoft/Phi-3-mini-4k-instruct-onnx](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-onnx) |
+| Phi3-mini-128k-instruct | 3.8B | 128k | [microsoft/Phi-3-mini-128k-instruct-onnx](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct-onnx) |
+| Phi3-medium-4k-instruct | 17B | 4096 | [microsoft/Phi-3-medium-4k-instruct-onnx-directml](https://huggingface.co/microsoft/Phi-3-medium-4k-instruct-onnx-directml) |
+| Phi3-medium-128k-instruct | 17B | 128k | [microsoft/Phi-3-medium-128k-instruct-onnx-directml](https://huggingface.co/microsoft/Phi-3-medium-128k-instruct-onnx-directml) |
+| Openchat-3.6-8b | 8B | 8192 | [EmbeddedLLM/openchat-3.6-8b-20240522-onnx](https://huggingface.co/EmbeddedLLM/openchat-3.6-8b-20240522-onnx) |
+| Yi-1.5-6b-chat | 6B | 32k | [EmbeddedLLM/01-ai_Yi-1.5-6B-Chat-onnx](https://huggingface.co/EmbeddedLLM/01-ai_Yi-1.5-6B-Chat-onnx) |
+| Phi-3-vision-128k-instruct | | 128k | [EmbeddedLLM/Phi-3-vision-128k-instruct-onnx](https://huggingface.co/EmbeddedLLM/Phi-3-vision-128k-instruct-onnx/tree/main/onnx/cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4) |
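
Several rows above link to a backend-specific sub-folder inside the ONNX repository. A sketch of pulling one of those sub-folders locally with `huggingface_hub` before handing it to `ellm_server` is shown below; the folder layout is taken from the DirectML link in the table, and treating that sub-folder as a valid `--model_path` is an assumption.

```python
# Hypothetical download sketch for one of the ONNX builds listed above.
# The allow_patterns path follows the DirectML link in the table; swap in the
# sub-folder that matches the backend you installed.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="EmbeddedLLM/Phi-3-mini-4k-instruct-062024-onnx",
    allow_patterns=["onnx/directml/Phi-3-mini-4k-instruct-062024-int4/*"],
    local_dir="./Phi-3-mini-4k-instruct-062024-onnx",
)
print(local_dir)
# Then, from a shell:
#   ellm_server --model_path ./Phi-3-mini-4k-instruct-062024-onnx/onnx/directml/Phi-3-mini-4k-instruct-062024-int4
```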

requirements-build.txt

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 # Should be mirrored in pyproject.toml
 packaging
 setuptools>=49.4.0
-torch==2.3.1
+torch
 wheel

requirements-common.txt

Lines changed: 1 addition & 1 deletion
@@ -13,4 +13,4 @@ transformers
 uvicorn
 filetype~=1.2.0
 Pillow~=10.3.0
-torchvision~=0.18.1
+torchvision

requirements-cpu.txt

Lines changed: 2 additions & 0 deletions
@@ -1,2 +1,4 @@
+torch==2.3.1
+torchvision~=0.18.1
 onnxruntime
 onnxruntime-genai==0.3.0rc2

requirements-cuda.txt

Lines changed: 2 additions & 0 deletions
@@ -1,2 +1,4 @@
+torch==2.3.1
+torchvision~=0.18.1
 onnxruntime-gpu~=1.18.0
 onnxruntime-genai-cuda~=0.3.0rc2

requirements-directml.txt

Lines changed: 2 additions & 0 deletions
@@ -1,2 +1,4 @@
+torch==2.3.1
+torchvision~=0.18.1
 onnxruntime-directml~=1.18.0
 onnxruntime-genai-directml~=0.3.0

requirements-xpu.txt

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+torch==2.1.0
+torchvision
+trl
+transformers~=4.42.3
