Commit 8b69018

[Feature] Add New Phi-3 Weight; Add Windows Compilation Steps (#4)
* add pyinstaller spec, update top_k default value; update readme
* add modelui; update documentation
* Update README.md
* add windows executable compilation steps; update onnxruntime-genai-directml version

Co-authored-by: tjtanaa <[email protected]>
1 parent: 11b609f

File tree

4 files changed: +22 −6 lines changed


.gitignore

Lines changed: 3 additions & 1 deletion

```diff
@@ -8,4 +8,6 @@ test_phi3*
 **.egg-info
 
 scripts/*.ps1
-scripts/*.sh
+scripts/*.sh
+**/dist
+**/build
```

README.md

Lines changed: 9 additions & 0 deletions

````diff
@@ -28,6 +28,7 @@ Easiest way to launch OpenAI API Compatible Server on Windows, Linux and MacOS
 | Llama-2-13b-chat | 13B | 4096 | [EmbeddedLLM/llama-2-13b-chat-int4-onnx-directml](https://huggingface.co/EmbeddedLLM/llama-2-13b-chat-int4-onnx-directml) |
 | Llama-3-8b-chat | 8B | 8192 | [EmbeddedLLM/mistral-7b-instruct-v0.3-onnx](https://huggingface.co/EmbeddedLLM/mistral-7b-instruct-v0.3-onnx) |
 | Mistral-7b-v0.3-instruct | 7B | 32768 | [EmbeddedLLM/mistral-7b-instruct-v0.3-onnx](https://huggingface.co/EmbeddedLLM/mistral-7b-instruct-v0.3-onnx) |
+| Phi-3-mini-4k-instruct-062024 | 3.8B | 4096 | [EmbeddedLLM/Phi-3-mini-4k-instruct-062024-onnx](https://huggingface.co/EmbeddedLLM/Phi-3-mini-4k-instruct-062024-onnx/tree/main/onnx/directml/Phi-3-mini-4k-instruct-062024-int4) |
 | Phi3-mini-4k-instruct | 3.8B | 4096 | [microsoft/Phi-3-mini-4k-instruct-onnx](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-onnx) |
 | Phi3-mini-128k-instruct | 3.8B | 128k | [microsoft/Phi-3-mini-128k-instruct-onnx](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct-onnx) |
 | Phi3-medium-4k-instruct | 17B | 4096 | [microsoft/Phi-3-medium-4k-instruct-onnx-directml](https://huggingface.co/microsoft/Phi-3-medium-4k-instruct-onnx-directml) |
@@ -65,6 +66,9 @@ Easiest way to launch OpenAI API Compatible Server on Windows, Linux and MacOS
 - **CPU:** `ELLM_TARGET_DEVICE='cpu' pip install -e .[cpu, webui]`
 - **CUDA:** `ELLM_TARGET_DEVICE='cuda' pip install -e .[cuda, webui]`
 
+**Note**
+1. If you are using a Conda environment, install additional dependencies: `conda install conda-forge::vs2015_runtime`.
+
 ### Launch OpenAI API Compatible Server
 
 ```
@@ -102,6 +106,11 @@ You can find out the disk space required to download the model in the UI.
 ![Model Management UI](asset/ellm_modelui.png)
 
 
+## Compile OpenAI-API Compatible Server into Windows Executable
+1. Install `embeddedllm`.
+2. Install PyInstaller: `pip install pyinstaller`.
+3. Compile the Windows executable: `pyinstaller .\ellm_api_server.spec`.
+4. You can find the executable in `dist\ellm_api_server`.
 
 ## Acknowledgements
 
````
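The compilation steps added to the README rely on a PyInstaller spec file that this commit also adds. For orientation, here is a minimal sketch of what a one-folder spec like `ellm_api_server.spec` typically contains; the entry-point path and the hidden import are assumptions for illustration, and the spec actually shipped in the repository may differ:

```
# -*- mode: python ; coding: utf-8 -*-
# Hypothetical minimal one-folder spec (illustrative; not the repo's real spec).

a = Analysis(
    ["src/embeddedllm/entrypoints/api_server.py"],  # assumed entry point
    pathex=[],
    binaries=[],
    datas=[],
    hiddenimports=["onnxruntime_genai"],  # packages with native DLLs often need explicit listing
)
pyz = PYZ(a.pure)

exe = EXE(
    pyz,
    a.scripts,
    exclude_binaries=True,
    name="ellm_api_server",
    console=True,
)
coll = COLLECT(
    exe,
    a.binaries,
    a.datas,
    name="ellm_api_server",  # output lands in dist\ellm_api_server
)
```

`Analysis`, `PYZ`, `EXE`, and `COLLECT` are injected into the spec's namespace by PyInstaller itself, which is why running `pyinstaller .\ellm_api_server.spec` works but executing the file directly does not.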

requirements-directml.txt

Lines changed: 1 addition & 1 deletion

```diff
@@ -1,2 +1,2 @@
 onnxruntime-directml~=1.18.0
-onnxruntime-genai-directml~=0.2.0
+onnxruntime-genai-directml~=0.3.0
```
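The version bump above matters because `~=` is PEP 440's compatible-release operator: `~=0.3.0` accepts any `0.3.x` patch release but excludes `0.4.0`, so a `~=0.2.0` pin could never resolve to the 0.3 series. A pure-Python sketch of the rule (illustrative only; real resolvers use `packaging.specifiers.SpecifierSet`, and this simplified version assumes plain numeric dotted versions):

```python
def satisfies_compatible_release(version: str, spec: str) -> bool:
    """Return True if `version` satisfies `~=spec` (PEP 440 compatible release).

    `~=0.3.0` is shorthand for `>= 0.3.0, == 0.3.*`: only the final release
    segment may vary upward; everything before it must match exactly.
    """
    v = tuple(int(part) for part in version.split("."))
    s = tuple(int(part) for part in spec.split("."))
    if v < s:  # lower bound: version must be >= spec
        return False
    return v[: len(s) - 1] == s[:-1]  # upper bound: leading segments must match

print(satisfies_compatible_release("0.3.2", "0.3.0"))  # True: patch bump allowed
print(satisfies_compatible_release("0.4.0", "0.3.0"))  # False: outside the 0.3 series
```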

src/embeddedllm/entrypoints/modelui.py

Lines changed: 9 additions & 4 deletions

```diff
@@ -42,9 +42,6 @@ class Config(BaseSettings):
 
 
 config = Config()
-import subprocess
-
-from pydantic import BaseModel, Field
 
 
 class DeployedModel(BaseModel):
@@ -77,6 +74,14 @@ class ModelCard(BaseModel):
         repo_type="model",
         context_length=4096,
     ),
+    "EmbeddedLLM/Phi-3-mini-4k-instruct-062024-onnx": ModelCard(
+        hf_url="https://huggingface.co/EmbeddedLLM/Phi-3-mini-4k-instruct-062024-onnx/tree/main/onnx/directml/Phi-3-mini-4k-instruct-062024-int4",
+        repo_id="EmbeddedLLM/Phi-3-mini-4k-instruct-062024-onnx",
+        model_name="Phi-3-mini-4k-instruct-062024-onnx",
+        subfolder="onnx/directml/Phi-3-mini-4k-instruct-062024-int4",
+        repo_type="model",
+        context_length=4096,
+    ),
     "EmbeddedLLM/mistralai_Mistral-7B-Instruct-v0.3-int4": ModelCard(
         hf_url="https://huggingface.co/EmbeddedLLM/mistral-7b-instruct-v0.3-onnx/tree/main/onnx/directml/mistralai_Mistral-7B-Instruct-v0.3-int4",
         repo_id="EmbeddedLLM/mistral-7b-instruct-v0.3-onnx",
@@ -433,7 +438,7 @@ def main():
         <p style="font-size: 24px; font-weight: bold; color: #007bff;">Backend: {backend}</p>
     </div>
     """
-    big_block = gr.HTML(html_content)
+    gr.HTML(html_content)
 
     with gr.Accordion("See More Model Details", open=False):
         model_info_pandas_frame = gr.Dataframe(value=None)
```
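The new registry entry follows the `ModelCard` schema visible in the diff. A stdlib `dataclass` stand-in sketches that shape; the real class is a pydantic `BaseModel`, the field types are inferred from the diff, and the defaults here are assumptions for illustration:

```python
from dataclasses import dataclass


@dataclass
class ModelCard:
    # Field names mirror the keyword arguments in the diff; types are inferred.
    hf_url: str
    repo_id: str
    model_name: str
    subfolder: str
    repo_type: str = "model"       # assumed default
    context_length: int = 4096     # assumed default

# Reconstruct the Phi-3-mini-4k-instruct-062024 entry added by this commit.
card = ModelCard(
    hf_url="https://huggingface.co/EmbeddedLLM/Phi-3-mini-4k-instruct-062024-onnx/tree/main/onnx/directml/Phi-3-mini-4k-instruct-062024-int4",
    repo_id="EmbeddedLLM/Phi-3-mini-4k-instruct-062024-onnx",
    model_name="Phi-3-mini-4k-instruct-062024-onnx",
    subfolder="onnx/directml/Phi-3-mini-4k-instruct-062024-int4",
)
print(card.model_name)  # Phi-3-mini-4k-instruct-062024-onnx
```

Note that `subfolder` points at the int4 DirectML variant inside the repo, which is why the README's table link targets that subdirectory rather than the repo root.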
