Commit 34ea36a
Merge pull request #170 from flagos-ai/update/modelscope-docs-20260324-130600
ModelScope Documentation Update - 2026-03-24 13:06
2 parents 8e41621 + 2cd16da
6 files changed: +212 −28 lines

docs/flagrelease_en/model_list.txt
Lines changed: 1 addition & 0 deletions

@@ -56,6 +56,7 @@ FlagRelease/Qwen3-Next-80B-A3B-Instruct-metax-FlagOS
 FlagRelease/Qwen3-Omni-30B-A3B-Instruct-FlagOS
 FlagRelease/Qwen3-VL-235B-A22B-Instruct-FlagOS
 FlagRelease/Qwen3.5-35B-A3B-FlagOS
+FlagRelease/Qwen3.5-35B-A3B-iluvatar-FlagOS
 FlagRelease/Qwen3.5-397B-A17B-metax-FlagOS
 FlagRelease/Qwen3.5-397B-A17B-nvidia-FlagOS
 FlagRelease/Qwen3.5-397B-A17B-zhenwu-FlagOS

docs/flagrelease_en/model_readmes/FlagRelease_GLM-5-ascend-FlagOS.md
Lines changed: 2 additions & 2 deletions

@@ -31,7 +31,7 @@ Environment Setup
 
 ### Download FlagOS Image
 ```bash
-docker pull harbor.baai.ac.cn/flagrelease-public/flagreleaes_ascend_glm5
+docker pull harbor.baai.ac.cn/flagrelease-public/flagrelease-ascend-release-model_glm-5-tree_0.4.1_ascend3.2-gems_4.2.1rc0-scale_none-cx_none-python_3.11.14-torch_npu2.9.0-pcp_cann8.5.0-gpu_ascend001-arc_arm64-driver_25.2.3:202603201037
 ```
 
 ### Download Open-source Model Weights

@@ -45,7 +45,7 @@ modelscope download --model FlagRelease/GLM-5-ascend-FlagOS --local_dir /data/gl
 ### Start the inference service
 ```bash
 # Container Startup
-docker run -itd --name flagos -u root --privileged=true --shm-size=1000g --net=host -v /usr/local/Ascend/driver:/usr/local/Ascend/driver -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi -v /usr/local/dcmi:/usr/local/dcmi -v /usr/local/sbin:/usr/local/sbin -v /usr/share/zoneinfo/Asia/Shanghai:/etc/localtime -v /etc/ascend_install.info:/etc/ascend_install.info -v /data:/data -v /root/.cache:/root/.cache harbor.baai.ac.cn/flagrelease-public/flagreleaes_ascend_glm5 bash
+docker run -itd --name flagos -u root --privileged=true --shm-size=1000g --net=host -v /usr/local/Ascend/driver:/usr/local/Ascend/driver -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi -v /usr/local/dcmi:/usr/local/dcmi -v /usr/local/sbin:/usr/local/sbin -v /usr/share/zoneinfo/Asia/Shanghai:/etc/localtime -v /etc/ascend_install.info:/etc/ascend_install.info -v /data:/data -v /root/.cache:/root/.cache harbor.baai.ac.cn/flagrelease-public/flagrelease-ascend-release-model_glm-5-tree_0.4.1_ascend3.2-gems_4.2.1rc0-scale_none-cx_none-python_3.11.14-torch_npu2.9.0-pcp_cann8.5.0-gpu_ascend001-arc_arm64-driver_25.2.3:202603201037 bash
 
 docker exec -it flagos bash
 ```

docs/flagrelease_en/model_readmes/FlagRelease_Qwen3-8B-mthreads-FlagOS.md
Lines changed: 24 additions & 17 deletions

@@ -44,30 +44,35 @@ FlagEval (Libra)** is a comprehensive evaluation system and open platform for la
 
 | Metrics | Qwen3-8B-H100-CUDA | Qwen3-8B-mthreads-FlagOS |
 | --------- | ------------------ | ---------------------- |
-| AIME_0fewshot_@avg1 | 0.700 | 0.800 |
-| GPQA_0fewshot_@avg1 | 0.507 | 0.493 |
-| LiveBench-0fewshot_@avg1 | 0.502 | 0.503 |
-| MMLU_5fewshot_@avg1 | 0.699 | 0.706 |
-| MUSR_0fewshot_@avg | 0.602 | 0.603 |
+| AIME_0fewshot_@avg1 | 0.700 | 0.700 |
+| GPQA_0fewshot_@avg1 | 0.507 | 0.596 |
 
 # User Guide
 
 **Environment Setup**
 
-| Accelerator Card Driver Version | Kernel Mode Driver Version: 2.3.0 |
-| ------------- | ------------------------------------------------------------ |
-| Docker Version | Docker version 24.0.7, build 24.0.7-0ubuntu2~22.04.1|
-| Operating System | Linux |
-| FlagScale | Version: 0.8.0 |
-| FlagGems | Version: 3.0 |
+| Item | Value |
+|------|-------|
+| Accelerator Card Driver Version | 2.2.0 |
+| Docker Version | 29.0.4 |
+| Operating System | Ubuntu 22.04.5 LTS (Jammy Jellyfish) |
+| Kernel Version | 5.15.0-105-generic |
+| Chip Vendor | mthreads (mthreads) |
+| SDK Version | MUSA N/A |
+| GPU Model | mthreads |
+| Python Version | 3.12.18 |
+| PyTorch Version | torch_musa: 2.5.0 |
+| FlagScale | Version: 0.8.0 |
+| FlagGems | Version: 4.1 |
 
 ## Operation Steps
 
 ### Download FlagOS Image
 
 ```bash
-#docker pull harbor.baai.ac.cn/flagrelease-public/mthreads_qwen3_8b:latest
-docker pull harbor.baai.ac.cn/flagrelease-public/flagrelease-mthreads-release-model_qwen3-8b-tree_none-gems_3.0-scale_0.8.0-cx_none-python_3.10.12-torch_musa-2.1.0-pcp_musa4.1.0-gpu_mthreads001-arc_amd64-driver_2.3.0:260310
+docker pull harbor.baai.ac.cn/flagrelease-public/flagrelease-mthreads-release-model_qwen3-8b-tree_none-gems_4.1-scale_0.8.0-cx_none-python_3.10.18-torch_musa-2.5.0-pcp_musa3.3.2-gpu_mthreads001-arc_amd64-driver_3.3.2-server:latest
 ```
 
 ### Download Open-source Model Weights

@@ -84,14 +89,16 @@ modelscope download --model Qwen/Qwen3-8B --local_dir /data/models/Qwen3-8B
 #Container Startup
 docker run --network=host --privileged -e MTHREADS_VISIBLE_DEVICES=all \
 -e VLLM_USE_V1=0 -e MTHREADS_DRIVER_CAPABILITIES=all --shm-size 16g -e USE_FLAGGEMS=1 \
---cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d --name qwen3_8b -v /data/models/Qwen3-8B:/root/Qwen3-8B \
---tmpfs /tmp:exec harbor.baai.ac.cn/flagrelease-public/flagrelease-mthreads-release-model_qwen3-8b-tree_none-gems_3.0-scale_0.8.0-cx_none-python_3.10.12-torch_musa-2.1.0-pcp_musa4.1.0-gpu_mthreads001-arc_amd64-driver_2.3.0:260310 sleep infinity
+--cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d --name flagos -v /data/models/Qwen3-8B:/root/Qwen3-8B \
+--tmpfs /tmp:exec harbor.baai.ac.cn/flagrelease-public/flagrelease-mthreads-release-model_qwen3-8b-tree_none-gems_4.1-scale_0.8.0-cx_none-python_3.10.18-torch_musa-2.5.0-pcp_musa3.3.2-gpu_mthreads001-arc_amd64-driver_3.3.2-server:latest sleep infinity
+docker exec -it flagos /bin/bash
 ```
 
 ### Serve
 
 ```bash
-flagscale serve qwen3
+rm -r /root/.triton/cache # if exists
+QWEN3_PORT=8100 CUDA_VISIBLE_DEVICES=0 QWEN3_PATH=/root/Qwen3-8B flagscale serve qwen3
 ```

@@ -103,7 +110,7 @@ flagscale serve qwen3
 import openai
 openai.api_key = "EMPTY"
 openai.base_url = "http://<server_ip>:8100/v1/"
-model = "/root/Qwen3-8B/"
+model = "/root/Qwen3-8B"
 messages = [
 {"role": "system", "content": "You are a helpful assistant."},
 {"role": "user", "content": "What's the weather like today?"}
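
For readers who want to exercise the updated endpoint without installing the `openai` package, the client call shown in the diff above can be reproduced with the Python standard library alone. This is a minimal sketch, assuming the service from the Serve step is reachable at the README's `<server_ip>` placeholder on port 8100 (substitute a real address before running); the payload helper is split out so the request shape can be checked offline:

```python
# Stdlib-only sketch of the OpenAI-compatible chat call from the README.
# Assumptions: flagscale serves on port 8100 and the served model path is
# /root/Qwen3-8B, as in the diff; "<server_ip>" is a placeholder to replace.
import json
import urllib.request

BASE_URL = "http://<server_ip>:8100/v1"

def build_request(model: str, messages: list) -> dict:
    # Payload shape mirrors the openai-client call in the README.
    return {"model": model, "messages": messages, "temperature": 0.7}

def chat(payload: dict) -> str:
    req = urllib.request.Request(
        BASE_URL + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": "Bearer EMPTY"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

if __name__ == "__main__":
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What's the weather like today?"},
    ]
    print(chat(build_request("/root/Qwen3-8B", messages)))
```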
Lines changed: 181 additions & 0 deletions (new file)

# Introduction
The Zhongzhi FlagOS community officially releases the Iluvatar image for Qwen3.5-35B-A3B, adapted on the basis of FlagOS. Qwen3.5-35B-A3B is a new multimodal MoE model open-sourced by the Alibaba Cloud Qwen team following the release of the Qwen3.5 397B MoE, featuring 35 billion total parameters and 3 billion activated parameters, with native support for ultra-long contexts of 262,144 tokens. The model adopts an efficient hybrid architecture combining Gated Delta Networks with a sparse Mixture-of-Experts (MoE) and is trained with early fusion on multimodal tokens, enabling unified vision-language understanding across image, video, and other multimodal inputs, and achieving comprehensive breakthroughs in reasoning, coding, agent tasks, and visual understanding.
### Integrated Deployment
- Out-of-the-box inference scripts with pre-configured hardware and software parameters
- Released **FlagOS-Iluvatar** container image supporting deployment within minutes
### Consistency Validation
- Rigorously evaluated through benchmark testing: performance and results from the FlagOS software stack are compared against native stacks on multiple public benchmarks.

# Evaluation Results
## Benchmark Result
|Metrics|Qwen3.5-35B-A3B-Nvidia-Origin|Qwen3.5-35B-A3B-Nvidia-FlagOS|Qwen3.5-35B-A3B-Iluvatar-FlagOS|
|-------|---------------|---------------|---------------|
|ERQA(vision)|60|56.5|59.72|
|GPQA_Diamond|78.28|78.28|78.28|

# User Guide
Environment Setup

| Item | Version |
|------------------|----------------------|
| Docker Version | Docker version 27.1.0, build 6312585 |
| Operating System | Ubuntu 20.04.6 LTS (focal) |

## Operation Steps

### Download FlagOS Image
```bash
docker pull harbor.baai.ac.cn/flagrelease-public/flagrelease-iluvatar-release-model_qwen3.5-35b-a3b-tree_none-gems_4.2.1rc0-scale_none-cx_none-python_3.10.18-torch_2.7.1_corex.4.4.0-pcp_ix-ml4.4.0-gpu_iluvatar001-arc_amd64-driver_4.4.0:202603182010
```

### Download Open-source Model Weights
```bash
pip install modelscope
modelscope download --model FlagRelease/Qwen3.5-35B-A3B-iluvatar-FlagOS --local_dir /data/Qwen3.5-35B-A3B
```

### Start the Container
```bash
docker run --shm-size="32g" -itd \
  -v /dev:/dev -v /usr/src/:/usr/src \
  -v /lib/modules/:/lib/modules \
  -v /data/:/data/ \
  --privileged --cap-add=ALL --pid=host --net=host \
  --name flagos harbor.baai.ac.cn/flagrelease-public/flagrelease-iluvatar-release-model_qwen3.5-35b-a3b-tree_none-gems_4.2.1rc0-scale_none-cx_none-python_3.10.18-torch_2.7.1_corex.4.4.0-pcp_ix-ml4.4.0-gpu_iluvatar001-arc_amd64-driver_4.4.0:202603182010 /bin/bash
docker exec -it flagos /bin/bash
```
### Start the Server
```bash
export VLLM_ENGINE_ITERATION_TIMEOUT_S=36000
export VLLM_RPC_TIMEOUT=36000000
export VLLM_EXECUTE_MODEL_TIMEOUT_SECONDS=3600
vllm serve /data/Qwen3.5-35B-A3B/ -tp 8 --served-model-name qwen --enforce-eager --port 8010 --max-model-len 262144
```
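
Loading a 35B-parameter checkpoint can take a while, so it may help to poll the server before sending requests. The sketch below assumes only the standard OpenAI-compatible `GET /v1/models` route that vLLM exposes, with the port (8010) and served model name (`qwen`) taken from the serve command above; the `fetch` hook is an illustration device so the logic can be checked without a live server:

```python
# Readiness probe for the vLLM server started above (port 8010, model "qwen").
# Stdlib only; the `fetch` parameter is a test hook, not part of vLLM.
import json
import time
import urllib.request

def served_model_names(base_url, fetch=None):
    """Return model ids reported by GET {base_url}/models."""
    if fetch is None:
        def fetch(url):
            with urllib.request.urlopen(url, timeout=5) as resp:
                return json.load(resp)
    data = fetch(base_url.rstrip("/") + "/models")
    return [m["id"] for m in data.get("data", [])]

def wait_until_ready(base_url, expected, retries=60, delay=5.0, fetch=None):
    """Poll until `expected` shows up in the served model list."""
    for _ in range(retries):
        try:
            if expected in served_model_names(base_url, fetch=fetch):
                return True
        except OSError:
            pass  # server not accepting connections yet
        time.sleep(delay)
    return False

if __name__ == "__main__":
    if wait_until_ready("http://localhost:8010/v1", "qwen"):
        print("vLLM server is ready")
```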

## Service Invocation
### Invocation Script
Input: text
```python
from openai import OpenAI
# Set OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8010/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

response = client.chat.completions.create(
    model="qwen",
    messages=[
        {"role": "user", "content": "Give me a short introduction to large language models."},
    ],
    max_tokens=20,
    #max_tokens=1024,
    temperature=0.7,
    top_p=0.8,
    presence_penalty=1.5,
    extra_body={
        "top_k": 20,
        "chat_template_kwargs": {"enable_thinking": False},
    },
    stream=True,
)

for chunk in response:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```
Input: image
```python
from openai import OpenAI
# Configured by environment variables
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8010/v1"
client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://qianwen-res.oss-accelerate.aliyuncs.com/Qwen3.5/demo/CI_Demo/mathv-1327.jpg"
                }
            },
            {
                "type": "text",
                "text": "The centres of the four illustrated circles are in the corners of the square. The two big circles touch each other and also the two little circles. With which factor do you have to multiply the radii of the little circles to obtain the radius of the big circles?\nChoices:\n(A) $\\frac{2}{9}$\n(B) $\\sqrt{5}$\n(C) $0.8 \\cdot \\pi$\n(D) 2.5\n(E) $1+\\sqrt{2}$"
            }
        ]
    }
]
response = client.chat.completions.create(
    model="qwen",
    messages=messages,
    max_tokens=60,
    temperature=1.0,
    top_p=0.95,
    presence_penalty=1.5,
    extra_body={
        "top_k": 20,
    },
    stream=False,
)

if response.choices and response.choices[0].message.content:
    print(response.choices[0].message.content)
```
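
As a sanity check on the sample prompt itself: assuming the standard reading of the figure (big circles centred at two opposite corners of the square, touching each other at the centre and each touching the two small circles at the adjacent corners), the radius factor works out to 1 + √2, choice (E). A short numeric verification:

```python
# Geometry check for the sample question: circle centres at the corners of a
# square of side s; big circles touch each other, and each big circle touches
# the two small circles.
import math

s = 1.0                    # side of the square
R = s * math.sqrt(2) / 2   # big circles at opposite corners meet at the
                           # centre, so 2R equals the diagonal s*sqrt(2)
r = s - R                  # big and small circles at adjacent corners touch,
                           # so R + r equals the side s
factor = R / r

assert math.isclose(factor, 1 + math.sqrt(2))  # choice (E)
```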

### AnythingLLM Integration Guide

#### 1. Download & Install

- Visit the official site: https://anythingllm.com/
- Choose the appropriate version for your OS (Windows/macOS/Linux)
- Follow the installation wizard to complete the setup

#### 2. Configuration

- Launch AnythingLLM
- Open settings (bottom left, fourth tab)
- Configure core LLM parameters
- Click "Save Settings" to apply changes

#### 3. Model Interaction

- After model loading is complete:
  - Click **"New Conversation"**
  - Enter your question (e.g., "Explain the basics of quantum computing")
  - Click the send button to get a response
# Technical Overview
**FlagOS** is a fully open-source system software stack designed to unify the "model–system–chip" layers and foster an open, collaborative ecosystem. It enables a "develop once, run anywhere" workflow across diverse AI accelerators, unlocking hardware performance, eliminating fragmentation among vendor-specific software stacks, and substantially lowering the cost of porting and maintaining AI workloads. With core technologies such as the **FlagScale** distributed training/inference framework (together with vllm-plugin-fl), the **FlagGems** universal operator library, the **FlagCX** communication library, and the **FlagTree** unified compiler, the **FlagRelease** platform leverages the **FlagOS** stack to automatically produce and release various combinations of \<chip + open-source model\>. This enables efficient and automated model migration across diverse chips, opening a new chapter for large model deployment and application.
## FlagGems
FlagGems is a high-performance, generic operator library implemented in the [Triton](https://github.com/openai/triton) language. It is built on a collection of backend-neutral kernels that aim to accelerate LLM (Large Language Model) training and inference across diverse hardware platforms.
## FlagTree
FlagTree is an open-source project building a unified compiler for multiple AI chips, dedicated to developing a diverse ecosystem of AI chip compilers and related tooling platforms, thereby fostering and strengthening the upstream and downstream Triton ecosystem. Currently in its initial phase, the project aims to maintain compatibility with existing adaptation solutions while unifying the codebase to rapidly implement single-repository multi-backend support. For upstream model users, it provides unified compilation capabilities across multiple backends; for downstream chip manufacturers, it offers examples of Triton ecosystem integration.
## FlagScale and vllm-plugin-fl
FlagScale is a comprehensive toolkit designed to support the entire lifecycle of large models. It builds on the strengths of several prominent open-source projects, including [Megatron-LM](https://github.com/NVIDIA/Megatron-LM) and [vLLM](https://github.com/vllm-project/vllm), to provide a robust, end-to-end solution for managing and scaling large models.
vllm-plugin-fl is a vLLM plugin built on the FlagOS unified multi-chip backend, helping FlagScale support multiple chips within the vLLM framework.
## **FlagCX**
FlagCX is a scalable and adaptive cross-chip communication library. It serves as a platform where developers, researchers, and AI engineers can collaborate on various projects, contribute to the development of cutting-edge AI solutions, and share their work with the global community.

## **FlagEval Evaluation Framework**
FlagEval is a comprehensive evaluation system and open platform for large models launched in 2023. It aims to establish scientific, fair, and open benchmarks, methodologies, and tools to help researchers assess model and training algorithm performance. It features:
- **Multi-dimensional Evaluation**: Supports 800+ model evaluations across NLP, CV, Audio, and Multimodal fields, covering 20+ downstream tasks including language understanding and image-text generation.
- **Industry-Grade Use Cases**: Has completed horizontal evaluations of mainstream large models, providing authoritative benchmarks for chip-model performance validation.
# Contributing

We warmly welcome global developers to join us:

1. Submit Issues to report problems
2. Create Pull Requests to contribute code
3. Improve technical documentation
4. Expand hardware adaptation support
# License
The model weights are sourced from Qwen/Qwen3.5-35B-A3B and open-sourced under the Apache 2.0 license: https://www.apache.org/licenses/LICENSE-2.0.txt

docs/flagrelease_en/model_readmes/FlagRelease_Qwen3.5-397B-A17B-nvidia-FlagOS.md
Lines changed: 3 additions & 4 deletions

@@ -34,7 +34,7 @@ Environment Setup
 
 ### Download FlagOS Image
 ```bash
-docker pull harbor.baai.ac.cn/flagrelease-public/flagrelease-nvidia-release-model_qwen3.5-397b-a17b-tree_none-gems_4.2.1rc0-scale_none-cx_none-python_3.12.3-torch_2.10.0-pcp_cuda13.1-gpu_nvidia003-arc_amd64-driver_570.158.01:2602171855
+docker pull harbor.baai.ac.cn/flagrelease-public/flagrelease-nvidia-release-model_qwen3.5-397b-a17b-tree_0.4.1_3.5-gems_4.2.1rc0-scale_none-cx_none-python_3.12.3-torch_2.10.0-pcp_cuda13.1-gpu_nvidia003-arc_amd64-driver_570.158.01:202603191445
 ```
 
 ### Download Open-source Model Weights

@@ -50,7 +50,7 @@ docker run --init --detach --net=host --user 0 --ipc=host \
 -v /data:/data --security-opt=seccomp=unconfined \
 --privileged --ulimit=stack=67108864 --ulimit=memlock=-1 \
 --shm-size=512G --gpus all \
---name flagos harbor.baai.ac.cn/flagrelease-public/flagrelease-nvidia-release-model_qwen3.5-397b-a17b-tree_none-gems_4.2.1rc0-scale_none-cx_none-python_3.12.3-torch_2.10.0-pcp_cuda13.1-gpu_nvidia003-arc_amd64-driver_570.158.01:2602171855 sleep infinity
+--name flagos harbor.baai.ac.cn/flagrelease-public/flagrelease-nvidia-release-model_qwen3.5-397b-a17b-tree_0.4.1_3.5-gems_4.2.1rc0-scale_none-cx_none-python_3.12.3-torch_2.10.0-pcp_cuda13.1-gpu_nvidia003-arc_amd64-driver_570.158.01:202603191445 sleep infinity
 docker exec -it flagos bash
 ```
 ### Serve

@@ -133,5 +133,4 @@ We warmly welcome global developers to join us:
 4. Expand hardware adaptation support
 
 # License
-
-本模型的权重来源于Qwen/Qwen3.5-397B-A17B,以apache2.0协议https://www.apache.org/licenses/LICENSE-2.0.txt开源。
+The model weights are sourced from Qwen/Qwen3.5-397B-A17B and open-sourced under the Apache 2.0 license: https://www.apache.org/licenses/LICENSE-2.0.txt

docs/flagrelease_en/model_readmes/FlagRelease_Qwen3.5-397B-A17B-zhenwu-FlagOS.md
Lines changed: 1 addition & 5 deletions

@@ -142,9 +142,5 @@ We warmly welcome global developers to join us:
 4. Expand hardware adaptation support
 
 # License
-
-
-
-
-本模型的权重来源于Qwen/Qwen3.5-397B-A17B,以apache2.0协议开源:https://www.apache.org/licenses/LICENSE-2.0.txt
+The model weights are sourced from Qwen/Qwen3.5-397B-A17B and open-sourced under the Apache 2.0 license: https://www.apache.org/licenses/LICENSE-2.0.txt
