# Steps to deploy Aria inference on AMD chips

## Step 1. Build the vLLM Docker image for ROCm

Building a Docker image from a Dockerfile.rocm file is similar to building a Docker image from any Dockerfile, with the main difference being that you need to explicitly specify the file name since it’s not named Dockerfile by default.

1. Ensure Docker is installed.

Verify that Docker is installed and running on your machine. Use:

```
docker --version
```

2. Navigate to the directory containing Dockerfile.rocm.

Change to the directory where the Dockerfile.rocm is located:

```
cd /path/to/directory
```

3. Build the Docker image.

Use the -f flag to specify the Dockerfile.rocm file and -t to tag the resulting Docker image:

```
docker build -f Dockerfile.rocm -t your-image-name:your-tag .
```

Replace your-image-name with the desired name of your image and your-tag with the version tag (e.g., latest).

Example:

```
docker build -f Dockerfile.rocm -t my-rocm-image:latest .
```

4. Verify the image is built.

After the build process completes, verify that the image was created successfully by listing all Docker images:

```
docker images
```

Example output:

```
REPOSITORY        TAG       IMAGE ID       CREATED          SIZE
my-rocm-image     latest    abcdef123456   1 minute ago     1.5GB
```

5. Run the Docker container (optional).

To test the image, you can run it in a container:

```
docker run --rm -it my-rocm-image:latest
```

To give the container access to ROCm-enabled GPUs, pass through the ROCm device nodes and add the video group (the --gpus flag belongs to the NVIDIA container runtime and does not apply to ROCm):

```
docker run --rm -it --device=/dev/kfd --device=/dev/dri --group-add video my-rocm-image:latest
```

> Notes
>
> - Dependencies: Ensure you have the necessary dependencies for ROCm installed on your host machine. For ROCm-enabled systems, the GPU drivers and the ROCm toolkit should be properly configured.
>
> - Permissions: If you encounter permission issues with Docker, prepend sudo to the commands or configure Docker for non-root users (see the commands below).
>
> - Custom build context: If your Dockerfile.rocm relies on other files in the directory, ensure they are in the build context (i.e., the directory specified by the . at the end of the docker build command).
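
As a minimal sketch of these host-side checks (exact package names and the non-root setup vary by distribution, so treat this as illustrative):

```
# Confirm the ROCm stack can see the GPUs on the host.
rocm-smi

# Confirm the Docker daemon is running.
docker info

# Allow running docker without sudo (log out and back in for the group change to apply).
sudo usermod -aG docker $USER
```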

## Step 2. Run the Docker container

```
# Host cache directory mounted into the container so downloaded artifacts are reused across runs.
CACHE_DIR=${CACHE_DIR:-"$HOME/.cache"}

# Start a long-lived, detached container with access to the ROCm devices (/dev/kfd, /dev/dri)
# and a large shared-memory segment.
docker run -d --rm --privileged --net=host --cap-add=CAP_SYS_ADMIN \
  --device=/dev/kfd --device=/dev/dri --device=/dev/mem \
  --shm-size 200G --group-add video --cap-add=SYS_PTRACE \
  --security-opt seccomp=unconfined -v $CACHE_DIR:/root/.cache \
  my-rocm-image:latest sleep infinity
```
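
Because the container is detached and only runs sleep infinity, open a shell inside it to execute the commands in the following steps. The container ID comes from docker ps:

```
# Find the running container's ID or name.
docker ps

# Open an interactive shell inside it (replace CONTAINER_ID with the value from docker ps).
docker exec -it CONTAINER_ID bash

# Inside the container, confirm the GPUs are visible.
rocm-smi
```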

## Step 3. Start the vLLM server to host the Aria model

```
#!/bin/bash

OMP_NUM_THREADS=4 VLLM_WORKER_MULTIPROC_METHOD=spawn IMAGE_MAX_SIZE=980 python -m vllm.entrypoints.openai.api_server \
    --model /path/to/aria/ckpt \
    --tokenizer /path/to/aria/tokenizer \
    --tokenizer-mode slow \
    --port 8080 \
    --served-model-name Aria \
    --tensor-parallel-size 1 \
    --trust-remote-code \
    --max-model-len 4096 \
    --max-logprobs 128 \
    --gpu-memory-utilization 0.8 \
    --max-num-seqs 1 \
    --enforce-eager \
    --worker-use-ray
```
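
Once the server has finished loading, you can confirm it is serving the model by listing the models exposed by the OpenAI-compatible endpoint (the port matches the --port value above); the response should include Aria, the --served-model-name:

```
curl http://localhost:8080/v1/models
```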

## Step 4. Test the inference on the client side

```
import base64
import requests
from openai import OpenAI

# Modify OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8080/v1"

client = OpenAI(
    # defaults to os.environ.get("OPENAI_API_KEY")
    api_key=openai_api_key,
    base_url=openai_api_base,
)

models = client.models.list()
model = models.data[0].id

# Pick any publicly reachable test image; the uncommented assignment is the one used.
# image_url = "https://i0.hdslb.com/bfs/archive/ac72ae36271a6970f92b1de485e6ae6c9e4c1ebb.jpg"
# image_url = "https://cdn.fstoppers.com/styles/full/s3/media/2019/12/04/nando-jpeg-quality-001.jpg"
image_url = "https://tinyjpg.com/images/social/website.jpg"

# Use an image URL in the payload.
chat_completion_from_url = client.chat.completions.create(
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What's in this image?<image>"
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": image_url
                },
            },
        ],
    }],
    model=model,
    max_tokens=128,
)

result = chat_completion_from_url.choices[0].message.content
print(f"Chat completion output: {result}")


# Use base64-encoded images in the payload.
def encode_image_base64_from_url(image_url: str) -> str:
    """Encode an image retrieved from a remote url to base64 format."""
    with requests.get(image_url) as response:
        response.raise_for_status()
        result = base64.b64encode(response.content).decode('utf-8')
    return result


image_base64 = encode_image_base64_from_url(image_url=image_url)
chat_completion_from_base64 = client.chat.completions.create(
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What's in this image?<image><image>"
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": f"data:image/jpeg;base64,{image_base64}"
                },
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": f"data:image/jpeg;base64,{image_base64}"
                },
            },
        ],
    }],
    model=model,
    max_tokens=128,
)

result = chat_completion_from_base64.choices[0].message.content
print(f"Chat completion output: {result}")
```
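
If the remote test images are not reachable from your environment, a local file can be encoded instead. This helper is a sketch (the file path is illustrative) that can stand in for encode_image_base64_from_url in the script above:

```
import base64


def encode_image_base64_from_file(image_path: str) -> str:
    """Encode a local image file to base64 format."""
    with open(image_path, "rb") as f:
        return base64.b64encode(f.read()).decode('utf-8')


# Hypothetical local path; substitute your own test image.
image_base64 = encode_image_base64_from_file("/path/to/test.jpg")
```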

## Tuning for the best performance on AMD chips

It is suggested to follow the [official instructions](https://rocm.docs.amd.com/en/latest/how-to/tuning-guides/mi300x/workload.html) from AMD as the starting point for optimizing the workload.

For instance, it is highly recommended to disable automatic NUMA balancing:

```
sudo sysctl kernel.numa_balancing=0
```
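
This sysctl change does not persist across reboots. You can verify the current value and, if desired, make it permanent via a sysctl drop-in (the file name here is illustrative):

```
# 0 means automatic NUMA balancing is disabled.
cat /proc/sys/kernel/numa_balancing

# Optional: persist the setting across reboots.
echo 'kernel.numa_balancing = 0' | sudo tee /etc/sysctl.d/99-disable-numa-balancing.conf
```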

## References

- [Inferencing and serving with vLLM on AMD GPUs](https://rocm.blogs.amd.com/artificial-intelligence/vllm/README.html)
- [AMD Instinct MI300X workload optimization](https://rocm.docs.amd.com/en/latest/how-to/tuning-guides/mi300x/workload.html)