forked from InferenceMAX/InferenceMAX
Add Atom inference support for MI355X #1
Merged
Changes from all commits (6 commits):
- 8825f3e  feat: add atom inference support and fix workflow parameter issue (indianspeedster)
- 60dee07  Update dsr1_fp8_mi355x_atom_docker.sh (indianspeedster)
- 3f25556  Delete runners/launch_mi355x-amdatomtw.sh (indianspeedster)
- 141a99f  chore: update atom image to rocm7.1.1-ubuntu24.04-pytorch2.9-atom0.1.… (indianspeedster)
- f592dce  fix: resource cleanup to only clean bmk-server container (indianspeedster)
- 6d4db29  chore: update atom image tag to MI350x (indianspeedster)
First added file (24 new lines):

```bash
#!/usr/bin/env bash

# ========= Required Env Vars =========
# HF_TOKEN
# HF_HUB_CACHE
# MODEL
# PORT
# TP
# CONC
# MAX_MODEL_LEN

# Calculate max-model-len based on ISL and OSL
if [ "$ISL" = "1024" ] && [ "$OSL" = "1024" ]; then
    CALCULATED_MAX_MODEL_LEN=""
else
    CALCULATED_MAX_MODEL_LEN=" --max-model-len 10240 "
fi

set -x
python3 -m atom.entrypoints.openai_server \
    --model $MODEL \
    --server-port $PORT \
    -tp $TP \
    --kv_cache_dtype fp8 $CALCULATED_MAX_MODEL_LEN
```
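The ISL/OSL branch above can be sketched in isolation. This is a hypothetical helper (`calc_max_model_len` is not part of the PR) that reproduces the selection rule: the 1024/1024 shape falls back to the server's default context length, every other shape pins `--max-model-len 10240`.

```shell
# Hypothetical standalone version of the max-model-len selection in the script above.
calc_max_model_len() {
    local isl="$1" osl="$2"
    if [ "$isl" = "1024" ] && [ "$osl" = "1024" ]; then
        printf ''                            # rely on the server default
    else
        printf ' --max-model-len 10240 '     # pin the context length
    fi
}
```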
Second added file (27 new lines):

```bash
#!/usr/bin/env bash

# ========= Required Env Vars =========
# HF_TOKEN
# HF_HUB_CACHE
# MODEL
# PORT
# TP
# CONC
# MAX_MODEL_LEN

# Calculate max-model-len based on ISL and OSL
if [ "$ISL" = "1024" ] && [ "$OSL" = "1024" ]; then
    CALCULATED_MAX_MODEL_LEN=""
else
    CALCULATED_MAX_MODEL_LEN=" --max-model-len 10240 "
fi

set -x

BLOCK_SIZE=${BLOCK_SIZE:-16}
python3 -m atom.entrypoints.openai_server \
    --model $MODEL \
    --server-port $PORT \
    -tp $TP \
    --kv_cache_dtype fp8 $CALCULATED_MAX_MODEL_LEN \
    --block-size $BLOCK_SIZE
```
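The `BLOCK_SIZE=${BLOCK_SIZE:-16}` line uses the shell's default-value expansion: the caller can override the KV-cache block size from the environment, and 16 is used otherwise. A minimal sketch (the `get_block_size` helper is hypothetical, not in the PR):

```shell
# Hypothetical demo of the ${BLOCK_SIZE:-16} default used in the script above.
get_block_size() {
    # ${var:-default} substitutes "default" when var is unset or empty.
    printf '%s' "${BLOCK_SIZE:-16}"
}
```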
Third added file (25 new lines):

```bash
#!/usr/bin/env bash

# ========= Required Env Vars =========
# HF_TOKEN
# HF_HUB_CACHE
# MODEL
# PORT
# TP
# CONC
# MAX_MODEL_LEN

# Calculate max-model-len based on ISL and OSL
if [ "$ISL" = "1024" ] && [ "$OSL" = "1024" ]; then
    CALCULATED_MAX_MODEL_LEN=""
else
    CALCULATED_MAX_MODEL_LEN=" --max-model-len 10240 "
fi

set -x
export HSA_NO_SCRATCH_RECLAIM=1
python3 -m atom.entrypoints.openai_server \
    --model $MODEL \
    --server-port $PORT \
    -tp $TP \
    --kv_cache_dtype fp8 $CALCULATED_MAX_MODEL_LEN
```
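Note that `$CALCULATED_MAX_MODEL_LEN` is deliberately left unquoted on the launch line: when the variable holds `" --max-model-len 10240 "`, word splitting turns it into two separate command-line arguments, while the empty case contributes no arguments at all. A hypothetical demo of that behavior (the `flags` variable here is illustrative, not from the PR):

```shell
# Demo: unquoted expansion splits a flag string into separate arguments.
flags=" --max-model-len 10240 "
# shellcheck disable=SC2086  # splitting is intentional here
words=($flags)
```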
Fourth added file, the runner script (110 new lines):

```bash
#!/usr/bin/env bash

# === Workflow-defined Env Vars ===
# IMAGE
# MODEL
# TP
# HF_HUB_CACHE
# ISL
# OSL
# MAX_MODEL_LEN
# RANDOM_RANGE_RATIO
# CONC
# GITHUB_WORKSPACE
# RESULT_FILENAME
# HF_TOKEN
# FRAMEWORK

HF_HUB_CACHE_MOUNT="/mnt/hf_hub_cache/"  # Temp solution
PORT=8888

# Determine framework suffix for benchmark script
FRAMEWORK_SUFFIX=$([[ "$FRAMEWORK" == "atom" ]] && printf '_atom' || printf '')

network_name="bmk-net"
server_name="bmk-server"
client_name="bmk-client"

# Cleanup: stop server container and remove network
docker stop $server_name 2>/dev/null || true
docker rm $server_name 2>/dev/null || true
docker network rm $network_name 2>/dev/null || true

docker network create $network_name

set -x
docker pull $IMAGE
DIGEST=$(docker inspect --format='{{index .RepoDigests 0}}' "$IMAGE" | cut -d'@' -f2)
echo "The image digest is: $DIGEST"

docker run --rm -d --ipc=host --shm-size=16g --network=$network_name --name=$server_name \
    --privileged --cap-add=CAP_SYS_ADMIN --device=/dev/kfd --device=/dev/dri --device=/dev/mem \
    --cap-add=SYS_PTRACE --security-opt seccomp=unconfined \
    -v $HF_HUB_CACHE_MOUNT:$HF_HUB_CACHE \
    -v $GITHUB_WORKSPACE:/workspace/ -w /workspace/ \
    -e HF_TOKEN -e HF_HUB_CACHE -e MODEL -e TP -e CONC -e MAX_MODEL_LEN -e PORT=$PORT \
    -e ISL -e OSL \
    --entrypoint=/bin/bash \
    $IMAGE \
    benchmarks/"${EXP_NAME%%_*}_${PRECISION}_mi355x${FRAMEWORK_SUFFIX}_docker.sh"

# Stream server logs until startup completes, then proceed
set +x
while IFS= read -r line; do
    printf '%s\n' "$line"
    if [[ "$line" =~ Application\ startup\ complete ]]; then
        break
    fi
done < <(docker logs -f --tail=0 $server_name 2>&1)

if [[ "$MODEL" == "amd/DeepSeek-R1-0528-MXFP4-Preview" || "$MODEL" == "deepseek-ai/DeepSeek-R1-0528" ]]; then
    if [[ "$OSL" == "8192" ]]; then
        #NUM_PROMPTS=$(( CONC * 20 ))
        NUM_PROMPTS=$(( CONC * 2 ))   # atom has little compilation overhead for dsr1
    else
        #NUM_PROMPTS=$(( CONC * 50 ))
        NUM_PROMPTS=$(( CONC * 10 ))  # atom has little compilation overhead for dsr1
    fi
else
    if [[ "$OSL" == "8192" ]]; then
        NUM_PROMPTS=$(( CONC * 2 ))
    else
        NUM_PROMPTS=$(( CONC * 10 ))
    fi
fi

set -x
echo $GITHUB_WORKSPACE
git clone https://github.com/kimbochen/bench_serving.git
git clone https://github.com/kimbochen/bench_serving.git $GITHUB_WORKSPACE/bench_serving

sleep 5

docker run --rm --network=$network_name --name=$client_name \
    -v $GITHUB_WORKSPACE:/workspace/ -w /workspace/ \
    -e HF_TOKEN -e PYTHONPYCACHEPREFIX=/tmp/pycache/ \
    --entrypoint=/bin/bash \
    $(echo "$IMAGE" | sed 's/#/\//') \
    -lc "pip install -q datasets pandas && \
    python3 bench_serving/benchmark_serving.py \
    --model=$MODEL --backend=vllm --base-url="http://$server_name:$PORT" \
    --dataset-name=random \
    --random-input-len=$ISL --random-output-len=$OSL --random-range-ratio=$RANDOM_RANGE_RATIO \
    --num-prompts=$NUM_PROMPTS \
    --max-concurrency=$CONC \
    --trust-remote-code \
    --request-rate=inf --ignore-eos \
    --save-result --percentile-metrics="ttft,tpot,itl,e2el" \
    --result-dir=/workspace/ --result-filename=$RESULT_FILENAME.json"

if ls gpucore.* 1> /dev/null 2>&1; then
    echo "gpucore files exist. not good"
    rm -f gpucore.*
fi

# Cleanup: stop server container and remove network
docker stop $server_name 2>/dev/null || true
docker rm $server_name 2>/dev/null || true
docker network rm $network_name 2>/dev/null || true
```
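The runner sizes the benchmark by scaling the prompt count with concurrency: long-output runs (OSL 8192) use 2x the concurrency, shorter ones 10x, so each configuration issues enough requests to saturate the server without running excessively long. A hypothetical extraction of that rule (the `num_prompts` function is illustrative, not part of the PR):

```shell
# Hypothetical standalone version of the NUM_PROMPTS sizing rule in the runner above.
num_prompts() {
    local conc="$1" osl="$2"
    if [ "$osl" = "8192" ]; then
        echo $(( conc * 2 ))    # long outputs: fewer prompts per concurrency slot
    else
        echo $(( conc * 10 ))   # shorter outputs: more prompts per slot
    fi
}
```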