Problem
The vLLM benchmark scripts under Magpie/scripts/benchmark/ duplicate a large, near-identical server-boot phase across four files (vllm_mi300x.sh, vllm_mi355x.sh, vllm_mi300x_mm.sh, vllm_mi355x_mm.sh): PHASE parsing, BENCHMARK_BASE_URL→client handling, env checks, MEC firmware check, ROCR/HIP_VISIBLE_DEVICES setup, profiler args, vllm serve launch, wait_for_server_ready, and PID-file/disown handling (~120 lines each).
This duplication has already caused a silent drift bug: vllm_mi300x_mm.sh shipped with the MEC firmware threshold -lt 0 instead of -lt 177 (fixed in PR #44), so HSA_NO_SCRATCH_RECLAIM was effectively never set on that path.
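To make the drift concrete, here is a minimal sketch of the threshold gate. The function names and the exported value `1` are assumptions for illustration and are not copied from the Magpie scripts; only the thresholds `177` and `0` and the variable `HSA_NO_SCRATCH_RECLAIM` come from the description above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: names are illustrative, not taken from the Magpie scripts.

# Succeeds (exit 0) when the firmware version is below the threshold,
# i.e. when the workaround env var should be exported.
mec_fw_below() {
    local fw_version="$1" threshold="$2"
    [ "$fw_version" -lt "$threshold" ]
}

# Export HSA_NO_SCRATCH_RECLAIM only for firmware older than the threshold.
# The value "1" is an assumption; the issue only says the variable is "set".
apply_scratch_reclaim_workaround() {
    local fw_version="$1" threshold="$2"
    if mec_fw_below "$fw_version" "$threshold"; then
        export HSA_NO_SCRATCH_RECLAIM=1
    fi
}
```

With a threshold of 177, `apply_scratch_reclaim_workaround 150 177` exports the variable. With the drifted threshold of 0, no non-negative firmware version can satisfy `-lt 0`, so the same call with `150 0` leaves it unset.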
Additional finding: the mi300x vs mi355x scripts differ only cosmetically (header comment, [vllm_miXXXx] echo label, two comment strings) — same firmware threshold, same AITER default, same server logic. So the GPU split itself is near-pure duplication.
Proposed refactor
Extract the shared server-boot phase into a sourced helper (e.g. vllm_bench_common.sh) used by all vLLM scripts. Two viable shapes:
- Option 1: sourced helper, keep filename dispatch. Each vllm_{gpu}{,_mm}.sh sources the common file and calls magpie_vllm_serve(), then runs only its own client phase (text: InferenceX run_benchmark_serving / random; mm: upstream vllm bench serve --dataset-name random-mm). The GPU label is derived from the $0 basename. This preserves Hyperloom's filename-based dispatch (vllm_{gpu}_mm.sh).
- Option 2: single script, branch client phase on DATASET. Collapse to one vllm_{gpu}.sh that picks the text vs random-mm client driver based on DATASET=random-mm. Cleaner, but it ripples into Hyperloom's auto-wire (which currently sets benchmark_script=vllm_{gpu}_mm.sh); the auto-wire would change to set DATASET only.
Option 1 is lower-blast-radius; Option 2 eliminates the most files.
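The two shapes can be sketched as follows. This is a rough outline of what vllm_bench_common.sh might contain, not the actual implementation: `magpie_gpu_label` and `magpie_client_mode` are hypothetical helper names, and the body of `magpie_vllm_serve` is a placeholder standing in for the real server-boot phase.

```shell
#!/usr/bin/env bash
# Sketch of a possible vllm_bench_common.sh. All function bodies are
# assumptions for illustration.

# Option 1: derive the GPU label from the calling script's filename,
# e.g. vllm_mi300x_mm.sh -> mi300x, vllm_mi355x.sh -> mi355x.
magpie_gpu_label() {
    local name
    name="$(basename "$1" .sh)"   # strip directory and .sh suffix
    name="${name#vllm_}"          # drop the vllm_ prefix
    printf '%s\n' "${name%_mm}"   # drop a trailing _mm, if present
}

# Option 2: pick the client driver from DATASET instead of the filename.
magpie_client_mode() {
    if [ "${DATASET:-}" = "random-mm" ]; then
        printf 'mm\n'
    else
        printf 'text\n'
    fi
}

# Placeholder for the shared server-boot phase (env checks, firmware check,
# device setup, vllm serve launch, readiness wait, PID handling).
magpie_vllm_serve() {
    local label
    label="$(magpie_gpu_label "$1")"
    echo "[vllm_${label}] would run server-boot phase here"
}
```

Under Option 1, each per-GPU script would source this file and call `magpie_vllm_serve "$0"` before its own client phase. Under Option 2, the single script would branch on the output of `magpie_client_mode`.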
Why not in the VL PR (#44)
The refactor necessarily modifies the pre-existing, validated text scripts (and/or the Hyperloom dispatch), and changes the exact artifacts validated end-to-end on Qwen3-VL-235B-FP8. Re-validation requires the GPU container and the Claude API tunnel, which were unavailable at the time. The refactor is deferred to a change in which both the text and mm paths can be re-validated.
Acceptance
- One shared server-boot helper; no duplicated server phase across vLLM scripts.
- Firmware threshold / AITER defaults defined once.
- Text and mm benchmark paths re-validated (text: existing flow; mm: random-mm on a VL model).
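The first two criteria could be spot-checked mechanically. The helper below is a hypothetical sketch, not an existing Magpie tool: it counts the files under a directory that contain a fixed string, so a pattern that should be defined once is expected to yield 1.

```shell
#!/usr/bin/env bash
# Hypothetical check; the function name and the patterns passed to it
# are illustrative only.

# Print how many files under a directory contain a fixed string.
count_files_containing() {
    local dir="$1" needle="$2"
    grep -rlF -- "$needle" "$dir" 2>/dev/null | wc -l | tr -d '[:space:]'
}
```

For example, after the refactor, `count_files_containing Magpie/scripts/benchmark '-lt 177'` would be expected to print 1, assuming the threshold comparison keeps that literal form in the shared helper.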