refactor(vllm-bench): de-duplicate server-boot phase across vllm benchmark scripts #45

Description

@vorapolsiloai

Problem

The vLLM benchmark scripts under Magpie/scripts/benchmark/ duplicate a large, near-identical server-boot phase across four files (vllm_mi300x.sh, vllm_mi355x.sh, vllm_mi300x_mm.sh, vllm_mi355x_mm.sh): PHASE parsing, BENCHMARK_BASE_URL→client handling, env checks, MEC firmware check, ROCR/HIP_VISIBLE_DEVICES setup, profiler args, vllm serve launch, wait_for_server_ready, and PID-file/disown handling (~120 lines each).

This duplication has already caused a silent drift bug: vllm_mi300x_mm.sh shipped with the MEC firmware threshold -lt 0 instead of -lt 177 (fixed in PR #44), so HSA_NO_SCRATCH_RECLAIM was effectively never set on that path.
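A minimal sketch of how the threshold could live in one place, so a per-script typo like `-lt 0` cannot recur. The names `MAGPIE_MEC_FW_THRESHOLD` and `magpie_needs_scratch_workaround` are hypothetical, and the sketch assumes the firmware version is available as a non-negative integer and that the workaround applies when the version is below the threshold:

```shell
#!/usr/bin/env bash
# Hypothetical single definition of the MEC firmware threshold (177 per this issue).
MAGPIE_MEC_FW_THRESHOLD=177

# Succeeds (exit 0) when the given firmware version is below the threshold.
# The optional second argument overrides the threshold; it exists only so the
# drifted "-lt 0" behaviour can be reproduced for comparison.
magpie_needs_scratch_workaround() {
    local fw_version="$1"
    local threshold="${2:-$MAGPIE_MEC_FW_THRESHOLD}"
    [ "$fw_version" -lt "$threshold" ]
}
```

A caller might then write `if magpie_needs_scratch_workaround "$fw"; then export HSA_NO_SCRATCH_RECLAIM=1; fi` (the value `1` is an assumption here). With a threshold of 0 the check cannot succeed for any non-negative version, which matches the drift described above.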

Additional finding: the mi300x vs mi355x scripts differ only cosmetically (header comment, [vllm_miXXXx] echo label, two comment strings) — same firmware threshold, same AITER default, same server logic. So the GPU split itself is near-pure duplication.

Proposed refactor

Extract the shared server-boot phase into a sourced helper (e.g. vllm_bench_common.sh) used by all vLLM scripts. Two viable shapes:

  1. Sourced helper, keep filename dispatch. Each vllm_{gpu}{,_mm}.sh sources the common file and calls magpie_vllm_serve(), then runs only its own client phase (text: InferenceX run_benchmark_serving / random; mm: upstream vllm bench serve --dataset-name random-mm). GPU label derived from $0 basename. Preserves Hyperloom's filename-based dispatch (vllm_{gpu}_mm.sh).
  2. Single script, branch client phase on DATASET. Collapse to one vllm_{gpu}.sh that picks the text or random-mm client driver depending on whether DATASET=random-mm. Cleaner, but it ripples into Hyperloom's auto-wire, which currently sets benchmark_script=vllm_{gpu}_mm.sh and would instead have to set only DATASET.

Option 1 has the lower blast radius; Option 2 eliminates the most files.
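For Option 1, the filename-derived parts of the helper could look roughly like this. It is a sketch only: `magpie_gpu_label` and `magpie_client_mode` are hypothetical names, and the real `magpie_vllm_serve()` body (env checks, firmware check, `vllm serve` launch, readiness wait) is left out because it would be lifted from the existing scripts:

```shell
#!/usr/bin/env bash
# Hypothetical excerpt of vllm_bench_common.sh; function names are assumptions.

# Derive the GPU label (e.g. "mi300x") from a script path such as
# /path/to/vllm_mi300x_mm.sh, so the "[vllm_miXXXx]" echo label needs no
# per-file edits.
magpie_gpu_label() {
    local name
    name="$(basename "$1" .sh)"    # vllm_mi300x_mm
    name="${name#vllm_}"           # mi300x_mm
    printf '%s\n' "${name%_mm}"    # mi300x
}

# Report which client phase the calling script runs: "mm" for *_mm.sh
# scripts, "text" otherwise.
magpie_client_mode() {
    case "$(basename "$1" .sh)" in
        *_mm) printf 'mm\n' ;;
        *)    printf 'text\n' ;;
    esac
}
```

Each vllm_{gpu}{,_mm}.sh would source the common file, call `magpie_gpu_label "$0"` for its label, run the shared server boot, and keep only its own client phase. Under Option 2 the `magpie_client_mode` check would key off `DATASET` instead of the filename.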

Why not in the VL PR (#44)

The refactor necessarily modifies the pre-existing, validated text scripts (and/or the Hyperloom dispatch), and changes the exact artifacts validated end-to-end on Qwen3-VL-235B-FP8. Re-validation requires the GPU container + Claude API tunnel, which were unavailable at the time. The refactor is deferred to a follow-up in which both the text and mm paths can be re-validated.

Acceptance

  • One shared server-boot helper; no duplicated server phase across vLLM scripts.
  • Firmware threshold / AITER defaults defined once.
  • Text and mm benchmark paths re-validated (text: existing flow; mm: random-mm on a VL model).
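The first acceptance item could be spot-checked mechanically. The sketch below is a crude check under stated assumptions: `count_serve_launchers` is a hypothetical name, and it assumes any file that launches the server contains the literal string `vllm serve` (a comment mentioning that string would also be counted):

```shell
#!/usr/bin/env bash
# Count the .sh files in a directory that contain the literal "vllm serve".
# After the refactor the expected count is 1 (the shared helper only).
# "vllm bench serve" in the mm client phase does not match this pattern.
count_serve_launchers() {
    local dir="$1"
    # grep -l prints each matching file name once; -F matches the string literally.
    grep -lF -- 'vllm serve' "$dir"/*.sh 2>/dev/null | wc -l | tr -d ' '
}
```

Running `count_serve_launchers Magpie/scripts/benchmark` before and after the refactor would give a quick signal of whether the server phase still lives in more than one file.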
