refactor(vllm-bench): de-duplicate server-boot phase across vllm benchmark scripts #45

## Problem

The vLLM benchmark scripts under `Magpie/scripts/benchmark/` duplicate a large, near-identical **server-boot phase** across four files (`vllm_mi300x.sh`, `vllm_mi355x.sh`, `vllm_mi300x_mm.sh`, `vllm_mi355x_mm.sh`): PHASE parsing, `BENCHMARK_BASE_URL`→client handling, env checks, MEC firmware check, `ROCR/HIP_VISIBLE_DEVICES` setup, profiler args, `vllm serve` launch, `wait_for_server_ready`, and PID-file/disown handling (~120 lines each).

This duplication has already caused a silent drift bug: `vllm_mi300x_mm.sh` shipped with the MEC firmware threshold `-lt 0` instead of `-lt 177` (fixed in PR #44). A firmware version is never less than 0, so the check could not fire and `HSA_NO_SCRATCH_RECLAIM` was effectively never set on that path.
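As an illustration of how a single definition would rule out this kind of drift, here is a minimal sketch. The names `MAGPIE_MEC_FW_MIN` and `magpie_needs_no_scratch_reclaim` are hypothetical, the firmware version `150` is made up, and exporting the value `1` is an assumption; the real scripts may detect the version and set the variable differently.

```shell
# Hypothetical shared constant: the threshold lives in exactly one place.
MAGPIE_MEC_FW_MIN=177

# Succeeds (exit 0) when the given MEC firmware version is below the threshold.
magpie_needs_no_scratch_reclaim() {
  [ "$1" -lt "$MAGPIE_MEC_FW_MIN" ]
}

# Example with a made-up version; a real script would detect it first.
if magpie_needs_no_scratch_reclaim 150; then
  export HSA_NO_SCRATCH_RECLAIM=1   # the value 1 is an assumption
fi
```

With the constant mistakenly set to `0`, the function would fail for every non-negative version, which matches the behaviour described above.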

Additional finding: the `mi300x` and `mi355x` scripts differ **only cosmetically**: the header comment, the `[vllm_miXXXx]` echo label, and two comment strings. The firmware threshold, AITER default, and server logic are identical, so the GPU split itself is almost pure duplication.
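One way to re-check this finding is to mask the GPU name in both files and compare what is left. The sketch below is an assumption about how such a check could look; `same_after_masking` is a hypothetical helper, and it would still report a difference if the comment strings differ in more than the GPU name.

```shell
# Succeeds (exit 0) when two files are identical once every GPU name
# such as mi300x or MI355X is replaced by the placeholder miXXXx.
same_after_masking() {
  local a_masked b_masked
  a_masked="$(sed -E 's/[Mm][Ii]3[0-9]{2}[Xx]/miXXXx/g' "$1")"
  b_masked="$(sed -E 's/[Mm][Ii]3[0-9]{2}[Xx]/miXXXx/g' "$2")"
  [ "$a_masked" = "$b_masked" ]
}

# Usage (sketch), run from Magpie/scripts/benchmark/:
#   same_after_masking vllm_mi300x.sh vllm_mi355x.sh && echo "cosmetic only"
```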

## Proposed refactor

Extract the shared server-boot phase into a sourced helper (e.g. `vllm_bench_common.sh`) used by all vLLM scripts. Two viable shapes:

1. **Sourced helper, keep filename dispatch.** Each `vllm_{gpu}{,_mm}.sh` sources the common file and calls `magpie_vllm_serve()`, then runs only its own client phase (text: InferenceX `run_benchmark_serving` / `random`; mm: upstream `vllm bench serve --dataset-name random-mm`). GPU label derived from `$0` basename. Preserves Hyperloom's filename-based dispatch (`vllm_{gpu}_mm.sh`).
2. **Single script, branch client phase on `DATASET`.** Collapse to one `vllm_{gpu}.sh` that selects the `random-mm` client driver when `DATASET=random-mm` and the text driver otherwise. Cleaner, but it ripples into Hyperloom's auto-wire, which currently sets `benchmark_script=vllm_{gpu}_mm.sh` and would have to set only `DATASET` instead.
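For Option 1, deriving the GPU label from the script name could look roughly like the sketch below. The function `magpie_parse_script_name` and the variables `MAGPIE_GPU` and `MAGPIE_VARIANT` are hypothetical names chosen for illustration; only `vllm_bench_common.sh` and `magpie_vllm_serve` come from the proposal itself.

```shell
# Split a script name such as vllm_mi300x_mm.sh into a GPU label and a variant.
magpie_parse_script_name() {
  local base
  base="$(basename "$1" .sh)"   # e.g. vllm_mi300x_mm
  base="${base#vllm_}"          # e.g. mi300x_mm
  case "$base" in
    *_mm) MAGPIE_VARIANT="mm";   MAGPIE_GPU="${base%_mm}" ;;
    *)    MAGPIE_VARIANT="text"; MAGPIE_GPU="$base" ;;
  esac
}

# A wrapper would pass "$0"; a literal name is used here for illustration.
magpie_parse_script_name "vllm_mi355x_mm.sh"
echo "[vllm_${MAGPIE_GPU}] variant=${MAGPIE_VARIANT}"   # prints: [vllm_mi355x] variant=mm

# Each wrapper would then reduce to something like (sketch):
#   . "$(dirname "$0")/vllm_bench_common.sh"
#   magpie_vllm_serve
#   ...script-specific client phase...
```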

Option 1 is lower-blast-radius; Option 2 eliminates the most files.
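For comparison, the client-phase branch that Option 2 relies on might be sketched as follows. `magpie_pick_client` is a hypothetical helper that only echoes the driver named in this issue rather than running it, and treating an unset `DATASET` as the text path is an assumption.

```shell
# Print which client driver would run for a given DATASET value.
magpie_pick_client() {
  case "${1:-random}" in
    random-mm) echo "vllm bench serve --dataset-name random-mm" ;;
    *)         echo "run_benchmark_serving" ;;   # text path (InferenceX)
  esac
}

magpie_pick_client "random-mm"   # prints: vllm bench serve --dataset-name random-mm
magpie_pick_client "random"      # prints: run_benchmark_serving
```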

## Why not in the VL PR (#44)

The refactor necessarily modifies the pre-existing, validated text scripts (and/or the Hyperloom dispatch), and changes the exact artifacts that were validated end-to-end on Qwen3-VL-235B-FP8. Re-validation requires the GPU container plus the Claude API tunnel, a setup that was unavailable at the time. The refactor is therefore deferred until both the text and mm paths can be re-validated.

## Acceptance

- One shared server-boot helper; no duplicated server phase across vLLM scripts.
- Firmware threshold / AITER defaults defined once.
- Text and mm benchmark paths re-validated (text: existing flow; mm: `random-mm` on a VL model).
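The first two criteria could be spot-checked mechanically. The sketch below is one possible check, not part of the proposal: `count_files_with` is a hypothetical helper that counts the files under a directory containing a fixed string. It also counts matches inside comments, so a nonzero surplus would need a manual look.

```shell
# Count the files under a directory that contain a fixed string.
count_files_with() {
  grep -rlF -- "$1" "$2" 2>/dev/null | wc -l | tr -d ' '
}

# Usage (sketch): after the refactor, only the shared helper should launch
# the server and define the firmware threshold.
#   [ "$(count_files_with 'vllm serve' Magpie/scripts/benchmark)" -eq 1 ]
#   [ "$(count_files_with '-lt 177' Magpie/scripts/benchmark)" -eq 1 ]
```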

