Summary
Magpie currently supports single-node vLLM/SGLang server launch, benchmarking, and trace collection. As inference workloads move toward larger models (e.g. DeepSeek-R1, GLM-5, MiniMax-M2.5, Qwen3.5) that no longer fit on a single node, we need first-class support for multi-node serving and benchmarking in Magpie.
Use case
Downstream automation tools that wrap Magpie need to optimize models that require TP / PP / EP across multiple nodes. Today this requires either (a) bypassing Magpie and manually launching multi-node servers, or (b) restricting optimization to single-node configurations, which excludes the large-model class entirely.
Required capabilities
1. Multi-node server
- Launch a vLLM / SGLang server with TP * PP * DP > single-node-GPUs, spanning multiple nodes
- Coordinate worker discovery (head node + worker nodes), shared NFS / object store for model weights, and inter-node networking (RCCL/NCCL, RDMA where available)
- Reuse the existing scheduler.ray execution path where possible; the Ray cluster already supports multi-node, but the launch, readiness, and health-check logic for multi-node servers needs explicit coverage
- Configurable in Magpie/config.yaml (e.g. scheduler.nodes, scheduler.head_node, scheduler.worker_nodes) and surfaced in the benchmark config
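To make the TP * PP * DP > single-node-GPUs condition concrete, here is a minimal sketch of how a launcher might decide whether a configuration needs more than one node and how many nodes to request. The names ParallelConfig, needs_multi_node, and nodes_required are hypothetical and are not part of Magpie today; the sketch also assumes every node exposes the same number of GPUs.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelConfig:
    """Hypothetical container for the parallelism degrees of one server."""
    tp: int      # tensor-parallel size
    pp: int = 1  # pipeline-parallel size
    dp: int = 1  # data-parallel size

    @property
    def world_size(self) -> int:
        # Total number of GPUs the server occupies.
        return self.tp * self.pp * self.dp


def needs_multi_node(cfg: ParallelConfig, gpus_per_node: int) -> bool:
    """True when the configuration cannot fit on a single node."""
    return cfg.world_size > gpus_per_node


def nodes_required(cfg: ParallelConfig, gpus_per_node: int) -> int:
    """Smallest number of homogeneous nodes that can host the configuration."""
    if gpus_per_node <= 0:
        raise ValueError("gpus_per_node must be positive")
    return math.ceil(cfg.world_size / gpus_per_node)


if __name__ == "__main__":
    cfg = ParallelConfig(tp=8, pp=2)
    print(needs_multi_node(cfg, gpus_per_node=8), nodes_required(cfg, gpus_per_node=8))
```

A check like this could run before launch so that a configuration whose world size exceeds the node count listed under scheduler.nodes fails fast instead of hanging during worker discovery.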
2. Multi-node benchmark
- Benchmark client able to target a multi-node server through a single endpoint (head node) with correct request distribution
- Per-node and aggregated metrics: throughput, TTFT, TPOT, GPU utilization
- Failure handling: if a worker node dies mid-run, surface a clear error (vs. silently degraded numbers)
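The per-node versus aggregated reporting and the failure-handling requirement above can be sketched as follows. This is an illustration only: NodeReport, WorkerNodeFailure, and summarize are hypothetical names, the alive flag stands in for whatever health check Magpie ends up using, and only GPU utilization is aggregated here because it is naturally a per-node quantity.

```python
from dataclasses import dataclass
from statistics import mean


class WorkerNodeFailure(RuntimeError):
    """Raised when a node stopped responding during the benchmark run."""


@dataclass(frozen=True)
class NodeReport:
    node: str                  # node name, e.g. "node0"
    alive: bool                # hypothetical result of a health check
    gpu_util_pct: list[float]  # utilization samples collected on that node


def summarize(reports: list[NodeReport]) -> dict:
    """Return per-node and aggregated GPU utilization, or fail loudly."""
    if not reports:
        raise ValueError("no node reports collected")
    dead = sorted(r.node for r in reports if not r.alive)
    if dead:
        # Refuse to report numbers from a degraded cluster.
        raise WorkerNodeFailure(f"worker node(s) died mid-run: {', '.join(dead)}")
    per_node = {r.node: mean(r.gpu_util_pct) for r in reports}
    return {
        "per_node_gpu_util_pct": per_node,
        "aggregate_gpu_util_pct": mean(list(per_node.values())),
    }
```

Raising an exception rather than returning partial results keeps a degraded run from being mistaken for a valid measurement by tools that consume the benchmark output.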
3. Multi-node trace collection
- Coordinate torch.profiler / SGLANG_TORCH_PROFILER_DIR / vLLM --profiler-config activation simultaneously across all nodes
- Collect per-rank trace files from every worker node back to the head node (or to shared NFS)
- Naming convention that preserves node + TP rank information, e.g. node{N}-TP{R}.trace.json.gz, so downstream tools (TraceLens) can identify per-node behavior
- Optional aggregation / merging step for cross-node analysis (e.g. comm overlap across nodes)
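The node{N}-TP{R}.trace.json.gz convention suggested above could be produced and parsed with a pair of small helpers. This is a sketch, assuming N and R are non-negative integers; trace_filename, parse_trace_filename, and TraceId are hypothetical names and not an existing Magpie or TraceLens API.

```python
import re
from typing import NamedTuple

# Matches names such as "node1-TP3.trace.json.gz".
_TRACE_RE = re.compile(r"^node(?P<node>\d+)-TP(?P<tp>\d+)\.trace\.json\.gz$")


class TraceId(NamedTuple):
    node: int     # index of the node that produced the trace
    tp_rank: int  # tensor-parallel rank on that node


def trace_filename(node: int, tp_rank: int) -> str:
    """Build a trace file name that encodes node and TP rank."""
    if node < 0 or tp_rank < 0:
        raise ValueError("node and tp_rank must be non-negative")
    return f"node{node}-TP{tp_rank}.trace.json.gz"


def parse_trace_filename(name: str) -> TraceId:
    """Recover node and TP rank from a trace file name."""
    match = _TRACE_RE.match(name)
    if match is None:
        raise ValueError(f"not a recognized trace file name: {name!r}")
    return TraceId(node=int(match["node"]), tp_rank=int(match["tp"]))
```

Keeping both directions in one place means the collection step and any later merging step agree on the format, so a change to the convention only has to be made once.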
Out of scope (for this issue)
- Auto-scaling the cluster size based on workload
- GUI for multi-node topology visualization
- Failover / restart of a dead worker mid-run
Suggested approach
- Build on top of the existing Remote Ray Cluster execution environment
- Add multi-node launch helpers under Magpie/scheduler/ (or wherever the Ray integration lives today)
- Update examples/ with a multi-node benchmark config sample
- Document the new flow in README.md and docs/