[Feature Request] Multi-node support for server, benchmark, and trace collection #30

@xiaofei-zheng

Summary

Magpie currently supports single-node vLLM/SGLang server launch, benchmarking, and trace collection. As inference workloads move toward larger models (e.g. DeepSeek-R1, GLM-5, MiniMax-M2.5, Qwen3.5) that no longer fit on a single node, we need first-class support for multi-node serving and benchmarking in Magpie.

Use case

Downstream automation tools that wrap Magpie need to optimize models that require TP / PP / EP across multiple nodes. Today this either requires (a) bypassing Magpie and manually launching multi-node servers, or (b) restricting optimization to single-node configurations, which excludes the large-model class entirely.

Required capabilities

1. Multi-node server

  • Launch a vLLM / SGLang server whose total parallel degree (TP * PP * DP) exceeds the GPU count of a single node, so that the deployment spans multiple nodes
  • Coordinate worker discovery (head node + worker nodes), shared NFS / object store for model weights, and inter-node networking (RCCL/NCCL, RDMA where available)
  • Reuse the existing scheduler.ray execution path where possible. A Ray cluster already supports multiple nodes, but the launch, readiness, and health-check logic for multi-node servers needs explicit coverage
  • Configurable in Magpie/config.yaml (e.g. scheduler.nodes, scheduler.head_node, scheduler.worker_nodes) and surfaced in the benchmark config
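
To make the "TP * PP * DP > single-node GPUs" condition concrete, here is a minimal sketch of how a launcher could decide whether a configuration needs more than one node. The names `ParallelPlan`, `nodes_required`, and `needs_multi_node` are hypothetical and not part of Magpie today. The sketch also assumes GPUs are packed densely and ignores placement constraints such as keeping a TP group inside one node.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelPlan:
    """Hypothetical container for the parallel degrees named in this issue."""
    tp: int
    pp: int = 1
    dp: int = 1

    @property
    def world_size(self) -> int:
        # Total number of GPUs the server needs.
        return self.tp * self.pp * self.dp


def nodes_required(plan: ParallelPlan, gpus_per_node: int) -> int:
    """Smallest node count that can hold the plan, assuming dense packing."""
    if gpus_per_node <= 0:
        raise ValueError("gpus_per_node must be positive")
    return math.ceil(plan.world_size / gpus_per_node)


def needs_multi_node(plan: ParallelPlan, gpus_per_node: int) -> bool:
    """True when the plan does not fit on a single node."""
    return nodes_required(plan, gpus_per_node) > 1
```

For example, `ParallelPlan(tp=8, pp=2)` on 8-GPU nodes needs two nodes, while `ParallelPlan(tp=4)` still fits on one.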

2. Multi-node benchmark

  • Benchmark client able to target a multi-node server through a single endpoint (head node) with correct request distribution
  • Per-node and aggregated metrics: throughput, TTFT, TPOT, GPU utilization
  • Failure handling: if a worker node dies mid-run, surface a clear error (vs. silently degraded numbers)
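
A rough sketch of the per-node to aggregated roll-up with fail-loud behavior might look like the following. Everything here is illustrative: `NodeMetrics`, `WorkerNodeFailure`, and `aggregate` are hypothetical names, and the aggregation rule (sum throughput, take an unweighted mean of TTFT, TPOT, and GPU utilization) is an assumption, not a decided design.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class NodeMetrics:
    """Hypothetical per-node sample collected at the end of a run."""
    node: str
    alive: bool
    throughput_tok_s: float
    ttft_ms: float
    tpot_ms: float
    gpu_util_pct: float


class WorkerNodeFailure(RuntimeError):
    """Raised instead of reporting numbers from a degraded cluster."""


def aggregate(per_node: list[NodeMetrics]) -> dict[str, float]:
    if not per_node:
        raise ValueError("no node metrics to aggregate")
    dead = [m.node for m in per_node if not m.alive]
    if dead:
        # Surface a clear error rather than silently degraded numbers.
        raise WorkerNodeFailure(f"worker node(s) died mid-run: {', '.join(dead)}")
    return {
        # Assumption: throughput adds up across nodes.
        "throughput_tok_s": sum(m.throughput_tok_s for m in per_node),
        # Assumption: latencies and utilization are averaged, unweighted.
        "ttft_ms": mean(m.ttft_ms for m in per_node),
        "tpot_ms": mean(m.tpot_ms for m in per_node),
        "gpu_util_pct": mean(m.gpu_util_pct for m in per_node),
    }
```

The point of raising rather than skipping dead nodes is that a partial result never reaches the report, which matches the failure-handling requirement above.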

3. Multi-node trace collection

  • Coordinate torch.profiler / SGLANG_TORCH_PROFILER_DIR / vLLM --profiler-config activation simultaneously across all nodes
  • Collect per-rank trace files from every worker node back to the head node (or to shared NFS)
  • Naming convention that preserves node + TP rank information, e.g. node{N}-TP{R}.trace.json.gz, so downstream tools (TraceLens) can identify per-node behavior
  • Optional aggregation / merging step for cross-node analysis (e.g. comm overlap across nodes)
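
The naming convention above could be implemented with a small pair of helpers along these lines. This is only a sketch: `trace_filename`, `parse_trace_filename`, and `group_by_node` are hypothetical names, and the pattern covers only the `node{N}-TP{R}.trace.json.gz` example given in this issue (no PP or EP rank).

```python
import re
from collections import defaultdict
from typing import Iterable, NamedTuple


class TraceId(NamedTuple):
    node: int
    tp_rank: int


# Matches the example convention from this issue: node{N}-TP{R}.trace.json.gz
_TRACE_RE = re.compile(r"node(\d+)-TP(\d+)\.trace\.json\.gz")


def trace_filename(node: int, tp_rank: int) -> str:
    """Build a trace file name that encodes node index and TP rank."""
    if node < 0 or tp_rank < 0:
        raise ValueError("node and tp_rank must be non-negative")
    return f"node{node}-TP{tp_rank}.trace.json.gz"


def parse_trace_filename(name: str) -> TraceId:
    """Recover node index and TP rank from a trace file name."""
    match = _TRACE_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"not a recognized trace file name: {name!r}")
    return TraceId(node=int(match.group(1)), tp_rank=int(match.group(2)))


def group_by_node(names: Iterable[str]) -> dict[int, list[str]]:
    """Group trace files by node so per-node behavior can be inspected."""
    groups: dict[int, list[str]] = defaultdict(list)
    for name in names:
        groups[parse_trace_filename(name).node].append(name)
    return dict(groups)
```

Keeping the build and parse functions side by side makes it easy to check that the collector on the head node and a downstream consumer agree on the same format.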

Out of scope (for this issue)

  • Auto-scaling the cluster size based on workload
  • GUI for multi-node topology visualization
  • Failover / restart of a dead worker mid-run

Suggested approach

  • Build on top of the existing Remote Ray Cluster execution environment
  • Add multi-node launch helpers under Magpie/scheduler/ (or wherever the Ray integration lives today)
  • Update examples/ with a multi-node benchmark config sample
  • Document the new flow in README.md and docs/
