Skip to content

refactor(bench): split bench_serving.rs into modules#340

Open
scatyf3 wants to merge 8 commits into
openinfer-project:mainfrom
scatyf3:refactor/bench-serving-split
Open

refactor(bench): split bench_serving.rs into modules#340
scatyf3 wants to merge 8 commits into
openinfer-project:mainfrom
scatyf3:refactor/bench-serving-split

Conversation

@scatyf3

@scatyf3 scatyf3 commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Motivation

To solve Issue #244, we want to add a mixed-load benchmark to bench_serving.rs. But this file is already ~2000 LOC, and the mixed-load code would push it to ~3000(currently in [my fork](pegainfer/pegainfer-server/src/bin/bench_serving.rs at feat/qwen3-mixed-load-itl · scatyf3/pegainfer · GitHub)) — which raises two issues:

  1. Hard to read / review — we split a file once it grows past ~1000 LOC.
  2. Shared utilities get duplicated — mixed-load and the existing benchmark
    code drive the scheduler the same way: submit a request, then drain its
    token stream. That loop is already copy-pasted between timed_generation
    and timed_generation_batch, and mixed-load would copy it three more times
    (spawn_background_streams, run_injector, mixed_warmup). Extracting it
    once lets both old and new code share it instead of duplicating.

so the first step is to refactor bench_serving.rs

Change

refactor bench_serving.rs

module LOC responsibility
main.rs 334 orchestration: model loader, dispatch, main, tests
cli.rs 231 Cli / Command / *Args
report.rs 171 serde report + metric types
metrics.rs 74 percentiles, stats, token traces
render.rs 295 comfy_table renderers
prompt.rs 124 prompt resolution / synthetic / salting
exec.rs 408 BenchModel, run_timed, run_scheduler_stream, adapters
runners.rs 452 request/matrix/curve drivers + report emission
snapshot.rs 379 snapshot generation + git-baseline compare

Verification

cargo fmt --check clean; default build clean.

Next Step

implemment mixed-load ITL based on current refactored codebase

scatyf3 and others added 8 commits June 10, 2026 12:44
…der match

Dedup bench_serving.rs ahead of re-applying the mixed-load harness:

- Extract the duplicated submit + token-stream receive loop (previously
  inline in both BenchModel::timed_generation and timed_generation_batch)
  into a single run_scheduler_stream() free fn. Owns its args and borrows
  the handle so it composes inside the batch path's thread::spawn(move)
  with a cloned SchedulerHandle. Error/closed strings preserved verbatim.
- Collapse the shared tail of the four scheduler-backed loader arms
  (DeepSeekV4, KimiK2, Qwen3, Qwen35) into a finish() closure; the
  per-model arms now differ only in engine construction. DeepSeekV2Lite
  keeps its own tail (distinct, non-scheduler bench type).

Behavior-preserving (-117/+74). Verified: default build, deepseek-v2-lite
build + 11 unit tests pass. kimi-k2/deepseek-v4 arms are structurally
identical to the qwen arms; their builds are gated by toolchain
prerequisites (NCCL / CuTe DSL) unrelated to this change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
First slice of the bench_serving module split. Relayout the binary to the
canonical multi-file form (src/bin/bench_serving/main.rs) so submodules
resolve as siblings, and move the CLI surface — Cli, Command, OutputFormat,
CliEpBackend, all *Args structs, and their example/default consts — into
cli.rs. Items and fields that main.rs still reads are pub(crate); main.rs
pulls them back via `use cli::*`.

Pure code motion, no behavior change. Verified: default build clean (only
the pre-existing ParallelConfig kimi-gated unused-import warning remains).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Second slice of the split. Move the serializable report/metric types —
RunInfo, DurationStats, CountStats, RequestMetrics, SnapshotReport, the
Matrix/Curve report structs, and the BenchReport enum — into report.rs.
Fields are constructed and read by the runners in main.rs, so they're
pub(crate); main.rs pulls the types back via `use report::*`.

Pure code motion, no behavior change. Verified: default build clean (only
the pre-existing ParallelConfig kimi-gated unused-import warning remains).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Next slices of the split:
- metrics.rs: dur_ms, percentiles, summarize_durations/counts,
  aggregate_tok_s, generated_token_hash/trace.
- render.rs: comfy_table table helpers and all render_* text renderers.

Both pull report types via `use crate::report::*`; main.rs re-exports each
module's pub(crate) items. Trimmed the now-unused comfy_table/io imports
from main.rs (Cell/CellAlignment stay — inline tables remain in the
snapshot/compare paths).

Pure code motion, no behavior change. Verified: default build clean (only
the pre-existing ParallelConfig kimi-gated unused-import warning remains).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Move prompt resolution — truncate_preview, synthetic_prompt_tokens,
synthetic_random_prompt (+ token-id bounds), PromptSpec, resolve_prompt_input,
and the SYNTHETIC_PATTERN label — into prompt.rs. Items main.rs still uses
are pub(crate); trimmed the now-unused rand::RngExt import from main.rs.

Pure code motion, no behavior change. Verified: default build clean (only
the pre-existing ParallelConfig kimi-gated unused-import warning remains).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Move the generation timing harness into exec.rs: GenTimings, the BenchModel
trait, run_timed, run_scheduler_stream, SchedulerBenchModel, and the
feature-gated DeepSeekV2LiteBenchModel + its attribution/timing helpers.
Items main.rs constructs or calls (trait, models + fields, GenTimings +
fields, the dsv2 helpers) are pub(crate).

The #[cfg(test)] module stays in main.rs — it exercises both the dsv2
helpers and build_request_metrics (a runner that stays here), so the crate
root is its natural home; it reaches exec's fns via `use exec::*`. Trimmed
the now-unused thread/mpsc/SchedulerRequest/TokenEvent imports from main.rs.

Pure code motion, no behavior change. Verified: default build clean and all
11 deepseek-v2-lite unit tests pass (only the pre-existing ParallelConfig
kimi-gated unused-import warning remains).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Move the request/matrix/curve benchmark drivers into runners.rs:
measure_timings, build_request_metrics/iterations, run_info, the three
bench_* drivers, render_text, emit_report, run_command, plus normalize_sizes
and validate_run_args (and the DEFAULT_* prompt consts they use). All are
pub(crate); runners glob-imports its sibling modules.

The cfg(test, deepseek-v2-lite) module's now-local Duration/SamplingParams
imports moved into the test module (they were only used there); trimmed the
profiler/rng imports that left main.rs with measure_timings.

Pure code motion, no behavior change. Verified: default build clean and all
11 deepseek-v2-lite unit tests pass (only the pre-existing ParallelConfig
kimi-gated unused-import warning remains).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Final slice of the split. Move the snapshot + git-baseline-compare subsystem
into snapshot.rs: the SNAPSHOT_*/REGRESSION_* consts, run_snapshot,
render_snapshot_text, run_compare, render_comparison, and the
git/gpu/date/delta provenance helpers. snapshot glob-imports its siblings.

main.rs is now a ~330-line orchestrator (command_seed, kimi_parallel_config,
dispatch, main, the dsv2-lite test module). Trimmed its now-unused imports;
qualified the lone kimi-gated ensure! call site so the import could go.

Pure code motion, no behavior change. Verified: default build clean and all
11 deepseek-v2-lite unit tests pass (only the pre-existing ParallelConfig
kimi-gated unused-import warning remains).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@xiaguan

xiaguan commented Jun 11, 2026

Copy link
Copy Markdown
Collaborator

LGTM. Could u rebase or merge latest main?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants