refactor(bench): split bench_serving.rs into modules by scatyf3 · Pull Request #340 · openinfer-project/openinfer

scatyf3 · 2026-06-10T17:10:01Z

Motivation

To solve Issue #244, we want to add a mixed-load benchmark to bench_serving.rs. But this file is already ~2000 LOC, and the mixed-load code would push it to ~3000(currently in [my fork](pegainfer/pegainfer-server/src/bin/bench_serving.rs at feat/qwen3-mixed-load-itl · scatyf3/pegainfer · GitHub)) — which raises two issues:

Hard to read / review — we split a file once it grows past ~1000 LOC.
Shared utilities get duplicated — mixed-load and the existing benchmark
code drive the scheduler the same way: submit a request, then drain its
token stream. That loop is already copy-pasted between timed_generation
and timed_generation_batch, and mixed-load would copy it three more times
(spawn_background_streams, run_injector, mixed_warmup). Extracting it
once lets both old and new code share it instead of duplicating.

so the first step is to refactor bench_serving.rs

Change

refactor bench_serving.rs

module	LOC	responsibility
`main.rs`	334	orchestration: model loader, `dispatch`, `main`, tests
`cli.rs`	231	`Cli` / `Command` / `*Args`
`report.rs`	171	serde report + metric types
`metrics.rs`	74	percentiles, stats, token traces
`render.rs`	295	comfy_table renderers
`prompt.rs`	124	prompt resolution / synthetic / salting
`exec.rs`	408	`BenchModel`, `run_timed`, `run_scheduler_stream`, adapters
`runners.rs`	452	request/matrix/curve drivers + report emission
`snapshot.rs`	379	snapshot generation + git-baseline compare

Verification

cargo fmt --check clean; default build clean.

Next Step

implemment mixed-load ITL based on current refactored codebase

…der match Dedup bench_serving.rs ahead of re-applying the mixed-load harness: - Extract the duplicated submit + token-stream receive loop (previously inline in both BenchModel::timed_generation and timed_generation_batch) into a single run_scheduler_stream() free fn. Owns its args and borrows the handle so it composes inside the batch path's thread::spawn(move) with a cloned SchedulerHandle. Error/closed strings preserved verbatim. - Collapse the shared tail of the four scheduler-backed loader arms (DeepSeekV4, KimiK2, Qwen3, Qwen35) into a finish() closure; the per-model arms now differ only in engine construction. DeepSeekV2Lite keeps its own tail (distinct, non-scheduler bench type). Behavior-preserving (-117/+74). Verified: default build, deepseek-v2-lite build + 11 unit tests pass. kimi-k2/deepseek-v4 arms are structurally identical to the qwen arms; their builds are gated by toolchain prerequisites (NCCL / CuTe DSL) unrelated to this change. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

First slice of the bench_serving module split. Relayout the binary to the canonical multi-file form (src/bin/bench_serving/main.rs) so submodules resolve as siblings, and move the CLI surface — Cli, Command, OutputFormat, CliEpBackend, all *Args structs, and their example/default consts — into cli.rs. Items and fields that main.rs still reads are pub(crate); main.rs pulls them back via `use cli::*`. Pure code motion, no behavior change. Verified: default build clean (only the pre-existing ParallelConfig kimi-gated unused-import warning remains). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Second slice of the split. Move the serializable report/metric types — RunInfo, DurationStats, CountStats, RequestMetrics, SnapshotReport, the Matrix/Curve report structs, and the BenchReport enum — into report.rs. Fields are constructed and read by the runners in main.rs, so they're pub(crate); main.rs pulls the types back via `use report::*`. Pure code motion, no behavior change. Verified: default build clean (only the pre-existing ParallelConfig kimi-gated unused-import warning remains). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Next slices of the split: - metrics.rs: dur_ms, percentiles, summarize_durations/counts, aggregate_tok_s, generated_token_hash/trace. - render.rs: comfy_table table helpers and all render_* text renderers. Both pull report types via `use crate::report::*`; main.rs re-exports each module's pub(crate) items. Trimmed the now-unused comfy_table/io imports from main.rs (Cell/CellAlignment stay — inline tables remain in the snapshot/compare paths). Pure code motion, no behavior change. Verified: default build clean (only the pre-existing ParallelConfig kimi-gated unused-import warning remains). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Move prompt resolution — truncate_preview, synthetic_prompt_tokens, synthetic_random_prompt (+ token-id bounds), PromptSpec, resolve_prompt_input, and the SYNTHETIC_PATTERN label — into prompt.rs. Items main.rs still uses are pub(crate); trimmed the now-unused rand::RngExt import from main.rs. Pure code motion, no behavior change. Verified: default build clean (only the pre-existing ParallelConfig kimi-gated unused-import warning remains). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Move the generation timing harness into exec.rs: GenTimings, the BenchModel trait, run_timed, run_scheduler_stream, SchedulerBenchModel, and the feature-gated DeepSeekV2LiteBenchModel + its attribution/timing helpers. Items main.rs constructs or calls (trait, models + fields, GenTimings + fields, the dsv2 helpers) are pub(crate). The #[cfg(test)] module stays in main.rs — it exercises both the dsv2 helpers and build_request_metrics (a runner that stays here), so the crate root is its natural home; it reaches exec's fns via `use exec::*`. Trimmed the now-unused thread/mpsc/SchedulerRequest/TokenEvent imports from main.rs. Pure code motion, no behavior change. Verified: default build clean and all 11 deepseek-v2-lite unit tests pass (only the pre-existing ParallelConfig kimi-gated unused-import warning remains). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Move the request/matrix/curve benchmark drivers into runners.rs: measure_timings, build_request_metrics/iterations, run_info, the three bench_* drivers, render_text, emit_report, run_command, plus normalize_sizes and validate_run_args (and the DEFAULT_* prompt consts they use). All are pub(crate); runners glob-imports its sibling modules. The cfg(test, deepseek-v2-lite) module's now-local Duration/SamplingParams imports moved into the test module (they were only used there); trimmed the profiler/rng imports that left main.rs with measure_timings. Pure code motion, no behavior change. Verified: default build clean and all 11 deepseek-v2-lite unit tests pass (only the pre-existing ParallelConfig kimi-gated unused-import warning remains). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Final slice of the split. Move the snapshot + git-baseline-compare subsystem into snapshot.rs: the SNAPSHOT_*/REGRESSION_* consts, run_snapshot, render_snapshot_text, run_compare, render_comparison, and the git/gpu/date/delta provenance helpers. snapshot glob-imports its siblings. main.rs is now a ~330-line orchestrator (command_seed, kimi_parallel_config, dispatch, main, the dsv2-lite test module). Trimmed its now-unused imports; qualified the lone kimi-gated ensure! call site so the import could go. Pure code motion, no behavior change. Verified: default build clean and all 11 deepseek-v2-lite unit tests pass (only the pre-existing ParallelConfig kimi-gated unused-import warning remains). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

xiaguan · 2026-06-11T07:40:30Z

LGTM. Could u rebase or merge latest main?

scatyf3 and others added 8 commits June 10, 2026 12:44

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor(bench): split bench_serving.rs into modules#340

refactor(bench): split bench_serving.rs into modules#340
scatyf3 wants to merge 8 commits into
openinfer-project:mainfrom
scatyf3:refactor/bench-serving-split

scatyf3 commented Jun 10, 2026 •

edited

Loading

Uh oh!

xiaguan commented Jun 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

scatyf3 commented Jun 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Motivation

Change

Verification

Next Step

Uh oh!

xiaguan commented Jun 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

scatyf3 commented Jun 10, 2026 •

edited

Loading