refactor(bench): split bench_serving.rs into modules#340
Open
scatyf3 wants to merge 8 commits into
Open
Conversation
…der match Dedup bench_serving.rs ahead of re-applying the mixed-load harness: - Extract the duplicated submit + token-stream receive loop (previously inline in both BenchModel::timed_generation and timed_generation_batch) into a single run_scheduler_stream() free fn. Owns its args and borrows the handle so it composes inside the batch path's thread::spawn(move) with a cloned SchedulerHandle. Error/closed strings preserved verbatim. - Collapse the shared tail of the four scheduler-backed loader arms (DeepSeekV4, KimiK2, Qwen3, Qwen35) into a finish() closure; the per-model arms now differ only in engine construction. DeepSeekV2Lite keeps its own tail (distinct, non-scheduler bench type). Behavior-preserving (-117/+74). Verified: default build, deepseek-v2-lite build + 11 unit tests pass. kimi-k2/deepseek-v4 arms are structurally identical to the qwen arms; their builds are gated by toolchain prerequisites (NCCL / CuTe DSL) unrelated to this change. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
First slice of the bench_serving module split. Relayout the binary to the canonical multi-file form (src/bin/bench_serving/main.rs) so submodules resolve as siblings, and move the CLI surface — Cli, Command, OutputFormat, CliEpBackend, all *Args structs, and their example/default consts — into cli.rs. Items and fields that main.rs still reads are pub(crate); main.rs pulls them back via `use cli::*`. Pure code motion, no behavior change. Verified: default build clean (only the pre-existing ParallelConfig kimi-gated unused-import warning remains). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Second slice of the split. Move the serializable report/metric types — RunInfo, DurationStats, CountStats, RequestMetrics, SnapshotReport, the Matrix/Curve report structs, and the BenchReport enum — into report.rs. Fields are constructed and read by the runners in main.rs, so they're pub(crate); main.rs pulls the types back via `use report::*`. Pure code motion, no behavior change. Verified: default build clean (only the pre-existing ParallelConfig kimi-gated unused-import warning remains). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Next slices of the split: - metrics.rs: dur_ms, percentiles, summarize_durations/counts, aggregate_tok_s, generated_token_hash/trace. - render.rs: comfy_table table helpers and all render_* text renderers. Both pull report types via `use crate::report::*`; main.rs re-exports each module's pub(crate) items. Trimmed the now-unused comfy_table/io imports from main.rs (Cell/CellAlignment stay — inline tables remain in the snapshot/compare paths). Pure code motion, no behavior change. Verified: default build clean (only the pre-existing ParallelConfig kimi-gated unused-import warning remains). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Move prompt resolution — truncate_preview, synthetic_prompt_tokens, synthetic_random_prompt (+ token-id bounds), PromptSpec, resolve_prompt_input, and the SYNTHETIC_PATTERN label — into prompt.rs. Items main.rs still uses are pub(crate); trimmed the now-unused rand::RngExt import from main.rs. Pure code motion, no behavior change. Verified: default build clean (only the pre-existing ParallelConfig kimi-gated unused-import warning remains). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Move the generation timing harness into exec.rs: GenTimings, the BenchModel trait, run_timed, run_scheduler_stream, SchedulerBenchModel, and the feature-gated DeepSeekV2LiteBenchModel + its attribution/timing helpers. Items main.rs constructs or calls (trait, models + fields, GenTimings + fields, the dsv2 helpers) are pub(crate). The #[cfg(test)] module stays in main.rs — it exercises both the dsv2 helpers and build_request_metrics (a runner that stays here), so the crate root is its natural home; it reaches exec's fns via `use exec::*`. Trimmed the now-unused thread/mpsc/SchedulerRequest/TokenEvent imports from main.rs. Pure code motion, no behavior change. Verified: default build clean and all 11 deepseek-v2-lite unit tests pass (only the pre-existing ParallelConfig kimi-gated unused-import warning remains). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Move the request/matrix/curve benchmark drivers into runners.rs: measure_timings, build_request_metrics/iterations, run_info, the three bench_* drivers, render_text, emit_report, run_command, plus normalize_sizes and validate_run_args (and the DEFAULT_* prompt consts they use). All are pub(crate); runners glob-imports its sibling modules. The cfg(test, deepseek-v2-lite) module's now-local Duration/SamplingParams imports moved into the test module (they were only used there); trimmed the profiler/rng imports that left main.rs with measure_timings. Pure code motion, no behavior change. Verified: default build clean and all 11 deepseek-v2-lite unit tests pass (only the pre-existing ParallelConfig kimi-gated unused-import warning remains). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Final slice of the split. Move the snapshot + git-baseline-compare subsystem into snapshot.rs: the SNAPSHOT_*/REGRESSION_* consts, run_snapshot, render_snapshot_text, run_compare, render_comparison, and the git/gpu/date/delta provenance helpers. snapshot glob-imports its siblings. main.rs is now a ~330-line orchestrator (command_seed, kimi_parallel_config, dispatch, main, the dsv2-lite test module). Trimmed its now-unused imports; qualified the lone kimi-gated ensure! call site so the import could go. Pure code motion, no behavior change. Verified: default build clean and all 11 deepseek-v2-lite unit tests pass (only the pre-existing ParallelConfig kimi-gated unused-import warning remains). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Collaborator
|
LGTM. Could u rebase or merge latest main? |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Motivation
To solve Issue #244, we want to add a mixed-load benchmark to bench_serving.rs. But this file is already ~2000 LOC, and the mixed-load code would push it to ~3000(currently in [my fork](pegainfer/pegainfer-server/src/bin/bench_serving.rs at feat/qwen3-mixed-load-itl · scatyf3/pegainfer · GitHub)) — which raises two issues:
code drive the scheduler the same way: submit a request, then drain its
token stream. That loop is already copy-pasted between
timed_generationand
timed_generation_batch, and mixed-load would copy it three more times(
spawn_background_streams,run_injector,mixed_warmup). Extracting itonce lets both old and new code share it instead of duplicating.
so the first step is to refactor
bench_serving.rsChange
refactor
bench_serving.rsmain.rsdispatch,main, testscli.rsCli/Command/*Argsreport.rsmetrics.rsrender.rsprompt.rsexec.rsBenchModel,run_timed,run_scheduler_stream, adaptersrunners.rssnapshot.rsVerification
cargo fmt --checkclean; default build clean.Next Step
implemment mixed-load ITL based on current refactored codebase