chore(qwen3): docs/dead-code cleanup — stale records, deleted-tool references, unused consts#304
odysa wants to merge 2 commits into
…ferences, unused consts

Closes xiaguan#248.

Dead code:
- drop unused pub consts HIDDEN_SIZE/INTERMEDIATE_SIZE from batch_decode_trace.rs
- drop uncalled probe_model() from pegainfer-qwen3-4b and the now-orphaned ModelInfo from pegainfer-engine (server inlines its own detection; qwen35's pair was removed in xiaguan#258)

Docs:
- collapse model-crate.md + kernels-crate.md (obsolete crates/ layout, deleted qwen3_kernel_snapshot bench) into a slim crate-layout.md describing the crate that exists; load-bearing split-K facts and CUPTI/bench gotchas kept
- rewrite tp-design.md around the implemented controller/worker runtime and promote the 3 real open items (TP correctness coverage, vocab-parallel embedding/lm_head, TP CUDA-graph)
- lift the issue-xiaguan#85 KV admission lessons into lessons/kv-full-lifetime-reservation.md and delete kv-pressure-hang.md
- update index.md rows and the qwen3/qwen35 roadmap cleanup ledgers
…es/ paths

Follow-up to the xiaguan#248 deletions, so that surviving docs no longer point at deleted files or pre-workspace-refactor paths:
- kernel-op-reports.md: model-crate.md/kernels-crate.md references now route to crate-layout.md (with at-the-time naming kept for history), crates/ prefixes dropped, and the stale qwen3_kernel_snapshot check line rewritten to point at the Step 5 report-bin commands
- deepseek-v4/kernel-paths.md, qwen35/model-crate.md: same rerouting
- deepseek-v4/pplx-ep-integration.md: pplx wrapper path corrected to pegainfer-comm/crates/pegainfer-comm-p2p-all-to-all/
- the corrected submodule init command was run and verified locally
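The rerouting described above amounts to checking that no surviving doc still names a deleted file or tool. A minimal sketch of such a check follows. It is not part of this PR: `find_stale_refs` is a hypothetical helper, and the deleted-name list is simply assumed from the commit messages. It uses plain substring matching, so it would also flag references that kernel-op-reports.md keeps on purpose for history.

```rust
/// Hypothetical helper (not from the PR): scan one document's text and
/// return (1-based line number, deleted name) for every line that still
/// mentions a name from `deleted`.
fn find_stale_refs<'a>(text: &str, deleted: &[&'a str]) -> Vec<(usize, &'a str)> {
    let mut hits = Vec::new();
    for (idx, line) in text.lines().enumerate() {
        for &name in deleted {
            // Crude substring match; a real check would likely parse links
            // and allow-list intentional historical mentions.
            if line.contains(name) {
                hits.push((idx + 1, name));
            }
        }
    }
    hits
}
```

Called with the text of each surviving doc and names such as kernels-crate.md, kv-pressure-hang.md, and qwen3_kernel_snapshot, an empty result would suggest there are no leftover references. model-crate.md is deliberately left out of that list, because qwen35/model-crate.md still exists and a bare substring match would flag it.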
Code Review
This pull request performs a documentation and dead-code cleanup for the Qwen3-4B model. It consolidates Qwen3-4B crate documentation into a single layout file, updates the tensor parallelism design document to reflect the implemented state, and extracts KV pressure lessons into a dedicated lesson file. Additionally, it removes unused dead code, including the probe_model function, the ModelInfo struct, and unused constants in the Qwen3-4B crate. There are no review comments, so no further feedback is provided.
Closes #248.
- Drop the unused HIDDEN_SIZE/INTERMEDIATE_SIZE consts, probe_model(), and the now-orphaned ModelInfo
- Collapse model-crate.md + kernels-crate.md into a slim crate-layout.md; rewrite tp-design.md around the implemented TP runtime; lift the #85 (bug: QPS=2 vLLM bench can hang Qwen3-4B serving under KV cache pressure) KV admission lessons into lessons/kv-full-lifetime-reservation.md; update index.md and surviving doc cross-references

Draft pending a CUDA-host clippy/test run (pegainfer-qwen3-4b can't compile locally without nvcc). CPU-reachable crates (pegainfer-engine, pegainfer-vllm-frontend) check clean.