[None][feat] Nano-v3 stack PRs v2 #9062

Wanli-Jiang · 2025-11-11T05:54:20Z

Features:

This PR stacked the following PRs on top of TRTLLM main.

[None][fixes] Add tool call parsing fixes and Qwen3 coder parser #8817 Qwen3 code tool parser
[#8763][feature] AutoDeploy: configurable dtype for caching #8812 mamba cache dtype config (Can be bf16 or fp32)
[None][feat] AutoDeploy: Perf improvement for mamba layers #8991 fuse silu to causal conv1d
fix prefill nv-auto-deploy/TensorRT-LLM#156 fix prefill
[#8732][feat] Update TRTLLM Cutlass MoE kernels with ReLU2 #9011 TRTLLM cutlass MoE kernel
Added nano_v3_bench.yaml and nano_v3_accuracy.yaml for bench and accuracy checking.

CMD to launch trtllm-serve for bench:

trtllm-serve <ckpt_folder_path> \
--host 0.0.0.0 \
--port 8000 \
--backend _autodeploy \
--trust_remote_code \
--extra_llm_api_options examples/auto_deploy/nano_v3_bench.yaml

Signed-off-by: William Zhang <[email protected]>

Signed-off-by: Lucas Liebenwein <[email protected]>

Signed-off-by: Chenghao Zhang <[email protected]> Add conv act fusion Signed-off-by: Chenghao Zhang <[email protected]> fix unit tests Signed-off-by: Suyog Gupta <[email protected]> fix tests Signed-off-by: Suyog Gupta <[email protected]> Address reviewer's comments Signed-off-by: Chenghao Zhang <[email protected]>

Signed-off-by: Suyog Gupta <[email protected]> fix typo Signed-off-by: Suyog Gupta <[email protected]>

Signed-off-by: Neta Zmora <[email protected]> Fixes and UT Signed-off-by: Neta Zmora <[email protected]> Use trtllm moe for relu2 mlp case Signed-off-by: Chenghao Zhang <[email protected]> Fix the runGemmProfile Signed-off-by: Chenghao Zhang <[email protected]> Replace the FP8 fused MoE backend Before: torch.ops.auto_deploy.triton_quant_fp8_moe After: torch.ops.auto_deploy.trtllm_quant_fp8moe_fused Signed-off-by: Neta Zmora <[email protected]> Code refactoring Signed-off-by: Neta Zmora <[email protected]> syntax error fixes Signed-off-by: Neta Zmora <[email protected]> remove dead code Signed-off-by: Neta Zmora <[email protected]> fix moe operator function name Signed-off-by: Neta Zmora <[email protected]> Add skips if not hopper+ Signed-off-by: Neta Zmora <[email protected]> remove unused code Signed-off-by: Neta Zmora <[email protected]>

Signed-off-by: Wanli Jiang <[email protected]>

lucaslie · 2025-11-11T06:29:58Z

fyi, just merged #8812 --> so you can drop the PR from the list next time you rebase

2ez4bz and others added 6 commits November 10, 2025 19:07

[None][fixes] Add tool call parsing fixes and Qwen3 coder parser

72b2505

Signed-off-by: William Zhang <[email protected]>

configurable kvcache/mamba cache

5eb77f9

Signed-off-by: Lucas Liebenwein <[email protected]>

reference config for nano v3

d3ffa69

Signed-off-by: Lucas Liebenwein <[email protected]>

fix prefill

6e8037a

Signed-off-by: Suyog Gupta <[email protected]> fix typo Signed-off-by: Suyog Gupta <[email protected]>

Wanli-Jiang mentioned this pull request Nov 11, 2025

[None][feat] Nano-v3 stack PRs v1 #8941

Closed

1 task

Add specific AD configs for nano-v3

7664abb

Signed-off-by: Wanli Jiang <[email protected]>

Wanli-Jiang force-pushed the user/williamj/nano-v3-fp8-stack branch from 49a0c40 to 7664abb Compare November 11, 2025 06:14

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[None][feat] Nano-v3 stack PRs v2 #9062

[None][feat] Nano-v3 stack PRs v2 #9062

Uh oh!

Wanli-Jiang commented Nov 11, 2025

Uh oh!

lucaslie commented Nov 11, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

[None][feat] Nano-v3 stack PRs v2 #9062

Are you sure you want to change the base?

[None][feat] Nano-v3 stack PRs v2 #9062

Uh oh!

Conversation

Wanli-Jiang commented Nov 11, 2025

Features:

CMD to launch trtllm-serve for bench:

Uh oh!

lucaslie commented Nov 11, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants