@Wanli-Jiang
Collaborator

Features:

This PR stacks the following PRs on top of TRTLLM main.

Command to launch trtllm-serve for benchmarking:

trtllm-serve <ckpt_folder_path> \
--host 0.0.0.0 \
--port 8000 \
--backend _autodeploy \
--trust_remote_code \
--extra_llm_api_options examples/auto_deploy/nano_v3_bench.yaml
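
Once the server is up, a quick smoke test can confirm it is answering before starting a benchmark run. The following is a minimal sketch, assuming the server exposes the OpenAI-compatible `/v1/completions` route on the port given above and is reachable on `localhost`. The helper names `build_completion_request` and `send` are hypothetical, and the `model` value is a placeholder that has to match whatever name the server reports for the loaded checkpoint.

```python
import json
import urllib.request

# Assumption: the server launched above is reachable locally on --port 8000.
BASE_URL = "http://localhost:8000"


def build_completion_request(prompt, model, max_tokens=32, base_url=BASE_URL):
    """Build (but do not send) a POST request for the /v1/completions route."""
    payload = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=60):
    """Send a prepared request and return the decoded JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `send(build_completion_request("Hello, my name is", model="<model_name>"))` should return a JSON object containing the generated text. Building the request separately from sending it keeps the payload easy to inspect without a live server.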

2ez4bz and others added 6 commits November 10, 2025 19:07
Signed-off-by: Lucas Liebenwein <[email protected]>
Signed-off-by: Lucas Liebenwein <[email protected]>
Signed-off-by: Chenghao Zhang <[email protected]>

Add conv act fusion

Signed-off-by: Chenghao Zhang <[email protected]>

fix unit tests

Signed-off-by: Suyog Gupta <[email protected]>

fix tests

Signed-off-by: Suyog Gupta <[email protected]>

Address reviewer's comments

Signed-off-by: Chenghao Zhang <[email protected]>
Signed-off-by: Suyog Gupta <[email protected]>

fix typo

Signed-off-by: Suyog Gupta <[email protected]>
Signed-off-by: Neta Zmora <[email protected]>

Fixes and UT

Signed-off-by: Neta Zmora <[email protected]>

Use trtllm moe for relu2 mlp case

Signed-off-by: Chenghao Zhang <[email protected]>

Fix the runGemmProfile

Signed-off-by: Chenghao Zhang <[email protected]>

Replace the FP8 fused MoE backend

Before: torch.ops.auto_deploy.triton_quant_fp8_moe
After: torch.ops.auto_deploy.trtllm_quant_fp8moe_fused
Signed-off-by: Neta Zmora <[email protected]>

Code refactoring

Signed-off-by: Neta Zmora <[email protected]>

syntax error fixes

Signed-off-by: Neta Zmora <[email protected]>

remove dead code

Signed-off-by: Neta Zmora <[email protected]>

fix moe operator function name

Signed-off-by: Neta Zmora <[email protected]>

Add skips if not hopper+

Signed-off-by: Neta Zmora <[email protected]>

remove unused code

Signed-off-by: Neta Zmora <[email protected]>
@Wanli-Jiang Wanli-Jiang force-pushed the user/williamj/nano-v3-fp8-stack branch from 49a0c40 to 7664abb Compare November 11, 2025 06:14
@lucaslie
Member

FYI, just merged #8812, so you can drop that PR from the list next time you rebase.
