[None][feat] Nano-v3 stack PRs v2 #9062
Draft
Features:
This PR stacks the following PRs on top of TRTLLM main:
- [None][fixes] Add tool call parsing fixes and Qwen3 coder parser (#8817): Qwen3 coder tool parser
- [#8763][feature] AutoDeploy: configurable dtype for caching (#8812): configurable Mamba cache dtype (bf16 or fp32)
- [None][feat] AutoDeploy: Perf improvement for mamba layers (#8991): fuses SiLU into the causal conv1d
- fix prefill (nv-auto-deploy/TensorRT-LLM#156): prefill fix
- [#8732][feat] Update TRTLLM Cutlass MoE kernels with ReLU2 (#9011): TRTLLM Cutlass MoE kernel with ReLU2 activation
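The Mamba perf change (#8991) fuses SiLU into the causal conv1d. The sketch below illustrates why that fusion is safe, assuming a single-channel depthwise causal convolution with kernel width K and zero left-padding. It is plain Python, not the CUDA kernel from the PR, and the helper names (`conv_at`, `fused_conv1d_silu`, etc.) are hypothetical.

```python
import math


def silu(x: float) -> float:
    # SiLU(x) = x * sigmoid(x), split by sign to avoid overflow in exp()
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)
    return x * e / (1.0 + e)


def conv_at(x, w, b, t):
    # Causal tap sum for output t: reads only x[t-K+1 .. t];
    # taps that fall before index 0 act as zero padding.
    K = len(w)
    acc = b
    for k in range(K):
        idx = t - (K - 1) + k
        if idx >= 0:
            acc += w[k] * x[idx]
    return acc


def causal_conv1d(x, w, b=0.0):
    return [conv_at(x, w, b, t) for t in range(len(x))]


def conv1d_then_silu(x, w, b=0.0):
    # Unfused: materialize the conv output, then a second pass applies SiLU.
    y = causal_conv1d(x, w, b)
    return [silu(v) for v in y]


def fused_conv1d_silu(x, w, b=0.0):
    # Fused: SiLU is applied to each accumulator before it is written out.
    return [silu(conv_at(x, w, b, t)) for t in range(len(x))]
```

Because SiLU is elementwise, applying it to the accumulator produces the same values as the two-pass version while skipping one write and read of the intermediate tensor; that memory-traffic saving is the presumed motivation, not a measured result.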
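The Cutlass MoE update (#9011) adds the ReLU2 activation, i.e. squared ReLU, relu2(x) = max(x, 0)^2. The following is a minimal reference sketch of an MoE layer using it, assuming a non-gated expert MLP (up projection, ReLU2, down projection) and softmax top-k routing with renormalized weights. These routing and expert-layout details are assumptions for illustration, not a description of the Nano-v3 model or the Cutlass kernel, and all function names are hypothetical.

```python
import math


def relu2(x: float) -> float:
    # Squared ReLU: max(x, 0) ** 2
    r = max(x, 0.0)
    return r * r


def matvec(m, v):
    # m is a list of rows; returns m @ v
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def expert_mlp(x, w_up, w_down):
    # Assumed non-gated expert: down(relu2(up(x)))
    h = [relu2(v) for v in matvec(w_up, x)]
    return matvec(w_down, h)


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def moe_forward(x, router_w, experts, top_k=2):
    # Router logits -> top-k experts -> weighted sum of their outputs,
    # with the selected probabilities renormalized to sum to 1.
    probs = softmax(matvec(router_w, x))
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    out = [0.0] * len(x)  # assumes expert output width == model width
    for i in chosen:
        y = expert_mlp(x, *experts[i])
        for j, v in enumerate(y):
            out[j] += probs[i] / norm * v
    return out
```

A fused kernel would apply ReLU2 between the two grouped GEMMs rather than materializing `h` per expert, but the math it must reproduce is the same as above.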
Adds nano_v3_bench.yaml and nano_v3_accuracy.yaml for benchmarking and accuracy checking.
Command to launch trtllm-serve for benchmarking: