Conversation

@nvchenghaoz
Collaborator

@nvchenghaoz nvchenghaoz commented Nov 7, 2025

Summary by CodeRabbit

  • Bug Fixes
    • Fixed decoding phase calculations in Mamba model operations for improved correctness during inference.

Signed-off-by: Chenghao Zhang <[email protected]>
@coderabbitai
Contributor

coderabbitai bot commented Nov 7, 2025

📝 Walkthrough

This PR changes the decode path of the Mamba model in two files. The CUDA backend simplifies the decode copy: instead of computing decode indices and copying by index, it writes into a slice bounded by offsets. The Triton backend drops the redundant dt_pre computation and passes dt_hp directly to selective_state_update with softplus enabled.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Mamba CUDA backend decode optimization: tensorrt_llm/_torch/auto_deploy/custom_ops/mamba/cuda_backend_causal_conv.py | Replaces decoding index calculation and in-place index_copy_ operation with direct sliced copy_. Uses total_prefill_tokens and num_decode offsets for explicit slice bounds, ensuring dtype consistency via to(). |
| Mamba Triton backend dt computation: tensorrt_llm/_torch/auto_deploy/custom_ops/mamba/triton_backend_mamba.py | Removes dt_pre computation (softplus and clipping) in decode path. Passes dt_hp directly to selective_state_update with dt_bias_hp as bias and dt_softplus enabled, replacing previous zero-bias non-softplus path. |
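
The CUDA-backend row can be pictured with a small PyTorch sketch. It assumes, as the summary implies, that the flattened token buffer holds all prefill tokens first and the decode tokens contiguously after them. The tensors `y_index`, `y_slice`, `y_decode` and the sizes are hypothetical stand-ins for illustration, not the code in cuda_backend_causal_conv.py.

```python
import torch

# Hypothetical sizes: 5 prefill tokens followed by 3 decode tokens, 4 channels.
total_prefill_tokens, num_decode, channels = 5, 3, 4

# Two identical flattened output buffers, plus a decode result in another dtype.
y_index = torch.zeros(total_prefill_tokens + num_decode, channels, dtype=torch.float16)
y_slice = torch.zeros_like(y_index)
y_decode = torch.randn(num_decode, channels, dtype=torch.float32)

# Index-based variant: build the decode row indices, then write with index_copy_.
decode_idx = torch.arange(total_prefill_tokens, total_prefill_tokens + num_decode)
y_index.index_copy_(0, decode_idx, y_decode.to(y_index.dtype))

# Sliced variant: the decode rows are contiguous, so one copy_ into a slice suffices.
y_slice[total_prefill_tokens : total_prefill_tokens + num_decode].copy_(
    y_decode.to(y_slice.dtype)
)

# Both variants fill the same rows with the same values.
same_result = torch.equal(y_index, y_slice)
```

The sliced form needs no index tensor, which is presumably where the saving comes from; that is an inference from the summary, not a measured result.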

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20–25 minutes

  • Triton backend changes may require verification that dt_softplus parameter produces numerically equivalent results and doesn't affect model accuracy
  • CUDA backend dtype handling should be verified to ensure the explicit .to(y_flat.dtype) conversion doesn't introduce unexpected precision changes
  • Both files involve low-level GPU kernels where subtle logic changes could have significant performance or numerical implications
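
The first point can be checked on scalars before touching a GPU. The sketch below is framework-free and rests on labeled assumptions: that the old dt_pre was computed as a clipped softplus of dt plus bias, and that the kernel's softplus uses the common shortcut of returning the input unchanged above a threshold of 20. `update_kernel`, `old_path`, and `new_path` are hypothetical names, not functions from triton_backend_mamba.py.

```python
import math

def softplus(x: float, threshold: float = 20.0) -> float:
    """Softplus with the usual large-input shortcut (assumed threshold of 20)."""
    return x if x > threshold else math.log1p(math.exp(x))

def update_kernel(dt: float, dt_bias: float, dt_softplus: bool) -> float:
    """Hypothetical stand-in for the dt handling inside a state-update kernel."""
    dt = dt + dt_bias
    return softplus(dt) if dt_softplus else dt

def old_path(dt: float, dt_bias: float, lo: float, hi: float) -> float:
    # Assumed old flow: bias, softplus and clip up front; kernel gets zero bias, no softplus.
    dt_pre = min(max(softplus(dt + dt_bias), lo), hi)
    return update_kernel(dt_pre, 0.0, dt_softplus=False)

def new_path(dt: float, dt_bias: float) -> float:
    # New flow: the kernel applies bias and softplus itself; nothing is clipped beforehand.
    return update_kernel(dt, dt_bias, dt_softplus=True)

samples = [(-3.0, 0.5), (0.0, 0.0), (2.5, -1.0), (25.0, 1.0)]

# With a clip range that never binds, the two paths agree.
agree_unclipped = all(
    math.isclose(old_path(dt, b, 0.0, math.inf), new_path(dt, b)) for dt, b in samples
)

# With a binding upper limit, at least one sample differs.
differs_when_clipped = any(
    not math.isclose(old_path(dt, b, 0.0, 1.0), new_path(dt, b)) for dt, b in samples
)
```

Under these assumptions the two paths match only when the removed clipping was a no-op, so a reviewer would want to confirm the model's configured dt limits before relying on equivalence.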

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Description check | ⚠️ Warning | The PR description is incomplete. Only '@coderabbitai summary' was provided without any actual description, test coverage, or checklist completion. | Complete the PR description with sections explaining the issue/solution, test coverage, and completion of the PR checklist as specified in the template. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Title check | ✅ Passed | The title clearly and directly summarizes the main change: a performance improvement for mamba layers in the AutoDeploy module, which matches the file modifications and optimization changes described in the raw summary. |

@nvchenghaoz nvchenghaoz changed the title from "[None][Feat] AutoDeploy: Perf improvement for mamba layers." to "[None][feat] AutoDeploy: Perf improvement for mamba layers." Nov 7, 2025
@nvchenghaoz nvchenghaoz changed the title from "[None][feat] AutoDeploy: Perf improvement for mamba layers." to "[None][feat] AutoDeploy: Perf improvement for mamba layers" Nov 7, 2025
@nvchenghaoz
Collaborator Author

/bot run

@github-project-automation github-project-automation bot moved this from Backlog to In review in AutoDeploy Board Nov 7, 2025
@tensorrt-cicd
Collaborator

PR_Github #23878 [ run ] triggered by Bot. Commit: 45fbb9d

@tensorrt-cicd
Collaborator

PR_Github #23878 [ run ] completed with state SUCCESS. Commit: 45fbb9d
/LLM/main/L0_MergeRequest_PR pipeline #17975 completed with status: 'FAILURE'

Signed-off-by: Chenghao Zhang <[email protected]>
@suyoggupta
Collaborator

/bot run

@tensorrt-cicd
Collaborator

PR_Github #23903 [ run ] triggered by Bot. Commit: 76530a4

@tensorrt-cicd
Collaborator

PR_Github #23903 [ run ] completed with state SUCCESS. Commit: 76530a4
/LLM/main/L0_MergeRequest_PR pipeline #17995 completed with status: 'FAILURE'

@suyoggupta
Collaborator

/bot run

@tensorrt-cicd
Collaborator

PR_Github #23905 [ run ] triggered by Bot. Commit: c63abe0

@tensorrt-cicd
Collaborator

PR_Github #23905 [ run ] completed with state SUCCESS. Commit: c63abe0
/LLM/main/L0_MergeRequest_PR pipeline #17997 completed with status: 'FAILURE'

Signed-off-by: Suyog Gupta <[email protected]>
@suyoggupta
Collaborator

/bot run

@tensorrt-cicd
Collaborator

PR_Github #23907 [ run ] triggered by Bot. Commit: 8eb0c25

@tensorrt-cicd
Collaborator

PR_Github #23907 [ run ] completed with state SUCCESS. Commit: 8eb0c25
/LLM/main/L0_MergeRequest_PR pipeline #17999 completed with status: 'FAILURE'

Signed-off-by: Suyog Gupta <[email protected]>
@suyoggupta
Collaborator

/bot run

@tensorrt-cicd
Collaborator

PR_Github #23908 [ run ] triggered by Bot. Commit: 324181a

@tensorrt-cicd
Collaborator

PR_Github #23908 [ run ] completed with state SUCCESS. Commit: 324181a
/LLM/main/L0_MergeRequest_PR pipeline #18000 completed with status: 'FAILURE'

@tensorrt-cicd
Collaborator

PR_Github #23914 [ run ] triggered by Bot. Commit: 324181a

@tensorrt-cicd
Collaborator

PR_Github #23914 [ run ] completed with state SUCCESS. Commit: 324181a
/LLM/main/L0_MergeRequest_PR pipeline #18004 completed with status: 'FAILURE'

@suyoggupta
Collaborator

/bot run

@tensorrt-cicd
Collaborator

PR_Github #23916 [ run ] triggered by Bot. Commit: 324181a

@tensorrt-cicd
Collaborator

PR_Github #23916 [ run ] completed with state SUCCESS. Commit: 324181a
/LLM/main/L0_MergeRequest_PR pipeline #18006 completed with status: 'FAILURE'

@suyoggupta
Collaborator

/bot run

@tensorrt-cicd
Collaborator

PR_Github #23930 [ run ] triggered by Bot. Commit: 324181a

@tensorrt-cicd
Collaborator

PR_Github #23930 [ run ] completed with state SUCCESS. Commit: 324181a
/LLM/main/L0_MergeRequest_PR pipeline #18019 completed with status: 'FAILURE'

@nvchenghaoz
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #24039 [ run ] triggered by Bot. Commit: 324181a

@tensorrt-cicd
Collaborator

PR_Github #24039 [ run ] completed with state SUCCESS. Commit: 324181a
/LLM/main/L0_MergeRequest_PR pipeline #18113 completed with status: 'FAILURE'

@nvchenghaoz
Collaborator Author

/bot run

@nvchenghaoz nvchenghaoz self-assigned this Nov 10, 2025
@tensorrt-cicd
Collaborator

PR_Github #24044 [ run ] triggered by Bot. Commit: eb7c92b

@tensorrt-cicd
Collaborator

PR_Github #24044 [ run ] completed with state SUCCESS. Commit: eb7c92b
/LLM/main/L0_MergeRequest_PR pipeline #18118 completed with status: 'FAILURE'

@nvchenghaoz
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #24060 [ run ] triggered by Bot. Commit: eb7c92b

@tensorrt-cicd
Collaborator

PR_Github #24060 [ run ] completed with state SUCCESS. Commit: eb7c92b
/LLM/main/L0_MergeRequest_PR pipeline #18132 completed with status: 'FAILURE'

@nvchenghaoz
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #24105 [ run ] triggered by Bot. Commit: eb7c92b

@tensorrt-cicd
Collaborator

PR_Github #24105 [ run ] completed with state SUCCESS. Commit: eb7c92b
/LLM/main/L0_MergeRequest_PR pipeline #18168 completed with status: 'SUCCESS'

@nvchenghaoz nvchenghaoz merged commit ec9cf71 into NVIDIA:main Nov 11, 2025
5 checks passed
@github-project-automation github-project-automation bot moved this from In review to Done in AutoDeploy Board Nov 11, 2025
suyoggupta added a commit to nv-auto-deploy/TensorRT-LLM that referenced this pull request Nov 12, 2025
Signed-off-by: Chenghao Zhang <[email protected]>
Signed-off-by: Suyog Gupta <[email protected]>
Co-authored-by: Suyog Gupta <[email protected]>

Labels

None yet

Projects

Status: Done

4 participants