[None][feat] AutoDeploy: Perf improvement for mamba layers #8991
Signed-off-by: Chenghao Zhang <[email protected]>
📝 Walkthrough
Both files contain decode-phase optimizations for the Mamba model. The CUDA backend simplifies the decode index calculation by replacing index-based copying with direct slicing using offsets. The Triton backend removes the redundant dt_pre computation and instead passes dt_hp directly to selective_state_update with softplus enabled.
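The two changes summarized in the walkthrough can be illustrated with a small pure-Python sketch. This is not the code from the PR: `gather_decode_rows`, `slice_decode_rows`, `toy_state_update`, and `dt_raw` are hypothetical names, the scalar recurrence is a simplified stand-in for a selective state update, and the sketch assumes decode rows sit contiguously after the prefill rows and that the kernel can apply the bias and softplus to `dt` itself.

```python
import math


def softplus(x: float) -> float:
    # Numerically stable softplus: log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def gather_decode_rows(rows, num_prefill_tokens, num_decode):
    # Index-based copy: build an explicit index list, then gather row by row.
    idx = [num_prefill_tokens + i for i in range(num_decode)]
    return [rows[i] for i in idx]


def slice_decode_rows(rows, num_prefill_tokens, num_decode):
    # Direct slicing by offset (assumes decode rows are contiguous).
    return rows[num_prefill_tokens:num_prefill_tokens + num_decode]


def toy_state_update(state, x, dt, A, B, dt_bias=0.0, dt_softplus=False):
    # Hypothetical scalar stand-in for a selective state update step.
    dt = dt + dt_bias
    if dt_softplus:
        dt = softplus(dt)
    return state * math.exp(dt * A) + dt * B * x


# Slicing and gathering select the same decode rows.
rows = [[float(i), float(i) + 0.5] for i in range(6)]
gathered = gather_decode_rows(rows, num_prefill_tokens=4, num_decode=2)
sliced = slice_decode_rows(rows, num_prefill_tokens=4, num_decode=2)

# Precomputing softplus(dt + bias) outside the update versus letting the
# update apply it internally yields the same new state.
dt_raw, bias = -0.3, 0.1
dt_pre = softplus(dt_raw + bias)
state_precomputed = toy_state_update(1.0, 2.0, dt_pre, A=-1.0, B=0.5)
state_fused = toy_state_update(
    1.0, 2.0, dt_raw, A=-1.0, B=0.5, dt_bias=bias, dt_softplus=True
)
```

In this toy setting both variants agree, so the change only affects how much work happens outside the kernel: slicing avoids building an index list, and the fused path avoids a separate elementwise pass over `dt`.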
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20–25 minutes
/bot run
PR_Github #23878 [ run ] triggered by Bot. Commit:
PR_Github #23878 [ run ] completed with state
Signed-off-by: Chenghao Zhang <[email protected]>
/bot run
PR_Github #23903 [ run ] triggered by Bot. Commit:
PR_Github #23903 [ run ] completed with state
/bot run
PR_Github #23905 [ run ] triggered by Bot. Commit:
PR_Github #23905 [ run ] completed with state
Signed-off-by: Suyog Gupta <[email protected]>
/bot run
PR_Github #23907 [ run ] triggered by Bot. Commit:
PR_Github #23907 [ run ] completed with state
Signed-off-by: Suyog Gupta <[email protected]>
/bot run
PR_Github #23908 [ run ] triggered by Bot. Commit:
PR_Github #23908 [ run ] completed with state
PR_Github #23914 [ run ] triggered by Bot. Commit:
PR_Github #23914 [ run ] completed with state
/bot run
PR_Github #23916 [ run ] triggered by Bot. Commit:
PR_Github #23916 [ run ] completed with state
/bot run
PR_Github #23930 [ run ] triggered by Bot. Commit:
PR_Github #23930 [ run ] completed with state
tensorrt_llm/_torch/auto_deploy/custom_ops/mamba/cuda_backend_causal_conv.py
Outdated
tensorrt_llm/_torch/auto_deploy/custom_ops/mamba/torch_backend_causal_conv.py
Outdated
/bot run
PR_Github #24039 [ run ] triggered by Bot. Commit:
PR_Github #24039 [ run ] completed with state
Signed-off-by: Chenghao Zhang <[email protected]>
/bot run
PR_Github #24044 [ run ] triggered by Bot. Commit:
PR_Github #24044 [ run ] completed with state
/bot run
PR_Github #24060 [ run ] triggered by Bot. Commit:
PR_Github #24060 [ run ] completed with state
/bot run
PR_Github #24105 [ run ] triggered by Bot. Commit:
PR_Github #24105 [ run ] completed with state
Signed-off-by: Chenghao Zhang <[email protected]> Signed-off-by: Suyog Gupta <[email protected]> Co-authored-by: Suyog Gupta <[email protected]>
Summary by CodeRabbit