@wenscarl (Collaborator) commented Nov 6, 2025

Motivation

This PR upstreams the new trtllm_mnnvl_fused_allreduce_add_rmsnorm kernel. It depends on flashinfer-ai/flashinfer#2118.

This is expected to improve multi-GPU decode throughput on NVL systems.
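For readers unfamiliar with this kind of fused op, the sketch below shows what an all-reduce + residual add + RMSNorm sequence conventionally computes, written as unfused pure Python. It is an assumption based on the kernel's name, not the FlashInfer implementation or its API: the function name `reference_allreduce_add_rmsnorm`, its parameters, and the default `eps` are hypothetical, and the all-reduce is simulated by summing per-rank lists in one process.

```python
import math


def reference_allreduce_add_rmsnorm(partials, residual, weight, eps=1e-6):
    """Unfused reference for one token (hypothetical helper, not a real API).

    partials: one list of floats per rank (each rank's partial output).
    residual: residual stream values for the same token.
    weight:   RMSNorm scale (gamma), one value per hidden element.
    Returns (normed, new_residual).
    """
    hidden = len(residual)

    # Step 1: all-reduce (sum) across ranks, simulated in-process.
    reduced = [sum(p[i] for p in partials) for i in range(hidden)]

    # Step 2: add the residual; this sum becomes the next residual.
    new_residual = [reduced[i] + residual[i] for i in range(hidden)]

    # Step 3: RMSNorm over the hidden dimension, then scale by weight.
    mean_sq = sum(v * v for v in new_residual) / hidden
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    normed = [new_residual[i] * inv_rms * weight[i] for i in range(hidden)]

    return normed, new_residual
```

A fused kernel would typically perform these three steps in a single launch so the reduced activations are not written to and re-read from global memory between steps; the sketch only pins down the arithmetic under the assumptions above.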

Modifications

Accuracy Tests

Benchmarking and Profiling

Checklist

@github-actions bot added the documentation and quant labels on Nov 6, 2025
@Fridge003 added the high priority label and removed the documentation label on Nov 6, 2025
@Fridge003 (Collaborator)

@wenscarl Is this PR ready for review?

@wenscarl (Collaborator, Author)

> @wenscarl Is this PR ready for review?

There is still an issue with the kernel in flashinfer.

@github-actions bot added the documentation label on Nov 19, 2025
@anurlybayev added the Grace Blackwell label and removed the blackwell (SM100/SM120) label on Dec 11, 2025

Labels

documentation, Grace Blackwell, high priority, nvidia, quant
