[WIP] Fix RMS and test MoE for batch invariance [4/n] #26136

bwasti · 2025-10-02T21:23:24Z

This change gets Qwen/Qwen3-30B-A3B working on 1 GPU at bitwise parity across batch sizes.
It also tightens the e2e testing logic (makes it stricter). Please only look at the last commit (this PR is rebased on #25769).

Purpose

Add an RMSNorm kernel directly and override it in Python. The change to the csrc version doesn't provide full coverage, but it does increase invariance.
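For readers unfamiliar with why an RMSNorm kernel affects batch invariance, here is a minimal pure-Python sketch of the property being targeted. It is not the Triton kernel from this PR; the function names (`rms_norm_row`, `rms_norm_batch`, `bits`) and the `eps` default are illustrative assumptions. The idea is that each row is reduced in one fixed order, with no reduction strategy that depends on how many rows share the batch, so a row's output has the same bits at batch size 1 and batch size N.

```python
import math
import struct


def rms_norm_row(x, weight, eps=1e-6):
    """RMSNorm for a single row, accumulating strictly left to right."""
    acc = 0.0
    for v in x:  # fixed reduction order, independent of batch size
        acc += v * v
    inv_rms = 1.0 / math.sqrt(acc / len(x) + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def rms_norm_batch(batch, weight, eps=1e-6):
    """Apply the row routine to every row; nothing is reduced across rows."""
    return [rms_norm_row(row, weight, eps) for row in batch]


def bits(values):
    """Raw IEEE-754 bytes, so equality here means bitwise equality."""
    return struct.pack(f"{len(values)}d", *values)
```

Because floating-point addition is not associative, a kernel that changes its tiling or split strategy with the batch shape can produce different low-order bits for the same row. Keeping the per-row reduction order fixed is one way to avoid that.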

Test Plan

VLLM_TEST_MODEL="Qwen/Qwen3-30B-A3B" VLLM_ATTENTION_BACKEND=FLASHINFER VLLM_KERNEL_OVERRIDE_BATCH_INVARIANT=1 HF_HUB_DISABLE_XET=1 pytest -s -v tests/v1/generation/test_batch_invariance.py -k test_logprobs_bitwise_batch_invariance_bs1_vs_bsN
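The test file itself is not shown in this conversation, so the following is only a rough sketch of what a bs1-vs-bsN bitwise check could look like, inferred from the test name. `score_batch` is a hypothetical callable standing in for the model (a list of prompts in, one list of logprobs per prompt out), and `check_bs1_vs_bsn` and its parameters are likewise assumptions for illustration.

```python
import random
import struct


def float_bits(x):
    """Return the 64-bit pattern of a Python float as an integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def check_bs1_vs_bsn(score_batch, prompts, trials=5, max_batch=8, seed=0):
    """Compare each prompt's scores at batch size 1 against random larger batches.

    `score_batch` is hypothetical: list[str] -> list[list[float]].
    Returns the (prompt, batch_size) pairs whose bits differed.
    """
    rng = random.Random(seed)
    mismatches = []
    for target in prompts:
        baseline = [float_bits(v) for v in score_batch([target])[0]]
        for _ in range(trials):
            n = rng.randint(2, max_batch)  # randomized batch size
            batch = [rng.choice(prompts) for _ in range(n)]
            pos = rng.randrange(n)  # randomized position of the target
            batch[pos] = target
            got = [float_bits(v) for v in score_batch(batch)[pos]]
            if got != baseline:
                mismatches.append((target, n))
    return mismatches
```

Comparing bit patterns rather than using a tolerance is what makes the check strict: any drift in the last bit counts as a failure.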

Test Result

Pass

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
(Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Bram Wasti <[email protected]>

gemini-code-assist

Code Review

This pull request introduces significant improvements for batch invariance, particularly for RMSNorm and MoE layers, and strengthens the e2e testing framework. The addition of a batch-invariant Triton kernel for RMSNorm and the corresponding Python overrides are key changes. The updates to the testing logic, making it more rigorous by using randomized batch sizes and prompts, are a great enhancement. The modifications for the FlashInfer backend to ensure deterministic behavior under batch invariance are also well-implemented. However, I've identified a critical issue in the fused_add_rms_norm implementation and a high-severity issue in the MoE softmax kernel that could undermine the goal of batch invariance.

vllm/model_executor/layers/layernorm.py

csrc/moe/topk_softmax_kernels.cu

bwasti · 2025-10-02T21:30:18Z

btw, I'm counting on some RMS test to run here. this shouldn't land until that passes

Signed-off-by: Bram Wasti <[email protected]>

yewentao256

> btw, I'm counting on some RMS test to run here. this shouldn't land until that passes

Sounds great, please change the title once it is ready

mergify · 2025-10-04T04:43:28Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @bwasti.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Add batch invariant kernel override for flashinfer

64930d4

Signed-off-by: Bram Wasti <[email protected]>

bwasti requested a review from mgoin as a code owner October 2, 2025 21:23

mergify bot added the v1 label Oct 2, 2025

bwasti force-pushed the det_moe branch from c50f2f7 to abe3db6 Compare October 2, 2025 21:24

gemini-code-assist bot reviewed Oct 2, 2025

vllm/model_executor/layers/layernorm.py (outdated)

csrc/moe/topk_softmax_kernels.cu

bwasti force-pushed the det_moe branch 3 times, most recently from f867388 to 648ce25 Compare October 2, 2025 22:40

moe working

4b6fa50

Signed-off-by: Bram Wasti <[email protected]>

bwasti force-pushed the det_moe branch from 648ce25 to 4b6fa50 Compare October 2, 2025 22:48

yewentao256 added this to Batch-invariant Inference Oct 3, 2025

yewentao256 moved this to In Progress in Batch-invariant Inference Oct 3, 2025

yewentao256 reviewed Oct 3, 2025

yewentao256 changed the title ~~Fix RMS and test MoE for batch invariance [4/n]~~ [WIP] Fix RMS and test MoE for batch invariance [4/n] Oct 3, 2025

mergify bot added the needs-rebase label Oct 4, 2025
