Restore ScalelessRMSNorm hand-rolled forward for eager numerical parity by billmguo · Pull Request #19654 · pytorch/executorch

billmguo · 2026-05-18T21:04:42Z

Summary:
D104258950 changed `ScalelessRMSNorm` from a hand-rolled fp32 decomposition to a `torch.nn.RMSNorm` subclass so that QNN and other backends see a proper RMSNorm op for lowering. However, removing the custom `forward` meant that eager execution now uses `torch.nn.RMSNorm`'s fused CUDA kernel, which handles internal precision differently from the hand-rolled `x.float() * rsqrt(mean(x^2) + eps)` decomposition used by the rlformers reference model.

This caused both `test_llm_backbone_correctness_cuda` and `test_llm_backbone_correctness_decode` to fail:

- **fp32 case**: SNR dropped from `inf` to 67-85 dB (same decoded text, different logits)
- **quantized case**: SNR dropped to 1-35 dB with negative per-step values and divergent decoded text, because the precision difference was amplified by quantization noise
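
For readers unfamiliar with the metric, the following is a minimal sketch of how an SNR in dB between a reference output and a candidate output is commonly computed. The exact formula used by the failing tests is not shown in this PR, so the helper name `snr_db` and the sum-of-squares power definition are assumptions for illustration only.

```python
import math

import torch


def snr_db(reference: torch.Tensor, candidate: torch.Tensor) -> float:
    """Signal-to-noise ratio in dB (hypothetical helper, not the test's own code)."""
    # Accumulate in float64 so the metric itself adds no visible rounding error.
    ref = reference.double().flatten()
    noise = ref - candidate.double().flatten()
    signal_power = ref.pow(2).sum().item()
    noise_power = noise.pow(2).sum().item()
    if noise_power == 0.0:
        # Identical outputs: no error energy at all.
        return math.inf
    if signal_power == 0.0:
        # All-zero reference with a nonzero error.
        return -math.inf
    return 10.0 * math.log10(signal_power / noise_power)
```

Under this definition, `inf` corresponds to outputs that match exactly, and a negative value means the error energy exceeds the energy of the reference signal.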

The fix restores the original hand-rolled `forward` override on `ScalelessRMSNorm` while keeping `torch.nn.RMSNorm` as the base class. A `torch.compiler.is_compiling()` guard ensures that during `torch.export` (for QNN, XNNPACK, or any backend), the fused `torch.nn.RMSNorm` op is used instead, preserving the export-path fix from D104258950.

Differential Revision: D105593738


pytorch-bot · 2026-05-18T21:04:46Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19654

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

Run pull request jobs on OSDC runners in shadow mode

❌ 1 New Failure, 2 Unclassified Failures

As of commit c93f866 with merge base 7c495fa:

NEW FAILURE - The following job has failed:

pull / unittest / macos / macos-job (gh)
export/tests/test_target_recipes.py::TestTargetRecipes::test_mv2_model

UNCLASSIFIED FAILURES - DrCI could not classify the following jobs because the workflow did not run on the merge base. The failures may be pre-existing on trunk or introduced by this PR:

Build Windows Wheels / pytorch/executorch / build-wheel-py3_10-cpu (gh) (this job did not run on the merge base, so DrCI cannot tell whether the failure is pre-existing)
Process completed with exit code 1.
Build Windows Wheels / pytorch/executorch / upload / upload-wheel-py3_10-cpu (gh) (this job did not run on the merge base, so DrCI cannot tell whether the failure is pre-existing)
Unable to download artifact(s): Artifact not found for name: pytorch_executorch__3.10_cpu_x64

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-codesync · 2026-05-18T21:04:50Z

@billmguo has exported this pull request. If you are a Meta employee, you can view the originating Diff in D105593738.

github-actions · 2026-05-18T21:11:43Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

billmguo requested a review from lucylq as a code owner May 18, 2026 21:04

meta-cla Bot added the CLA Signed label May 18, 2026

meta-codesync Bot added fb-exported meta-exported labels May 18, 2026

billmguo wants to merge 1 commit into pytorch:main from billmguo:export-D105593738
