llama/rope: gate fp64 hf_precompute_freqs_cis on cos/sin scaling #19308
JacobSzwejbka merged 1 commit into pytorch:main
a79521b ("Add LongRoPE support and fp64 RoPE precompute for Phi-3 / Phi-4 family") unconditionally moved hf_precompute_freqs_cis to fp64 cos/sin precompute with a final cast to fp32. That works for the Phi-4 device validation that motivated the commit, but it broke test_static_attention.py::test_within_transformer on the Linux unittest runners (pull, pull-editable, trunk-release have been 100% red since the commit landed). The test compares mha_transformer (built with use_hf_rope=False, taking the pure-fp32 precompute_freqs_cis path) against static_transformer (built with use_hf_rope=True, taking hf_precompute_freqs_cis) at rtol=1e-3, with shared weights. Before a79521b, both paths produced bit-identical fp32 cos/sin tables (verified empirically: 0/192 entries differed). After the commit, HF cos/sin diverge from non-HF by ~1 ULP in 38/192 entries; that drift compounds across 4 transformer layers and tips past rtol=1e-3 on the CI runners (Python 3.10, source-built torch). Local Python 3.12 stayed just barely within tolerance, which is why review missed it. Gate the fp64 precompute on the property the original commit was actually protecting: a non-trivial cos/sin scale being applied. That is either LongRoPE active (Phi-3 / Phi-4 set short_factor and long_factor via config) or an explicit attention_factor != 1.0 passed through. Both cases preserve fp64; vanilla HF RoPE (Llama family, the test config) goes back to fp32 throughout and re-establishes bit-identical agreement with the non-HF path. Authored with Claude Code.
Summary
a79521b ("Add LongRoPE support and fp64 RoPE precompute for Phi-3 / Phi-4 family") unconditionally moved hf_precompute_freqs_cis to fp64 cos/sin precompute with a final cast to fp32. That works for the Phi-4 device validation that motivated the commit, but it broke test_static_attention.py::test_within_transformer on the Linux unittest runners (pull, pull-editable, trunk-release have been 100% red since the commit landed).
The test compares mha_transformer (built with use_hf_rope=False, taking the pure-fp32 precompute_freqs_cis path) against static_transformer (built with use_hf_rope=True, taking hf_precompute_freqs_cis) at rtol=1e-3, with shared weights. Before a79521b, both paths produced bit-identical fp32 cos/sin tables (verified empirically: 0/192 entries differed). After the commit, the HF cos/sin tables diverge from the non-HF ones by ~1 ULP in 38/192 entries; that drift compounds across 4 transformer layers and tips past rtol=1e-3 on the CI runners (Python 3.10, source-built torch). A local Python 3.12 run stayed just barely within tolerance, which is why review missed it.
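The divergence is straightforward to observe directly. The hypothetical repro below builds a cos table both ways (pure fp32 vs. fp64 cast to fp32) and counts entries that are not bit-identical; the dimensions are illustrative, and the exact mismatch count depends on platform and build.

```python
import torch

def count_ulp_divergent_entries(dim: int = 128, end: int = 3, theta: float = 10000.0) -> int:
    # Hypothetical repro of the table comparison described above; dim/end
    # are illustrative (128 and 3 give a 3 x 64 = 192-entry table).
    idx = torch.arange(0, dim, 2)
    inv32 = 1.0 / (theta ** (idx.to(torch.float32) / dim))
    inv64 = 1.0 / (theta ** (idx.to(torch.float64) / dim))
    cos32 = torch.outer(torch.arange(end, dtype=torch.float32), inv32).cos()
    cos64 = torch.outer(torch.arange(end, dtype=torch.float64), inv64).cos().to(torch.float32)
    # Bitwise compare: any mismatch is at least 1 ULP of drift after the cast.
    return (cos32.view(torch.int32) != cos64.view(torch.int32)).sum().item()

print(count_ulp_divergent_entries())  # nonzero wherever fp64 rounds differently
```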
Gate the fp64 precompute on the property the original commit was actually protecting: a non-trivial cos/sin scale being applied. That is either LongRoPE active (Phi-3 / Phi-4 set short_factor and long_factor via config) or an explicit attention_factor != 1.0 passed through. Both cases preserve fp64; vanilla HF RoPE (Llama family, the test config) goes back to fp32 throughout and re-establishes bit-identical agreement with the non-HF path.
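A minimal sketch of the gate, assuming illustrative parameter names (short_factor, long_factor, attention_factor are placeholders, not necessarily the merged signature):

```python
import torch

def rope_precompute_dtype(short_factor=None, long_factor=None,
                          attention_factor: float = 1.0) -> torch.dtype:
    # Sketch of the gating condition; parameter names are assumptions.
    # fp64 is kept only when a non-trivial cos/sin scale will be applied:
    # LongRoPE factors configured (Phi-3 / Phi-4), or an explicit
    # attention_factor != 1.0. Vanilla HF RoPE stays fp32 throughout,
    # restoring bit-identical agreement with the non-HF path.
    longrope_active = short_factor is not None and long_factor is not None
    if longrope_active or attention_factor != 1.0:
        return torch.float64
    return torch.float32
```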
Authored with Claude Code.
Test plan
CI