Align DeepSeek V4 per-output compare with reference tolerance by zhangqi-chen · Pull Request #277 · hw-native-sys/pypto-lib

zhangqi-chen · 2026-05-14T05:09:51Z

Summary

Switches per-output validation in the DeepSeek V4 kernels from broad torch.allclose(rtol, atol) defaults to ratio_allclose with per-output tolerances matching the upstream reference scheme.

hc_pre: x_mixed atol=1e-4 rtol=1/128; post/comb atol=2.5e-5 rtol=5e-3.
qkv_proj_rope: q/kv atol=1e-4 rtol=1/128; qr INT8 LSB exact (atol=1 rtol=0 max_error_ratio=0); qr_scale atol=2.5e-5 rtol=5e-3. Drops the bespoke int8_lsb_compare helper.
sparse_attn: attn_out atol=1e-4 rtol=1/128 across all three compress_ratio paths (0 / 4 / 128).
attention_swa: x_out atol=1e-4 rtol=1/128 (end-to-end fused kernel).
compressor_ratio4 / compressor_ratio128 / indexer_compressor: kv atol=1e-4 rtol=1/128, kv_state/score_state atol=1e-3 rtol=1e-3, all with max_error_ratio=0 to mirror strict allclose. kv_cache keeps bf16_allclose_or_ulp() (no reference counterpart).
indexer: score atol=1e-4 rtol=1/128 (closest analog to the prolog's weights output); idx_kv_cache and topk_idxs comparators unchanged.

Validation results (device 8, `a2a3`)

All single-stage kernels pass under the new tolerances: hc_pre, qkv_proj_rope, sparse_attn (×3 ratios), indexer, indexer_compressor. The compressor ratio=4/128 kv outputs fall just outside strict allclose (~0.085% and ~0.39% bad points) but well within the 0.5% outlier escape used by attention-class outputs. The end-to-end attention_swa kernel exceeds the single-stage tolerance due to error accumulation across hc_pre → qkv_proj_rope (W8A8) → sparse_attn → hc_post.

Related Issues

Switches per-output validation in the v4 kernels from the broad torch.allclose(rtol, atol) defaults to ratio_allclose with per-output tolerances matching the upstream reference scheme: * hc_pre: x_mixed atol=1e-4 rtol=1/128; post/comb atol=2.5e-5 rtol=5e-3. * qkv_proj_rope: q/kv atol=1e-4 rtol=1/128; qr INT8 LSB exact (atol=1 rtol=0 max_error_ratio=0); qr_scale atol=2.5e-5 rtol=5e-3. Drops the bespoke int8_lsb_compare helper. * sparse_attn: attn_out atol=1e-4 rtol=1/128 across all three compress_ratio paths (0 / 4 / 128). * attention_swa: x_out atol=1e-4 rtol=1/128 (end-to-end fused kernel; see compare_settings_vs_gitcode.md notes on accumulated error). * compressor_ratio4 / compressor_ratio128 / indexer_compressor: kv atol=1e-4 rtol=1/128, kv_state/score_state atol=1e-3 rtol=1e-3, all with max_error_ratio=0 to mirror strict allclose. kv_cache keeps bf16_allclose_or_ulp() which has no reference counterpart. * indexer: score atol=1e-4 rtol=1/128 (closest analog to the prolog's weights output); idx_kv_cache and topk_idxs comparators unchanged. All single-stage kernels pass under the new tolerances. The compressor ratio=4/128 paths fall just outside strict allclose on kv (~0.085% and ~0.39% bad points) but well within the 0.5% outlier escape used by attention-class outputs. The end-to-end attention_swa kernel exceeds the single-stage tolerance due to error accumulation across hc_pre → qkv_proj_rope (W8A8) → sparse_attn → hc_post — see models/deepseek/v4/compare_settings_vs_gitcode.md (local reference, not committed) for the per-stage cross-walk.

coderabbitai · 2026-05-14T05:10:04Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0065403c-c514-4d73-b65b-71ef783feb2a

📥 Commits

Reviewing files that changed from the base of the PR and between 8f7af9a and 7c33367.

📒 Files selected for processing (8)

models/deepseek/v4/attention_swa.py
models/deepseek/v4/compressor_ratio128.py
models/deepseek/v4/compressor_ratio4.py
models/deepseek/v4/hc_pre.py
models/deepseek/v4/indexer.py
models/deepseek/v4/indexer_compressor.py
models/deepseek/v4/qkv_proj_rope.py
models/deepseek/v4/sparse_attn.py

💤 Files with no reviewable changes (1)

models/deepseek/v4/attention_swa.py

📝 Walkthrough

Walkthrough

This PR updates test harnesses across eight Deepseek v4 model files to adopt a new ratio_allclose comparison utility. Seven files add imports and update output validation to use ratio-based tolerances; one file removes redundant comment text. All changes standardize JIT test correctness checking.

Changes

Test Harness Unification with ratio_allclose

Layer / File(s)	Summary
Harness imports for ratio_allclose `models/deepseek/v4/compressor_ratio128.py`, `models/deepseek/v4/compressor_ratio4.py`, `models/deepseek/v4/hc_pre.py`, `models/deepseek/v4/indexer.py`, `models/deepseek/v4/indexer_compressor.py`, `models/deepseek/v4/qkv_proj_rope.py`, `models/deepseek/v4/sparse_attn.py`	All test harnesses import `ratio_allclose` from golden module, providing the foundation for standardized tolerance-based output validation across multiple files.
Multi-output validation for compressor tests `models/deepseek/v4/compressor_ratio128.py`, `models/deepseek/v4/compressor_ratio4.py`, `models/deepseek/v4/indexer_compressor.py`	Compressor and indexer-compressor tests expand `compare_fn` configurations from single `kv_cache` validation to multi-output checking of `kv`, `kv_state`, and `score_state` using `ratio_allclose` with per-tensor tolerance scaling (e.g., `rtol=1.0/128` for `kv`), while retaining `bf16_allclose_or_ulp` for bfloat16 tensor comparisons.
Single/targeted output validation updates `models/deepseek/v4/hc_pre.py`, `models/deepseek/v4/indexer.py`, `models/deepseek/v4/qkv_proj_rope.py`, `models/deepseek/v4/sparse_attn.py`	Four harnesses add or update `compare_fn` configurations: hc_pre validates `x_mixed`, `post`, and `comb`; indexer replaces score validation to use `ratio_allclose(atol=1e-4, rtol=1.0/128)`; qkv_proj_rope standardizes `q`, `kv`, `qr`, and `qr_scale` validation replacing a bespoke `int8_lsb_compare` helper; sparse_attn adds `attn_out` validation with scaled ratio tolerance.
Comment cleanup `models/deepseek/v4/attention_swa.py`	Removes an explanatory comment block preceding `RunConfig` tolerance configuration, while preserving numeric tolerance settings.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

hw-native-sys/pypto-lib#273: Adds and exports ratio_allclose to the golden module, which is the foundation that these test harnesses depend on.
hw-native-sys/pypto-lib#270: Updates compressor APIs and state layouts (kv, kv_state, score_state) that align with the multi-output validation changes in compressor_ratio128 and compressor_ratio4 test harnesses.
hw-native-sys/pypto-lib#265: Introduces indexer compressor tensors (kv, kv_state, score_state) that correspond to the expanded compare_fn validation in indexer_compressor and compressor_ratio4 tests.

Poem

🐰 A rabbit hops through test files with glee,
Ratio checks now unified and free,
From sparse to compressor, each harness aligned,
With tolerance logic, so carefully designed,
Golden comparisons shine—QA's dream, you see! ✨

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately describes the main objective of the changeset: aligning per-output comparison tolerances in DeepSeek V4 kernels with reference standards.
Description check	✅ Passed	The description comprehensively explains the changes, including specific tolerance parameters for each affected file and validation results demonstrating the changes work correctly.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

gemini-code-assist

Code Review

This pull request updates the validation logic across several DeepSeek v4 model components by replacing or augmenting existing comparison functions with ratio_allclose. Key changes include the removal of manual INT8 comparison logic in qkv_proj_rope.py and the addition of detailed compare_fn entries for outputs such as kv, kv_state, and score_state in the compressor and indexer modules. Feedback from the reviewer points out that explicitly setting max_error_ratio=0.0 in the compressor modules will cause validation failures, as it prevents the 0.5% outlier allowance intended for these kernels.

gemini-code-assist · 2026-05-14T05:15:26Z

            ),
-            compare_fn={"kv_cache": bf16_allclose_or_ulp()},
+            compare_fn={
+                "kv":          ratio_allclose(atol=1e-4, rtol=1.0 / 128, max_error_ratio=0.0),


The PR summary states that the kv outputs for ratio 4/128 fall just outside strict allclose (~0.085% and ~0.39% bad points) but are within the 0.5% outlier allowance. However, the code here explicitly sets max_error_ratio=0.0, which enforces strict allclose and will cause the validation to fail for these kernels. You should remove the max_error_ratio=0.0 argument for the kv output to allow the default 0.5% outlier escape.

Suggested change

"kv": ratio_allclose(atol=1e-4, rtol=1.0 / 128, max_error_ratio=0.0),

"kv": ratio_allclose(atol=1e-4, rtol=1.0 / 128),

gemini-code-assist · 2026-05-14T05:15:26Z

            ),
-            compare_fn={"kv_cache": bf16_allclose_or_ulp()},
+            compare_fn={
+                "kv":          ratio_allclose(atol=1e-4, rtol=1.0 / 128, max_error_ratio=0.0),


The PR summary states that the kv outputs for ratio 4/128 fall just outside strict allclose (~0.085% and ~0.39% bad points) but are within the 0.5% outlier allowance. However, the code here explicitly sets max_error_ratio=0.0, which enforces strict allclose and will cause the validation to fail for these kernels. You should remove the max_error_ratio=0.0 argument for the kv output to allow the default 0.5% outlier escape.

Suggested change

"kv": ratio_allclose(atol=1e-4, rtol=1.0 / 128, max_error_ratio=0.0),

"kv": ratio_allclose(atol=1e-4, rtol=1.0 / 128),

gemini-code-assist · 2026-05-14T05:15:26Z

            ),
-            compare_fn={"kv_cache": bf16_allclose_or_ulp()},
+            compare_fn={
+                "kv":          ratio_allclose(atol=1e-4, rtol=1.0 / 128, max_error_ratio=0.0),


The PR summary states that the kv outputs for ratio 4/128 fall just outside strict allclose (~0.085% and ~0.39% bad points) but are within the 0.5% outlier allowance. However, the code here explicitly sets max_error_ratio=0.0, which enforces strict allclose and will cause the validation to fail for these kernels. You should remove the max_error_ratio=0.0 argument for the kv output to allow the default 0.5% outlier escape.

Suggested change

"kv": ratio_allclose(atol=1e-4, rtol=1.0 / 128, max_error_ratio=0.0),

"kv": ratio_allclose(atol=1e-4, rtol=1.0 / 128),

gemini-code-assist Bot reviewed May 14, 2026

View reviewed changes

zhangqi-chen merged commit 3888a86 into hw-native-sys:main May 14, 2026
6 checks passed

zhangqi-chen deleted the feat/v4-align-compare-with-gitcode branch May 14, 2026 05:42

This was referenced May 15, 2026

Align DeepSeek V4 ratio4 compressor test inputs #284

Merged

Align DeepSeek V4 compressor init with PyPTO2.0 #290

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Align DeepSeek V4 per-output compare with reference tolerance#277

Align DeepSeek V4 per-output compare with reference tolerance#277
zhangqi-chen merged 1 commit into
hw-native-sys:mainfrom
zhangqi-chen:feat/v4-align-compare-with-gitcode

zhangqi-chen commented May 14, 2026

Uh oh!

coderabbitai Bot commented May 14, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Poem

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot May 14, 2026

Uh oh!

gemini-code-assist Bot May 14, 2026

Uh oh!

gemini-code-assist Bot May 14, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

	"kv": ratio_allclose(atol=1e-4, rtol=1.0 / 128, max_error_ratio=0.0),
	"kv": ratio_allclose(atol=1e-4, rtol=1.0 / 128),

Conversation

zhangqi-chen commented May 14, 2026

Summary

Validation results (device 8, a2a3)

Related Issues

Uh oh!

coderabbitai Bot commented May 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Poem

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 14, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 14, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 14, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Validation results (device 8, `a2a3`)

coderabbitai Bot commented May 14, 2026 •

edited

Loading