
Update DeepSeek attention fixtures #256

Merged
zhangqi-chen merged 2 commits into hw-native-sys:main from high-cloud:fix/deepseek-attention-fixtures
May 12, 2026

Conversation

@high-cloud (Contributor) commented May 12, 2026

Summary

  • Pass sparse attention local RoPE selector tensors through SWA and HCA
  • Initialize SWA/HCA KV cache fixtures with unseeded non-zero random data (see the sketch after this list)
  • Align SWA and HCA decode precision tolerances after NPU validation
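
For context, a minimal sketch of what such a fixture initializer can look like, in plain torch. The real helper is `init_normalized_cache` in the touched test modules; the signature, norm axis, and epsilon below are illustrative assumptions, not the PR's actual code:

```python
import torch

def init_normalized_cache(shape):
    """Sketch: unseeded, non-zero random cache values, normalized so the
    decode-precision comparison sees realistic magnitudes instead of zeros.
    No manual seed is set, matching the fixture change described above."""
    cache = torch.randn(shape, dtype=torch.float32)
    # Normalize along the last (head) dimension; clamp avoids division by ~0.
    cache = cache / cache.norm(dim=-1, keepdim=True).clamp_min(1e-6)
    return cache.to(torch.bfloat16)
```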

Related Issues

None

- Pass sparse attention local RoPE selector tensors from SWA and HCA examples
- Initialize KV cache fixtures with deterministic non-zero data
- Align SWA and HCA decode precision tolerances after NPU validation

coderabbitai Bot commented May 12, 2026

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Adds local sparse‑RoPE selector tensors and sizing constants, wires them through HCA and SWA into sparse_attn (and golden references), refactors CSA to split ori/cmp KV pools with deterministic top‑k and INT8 quantization, updates tensor initializers to seeded normalized caches, and adjusts JIT tolerances.
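
As a rough illustration of what the even/odd local selectors might contain (the actual layout is not shown in this PR, so the chunk length, stride, and dtype below are assumptions, not the real constants' values):

```python
import torch

SPARSE_ROPE_CHUNK = 64            # assumed chunk length, for illustration only
SPARSE_ROPE_INTERLEAVE_CHUNK = 2  # assumed even/odd interleave stride

def init_even_select_local():
    # Even lane indices within one local RoPE chunk: 0, 2, 4, ...
    return torch.arange(0, SPARSE_ROPE_CHUNK, SPARSE_ROPE_INTERLEAVE_CHUNK,
                        dtype=torch.int32)

def init_odd_select_local():
    # Odd lane indices within one local RoPE chunk: 1, 3, 5, ...
    return torch.arange(1, SPARSE_ROPE_CHUNK, SPARSE_ROPE_INTERLEAVE_CHUNK,
                        dtype=torch.int32)
```

Both tensors are then threaded through `attention_hca`/`attention_swa` into `sparse_attn` and the golden references, per the table below.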

Changes

Sparse RoPE Local Selector Integration & CSA refactor

| Layer / File(s) | Summary |
| --- | --- |
| **Sparse RoPE sizing constants**<br>`models/deepseek/v4/attention_hca.py`, `models/deepseek/v4/attention_swa.py`, `models/deepseek/v4/compressor_ratio4.py` | Introduces SPARSE_ROPE_CHUNK and SPARSE_ROPE_INTERLEAVE_CHUNK; updates SWA/compressor START_POS to 127. |
| **CSA orchestration and golden wiring**<br>`models/deepseek/v4/attention_csa_draft.py` | Switches CSA imports to non-draft kernels, expands attention_csa/attention_csa_test to use split ori/cmp KV pools and block tables, builds deterministic top‑k for ratio=4, and wires local even_select_local/odd_select_local into sparse_attn and golden references. |
| **CSA tensor specs, quantization, and inits**<br>`models/deepseek/v4/attention_csa_draft.py` | Adds INT8 quantization helpers and per-block scales, selector initializers, init_normalized_cache, and updates build_tensor_specs() to include split KV pools, block tables, selector tensors, and INT8 scales; loosens JIT tolerances (see the sketch below this table). |
| **HCA decode orchestration and tests**<br>`models/deepseek/v4/attention_hca.py` | Extends attention_hca/attention_hca_test signatures with even_select_local/odd_select_local, forwards them into sparse_attn/golden_sparse_attn, adds selector init helpers, switches KV-cache fixtures to seeded normalized BF16, and relaxes JIT tolerances. |
| **SWA decode orchestration and tests**<br>`models/deepseek/v4/attention_swa.py` | Extends attention_swa/attention_swa_test signatures with even_select_local/odd_select_local, forwards them into sparse_attn/golden_sparse_attn, adds selector inits, replaces zero KV-cache fixtures with seeded normalized caches, and updates JIT comparison tolerances. |
| **Compressor config**<br>`models/deepseek/v4/compressor_ratio4.py` | Updates START_POS from 3 to 127, changing compression decision timing and derived indices used by the compressor and tests. |
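
The CSA quantization row above is the densest change; as a hedged sketch of per-block INT8 quantization and a tie-stable top-k (block size, symmetric-quant choice, and the tie-breaking rule are assumptions, not the PR's actual helpers):

```python
import torch

BLOCK = 128  # assumed KV block size, for illustration

def quantize_int8_per_block(x):
    """Symmetric INT8 quantization with one scale per BLOCK of the last dim
    (assumes the last dim is divisible by BLOCK)."""
    blocks = x.reshape(*x.shape[:-1], -1, BLOCK)              # [..., n_blocks, BLOCK]
    scales = blocks.abs().amax(dim=-1, keepdim=True) / 127.0  # one scale per block
    scales = scales.clamp_min(1e-8)                           # avoid divide-by-zero
    q = torch.clamp((blocks / scales).round(), -127, 127).to(torch.int8)
    return q.reshape(x.shape), scales.squeeze(-1)

def deterministic_topk(scores, k):
    """Top-k with stable tie-breaking (lower index wins on equal scores),
    so kernel and golden selections stay aligned run to run."""
    order = torch.argsort(scores, dim=-1, descending=True, stable=True)
    return order[..., :k]
```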

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

🐰 I hop through RoPE rows with a little beat,
even and odd selectors tapping their feet.
Seeded caches sprout where zeros once lay,
CSA splits pools and quant scales come to play.
Tolerances tuned — the rabbit smiles today.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 7.69%, which is insufficient; the required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'Update DeepSeek attention fixtures' directly summarizes the main change: updating fixture initialization and parameters across multiple DeepSeek attention modules (SWA, HCA, CSA) to pass RoPE selector tensors and update KV cache initialization. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description check | ✅ Passed | The pull request description accurately aligns with the changeset, covering all major modifications: sparse attention local RoPE selector tensor threading, KV cache initialization updates, and tolerance adjustments across SWA and HCA modules. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Warning

Review ran into problems

🔥 Problems

Timed out fetching pipeline failures after 30000ms


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.


@gemini-code-assist Bot left a comment


Code Review

This pull request implements sparse RoPE chunking for HCA and SWA attention mechanisms by introducing local selection tensors and related constants. The test suites for both modules were updated to use randomized, normalized KV caches instead of zero-initialized ones, and the precision tolerances were adjusted accordingly. I have no feedback to provide.


@coderabbitai Bot left a comment


🧹 Nitpick comments (1)
models/deepseek/v4/attention_hca.py (1)

577-593: ⚡ Quick win

Consider extracting duplicated test helpers to a shared module.

The functions init_even_select_local, init_odd_select_local, and init_normalized_cache are duplicated identically in attention_swa.py (lines 444-460). Extracting these to a shared test utilities module would improve maintainability and ensure consistency if these initialization strategies evolve.
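
For reference, a minimal sketch of the suggested extraction (the shared-module path below is hypothetical, not an existing file):

```python
# models/deepseek/v4/attention_test_utils.py -- hypothetical shared location
# holding init_even_select_local, init_odd_select_local, and
# init_normalized_cache verbatim; attention_hca.py and attention_swa.py
# then import instead of redefining:

from models.deepseek.v4.attention_test_utils import (
    init_even_select_local,
    init_odd_select_local,
    init_normalized_cache,
)
```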

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@models/deepseek/v4/attention_hca.py` around lines 577 - 593, The three helper
functions init_even_select_local, init_odd_select_local, and
init_normalized_cache are duplicated; extract them into a shared test utilities
module (e.g., tests.utils or models.deepseek.utils) and replace the local
definitions in attention_hca.py and attention_swa.py with imports from that
module; ensure the shared module exposes the same function names and signatures,
update any relative imports, and run tests to verify behavior unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 95a9a9ef-798e-405b-a78a-dd2b6dcb6616

📥 Commits

Reviewing files that changed from the base of the PR and between 2fe6b76 and 7735545.

📒 Files selected for processing (2)
  • models/deepseek/v4/attention_hca.py
  • models/deepseek/v4/attention_swa.py

- Remove fixed seeds from SWA and HCA KV cache initialization
- Keep normalized non-zero cache values for decode attention precision coverage
@high-cloud force-pushed the fix/deepseek-attention-fixtures branch from 151d74d to cf59218 on May 12, 2026 at 09:52

@coderabbitai Bot left a comment


🧹 Nitpick comments (1)
models/deepseek/v4/attention_csa_draft.py (1)

228-243: 💤 Low value

Pass an actual identity (or drop the parameter) instead of an uninitialized tensor named hadamard_identity.

hadamard_identity is created via pl.create_tensor(...) with no initializer and then handed to compressor(...). It works today only because compressor_ratio4.compressor has ROTATE=False and never reads the hadamard argument — but the name implies it’s an identity matrix and the golden_compressor explicitly passes torch.eye(HEAD_DIM, dtype=torch.bfloat16) for the same slot. If compressor ever consumes hadamard, this fixture will silently drive it with garbage.

♻️ One option: explicitly initialize, matching the golden
```diff
     cmp_out = pl.create_tensor([B, HEAD_DIM], dtype=pl.BF16)
-    hadamard_identity = pl.create_tensor([HEAD_DIM, HEAD_DIM], dtype=pl.BF16)
+    # `compressor_ratio4` has ROTATE=False and ignores `hadamard`, but the
+    # golden passes torch.eye; keep the kernel side consistent so a future
+    # ROTATE=True flip does not silently use uninitialized memory.
+    hadamard_identity = pl.full([HEAD_DIM, HEAD_DIM], dtype=pl.BF16, value=0.0)
+    # (or build an actual identity via an eye-like helper if available)
```

Alternatively, drop the hadamard parameter from the inline compressor signature since the ratio-4 path is fixed at ROTATE=False.
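
For the first option, the golden-side construction the review references is simply an identity matrix; a torch-level equivalent (the HEAD_DIM value here is assumed for illustration):

```python
import torch

HEAD_DIM = 128  # assumed; use the module's actual constant

# The identity that golden_compressor already passes for the hadamard slot.
hadamard_identity = torch.eye(HEAD_DIM, dtype=torch.bfloat16)
```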

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@models/deepseek/v4/attention_csa_draft.py` around lines 228 - 243, The
uninitialized tensor hadamard_identity passed into compressor should be a real
identity matrix (or the parameter removed); replace the
pl.create_tensor([HEAD_DIM, HEAD_DIM], dtype=pl.BF16) placeholder with an
explicit identity tensor matching HEAD_DIM and dtype (same semantics as
golden_compressor's torch.eye(..., dtype=torch.bfloat16)) so compressor(x_mixed,
..., hadamard_identity, ...) receives a valid identity, or if you prefer and
ROTATE is guaranteed False, remove the hadamard argument from the compressor
signature and all call sites (including the inline compressor and any
compressor_ratio4.compressor usages).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4cde4e06-c00b-4506-af1a-ebe8b23d5f0c

📥 Commits

Reviewing files that changed from the base of the PR and between 7735545 and 151d74d.

📒 Files selected for processing (4)
  • models/deepseek/v4/attention_csa_draft.py
  • models/deepseek/v4/attention_hca.py
  • models/deepseek/v4/attention_swa.py
  • models/deepseek/v4/compressor_ratio4.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • models/deepseek/v4/attention_hca.py
  • models/deepseek/v4/attention_swa.py

@zhangqi-chen merged commit 056be6d into hw-native-sys:main on May 12, 2026. 6 checks passed.