
Update DeepSeek attention fixtures #256

Merged
zhangqi-chen merged 2 commits into hw-native-sys:main from high-cloud:fix/deepseek-attention-fixtures
May 12, 2026

Conversation

@high-cloud (Contributor) commented May 12, 2026

Summary

  • Pass sparse attention local RoPE selector tensors through SWA and HCA
  • Initialize SWA/HCA KV cache fixtures with unseeded non-zero random data (see the sketch after this list)
  • Align SWA and HCA decode precision tolerances after NPU validation
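
For context, a minimal sketch of what such a fixture initializer can look like, in plain torch. The real helper is `init_normalized_cache` in the touched test modules; the signature, norm axis, and epsilon below are illustrative assumptions, not the PR's actual code:

```python
import torch

def init_normalized_cache(shape):
    """Sketch: unseeded, non-zero random cache values, normalized so the
    decode-precision comparison sees realistic magnitudes instead of zeros.
    No manual seed is set, matching the fixture change described above."""
    cache = torch.randn(shape, dtype=torch.float32)
    # Normalize along the last (head) dimension; clamp avoids division by ~0.
    cache = cache / cache.norm(dim=-1, keepdim=True).clamp_min(1e-6)
    return cache.to(torch.bfloat16)
```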

Related Issues

None

- Pass sparse attention local RoPE selector tensors from SWA and HCA examples
- Initialize KV cache fixtures with deterministic non-zero data
- Align SWA and HCA decode precision tolerances after NPU validation

coderabbitai Bot commented May 12, 2026

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Adds local sparse‑RoPE selector tensors and sizing constants, wires them through HCA and SWA into sparse_attn (and golden references), refactors CSA to split ori/cmp KV pools with deterministic top‑k and INT8 quantization, updates tensor initializers to seeded normalized caches, and adjusts JIT tolerances.
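
As a rough illustration of what the even/odd local selectors might contain (the actual layout is not shown in this PR, so the chunk length, stride, and dtype below are assumptions, not the real constants' values):

```python
import torch

SPARSE_ROPE_CHUNK = 64            # assumed chunk length, for illustration only
SPARSE_ROPE_INTERLEAVE_CHUNK = 2  # assumed even/odd interleave stride

def init_even_select_local():
    # Even lane indices within one local RoPE chunk: 0, 2, 4, ...
    return torch.arange(0, SPARSE_ROPE_CHUNK, SPARSE_ROPE_INTERLEAVE_CHUNK,
                        dtype=torch.int32)

def init_odd_select_local():
    # Odd lane indices within one local RoPE chunk: 1, 3, 5, ...
    return torch.arange(1, SPARSE_ROPE_CHUNK, SPARSE_ROPE_INTERLEAVE_CHUNK,
                        dtype=torch.int32)
```

Both tensors are then threaded through `attention_hca`/`attention_swa` into `sparse_attn` and the golden references, per the table below.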

Changes

Sparse RoPE Local Selector Integration & CSA refactor

| Layer / File(s) | Summary |
| --- | --- |
| **Sparse RoPE sizing constants**<br>`models/deepseek/v4/attention_hca.py`, `models/deepseek/v4/attention_swa.py`, `models/deepseek/v4/compressor_ratio4.py` | Introduces SPARSE_ROPE_CHUNK and SPARSE_ROPE_INTERLEAVE_CHUNK; updates SWA/compressor START_POS to 127. |
| **CSA orchestration and golden wiring**<br>`models/deepseek/v4/attention_csa_draft.py` | Switches CSA imports to non-draft kernels, expands attention_csa/attention_csa_test to use split ori/cmp KV pools and block tables, builds deterministic top‑k for ratio=4, and wires local even_select_local/odd_select_local into sparse_attn and golden references. |
| **CSA tensor specs, quantization, and inits**<br>`models/deepseek/v4/attention_csa_draft.py` | Adds INT8 quantization helpers and per-block scales, selector initializers, init_normalized_cache, and updates build_tensor_specs() to include split KV pools, block tables, selector tensors, and INT8 scales; loosens JIT tolerances (see the sketch below this table). |
| **HCA decode orchestration and tests**<br>`models/deepseek/v4/attention_hca.py` | Extends attention_hca/attention_hca_test signatures with even_select_local/odd_select_local, forwards them into sparse_attn/golden_sparse_attn, adds selector init helpers, switches KV-cache fixtures to seeded normalized BF16, and relaxes JIT tolerances. |
| **SWA decode orchestration and tests**<br>`models/deepseek/v4/attention_swa.py` | Extends attention_swa/attention_swa_test signatures with even_select_local/odd_select_local, forwards them into sparse_attn/golden_sparse_attn, adds selector inits, replaces zero KV-cache fixtures with seeded normalized caches, and updates JIT comparison tolerances. |
| **Compressor config**<br>`models/deepseek/v4/compressor_ratio4.py` | Updates START_POS from 3 to 127, changing compression decision timing and derived indices used by the compressor and tests. |
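
The CSA quantization row above is the densest change; as a hedged sketch of per-block INT8 quantization and a tie-stable top-k (block size, symmetric-quant choice, and the tie-breaking rule are assumptions, not the PR's actual helpers):

```python
import torch

BLOCK = 128  # assumed KV block size, for illustration

def quantize_int8_per_block(x):
    """Symmetric INT8 quantization with one scale per BLOCK of the last dim
    (assumes the last dim is divisible by BLOCK)."""
    blocks = x.reshape(*x.shape[:-1], -1, BLOCK)              # [..., n_blocks, BLOCK]
    scales = blocks.abs().amax(dim=-1, keepdim=True) / 127.0  # one scale per block
    scales = scales.clamp_min(1e-8)                           # avoid divide-by-zero
    q = torch.clamp((blocks / scales).round(), -127, 127).to(torch.int8)
    return q.reshape(x.shape), scales.squeeze(-1)

def deterministic_topk(scores, k):
    """Top-k with stable tie-breaking (lower index wins on equal scores),
    so kernel and golden selections stay aligned run to run."""
    order = torch.argsort(scores, dim=-1, descending=True, stable=True)
    return order[..., :k]
```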

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

🐰 I hop through RoPE rows with a little beat,
even and odd selectors tapping their feet.
Seeded caches sprout where zeros once lay,
CSA splits pools and quant scales come to play.
Tolerances tuned — the rabbit smiles today.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 7.69%, which is insufficient; the required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'Update DeepSeek attention fixtures' directly summarizes the main change: updating fixture initialization and parameters across multiple DeepSeek attention modules (SWA, HCA, CSA) to pass RoPE selector tensors and update KV cache initialization. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description check | ✅ Passed | The pull request description accurately aligns with the changeset, covering all major modifications: sparse attention local RoPE selector tensor threading, KV cache initialization updates, and tolerance adjustments across SWA and HCA modules. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Warning

Review ran into problems

🔥 Problems

Timed out fetching pipeline failures after 30000ms


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.


@gemini-code-assist Bot left a comment


Code Review

This pull request implements sparse RoPE chunking for HCA and SWA attention mechanisms by introducing local selection tensors and related constants. The test suites for both modules were updated to use randomized, normalized KV caches instead of zero-initialized ones, and the precision tolerances were adjusted accordingly. I have no feedback to provide.


@coderabbitai Bot left a comment


🧹 Nitpick comments (1)
models/deepseek/v4/attention_hca.py (1)

577-593: ⚡ Quick win

Consider extracting duplicated test helpers to a shared module.

The functions init_even_select_local, init_odd_select_local, and init_normalized_cache are duplicated identically in attention_swa.py (lines 444-460). Extracting these to a shared test utilities module would improve maintainability and ensure consistency if these initialization strategies evolve.
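
For reference, a minimal sketch of the suggested extraction (the shared-module path below is hypothetical, not an existing file):

```python
# models/deepseek/v4/attention_test_utils.py -- hypothetical shared location
# holding init_even_select_local, init_odd_select_local, and
# init_normalized_cache verbatim; attention_hca.py and attention_swa.py
# then import instead of redefining:

from models.deepseek.v4.attention_test_utils import (
    init_even_select_local,
    init_odd_select_local,
    init_normalized_cache,
)
```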

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@models/deepseek/v4/attention_hca.py` around lines 577 - 593, The three helper
functions init_even_select_local, init_odd_select_local, and
init_normalized_cache are duplicated; extract them into a shared test utilities
module (e.g., tests.utils or models.deepseek.utils) and replace the local
definitions in attention_hca.py and attention_swa.py with imports from that
module; ensure the shared module exposes the same function names and signatures,
update any relative imports, and run tests to verify behavior unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 95a9a9ef-798e-405b-a78a-dd2b6dcb6616

📥 Commits

Reviewing files that changed from the base of the PR and between 2fe6b76 and 7735545.

📒 Files selected for processing (2)
  • models/deepseek/v4/attention_hca.py
  • models/deepseek/v4/attention_swa.py

- Remove fixed seeds from SWA and HCA KV cache initialization
- Keep normalized non-zero cache values for decode attention precision coverage
@high-cloud force-pushed the fix/deepseek-attention-fixtures branch from 151d74d to cf59218 on May 12, 2026 at 09:52

@coderabbitai Bot left a comment


🧹 Nitpick comments (1)
models/deepseek/v4/attention_csa_draft.py (1)

228-243: 💤 Low value

Pass an actual identity (or drop the parameter) instead of an uninitialized tensor named hadamard_identity.

hadamard_identity is created via pl.create_tensor(...) with no initializer and then handed to compressor(...). It works today only because compressor_ratio4.compressor has ROTATE=False and never reads the hadamard argument — but the name implies it’s an identity matrix and the golden_compressor explicitly passes torch.eye(HEAD_DIM, dtype=torch.bfloat16) for the same slot. If compressor ever consumes hadamard, this fixture will silently drive it with garbage.

♻️ One option: explicitly initialize, matching the golden
```diff
     cmp_out = pl.create_tensor([B, HEAD_DIM], dtype=pl.BF16)
-    hadamard_identity = pl.create_tensor([HEAD_DIM, HEAD_DIM], dtype=pl.BF16)
+    # `compressor_ratio4` has ROTATE=False and ignores `hadamard`, but the
+    # golden passes torch.eye; keep the kernel side consistent so a future
+    # ROTATE=True flip does not silently use uninitialized memory.
+    hadamard_identity = pl.full([HEAD_DIM, HEAD_DIM], dtype=pl.BF16, value=0.0)
+    # (or build an actual identity via an eye-like helper if available)
```

Alternatively, drop the hadamard parameter from the inline compressor signature since the ratio-4 path is fixed at ROTATE=False.
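
For the first option, the golden-side construction the review references is simply an identity matrix; a torch-level equivalent (the HEAD_DIM value here is assumed for illustration):

```python
import torch

HEAD_DIM = 128  # assumed; use the module's actual constant

# The identity that golden_compressor already passes for the hadamard slot.
hadamard_identity = torch.eye(HEAD_DIM, dtype=torch.bfloat16)
```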

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@models/deepseek/v4/attention_csa_draft.py` around lines 228 - 243, The
uninitialized tensor hadamard_identity passed into compressor should be a real
identity matrix (or the parameter removed); replace the
pl.create_tensor([HEAD_DIM, HEAD_DIM], dtype=pl.BF16) placeholder with an
explicit identity tensor matching HEAD_DIM and dtype (same semantics as
golden_compressor's torch.eye(..., dtype=torch.bfloat16)) so compressor(x_mixed,
..., hadamard_identity, ...) receives a valid identity, or if you prefer and
ROTATE is guaranteed False, remove the hadamard argument from the compressor
signature and all call sites (including the inline compressor and any
compressor_ratio4.compressor usages).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4cde4e06-c00b-4506-af1a-ebe8b23d5f0c

📥 Commits

Reviewing files that changed from the base of the PR and between 7735545 and 151d74d.

📒 Files selected for processing (4)
  • models/deepseek/v4/attention_csa_draft.py
  • models/deepseek/v4/attention_hca.py
  • models/deepseek/v4/attention_swa.py
  • models/deepseek/v4/compressor_ratio4.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • models/deepseek/v4/attention_hca.py
  • models/deepseek/v4/attention_swa.py

@zhangqi-chen merged commit 056be6d into hw-native-sys:main on May 12, 2026. 6 checks passed.