
Refactor: use interleave pattern for DSv4 decode sparse attention inverse RoPE#253

Merged
zhangqi-chen merged 1 commit into hw-native-sys:main from sjduan:feat/dsv4-sparse-attn-interleave-inverse-rope on May 12, 2026

Conversation

@sjduan (Contributor) commented May 11, 2026

Summary

  • Refactor inverse RoPE to use interleave pattern instead of split-half layout
  • Add even_select and odd_select selector matrices to extract even/odd lanes from interleaved RoPE dimension via matmul
  • Use pl.load/pl.store with pl.unroll to reassemble even/odd results back into interleaved output layout
  • Separate attention, RoPE extraction, RoPE application, and packing stages into distinct loop blocks for clearer orchestration
  • Update torch golden to match the interleave execution order using unflatten + stack + flatten
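The unflatten + stack + flatten pattern in the last bullet can be sketched in PyTorch (a minimal sketch with illustrative shapes; the actual tensor names in sparse_attn.py may differ):

```python
import torch

def interleave_rope_golden(x_even: torch.Tensor, x_odd: torch.Tensor) -> torch.Tensor:
    """Reassemble even/odd lanes into an interleaved layout:
    out[..., 0::2] = x_even and out[..., 1::2] = x_odd."""
    # stack pairs along a new trailing axis, then flatten the pairs back out
    return torch.stack([x_even, x_odd], dim=-1).flatten(-2)

def split_interleaved(x: torch.Tensor):
    """Inverse step: unflatten the last dim into (half, 2) pairs, then split."""
    pairs = x.unflatten(-1, (x.shape[-1] // 2, 2))
    return pairs[..., 0], pairs[..., 1]
```

Round-tripping split_interleaved followed by interleave_rope_golden is the identity, which is a handy invariant when checking the golden against the kernel output.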

Dependencies:

  • pypto: e2e9257f (feat: Add tensor/tile sin/cos ops with LowerCompositeOps pass #1328)
  • simpler: 551a79c (Refactor: decouple AICore L2Perf writes via stable per-core staging ring #709)
  • pto-isa: 881379ce (docs: sync flash attention performance references)

@coderabbitai Bot commented May 11, 2026

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a80c3232-93c8-4599-9e2c-9af4961ce7a7

📥 Commits

Reviewing files that changed from the base of the PR and between f1e24b1 and 9b8ff7e.

📒 Files selected for processing (1)
  • models/deepseek/v4/sparse_attn.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • models/deepseek/v4/sparse_attn.py

📝 Walkthrough

Updates DeepSeek-V4 sparse decode attention to compute inverse-RoPE using precomputed even/odd selector tensors, replaces the previous inverse-RoPE path with selector-driven chunked matmuls and interleaving, updates the PyTorch golden reference, and extends the test harness to generate and pass selector inputs.

Changes

Inverse-RoPE Selector-Based Computation

All changes are in models/deepseek/v4/sparse_attn.py:

  • Constants and chunking: Adds ROPE_CHUNK/ROPE_INTERLEAVE_CHUNK constants for selector-based chunked ROPE processing.
  • Function signature and test forwarding: Extends the sparse_attn and sparse_attn_test signatures to accept even_select_local and odd_select_local and forwards them through the test harness.
  • Intermediate buffers: Declares new temporaries (o_proj_even, o_proj_odd, rope_even, rope_odd, o_rope_interleave) to hold even/odd projections and interleaved ROPE.
  • Attention loop rename: Renames the per-head attention-row variable to attn_head_row.
  • Inverse-RoPE core: Slices ROPE lanes from the attention stage, computes even/odd accumulators via selector-driven chunked matmuls, applies inverse-rotation mixing (freqs_cos/freqs_sin), interleaves the reconstructed ROPE lanes, and packs NOPE+ROPE for grouped projections.
  • Reference implementation: Updates golden_sparse_attn to pair ROPE lanes into even/odd, apply half-ROPE cosine/sine mixing, and re-interleave into the final o layout.
  • Test selector constructors & specs: Adds deterministic init_even_select_local() and init_odd_select_local() constructors and includes TensorSpec entries for the selectors in build_tensor_specs().
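A minimal NumPy sketch of what such selector matrices might look like (names and dimensions are illustrative, not the actual init_even_select_local/init_odd_select_local implementations):

```python
import numpy as np

ROPE_DIM = 8  # illustrative; the real RoPE dimension differs

def init_even_select(dim: int = ROPE_DIM) -> np.ndarray:
    """0/1 matrix S with S[2*j, j] = 1, so x @ S extracts even lanes."""
    sel = np.zeros((dim, dim // 2), dtype=np.float32)
    sel[np.arange(0, dim, 2), np.arange(dim // 2)] = 1.0
    return sel

def init_odd_select(dim: int = ROPE_DIM) -> np.ndarray:
    """0/1 matrix S with S[2*j + 1, j] = 1, extracting odd lanes."""
    sel = np.zeros((dim, dim // 2), dtype=np.float32)
    sel[np.arange(1, dim, 2), np.arange(dim // 2)] = 1.0
    return sel

# extraction is then two plain matmuls, which map onto matrix hardware
x = np.arange(ROPE_DIM, dtype=np.float32)[None, :]   # one interleaved row
even = x @ init_even_select()                        # lanes 0, 2, 4, 6
odd = x @ init_odd_select()                          # lanes 1, 3, 5, 7
```

Because the constructors are deterministic, the golden and the kernel can be fed bit-identical selector inputs from the test harness.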

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • hw-native-sys/pypto-lib#209: Modifies DeepSeek V4 sparse_attn/inverse-RoPE decode path; closely related to selector-driven changes.
  • hw-native-sys/pypto-lib#225: Adds fused decode sparse-attn with grouped projection and inverse-RoPE logic that overlaps with the selector pattern here.
  • hw-native-sys/pypto-lib#244: Threads selector matrices through DeepSeek V4 decode/SWA stack; implements similar interleaved even/odd RoPE flow.

Poem

🐰 I nibble lanes of RoPE with delight,
Even and odd dance, chunked through the night,
Selectors whisper, matmuls hum along,
Cos and sin mend the inverse song,
Interleaved output, snug and bright.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
  • Title check ✅ Passed: The PR title directly and clearly describes the main change: refactoring the inverse-RoPE logic in DSv4 decode sparse attention to use an interleave pattern, which matches the core objective of the changeset.
  • Description check ✅ Passed: The PR description provides relevant context about the refactoring, including specific technical details about selector matrices, load/store operations, loop-block separation, and torch-golden updates that align with the changeset.
  • Docstring Coverage ✅ Passed: Docstring coverage is 87.50%, which meets the required threshold of 80.00%.
  • Linked Issues check ✅ Passed: Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check ✅ Passed: Check skipped because no linked issues were found for this pull request.


@coderabbitai Bot left a comment
Actionable comments posted: 1

🧹 Nitpick comments (1)
models/deepseek/v4/sparse_attn.py (1)

19-27: ⚡ Quick win

Document the new selector parameters.

The docstring should be updated to include the new even_select and odd_select parameters for completeness.

📝 Proposed documentation addition
 Inputs:
 - q                 : query tensor from MLA prolog (RoPE already applied)
 - ori_kv / cmp_kv   : paged sliding-window and paged compressed KV pools
 - cmp_sparse_indices: per-token absolute indices (window + compressed concat)
                       computed by orchestrator (window topk + indexer/HCA topk)
 - attn_sink         : per-head sink term added inside softmax
 - freqs_cos/sin     : split-half inverse-RoPE tables for the sparse-attn output
+- even_select       : selector matrix to extract even lanes from interleaved RoPE dimension
+- odd_select        : selector matrix to extract odd lanes from interleaved RoPE dimension
 - wo_a              : grouped first-stage output-projection weights from model.py:537-541
 - wo_b / wo_b_scale : grouped second-stage output-projection W8 per-channel weights
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@models/deepseek/v4/sparse_attn.py` around lines 19 - 27, Update the
module/function docstring that lists "Inputs:" in sparse_attn.py to document the
two new selector parameters even_select and odd_select: specify their names,
expected types (e.g., boolean mask or index tensor), shape/semantics (per-head
or per-token selector that chooses even/odd positions for the sparse attention
pipeline), and how they affect the attention computation (which indices are
included/excluded or routed to ori_kv/cmp_kv). Place these descriptions
alongside the existing Input entries (near q, ori_kv/cmp_kv, cmp_sparse_indices,
attn_sink, freqs_cos/sin, wo_a, wo_b/wo_b_scale) so the docstring fully reflects
the current function signature.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@models/deepseek/v4/sparse_attn.py`:
- Around line 271-273: Remove the no-op loop "for r0 in pl.range(ROPE_CHUNK,
HALF_ROPE, ROPE_CHUNK): pass" — it is dead code; delete that loop from the
function (search for the exact statement in models/deepseek/v4/sparse_attn.py)
so the surrounding logic remains unchanged and no-op iterations are eliminated.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 5416a047-df9b-4117-a998-b29373e088ba

📥 Commits

Reviewing files that changed from the base of the PR and between 5259b91 and f1e24b1.

📒 Files selected for processing (1)
  • models/deepseek/v4/sparse_attn.py

@gemini-code-assist Bot left a comment
Code Review

This pull request refactors the Rotary Positional Embedding (RoPE) logic in the sparse_attn module to implement an interleaved layout using new even_select and odd_select tensors. Feedback suggests optimizing the RoPE assembly stage by replacing manual row-by-row interleaving with a matrix multiplication pattern to better leverage hardware units. Additionally, an empty loop leftover from development should be removed.
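A sketch of the matmul-based assembly the feedback describes, reusing hypothetical 0/1 selector matrices (not the reviewer's actual proposal code):

```python
import numpy as np

dim, half = 8, 4
# hypothetical 0/1 selectors: S_even[2*j, j] = 1, S_odd[2*j + 1, j] = 1
s_even = np.zeros((dim, half), dtype=np.float32)
s_even[np.arange(0, dim, 2), np.arange(half)] = 1.0
s_odd = np.zeros((dim, half), dtype=np.float32)
s_odd[np.arange(1, dim, 2), np.arange(half)] = 1.0

even = np.array([[0.0, 2.0, 4.0, 6.0]], dtype=np.float32)
odd = np.array([[1.0, 3.0, 5.0, 7.0]], dtype=np.float32)

# scatter each half back to its interleaved lanes with the transposed
# selectors; assembly becomes two matmuls and an add instead of
# row-by-row load/store traffic
out = even @ s_even.T + odd @ s_odd.T
```

The same selector matrices used for extraction thus also serve, transposed, for reassembly, which is why the suggestion maps cleanly onto the matrix units.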

Comment on lines +271 to +272
for r0 in pl.range(ROPE_CHUNK, HALF_ROPE, ROPE_CHUNK):
pass
Severity: medium

This loop is empty and appears to be a leftover from development. It should be removed to keep the code clean.

…erse RoPE

- Refactor inverse RoPE to use interleave pattern instead of split-half layout
- Add even_select and odd_select selector matrices to extract even/odd lanes
  from interleaved RoPE dimension via matmul
- Use pl.load/pl.store with pl.unroll to reassemble even/odd results back
  into interleaved output layout
- Separate attention, RoPE extraction, RoPE application, and packing stages
  into distinct loop blocks for clearer orchestration
- Update torch golden to match interleave execution order using unflatten+stack+flatten
@sjduan sjduan force-pushed the feat/dsv4-sparse-attn-interleave-inverse-rope branch from f1e24b1 to 9b8ff7e on May 12, 2026 03:28
@zhangqi-chen zhangqi-chen merged commit 2fe6b76 into hw-native-sys:main May 12, 2026
6 checks passed