
Refactor: use interleave pattern for DSv4 decode sparse attention inverse RoPE#253

Merged
zhangqi-chen merged 1 commit into hw-native-sys:main from sjduan:feat/dsv4-sparse-attn-interleave-inverse-rope on May 12, 2026

Conversation

@sjduan (Contributor) commented May 11, 2026

Summary

  • Refactor inverse RoPE to use interleave pattern instead of split-half layout
  • Add even_select and odd_select selector matrices to extract even/odd lanes from interleaved RoPE dimension via matmul
  • Use pl.load/pl.store with pl.unroll to reassemble even/odd results back into interleaved output layout
  • Separate attention, RoPE extraction, RoPE application, and packing stages into distinct loop blocks for clearer orchestration
  • Update torch golden to match the interleave execution order using unflatten + stack + flatten
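The unflatten + stack + flatten pattern in the last bullet can be sketched in PyTorch (a minimal sketch with illustrative shapes; the actual tensor names in sparse_attn.py may differ):

```python
import torch

def interleave_rope_golden(x_even: torch.Tensor, x_odd: torch.Tensor) -> torch.Tensor:
    """Reassemble even/odd lanes into an interleaved layout:
    out[..., 0::2] = x_even and out[..., 1::2] = x_odd."""
    # stack pairs along a new trailing axis, then flatten the pairs back out
    return torch.stack([x_even, x_odd], dim=-1).flatten(-2)

def split_interleaved(x: torch.Tensor):
    """Inverse step: unflatten the last dim into (half, 2) pairs, then split."""
    pairs = x.unflatten(-1, (x.shape[-1] // 2, 2))
    return pairs[..., 0], pairs[..., 1]
```

Round-tripping split_interleaved followed by interleave_rope_golden is the identity, which is a handy invariant when checking the golden against the kernel output.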

Dependencies:

  • pypto: e2e9257f (feat: Add tensor/tile sin/cos ops with LowerCompositeOps pass #1328)
  • simpler: 551a79c (Refactor: decouple AICore L2Perf writes via stable per-core staging ring #709)
  • pto-isa: 881379ce (docs: sync flash attention performance references)

@coderabbitai Bot commented May 11, 2026

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a80c3232-93c8-4599-9e2c-9af4961ce7a7

📥 Commits

Reviewing files that changed from the base of the PR and between f1e24b1 and 9b8ff7e.

📒 Files selected for processing (1)
  • models/deepseek/v4/sparse_attn.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • models/deepseek/v4/sparse_attn.py

📝 Walkthrough

Updates DeepSeek-V4 sparse decode attention to compute inverse-RoPE using precomputed even/odd selector tensors, replaces the previous inverse-RoPE path with selector-driven chunked matmuls and interleaving, updates the PyTorch golden reference, and extends the test harness to generate and pass selector inputs.

Changes

Inverse-RoPE Selector-Based Computation

All changes are in models/deepseek/v4/sparse_attn.py:

  • Constants and chunking: Adds ROPE_CHUNK/ROPE_INTERLEAVE_CHUNK constants for selector-based chunked ROPE processing.
  • Function signature and test forwarding: Extends the sparse_attn and sparse_attn_test signatures to accept even_select_local and odd_select_local and forwards them through the test harness.
  • Intermediate buffers: Declares new temporaries (o_proj_even, o_proj_odd, rope_even, rope_odd, o_rope_interleave) to hold even/odd projections and interleaved ROPE.
  • Attention loop rename: Renames the per-head attention-row variable to attn_head_row.
  • Inverse-RoPE core: Slices ROPE lanes from the attention stage, computes even/odd accumulators via selector-driven chunked matmuls, applies inverse-rotation mixing (freqs_cos/freqs_sin), interleaves the reconstructed ROPE lanes, and packs NOPE+ROPE for grouped projections.
  • Reference implementation: Updates golden_sparse_attn to pair ROPE lanes into even/odd, apply half-ROPE cosine/sine mixing, and re-interleave into the final o layout.
  • Test selector constructors & specs: Adds deterministic init_even_select_local() and init_odd_select_local() constructors and includes TensorSpec entries for the selectors in build_tensor_specs().
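A minimal NumPy sketch of what such selector matrices might look like (names and dimensions are illustrative, not the actual init_even_select_local/init_odd_select_local implementations):

```python
import numpy as np

ROPE_DIM = 8  # illustrative; the real RoPE dimension differs

def init_even_select(dim: int = ROPE_DIM) -> np.ndarray:
    """0/1 matrix S with S[2*j, j] = 1, so x @ S extracts even lanes."""
    sel = np.zeros((dim, dim // 2), dtype=np.float32)
    sel[np.arange(0, dim, 2), np.arange(dim // 2)] = 1.0
    return sel

def init_odd_select(dim: int = ROPE_DIM) -> np.ndarray:
    """0/1 matrix S with S[2*j + 1, j] = 1, extracting odd lanes."""
    sel = np.zeros((dim, dim // 2), dtype=np.float32)
    sel[np.arange(1, dim, 2), np.arange(dim // 2)] = 1.0
    return sel

# extraction is then two plain matmuls, which map onto matrix hardware
x = np.arange(ROPE_DIM, dtype=np.float32)[None, :]   # one interleaved row
even = x @ init_even_select()                        # lanes 0, 2, 4, 6
odd = x @ init_odd_select()                          # lanes 1, 3, 5, 7
```

Because the constructors are deterministic, the golden and the kernel can be fed bit-identical selector inputs from the test harness.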

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • hw-native-sys/pypto-lib#209: Modifies DeepSeek V4 sparse_attn/inverse-RoPE decode path; closely related to selector-driven changes.
  • hw-native-sys/pypto-lib#225: Adds fused decode sparse-attn with grouped projection and inverse-RoPE logic that overlaps with the selector pattern here.
  • hw-native-sys/pypto-lib#244: Threads selector matrices through DeepSeek V4 decode/SWA stack; implements similar interleaved even/odd RoPE flow.

Poem

🐰 I nibble lanes of RoPE with delight,
Even and odd dance, chunked through the night,
Selectors whisper, matmuls hum along,
Cos and sin mend the inverse song,
Interleaved output, snug and bright.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
  • Title check ✅ Passed: The PR title directly and clearly describes the main change: refactoring the inverse-RoPE logic in DSv4 decode sparse attention to use an interleave pattern, which matches the core objective of the changeset.
  • Description check ✅ Passed: The PR description provides relevant context about the refactoring, including specific technical details about selector matrices, load/store operations, loop-block separation, and torch-golden updates that align with the changeset.
  • Docstring Coverage ✅ Passed: Docstring coverage is 87.50%, which meets the required threshold of 80.00%.
  • Linked Issues check ✅ Passed: Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check ✅ Passed: Check skipped because no linked issues were found for this pull request.


@coderabbitai Bot left a comment
Actionable comments posted: 1

🧹 Nitpick comments (1)
models/deepseek/v4/sparse_attn.py (1)

19-27: ⚡ Quick win

Document the new selector parameters.

The docstring should be updated to include the new even_select and odd_select parameters for completeness.

📝 Proposed documentation addition
 Inputs:
 - q                 : query tensor from MLA prolog (RoPE already applied)
 - ori_kv / cmp_kv   : paged sliding-window and paged compressed KV pools
 - cmp_sparse_indices: per-token absolute indices (window + compressed concat)
                       computed by orchestrator (window topk + indexer/HCA topk)
 - attn_sink         : per-head sink term added inside softmax
 - freqs_cos/sin     : split-half inverse-RoPE tables for the sparse-attn output
+- even_select       : selector matrix to extract even lanes from interleaved RoPE dimension
+- odd_select        : selector matrix to extract odd lanes from interleaved RoPE dimension
 - wo_a              : grouped first-stage output-projection weights from model.py:537-541
 - wo_b / wo_b_scale : grouped second-stage output-projection W8 per-channel weights
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@models/deepseek/v4/sparse_attn.py` around lines 19 - 27, Update the
module/function docstring that lists "Inputs:" in sparse_attn.py to document the
two new selector parameters even_select and odd_select: specify their names,
expected types (e.g., boolean mask or index tensor), shape/semantics (per-head
or per-token selector that chooses even/odd positions for the sparse attention
pipeline), and how they affect the attention computation (which indices are
included/excluded or routed to ori_kv/cmp_kv). Place these descriptions
alongside the existing Input entries (near q, ori_kv/cmp_kv, cmp_sparse_indices,
attn_sink, freqs_cos/sin, wo_a, wo_b/wo_b_scale) so the docstring fully reflects
the current function signature.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@models/deepseek/v4/sparse_attn.py`:
- Around line 271-273: Remove the no-op loop "for r0 in pl.range(ROPE_CHUNK,
HALF_ROPE, ROPE_CHUNK): pass" — it is dead code; delete that loop from the
function (search for the exact statement in models/deepseek/v4/sparse_attn.py)
so the surrounding logic remains unchanged and no-op iterations are eliminated.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 5416a047-df9b-4117-a998-b29373e088ba

📥 Commits

Reviewing files that changed from the base of the PR and between 5259b91 and f1e24b1.

📒 Files selected for processing (1)
  • models/deepseek/v4/sparse_attn.py

@gemini-code-assist Bot left a comment
Code Review

This pull request refactors the Rotary Positional Embedding (RoPE) logic in the sparse_attn module to implement an interleaved layout using new even_select and odd_select tensors. Feedback suggests optimizing the RoPE assembly stage by replacing manual row-by-row interleaving with a matrix multiplication pattern to better leverage hardware units. Additionally, an empty loop leftover from development should be removed.
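A sketch of the matmul-based assembly the feedback describes, reusing hypothetical 0/1 selector matrices (not the reviewer's actual proposal code):

```python
import numpy as np

dim, half = 8, 4
# hypothetical 0/1 selectors: S_even[2*j, j] = 1, S_odd[2*j + 1, j] = 1
s_even = np.zeros((dim, half), dtype=np.float32)
s_even[np.arange(0, dim, 2), np.arange(half)] = 1.0
s_odd = np.zeros((dim, half), dtype=np.float32)
s_odd[np.arange(1, dim, 2), np.arange(half)] = 1.0

even = np.array([[0.0, 2.0, 4.0, 6.0]], dtype=np.float32)
odd = np.array([[1.0, 3.0, 5.0, 7.0]], dtype=np.float32)

# scatter each half back to its interleaved lanes with the transposed
# selectors; assembly becomes two matmuls and an add instead of
# row-by-row load/store traffic
out = even @ s_even.T + odd @ s_odd.T
```

The same selector matrices used for extraction thus also serve, transposed, for reassembly, which is why the suggestion maps cleanly onto the matrix units.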

Comment on lines +271 to +272
for r0 in pl.range(ROPE_CHUNK, HALF_ROPE, ROPE_CHUNK):
pass
Severity: medium

This loop is empty and appears to be a leftover from development. It should be removed to keep the code clean.

…erse RoPE

- Refactor inverse RoPE to use interleave pattern instead of split-half layout
- Add even_select and odd_select selector matrices to extract even/odd lanes
  from interleaved RoPE dimension via matmul
- Use pl.load/pl.store with pl.unroll to reassemble even/odd results back
  into interleaved output layout
- Separate attention, RoPE extraction, RoPE application, and packing stages
  into distinct loop blocks for clearer orchestration
- Update torch golden to match interleave execution order using unflatten+stack+flatten
@sjduan sjduan force-pushed the feat/dsv4-sparse-attn-interleave-inverse-rope branch from f1e24b1 to 9b8ff7e on May 12, 2026 03:28
@zhangqi-chen zhangqi-chen merged commit 2fe6b76 into hw-native-sys:main May 12, 2026
6 checks passed