Refactor: use interleave pattern for DSv4 decode sparse attention inverse RoPE #253
Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the review settings.
📝 Walkthrough

Updates DeepSeek-V4 sparse decode attention to compute inverse-RoPE using precomputed even/odd selector tensors, replaces the previous inverse-RoPE path with selector-driven chunked matmuls and interleaving, updates the PyTorch golden reference, and extends the test harness to generate and pass selector inputs.

Changes
- Inverse-RoPE Selector-Based Computation
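The "precomputed even/odd selector tensors" mentioned in the walkthrough can be pictured as 0/1 matrices that turn a strided gather over the interleaved RoPE dimension into a plain matmul. A minimal NumPy sketch (dimension sizes and variable names here are illustrative assumptions, not the actual kernel code):

```python
import numpy as np

D = 8          # full RoPE dimension, interleaved layout: e0, o0, e1, o1, ...
half = D // 2

# One-hot selector matrices: column i of even_select picks lane 2*i,
# column i of odd_select picks lane 2*i + 1.
even_select = np.zeros((D, half), dtype=np.float32)
odd_select = np.zeros((D, half), dtype=np.float32)
for i in range(half):
    even_select[2 * i, i] = 1.0
    odd_select[2 * i + 1, i] = 1.0

x = np.arange(D, dtype=np.float32)  # stand-in for one interleaved RoPE row
even = x @ even_select              # extracts lanes 0, 2, 4, 6 via matmul
odd = x @ odd_select                # extracts lanes 1, 3, 5, 7 via matmul
```

Expressing the extraction as a matmul lets it run on the same matrix units as the rest of the attention kernel instead of requiring a strided gather.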
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 5 passed
Actionable comments posted: 1
🧹 Nitpick comments (1)
models/deepseek/v4/sparse_attn.py (1)
19-27: ⚡ Quick win: Document the new selector parameters.

The docstring should be updated to include the new `even_select` and `odd_select` parameters for completeness.

📝 Proposed documentation addition
```diff
 Inputs:
 - q                 : query tensor from MLA prolog (RoPE already applied)
 - ori_kv / cmp_kv   : paged sliding-window and paged compressed KV pools
 - cmp_sparse_indices: per-token absolute indices (window + compressed concat)
                       computed by orchestrator (window topk + indexer/HCA topk)
 - attn_sink         : per-head sink term added inside softmax
 - freqs_cos/sin     : split-half inverse-RoPE tables for the sparse-attn output
+- even_select       : selector matrix to extract even lanes from interleaved RoPE dimension
+- odd_select        : selector matrix to extract odd lanes from interleaved RoPE dimension
 - wo_a              : grouped first-stage output-projection weights from model.py:537-541
 - wo_b / wo_b_scale : grouped second-stage output-projection W8 per-channel weights
```
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@models/deepseek/v4/sparse_attn.py`:
- Around line 271-273: Remove the no-op loop "for r0 in pl.range(ROPE_CHUNK,
HALF_ROPE, ROPE_CHUNK): pass" — it is dead code; delete that loop from the
function (search for the exact statement in models/deepseek/v4/sparse_attn.py)
so the surrounding logic remains unchanged and no-op iterations are eliminated.
---
Nitpick comments:
In `@models/deepseek/v4/sparse_attn.py`:
- Around line 19-27: Update the module/function docstring that lists "Inputs:"
in sparse_attn.py to document the two new selector parameters even_select and
odd_select: specify their names, expected types (e.g., boolean mask or index
tensor), shape/semantics (per-head or per-token selector that chooses even/odd
positions for the sparse attention pipeline), and how they affect the attention
computation (which indices are included/excluded or routed to ori_kv/cmp_kv).
Place these descriptions alongside the existing Input entries (near q,
ori_kv/cmp_kv, cmp_sparse_indices, attn_sink, freqs_cos/sin, wo_a,
wo_b/wo_b_scale) so the docstring fully reflects the current function signature.
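A hedged sketch of what the requested docstring entries might look like; the function name, signature, and stated shapes below are illustrative assumptions, not the actual code in sparse_attn.py:

```python
def sparse_attn_doc_example(q, even_select, odd_select):
    """Decode-time sparse attention (illustrative stub).

    Inputs:
    - q           : query tensor (RoPE already applied)
    - even_select : 0/1 selector matrix, shape (ROPE_DIM, ROPE_DIM // 2),
                    that extracts the even lanes of the interleaved RoPE
                    dimension via a single matmul
    - odd_select  : same shape, extracting the odd lanes
    """
    ...
```

The key points the review asks for are the parameters' type (0/1 selector matrices), shape, and role (even/odd lane extraction from the interleaved RoPE dimension).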
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 5416a047-df9b-4117-a998-b29373e088ba
📒 Files selected for processing (1)
models/deepseek/v4/sparse_attn.py
Code Review
This pull request refactors the Rotary Positional Embedding (RoPE) logic in the sparse_attn module to implement an interleaved layout using new even_select and odd_select tensors. Feedback suggests optimizing the RoPE assembly stage by replacing manual row-by-row interleaving with a matrix multiplication pattern to better leverage hardware units. Additionally, an empty loop leftover from development should be removed.
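The matmul pattern the feedback suggests for the reassembly stage is the mirror image of the extraction: scatter each half back through the transposed selectors and sum. A NumPy sketch under the same illustrative assumptions (one-hot selector matrices, small toy dimension):

```python
import numpy as np

half = 4
D = 2 * half  # interleaved RoPE dimension

# Same one-hot selectors as on the extraction side.
even_select = np.zeros((D, half), dtype=np.float32)
odd_select = np.zeros((D, half), dtype=np.float32)
for i in range(half):
    even_select[2 * i, i] = 1.0
    odd_select[2 * i + 1, i] = 1.0

even = np.array([0.0, 2.0, 4.0, 6.0], dtype=np.float32)
odd = np.array([1.0, 3.0, 5.0, 7.0], dtype=np.float32)

# Reassembly as two matmuls plus an add: each transposed selector
# scatters its half into the even/odd lanes, the other lanes stay zero.
out = even @ even_select.T + odd @ odd_select.T
```

This replaces row-by-row interleaving with operations that map directly onto the matrix units, which is the hardware-friendliness argument the review makes.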
```python
for r0 in pl.range(ROPE_CHUNK, HALF_ROPE, ROPE_CHUNK):
    pass
```
…erse RoPE

- Refactor inverse RoPE to use interleave pattern instead of split-half layout
- Add even_select and odd_select selector matrices to extract even/odd lanes from interleaved RoPE dimension via matmul
- Use pl.load/pl.store with pl.unroll to reassemble even/odd results back into interleaved output layout
- Separate attention, RoPE extraction, RoPE application, and packing stages into distinct loop blocks for clearer orchestration
- Update torch golden to match interleave execution order using unflatten+stack+flatten
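The torch golden's unflatten+stack+flatten ordering can be checked with a small round-trip: unflatten the interleaved dimension into pairs, split even/odd lanes, then stack and flatten to restore the interleaved layout. A NumPy mirror of the pattern (`reshape` plays the role of torch's `unflatten`/`flatten`; the torch golden itself is not reproduced here):

```python
import numpy as np

x = np.arange(8.0)                    # interleaved lanes: e0, o0, e1, o1, ...
pairs = x.reshape(-1, 2)              # "unflatten" the last dim into (half, 2)
even, odd = pairs[:, 0], pairs[:, 1]  # split even/odd lanes

# ... inverse-RoPE rotation would be applied to (even, odd) here ...

# "stack" then "flatten" restores the interleaved execution order.
out = np.stack((even, odd), axis=-1).reshape(-1)
```

With the rotation elided, the round trip is the identity, which is exactly the layout invariant the golden relies on.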
f1e24b1 to 9b8ff7e
Summary
- Add `even_select` and `odd_select` selector matrices to extract even/odd lanes from interleaved RoPE dimension via matmul
- Use `pl.load`/`pl.store` with `pl.unroll` to reassemble even/odd results back into interleaved output layout
- Update torch golden to match interleave execution order using `unflatten`+`stack`+`flatten`

Dependencies: