Add FLASH config support to DSv4 CSA attention#296

Merged
zhangqi-chen merged 1 commit into hw-native-sys:main from sjduan:feat/dsv4-csa-flash-support
May 15, 2026

Conversation

@sjduan
Contributor

@sjduan sjduan commented May 15, 2026

Summary

  • Change attention_csa.py to use FLASH config instead of DEMO
  • Rename compressor to indexer_compressor in indexer_compressor.py
  • Update indexer.py to use indexer_compressor and set IDX_TOPK from config (M.index_topk)
  • Update sparse_attn.py to align with FLASH config parameters
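The config migration in the bullets above can be sketched like this. Only the names M, FLASH, DEMO, and IDX_TOPK come from the PR description; the ModelConfig dataclass and its field values are hypothetical stand-ins:

```python
# Illustrative sketch of the config swap. FLASH, DEMO, M, and IDX_TOPK
# follow the PR description; this ModelConfig class and its values are
# made up for illustration.
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelConfig:
    num_attention_heads: int
    index_topk: int

DEMO = ModelConfig(num_attention_heads=4, index_topk=64)       # small demo shapes
FLASH = ModelConfig(num_attention_heads=128, index_topk=2048)  # full model shapes

M = FLASH                # previously: M = DEMO
IDX_TOPK = M.index_topk  # previously a hardcoded constant
```

Deriving IDX_TOPK from M means every constant downstream tracks whichever config is selected, instead of silently diverging from it.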

Dependencies:

  • pypto: cf8b954a (ci: bump Ascend CANN version from 8.5.0 to 9.0.0 #1374)
  • simpler: a94d5140 (Add: tool-smoke gate inside each DFX scene test #771)
  • pto-isa: 4b0e4c8e ([Bugfix] A5 ttrans and tconcatidx bugfix for Blue Zone Assembly Line)

- Change attention_csa.py to use FLASH config instead of DEMO
- Rename compressor to indexer_compressor in indexer_compressor.py
- Update indexer.py to use indexer_compressor and set IDX_TOPK from config

@coderabbitai

coderabbitai Bot commented May 15, 2026

📝 Walkthrough

The PR consolidates configuration, naming, and computation changes across the DeepSeek v4 attention stack. attention_csa.py switches to FLASH configuration; indexer_compressor.py and indexer.py rename and wire the compressor module with config-driven constants; sparse_attn.py vectorizes head-block attention computation.

Changes

DeepSeek v4 Refactoring and Optimization

  • Configuration migration to FLASH (models/deepseek/v4/attention_csa.py): Module constant M switches from config.DEMO to config.FLASH, propagating FLASH configuration values through all derived compile-time constants and tensor shape calculations.
  • Compressor module refactoring and indexer integration (models/deepseek/v4/indexer_compressor.py, models/deepseek/v4/indexer.py): Compressor kernel renamed from compressor to indexer_compressor with updated imports and call sites; IDX_TOPK now reads M.index_topk from the model config instead of a hardcoded value.
  • Sparse attention head-block vectorization (models/deepseek/v4/sparse_attn.py): Attention projection loop changes from single-head iteration to MATMUL_ROW_PAD-sized head blocks; q_batch/kv_batch creation updated for block parallelism; sink bias computation vectorized with attn_stage_row reshaped from [1, 1] to [MATMUL_ROW_PAD, 1].

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • hw-native-sys/pypto-lib#289: Matches the CSA orchestration and config-driven changes to attention_csa.py and compressor wiring in the indexer pipeline.
  • hw-native-sys/pypto-lib#265: Introduces the indexer-compressor integration path that this PR's compressor renaming and wiring directly builds upon.
  • hw-native-sys/pypto-lib#256: Updates sparse_attn.py fixture and RoPE threading in the same attention computation path affected by this PR's head-block vectorization.

Poem

🐰 A rabbit hops through configs bright,
From DEMO's shade to FLASH's light!
The compressor dons a fancier name,
While heads now dance in vectored flame—
Attention optimized, efficiency gained! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 25.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions that are missing them.

✅ Passed checks (4 passed)

  • Title check: The title accurately and specifically summarizes the main change: adding FLASH config support to DSv4 CSA attention components.
  • Description check: The description is directly related to the changeset, providing clear bullet points of the modifications made across four files to support FLASH configuration.
  • Linked Issues check: Skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: Skipped because no linked issues were found for this pull request.



@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (2)
models/deepseek/v4/sparse_attn.py (1)

179-220: ⚡ Quick win

Guard the head-block tail.

This loop now assumes H is a multiple of MATMUL_ROW_PAD. The q_flat/attn_sink slices and the final assemble are all unconditional h0 : h0 + MATMUL_ROW_PAD, so the last block will go out of bounds if that stops being true.

Suggested guard
 MATMUL_ROW_PAD = 16
+assert H % MATMUL_ROW_PAD == 0, (
+    f"num_attention_heads={H} must be divisible by MATMUL_ROW_PAD={MATMUL_ROW_PAD}"
+)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@models/deepseek/v4/sparse_attn.py` around lines 179 - 220, The loop over h0
assumes H is divisible by MATMUL_ROW_PAD causing out-of-bounds slices on q_flat,
attn_sink and oi_out; modify the loop body in the parallel block that uses
q_flat, kv_topk_batch, attn_sink and oi_out (symbols: H, MATMUL_ROW_PAD, q_flat,
kv_topk_batch, attn_sink, oi_out, attn_stage_row) to compute a tail_len =
min(MATMUL_ROW_PAD, H - h0) and use that length for all slices and reshapes (or
explicitly pad temporary buffers to MATMUL_ROW_PAD and mask results) so the
final cast/assemble only indexes 0:tail_len where needed; ensure all per-block
computations (q_batch, kv_batch, oi, li, mi, sink_bias, oi_out and the final
attn_stage_row write) respect tail_len to avoid OOB accesses.
models/deepseek/v4/indexer.py (1)

35-35: ⚡ Quick win

Assert index_topk fits the score buffer.

Now that IDX_TOPK comes from config, the later top-k path assumes IDX_TOPK <= SCORE_LEN without checking it. A larger value will overrun the sorted-pair slice contract and the fixed [1, IDX_TOPK] scratch shape.

Suggested guard
 IDX_TOPK = M.index_topk
+assert 0 <= IDX_TOPK <= SCORE_LEN, (
+    f"index_topk={IDX_TOPK} must satisfy 0 <= index_topk <= SCORE_LEN={SCORE_LEN}"
+)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@models/deepseek/v4/indexer.py` at line 35, The code sets IDX_TOPK =
M.index_topk but never validates it against the fixed score buffer length,
risking buffer overruns; add a guard where IDX_TOPK is derived (near IDX_TOPK /
M.index_topk) that checks IDX_TOPK <= SCORE_LEN and either clamp it (IDX_TOPK =
min(M.index_topk, SCORE_LEN)) or raise a clear ValueError/Assertion if
M.index_topk > SCORE_LEN, and update any dependent assumptions about the
sorted-pair slice/scratch shape ([1, IDX_TOPK]) accordingly so callers using
IDX_TOPK cannot exceed the SCORE_LEN buffer.
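The validate-or-raise option from the prompt above can be sketched like this. SCORE_LEN's value and the helper name resolve_idx_topk are hypothetical; only IDX_TOPK, M.index_topk, and SCORE_LEN appear in the review:

```python
# Sketch of validating the config-driven top-k against the fixed score
# buffer. SCORE_LEN's value and the helper name are made up.
SCORE_LEN = 4096

def resolve_idx_topk(index_topk: int, score_len: int = SCORE_LEN) -> int:
    """Reject top-k values that would overrun the score buffer."""
    if not 0 <= index_topk <= score_len:
        raise ValueError(
            f"index_topk={index_topk} must satisfy 0 <= index_topk <= {score_len}"
        )
    return index_topk

IDX_TOPK = resolve_idx_topk(2048)
```

Raising at constant-derivation time surfaces a bad config immediately, rather than as an out-of-bounds access deep inside the top-k kernel.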

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 09000a80-d75d-4290-97f7-467ffa40b819

📥 Commits

Reviewing files that changed from the base of the PR and between aee5258 and 4e615d9.

📒 Files selected for processing (4)
  • models/deepseek/v4/attention_csa.py
  • models/deepseek/v4/indexer.py
  • models/deepseek/v4/indexer_compressor.py
  • models/deepseek/v4/sparse_attn.py

@zhangqi-chen zhangqi-chen merged commit 4f8388c into hw-native-sys:main May 15, 2026
6 checks passed
