[Add] Add operator code and skill modifications for seer_attention and block_sparse_attn. #983
ArmandAlbert wants to merge 4 commits into tile-ai:ascendc_pto from …
Conversation
Add operator code and skill modifications for seer_attention and block_sparse_attn.
👋 Hi! Thank you for contributing to the TileLang project. Please remember to run […]. We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀
Code Review
This pull request updates the TileLang-Ascend operator design documentation with new constraints regarding GEMM non-divisible shapes and L0C capacity limits, while also adding a block-sparse attention example. The review identified several critical issues: a potential out-of-bounds access in the new attention kernel when dimensions are not multiples of the block size, logic errors in the documentation regarding zero-block conditions, and inconsistencies in the mandatory checklist for operator design. Additionally, it was suggested to replace the use of infinity with a large concrete negative value for better NPU compiler compatibility.
```python
T.tile.div(acc_o[h_i, :], acc_o[h_i, :], sumexp[h_i])
# …
T.copy(acc_o, acc_o_half)
T.copy(acc_o_half, Output[bz, by, bx * block_M + vid * block_M // 2 : bx * block_M + vid * block_M // 2 + block_M // 2, :])
```
This T.copy operation poses an out-of-bounds (OOB) risk if seq_q is not a multiple of block_M. For example, if seq_q=100 and block_M=64, the second block will attempt to write to indices 64:128, exceeding the tensor boundary. This violates the rule added in SKILL.md (Item 14) requiring explicit handling of non-divisible shapes. You should either implement a tail-handling strategy (like using T.min for the slice end or validRow logic) or ensure the input is padded on the host side.
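A minimal sketch of the `T.min` tail-handling option, assuming the kernel can use computed slice bounds here; `row_start`, `row_end`, and `valid_rows` are illustrative names, not identifiers from the PR:

```python
# Sketch only: clamp the write window so the last (partial) block of rows
# never runs past seq_q when seq_q % block_M != 0.
row_start = bx * block_M + vid * block_M // 2
row_end = T.min(row_start + block_M // 2, seq_q)  # clamp slice end to the tensor bound
valid_rows = row_end - row_start                  # shorter than block_M // 2 only for the tail block
T.copy(acc_o_half[0:valid_rows, :], Output[bz, by, row_start:row_end, :])
```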
```diff
 | 16 | **No global-variable dependencies in the function**: dimension parameters come from tensor shapes or function arguments, supporting sequential multi-scenario testing | ⭕ Recommended |

-**Pass criteria**: all mandatory items (1, 2, 3, 7, 8, 9) pass, and at least 3/4 of the recommended items (4, 5, 6, 10) pass.
+**Pass criteria**: all mandatory items (1, 2, 3, 7, 8, 9, 14) pass, and at least 4/9 of the recommended items pass.
```
The updated pass criteria include the newly added item 14 but still omit item 13 (which is marked as mandatory on line 304). Please update the pass criteria to include all required checks.
Suggested change:
```diff
-**Pass criteria**: all mandatory items (1, 2, 3, 7, 8, 9, 14) pass, and at least 4/9 of the recommended items pass.
+**Pass criteria**: all mandatory items (1, 2, 3, 7, 8, 9, 13, 14) pass, and at least 4/9 of the recommended items pass.
```
```python
# === Vector: Online softmax + output accumulation ===
T.tile.fill(acc_o, 0.0)
T.tile.fill(sumexp, 0.0)
T.tile.fill(m_i, -T.infinity(accum_dtype))
```
Using T.infinity might lead to compatibility issues or unexpected behavior on some NPU compiler versions. It is safer and more consistent with other examples in this repository (e.g., flash_attn_bhsd_cc_sync.py) to use a large concrete negative value like -2**30 for masking purposes.
Suggested change:
```diff
-T.tile.fill(m_i, -T.infinity(accum_dtype))
+T.tile.fill(m_i, -2**30)
```
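As a quick sanity check (plain NumPy, not part of the PR): any initial row max below float32's exp underflow threshold (about -87) already rescales the old accumulators by exactly 0, so `-2**30` is interchangeable with `-inf` in the online-softmax update:

```python
import numpy as np

score = np.float32(3.5)  # an arbitrary first attention score for the row
for m_init in (np.float32(-np.inf), np.float32(-(2 ** 30))):
    m_new = max(m_init, score)        # updated running row max
    rescale = np.exp(m_init - m_new)  # factor applied to the old acc_o / sumexp
    print(m_init, "->", rescale)      # prints 0.0 for both initial values
```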
/re-test
🔄 Re-running failed jobs. Original workflow run: View details. Only the failed jobs will be re-executed.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Add operator code and skill modifications for seer_attention and block_sparse_attn.