
[Add] Add operator code and skill modifications for seer_attention and block_sparse_attn. #983

Open
ArmandAlbert wants to merge 4 commits into tile-ai:ascendc_pto from ArmandAlbert:seer_attn

Conversation

@ArmandAlbert
Collaborator

Add operator code and skill modifications for seer_attention and block_sparse_attn.

@github-actions

github-actions Bot commented May 8, 2026

👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run bash format.sh in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work!

🚀

Contributor

@gemini-code-assist Bot left a comment


Code Review

This pull request updates the TileLang-Ascend operator design documentation with new constraints regarding GEMM non-divisible shapes and L0C capacity limits, while also adding a block-sparse attention example. The review identified several critical issues: a potential out-of-bounds access in the new attention kernel when dimensions are not multiples of the block size, logic errors in the documentation regarding zero-block conditions, and inconsistencies in the mandatory checklist for operator design. Additionally, it was suggested to replace the use of infinity with a large concrete negative value for better NPU compiler compatibility.

T.tile.div(acc_o[h_i, :], acc_o[h_i, :], sumexp[h_i])

T.copy(acc_o, acc_o_half)
T.copy(acc_o_half, Output[bz, by, bx * block_M + vid * block_M // 2 : bx * block_M + vid * block_M // 2 + block_M // 2, :])
Contributor


Severity: high

This T.copy operation poses an out-of-bounds (OOB) risk if seq_q is not a multiple of block_M. For example, if seq_q=100 and block_M=64, the second block will attempt to write to indices 64:128, exceeding the tensor boundary. This violates the rule added in SKILL.md (Item 14) requiring explicit handling of non-divisible shapes. You should either implement a tail-handling strategy (like using T.min for the slice end or validRow logic) or ensure the input is padded on the host side.
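The clamping idea can be sketched in plain NumPy (illustrative only; `copy_block_clamped` is a hypothetical helper, not a TileLang API): the slice end is capped at the tensor boundary so the last, partial block never writes out of bounds.

```python
import numpy as np

def copy_block_clamped(src, dst, row_start, block_M):
    """Copy one block of rows into dst, clamping the slice end so the
    tail block never writes past the destination boundary."""
    # Tail handling: cap the end index at the destination's row count.
    row_end = min(row_start + block_M, dst.shape[0])
    valid_rows = row_end - row_start  # may be < block_M on the last block
    dst[row_start:row_end, :] = src[:valid_rows, :]
    return valid_rows

# seq_q = 100, block_M = 64: the second block writes rows 64:100, not 64:128.
dst = np.zeros((100, 8))
src = np.ones((64, 8))
assert copy_block_clamped(src, dst, 64, 64) == 36
assert dst[99, 0] == 1.0 and dst[63, 0] == 0.0
```

The same effect can be achieved by padding `seq_q` up to a multiple of `block_M` on the host side, at the cost of extra memory traffic.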

Comment thread: .agents/skills/tilelang-op-design/SKILL.md (Outdated)
| 16 | **No global-variable dependencies in functions**: dimension parameters are obtained from tensor shapes or function arguments, supporting sequential testing across multiple scenarios | ⭕ Recommended |

**Pass criteria**: all mandatory items (1, 2, 3, 7, 8, 9) must pass; at least 3/4 of the recommended items (4, 5, 6, 10) must pass
**Pass criteria**: all mandatory items (1, 2, 3, 7, 8, 9, 14) must pass; at least 4/9 of the recommended items must pass
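Item 16 above can be illustrated with a minimal sketch (hypothetical function name, plain NumPy rather than TileLang): deriving dimensions from the tensor shapes on every call, instead of reading module-level globals, keeps back-to-back tests with different shapes independent.

```python
import numpy as np

# No module-level SEQ_Q / DIM globals: each call reads the shape it is
# given, so sequential multi-scenario tests cannot leak state.
def attention_shapes(q):
    batch, heads, seq_q, dim = q.shape  # derived per call
    return batch, heads, seq_q, dim

assert attention_shapes(np.zeros((2, 4, 100, 64))) == (2, 4, 100, 64)
assert attention_shapes(np.zeros((1, 8, 128, 32))) == (1, 8, 128, 32)
```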
Contributor


Severity: medium

The list of mandatory items is missing item 14 (which was just added) and item 13 (which is marked as mandatory on line 304). Please update the pass criteria to include all required checks.

Suggested change
**Pass criteria**: all mandatory items (1, 2, 3, 7, 8, 9, 14) must pass; at least 4/9 of the recommended items must pass.
**Pass criteria**: all mandatory items (1, 2, 3, 7, 8, 9, 13, 14) must pass; at least 4/9 of the recommended items must pass.

# === Vector: Online softmax + output accumulation ===
T.tile.fill(acc_o, 0.0)
T.tile.fill(sumexp, 0.0)
T.tile.fill(m_i, -T.infinity(accum_dtype))
Contributor


Severity: medium

Using T.infinity might lead to compatibility issues or unexpected behavior on some NPU compiler versions. It is safer and more consistent with other examples in this repository (e.g., flash_attn_bhsd_cc_sync.py) to use a large concrete negative value like -2**30 for masking purposes.

Suggested change
T.tile.fill(m_i, -T.infinity(accum_dtype))
T.tile.fill(m_i, -2**30)
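A quick NumPy check (illustrative only, not TileLang code) of why a large concrete negative value behaves like negative infinity under softmax: `exp` of the shifted masked score underflows to exactly 0.0 in floating point, so the masked positions contribute nothing either way.

```python
import numpy as np

def softmax(x):
    m = x.max()
    e = np.exp(x - m)
    return e / e.sum()

scores = np.array([1.0, 2.0, 3.0, 4.0])
mask = np.array([True, True, False, True])  # False = masked out

with_inf = softmax(np.where(mask, scores, -np.inf))
with_big = softmax(np.where(mask, scores, -2.0**30))

# exp(-2**30 - max) underflows to 0.0, matching the -inf result exactly.
assert np.allclose(with_inf, with_big)
assert with_big[2] == 0.0
```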

@ArmandAlbert
Collaborator Author

/re-test

@github-actions

github-actions Bot commented May 8, 2026

🔄 Re-running failed jobs

Original workflow run: View details

Only the failed jobs will be re-executed.

ArmandAlbert and others added 2 commits May 8, 2026 14:16
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>