Merged
17 changes: 17 additions & 0 deletions .agents/evals/minimal_skill_trigger_eval.jsonl
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
{"id":"overview-01","prompt":"Explain the complete compilation pipeline for target=npuir on the npuir branch, from lower to codegen","expected_skill":"tilelang-npuir-overview"}
{"id":"overview-02","prompt":"What is the difference between Developer mode and Expert mode under npuir, and how do I switch between them","expected_skill":"tilelang-npuir-overview"}
{"id":"vector-01","prompt":"Help me write a vector add, using vadd by default rather than npuir_add","expected_skill":"tilelang-vector-skill"}
{"id":"vector-02","prompt":"Implement a vectorized sigmoid using interfaces such as vexp, vadd, and vrec","expected_skill":"tilelang-vector-skill"}
{"id":"cube-01","prompt":"Write a matmul that uses load_nd2nz and store_fixpipe","expected_skill":"tilelang-cube-skill"}
{"id":"cube-02","prompt":"On the Cube path, how are L1 and L0C allocated, and how is GEMM initC controlled","expected_skill":"tilelang-cube-skill"}
{"id":"mixcv-01","prompt":"Implement a mixcv pipeline for flash attention in which the Cube stage and the Vector stage must be synchronized","expected_skill":"tilelang-mixcv-skill"}
{"id":"mixcv-02","prompt":"How should sync_block_set and sync_block_wait be paired in a fused operator","expected_skill":"tilelang-mixcv-skill"}
{"id":"mlir-01","prompt":"At which layer do tilelangir-cv-split and tilelangir-vectorize run, and how do I debug them","expected_skill":"tilelang-mlir-skill"}
{"id":"mlir-02","prompt":"Help me locate which step of the MLIR pass pipeline broke the IR","expected_skill":"tilelang-mlir-skill"}
{"id":"debug-01","prompt":"This npuir kernel produces wrong results; I want to locate the problem with T.print and IR dumps","expected_skill":"tilelang-debug-helper"}
{"id":"debug-02","prompt":"I need to add a GDB attach point to a Python script to locate a crash","expected_skill":"tilelang-debug-helper"}
{"id":"fixer-01","prompt":"Compilation reports an error and a segmentation fault occurs; help me locate it and propose a fix","expected_skill":"tilelang-error-fixer"}
{"id":"review-01","prompt":"Please review this PR, prioritizing behavioral regressions and test gaps","expected_skill":"tilelang-review-skill"}
{"id":"review-02","prompt":"Before I submit, run the format and lint checks for me and assess the CI risk","expected_skill":"tilelang-review-skill"}
{"id":"github-01","prompt":"I am about to open a PR from my branch to npuir; give me the complete commit, push, and PR workflow","expected_skill":"tilelang-github-operations"}
{"id":"github-02","prompt":"Help me sync upstream/npuir, rebase, and then open a PR","expected_skill":"tilelang-github-operations"}
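A minimal sketch of how records in this shape could be scored, assuming each line is a JSON object with `id`, `prompt`, and `expected_skill` keys as above. The `route` callable and the toy `keyword_route` function are hypothetical stand-ins for whatever actually selects a skill; they are not part of this PR.

```python
import json
from collections import defaultdict


def load_cases(lines):
    """Parse JSONL lines into dicts, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def score(cases, route):
    """Return (overall accuracy, per-skill [hits, total]) for a router.

    `route` is a hypothetical callable: prompt text -> predicted skill name.
    """
    per_skill = defaultdict(lambda: [0, 0])
    for case in cases:
        expected = case["expected_skill"]
        per_skill[expected][1] += 1
        if route(case["prompt"]) == expected:
            per_skill[expected][0] += 1
    hits = sum(h for h, _ in per_skill.values())
    total = sum(t for _, t in per_skill.values())
    return (hits / total if total else 0.0), dict(per_skill)


def keyword_route(prompt):
    """Toy keyword router, for illustration only."""
    if "vadd" in prompt or "vexp" in prompt:
        return "tilelang-vector-skill"
    if "load_nd2nz" in prompt or "L0C" in prompt:
        return "tilelang-cube-skill"
    return "tilelang-npuir-overview"


# Three illustrative records in the same shape as the eval file.
SAMPLE = [
    '{"id":"vector-01","prompt":"write a vector add with vadd","expected_skill":"tilelang-vector-skill"}',
    '{"id":"cube-01","prompt":"matmul with load_nd2nz and store_fixpipe","expected_skill":"tilelang-cube-skill"}',
    '{"id":"mixcv-02","prompt":"pair sync_block_set and sync_block_wait","expected_skill":"tilelang-mixcv-skill"}',
]

accuracy, per_skill = score(load_cases(SAMPLE), keyword_route)
```

Reporting hits per expected skill, rather than only one overall number, makes it easier to see which skill description fails to trigger.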
63 changes: 63 additions & 0 deletions .agents/skills/tilelang-cube-skill/SKILL.md
@@ -0,0 +1,63 @@
---
name: tilelang-cube-skill
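description: TileLang npuir Cube operator development guide. This skill must be used when the user mentions GEMM, matmul, batch gemm, L1/L0C, load_nd2nz, store_fixpipe, NZ format, Cube scope, or matrix tiling and pipeline optimization.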
---

# TileLang Cube Skill

## Mandatory routing rule

Before answering, follow AGENTS.md section "Docs Auto Routing Rules (Mandatory)".

## Operator baseline rule (Mandatory)

- Before writing a new cube operator, first check examples/ and testing/npuir/.
- Prefer adapting an existing operator case rather than writing from scratch.

## Primary use cases

- matmul and batched matmul kernels
- cube-heavy stages in mixed kernels
- explicit L1 and L0C memory usage

## Core APIs

- T.alloc_shared (Developer mode)
- T.alloc_L1 (Expert mode only)
- T.alloc_L0C (Expert mode only)
- T.load_nd2nz (Expert mode only)
- T.gemm
- T.store_fixpipe (Expert mode only)

## Minimal flow

1. Partition blocks for M and N
2. Load global tiles with load_nd2nz in Expert mode or T.copy in Developer mode
3. Accumulate with T.gemm, with initC controlled by the k-loop (initC=(k==0), so the first K tile initializes the accumulator)
4. Store outputs with store_fixpipe in Expert mode or T.copy in Developer mode

## NZ format rule

- NZ format path is Expert mode only.
- In Developer mode kernels, keep ND layout and use T.copy-based data movement.

## References

- references/api-cube.md
- references/examples-matmul.md
- references/nz-format.md

## Official docs to consult

- docs/Tilelang.language/内存操作/T.alloc_shared.md
- docs/Tilelang.language/线性代数操作/T.gemm.md
- docs/Tilelang.language/内存操作/T.load_nd2nz.md
- docs/Tilelang.language/内存操作/T.store_fixpipe.md
- docs/Tilelang.language/内存操作/T.alloc_L1.md
- docs/Tilelang.language/内存操作/T.alloc_L0C.md

## Related skills

- tilelang-vector-skill
- tilelang-mixcv-skill
- tilelang-debug-helper
31 changes: 31 additions & 0 deletions .agents/skills/tilelang-cube-skill/references/api-cube.md
@@ -0,0 +1,31 @@
# Cube API Guide

## Memory

- T.alloc_shared(shape, dtype) (Developer mode)
- T.alloc_L1(shape, dtype) (Expert mode only)
- T.alloc_L0C(shape, accum_dtype) (Expert mode only)

## Data movement

- T.copy(src, dst)
- T.load_nd2nz(src, dst, size) (Expert mode only)

## Compute

- T.gemm(A, B, C, initC=True or False, b_transpose=True or False, size=[M, K, N])

## Store path

- T.copy(C_buf, C_out)
- T.store_fixpipe(C_buf, C_out, size=[M, N], enable_nz2nd=True) (Expert mode only)

## Scope recommendation

Use an explicit T.Scope("Cube") for cube sections in Expert mode kernels.
Do not use an explicit T.Scope for cube sections in Developer mode kernels.

## Mode guidance

- Expert mode: ND -> NZ (load_nd2nz), cube compute, NZ -> ND (store_fixpipe).
- Developer mode: Keep ND tensors and use T.copy without explicit NZ conversion.
14 changes: 14 additions & 0 deletions .agents/skills/tilelang-cube-skill/references/examples-matmul.md
@@ -0,0 +1,14 @@
# Matmul Pattern

## Standard K-loop accumulation

- for each k tile
  - load A and B into L1
  - gemm into L0C with initC=(k==0)
- store once the final tile is complete
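The accumulation pattern above can be modeled in plain Python. This is a sketch of the control flow only, not TileLang code: the `gemm` function is a hypothetical stand-in for T.gemm, the nested lists stand in for L1 and L0C buffers, and the accumulator is pre-filled with NaN to show that `init_c=True` on the first K tile overwrites whatever was there.

```python
def gemm(a, b, c, init_c):
    """Stand-in for a tile matmul: C = A @ B if init_c else C += A @ B."""
    m, k, n = len(a), len(b), len(b[0])
    for i in range(m):
        for j in range(n):
            acc = 0.0 if init_c else c[i][j]
            for p in range(k):
                acc += a[i][p] * b[p][j]
            c[i][j] = acc


def tiled_matmul(A, B, block_k):
    """Walk K in tiles, initializing the accumulator only on the first tile."""
    M, K, N = len(A), len(B), len(B[0])
    # Uninitialized accumulator (modeled as NaN); never read when init_c is True.
    c = [[float("nan")] * N for _ in range(M)]
    for k0 in range(0, K, block_k):
        a_tile = [row[k0:k0 + block_k] for row in A]  # "load A"
        b_tile = B[k0:k0 + block_k]                   # "load B"
        gemm(a_tile, b_tile, c, init_c=(k0 == 0))
    return c  # "store" once the last K tile has been accumulated


A = [[1, 2, 3, 4], [5, 6, 7, 8]]
B = [[1, 0], [0, 1], [1, 0], [0, 1]]
result = tiled_matmul(A, B, block_k=2)
```

If `init_c` were left False on the first tile, the NaN placeholder would propagate into every output element, which is the kind of failure a comparison against a torch reference would catch.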

## Practical notes

- Keep K tile size aligned with target constraints
- Validate transpose configuration for B path
- Validate numerical tolerance with torch reference
24 changes: 24 additions & 0 deletions .agents/skills/tilelang-cube-skill/references/nz-format.md
@@ -0,0 +1,24 @@
# NZ Format Notes

NZ format path is intended for Expert mode kernels only.

## Why NZ path matters

Cube compute often benefits from ND to NZ layout conversion for compute-friendly access.

## Typical path

- load_nd2nz for input tiles (Expert mode)
- gemm in cube path
- store_fixpipe with enable_nz2nd=True for output conversion (Expert mode)
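To make the ND/NZ distinction concrete, here is a pure-Python sketch of a blocked layout round trip. It assumes the commonly described fractal arrangement in which an (M, N) matrix is regrouped as (N1, M1, M0, N0) blocks; the exact layout and block sizes used by load_nd2nz and store_fixpipe are defined by the official docs, and the 2x2 block size here is chosen only to keep the example small.

```python
def nd_to_nz(nd, m0, n0):
    """Regroup a row-major (M, N) matrix into (N1, M1, M0, N0) blocks.

    Assumed layout for illustration: block columns are outermost, and
    each m0 x n0 block is stored row-major.
    """
    rows, cols = len(nd), len(nd[0])
    if rows % m0 or cols % n0:
        raise ValueError("shape must be a multiple of the block size")
    return [
        [
            [[nd[m1 * m0 + i][n1 * n0 + j] for j in range(n0)] for i in range(m0)]
            for m1 in range(rows // m0)
        ]
        for n1 in range(cols // n0)
    ]


def nz_to_nd(nz, m0, n0):
    """Inverse of nd_to_nz: scatter blocks back into a row-major matrix."""
    n1_count, m1_count = len(nz), len(nz[0])
    nd = [[None] * (n1_count * n0) for _ in range(m1_count * m0)]
    for n1, column_of_blocks in enumerate(nz):
        for m1, block in enumerate(column_of_blocks):
            for i in range(m0):
                for j in range(n0):
                    nd[m1 * m0 + i][n1 * n0 + j] = block[i][j]
    return nd


nd = [[r * 4 + c for c in range(4)] for r in range(4)]
nz = nd_to_nz(nd, 2, 2)
round_trip = nz_to_nd(nz, 2, 2)
```

A round trip like this is a cheap way to check layout assumptions before comparing kernel output against a reference.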

## Developer mode note

- Developer mode kernels should keep ND layout.
- Use T.copy/T.alloc_shared path and do not force NZ conversion.

## Validation checklist

- check tile size consistency across load, gemm, and store
- check transpose setting and layout assumptions
- compare outputs with reference implementation
37 changes: 37 additions & 0 deletions .agents/skills/tilelang-debug-helper/SKILL.md
@@ -0,0 +1,37 @@
---
name: tilelang-debug-helper
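description: TileLang npuir debugging helper skill. This skill must be used when the user mentions debugging an npuir kernel, GDB attach, IR dump, locating precision anomalies, locating compile failures, locating the failing pass stage, T.print debugging, or reducing to a minimal reproduction.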
---

# TileLang Debug Helper (npuir)

## Mandatory routing rule

Before answering, follow AGENTS.md section "Docs Auto Routing Rules (Mandatory)".

## Debug workflow

1. Reproduce with minimal script
2. Add process attach window if native debug is needed
3. Capture IR snapshots around transformation boundaries
4. Narrow down failing pass or API misuse

## For API debugging

- First verify v-prefix API usage
- Then verify alias compatibility if legacy npuir_xxx appears

## References

- references/mlir-dump-guide.md

## Official docs to consult

- docs/Tilelang算子调试指南.md
- docs/Tilelang.language/调试操作/T.print.md
- docs/developer/EnvironmentVariables.md

## Related skills

- tilelang-mlir-skill
- tilelang-error-fixer
13 changes: 13 additions & 0 deletions .agents/skills/tilelang-debug-helper/references/mlir-dump-guide.md
@@ -0,0 +1,13 @@
# IR Dump Guide for npuir branch

## Strategy

- capture IR before and after major pass stages
- compare operation-level diffs
- correlate with failing runtime behavior

## Common checkpoints

- after lower entry
- after tilelangir pass application
- before backend codegen
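
## Example: diffing two IR snapshots

A minimal sketch of the "compare operation-level diffs" step, using only Python's standard `difflib`. It assumes the two snapshots are already available as text; how they are dumped is covered by the official docs, not here. The IR strings below are invented for illustration and are not real compiler output.

```python
import difflib


def ir_diff(before, after, before_name="before", after_name="after"):
    """Return unified-diff lines between two IR text snapshots."""
    return list(
        difflib.unified_diff(
            before.splitlines(),
            after.splitlines(),
            fromfile=before_name,
            tofile=after_name,
            lineterm="",
            n=0,  # no context lines: show only what changed
        )
    )


def changed_ops(diff_lines):
    """Keep added/removed lines, skipping the two file headers and @@ hunks."""
    body = diff_lines[2:]  # first two lines are the ---/+++ headers
    return [line for line in body if line.startswith(("+", "-"))]


# Invented IR text, for illustration only.
BEFORE = """%0 = arith.addf %a, %b : f32
%1 = arith.mulf %0, %c : f32
return %1 : f32"""

AFTER = """%0 = arith.addf %a, %b : f32
%1 = arith.addf %0, %c : f32
return %1 : f32"""

changes = changed_ops(ir_diff(BEFORE, AFTER, "after-lower", "after-passes"))
```

Running the same comparison between each pair of adjacent checkpoints narrows the search to the first stage whose diff contains the unexpected operation.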
62 changes: 62 additions & 0 deletions .agents/skills/tilelang-error-fixer/SKILL.md
@@ -0,0 +1,62 @@
---
name: tilelang-error-fixer
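description: TileLang npuir error diagnosis and repair skill. This skill must be used when the user mentions compile failures, runtime errors, pass exceptions, incorrect results, performance regressions, core dumps, segmentation faults, BishengIR compile errors, sync deadlocks, or load/store dimension mismatches.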
---

# TileLang Error Fixer (npuir)

## Mandatory routing rule

Before answering, follow AGENTS.md section "Docs Auto Routing Rules (Mandatory)".

## Scope

- compile errors in npuir path
- runtime failures and invalid results
- pass pipeline divergence
- performance regressions

## Diagnosis workflow

1. Confirm environment and target setting
2. Reproduce with smallest kernel
3. Classify issue type: compile, runtime, pass, precision, performance
4. Capture evidence: logs, IR snapshot, failing stage
5. Propose minimal patch and validate

## NPUIR-specific checks

- verify default vector API style uses v-prefix ops
- verify alias callsites are semantically equivalent
- verify load_nd2nz and store_fixpipe size/layout consistency
- verify sync_block_set and sync_block_wait pairing
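
The pairing check in the last item can be approximated with a simple balance count. This sketch assumes a hypothetical trace format, a list of `(op_name, flag_id)` tuples collected by hand from the kernel source; the real arguments of sync_block_set and sync_block_wait are not specified here, so `flag_id` is an illustrative placeholder.

```python
from collections import Counter


def unpaired_sync_flags(events):
    """Return {flag_id: imbalance} for flags whose set/wait counts differ.

    A positive value means more sets than waits; negative means more waits
    than sets, which is a typical shape for a deadlock.
    """
    balance = Counter()
    for op, flag in events:
        if op == "sync_block_set":
            balance[flag] += 1
        elif op == "sync_block_wait":
            balance[flag] -= 1
    return {flag: n for flag, n in balance.items() if n != 0}


# Hypothetical trace: flag 0 is paired, flag 1 is set but never waited on,
# flag 2 is waited on but never set.
trace = [
    ("sync_block_set", 0),
    ("sync_block_wait", 0),
    ("sync_block_set", 1),
    ("sync_block_wait", 2),
]
problems = unpaired_sync_flags(trace)
```

A count-based check does not prove the ordering is correct, but it flags the most common mistake quickly.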

## Official docs to consult

- docs/Tilelang算子调试指南.md
- docs/开发指南.md
- docs/developer/EnvironmentVariables.md
- docs/Tilelang.language/内存操作/T.load_nd2nz.md
- docs/Tilelang.language/内存操作/T.store_fixpipe.md

## Output template

## TileLang JIT Issue Report

### Summary
- Symptom:
- Repro script:
- Impact:

### Root Cause
- Layer: frontend or pass or codegen or runtime
- Fault pattern:

### Fix
- Minimal change:
- Why this fixes it:

### Verification
- Repro after fix:
- Numerical check:
- Regression risk:
40 changes: 40 additions & 0 deletions .agents/skills/tilelang-github-operations/SKILL.md
@@ -0,0 +1,40 @@
---
name: tilelang-github-operations
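description: GitHub workflow skill for the TileLang npuir branch. This skill must be used when the user mentions commit, push, PR, rebase, upstream, issue, GitHub Actions, gh CLI, or branch synchronization. By default it follows the npuir branch collaboration conventions and reminds the user to prefix Issue titles with [AscendNPU-IR] or [npuir].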
---

# TileLang GitHub Operations Skill

## Mandatory routing rule

Before answering, follow AGENTS.md section "Docs Auto Routing Rules (Mandatory)".

## Scope

- branch sync and rebase workflow for npuir
- commit and push sequence
- pull request creation and readiness checks
- issue and PR metadata conventions
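
The title conventions used by this skill (Issue titles prefixed with [AscendNPU-IR] or [npuir]; PR titles starting with [AscendNPU-IR], per references/pr-workflow.txt) can be checked mechanically. A small sketch follows; the function name and the `kind` parameter are illustrative, not an existing tool in the repo.

```python
ISSUE_PREFIXES = ("[AscendNPU-IR]", "[npuir]")
PR_PREFIXES = ("[AscendNPU-IR]",)


def title_ok(title, kind):
    """Return True if `title` starts with an allowed prefix.

    `kind` is "issue" or "pr"; any other value is rejected.
    """
    if kind == "issue":
        return title.startswith(ISSUE_PREFIXES)
    if kind == "pr":
        return title.startswith(PR_PREFIXES)
    raise ValueError(f"unknown kind: {kind!r}")


checks = [
    title_ok("[npuir] segfault in cube kernel", "issue"),
    title_ok("[AscendNPU-IR] add agent skills", "pr"),
    title_ok("[npuir] add agent skills", "pr"),
]
```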

## Workflow baseline

1. Sync with upstream npuir
2. Run pre-PR format validation from repo root: bash format.sh --files changed_files
3. Commit focused changes
4. Push branch and create PR
5. Verify CI status and address feedback

## Docs to consult first

- docs/Tilelang-Ascend贡献指南.md
- docs/developer/EnvironmentVariables.md

## References

- references/pr-workflow.txt
- references/issue-template.txt

## Related skills

- tilelang-review-skill
- tilelang-error-fixer
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
Issue template hints for npuir

Title:
- Prefix with [npuir] or [AscendNPU-IR]

Body:
- Environment and branch
- Minimal repro script
- Expected vs actual behavior
- Logs or IR snippet
- Impact scope and urgency
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
PR workflow for npuir

1) Branch sync
- git fetch upstream
- git checkout <feature-branch>
- git rebase upstream/npuir

2) Local validation
- run focused tests for changed area
- run formatting/lint checks if needed
- run mandatory format validation from repo root:
bash format.sh --files changed_files

3) Commit and push
- write clear commit message
- git push origin <feature-branch>

4) PR creation
- target branch: npuir
- PR title must start with [AscendNPU-IR]
- include repro, scope, and risk notes
- track CI and update quickly on failures