[skill] add skill-journal & tilelang-skill-review feedback loop #998
Open
hedi515 wants to merge 9 commits into tile-ai:ascendc_pto from hedi515:ascendc_pto
cd8d3a8 [skill] add skill-journal & tilelang-skill-review feedback loop (hedi515)
6413314 Merge branch 'tile-ai:ascendc_pto' into ascendc_pto (hedi515)
560cbdc Add op orchestrator and subagents for end-to-end op development (hedi515)
5a5bfdf Document state_transition as manual Read/Write actions in orchestrator (hedi515)
a914680 Make Stage 4 perf tuning opt-in via user confirmation after precision… (hedi515)
605b77f Add README for agent system entry and orchestration details (hedi515)
db49504 Consolidate kernel/golden/test into single example_{op}.py file (hedi515)
eecf9ec Merge Stage 3 precision fix into Stage 2 and respect user-provided te… (hedi515)
db8e276 Add process-end reflection in orchestrator and clean stage-numbering … (hedi515)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,165 @@ | ||
| # Skill Journal | ||
|
|
||
| Stores the **skill improvement feedback** collected during each operator development run. | ||
|
|
||
| Overall flow: | ||
|
|
||
| ``` | ||
| op-generate finishes ──► write journal/{op}-{timestamp}.md (entries with status=pending) | ||
| │ | ||
| ▼ | ||
| Developer runs /tilelang-skill-review | ||
| │ | ||
| ▼ | ||
| Aggregate all pending entries ──► print table + reviews/review-{date}.md | ||
| │ | ||
| ▼ | ||
| Developer selects on the command line: apply 1,3,5 / reject 2,4 | ||
| │ | ||
| ▼ | ||
| review skill edits the corresponding SKILL.md and updates entry.status | ||
| ``` | ||
|
|
||
| ## Directory structure | ||
|
|
||
| ``` | ||
| .agents/skill-journal/ | ||
| ├── README.md # this file | ||
| ├── {op}-{timestamp}.md # one journal file per operator development run | ||
| └── reviews/ # review snapshots (created automatically) | ||
| └── review-{date}.md | ||
| ``` | ||
|
|
||
| ## Journal file format | ||
|
|
||
| Each journal file consists of frontmatter plus one or more entries. Entry status is maintained inside the entry itself; **the file itself is never moved**. | ||
|
|
||
| ```markdown | ||
| --- | ||
| op: softmax | ||
| created: 2026-05-09T14:30:00 | ||
| skills_consulted: | ||
| - tilelang-op-design | ||
| - tilelang-op-generate | ||
| - tilelang-custom-skill/tilelang-api-best-practices | ||
| - tilelang-custom-skill/tilelang-expert-to-developer | ||
| --- | ||
|
|
||
| # Skill Feedback - softmax | ||
|
|
||
| ## Entry e1 | ||
| - **target_skill**: tilelang-op-design | ||
| - **target_section**: §2.5.1 Known limitations | ||
| - **type**: missing_constraint | ||
| - **severity**: high | ||
| - **status**: pending | ||
|
|
||
| **Observation**: | ||
| The design stage never stated the total size limit for T.alloc_ub; only running the example revealed that UB is just 192KB. | ||
|
|
||
| **Evidence**: | ||
| Error `Memory allocation failed required: 245760`. It passed after cutting block_M from 128 to 64. | ||
|
|
||
| **Proposed change**: | ||
| Add a row to the §2.5.1 table, "UB capacity limit 192KB / total buffer size per block must not exceed it", and give the fix. | ||
|
|
||
| --- | ||
|
|
||
| ## Entry e2 | ||
| - **target_skill**: tilelang-custom-skill/tilelang-api-best-practices | ||
| - **target_section**: T.tile.broadcast usage | ||
| - **type**: outdated_example | ||
| - **severity**: medium | ||
| - **status**: pending | ||
|
|
||
| ... | ||
| ``` | ||
|
|
||
| ### Manual files filled in by developers (`manual-{YYYYMMDD}.md`) | ||
|
|
||
| Written by `/tilelang-skill-review add`; entries from the same day are appended to the same file, with a simplified frontmatter: | ||
|
|
||
| ```markdown | ||
| --- | ||
| created: 2026-05-11T16:23:00 | ||
| source: developer | ||
| --- | ||
|
|
||
| # Manual Skill Feedback - 2026-05-11 | ||
|
|
||
| ## Entry e1 | ||
| - **target_skill**: tilelang-custom-skill/tilelang-api-best-practices | ||
| - **target_section**: T.gemm_v0 usage example | ||
| - **type**: wrong_api_signature | ||
| - **severity**: high | ||
| - **status**: pending | ||
| - **source**: developer | ||
|
|
||
| **Observation**: ... | ||
| **Evidence**: ... | ||
| **Proposed change**: ... | ||
|
|
||
| --- | ||
| ``` | ||
|
|
||
| ## Field reference | ||
|
|
||
| ### Frontmatter | ||
|
|
||
| | Field | Description | | ||
| |------|------| | ||
| | `op` | Operator name (lowercase with underscores, e.g. `softmax`, `flash_attention`) | | ||
| | `created` | ISO8601 timestamp | | ||
| | `skills_consulted` | List of all skill paths **actually consulted** during this run, relative to `.agents/skills/` | | ||
|
|
||
| ### Entry fields | ||
|
|
||
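| | Field | Description | | ||
| |------|------| | ||
| | `target_skill` | Target skill path (relative to `.agents/skills/`); may point to any skill | | ||
| | `target_section` | Target section (use `§N.N` or the specific subsection title) | | ||
| | `type` | See "Type vocabulary" below | | ||
| | `severity` | `high` / `medium` / `low` | | ||
| | `status` | `pending` / `applied` / `rejected`, maintained by the review skill | | ||
| | `source` | `agent` / `developer`, defaulting to `agent`. `developer` marks manual feedback added via `/tilelang-skill-review add` | | ||
| | `observation` | One-sentence description of the problem found | | ||
| | `evidence` | Concrete evidence: error message / actual code / debugging process | | ||
| | `proposed_change` | Concrete change proposal (one or two sentences) that lets the reviewer judge directly whether the change is worthwhile | | ||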
|
|
||
| ### Type vocabulary | ||
|
|
||
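| | type | Meaning | Example | | ||
| |------|------|------| | ||
| | `missing_constraint` | Hard constraint the skill does not mention | UB capacity, alignment requirements, unsupported parameter combinations | | ||
| | `wrong_api_signature` | API signature or parameter description does not match reality | `T.gemm_v0` parameter order is wrong | | ||
| | `outdated_example` | Example code no longer runs or is not the best practice | Wrong shape in the broadcast indexing example | | ||
| | `missing_api_doc` | API not mentioned at all | `T.tile.exp` is not documented | | ||
| | `unclear_workflow` | Workflow step is vague or misses a check | Mandatory "search examples/ first" order is not stated | | ||
| | `mode_misjudgment` | Misleading guidance on programming-mode selection | Claims a mixed operator can use Developer single mode | | ||
| | `pass_config_gap` | Incomplete pass_configs documentation | Does not mention that `AUTO_CV_SYNC` must be paired with `AUTO_CV_COMBINE` | | ||
| | `other` | None of the above | | | ||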
|
|
||
| ### Severity criteria | ||
|
|
||
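| | severity | Criterion | | ||
| |----------|------| | ||
| | `high` | Left unfixed, later operator development will hit the same pitfall, or compilation/execution fails | | ||
| | `medium` | Left unfixed, generated code will not follow best practice | | ||
| | `low` | Wording improvements, additional examples | | ||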
|
|
||
| ## Naming conventions | ||
|
|
||
| Journal files use a naming pattern that depends on their source: | ||
|
|
||
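| | Pattern | Source | Description | | ||
| |------|------|------| | ||
| | `{op}-{YYYYMMDD-HHMMSS}.md` | agent (automatic reflection in op-generate §6) | One file per operator development run; the timestamp has second precision to avoid collisions; frontmatter contains `op` and `skills_consulted` | | ||
| | `manual-{YYYYMMDD}.md` | developer (`/tilelang-skill-review add`) | **Same-day feedback is appended** to the same file; frontmatter contains `source: developer` and has no `op` / `skills_consulted` fields | | ||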
|
|
||
| Timestamps use local time; they only need to match the ISO8601 time in the frontmatter. | ||
|
|
||
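The two naming patterns above can be sketched in a few lines of Python. This is an illustrative sketch only: the helper names `agent_journal_name` and `manual_journal_name` are hypothetical and do not appear in the PR.

```python
from datetime import datetime


def agent_journal_name(op: str, now: datetime) -> str:
    """Hypothetical helper: {op}-{YYYYMMDD-HHMMSS}.md for agent-written journals."""
    return f"{op}-{now.strftime('%Y%m%d-%H%M%S')}.md"


def manual_journal_name(now: datetime) -> str:
    """Hypothetical helper: manual-{YYYYMMDD}.md, shared by all same-day feedback."""
    return f"manual-{now.strftime('%Y%m%d')}.md"


# Local time, matching the README's frontmatter example.
now = datetime(2026, 5, 9, 14, 30, 0)
print(agent_journal_name("softmax", now))  # softmax-20260509-143000.md
print(manual_journal_name(now))            # manual-20260509.md
print(now.isoformat())                     # 2026-05-09T14:30:00 (frontmatter `created`)
```

Deriving the filename and the frontmatter `created` value from the same `datetime` object keeps the two consistent, which is all the README asks for.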
| ## Notes | ||
|
|
||
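| - **Do not put the complete solution code in the journal**: the journal only records "where the skill needs to change"; the actual replacement text is produced by the review skill during the apply stage | ||
| - **Do not record the same problem twice**: grep the existing journals before writing, so that e1 and e2 are not the same issue | ||
| - **Keep rejected entries**: entries with `status=rejected` stay in place, so that when the same topic comes up again its frequency accumulates, making it easier to spot contested items that are "repeatedly rejected yet keep recurring" | ||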
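To make the journal schema concrete, here is a minimal sketch of how a reader might pull the pending entries out of a journal file. It assumes entries start with `## Entry <id>` and fields are written as `- **key**: value`, as in the README samples. The function names and the regex-based approach are illustrative assumptions, not the review skill's actual implementation.

```python
import re

# Assumed layout, taken from the README samples:
#   "## Entry e1" opens an entry, "- **key**: value" lines carry its fields.
ENTRY_HEADER = re.compile(r"^## Entry (\S+)[ \t]*$", re.MULTILINE)
FIELD = re.compile(r"^- \*\*(\w+)\*\*: (.+)$", re.MULTILINE)


def parse_entries(text: str) -> list[dict]:
    """Split a journal into entries and collect each entry's fields."""
    headers = list(ENTRY_HEADER.finditer(text))
    entries = []
    for i, match in enumerate(headers):
        end = headers[i + 1].start() if i + 1 < len(headers) else len(text)
        fields = {k: v.strip() for k, v in FIELD.findall(text[match.end():end])}
        fields["id"] = match.group(1)
        fields.setdefault("source", "agent")  # README: `source` defaults to agent
        entries.append(fields)
    return entries


def pending(entries: list[dict]) -> list[dict]:
    """Keep only entries still waiting for review."""
    return [e for e in entries if e.get("status") == "pending"]


SAMPLE = """\
---
op: softmax
created: 2026-05-09T14:30:00
---

# Skill Feedback - softmax

## Entry e1
- **target_skill**: tilelang-op-design
- **type**: missing_constraint
- **severity**: high
- **status**: pending

---

## Entry e2
- **target_skill**: tilelang-custom-skill/tilelang-api-best-practices
- **type**: outdated_example
- **severity**: medium
- **status**: applied
"""

entries = parse_entries(SAMPLE)
print([e["id"] for e in pending(entries)])  # ['e1']
```

Because status lives inside each entry and files are never moved, a scan like this over every journal file is enough to find all outstanding feedback.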
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -162,70 +162,10 @@ python examples/{op}/example_{op}.py | |
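| 2. **Runtime errors** → check for out-of-bounds indexing and missing synchronization | ||
| 3. **Precision errors** → check the formula, data types, and tolerance settings | ||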
|
|
||
| ### Step 5: Verify the existing implementation | ||
|
|
||
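| **Before generating code, the existing implementation must first run cleanly with default parameters**; extend with new features/tests only after the baseline is confirmed correct. | ||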
|
|
||
| ```bash | ||
| python examples/{op}/example_{op}.py # confirm it passes with default parameters | ||
| ``` | ||
|
|
||
| ### Step 6: Coverage principles for test-case design | ||
|
|
||
| Test cases must cover the following 4 categories of scenarios: | ||
|
|
||
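| | Category | Scenario | Description | | ||
| |------|------|------| | ||
| | Perfectly aligned | M/N/K are all integer multiples of the block size | Verifies the no-padding path | | ||
| | Single-dimension padding | Only one of M, N, or K is smaller than the block size | Verifies one-sided padding + cropping | | ||
| | All-dimension padding | M/N/K all need padding at the same time | Verifies combined padding | | ||
| | Multiple blocks | Dimensions are several times the block size | Verifies correctness of multi-block parallelism | | ||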
|
|
||
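One way to turn the four coverage categories above into concrete shapes is sketched below. The function `coverage_shapes`, the category keys, and the particular offsets (`block - 1`, `block + 1`, `4 * block`) are illustrative assumptions; any shapes that satisfy the table would do.

```python
def coverage_shapes(block_M: int, block_N: int, block_K: int) -> dict:
    """Hypothetical helper: one (M, N, K) shape per coverage category."""
    return {
        # Every dimension is an exact multiple of its block size.
        "aligned": (block_M, block_N, block_K),
        # Only M falls short of its block size.
        "single_dim_padding": (block_M - 1, block_N, block_K),
        # Every dimension needs padding at once.
        "all_dim_padding": (block_M + 1, block_N + 1, block_K + 1),
        # Several blocks along every dimension.
        "multi_block": (4 * block_M, 4 * block_N, 4 * block_K),
    }


def padded_dim_count(shape, blocks) -> int:
    """Number of dimensions that are not a multiple of their block size."""
    return sum(dim % block != 0 for dim, block in zip(shape, blocks))


blocks = (64, 64, 32)
for name, shape in coverage_shapes(*blocks).items():
    print(name, shape, "dims needing padding:", padded_dim_count(shape, blocks))
```

Running each shape through the same operator function in sequence is what makes the shape-derived parameters of Step 7 necessary.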
| ### Step 7: Decouple functions from global variables | ||
|
|
||
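| To allow multiple scenarios to be tested in sequence, operator functions should **derive all dimension parameters from the tensor shapes** rather than rely on module-level globals: | ||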
|
|
||
| ```python | ||
| # ✅ Recommended: derive from the tensors | ||
| def conv_im2col_gemm(input_tensor, kernel, stride=1, padding=0): | ||
| B, C, H, W = input_tensor.shape | ||
| OC, C_k, KH, KW = kernel.shape | ||
|
|
||
| # ❌ Avoid: relying on global variables | ||
| def conv_im2col_gemm(...): | ||
| C = globals()['C'] # test scenarios contaminate one another | ||
| ``` | ||
|
|
||
| --- | ||
|
|
||
| ## 4. Key coding conventions | ||
|
Comment on lines 165 to 167 (Contributor)
|
||
|
|
||
| ### GEMM operators: handling non-divisible dimensions | ||
|
|
||
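| The GEMM kernel uses `M // block_M` and `N // block_N` internally, which requires M and N to be integer multiples of the block size. When they are not, zero-pad in the calling Python layer and crop afterwards: | ||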
|
|
||
| ```python | ||
| # padding | ||
| M_pad = ((M + block_M - 1) // block_M) * block_M | ||
| N_pad = ((N + block_N - 1) // block_N) * block_N | ||
| K_pad = ((K + block_K - 1) // block_K) * block_K | ||
|
|
||
| if M_pad > M or K_pad > K: | ||
| kernel_padded = torch.zeros(M_pad, K_pad, ...) | ||
| kernel_padded[:M, :K] = kernel_flat | ||
|
|
||
| # crop after GEMM | ||
| output = output[:M, :N] | ||
| ``` | ||
|
|
||
| **Key constraint**: without padding, `M // block_M = 0` (when M < block_M) leads to a zero-block launch (all-zero output) or a divide-by-zero compilation crash. | ||
|
|
||
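The pad-then-crop idea can be checked without any accelerator. The sketch below uses plain Python lists and a naive matmul as a stand-in for the GEMM kernel (an assumption made purely for illustration) to show that zero-padding to block multiples and cropping the result reproduces the unpadded product, and that the block count goes from 0 to 1 when M < block_M.

```python
def round_up(dim: int, block: int) -> int:
    """Smallest multiple of `block` that is >= dim (same formula as the diff)."""
    return ((dim + block - 1) // block) * block


def pad2d(mat, rows, cols):
    """Zero-pad a list-of-lists matrix to rows x cols."""
    out = [[0] * cols for _ in range(rows)]
    for i, row in enumerate(mat):
        out[i][:len(row)] = row
    return out


def matmul(a, b):
    """Naive reference matmul, standing in for the real GEMM kernel."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


M, N, K = 3, 5, 2
block_M = block_N = block_K = 4
A = [[i + j for j in range(K)] for i in range(M)]      # M x K
B = [[i * j + 1 for j in range(N)] for i in range(K)]  # K x N

M_pad, N_pad, K_pad = round_up(M, block_M), round_up(N, block_N), round_up(K, block_K)
C_pad = matmul(pad2d(A, M_pad, K_pad), pad2d(B, K_pad, N_pad))
C = [row[:N] for row in C_pad[:M]]  # crop back to M x N

assert C == matmul(A, B)                # padding with zeros does not change the result
print(M // block_M, M_pad // block_M)   # 0 1
```

The zero rows and columns contribute nothing to any dot product, which is why cropping recovers the exact result.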
| ### Autotune operators: interface conventions for supply_prog and get_configs | ||
|
|
||
| - **`supply_prog(params)`**: `params` contains only the input tensor descriptors (no output param). Extract dimensions from `params[0].shape` / `params[1].shape`; do not access `params[2]`. | ||
| - **`get_configs` as a callable**: the autotuner calls it as `get_configs(key_args_tuple, key_kwargs_tuple)`, so its signature must be `get_configs(key_args, _key_kwargs=None)`, extracting M/N/K from `key_args`. | ||
| - **Config filtering**: `get_configs` must filter out invalid combinations with `block > dimension` (to avoid divide-by-zero compilation errors) and combinations with `block_M * block_N * sizeof(accum) > L0C_capacity` (to avoid L0C overflow segfaults). | ||
|
|
||
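The config-filtering rule can be sketched as follows. Only the `get_configs(key_args, _key_kwargs=None)` signature comes from the text above. Everything else is an assumption for illustration: that `key_args` unpacks directly to `(M, N, K)`, the candidate block sizes, the float32 accumulator width, and in particular `L0C_CAPACITY_BYTES`, which is a placeholder and not a documented hardware value.

```python
from itertools import product

L0C_CAPACITY_BYTES = 128 * 1024  # placeholder; substitute the target hardware's value
ACCUM_BYTES = 4                  # assuming a float32 accumulator


def get_configs(key_args, _key_kwargs=None):
    """Return only block configs that are valid for the given problem size."""
    M, N, K = key_args  # assumption: key_args is exactly (M, N, K)
    configs = []
    for block_M, block_N, block_K in product((64, 128, 256), (64, 128, 256), (32, 64)):
        # A block larger than its dimension would make dim // block == 0.
        if block_M > M or block_N > N or block_K > K:
            continue
        # The accumulator tile must fit in L0C.
        if block_M * block_N * ACCUM_BYTES > L0C_CAPACITY_BYTES:
            continue
        configs.append({"block_M": block_M, "block_N": block_N, "block_K": block_K})
    return configs


print(len(get_configs((128, 64, 32))))  # 2
print(get_configs((32, 32, 32)))        # []
```

An empty result, as in the second call, signals that the caller has to pad the inputs before any block configuration becomes valid.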
| ### Buffer allocation | ||
|
|
||
| ```python | ||
|
|
@@ -313,3 +253,103 @@ torch.testing.assert_close(output.cpu(), ref_output.cpu(), rtol=rtol, atol=atol) | |
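| | workspace shape mismatch | Check that block_num is computed correctly | | ||
| | Wrong core-separation approach | Developer + automatic sync mode should have no explicit T.Scope("C"/"V") | | ||
| | Precision error above 1% | First check the memory-hierarchy API choice and the pass_configs settings | | ||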
|
|
||
| --- | ||
|
|
||
| ## 6. Skill feedback collection | ||
|
|
||
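| > **Who triggers this section depends on the invocation mode:** | ||
| > | ||
| > | Invocation mode | Who is responsible for skill feedback collection | | ||
| > |---------|----------------------| | ||
| > | Orchestrated via `tilelang-op-orchestrator` (recommended) | **Performed once by the orchestrator when the flow ends (SUCCESS / BLOCKED_*)**; this skill does not trigger it on its own. See [.opencode/agents/tilelang-op-orchestrator.md §End-of-flow reflection collection](../../../.opencode/agents/tilelang-op-orchestrator.md) | | ||
| > | This skill invoked on its own (`/tilelang-op-generate`, bypassing orchestration) | **Triggered manually by the caller once the operator passes debugging**, following §6.1-6.6 below | | ||
| > | ||
| > Why the split: in orchestrator mode this skill is dispatched multiple times inside an isolated Subagent context, and the end of a single dispatch is not the "end of the whole flow". Letting this skill trigger reflection itself would mean (1) a reflection on every call, which is wasteful, and (2) no visibility into which skills other Subagents used, so the reflection is incomplete. In orchestrator mode, collection is therefore left to the orchestrator, which sees the whole flow. | ||
| > | ||
| > **If you (the developer subagent) are dispatched to this skill in orchestrator mode, simply skip this section**; the orchestrator reflects in the final stage. You **should still record this dispatch's changes / error_summary / next_hint faithfully in `debug_log.md`**, since that is the core data source for the orchestrator's reflection. | ||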
|
|
||
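| This section (§6.1-6.6 below) is the collection side of the **skill adaptive-update mechanism** and **applies only to standalone invocation**. After every operator development flow, anything where "a skill was unclear / was contradicted by reality / had to be filled in from experience" must be written to `.agents/skill-journal/`, to be aggregated and reviewed later by `/tilelang-skill-review`. | ||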
|
|
||
| **Note**: this section covers every skill used across the **entire development chain**, not just op-design / op-generate. | ||
|
|
||
| ### 6.1 When to trigger | ||
|
|
||
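| Run immediately once any of the following holds: | ||
| - Operator code has been generated and has run at least once (even if precision is off, as long as it compiles) | ||
| - The user explicitly says "this development session is over" or "let's stop here for now" | ||
| - Debugging has been stuck for a long time (even without a passing run, write down what was learned along the way, with type `unclear_workflow`) | ||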
|
|
||
| ### 6.2 Step 1: List every skill consulted in this run | ||
|
|
||
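| Review the whole development session and list every skill path (relative to `.agents/skills/`) that was **actually opened / cited / followed**, not just op-design and op-generate. These commonly include: | ||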
|
|
||
| | skill | When it is consulted | | ||
| |-------|-------------| | ||
| | `tilelang-op-design` | Throughout the design stage | | ||
| | `tilelang-op-generate` | Throughout the generation stage (this skill itself) | | ||
| | `tilelang-custom-skill/tilelang-api-best-practices` | Looking up API usage / parameters | | ||
| | `tilelang-custom-skill/tilelang-expert-to-developer` | Deciding the mode / pass_configs | | ||
| | `tilelang-custom-skill/tilelang-debug-helper` | Debugging errors | | ||
| | `tilelang-custom-skill/tilelang-error-fixer` | Fixing compile/runtime errors | | ||
| | `tilelang-ascend-tile-api` | Looking up the T.tile.* family | | ||
| | Others | Any SKILL.md that was grepped / read | | ||
|
|
||
| **Rule**: better to list too many than too few. An omitted skill will never have its feedback collected. | ||
|
|
||
| ### 6.3 Step 2: Reflect on each skill (one by one) | ||
|
|
||
| For **every** skill listed in step 1, go through the following four questions: | ||
|
|
||
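| 1. Of the things this skill states clearly, **which were contradicted by reality**? (e.g. it says "X is supported" but it is not) | ||
| 2. What did I **fill in from experience** that it does not cover? (e.g. adding alignment handling myself) | ||
| 3. Are its **examples / API descriptions outdated**? (e.g. wrong example shape, changed API signature) | ||
| 4. Do its **workflow steps miss a key check**? (e.g. it does not say "grep examples/ first") | ||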
|
|
||
| Each "yes" finding = one entry. **Record the skill even when nothing is found** (write empty entries), so that the skill's "perfect hit rate" can be tracked. | ||
|
|
||
| ### 6.4 Step 3: Write the journal file | ||
|
|
||
| Following the schema in `.agents/skill-journal/README.md`, write to: | ||
|
|
||
| ``` | ||
| .agents/skill-journal/{op}-{YYYYMMDD-HHMMSS}.md | ||
| ``` | ||
|
|
||
| The frontmatter `skills_consulted` field must contain the complete list from step 1. | ||
|
|
||
| Each entry contains `target_skill / target_section / type / severity / status:pending / observation / evidence / proposed_change`; see the README for field meanings. | ||
|
|
||
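| **Forbidden**: | ||
| - ❌ Setting every `target_skill` to op-generate (a common mistake when skipping classification) | ||
| - ❌ Writing the full revised SKILL.md paragraph directly in the journal (the review skill generates the actual replacement text only in the apply stage) | ||
| - ❌ Omitting evidence (proposals without evidence are rejected outright at review) | ||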
|
|
||
| ### 6.5 Self-check | ||
|
|
||
| After writing the journal, check each item: | ||
|
|
||
| | # | Check | Must pass | | ||
| |---|--------|---------| | ||
| | 1 | `skills_consulted` contains every skill consulted in this run | ✅ | | ||
| | 2 | At least 50% of `skills_consulted` appear at least once in the entries (to avoid reflecting only on op-generate itself) | ✅ | | ||
|
|
||
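| | 3 | Every entry's `evidence` has a concrete error message / code / file reference | ✅ | | ||
| | 4 | No duplicate entries (each `target_skill + target_section + type` appears only once) | ✅ | | ||
| | 5 | Every `severity=high` entry includes the concrete account of how the pitfall was hit | ⭕ | | ||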
|
|
||
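Checks 2 to 4 of the self-check table are mechanical enough to sketch in code. The function below is a hypothetical illustration, assuming entries are dicts keyed by the README field names. It is not part of the PR, and checks 1 and 5 still need human judgment.

```python
from collections import Counter


def self_check(skills_consulted, entries):
    """Hypothetical helper covering self-check items 2, 3, and 4."""
    targets = {e["target_skill"] for e in entries}
    covered = [s for s in skills_consulted if s in targets]
    coverage = len(covered) / len(skills_consulted) if skills_consulted else 0.0
    keys = Counter((e["target_skill"], e["target_section"], e["type"]) for e in entries)
    return {
        "coverage_ok": coverage >= 0.5,                                      # item 2
        "evidence_ok": all(e.get("evidence", "").strip() for e in entries),  # item 3
        "no_duplicates": all(count == 1 for count in keys.values()),         # item 4
    }


skills = [
    "tilelang-op-design",
    "tilelang-op-generate",
    "tilelang-custom-skill/tilelang-api-best-practices",
    "tilelang-custom-skill/tilelang-debug-helper",
]
entries = [
    {"target_skill": "tilelang-op-design", "target_section": "§2.5.1",
     "type": "missing_constraint", "evidence": "Memory allocation failed"},
    {"target_skill": "tilelang-custom-skill/tilelang-api-best-practices",
     "target_section": "T.tile.broadcast", "type": "outdated_example", "evidence": ""},
]
print(self_check(skills, entries))
```

In this sample, two of the four consulted skills appear in the entries, so coverage is exactly 50% and passes, while the empty `evidence` on the second entry fails item 3.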
| ### 6.6 Completion report | ||
|
|
||
| After writing the journal, output: | ||
|
|
||
| ``` | ||
| ## Skill feedback collection report | ||
|
|
||
| - Journal file: .agents/skill-journal/{op}-{timestamp}.md | ||
| - Skills consulted: N | ||
| - Entries written: M | ||
| - Distribution by skill: | ||
| - tilelang-op-design: 3 | ||
| - tilelang-custom-skill/tilelang-api-best-practices: 2 | ||
| - ... | ||
| - Hint: run /tilelang-skill-review to start the review flow | ||
| ``` | ||
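Suggest adding a type for recording "consulted and confirmed correct", to support the feedback-coverage self-check requirement in op-generate and avoid being forced to pick a wrong category when there is no problem.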