Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions .agents/evals/minimal_skill_trigger_eval.jsonl
Original file line number Diff line number Diff line change
Expand Up @@ -15,3 +15,5 @@
{"id":"review-02","prompt":"提交前帮我跑一遍格式和 lint 检查,看看 CI 风险","expected_skill":"tilelang-review-skill"}
{"id":"github-01","prompt":"我准备把分支提 PR 到 npuir,给我完整 commit push PR 流程","expected_skill":"tilelang-github-operations"}
{"id":"github-02","prompt":"帮我同步 upstream/npuir,rebase 后再发起 PR","expected_skill":"tilelang-github-operations"}
{"id":"precision-01","prompt":"算子输出有 nan,并且和 Torch 的结果有细微差异,如何看精度报告","expected_skill":"tilelang-precision-debug-skill"}
{"id":"precision-02","prompt":"请用 assert_close 给跑一下测试,输出 ascii diff map 帮我看看误差分布","expected_skill":"tilelang-precision-debug-skill"}
1 change: 1 addition & 0 deletions .agents/skills/tilelang-debug-helper/SKILL.md
Original file line number Diff line number Diff line change
Expand Up @@ -35,3 +35,4 @@ Before answering, follow AGENTS.md section "Docs Auto Routing Rules (Mandatory)"

- tilelang-mlir-skill
- tilelang-error-fixer
- tilelang-precision-debug-skill
56 changes: 56 additions & 0 deletions .agents/skills/tilelang-precision-debug-skill/SKILL.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,56 @@
---
name: tilelang-precision-debug-skill
description: TileLang 算子精度比对与误差分析技能。当用户提及“精度错误”、“结果不一致”、“比对失败”、“ASCII diff map”或需要深入定位数值差异时必须使用本技能。
---

# TileLang Precision Debug Skill

## Mandatory routing rule

Before answering, follow AGENTS.md section "Docs Auto Routing Rules (Mandatory)".

## Trigger Guidance

- When users face accuracy issues, mismatch with reference implementation (e.g., PyTorch), or precision errors.
- When users mention `assert_close` failure, Top-10 errors, relative error too large, or need to see the ASCII diff map.
- When a `.precision_debug` directory is mentioned.

## Instructions SOP

1. **Replace Comparison API:**
Ensure `assert_close` is imported from `testcommon` (standard in `testing/npuir`) or `tilelang.utils.prec_assert_close`.

2. **Activate Debug Reporting:**
The detailed reporting is **disabled by default** to avoid clutter. You can activate it in two ways:
- **Surgical (recommended):** Add `@pytest.mark.precision_debug` to the test function.
```python
import pytest
from testcommon import assert_close

@pytest.mark.precision_debug
def test_my_op():
...
assert_close(actual, expected, dtype=dtype)
```
- **Global:** Set the environment variable `TL_PREC_DEBUG=1` before running `pytest`.
```bash
TL_PREC_DEBUG=1 pytest testing/npuir/test_xxx.py
```

2. **Run and Collect:**
Execute the test script. If there is a mismatch, the tool will automatically generate a `.precision_debug/<test_name>_<timestamp>/` directory containing `report.txt`, `diff_map.txt`, and the serialized tensors.

3. **Analyze the Precision Debug Report:**
- **`report.txt`:** Check the mismatch ratio, `Max abs/rel diff`, and the `Top-10 largest differences`. Determine if the error is widespread (logic error) or isolated (boundary, overflow).
- **`diff_map.txt` (ASCII Map):** Identify spatial distribution patterns:
- **Blocky distribution:** Incorrect tiling or block_M/N parameters.
- **Periodic stripes:** Incorrect memory layout, strides, or vectorization broadcast issues.
- **Edge/Boundary spikes:** Padding, masking, or loop boundary conditions not handled correctly.

4. **Root Cause Localization:**
Use the pattern analysis to trace back to Load (memory alignment), Compute (accumulation precision, type casting), or Store (write-back boundary) stages in the TVM IR or MLIR.

## Related skills

- tilelang-debug-helper
- tilelang-error-fixer
6 changes: 6 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -96,3 +96,9 @@ tilelang/jit/adapter/cython/.cycache

# unit test output files
unittest/npuir/output/

# precision debug tool output
.precision_debug/

# agents
.worktree/
1 change: 1 addition & 0 deletions AGENTS.md
Original file line number Diff line number Diff line change
Expand Up @@ -55,6 +55,7 @@ Use the matching skill whenever the user asks for:
- compile/runtime error analysis on npuir branch
- code review, lint/format checks, PR readiness
- commit/push/rebase/upstream sync, PR or issue workflow
- accuracy issues, assert_close failures, ASCII diff map, or precision errors

Developer-mode MixCV trigger rule:
- If one kernel contains Cube-side T.gemm and Vector-side at least one v-prefix op (such as T.vadd/T.vmul/T.vexp/T.vcast/T.vbrc), treat it as MixCV and use tilelang-mixcv-skill.
Expand Down
28 changes: 24 additions & 4 deletions docs/Tilelang算子调试指南.md
Original file line number Diff line number Diff line change
Expand Up @@ -15,9 +15,29 @@

| 问题类型 | 具体表现 | 推荐调试方法 |
| ------------ | ------------ |------------ |
| **1.精度问题** | 算子成功编译并生成.o文件,但NPU运行结果和标杆参考(如torch)存在差异 | `T.print`打印调试 |
| **2.编译失败**| 算子编译失败,未生成预期的TVM IR和MLIR | 编译调试 |
| **3.运行时失败** | 算子编译成功,但未生成.o文件,进程终止 | 运行时调试 |
| **1.精度问题** | 算子成功编译并生成.o文件,但NPU运行结果和标杆参考(如torch)存在差异 | `T.print`打印调试 |
| **2.编译失败**| 算子编译失败,未生成预期的TVM IR和MLIR | 编译调试 |
| **3.运行时失败** | 算子编译成功,但未生成.o文件,进程终止 | 运行时调试 |

### 1.2 精度问题快速定位(precision debug)

对于 `assert_close` 失败、结果不一致或需要观察误差分布的场景,推荐优先使用
`testing/npuir/testcommon.py` 中的 `assert_close` 包装器,或直接使用
`tilelang.utils.prec_assert_close`。

开启方式有两种:

- 在单个 pytest 用例上增加 `@pytest.mark.precision_debug`
- 运行测试前设置环境变量 `TL_PREC_DEBUG=1`

当比对失败时,工具会在 `.precision_debug/<test_name>_<timestamp>/` 下生成:

- `report.txt`:误差统计、Top-10 差异点、误差分布
- `diff_map.txt`:ASCII diff map,便于观察块状、条纹、边界等模式
- `actual.pt` / `expected.pt`:失败时的张量快照

建议优先在最小复现用例上使用该工具,先判断误差是全局扩散、局部边界异常,还是
特定布局/广播模式下的周期性偏差,再回到 TVM IR 或 MLIR 阶段做进一步定位。

# 2 Tilelang-AscendNPUIR 编译流程概览

Expand Down Expand Up @@ -408,4 +428,4 @@ _compile_option_list = [
- **运行时失败**:xx。
- **性能优化**:通过控制jit_npu中的编译选项来配置流水优化等功能。

掌握这些调试技巧后,您就可以在TileLang中高效地调试NPU算子,充分利用昇腾硬件的计算能力。
掌握这些调试技巧后,您就可以在TileLang中高效地调试NPU算子,充分利用昇腾硬件的计算能力。
46 changes: 42 additions & 4 deletions testing/npuir/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,24 +6,27 @@ single source of truth.

## Design

The test model is split into three layers:
The test model is split into four layers:

- Marker metadata: `op` and `mode`
- Optional debug marker: `precision_debug`
- Runtime controls: `--op`, `--mode`, `--npu-device`
- Test matrix: `dtype`, shapes, and other case dimensions via `@pytest.mark.parametrize(...)`

That split is intentional:

- `op` and `mode` are the only custom markers. They define what a test is.
- `op` and `mode` define what a test is.
- `precision_debug` is an opt-in debug switch for mismatch reporting.
- CLI options only choose which tests to run and which NPU device to use.
- Data-oriented coverage stays inside the test matrix instead of growing more CLI flags.

## Marker Rules

Only two custom markers are valid:
Three custom markers are valid:

- `@pytest.mark.op("<real-op>")`
- `@pytest.mark.mode("<mode>")`
- `@pytest.mark.precision_debug`

Use file-level `pytestmark` when a whole file shares the same metadata:

Expand Down Expand Up @@ -55,6 +58,9 @@ def test_copy_release():
The closest marker wins. In practice that means a test-level marker overrides a
file-level marker of the same kind.

Use `precision_debug` only on focused repro tests. It is not part of the test
identity and should not be used as a broad file-level marker.

## Runtime Rules

`--npu-device` is the only supported device selector in this pytest layer. The
Expand All @@ -69,6 +75,10 @@ warning so the remapping is visible in the test output.
`with ascend_mode(...)`. The pytest runtime reads the closest `mode` marker and
applies `ascend_mode(mode)` automatically around the test body.

`precision_debug` is also marker-driven. When present, pytest sets
`TL_PREC_DEBUG=1` around that test so `testcommon.assert_close(...)` emits a
report directory under `.precision_debug/` on mismatch.

## Test Matrix Rules

Use `@pytest.mark.parametrize(...)` for case dimensions such as:
Expand Down Expand Up @@ -104,12 +114,13 @@ def test_copy_shape(M, N, block_M, block_N, in_dtype, out_dtype):

## Contributor Rules

- Do not add custom markers beyond `op` and `mode`.
- Do not add custom markers beyond `op`, `mode`, and `precision_debug`.
- Do not add `@pytest.mark.dtype(...)`.
- Do not add folder-category markers such as `@pytest.mark.memory`.
- Do not call `torch.npu.set_device(...)` inside tests.
- Prefer file-level `pytestmark` for shared `op` / `mode`.
- Use test-level markers only when a file intentionally needs overrides.
- Use `precision_debug` sparingly for mismatch-focused repros.
- Keep compile and execution work inside test functions, not at module import time.

The directory name remains the category signal for humans. For example,
Expand All @@ -136,6 +147,9 @@ pytest testing/npuir --mode=Developer

# combined selection
pytest testing/npuir --op=copy --mode=Developer --npu-device=0

# focused precision report for one repro
TL_PREC_DEBUG=1 pytest testing/npuir/memory_ops/test_copy_shape_dev.py
```

`--op` and `--mode` accept comma-separated values.
Expand All @@ -146,6 +160,7 @@ pytest testing/npuir --op=copy --mode=Developer --npu-device=0
- `--mode` matches the closest `@pytest.mark.mode(...)`
- `--npu-device` sets the session device before tests execute
- out-of-range `--npu-device` values are remapped with modulo and reported as warnings
- `@pytest.mark.precision_debug` enables detailed mismatch reports for that test

Tests without a matching marker are excluded when that selector is provided.

Expand All @@ -169,3 +184,26 @@ CASES = [
def test_copy_shape_dev(M, N, block_M, block_N, dtype):
...
```

## Precision Debug

Use `from testcommon import assert_close` for NPUIR tests. On mismatch, the
precision debug tool can emit:

- `report.txt`
- `diff_map.txt`
- serialized `actual.pt` and `expected.pt`

Enable it in one of two ways:

```python
import pytest

@pytest.mark.precision_debug
def test_copy_debug():
...
```

```bash
TL_PREC_DEBUG=1 pytest testing/npuir/test_xxx.py
```
16 changes: 16 additions & 0 deletions testing/npuir/conftest.py
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,13 @@ def _get_npu_device_id(config: pytest.Config) -> tuple[int, Optional[str]]:
return resolve_npu_device_id(config.getoption("--npu-device"))


def pytest_configure(config):
config.addinivalue_line(
"markers",
"precision_debug: enable tilelang advanced precision debugging for this test",
)


def pytest_addoption(parser):
parser.addoption(
"--op",
Expand Down Expand Up @@ -116,3 +123,12 @@ def _apply_mode_marker(request: pytest.FixtureRequest):

with ascend_mode(mode):
yield


@pytest.fixture(autouse=True)
def _apply_precision_debug_marker(
request: pytest.FixtureRequest, monkeypatch: pytest.MonkeyPatch
):
if request.node.get_closest_marker("precision_debug") is not None:
monkeypatch.setenv("TL_PREC_DEBUG", "1")
yield
1 change: 1 addition & 0 deletions testing/npuir/pytest.ini
Original file line number Diff line number Diff line change
Expand Up @@ -4,3 +4,4 @@ addopts = -ra --strict-markers
markers =
op(name): Real operation identity, used by --op.
mode(name): Runtime Ascend mode, used by --mode and applied automatically during test execution.
precision_debug: Enable detailed precision mismatch reporting for a focused test.
79 changes: 79 additions & 0 deletions testing/npuir/test_precision_debug_smoke.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,79 @@
import pathlib

import pytest
import torch

from testcommon import assert_close
from tilelang.utils import prec_assert_close


pytestmark = [
pytest.mark.op("precision_debug"),
pytest.mark.mode("Developer"),
]


def _single_run_dir(root: pathlib.Path) -> pathlib.Path:
runs = [path for path in root.iterdir() if path.is_dir()]
assert len(runs) == 1
return runs[0]


def test_prec_assert_close_exported_from_tilelang_utils():
from tilelang.utils.precision_debug import prec_assert_close as direct_import

assert prec_assert_close is direct_import


def test_assert_close_without_debug_does_not_write_reports(
tmp_path: pathlib.Path, monkeypatch: pytest.MonkeyPatch
):
monkeypatch.chdir(tmp_path)
actual = torch.tensor([0.0, 1.0], dtype=torch.float32)
expected = torch.tensor([0.0, 0.0], dtype=torch.float32)

with pytest.raises(AssertionError):
assert_close(actual, expected, dtype="float32")

assert not (tmp_path / ".precision_debug").exists()


@pytest.mark.precision_debug
def test_precision_debug_marker_enables_report_output(
tmp_path: pathlib.Path, monkeypatch: pytest.MonkeyPatch
):
monkeypatch.chdir(tmp_path)
actual = torch.tensor([0.0, 1.0], dtype=torch.float32)
expected = torch.tensor([0.0, 0.0], dtype=torch.float32)

with pytest.raises(AssertionError):
assert_close(actual, expected, dtype="float32")

run_dir = _single_run_dir(tmp_path / ".precision_debug")
assert (run_dir / "report.txt").is_file()
assert (run_dir / "diff_map.txt").is_file()
assert (run_dir / "actual.pt").is_file()
assert (run_dir / "expected.pt").is_file()


def test_prec_assert_close_respects_equal_nan_in_report(
tmp_path: pathlib.Path,
):
output_dir = tmp_path / "precision_debug_manual"
actual = torch.tensor([float("nan")], dtype=torch.float32)
expected = torch.tensor([float("nan")], dtype=torch.float32)

with pytest.raises(AssertionError):
prec_assert_close(
actual,
expected,
output_dir=str(output_dir),
save_tensors=False,
print_map=False,
equal_nan=False,
)

report_text = (_single_run_dir(output_dir) / "report.txt").read_text(
encoding="utf-8"
)
assert "Mismatched: 1 (100.0000%)" in report_text
Loading
Loading