diff --git a/docs/en/dev/passes/26-materialize_tensor_strides.md b/docs/en/dev/passes/26-materialize_tensor_strides.md
index f963f124a..850764171 100644
--- a/docs/en/dev/passes/26-materialize_tensor_strides.md
+++ b/docs/en/dev/passes/26-materialize_tensor_strides.md
@@ -19,7 +19,7 @@ Codegen needs one machine-readable contract, so `MaterializeTensorStrides` walks
 
 **Produces**:
 
-- `TensorViewCanonical` — `PassPipeline` auto-verifies after the pass (using the registry's weak-mode verifier)
+- `TensorViewCanonical` — `PassPipeline` auto-verifies after the pass using the registry's **strict-mode** verifier (empty stride on a present `TensorView` is rejected — that is the state this pass is responsible for eliminating)
 
 **Position in the default pipeline** (active since RFC #1300 P6): between [`CanonicalizeIOOrder`](25-canonicalize_io_order.md) and [`InitMemRef`](27-init_memref.md). This is the codegen-prep boundary — every layout-mutating pass (`LowerTransposeLoadParamLayout`, `ResolveBackendOpLayouts`, `ExpandMixedKernel`, `SplitVectorKernel`) has finished, and `InitMemRef` is the first consumer that needs explicit stride.
 
@@ -106,7 +106,7 @@ See `BuildLogicalStridesFromLayout` in [`tensor_view_semantics.h`](../../../../i
 
 ## Verifier interaction
 
-Because the pass declares `produced = {... ∪ TensorViewCanonical}`, `PassPipeline` automatically runs the registry's `TensorViewCanonical` verifier after the pass, surfacing invalid IR (e.g. NZ-on-`TensorType`) immediately as `pypto::ValueError`. The registry default is the **weak-mode** verifier (which accepts `stride.empty()` as implicitly packed canonical); the **strict-mode** verifier — which requires materialization — is reachable directly via `passes.verify_tensor_view_canonical(program, require_materialized=True)` and is the codegen-entry contract that P6/P7 will enforce.
+Because the pass declares `produced = {... ∪ TensorViewCanonical}`, `PassPipeline` automatically runs the registry's `TensorViewCanonical` verifier after the pass. The registry default is the **strict-mode** verifier (RFC #1300 §2.4 codegen-entry contract): it rejects `view.has_value() && stride.empty()` since this pass is responsible for materializing those slots. Bare `TensorType` (`!view.has_value()`) is still accepted — implicit ND-packed is canonical by construction. The same verifier is callable directly via `passes.verify_tensor_view_canonical(program, require_materialized=True)`; pass `require_materialized=False` for the weak mode used during the parse-time / early-pass window before materialization runs.
 
 ## Related
 
diff --git a/docs/en/user/01-language_guide.md b/docs/en/user/01-language_guide.md
index 642130da5..7fdc02ea1 100644
--- a/docs/en/user/01-language_guide.md
+++ b/docs/en/user/01-language_guide.md
@@ -48,19 +48,38 @@ idx: pl.Scalar[pl.INDEX] # index scalar
 
 ### Tensor Layouts
 
-Layouts control the physical memory arrangement of Tensors:
+Write your `pl.Tensor[...]` annotations using the **runtime row-major
+shape** without a layout marker. Layout is an IR-internal concern that
+passes derive from the ops actually producing/consuming views; you do
+not need to express it in the type annotation.
 
-| Layout | Description |
-| ------ | ----------- |
-| `pl.ND` | N-Dimensional (default, row-major) |
-| `pl.DN` | DN layout |
-| `pl.NZ` | NZ fractal format (hardware-specific tiling) |
+```python
+# ✅ Recommended — source tensor shape, no layout marker:
+b: pl.Tensor[[N, K], pl.FP32]
+```
 
 ```python
-# Specify layout as third type parameter
-a: pl.Tensor[[64, 128], pl.FP16, pl.NZ]
+# ⚠️ Deprecated (RFC #1300 supplementary 1):
+b: pl.Tensor[[K, N], pl.FP32, pl.DN]  # → DeprecationWarning at parse time
 ```
 
+> **Why `pl.Tensor[..., pl.DN]` is deprecated.** Writing the DN
+> layout-only shorthand forces you to mentally hold two coordinate systems
+> at once (the IR-logical post-view shape and the runtime row-major shape).
+> Drop the layout marker and write the runtime shape — for matmul B^T,
+> use `pl.load(..., transpose=True)` on the row-major tensor (see "Data
+> Movement" below); for slicing a DN-producing op, the slice inherits
+> the parent's layout automatically.
+
+For NZ (hardware-specific tile layout), use `pl.Tile[..., pl.NZ]` — NZ is
+tile-only, never a TensorType annotation. The `pl.NZ` constant remains
+available for tile annotations and IR-internal use.
+
+If you need to write a DN tensor at the IR level (e.g. when constructing
+fixtures or round-tripping printed IR), prefer
+`pl.TensorView(stride=[...], layout=pl.TensorLayout.DN)` which forces
+explicit stride and avoids the implicit coordinate-flip hazard.
+
 ### Dynamic Shapes
 
 Use `pl.dynamic()` for dimensions determined at runtime:
diff --git a/docs/zh-cn/dev/passes/26-materialize_tensor_strides.md b/docs/zh-cn/dev/passes/26-materialize_tensor_strides.md
index 6e8def61a..55d0afdbe 100644
--- a/docs/zh-cn/dev/passes/26-materialize_tensor_strides.md
+++ b/docs/zh-cn/dev/passes/26-materialize_tensor_strides.md
@@ -19,7 +19,7 @@ PyPTO IR 上 `TensorType.tensor_view_` 当前可以处于两种等价形态:
 
 **Produces**:
 
-- `TensorViewCanonical` —— `PassPipeline` 在 Pass 之后自动用 registry 中的弱模式 verifier 校验
+- `TensorViewCanonical` —— `PassPipeline` 在 Pass 之后自动用 registry 中的**严格模式** verifier 校验(拒绝 `view.has_value() && stride.empty()` —— 正是本 Pass 负责消除的状态)
 
 **默认 pipeline 中的位置**(自 RFC #1300 P6 起激活):[`CanonicalizeIOOrder`](25-canonicalize_io_order.md) 与 [`InitMemRef`](27-init_memref.md) 之间。这是 codegen-prep 边界 —— 所有 layout-mutating pass(`LowerTransposeLoadParamLayout` / `ResolveBackendOpLayouts` / `ExpandMixedKernel` / `SplitVectorKernel`)已结束,`InitMemRef` 是第一个依赖显式 stride 的消费者。
 
@@ -106,7 +106,7 @@ ND 情况下公式退化为标准行主序 packed stride。
 
 ## 与 verifier 的协同
 
-由于 Pass 声明 `produced = {... ∪ TensorViewCanonical}`,`PassPipeline` 在 Pass 完成后自动调用 registry 中的 `TensorViewCanonical` verifier;非法 IR(如 `TensorType` 上挂 NZ)会立即作为 `pypto::ValueError` 抛出。registry 默认是**弱模式** verifier(接受 `stride.empty()`);**严格模式** verifier 通过 `passes.verify_tensor_view_canonical(program, require_materialized=True)` 显式调用,它就是 P6/P7 将启用的 codegen 入口契约。
+由于 Pass 声明 `produced = {... ∪ TensorViewCanonical}`,`PassPipeline` 在 Pass 完成后自动调用 registry 中的 `TensorViewCanonical` verifier。registry 默认是**严格模式** verifier(RFC #1300 §2.4 codegen 入口契约):它拒绝 `view.has_value() && stride.empty()` —— 因为本 Pass 就是负责物化这些 stride 的。裸 `TensorType`(`!view.has_value()`)仍然接受 —— 隐式 ND-packed 自然 canonical。同一 verifier 也可通过 `passes.verify_tensor_view_canonical(program, require_materialized=True)` 显式调用;传 `require_materialized=False` 切换到弱模式(用于物化之前的解析期 / 前期 pass 窗口)。
 
 ## 相关
 
diff --git a/docs/zh-cn/user/01-language_guide.md b/docs/zh-cn/user/01-language_guide.md
index 81a477bac..6a6aabf31 100644
--- a/docs/zh-cn/user/01-language_guide.md
+++ b/docs/zh-cn/user/01-language_guide.md
@@ -48,19 +48,24 @@ idx: pl.Scalar[pl.INDEX] # 索引标量
 
 ### 张量布局(TensorLayout)
 
-布局控制 Tensor 的物理内存排列:
+`pl.Tensor[...]` annotation 写 **runtime 行优先 shape**,不写 layout 标记。layout 是 IR 内部概念,由派生/消费视图的 op 推导,不需要在 annotation 上表达。
 
-| 布局 | 说明 |
-| ---- | ---- |
-| `pl.ND` | N 维(默认,行优先) |
-| `pl.DN` | DN 布局 |
-| `pl.NZ` | NZ 分形格式(硬件特定分块) |
+```python
+# ✅ 推荐 —— 写源 tensor shape,不写 layout 标记:
+b: pl.Tensor[[N, K], pl.FP32]
+```
 
 ```python
-# 指定布局作为第三个类型参数
-a: pl.Tensor[[64, 128], pl.FP16, pl.NZ]
+# ⚠️ 已弃用(RFC #1300 补充 1):
+b: pl.Tensor[[K, N], pl.FP32, pl.DN]  # → 解析期触发 DeprecationWarning
 ```
 
+> **为什么弃用 `pl.Tensor[..., pl.DN]`。** layout-only 简写迫使用户脑子里同时持有两套坐标系(IR 逻辑后视图 shape 与 runtime 行优先 shape)—— 恰恰是 RFC #1300 想要消除的歧义。改用:去掉 layout 标记,写 runtime shape —— matmul B^T 场景用 `pl.load(..., transpose=True)` 加载行优先 tensor(参见下文「数据搬运」);DN-producing op 之后的 slice 自动继承父 layout。
+
+如需 NZ(硬件 tile layout),写 `pl.Tile[..., pl.NZ]` —— NZ 是 tile-only,不允许作为 TensorType annotation。`pl.NZ` 常量保留用于 tile annotation 和 IR 内部使用。
+
+若需要在 IR 层面写 DN tensor(如测试 fixture 或 round-trip 打印的 IR),用 `pl.TensorView(stride=[...], layout=pl.TensorLayout.DN)` —— 强制写显式 stride,避免隐式坐标翻转的隐患。
+
 ### 动态形状(Dynamic Shapes)
 
 使用 `pl.dynamic()` 声明运行时确定的维度:
diff --git a/python/pypto/language/parser/type_resolver.py b/python/pypto/language/parser/type_resolver.py
index 37181d24a..3b1dc37de 100644
--- a/python/pypto/language/parser/type_resolver.py
+++ b/python/pypto/language/parser/type_resolver.py
@@ -10,6 +10,7 @@
 """Type annotation resolution for IR parsing."""
 
 import ast
+import warnings
 from collections.abc import Callable, Sequence
 from typing import TYPE_CHECKING, Any, cast
 
@@ -441,6 +442,7 @@ def _resolve_subscript_type(self, subscript_node: ast.Subscript) -> ir.Type: #
                     tensor_view = self._resolve_tensorview(third)
                     return tensor_ctor(shape, dtype, None, tensor_view)
                 layout = self.resolve_layout(third)
+                self._warn_on_user_facing_dn_layout(layout, type_name)
                 tensor_view = ir.TensorView([], layout)
                 return tensor_ctor(shape, dtype, None, tensor_view)
 
@@ -450,6 +452,7 @@ def _resolve_subscript_type(self, subscript_node: ast.Subscript) -> ir.Type: #
                     tensor_view = self._resolve_tensorview(third)
                 else:
                     layout = self.resolve_layout(third)
+                    self._warn_on_user_facing_dn_layout(layout, type_name)
                     tensor_view = ir.TensorView([], layout)
                 memref_node = slice_value.elts[3]
                 if not self._is_memref_node(memref_node):
@@ -986,6 +989,35 @@ def resolve_dtype(self, dtype_node: ast.expr) -> DataType:
             hint="Use pl.FP32, pl.INT32, or other supported dtype constants",
         )
 
+    def _warn_on_user_facing_dn_layout(self, layout: "ir.TensorLayout", type_name: str) -> None:
+        """Emit a ``DeprecationWarning`` when the user writes the layout-only DN
+        shorthand on a tensor type annotation (RFC #1300 supplementary 1).
+
+        Suppressed for ``ir.TensorLayout.ND`` (default, no-op marker) and for
+        explicit ``pl.TensorView(stride=..., layout=DN)`` forms (which carry
+        their own stride and don't rely on the shorthand's implicit coordinate
+        flip). Tile-side layouts are never seen here — Tile annotations route
+        through ``_resolve_tile_annotation_args``.
+        """
+        if layout != ir.TensorLayout.DN:
+            return
+        warnings.warn(
+            f"pl.{type_name}[..., pl.DN] is deprecated (RFC #1300 supplementary 1). "
+            "Writing the DN layout-only shorthand requires the user to mentally hold "
+            "two coordinate systems at once (IR-logical post-view vs. runtime "
+            "row-major), which is exactly the ambiguity RFC #1300 aims to eliminate. "
+            "Three migration patterns cover every DN scenario without writing pl.DN:\n"
+            " * source tensor shape, no layout marker: pl.Tensor[[N, K], pl.FP32]\n"
+            " * derive DN at use site: xt = pl.transpose(x, -2, -1) # ND -> DN\n"
+            " * inherit DN through slice/reshape from a DN-producing op\n"
+            "If you must express a strided-DN view (e.g. canonical pretty-print "
+            "round-trip), use pl.TensorView(stride=[...], layout=pl.TensorLayout.DN) "
+            "instead — it forces explicit stride and avoids the implicit-coord-flip "
+            "hazard.",
+            DeprecationWarning,
+            stacklevel=4,
+        )
+
     def resolve_layout(self, layout_node: ast.expr) -> "ir.TensorLayout":
         """Resolve layout annotation to ir.TensorLayout.
 
diff --git a/src/ir/verifier/property_verifier_registry.cpp b/src/ir/verifier/property_verifier_registry.cpp
index ef003dd1a..ce8fe5b1e 100644
--- a/src/ir/verifier/property_verifier_registry.cpp
+++ b/src/ir/verifier/property_verifier_registry.cpp
@@ -67,11 +67,17 @@ PropertyVerifierRegistry::PropertyVerifierRegistry() {
   Register(IRProperty::InlineFunctionsEliminated, CreateInlineFunctionsEliminatedPropertyVerifier);
   Register(IRProperty::OrchestrationReferencesResolved,
            CreateOrchestrationReferencesResolvedPropertyVerifier);
-  // TensorViewCanonical (RFC #1300): the registry returns the weak-mode
-  // verifier (stride.empty() accepted as implicitly packed canonical).
-  // P3's MaterializeTensorStrides constructs the strict variant directly.
+  // TensorViewCanonical (RFC #1300 §2.4): strict mode — every TensorView
+  // reaching the codegen-entry boundary must carry explicit stride. The
+  // registry default fires immediately after ``MaterializeTensorStrides``
+  // (its produced property), turning the "codegen entry has explicit
+  // stride" contract from convention into a verified invariant. Bare
+  // TensorTypes (``!view.has_value()``) are still accepted as implicitly
+  // ND-packed — the check only flags ``view.has_value() && stride.empty()``,
+  // which is the state ``MaterializeTensorStrides`` is responsible for
+  // eliminating.
   Register(IRProperty::TensorViewCanonical,
-           []() { return CreateTensorViewCanonicalPropertyVerifier(/*require_materialized=*/false); });
+           []() { return CreateTensorViewCanonicalPropertyVerifier(/*require_materialized=*/true); });
 }
 
 void PropertyVerifierRegistry::Register(IRProperty prop, std::function factory) {
diff --git a/tests/ut/ir/transforms/test_verify_tensor_view_canonical.py b/tests/ut/ir/transforms/test_verify_tensor_view_canonical.py
index ac040d5e7..a8a1267c5 100644
--- a/tests/ut/ir/transforms/test_verify_tensor_view_canonical.py
+++ b/tests/ut/ir/transforms/test_verify_tensor_view_canonical.py
@@ -211,13 +211,32 @@ def test_symbolic_dn_relaxed_passes():
 # ============================================================================
 
 
-def test_registry_returns_weak_verifier():
-    """The registry's TensorViewCanonical entry uses weak mode by default —
-    so empty stride is accepted (mirrors weak mode of verify_tensor_view_canonical)."""
+def test_registry_returns_strict_verifier():
+    """The registry's TensorViewCanonical entry uses strict mode (RFC #1300
+    §2.4 — codegen-entry contract). MaterializeTensorStrides produces this
+    property, so the auto-verify after it enforces explicit stride. Empty
+    stride on an explicit TensorView is rejected (the state
+    MaterializeTensorStrides is responsible for eliminating)."""
     view = ir.TensorView([], ir.TensorLayout.DN)
     t = ir.TensorType(_shape(4, 8), DataType.FP32, None, view)
     program = _program_with_param_type(t)
 
+    props = _passes.IRPropertySet()
+    props.insert(_passes.IRProperty.TensorViewCanonical)
+    diags = _passes.PropertyVerifierRegistry.verify(props, program)
+    assert len(diags) >= 1
+    assert any("stride is empty" in d.message for d in diags), (
+        f"expected 'stride is empty' diagnostic, got: {[d.message for d in diags]}"
+    )
+
+
+def test_registry_accepts_bare_tensor_type():
+    """Bare TensorTypes (``!view.has_value()``) are implicitly ND-packed and
+    accepted by both weak and strict modes — only ``view.has_value() &&
+    stride.empty()`` is flagged."""
+    t = ir.TensorType(_shape(4, 8), DataType.FP32)
+    program = _program_with_param_type(t)
+
     props = _passes.IRPropertySet()
     props.insert(_passes.IRProperty.TensorViewCanonical)
     diags = _passes.PropertyVerifierRegistry.verify(props, program)