hw-native-sys · Hzfengsy · May 12, 2026
diff --git a/docs/en/dev/passes/26-materialize_tensor_strides.md b/docs/en/dev/passes/26-materialize_tensor_strides.md
@@ -19,7 +19,7 @@ Codegen needs one machine-readable contract, so `MaterializeTensorStrides` walks
 
 **Produces**:
 
-- `TensorViewCanonical` — `PassPipeline` auto-verifies after the pass (using the registry's weak-mode verifier)
+- `TensorViewCanonical` — `PassPipeline` auto-verifies after the pass using the registry's **strict-mode** verifier (empty stride on a present `TensorView` is rejected — that is the state this pass is responsible for eliminating)
 
 **Position in the default pipeline** (active since RFC #1300 P6): between [`CanonicalizeIOOrder`](25-canonicalize_io_order.md) and [`InitMemRef`](27-init_memref.md). This is the codegen-prep boundary — every layout-mutating pass (`LowerTransposeLoadParamLayout`, `ResolveBackendOpLayouts`, `ExpandMixedKernel`, `SplitVectorKernel`) has finished, and `InitMemRef` is the first consumer that needs explicit stride.
 
@@ -106,7 +106,7 @@ See `BuildLogicalStridesFromLayout` in [`tensor_view_semantics.h`](../../../../i
 
 ## Verifier interaction
 
-Because the pass declares `produced = {... ∪ TensorViewCanonical}`, `PassPipeline` automatically runs the registry's `TensorViewCanonical` verifier after the pass, surfacing invalid IR (e.g. NZ-on-`TensorType`) immediately as `pypto::ValueError`. The registry default is the **weak-mode** verifier (which accepts `stride.empty()` as implicitly packed canonical); the **strict-mode** verifier — which requires materialization — is reachable directly via `passes.verify_tensor_view_canonical(program, require_materialized=True)` and is the codegen-entry contract that P6/P7 will enforce.
+Because the pass declares `produced = {... ∪ TensorViewCanonical}`, `PassPipeline` automatically runs the registry's `TensorViewCanonical` verifier after the pass. The registry default is the **strict-mode** verifier (RFC #1300 §2.4 codegen-entry contract): it rejects `view.has_value() && stride.empty()` since this pass is responsible for materializing those slots. Bare `TensorType` (`!view.has_value()`) is still accepted — implicit ND-packed is canonical by construction. The same verifier is callable directly via `passes.verify_tensor_view_canonical(program, require_materialized=True)`; pass `require_materialized=False` for the weak mode used during the parse-time / early-pass window before materialization runs.
 
 ## Related
 

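Review note: the strict/weak distinction described in "Verifier interaction" can be modelled in a few lines. The sketch below is not pypto code. `TensorView`, `TensorType`, and `stride_diagnostics` are hypothetical stand-ins that cover only the stride-materialization part of the check; the message text `stride is empty` is borrowed from the unit test in this PR.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TensorView:  # hypothetical stand-in, not pypto's ir.TensorView
    stride: List[int] = field(default_factory=list)
    layout: str = "ND"


@dataclass
class TensorType:  # hypothetical stand-in, not pypto's ir.TensorType
    shape: List[int]
    view: Optional[TensorView] = None


def stride_diagnostics(t: TensorType, require_materialized: bool) -> List[str]:
    """Model only the stride-materialization rule of the verifier."""
    if t.view is None:
        # Bare TensorType: implicitly ND-packed, accepted in both modes.
        return []
    if not t.view.stride:
        # Weak mode treats an empty stride as implicitly packed;
        # strict mode flags it.
        return ["stride is empty"] if require_materialized else []
    return []


bare = TensorType([4, 8])
unmaterialized = TensorType([4, 8], TensorView([], "DN"))
materialized = TensorType([4, 8], TensorView([8, 1], "ND"))

for name, t in [("bare", bare), ("unmaterialized", unmaterialized), ("materialized", materialized)]:
    print(name, stride_diagnostics(t, require_materialized=True))
```

Under this model only the `unmaterialized` case differs between the two modes, which matches the rule the doc states: `view.has_value() && stride.empty()` is the single state the strict mode adds to the reject set.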
diff --git a/docs/en/user/01-language_guide.md b/docs/en/user/01-language_guide.md
@@ -48,19 +48,38 @@ idx: pl.Scalar[pl.INDEX]                # index scalar
 
 ### Tensor Layouts
 
-Layouts control the physical memory arrangement of Tensors:
+Write your `pl.Tensor[...]` annotations using the **runtime row-major
+shape** without a layout marker. Layout is an IR-internal concern that
+passes derive from the ops actually producing/consuming views; you do
+not need to express it in the type annotation.
 
-| Layout | Description |
-| ------ | ----------- |
-| `pl.ND` | N-Dimensional (default, row-major) |
-| `pl.DN` | DN layout |
-| `pl.NZ` | NZ fractal format (hardware-specific tiling) |
+```python
+# ✅ Recommended — source tensor shape, no layout marker:
+b: pl.Tensor[[N, K], pl.FP32]
+```
 
 ```python
-# Specify layout as third type parameter
-a: pl.Tensor[[64, 128], pl.FP16, pl.NZ]
+# ⚠️ Deprecated (RFC #1300 supplementary 1):
+b: pl.Tensor[[K, N], pl.FP32, pl.DN]   # → DeprecationWarning at parse time
 ```
 
+> **Why `pl.Tensor[..., pl.DN]` is deprecated.** The DN layout-only
+> shorthand forces you to hold two coordinate systems in your head at once
+> (the IR-logical post-view shape and the runtime row-major shape), the
+> ambiguity RFC #1300 aims to eliminate. Instead, drop the layout marker and
+> write the runtime shape. For matmul B^T, use `pl.load(..., transpose=True)`
+> on the row-major tensor (see "Data Movement" below); a slice of a
+> DN-producing op's result inherits the parent's layout automatically.
+
+For NZ (hardware-specific tile layout), use `pl.Tile[..., pl.NZ]` — NZ is
+tile-only, never a TensorType annotation. The `pl.NZ` constant remains
+available for tile annotations and IR-internal use.
+
+If you need to write a DN tensor at the IR level (e.g. when constructing
+fixtures or round-tripping printed IR), prefer
+`pl.TensorView(stride=[...], layout=pl.TensorLayout.DN)`, which requires an
+explicit stride and avoids the implicit coordinate-flip hazard.
+
 ### Dynamic Shapes
 
 Use `pl.dynamic()` for dimensions determined at runtime:

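Review note: the "two coordinate systems" point in the language guide can be made concrete with plain stride arithmetic. This sketch assumes the DN shorthand denotes a transposed view over the last two dimensions, which is how the `[K, N]` / `[N, K]` pair in the examples reads; that reading is an assumption, and the helper names are hypothetical rather than pypto APIs.

```python
def flat_offset(index, stride):
    """Offset of `index` into a flat buffer, in elements."""
    return sum(i * s for i, s in zip(index, stride))


def views_agree(n_rows, n_cols):
    """Check that a [K, N] transposed view aliases the [N, K] runtime tensor."""
    runtime_stride = (n_cols, 1)  # row-major [N, K]: the shape the caller allocates
    logical_stride = (1, n_cols)  # [K, N] view over the same buffer (assumed DN meaning)
    return all(
        flat_offset((n, k), runtime_stride) == flat_offset((k, n), logical_stride)
        for n in range(n_rows)
        for k in range(n_cols)
    )


N, K = 3, 4
print(views_agree(N, K))
```

Element `(n, k)` of the runtime tensor and element `(k, n)` of the logical view land on the same offset, so the deprecated annotation and the recommended one describe the same memory with swapped index order. That swap is the part readers had to track by hand.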
diff --git a/docs/zh-cn/dev/passes/26-materialize_tensor_strides.md b/docs/zh-cn/dev/passes/26-materialize_tensor_strides.md
@@ -19,7 +19,7 @@ On PyPTO IR, `TensorType.tensor_view_` can currently be in one of two equivalent forms:
 
 **Produces**:
 
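-- `TensorViewCanonical`: after the pass, `PassPipeline` automatically verifies it with the registry's weak-mode verifier
+- `TensorViewCanonical`: after the pass, `PassPipeline` automatically verifies it with the registry's **strict-mode** verifier (which rejects `view.has_value() && stride.empty()`, exactly the state this pass is responsible for eliminating)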
 
 **Position in the default pipeline** (active since RFC #1300 P6): between [`CanonicalizeIOOrder`](25-canonicalize_io_order.md) and [`InitMemRef`](27-init_memref.md). This is the codegen-prep boundary: every layout-mutating pass (`LowerTransposeLoadParamLayout` / `ResolveBackendOpLayouts` / `ExpandMixedKernel` / `SplitVectorKernel`) has finished, and `InitMemRef` is the first consumer that depends on explicit stride.
 
@@ -106,7 +106,7 @@ In the ND case the formula reduces to the standard row-major packed stride.
 
 ## Interaction with the verifier
 
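-Because the pass declares `produced = {... ∪ TensorViewCanonical}`, `PassPipeline` automatically invokes the registry's `TensorViewCanonical` verifier once the pass completes; invalid IR (such as NZ attached to a `TensorType`) is raised immediately as `pypto::ValueError`. The registry default is the **weak-mode** verifier (which accepts `stride.empty()`); the **strict-mode** verifier is invoked explicitly via `passes.verify_tensor_view_canonical(program, require_materialized=True)` and is the codegen-entry contract that P6/P7 will enable.
+Because the pass declares `produced = {... ∪ TensorViewCanonical}`, `PassPipeline` automatically invokes the registry's `TensorViewCanonical` verifier once the pass completes. The registry default is the **strict-mode** verifier (RFC #1300 §2.4 codegen-entry contract): it rejects `view.has_value() && stride.empty()`, because this pass is the one responsible for materializing those strides. A bare `TensorType` (`!view.has_value()`) is still accepted, since implicit ND-packed is canonical by construction. The same verifier can also be invoked explicitly via `passes.verify_tensor_view_canonical(program, require_materialized=True)`; pass `require_materialized=False` to switch to weak mode (for the parse-time / early-pass window before materialization).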
 
 ## Related
 

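Review note: the hunk context above says that in the ND case the stride formula reduces to the standard row-major packed stride. As a reference point, that standard computation is sketched below. `packed_row_major_strides` is a hypothetical helper written for this note, not the pass's `BuildLogicalStridesFromLayout`, and it handles static integer shapes only.

```python
from typing import List, Sequence


def packed_row_major_strides(shape: Sequence[int]) -> List[int]:
    """Element strides of a densely packed row-major tensor.

    The innermost dimension has stride 1; each outer stride is the
    product of all inner extents.
    """
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    return strides


print(packed_row_major_strides([4, 8]))
print(packed_row_major_strides([2, 3, 4]))
```

For the `[4, 8]` shape used in the unit test this gives `[8, 1]`, the kind of explicit stride the strict verifier expects to find on a present `TensorView` after materialization.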
diff --git a/docs/zh-cn/user/01-language_guide.md b/docs/zh-cn/user/01-language_guide.md
@@ -48,19 +48,24 @@ idx: pl.Scalar[pl.INDEX]                # index scalar
 
 ### Tensor Layouts (TensorLayout)
 
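-Layouts control the physical memory arrangement of Tensors:
+Write `pl.Tensor[...]` annotations with the **runtime row-major shape** and no layout marker. Layout is an IR-internal concept, derived from the ops that produce/consume views, and does not need to be expressed in the annotation.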
 
-| Layout | Description |
-| ---- | ---- |
-| `pl.ND` | N-dimensional (default, row-major) |
-| `pl.DN` | DN layout |
-| `pl.NZ` | NZ fractal format (hardware-specific tiling) |
+```python
+# ✅ Recommended (source tensor shape, no layout marker):
+b: pl.Tensor[[N, K], pl.FP32]
+```
 
 ```python
-# Specify layout as the third type parameter
-a: pl.Tensor[[64, 128], pl.FP16, pl.NZ]
+# ⚠️ Deprecated (RFC #1300 supplementary 1):
+b: pl.Tensor[[K, N], pl.FP32, pl.DN]   # → DeprecationWarning at parse time
 ```
 
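+> **Why `pl.Tensor[..., pl.DN]` is deprecated.** The layout-only shorthand forces users to hold two coordinate systems in their heads at once (the IR-logical post-view shape and the runtime row-major shape), which is exactly the ambiguity RFC #1300 aims to eliminate. Instead, drop the layout marker and write the runtime shape: for matmul B^T, use `pl.load(..., transpose=True)` to load the row-major tensor (see "Data Movement" below); a slice taken after a DN-producing op inherits the parent layout automatically.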
+
+For NZ (hardware tile layout), write `pl.Tile[..., pl.NZ]`. NZ is tile-only and is not allowed as a TensorType annotation. The `pl.NZ` constant is kept for tile annotations and IR-internal use.
+
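+If you need to write a DN tensor at the IR level (e.g. test fixtures or round-tripping printed IR), use `pl.TensorView(stride=[...], layout=pl.TensorLayout.DN)`, which requires an explicit stride and avoids the implicit coordinate-flip hazard.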
+
 ### Dynamic Shapes
 
 Use `pl.dynamic()` to declare dimensions determined at runtime:

diff --git a/python/pypto/language/parser/type_resolver.py b/python/pypto/language/parser/type_resolver.py
@@ -10,6 +10,7 @@
 """Type annotation resolution for IR parsing."""
 
 import ast
+import warnings
 from collections.abc import Callable, Sequence
 from typing import TYPE_CHECKING, Any, cast
 
@@ -441,6 +442,7 @@ def _resolve_subscript_type(self, subscript_node: ast.Subscript) -> ir.Type:  #
                 tensor_view = self._resolve_tensorview(third)
                 return tensor_ctor(shape, dtype, None, tensor_view)
             layout = self.resolve_layout(third)
+            self._warn_on_user_facing_dn_layout(layout, type_name)
             tensor_view = ir.TensorView([], layout)
             return tensor_ctor(shape, dtype, None, tensor_view)
 
@@ -450,6 +452,7 @@ def _resolve_subscript_type(self, subscript_node: ast.Subscript) -> ir.Type:  #
             tensor_view = self._resolve_tensorview(third)
         else:
             layout = self.resolve_layout(third)
+            self._warn_on_user_facing_dn_layout(layout, type_name)
             tensor_view = ir.TensorView([], layout)
         memref_node = slice_value.elts[3]
         if not self._is_memref_node(memref_node):
@@ -986,6 +989,35 @@ def resolve_dtype(self, dtype_node: ast.expr) -> DataType:
             hint="Use pl.FP32, pl.INT32, or other supported dtype constants",
         )
 
+    def _warn_on_user_facing_dn_layout(self, layout: "ir.TensorLayout", type_name: str) -> None:
+        """Warn when a tensor annotation uses the layout-only DN shorthand.
+
+        Emits a ``DeprecationWarning`` (RFC #1300 supplementary 1). No warning
+        is raised for ``ir.TensorLayout.ND`` (default, no-op marker). Explicit
+        ``pl.TensorView(stride=..., layout=DN)`` forms carry their own stride,
+        do not rely on the shorthand's implicit coordinate flip, and take a
+        separate branch that never calls this helper. Tile-side layouts are
+        never seen here: Tile annotations route through ``_resolve_tile_annotation_args``.
+        """
+        if layout != ir.TensorLayout.DN:
+            return
+        warnings.warn(
+            f"pl.{type_name}[..., pl.DN] is deprecated (RFC #1300 supplementary 1). "
+            "Writing the DN layout-only shorthand requires the user to mentally hold "
+            "two coordinate systems at once (IR-logical post-view vs. runtime "
+            "row-major), which is exactly the ambiguity RFC #1300 aims to eliminate. "
+            "Three migration patterns cover every DN scenario without writing pl.DN:\n"
+            "  * source tensor shape, no layout marker: pl.Tensor[[N, K], pl.FP32]\n"
+            "  * derive DN at use site: xt = pl.transpose(x, -2, -1)  # ND -> DN\n"
+            "  * inherit DN through slice/reshape from a DN-producing op\n"
+            "If you must express a strided-DN view (e.g. canonical pretty-print "
+            "round-trip), use pl.TensorView(stride=[...], layout=pl.TensorLayout.DN) "
+            "instead — it forces explicit stride and avoids the implicit-coord-flip "
+            "hazard.",
+            DeprecationWarning,
+            stacklevel=4,
+        )
+
     def resolve_layout(self, layout_node: ast.expr) -> "ir.TensorLayout":
         """Resolve layout annotation to ir.TensorLayout.
 

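Review note: for readers who want to see the warning path in isolation, here is a minimal standalone sketch of the same pattern using only the standard `warnings` module. `Layout`, `warn_on_dn`, and `collect_warnings` are hypothetical names, the message is shortened, and `stacklevel=2` is chosen for this toy call depth rather than copied from the parser's `stacklevel=4`.

```python
import warnings
from enum import Enum
from typing import List


class Layout(Enum):  # hypothetical stand-in for ir.TensorLayout
    ND = "ND"
    DN = "DN"


def warn_on_dn(layout: Layout, type_name: str) -> None:
    """Emit a DeprecationWarning only for the DN layout shorthand."""
    if layout is not Layout.DN:
        return
    warnings.warn(
        f"pl.{type_name}[..., pl.DN] is deprecated",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller of warn_on_dn
    )


def collect_warnings(layout: Layout) -> List[str]:
    """Run warn_on_dn and return the DeprecationWarning messages it raised."""
    with warnings.catch_warnings(record=True) as caught:
        # DeprecationWarning is ignored by default outside __main__,
        # so force it on while recording.
        warnings.simplefilter("always")
        warn_on_dn(layout, "Tensor")
    return [str(w.message) for w in caught if issubclass(w.category, DeprecationWarning)]


print(collect_warnings(Layout.ND))
print(collect_warnings(Layout.DN))
```

Because Python's default filters hide `DeprecationWarning` raised from library code, users may not see the parser's warning unless they run with `-W default` or their test runner enables it. That may be worth a sentence in the migration notes.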
diff --git a/src/ir/verifier/property_verifier_registry.cpp b/src/ir/verifier/property_verifier_registry.cpp
@@ -67,11 +67,17 @@ PropertyVerifierRegistry::PropertyVerifierRegistry() {
   Register(IRProperty::InlineFunctionsEliminated, CreateInlineFunctionsEliminatedPropertyVerifier);
   Register(IRProperty::OrchestrationReferencesResolved,
            CreateOrchestrationReferencesResolvedPropertyVerifier);
-  // TensorViewCanonical (RFC #1300): the registry returns the weak-mode
-  // verifier (stride.empty() accepted as implicitly packed canonical).
-  // P3's MaterializeTensorStrides constructs the strict variant directly.
+  // TensorViewCanonical (RFC #1300 §2.4): strict mode — every TensorView
+  // reaching the codegen-entry boundary must carry explicit stride. The
+  // registry default fires immediately after ``MaterializeTensorStrides``
+  // (its produced property), turning the "codegen entry has explicit
+  // stride" contract from convention into a verified invariant. Bare
+  // TensorTypes (``!view.has_value()``) are still accepted as implicitly
+  // ND-packed — the check only flags ``view.has_value() && stride.empty()``,
+  // which is the state ``MaterializeTensorStrides`` is responsible for
+  // eliminating.
   Register(IRProperty::TensorViewCanonical,
-           []() { return CreateTensorViewCanonicalPropertyVerifier(/*require_materialized=*/false); });
+           []() { return CreateTensorViewCanonicalPropertyVerifier(/*require_materialized=*/true); });
 }
 
 void PropertyVerifierRegistry::Register(IRProperty prop, std::function<PropertyVerifierPtr()> factory) {

diff --git a/tests/ut/ir/transforms/test_verify_tensor_view_canonical.py b/tests/ut/ir/transforms/test_verify_tensor_view_canonical.py
@@ -211,13 +211,32 @@ def test_symbolic_dn_relaxed_passes():
 # ============================================================================
 
 
-def test_registry_returns_weak_verifier():
-    """The registry's TensorViewCanonical entry uses weak mode by default —
-    so empty stride is accepted (mirrors weak mode of verify_tensor_view_canonical)."""
+def test_registry_returns_strict_verifier():
+    """The registry's TensorViewCanonical entry uses strict mode (RFC #1300
+    §2.4 — codegen-entry contract). MaterializeTensorStrides produces this
+    property, so the auto-verify after it enforces explicit stride. Empty
+    stride on an explicit TensorView is rejected (the state
+    MaterializeTensorStrides is responsible for eliminating)."""
     view = ir.TensorView([], ir.TensorLayout.DN)
     t = ir.TensorType(_shape(4, 8), DataType.FP32, None, view)
     program = _program_with_param_type(t)
 
+    props = _passes.IRPropertySet()
+    props.insert(_passes.IRProperty.TensorViewCanonical)
+    diags = _passes.PropertyVerifierRegistry.verify(props, program)
+    assert len(diags) >= 1
+    assert any("stride is empty" in d.message for d in diags), (
+        f"expected 'stride is empty' diagnostic, got: {[d.message for d in diags]}"
+    )
+
+
+def test_registry_accepts_bare_tensor_type():
+    """Bare TensorTypes (``!view.has_value()``) are implicitly ND-packed and
+    accepted by both weak and strict modes — only ``view.has_value() &&
+    stride.empty()`` is flagged."""
+    t = ir.TensorType(_shape(4, 8), DataType.FP32)
+    program = _program_with_param_type(t)
+
     props = _passes.IRPropertySet()
     props.insert(_passes.IRProperty.TensorViewCanonical)
     diags = _passes.PropertyVerifierRegistry.verify(props, program)