Summary
PyPTO IR's TensorType.tensor_view_ field has two conflicting semantics for the layout (DN) tag, depending on whether stride is empty. Codegen disambiguates via tensor_view_->stride.empty() — same TensorType structure, different meanings of shape_. This RFC proposes a self-consistent canonical form where shape_ is always logical, stride is per-logical-dim element step, and layout is a derivable / asserted constraint tag.
Motivation
Two paths produce the same IR structure but give it conflicting meanings:
Semantic A — matmul B^T path: User writes pl.Tensor[[N, K], pl.FP32]; ResolveTransposeLayout adds DN tag with empty stride. shape_ is the source-tensor shape; codegen swaps trailing two dims when emitting pto.make_tensor_view (get_shape_source_idx in pto_codegen.cpp). tile.load similarly swaps offsets/sizes via dn_swap.
Reference: tests/st/runtime/test_matmul.py::TestMatmulOperations::test_matmul_btranspose, output dir build_output/matmul_btranspose_32x64x32_*.
Semantic B — tensor.transpose path (post commit c6b6027): transpose([8, 16] ND) produces Tensor[[16, 8], DN, stride=[1, 16]]. shape_ is the logical post-transpose shape; stride explicitly describes col-major access. Codegen uses (shape, stride) as-is via has_explicit_stride.
Reference: tests/st/runtime/test_trans.py, output dir build_output/TransposeSliceRepro_*.
Root cause: A single TensorType structure carries two incompatible interpretations of shape_. c6b6027 patched tensor.transpose with explicit stride to coexist with the legacy DN-swap convention, but the underlying ambiguity remains. Any new pass / op touching DN tensors must navigate this ambiguity.
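The ambiguity can be modeled in a few lines. The sketch below is not PyPTO code: effective_view is a hypothetical stand-in for the stride.empty() branch in codegen, and the column-major stride it assigns on the Semantic A path is an assumption about what the swapped view ends up describing. It shows that two different TensorType contents resolve to the same memory view.

```python
from typing import List, Optional, Tuple


def effective_view(shape: List[int], layout: str,
                   stride: Optional[List[int]]) -> Tuple[List[int], List[int]]:
    """Hypothetical model of how codegen resolves a 2-D tensor view."""
    if stride:
        # Semantic B: shape is already logical and stride is explicit,
        # so both are used as-is.
        return list(shape), list(stride)
    if layout == "DN":
        # Semantic A: shape is the source-tensor shape, so the trailing two
        # dims are swapped. Assumption: the swapped view is column-major.
        rows, cols = shape[-1], shape[-2]
        return [rows, cols], [1, rows]
    # Plain ND with no stride: packed row-major.
    return list(shape), [shape[-1], 1]


# Semantic A: source shape [8, 16], DN tag, empty stride.
view_a = effective_view([8, 16], "DN", None)
# Semantic B: logical shape [16, 8], DN tag, explicit stride.
view_b = effective_view([16, 8], "DN", [1, 16])
assert view_a == view_b
```

Any pass that reads shape_ without first checking whether stride is empty sees [8, 16] in one case and [16, 8] in the other for the same data.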
Fixing this enables:
- Mechanical codegen (no dn_swap, no has_explicit_stride branching).
- Composability between tensor.transpose, slice, and matmul B^T paths without special cases.
- Clear contract for orch / IR / Incore / .pto boundaries.
Related symptoms: #1213 (matmul b_trans=True wrong results), #1230 (ND→ColMajor Vec tile rejection).
Design
A full design proposal is published in the first comment below. It covers:
- Background and conflict examples.
- Canonical TensorType invariants (shape_ = logical; stride per logical dim; layout consistent with stride).
- Layout family table (ND-packed / ND-strided / DN-packed / DN-strided / NZ).
- BuildLogicalStridesFromLayout(shape, layout) formula.
- "Codegen entry must have explicit stride" contract via new MaterializeTensorStrides pass + VerifyTensorViewCanonical.
- Five frontend examples covering ND / explicit DN / matmul B^T / tensor.transpose / DN-slice.
- Orch ↔ IR ↔ Incore ↔ .pto field-by-field mapping table.
- New virtual op tensor.as_layout for orch↔Incore type bridging.
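As a rough illustration of what a BuildLogicalStridesFromLayout formula could look like for the packed layouts, here is a minimal Python sketch. The function name build_logical_strides and the handling of leading (batch) dims are assumptions, not the formula from the design comment. The DN case is derived only from the example in this issue, where a [16, 8] DN tensor has stride [1, 16]. NZ is left out because the issue does not describe it.

```python
from typing import List


def build_logical_strides(shape: List[int], layout: str) -> List[int]:
    """Packed per-logical-dim element strides for a shape and layout tag."""
    rank = len(shape)
    if rank == 0:
        return []
    # Start from packed row-major (ND) strides.
    strides = [1] * rank
    for i in range(rank - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    if layout == "ND":
        return strides
    if layout == "DN":
        if rank < 2:
            raise ValueError("DN needs at least two dims")
        # Trailing two dims become column-major. Assumption: leading
        # (batch) dims keep their packed strides.
        strides[-2] = 1
        strides[-1] = shape[-2]
        return strides
    raise ValueError(f"layout not covered by this sketch: {layout}")


print(build_logical_strides([8, 16], "ND"))   # [16, 1]
print(build_logical_strides([16, 8], "DN"))   # [1, 16]
```

With strides always derivable this way, a pass such as MaterializeTensorStrides would only need to fill in the packed default wherever stride is empty.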
Open Questions
- Should tensor.as_layout be exposed in the user-facing Python DSL, or kept as an internal IR-only op produced by LowerTransposeLoadParamLayout?
- For symbolic strides where stride[-1] >= shape[-2] cannot be statically verified, do we accept the structural equality stride[-2] == 1 only and skip the >= check, or require an explicit constraint annotation?
- Inside convert_tensor_to_tile_ops_pass.cpp, the layout kwarg propagated to tensor.create() currently includes the legacy DN-swap convention — what changes here once IR is canonical?
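To make the second question concrete, the two options can be sketched as a single check with a switch. This is a hypothetical Python model, not the VerifyTensorViewCanonical implementation: check_dn_strides, the strict flag, and the use of strings for symbolic dims are all assumptions made for illustration.

```python
from typing import List, Union

Dim = Union[int, str]  # a str stands in for a symbolic dimension


def check_dn_strides(shape: List[Dim], stride: List[Dim], strict: bool) -> bool:
    """Check the DN constraints stride[-2] == 1 and stride[-1] >= shape[-2]."""
    if len(shape) < 2 or len(shape) != len(stride):
        return False
    # Structural part: always checkable, even with symbolic values elsewhere.
    if stride[-2] != 1:
        return False
    lead, extent = stride[-1], shape[-2]
    if isinstance(lead, int) and isinstance(extent, int):
        # Both values are static, so the bound can be verified directly.
        return lead >= extent
    # At least one value is symbolic, so the bound cannot be verified here.
    # strict=False accepts on the structural check alone; strict=True
    # rejects, standing in for "an explicit annotation is required".
    return not strict


print(check_dn_strides([16, 8], [1, 16], strict=True))       # True
print(check_dn_strides(["M", 8], [1, "ld"], strict=False))   # True
print(check_dn_strides(["M", 8], [1, "ld"], strict=True))    # False
```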
Git Commit ID
1810c34