refactor(ir): move P6 tensor.as_layout from orch call-site to InCore body (RFC #1300) #1339
Conversation
Alternative design for RFC hw-native-sys#1300 P6: keep InCore param signatures unchanged and put the layout reinterpret inside the InCore body instead of at the orch caller. The end-to-end semantics are the same, but the structural cost of the bridge shifts entirely into PTO codegen (one new op handler, ~80 LOC) and removes the orch-side complexity (``BuildWrapperAliasMap`` + ``ResolveAliasChain`` + ``CallSiteAsLayoutInjector``, ~110 LOC plus half the pass).

Files:
- lower_transpose_load_param_layout_pass.cpp: ``PromoteInCoreFunction`` is replaced by ``LowerInCoreFunction``. The new pass scans the InCore body for ``transpose=True`` loads of params; for each such param ``b``, it prepends ``b_dn = tensor.as_layout(b, layout=DN)`` to the body and substitutes body uses of ``b`` with ``b_dn`` before swapping the trailing pair of offsets/shapes/valid_shapes on the matching ``tile.load``s. Param signatures are untouched. ``CallSiteAsLayoutInjector`` and the entire Phase 2 (walk non-InCore functions and inject bridges at every call site) are removed.
- pto_ops_common.cpp: register ``tensor.as_layout`` as a PTO backend op. The codegen lowers it to a fresh ``pto.make_tensor_view`` bound to the input's underlying buffer SSA (the function parameter), using the LHS's (shape, stride, layout) triple. Downstream ``tile.load`` resolves through ``RegisterTensorView``.
- orchestration_codegen.cpp: ``BuildWrapperAliasMap`` and ``ResolveAliasChain`` are deleted; ``BuildWrapperReorderedParams`` drops its alias-chasing fallback. Inner-call args now map directly to wrapper params (no orch-side bridge to see through).

Net diff: -297 / +165 lines across the three files. Unit tests (excluding the pass-specific assertions that encode the old design's expected param-promotion behaviour): 4583 passed / 41 skipped / 0 failures.
The 5 pass-specific test failures all assert ``b.shape == [K, N]`` which is no longer the post-pass param shape — they need rewriting to assert the new IR structure (body prepended with ``b_dn = as_layout(b, DN)`` of type ``[K, N] DN``). End-to-end codegen tests (paged_attn, paged_attn_multi_config, the full tile-pto pipeline) all pass — the .pto output is byte-identical to the original design.
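The "trailing pair swap" the pass performs on a redirected ``tile.load`` can be sketched standalone. This is purely illustrative (the function name and list-based coordinates below are assumptions for the sketch; the real pass rewrites IR nodes, not Python lists):

```python
def swap_trailing_pair(offsets, shapes, valid_shapes):
    """Swap the last two entries of each coordinate list, as a transposed
    tile.load does when redirected to the DN view of its source param.
    Illustrative only: the actual pass operates on IR call arguments."""
    def swapped(xs):
        if len(xs) < 2:
            raise ValueError("need at least two trailing dims to swap")
        return xs[:-2] + [xs[-1], xs[-2]]

    return swapped(offsets), swapped(shapes), swapped(valid_shapes)

# A row-major load at trailing offset (16, 32) becomes a (32, 16) load on the DN view.
print(swap_trailing_pair([0, 16, 32], [1, 64, 128], [1, 64, 100]))
# → ([0, 32, 16], [1, 128, 64], [1, 100, 64])
```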
Companion to the prior pass/codegen refactor commit. Updates:
- tests/ut/ir/transforms/test_lower_transpose_load_param_layout_pass.py:
All 5 affected tests rewritten to validate the new IR shape — the InCore
param signature is preserved, a body-prepended
``b_dn = tensor.as_layout(b, layout=DN)`` AssignStmt is the new contract,
and the matching tile.load reads from ``b_dn`` (the body-local Var) with
the trailing pair of offsets/shapes/valid_shapes swapped. Orch call sites
are asserted to pass wrapper params directly with no tensor.as_layout
bridge. New helpers: ``_find_as_layout_binding`` /
``_has_as_layout_for``; obsolete ``_find_assign_rhs`` removed. All 8 pass
tests now pass.
- docs/en/dev/passes/18-lower_transpose_load_param_layout.md and the zh-cn
mirror: rewritten to describe the body-prepend design — param signatures
unchanged, ``b_dn = tensor.as_layout(b, DN)`` prepended at the top of the
InCore body, orch left untouched. The example diff in the doc now matches
what the pass actually emits.
- docs/{en,zh-cn}/dev/passes/26-materialize_tensor_strides.md: drop the
"produced by a future LowerTransposeLoadParamLayout rewrite" hint in the
example caption — under the new design, P18's body-local
``tensor.as_layout`` LHS already carries CanonicalizeView-filled strides,
so the empty-stride DN-view example is most naturally the user-written
``pl.Tensor[..., pl.DN]`` case.
- include/pypto/ir/transforms/passes.h,
python/bindings/modules/passes.cpp, python/pypto/pypto_core/passes.pyi:
C++ Doxygen, nanobind docstring, and Python stub doc updated to match the
new pass behaviour.
Validation: cmake --build build --parallel (clean); pytest tests/ut/ -n auto
→ 4591 passed / 41 skipped / 0 failed.
🧹 Nitpick comments (1)
tests/ut/ir/transforms/test_lower_transpose_load_param_layout_pass.py (1)
Lines 95-116: ⚡ Quick win: Enforce "prepended at top of body" in helper semantics.

Line 96 says the helper validates a body-prepended binding, but the implementation only checks existence/uniqueness anywhere in the body. This weakens coverage for the pass contract. Please also assert placement at the beginning of `func.body`.

Proposed tightening:

```diff
 def _find_as_layout_binding(func, input_var):
@@
     for stmt in _iter_stmts(func.body):
         if not isinstance(stmt, ir.AssignStmt):
             continue
@@
     assert len(matches) == 1, (
         f"expected exactly one tensor.as_layout binding for {input_var.name_hint}, found {len(matches)}"
     )
+    # Keep the contract strict: binding must be prepended in the top-level body.
+    assert isinstance(func.body, ir.SeqStmts) and func.body.stmts, "function body must be non-empty SeqStmts"
+    first_stmt = func.body.stmts[0]
+    assert isinstance(first_stmt, ir.AssignStmt), "first statement must be AssignStmt for prepended as_layout"
+    first_rhs = first_stmt.value
+    assert (
+        isinstance(first_rhs, ir.Call)
+        and first_rhs.op is not None
+        and first_rhs.op.name == "tensor.as_layout"
+        and first_rhs.args
+        and isinstance(first_rhs.args[0], ir.Var)
+        and first_rhs.args[0] is input_var
+    ), "tensor.as_layout binding must be prepended at top of function body"
     return matches[0]
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/ut/ir/transforms/test_lower_transpose_load_param_layout_pass.py` around lines 95 - 116, The helper _find_as_layout_binding currently only ensures a unique tensor.as_layout binding anywhere in func.body; change it to also assert that the matching AssignStmt is prepended at the start of func.body by recording the matching stmt (the ir.AssignStmt whose rhs is an ir.Call to "tensor.as_layout" with args[0] == input_var) and after collecting matches assert len(matches) == 1 and that the matched stmt is the very first statement in func.body (i.e., func.body[0] is the matched AssignStmt); reference _find_as_layout_binding, func.body, ir.AssignStmt, ir.Call, and the "tensor.as_layout" call when making this change.
📒 Files selected for processing (11)
- docs/en/dev/passes/18-lower_transpose_load_param_layout.md
- docs/en/dev/passes/26-materialize_tensor_strides.md
- docs/zh-cn/dev/passes/18-lower_transpose_load_param_layout.md
- docs/zh-cn/dev/passes/26-materialize_tensor_strides.md
- include/pypto/ir/transforms/passes.h
- python/bindings/modules/passes.cpp
- python/pypto/pypto_core/passes.pyi
- src/backend/common/pto_ops_common.cpp
- src/codegen/orchestration/orchestration_codegen.cpp
- src/ir/transforms/lower_transpose_load_param_layout_pass.cpp
- tests/ut/ir/transforms/test_lower_transpose_load_param_layout_pass.py
💤 Files with no reviewable changes (1)
- src/codegen/orchestration/orchestration_codegen.cpp
Code Review
This pull request refactors the LowerTransposeLoadParamLayout pass to use a body-local tensor.as_layout view instead of promoting function parameter signatures. This change ensures that orchestration functions remain untouched and parameter signatures stay consistent with their original row-major ND layout. The pass now prepends an explicit layout reinterpret at the top of the InCore body, substitutes internal uses, and rewrites tile.load calls to use canonical coordinates. Corresponding documentation, Python bindings, and unit tests have been updated to reflect this architectural shift. A review comment suggests using a safer casting pattern for the codegen_base in the backend registration to improve robustness and maintain consistency with project patterns.
```cpp
// ``tile.load`` lookups via ``GetOrCreateTensorView`` find the LHS through
// the ``RegisterTensorView`` call below.
reg("tensor.as_layout", [](const ir::CallPtr& op, codegen::CodegenBase& codegen_base) {
  auto& codegen = dynamic_cast<codegen::PTOCodegen&>(codegen_base);
```
The dynamic_cast to codegen::PTOCodegen& assumes that the provided codegen_base is always an instance of PTOCodegen. While this is likely true for all current PTO-based backends, it's safer to use As<codegen::PTOCodegen>(codegen_base) if available, or at least add a check before the cast to avoid a std::bad_cast exception if a non-PTO backend ever attempts to use this registration.
References:
- When checking for a type that has a base class, a single check for the base class (using the As pattern) is preferred for consistency and simplicity.
Summary
Moves the P6 `tensor.as_layout` bridge from the orch call site to the top of the InCore body. End-to-end equivalent, but -132 LOC net, and it removes a cluster of incidental complexity in orchestration codegen. See the #1300 discussion comment for the design rationale and consensus question.

What changes

For each InCore parameter `p` loaded via `tile.load(p, ..., transpose=True)`:

Before (current main, post-#1324):
- The InCore param signature is promoted to `[..., b, a] DN`.
- Every orch call site injects `bridged = tensor.as_layout(arg, DN); incore(bridged, ...)`.
- Orchestration codegen needs `BuildWrapperAliasMap` + `ResolveAliasChain` to recover the original wrapper param.

After (this PR):
- The param signature stays `[..., a, b] ND` (matching the runtime torch tensor).
- The pass prepends `p_dn = tensor.as_layout(p, layout=DN)`; body uses of `p` are substituted with `p_dn`.
- The matching `tile.load` is rewritten to read from `p_dn` with the trailing pair of offsets/shapes/valid_shapes swapped and `transpose=False`.
Diff stats:
- `src/ir/transforms/lower_transpose_load_param_layout_pass.cpp`: removes `CallSiteAsLayoutInjector` + Phase 2 + `PromoteToCanonicalDN`; the new `LowerInCoreFunction` prepends to the body.
- `src/codegen/orchestration/orchestration_codegen.cpp`: removes `BuildWrapperAliasMap` / `ResolveAliasChain` / the alias-chasing fallback.
- `src/backend/common/pto_ops_common.cpp`: adds `tensor.as_layout` PTO codegen (emits one `pto.make_tensor_view` sharing the input's base).

Total: -485 / +391 = -94 LOC net (the +291 in tests/docs is mostly added explanatory comments and structured assertions; the production-code delta is -297 / +165 = -132 LOC).
Why this is acceptable per RFC §4.2
RFC §4.2's "InCore cannot create tensors" invariant targets ops that allocate a byte buffer (`tensor.create`). `tensor.as_layout` is a pure metadata reinterpret: it allocates nothing; it just re-describes the input's existing physical buffer. The four-layer boundary (§5) becomes cleaner under this design:
- Only the InCore body sees `tensor.as_layout`; this is a single-function internal detail.
- `.pto`: codegen consumes whatever canonical triple the InCore body sees.
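The "pure metadata reinterpret" point can be illustrated in plain Python: a (shape, strides) view over a shared flat buffer is transposed by swapping metadata alone, with no copy. This is a from-scratch analogy, not the project's code; the class and method names are invented for the sketch:

```python
class StridedView:
    """A (shape, strides) description over a shared flat buffer.

    Transposing swaps metadata only; the buffer object is untouched,
    mirroring what a DN tensor.as_layout view does to an ND param."""
    def __init__(self, buf, shape, strides):
        self.buf, self.shape, self.strides = buf, shape, strides

    def __getitem__(self, idx):
        flat = sum(i * s for i, s in zip(idx, self.strides))
        return self.buf[flat]

    def transposed(self):
        # Metadata-only reinterpret: same buffer, reversed shape/strides.
        return StridedView(self.buf, self.shape[::-1], self.strides[::-1])

buf = list(range(12))                  # one physical allocation
nd = StridedView(buf, (3, 4), (4, 1))  # row-major [3, 4] "ND" view
dn = nd.transposed()                   # "DN" view: nothing allocated

assert dn.buf is nd.buf                # same underlying buffer
assert dn[1, 2] == nd[2, 1] == 9       # same element through either view
print(dn.shape, dn.strides)            # → (4, 3) (1, 4)
```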
Validation
- `cmake --build build --parallel`: clean.
- `pytest tests/ut/ -n auto --maxprocesses 8`: 4602 passed / 41 skipped / 0 failed.
- `.pto` codegen tests pass; output is byte-identical to main.

Test plan
- Pass unit tests assert the new IR structure (`tensor.as_layout` binding + `tile.load` reading from the binding LHS + orch left alone).
- `cmake --build` clean.

Discussion
Open for RFC author / reviewers to weigh in. The design tradeoff is "signature is the contract" (current main) vs. "cross-function boundary is the runtime-faithful boundary, DN view is a per-kernel detail" (this PR). The latter eliminates downstream codegen complexity at the cost of slightly less honest InCore signatures.