IR builder prototype for vpto#362
Open
learning-chip wants to merge 258 commits into
Open
Conversation
- Acquire/release per slice inside loop - Iteration 0 blocks until MTE2 done, iterations 1-7 proceed immediately
Add the merged v0.3 PTO micro-instruction release spec document for A5, including ISA group references and updated synchronization notes. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Introduce a three-pass pipeline that lowers PTO tile ops to vector-level
implementations via TileLang DSL templates:
- ExpandTileOp: invokes TileLang Python DSL to instantiate template
functions and replaces tile ops with func.call. SpecKey covers all
operands; tile_buf operands are passed through without bridging.
- PTOInlineLibCall: extended to recognize tilelang instance functions via
the attribute set by the DSL frontend.
- FoldTileBufIntrinsics: resolves pto.tile_buf_addr / tile_valid_rows /
tile_valid_cols, including dynamic valid-shape via pto.bind_tile chain
tracing.
- MemrefToTileBuf: recovers tile_buf types from memref + bind_tile
metadata after PlanMemory/InsertSync.
- PTOViewToMemref: insert pto.bind_tile anchors for tile_buf function
args so MemrefToTileBuf can recover them.
Adds new PTO ops (tile_buf_addr/tile_valid_rows/tile_valid_cols),
ptoas pipeline wiring, design docs, and unit tests.
- Add tilelang-dsl cube surface design and implementation - Add vpto lowering for cube operations - Update frontend AST, semantic analysis, and lowering passes - Add tests for tilelang-dsl v1 cube features - Add design specs and task documentation
…w-native-sys#303) * feat: enhance verification for multiple PTO operations in A5 architecture - TLoadOp: add vec dst layout validation (ND/DN/NZ) and shape matching - TStoreOp: add vec layout consistency and alignment checks - TAddSOp: enable requireValidRowsEqual validation for A5 - TCvtOp: add A5 type conversion validation - TPartAddOp/TPartMaxOp/TPartMinOp/TPartMulOp: add verifyPartialValidPattern for A5 - TSelSOp: add mask row-major layout check - TShlOp/TShrOp: add dst element type validation in verifyShiftLikeBinaryTileOpCommon * feat: add 10 test cases and update PTO.cpp verification Test cases added to test/basic: - tadds_validrow_mismatch.pto - tcvt_type_combination.pto - tload_vec_layout_mismatch.pto - tpartadd_valid_shape.pto - tpartmax_valid_shape.pto - tpartmin_valid_shape.pto - tpartmul_valid_shape.pto - tshl_dtype_mismatch.pto - tshr_dtype_mismatch.pto - tstore_vec_alignment.pto PTO.cpp updates: - Simplify TLoadOp NZ validation - TStoreOp verification refinements * fix: update TPart*Op test cases and PTO.cpp verification * fix: update PTO.cpp and remove tcvt_type_combination test * fix: update PTO.cpp verification logic * fix: update PTO.cpp and test cases * fix: update PTO.cpp * fix: update PTO.cpp verification * fix: update PTO.cpp * fix: update PTO.cpp * fix: update PTO.cpp * fix: update PTO.cpp * revert: 回退PTO.cpp校验逻辑到正确版本 回退到commit 0ec558d(最后能通过的版本) 原因: - 检视意见误解了ISA编译时static_assert vs AS运行时校验的差异 - 盲目照搬ISA规则导致破坏原有设计 - 添加的错误校验导致CI反复失败(5次失败) 删除的错误校验: - TLoad的ValidCol/shape匹配强制校验 - TLoad的NZ shape[3]/shape[4]强制校验 - TPart的'至少一个src==dst'校验 - TStore过度严格的对齐校验 - fp4特殊处理逻辑 恢复的正确设计: - TLoad只校验布局类型匹配(ND/DN/NZ) - TStore标准对齐校验 - TPart只校验dst>=src - 保留运行时的灵活性和原有设计哲学 * fix: add missing validations for TMinS/TMulS/TMaxS and TPart ops - TMinS/TMulS/TMaxS: enable requireValidRowsEqualOnA5 to match ISA BinaryInstr behavior (ISA uses validRow for loop bounds, mismatch causes out-of-bounds access) - TPart: add validation that at least one src valid_shape matches dst valid_shape (ISA TPartInstr requires this for TPartCopyInstr/TPartOps) - TStore: improve code readability with explicit dim check and rows/cols naming * fix: correct TStore validation and revert TPart validation - TStore: use srcValid (not srcShape) for isSpecialCase and alignment checks (valid_shape represents actual data range, shape is tile allocation) - TPart: remove incorrect 'at least one src == dst' validation (ISA allows src0/src1 to be smaller in different dimensions) - Keep: TMinS/TMulS/TMaxS requireValidRowsEqualOnA5=true (valid fix) * fix: revert TStore to use srcShape for alignment checks - TStore alignment check should use srcShape (tile allocation size) not srcValid (valid data range) - Example: cols=256, v_col=255 → 256 satisfies 32-byte alignment, 255 doesn't - Hardware alignment requirement is based on tile allocation, not data range * fix: correct indentation for TStore verification code - Fix missing indentation on line 2285 (int32_t bl declaration) - Code formatting consistency with surrounding context - No functional changes, only whitespace fix * fix: update PTO.cpp * Fix TPart*Op validation: only use verifyPartialValidPatternLoose for A5 - Added verifyPartialValidPatternLoose function (checks only <= dst, not == dst) - Modified only A5 branches to use loose validation (4 locations) - A2A3 branches continue to use strict verifyPartialValidPattern (requires at least one == dst) - Fixes tpartmin/tpartmax test failures --------- Co-authored-by: User <user@example.com>
Input: kernel.pto with cube/vector kernel modules, or a single kernel module normalized into that form. Output: the final VPTO fatobj object at the requested -o path. Steps: 1. Normalize the VPTO input and lower cube/vector modules to LLVM. 2. Compile cube/vector LLVM IR to device objects and merge them. 3. Generate the host stub internally and repack it with the merged device object into the final fatobj. Tests: VPTO host validation now links the ptoas-produced fatobj directly; cases use one kernel.pto and no longer need cube.pto or hand-written stub.cpp.
Input: single VPTO module with pto.aicore functions containing pto.section.vector and pto.section.cube regions. Output: canonical VPTO container with vector/cube child modules carrying pto.kernel_kind, ready for the existing LLVM/fatobj path. Key steps: 1. Add vpto-split-cv-module to clone the input per core kind, drop the opposite section, and inline the selected section body. 2. Add vpto-normalize-container to wrap single kernel modules and verify the final container shape. 3. Run both passes in the VPTO backend pipeline and convert the mixed cbuf/ubuf SIM case to the section input form.
Input: VPTO and TileLang ST .pto testcases\nOutput: ptoas fatobj validation flow\n\n1. Remove stub.cpp and old ll/repack path\n2. Route TileLang ST and VPTO scripts through fatobj\n3. Normalize testcase inputs to the new module form
* Add HP (HIGH_PRECISION) support for TLog * Refactor tlog_template: extract precision mode into subfunctions Extract HIGH_PRECISION and DEFAULT implementations into separate inline_proc functions for better code organization per review feedback. * fix(tlog): add missing pto.aicore attribute to _hp functions --------- Co-authored-by: caojian5 <caojian5@huawei.com>
Move lit-discovered .pto tests out of test/basic into test/lit/pto or test/lit/vpto, and update related documentation paths. Relocate misplaced VPTO sim cases under the micro-op hierarchy, remove a duplicated nested sim case, and add local guidance for choosing PTOAS test frameworks.
Update VPTO lit tests to inspect explicit VPTO IR with --emit-vpto where they check VPTO-level rewrites. Route *_vpto_llvm tests through the VPTO LLVM lowering dump and check llvm.hivm calls from LLVM dialect MLIR instead of stale VPTO output.
Relocate FileCheck-based VPTO .pto tests from test/vpto to test/lit/vpto so they are discovered by the lit framework. Wrap auto-vecscope tests in a vector kernel submodule to match the current VPTO container form.
5e223fb to
42b74f9
Compare
[vpto] Add ptodsl tracing POC
Author
|
Because the base branch mouliangyu:feature-vpto-backend is rebased and force-pushed, the file diff of this PR got messed up. For an easy rebase, just move the ptodsl subdirectory to the latest commit. Other internal files (like .cpp or .td) are not touched by this PR. @Zhendong404 @mouliangyu |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
To reproduce
API design
As discussed in #360 (comment), I intentionally avoid ast module, but only use normal eager Python wrappers.
See the added
ptodsl/README.mdfile for current API list.Side notes
The two bash scripts
quick_install.shandset_ptoas_env.share just for internal quick development on top of my Docker image learning-chip/agent_docker_npu#8, they can be removed upon merge.findings on the python binding APIs
VRegType / MaskType — no dedicated Python class: These types (!pto.vreg<64xf32>, !pto.mask) are not exposed as Python classes in the C extension. They are constructed via Type.parse("!pto.vreg<64xf32>") and Type.parse("!pto.mask"), which is valid since the dialect is registered.
Nested modules: Module.create() ignores the active InsertionPoint. To insert an inner module as a child operation, Operation.create("builtin.module", regions=1) is used instead, followed by manually appending a block to its region with inner_op.regions[0].blocks.append().
pto.vecscope region: VecScopeOp creates the op with an empty region (no blocks). A block must be explicitly appended with vecscope_op.body.blocks.append() before inserting ops into it.
Module attributes (pto.target_arch, pto.kernel_kind): Set via op.attributes["name"] = value — string attributes via StringAttr.get(...), custom dialect attributes via Attribute.parse("#pto.kernel_kind").
No need to change or rebuild ptoas source: All required ops (VecScopeOp, VldsOp, VstsOp, VaddOp, PltB32Op, CastPtrOp, AddPtrOp) were already present in the auto-generated _pto_ops_gen.py bindings.