Add a testcase: insert tile op between vecscope regions #251
Draft
KurrinQu wants to merge 157 commits into
Explain block/subblock runtime queries in workload-partitioning terms and remove redundant supported-forms wording from conversion ops docs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add detailed mode parameter documentation (mode=0 vs mode=1)
- Add 'Why get_buf/rls_buf is More Programmer-Friendly' section:
  - No manual priming/draining for ping/pong loops
  - No loop peeling for complex/nested loop dependencies
  - Simpler mental model (buffer ID + program order)
- Add quick example comparison showing set_flag overhead vs get_buf simplicity
- Update Example 2 and 3b with explicit mode=0 in code
- Update comparison table with 'Loop peeling' row
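A schematic trace of a double-buffered (ping/pong) MTE2-to-Vector loop makes the priming/draining difference concrete. This is a sketch under stated assumptions: the op spellings, argument order, and names such as `EVT_V_MTE2_0` and `BUF_0` are hypothetical placeholders, not PTO syntax; the code only models where sync ops appear in program order.

```python
def set_wait_trace(n_iters: int) -> list:
    """Schematic set_flag/wait_flag ping/pong loop (MTE2 -> V), 2 buffers.

    Uses one event ID per buffer per direction: 2 directions x 2 buffers = 4 IDs.
    """
    ops = []
    # Priming: mark both buffers as free for MTE2 (reverse direction V -> MTE2).
    for b in (0, 1):
        ops.append(f"set_flag(V, MTE2, EVT_V_MTE2_{b})")
    for i in range(n_iters):
        b = i % 2
        ops += [
            f"wait_flag(V, MTE2, EVT_V_MTE2_{b})",  # buffer b may be overwritten
            f"load(buf{b})",
            f"set_flag(MTE2, V, EVT_MTE2_V_{b})",   # data in buffer b is ready
            f"wait_flag(MTE2, V, EVT_MTE2_V_{b})",
            f"compute(buf{b})",
            f"set_flag(V, MTE2, EVT_V_MTE2_{b})",   # buffer b has been consumed
        ]
    # Draining: consume the reverse-direction flags still outstanding after the loop.
    for b in (0, 1):
        ops.append(f"wait_flag(V, MTE2, EVT_V_MTE2_{b})")
    return ops


def get_rls_trace(n_iters: int) -> list:
    """Schematic get_buf/rls_buf version: one buffer ID per buffer, no prologue/epilogue."""
    ops = []
    for i in range(n_iters):
        b = i % 2
        ops += [
            f"get_buf(MTE2, BUF_{b})", f"load(buf{b})", f"rls_buf(MTE2, BUF_{b})",
            f"get_buf(V, BUF_{b})", f"compute(buf{b})", f"rls_buf(V, BUF_{b})",
        ]
    return ops
```

In this model the set_flag/wait_flag version needs code before and after the loop to keep every flag balanced, while the get_buf/rls_buf version is just the loop body repeated, with the same buffer ID used by both pipelines.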
- set_flag/wait_flag: 2 IDs per buffer (1 forward + 1 reverse pipe-pair)
- get_buf/rls_buf: 1 ID per buffer (handles both directions automatically)
- 8 IDs per pipe-pair is a HW limit, not a formula
- set_flag/wait_flag: 8 IDs per pipe-pair direction (HW limit)
- get_buf/rls_buf: 1 buffer ID per shared resource (HW limit: 32 global), same ID used across all pipelines
- Event ID mgmt: each buffer occupies 1 ID per direction (removed misleading 4 IDs calc)
- Drain example: use concrete EVT_*_0/EVT_*_1 instead of {(N-1)%2} expressions
- 4 set_flag + 4 wait_flag (not 8)
- 4 IDs = 2 pipe-pair directions × 2 ping/pong buffers
- set_flag/wait_flag: 1 MTE2 load, 8 Vector slices: set/wait must be peeled outside the loop
- get_buf/rls_buf: same pattern, but acquire/release can stay inside or outside the loop
- Acquire/release per slice inside the loop
- Iteration 0 blocks until MTE2 is done; iterations 1-7 proceed immediately
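The per-slice behaviour described above (one MTE2 load, 8 Vector slices, acquire/release inside the loop) can be checked with a toy timing model. It is an assumption-laden sketch, not the hardware semantics: time is abstract, the single MTE2 load is assumed to release the buffer at `mte2_done`, and each Vector acquire is assumed to wait only on that release.

```python
def slice_acquire_waits(mte2_done: float, slice_cost: float, num_slices: int = 8) -> list:
    """Toy model: one MTE2 load, then `num_slices` Vector slices, each doing
    acquire -> compute -> release inside the loop.

    Returns how long each iteration's acquire blocked.
    """
    now = 0.0
    waits = []
    for _ in range(num_slices):
        start = max(now, mte2_done)  # acquire: blocks until MTE2 has released the buffer
        waits.append(start - now)
        now = start + slice_cost     # compute the slice, then release
    return waits
```

For example, `slice_acquire_waits(10.0, 1.0)` reports a 10.0 wait for iteration 0 and zero for iterations 1-7, so keeping the acquire inside the loop costs nothing after the first slice in this model.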
Add the merged v0.3 PTO micro-instruction release spec document for A5, including ISA group references and updated synchronization notes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Introduce a three-pass pipeline that lowers PTO tile ops to vector-level
implementations via TileLang DSL templates:
- ExpandTileOp: invokes TileLang Python DSL to instantiate template
functions and replaces tile ops with func.call. SpecKey covers all
operands; tile_buf operands are passed through without bridging.
- PTOInlineLibCall: extended to recognize tilelang instance functions via
the attribute set by the DSL frontend.
- FoldTileBufIntrinsics: resolves pto.tile_buf_addr / tile_valid_rows /
tile_valid_cols, including dynamic valid-shape via pto.bind_tile chain
tracing.
- MemrefToTileBuf: recovers tile_buf types from memref + bind_tile
metadata after PlanMemory/InsertSync.
- PTOViewToMemref: inserts pto.bind_tile anchors for tile_buf function
args so MemrefToTileBuf can recover them.
Adds new PTO ops (tile_buf_addr/tile_valid_rows/tile_valid_cols),
ptoas pipeline wiring, design docs, and unit tests.
(cherry picked from commit f6ba2c4)
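To illustrate the kind of pto.bind_tile chain tracing FoldTileBufIntrinsics is described as doing for dynamic valid shapes, here is a toy Python model. It is not the pass's implementation: the `Op` record, the `view_like` pass-through op, and the `valid_rows` attribute on `pto.bind_tile` are hypothetical simplifications assumed only for this sketch.

```python
from dataclasses import dataclass, field
from typing import Optional, Union


@dataclass
class Op:
    """Hypothetical, simplified IR op: name, operand value names, attributes."""
    name: str
    operands: list = field(default_factory=list)
    attrs: dict = field(default_factory=dict)


# Ops assumed (for this sketch only) to forward their first operand's tile metadata.
PASSTHROUGH = {"view_like"}


def resolve_valid_rows(defs: dict, value: str) -> Optional[Union[int, str]]:
    """Fold tile_valid_rows(value) by tracing back to the nearest bind_tile.

    Returns an int for a static valid-row count, the name of an SSA value for a
    dynamic one, or None if no bind_tile is reachable.
    """
    seen = set()
    while value in defs and value not in seen:
        seen.add(value)
        op = defs[value]
        if op.name == "pto.bind_tile":
            return op.attrs["valid_rows"]
        if op.name in PASSTHROUGH:
            value = op.operands[0]  # keep walking up the def chain
            continue
        break  # unknown producer: cannot fold
    return None
```

A dynamic result (an SSA name) means the intrinsic is replaced by that value rather than by a constant; the `seen` set simply guards against malformed cyclic chains in the toy model.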
* Add TABS tileop template
* Add TEXP tileop template
* Add TLOG tileop template
* Add TNEG tileop template
* Add TNOT tileop template
* Add TRECIP tileop template
* Add TRSQRT tileop template
* Add TSQRT tileop template
* Add TODOs for HIGH_PRECISION type
* Specify supported dtypes for TRECIP & TRSQRT
* Add license headers for uop tileop templates
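When writing a testcase for these templates, a scalar reference can serve as a golden model. The mapping below is an assumption inferred only from the template names (element-wise abs, exp, log, negate, bitwise not, reciprocal, reciprocal square root, square root); it ignores dtype restrictions and HIGH_PRECISION handling, and `reference_tile` is a hypothetical helper.

```python
import math

# Assumed element-wise semantics, inferred from the template names only.
REFERENCE = {
    "TABS": abs,
    "TEXP": math.exp,
    "TLOG": math.log,
    "TNEG": lambda x: -x,
    "TNOT": lambda x: ~x,                    # bitwise NOT; integer inputs only
    "TRECIP": lambda x: 1.0 / x,
    "TRSQRT": lambda x: 1.0 / math.sqrt(x),
    "TSQRT": math.sqrt,
}


def reference_tile(op: str, tile: list) -> list:
    """Apply the reference op element-wise to a 2-D tile given as nested lists."""
    fn = REFERENCE[op]
    return [[fn(x) for x in row] for row in tile]
```

For instance, `reference_tile("TRSQRT", [[4.0, 16.0]])` gives `[[0.5, 0.25]]`; float results from the real kernels would normally be compared with a tolerance rather than exact equality.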
a8c45f2 to 352427a
5e223fb to 42b74f9
No description provided.