Skip to content

feat(pto-gss): add A5 buf-id sync emission style to GraphSyncSolver#665

Open
zhangstevenunity wants to merge 8 commits into
mainfrom
claude/heuristic-greider-a67dd3
Open

feat(pto-gss): add A5 buf-id sync emission style to GraphSyncSolver#665
zhangstevenunity wants to merge 8 commits into
mainfrom
claude/heuristic-greider-a67dd3

Conversation

@zhangstevenunity
Copy link
Copy Markdown
Collaborator

Users can opt into the A5-only pto.get_buf / pto.rls_buf bracket form instead of pto.set_flag / pto.wait_flag via
--graph-sync-solver-sync-style=buf-id. The buf-id model is a sequential programming model: the hw (get_cnt, rel_cnt) scoreboard counters serialize same-id brackets across pipes and iterations, so the solver emits brackets at producer/consumer anchor RWOperations and the counter monotonicity handles both intra-iter (forward) and loop-carried (backward) deps without out-of-loop compensation, scf.if isFirstIter / isLastIter guards, or pto.barrier on A5.

Solver-internals reuse:

  • Conflict detection / coloring / barrier-all fallback all reused.
  • buf-id flips checkIntersect to "share any pipe -> conflict" because get_buf doesn't carry the partner pipe in the opcode; without that rule, two pipe-pairs sharing a pipe would alias to the same physical scoreboard and produce back-to-back get(P,#id) which violates the spec.
  • buf-id collapses the per-pipe-pair EventIdSolver map into a single instance (same trick cross-core already uses), so all pairs draw from one id pool.
  • getBeforeAfterSyncMaps adds a mirror-image dedup: forward F and backward B with the same unordered anchor set {(setOp,setPipe), (waitOp,waitPipe)} share the same brackets, so emitting both would waste an id. We keep the smaller-id pair and skip the mirror, which reproduces doc §1.2's canonical single-id for {load; vector} form.
  • All backward-sync hoist optimizations (tryMovingOutBackwardSyncPairs, considerOuterBackwardSyncPairs, mergeBackwardSyncPairs) short-circuit in buf-id mode — they exist to repair set/wait pairing across loop boundaries, which buf-id doesn't need.

CLI / pass option:
--graph-sync-solver-sync-style={set-wait (default) | buf-id}
PTOGraphSyncSolver pass option: sync-style="set-wait" | "buf-id"
buf-id requires --pto-arch=a5; non-A5 errors out.

Tests (all PASS, plus full 209-test PTO lit suite green):

  • graph_sync_solver_buf_id_basic.pto: forward MTE2->V->MTE3 chain, distinct ids on shared pipe (V) for two consumer/producer brackets.
  • graph_sync_solver_buf_id_arch_guard.pto: A3 + buf-id -> error.
  • graph_sync_solver_buf_id_backward.pto: loop with tload+tmuls+tstore; 3 ids cover 2 forward + 1 backward dep, no scf.if, no out-of-loop ops.
  • graph_sync_solver_buf_id_canonical_loop.pto: doc §1.2 canonical for {load; vector} form; single id #0 covers forward + backward.

Design doc: docs/designs/ptoas-graph-sync-solver-buf-id-design.md.

Users can opt into the A5-only `pto.get_buf` / `pto.rls_buf` bracket form
instead of `pto.set_flag` / `pto.wait_flag` via
`--graph-sync-solver-sync-style=buf-id`. The buf-id model is a sequential
programming model: the hw `(get_cnt, rel_cnt)` scoreboard counters
serialize same-id brackets across pipes and iterations, so the solver
emits brackets at producer/consumer anchor RWOperations and the counter
monotonicity handles both intra-iter (forward) and loop-carried (backward)
deps without out-of-loop compensation, scf.if isFirstIter / isLastIter
guards, or pto.barrier on A5.

Solver-internals reuse:
  * Conflict detection / coloring / barrier-all fallback all reused.
  * buf-id flips checkIntersect to "share any pipe -> conflict" because
    get_buf doesn't carry the partner pipe in the opcode; without that
    rule, two pipe-pairs sharing a pipe would alias to the same physical
    scoreboard and produce back-to-back get(P,#id) which violates the
    spec.
  * buf-id collapses the per-pipe-pair EventIdSolver map into a single
    instance (same trick cross-core already uses), so all pairs draw
    from one id pool.
  * getBeforeAfterSyncMaps adds a mirror-image dedup: forward F and
    backward B with the same unordered anchor set {(setOp,setPipe),
    (waitOp,waitPipe)} share the same brackets, so emitting both would
    waste an id. We keep the smaller-id pair and skip the mirror, which
    reproduces doc §1.2's canonical single-id `for {load; vector}` form.
  * All backward-sync hoist optimizations
    (tryMovingOutBackwardSyncPairs, considerOuterBackwardSyncPairs,
    mergeBackwardSyncPairs) short-circuit in buf-id mode — they exist
    to repair set/wait pairing across loop boundaries, which buf-id
    doesn't need.

CLI / pass option:
  --graph-sync-solver-sync-style={set-wait (default) | buf-id}
  PTOGraphSyncSolver pass option: sync-style="set-wait" | "buf-id"
  buf-id requires --pto-arch=a5; non-A5 errors out.

Tests (all PASS, plus full 209-test PTO lit suite green):
  * graph_sync_solver_buf_id_basic.pto: forward MTE2->V->MTE3 chain,
    distinct ids on shared pipe (V) for two consumer/producer brackets.
  * graph_sync_solver_buf_id_arch_guard.pto: A3 + buf-id -> error.
  * graph_sync_solver_buf_id_backward.pto: loop with tload+tmuls+tstore;
    3 ids cover 2 forward + 1 backward dep, no scf.if, no out-of-loop ops.
  * graph_sync_solver_buf_id_canonical_loop.pto: doc §1.2 canonical
    `for {load; vector}` form; single id #0 covers forward + backward.

Design doc: docs/designs/ptoas-graph-sync-solver-buf-id-design.md.

Co-Authored-By: Claude Opus 4.7 <[email protected]>
Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements the 'Buf-ID' synchronization style for the PTOGraphSyncSolver, enabling get_buf and rls_buf bracketing on A5 architectures. The implementation includes new internal IR operations, updated conflict detection to enforce hardware constraints, and a mirror-image deduplication pass for optimized loop synchronization. The review feedback correctly identifies a typo in the design documentation and an outdated comment in the test suite regarding the implementation status of the deduplication logic.

Comment on lines +551 to +552
get_buf(MTE2, #id); load(); rel_buf(MTE2, #id)
get_buf(V, #id); Vector(); rel_buf(V, #id)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

There's a small typo in this code block. rel_buf should be rls_buf to be consistent with the rest of the document and the operation name.

Suggested change
get_buf(MTE2, #id); load(); rel_buf(MTE2, #id)
get_buf(V, #id); Vector(); rel_buf(V, #id)
get_buf(MTE2, #id); load(); rls_buf(MTE2, #id)
get_buf(V, #id); Vector(); rls_buf(V, #id)

Comment on lines +13 to +18
// Doc form uses ONE id (the same scoreboard counter naturally enforces
// MTE2 -> V -> MTE2 -> V -> ... ordering). Our solver allocates F and B
// as two ConflictPairs and assigns them distinct ids — that's correct but
// uses two ids instead of one. Both forms satisfy the buf-id spec; the
// single-id form is a future merge optimization, not required for
// correctness.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This comment appears to be outdated. The mirror-image deduplication logic added in SyncSolver.cpp in this PR is designed to handle this exact case, resulting in a single ID being used for the forward and backward dependencies. The comment should be updated to reflect that this optimization is now implemented.

// Doc form uses ONE id (the same scoreboard counter naturally enforces
// MTE2 -> V -> MTE2 -> V -> ... ordering). The solver's mirror-image
// deduplication logic identifies that the forward (tload->tmuls) and
// backward (tmuls->tload) dependencies form a mirror pair and can be
// covered by a single buf-id, matching the canonical form.

@reedhecre
Copy link
Copy Markdown

reedhecre commented May 13, 2026

Codex Review

该评论由 review 机器人自动更新。

  • PR: feat(pto-gss): add A5 buf-id sync emission style to GraphSyncSolver #665 feat(pto-gss): add A5 buf-id sync emission style to GraphSyncSolver
  • Author: zhangstevenunity
  • Base/Head: main / claude/heuristic-greider-a67dd3
  • Head SHA: 829945ffee83
  • Trigger: PR 有新提交
  • Generated At: 2026-05-14T06:50:15Z
  • Previous Head SHA: f7156674310b
  • Status: failed at codex-review (exit=1)

Summary

Review failed at stage codex-review: exit=1

Findings

未生成结构化 findings,因为 review 过程提前失败。

Log Tail

 .../ptoas-graph-sync-solver-buf-id-design.md       | 711 +++++++++++++++++++++
 include/PTO/IR/PTOSyncUtils.h                      |   8 +
 .../PTO/Transforms/GraphSyncSolver/SyncSolver.h    |  28 +-
 .../PTO/Transforms/GraphSyncSolver/SyncSolverIR.h  |  47 ++
 include/PTO/Transforms/GraphSyncSolver/Utility.h   |  22 +
 include/PTO/Transforms/Passes.td                   |   6 +
 lib/PTO/IR/PTOSyncUtils.cpp                        |  19 +
 .../GraphSyncSolver/PTOGraphSyncSolver.cpp         |  31 +
 lib/PTO/Transforms/GraphSyncSolver/SyncSolver.cpp  | 399 +++++++++++-
 .../GraphSyncSolver/SyncSolverCodeGen.cpp          |  30 +
 .../Transforms/GraphSyncSolver/SyncSolverIR.cpp    |  23 +
 .../pto/graph_sync_solver_buf_id_arch_guard.pto    |  44 ++
 test/lit/pto/graph_sync_solver_buf_id_backward.pto | 120 ++++
 test/lit/pto/graph_sync_solver_buf_id_basic.pto    | 156 +++++
 .../graph_sync_solver_buf_id_canonical_loop.pto    |  97 +++
 .../pto/graph_sync_solver_buf_id_cross_loop.pto    | 152 +++++
 .../graph_sync_solver_buf_id_if_both_branches.pto  |  92 +++
 .../lit/pto/graph_sync_solver_buf_id_if_branch.pto |  90 +++
 ...graph_sync_solver_buf_id_if_branch_consumer.pto |  82 +++
 .../graph_sync_solver_buf_id_two_buffer_loop.pto   | 114 ++++
 tools/ptoas/ptoas.cpp                              |   9 +
 21 files changed, 2245 insertions(+), 35 deletions(-)
===== END STAGE clone rc=0 @ 2026-05-14 14:50:07 =====

===== STAGE codex-review @ 2026-05-14 14:50:07 =====
set -euo pipefail
cd '/tmp/ptoas-pr-review-monitor/runs/20260514_145003_pr665/repo'
'codex' exec -C '/tmp/ptoas-pr-review-monitor/runs/20260514_145003_pr665/repo' -s read-only -c 'model_provider="codereview"' -c 'model="gpt-5.4"' -c 'model_reasoning_effort="xhigh"' --output-schema '/tmp/ptoas-pr-review-monitor/runs/20260514_145003_pr665/review_schema.json' -o '/tmp/ptoas-pr-review-monitor/runs/20260514_145003_pr665/codex_last_message.json' --color never - < '/tmp/ptoas-pr-review-monitor/runs/20260514_145003_pr665/review_prompt.txt'
OpenAI Codex v0.115.0 (research preview)
--------
workdir: /tmp/ptoas-pr-review-monitor/runs/20260514_145003_pr665/repo
model: gpt-5.4
provider: codereview
approval: never
sandbox: read-only
reasoning effort: xhigh
reasoning summaries: none
session id: 019e253f-f199-76f1-84bd-1645e5d5d756
--------
user
你现在在审查 GitHub PR。

仓库:hw-native-sys/PTOAS
PR:#665 feat(pto-gss): add A5 buf-id sync emission style to GraphSyncSolver
作者:zhangstevenunity
base branch:origin/main
head branch:HEAD(当前已 checkout 到 PR head)

要求:
1. 只审查这个 PR 相对 origin/main 的改动,必要时可以看上下文文件。
2. 重点找真实的 correctness / regression / contract mismatch / CI / runtime / compatibility 问题。
3. 不要提纯风格建议,不要提低价值猜测。
4. 严格按优先级输出:
   - P1:高概率会导致错误结果、编译/运行失败、严重回归、发布阻断
   - P2:重要缺陷、行为回归、遗漏校验/测试、较大兼容性问题
   - P3:次要但明确可改的问题
5. 如果没有问题,summary 直接写:未检查到 PR #665 存在问题,并返回 findings=[]。
6. 如果有问题,summary 简洁概括,findings 里每条都要给出:
   - severity
   - title
   - body(说明为什么是问题,尽量具体)
   - file(尽量给相对路径)
   - line(能确定就填整数,否则 null)

建议先查看:
- git status --short
- git diff --stat origin/main...HEAD
- git diff --unified=80 origin/main...HEAD

最终输出必须严格匹配 JSON schema。

mcp startup: no servers
Reconnecting... 1/5 (unexpected status 403 Forbidden: {"code":"INSUFFICIENT_BALANCE","message":"Insufficient account balance"}, url: https://codex.0u0o.com/responses, request id: 6e342540-7705-4ccd-9602-3bbf79e62f0e)
Reconnecting... 2/5 (unexpected status 403 Forbidden: {"code":"INSUFFICIENT_BALANCE","message":"Insufficient account balance"}, url: https://codex.0u0o.com/responses, request id: b910ce4e-0d71-4cc2-bf4a-4c12ba23916d)
Reconnecting... 3/5 (unexpected status 403 Forbidden: {"code":"INSUFFICIENT_BALANCE","message":"Insufficient account balance"}, url: https://codex.0u0o.com/responses, request id: 66b44875-048c-4442-a063-1393e0ea0563)
Reconnecting... 4/5 (unexpected status 403 Forbidden: {"code":"INSUFFICIENT_BALANCE","message":"Insufficient account balance"}, url: https://codex.0u0o.com/responses, request id: 35f63041-8979-4ddc-8bbc-2cf2e6e55354)
Reconnecting... 5/5 (unexpected status 403 Forbidden: {"code":"INSUFFICIENT_BALANCE","message":"Insufficient account balance"}, url: https://codex.0u0o.com/responses, request id: 63bcfb95-8eca-4ff8-9367-92d273900eaf)
ERROR: unexpected status 403 Forbidden: {"code":"INSUFFICIENT_BALANCE","message":"Insufficient account balance"}, url: https://codex.0u0o.com/responses, request id: cd2f7eb3-7e91-4aaa-a698-d3455d392ba4
Warning: no last agent message; wrote empty content to /tmp/ptoas-pr-review-monitor/runs/20260514_145003_pr665/codex_last_message.json
===== END STAGE codex-review rc=1 @ 2026-05-14 14:50:15 =====

zhangstevenunity and others added 5 commits May 13, 2026 16:36
The (get_cnt, rel_cnt) scoreboard counters monotonically advance per
forward bracket. Iter i+1's get_buf is therefore automatically ordered
after iter i's matching rls_buf — for both forward and backward edges of
the same id. Set/wait needs an explicit pair across the loop edge
(form 1 or form 2) because each set/wait flag is a one-shot handshake,
but buf-id's counters make backward sync a free side-effect of the
forward bracket. Emitting separate buf-id brackets for the backward
ConflictPair just wastes an id and serializes more than the spec
requires.

Implementation: skip ConflictPair->isInnerBackward in
getBeforeAfterSyncMaps when emitStyle == BUF_ID. The pair still gets an
id from EventIdSolver (the allocator already ran), but the id is unused.
Phase-2 can hoist the filter earlier (handleSetWaitConflict) to recover
the wasted ids if pool pressure becomes a problem.

Updates graph_sync_solver_buf_id_backward.pto CHECKs accordingly and
expands the design doc §2.4 / §3.6 with the rationale.

Co-Authored-By: Claude Opus 4.7 <[email protected]>
Previous "drop backward at emit" was over-aggressive — it broke the
single-buffer loop case (case 2 from the doc): tload, tmuls, tstore all
on the same tile would get distinct forward ids, and iter i+1's tload
would only chain through tmuls's id, leaving a race against iter i's
tstore on the shared tile.

The right model: backward analysis is still needed to learn buffer
sharing, but the result drives **id allocation** (which forward pairs
must reuse a single id to chain through the same scoreboard counter),
not bracket emission. The buf-id (get_cnt, rel_cnt) chain then
transitively serializes iter i+1 after iter i for free.

Implementation in getBeforeAfterSyncMaps (buf-id path):
  * union-find over forward ConflictPairs; F1 and F2 are unioned iff
    they share an anchor AND some backward B has F1's and F2's
    non-shared endpoints as its anchors. The backward edge between
    those two endpoints implies they're on the same buffer.
  * Each union group gets a single shared bufId (smallest id among
    members' allocated ids).
  * Emit one bracket pair per unique (anchor_op, pipe, bufId) triple
    (a shared anchor across multiple ConflictPairs in the same group
    must dedup to satisfy spec constraint 1).
  * Backward pairs never emit brackets — their semantic is already
    captured by the shared-id chain.

Output diff:
  * canonical_loop (2-pipe, single buf): still single id #0 (unchanged).
  * backward_loop (3-pipe, single tile): now collapses to a single id
    #0 across tload/tmuls/tstore (was 2 ids with race).
  * basic.linear_chain (no loop, no backward): two ids #0/#1
    (unchanged — no bridging backward to force merge).
  * two_buffer_loop (new test, doc §1.3 case): two ids on the two
    independent buffer chains, vadd has two brackets — exactly the
    doc form.

All 210 lit tests PASS (one new test added for the two-buffer pingpong
case).

Co-Authored-By: Claude Opus 4.7 <[email protected]>
…d setOp

The graph sync solver picks `setOp` / `waitOp` for each ConflictPair via
LCA / placeholder hoisting (e.g. when the producer lives inside an
scf.if-then branch and the consumer outside, setOp lands at the parent
block, not at the conditional op). For set/wait this is correct — the
flag signal needs to be visible along every CFG edge so the pairing
holds.

For buf-id, hoisting the get_buf/rls_buf bracket above an scf.if is
semantically muddled: the counter advance is decoupled from whether the
guarded pipe op actually executes. When the conditional path is taken,
get_cnt/rel_cnt get bumped without a corresponding real pipe op. The
chain still happens to work in simple cases (a consumer outside the
branch waits on a phantom counter), but the model loses its "bracket
the actual op" invariant and gets confusing on more complex CFG shapes.

Switch buf-id emission to anchor at the raw RWOperation (`op1`, `op2`)
instead of the solver's `setOp` / `waitOp`. Both mirror dedup and the
final bracket emission now use op1/op2 directly. Result: brackets land
tightly around the actual pipe op, inside whatever scope contains it
(scf.if, scf.for, etc).

New regression: graph_sync_solver_buf_id_if_branch.pto exercises a
tload inside an scf.if with the tmuls outside. The fixed emit places
get_buf(TLOAD,#0) and rls_buf(TLOAD,#0) inside the scf.if (around the
tload) and get_buf(TVEC,#0)/rls_buf(TVEC,#0) at the tmuls.

All 211 lit tests PASS.

Co-Authored-By: Claude Opus 4.7 <[email protected]>
Symmetric companion to graph_sync_solver_buf_id_if_branch.pto: producer
runs unconditionally, consumer (tmuls) lives inside an scf.if-then. The
buf-id emit anchors brackets at the raw RWOperations, so:
  * MTE2 get/rls bracket sits tight around tload, OUTSIDE the scf.if.
  * V   get/rls bracket sits tight around tmuls, INSIDE the scf.if.
  * Both brackets share buf id #0 (single buffer / single chain).

The conditional consumer simply skips its bracket when cond=false — the
counter doesn't get an unmatched bump, and the model stays clean.

Co-Authored-By: Claude Opus 4.7 <[email protected]>
Two complementary cross-loop sync shapes:

Case A: producer inside the loop, consumer after.
  for i: tload (src_i -> tile)
  tmuls (tile)                  # depends on last iter's tload

Case B: two adjacent loops feeding through one shared tile.
  for i: tload (src_i -> tile)
  for j: tmuls (tile)           # depends on first loop's final output

Both should produce brackets anchored at the raw RWOperations:
  * tload bracket inside the loop body — once per iter, counter advances
    N times in sequence.
  * tmuls bracket at its actual location (after the loop in case A,
    inside the consumer loop body in case B), NOT at a loop boundary.
  * Same buf id #0 chains producer and consumer via counter monotonicity.

Verifies the anchor-at-raw-RWOp fix (de7fe04) generalizes from scf.if to
scf.for: even when the solver would hoist setOp/waitOp to a loop
placeholder for set/wait, buf-id ignores that and brackets the actual
op.

Co-Authored-By: Claude Opus 4.7 <[email protected]>
@zhangstevenunity zhangstevenunity marked this pull request as ready for review May 13, 2026 09:34
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 780ad32a35

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

/*isRegBasedArch=*/isA5);
opts.eventIdNumMax = eventIdNumMax;
opts.emitStyle = emitStyle;
if (emitStyle == SyncEmitStyle::BUF_ID) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Skip same-pipe barriers in buf-id mode

When buf-id is selected here, same-pipe hazards are still sent through the existing handleBarrierConflict path; that path only suppresses A5 barriers for PIPE_V/PIPE_M, so legal A5 cases such as two TLOAD/MTE2 operations on the same buffer can still produce pto.barrier even though this option and the pass docs promise that buf-id/A5 emits no barriers. This makes the new mode generate unsupported/unwanted sync for same-pipe MTE2/MTE3/FIX conflicts unless the option also disables barrier conflicts for all pipes in buf-id mode.

Useful? React with 👍 / 👎.

zhangstevenunity and others added 2 commits May 14, 2026 09:58
… share an id

Two ConflictPairs that anchor inside disjoint scf.if branches never execute
concurrently, so the spec-constraint-1 ("two consecutive get_buf(P,#id)
before a single op") cannot apply: the two brackets are on different
control-flow paths and the hw counter sees only one of them per
iteration. The previous checkIntersect rule for buf-id ("share any pipe
-> conflict") incorrectly forced distinct ids on such pairs, which:
  * wasted id budget when the id pool was tight, falling back to
    barrier-all sooner than necessary;
  * for the gemm.pto case with --event-id-max=8 still bracketed both
    branches correctly, but with --event-id-max smaller the else-branch
    was being collapsed away under id pressure.

Add a `mutuallyExclusive` predicate in checkIntersect's buf-id branch:
walk both pairs' ancestor Conditions and detect if they require disjoint
true/false branches of the same Condition. If yes, drop the conflict
edge so the coloring is free to share an id.

Regression: graph_sync_solver_buf_id_if_both_branches.pto runs with
--graph-sync-solver-event-id-max=1 and verifies that both then- and
else-branches get fully-bracketed forward-dep brackets sharing id #0.

Co-Authored-By: Claude Opus 4.7 <[email protected]>
…itive pruning

For set/wait, checkGraphConflict's transitive pruning treats two ConflictPairs
on the same (pipeSrc, pipeDst) as transitively covering each other via
same-pipe FIFO ordering. That's safe because flag-based sync is fundamentally
"signal the LAST producer, wait at the consumer" — issue-queue order takes
care of the rest.

For buf-id this is unsound: each buf-id is an independent scoreboard counter,
so a covering pair on the same pipe-pair but a DIFFERENT buffer does not
transitively sync the candidate's counter. Concretely, when two textracts on
PIPE_MTE1 feed one tmatmul.acc on PIPE_M but on distinct left/right tile
buffers, the solver previously pruned one of the two textract->tmatmul pairs;
the tmatmul ended up with only one consumer bracket, missing the wait on the
other textract.

Plumb the underlying buffer through:
  * checkMemoryConflicts now returns
      SmallVector<tuple<CorePipeInfo, CorePipeInfo, mlir::Value>>
    where the Value identifies the SSA buffer that triggered the conflict.
    Multiple conflicting buffers at the same pipe-pair yield one tuple each.
  * processConflict iterates the tuples and forwards the buffer through
    handleConflict / handleBarrierConflict / handleSetWaitConflict.
  * ConflictPair gains a `conflictBuffer` field; clone() preserves it.
  * checkGraphConflict gains a `conflictBuffer` parameter. In buf-id mode
    (when the candidate has a non-null buffer) it filters out existing
    pairs whose `conflictBuffer` is non-null and different — those pairs
    cannot transitively cover the candidate under independent counters.
    Null buffers fall through, preserving legacy set/wait behavior.

For the gemm.pto kernel that triggered the bug, the first three matmul.acc
calls in the inner else-branch now have two M-pipe brackets each — one per
textract producer (left + right tile) — exactly matching the user's
expected form A,B->C on distinct buffers yielding distinct buf-ids.

214 lit tests still pass.

Co-Authored-By: Claude Opus 4.7 <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants