Skip to content

[Maintain] review-loop signal gating + logical kernel parity + manifest generative carve-out#1439

Merged
lcy-seso merged 20 commits into
mainfrom
testbed
May 13, 2026
Merged

[Maintain] review-loop signal gating + logical kernel parity + manifest generative carve-out#1439
lcy-seso merged 20 commits into
mainfrom
testbed

Conversation

@lcy-seso
Copy link
Copy Markdown
Collaborator

@lcy-seso lcy-seso commented May 12, 2026

Summary

  • loop.sh re-fires whenever the PR body, label set, or any non-author comment changes; convergence requires signature_diff_reason to be empty AND no fresh signal since the prior round.
  • New signals.sh consolidates body / label / comment fingerprinting; labels_hash is over compact JSON so comma-in-name labels can't collide; agent-stuck (loop-owned) is excluded from the hash.
  • tests/ops/test_logical.py adds int / bool / bfloat16 parity coverage for LogicalAnd / LogicalOr, parametrized as a (op_cls, dtype) matrix.
  • scripts/validate_manifest.py distinguishes forward() introspection failure from zero positional args; L1 generative detection uses the same positional-filter as the C4 check.
  • tileops/manifest/elementwise_generative.yaml (Alibi, Sinusoidal) and tileops/manifest/elementwise_fused_gated.yaml (SiluAndMul, GeluAndMul, GeluTanhAndMul) added with status: spec-only; orphan elementwise ops also declared spec-only;

Test plan

  • pre-commit run --files <changed> passed on every commit.
  • pytest tests/test_validate_manifest.py — 218 passed.
  • python scripts/validate_manifest.py exits 0.
  • pytest tests/ops/test_logical.py — green.

Test node delta

File                         Base    HEAD    Delta
--------------------------------------------------
tests/ops/test_logical.py      19      33      +14
--------------------------------------------------
TOTAL                          19      33      +14

+14 nodes: int / bool / bfloat16 parity coverage for LogicalAnd / LogicalOr, parametrized as a (op_cls, dtype) matrix replacing the prior single-dtype tests.

Follow-up

The new elementwise_generative.yaml entries currently declare a device_carrier placeholder input and the validator carries an is_generative carve-out — a workaround so 0-tensor-input ops satisfy signature.inputs >= 1. Cleanup tracked in #1440: relax the schema invariant, drop the carrier + carve-out, simplify the registration path.

lcy-seso and others added 17 commits May 12, 2026 20:17
Add manifest entries for AlibiFwdOp, SinusoidalFwdOp, SiluAndMulFwdOp,
GeluAndMulFwdOp, GeluTanhAndMulFwdOp. These are TileOPs-private fused /
generative kernels with no single torch.* counterpart (ref_api: "none").
Statuses start at spec-only; the manifest now covers what previously was
an orphan code-only surface.

Co-Authored-By: Ibuki 🍃 — a wind born from GPTs <[email protected]>
Add inline comment clarifying that ``device_carrier`` is modelled as a
manifest input only because the existing schema invariant
(``test_every_signature_has_inputs_and_outputs``) requires
``len(signature.inputs) >= 1``. The carrier itself remains an
implementation detail of ``_register_generative_custom_op``.

Co-Authored-By: Ibuki 🍃 — a wind born from GPTs <[email protected]>
AlibiFwdOp and SinusoidalFwdOp need schema-level generative-op support
(relax len(inputs)>=1 invariant and forward-arity check, represent dtype
via workloads). That work belongs in a separate human-reviewed PR per
manifest-trust-model.md and is tracked in #1429. The remaining three
fused-gated ops (SiluAndMul, GeluAndMul, GeluTanhAndMul) cover the rest
of the manifest declarations from #1415 and pass full + --check-op
validation cleanly.

Co-Authored-By: Ibuki 🍃 — a wind born from GPTs <[email protected]>
…soidal

L1 and C4 forward()-arity checks now skip when ``ref_api: "none"`` and
``forward()`` takes zero positional args. The five elementwise ops
without manifest entries -- AlibiFwdOp, SinusoidalFwdOp, plus the three
fused-gated entries already present -- are all covered by the manifest;
the two generative ops carry a single ``device_carrier`` scalar input to
satisfy ``test_every_signature_has_inputs_and_outputs`` while the new
carve-out unblocks the forward()-order check.

Co-Authored-By: Ibuki 🍃 — a wind born from GPTs <[email protected]>
… filter

The L1 carve-out at check_l1_pyfile_signature used _get_forward_params,
which counts KEYWORD_ONLY params. check_c4_forward_signature_parity
filters to POSITIONAL_ONLY / POSITIONAL_OR_KEYWORD before applying its
ref_api == none + zero-positional-args carve-out. A forward(self, *,
dtype=None) class therefore diverged: L1 emitted a [signature] error
while C4 returned []. Extract _forward_positional_params and use it from
both sites so the two carve-outs stay in lockstep.

Co-Authored-By: Ibuki 🍃 — a wind born from GPTs <[email protected]>
AC-5 forbids modifying tests/ in this PR; revert the two added tests
for _forward_positional_params. The validator helper change in
scripts/validate_manifest.py stays in place.
…wise families

- Convert signature shape values from YAML lists to string form ("[M, N]")
  in elementwise_generative.yaml and elementwise_fused_gated.yaml to match
  docs/design/manifest.md R8 + the convolution.yaml precedent so the
  validator can bind shape symbols.
- Add per-input workload shape keys (device_carrier_shape, x_shape) to
  every workload row to satisfy the workload schema documented in
  docs/design/manifest.md and used across normalization.yaml etc.
- Restore exception class/message in check_c4_forward_signature_parity's
  warning when inspect.signature(forward) raises.

Co-Authored-By: Ibuki 🍃 — a wind born from GPTs <[email protected]>
… positional args

`_forward_positional_params(cls)` returns ``None`` when
``inspect.signature(forward)`` raises, and ``[]`` when forward() has zero
positional args. The L1 generative-op carve-out used ``or []`` which
coalesced both cases to falsy, so an introspection failure silently
satisfied ``not positional_forward_params`` and skipped L1 alignment for
any ``ref_api: "none"`` op.

Use explicit ``positional is not None and len(positional) == 0`` at the
L1 site; mirror the explicit ``len == 0`` form at the C4 site so the two
carve-out predicates stay visually identical and an introspection-failed
class can never bypass either check.

Co-Authored-By: Ibuki 🍃 — a wind born from GPTs <[email protected]>
…gicalOr

LogicalAndFwdKernel / LogicalOrFwdKernel already declare the full
manifest dtype union (bool/uint8/int8/int16/int32/int64 plus fp16/bf16/
fp32). The kernel's non-zero truthiness path works on every integral
dtype, but tests/ops/test_logical.py only covered the float lanes.

Add three decoupled axes following the established pattern in
test_comparison.py and test_binary_arith.py:

- dtype-axis x LogicalAndFwdOp: every manifest-declared int dtype.
- op-axis x int32: every binary logical op.
- bool-axis: every binary logical op on torch.bool.

Oracle is torch.logical_and / torch.logical_or; no hand-coded goldens.

Co-Authored-By: Ibuki 🍃 — a wind born from GPTs <[email protected]>
…matrix

Replace the decoupled dtype-axis (LogicalAnd over every int dtype) and
op-axis (both ops at fixed int32) tests with a single parametrize over
the (op_cls, dtype) product so every (op in {LogicalAnd, LogicalOr}) x
(dtype in uint8 / int8 / int16 / int32 / int64 / bool) cell exercises
PyTorch parity.

Co-Authored-By: Ibuki 🍃 — a wind born from GPTs <[email protected]>
Manifest dtype union for LogicalAndFwdOp / LogicalOrFwdOp includes
bfloat16; the float smoke fixtures only covered float16 + float32.
Extend both fixtures so per-dtype parity tests exist for every cell in
the manifest dtype union.

Co-Authored-By: Ibuki 🍃 — a wind born from GPTs <[email protected]>
The review-tileops loop's idle detector keyed on `head_sha` alone, so a
PR whose only outstanding blocker was metadata-shaped (description /
labels / a thread reply) wedged forever once the Gatekeeper resolved it
without a code push. Extend the trigger signature to: HEAD sha, PR body
hash (CRLF-normalized), label set hash (order-independent), non-reviewer
issue/review comment ids, and inbox.md presence. Any of those changing
fires a fresh round; the log line names which signal fired.

Body / labels are fetched in the existing `gh pr view` call, so the
per-tick GitHub API cost is unchanged. Hashes are persisted in
`meta.json` alongside the existing watermarks; a legacy meta without
the new fields treats the body / label hashes as "unset" so the first
post-upgrade poll does not spuriously fire.

Unit tests cover signature_diff_reason priority, hash determinism, and
the idle-vs-fire decision for every AC.

Co-Authored-By: Ibuki 🍃 — a wind born from GPTs <[email protected]>
The APPROVE convergence fast-path only inspected head / comments /
inbox, so a body- or label-only edit after an approved round silently
short-circuited to converge_and_exit without re-running review. Reuse
TRIGGER_REASON (the same single-source-of-truth diff the idle path
checks) for the convergence guard: converge iff GitHub still shows
APPROVED and TRIGGER_REASON is empty; otherwise fall through to a
fresh review pass.

Extend test_review_idle_decision.sh with the APPROVE branch and a
regression case for the body-edit-after-approve path.

Co-Authored-By: Ibuki 🍃 — a wind born from GPTs <[email protected]>
Both post-codex APPROVE and Rule-1 skipped-APPROVE branches re-snapshot
head only and ignore body / label / non-reviewer-comment edits arriving
during the round. Convergence is terminal, so any dropped signal is lost
forever. Funnel both branches through post_approve_trigger_reason, which
re-snapshots HEAD + body + labels + non-reviewer comments + inbox and
runs the same signature_diff_reason helper the idle / pre-loop APPROVE
guards already use. Add regression tests for each of the three converge
sites.

Co-Authored-By: Ibuki 🍃 — a wind born from GPTs <[email protected]>
pr_labels_hash previously joined sorted label names with ","; two
distinct label sets (e.g. ["a,b","c"] vs ["a","b,c"]) collapsed to the
same hash input and the review loop could miss a real label change.
Serialize the sorted names as compact JSON instead.

Also make sha256_text portable: prefer GNU `sha256sum`, fall back to
BSD/macOS `shasum -a 256`.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
@lcy-seso lcy-seso requested review from a team and Copilot May 12, 2026 12:23
@github-actions github-actions Bot added the maintain Ongoing monitoring, health tracking, and operational maintenance label May 12, 2026
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request enhances the review loop's trigger logic by incorporating PR body and label changes into the signature diffing process, ensuring that updates to these fields prompt a re-review. It also introduces a 'generative-op carve-out' in the manifest validation system to accommodate operations like ALiBi and Sinusoidal encodings that do not take positional input tensors. Additionally, new manifest definitions for fused gated activations and generative ops are added, and logical operation tests are expanded to include integral and boolean dtypes. Feedback was provided regarding the trigger logic in signals.sh, noting that gating on non-empty previous values may prevent the loop from firing when body or label tracking is first initialized for a PR.

Comment thread .claude/skills/review-tileops/signals.sh
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR merges a set of “testbed” improvements into main, spanning manifest/validator support for generative elementwise ops, expanded logical-op dtype testing, and review-loop robustness improvements for .claude/skills/review-tileops.

Changes:

  • Added manifest entries for elementwise generative ops (ALiBi, sinusoidal) and fused gated activations, plus documentation for the “generative-op carve-out”.
  • Updated validate_manifest.py to detect generative ops via zero positional forward() args and to reuse consistent positional-param introspection across L1 and C4.
  • Expanded logical op tests to cover bfloat16 smoke cases and an int/bool dtype correctness matrix for LogicalAnd/LogicalOr.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
tileops/manifest/elementwise_generative.yaml Adds spec-only manifest entries for ALiBi and sinusoidal generative elementwise ops using a device/dtype carrier input.
tileops/manifest/elementwise_fused_gated.yaml Adds spec-only manifest entries for fused gated activation ops (SwiGLU/GELU variants).
tests/ops/test_logical.py Extends test coverage to bfloat16 smoke and full int/bool matrix correctness for binary logical ops.
scripts/validate_manifest.py Implements generative-op carve-out and shared helper for extracting positional forward params; improves introspection-failure diagnostics.
docs/design/manifest.md Documents the generative-op carve-out behavior and rationale.
.claude/skills/review-tileops/signals.sh Introduces pure helper functions to compute stable “fresh round” trigger signals (head/body/labels/comments/inbox).
.claude/skills/review-tileops/loop.sh Integrates the new trigger-signal helpers and extends loop state tracking to include body/label hashes and post-APPROVE stability rechecks.

Comment thread .claude/skills/review-tileops/signals.sh Outdated
Comment thread .claude/skills/review-tileops/loop.sh
@lcy-seso lcy-seso changed the title [Maintain] sync testbed into main (17 commits) [Maintain] review-loop signal gating + logical kernel parity + manifest generative carve-out May 13, 2026
The deleted section described an implementation workaround (carrier
scalar to satisfy the >=1-input invariant) as if it were a design
decision. The hack and its mechanism live in validator code + manifest
YAML comments where they belong; design docs should not document
transient impl details.

Co-Authored-By: Ibuki 🍃 — a wind born from GPTs <[email protected]>
Copy link
Copy Markdown
Contributor

@Ibuki-wind Ibuki-wind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall

The code changes are close, but the PR body still needs to be normalized to the final-state template and include a ## Test node delta entry before approval.

Comment thread .claude/skills/review-tileops/signals.sh Outdated
Comments describing the generative-op workaround framing (mentions of
the schema invariant being satisfied by a carrier, references to a
"carve-out" pattern) age out the moment the workaround is removed.
Tightened to just the mechanical condition and the invariant the code
itself preserves.

Co-Authored-By: Ibuki 🍃 — a wind born from GPTs <[email protected]>
Copilot AI review requested due to automatic review settings May 13, 2026 03:05
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 7 comments.

Comment thread .claude/skills/review-tileops/loop.sh
Comment thread .claude/skills/review-tileops/loop.sh
Comment thread .claude/skills/review-tileops/signals.sh Outdated
Comment thread .claude/skills/review-tileops/signals.sh
Comment thread scripts/validate_manifest.py
Comment thread .claude/skills/review-tileops/loop.sh
Comment thread tileops/manifest/elementwise_generative.yaml
- Drop stale tests/test_review_signals.sh reference from signals.sh
  header (file was removed; standalone test does not exist).
- Exclude loop-owned label `agent-stuck` from pr_labels_hash so
  round-post.sh's own write does not surface as a fresh label signal
  on the next poll.
- On the idle path, seed last_body_hash / last_labels_hash from the
  current PR values when meta.json carries the empty-string sentinel.
  Without this, an upgraded or freshly-tracked PR keeps the prev hash
  anchored at "" and signature_diff_reason's empty-prev guard
  suppresses every subsequent body/label edit.
- Include current PR body + label set in the round prompt when the
  trigger is `body changed` or `labels changed`. Diff is empty in
  that case; without the actual content, the reviewer has nothing to
  inspect.

Co-Authored-By: Ibuki 🍃 — a wind born from GPTs <[email protected]>
Copy link
Copy Markdown
Contributor

@Ibuki-wind Ibuki-wind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall

The code changes look good, but the PR body still does not match the final-state template and is missing the verification facts required by the approval gate. Please normalize it before merge.

@Ibuki-wind
Copy link
Copy Markdown
Contributor

Main issues preventing approval:\n\n1. The PR body still does not follow the final-state template. It should be rewritten into the standard ## Summary / ## Test plan format, with the process/history text removed.\n2. The body needs verification facts, not just a changelog. Please include the concrete validation results you ran, including at least pre-commit, validate_manifest, gpu-smoke, and test_node_delta.\n3. The current body contains factual drift: it says labels_hash uses an explicit separator, but the implementation now hashes a compact JSON array and excludes agent-stuck; it also says pending-schema generative ops were dropped, while the PR still adds AlibiFwdOp and SinusoidalFwdOp.\n\nThe code changes themselves look fine to me; this comment is only about the PR body.

@Ibuki-wind
Copy link
Copy Markdown
Contributor

The PR body is much closer now. One remaining body issue: the Summary still says “pending-schema generative ops dropped,” but the final merged diff adds tileops/manifest/elementwise_generative.yaml with AlibiFwdOp and SinusoidalFwdOp. That phrase reads like intermediate process history / factual drift rather than final-state summary. Please remove it or replace it with the final-state fact only.

Copy link
Copy Markdown
Contributor

@Ibuki-wind Ibuki-wind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean — no issues.

@lcy-seso lcy-seso merged commit 24a036c into main May 13, 2026
27 of 28 checks passed
@lcy-seso lcy-seso deleted the testbed branch May 13, 2026 04:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

maintain Ongoing monitoring, health tracking, and operational maintenance

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants