Description
Symptom / Motivation
The current manifest schema enforces len(signature.inputs) >= 1 via test_every_signature_has_inputs_and_outputs. Two elementwise-family ops (AlibiFwdOp, SinusoidalFwdOp) synthesize their output entirely from construction-time scalar parameters — they take zero tensor inputs. To satisfy the schema, both manifest entries declare a fake device_carrier 0-dim scalar tensor as their sole input; the carrier exists only as a workaround. The carrier is also threaded through _register_generative_custom_op, and the validator carries an is_generative carve-out that detects the pattern via ref_api == "none" + 0 positional args and skips the L1/C4 alignment checks. The hack documents itself in the manifest YAML preamble; the design doc previously enshrined it as well (removed in #1439).
The real defect is one schema invariant being too strict, not a missing "generative op kind". Every op has inputs — they are not always tensors. ALiBi and sinusoidal have integer + dtype params as their real inputs.
Root Cause Analysis
scripts/validate_manifest.py:test_every_signature_has_inputs_and_outputs requires len(signature.inputs) >= 1. The generative ops inherently violate this.
scripts/validate_manifest.py threads is_generative through check_l1 and L4 (lines around 542, 836-862, 3398-3408) to skip the inputs↔forward positional alignment. The flag is derived from the ref_api == "none" + 0 positional args heuristic.
tileops/ops/elementwise/_base.py:_register_generative_custom_op registers the op with device_carrier: Tensor as its first arg so torch.compile / register_fake has a tensor to trace.
tileops/ops/elementwise/alibi.py and tileops/ops/elementwise/sinusoidal.py allocate self._device_carrier = torch.empty((), dtype=dtype, device="cuda") in __init__ and pass it to the custom op call in forward().
tileops/manifest/elementwise_generative.yaml declares device_carrier as the sole input on both entries; output dtype uses same_as(device_carrier) to channel dtype through the carrier.
The cleanup is therefore: relax the schema invariant, drop the carrier and its plumbing, and simplify the validator.
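For concreteness, a condensed sketch of the carrier plumbing described above. The constructor params (num_heads) and the custom-op name (torch.ops.tileops.alibi_fwd) are illustrative assumptions; only the _device_carrier allocation is quoted from alibi.py.

```python
# Condensed, illustrative sketch of the current carrier hack; exact
# signatures in the repo differ. The 0-dim tensor exists only so the
# custom-op machinery has a tensor to trace and a dtype/device channel.
import torch

class AlibiFwdOp(torch.nn.Module):
    def __init__(self, num_heads: int, dtype: torch.dtype):
        super().__init__()
        self.num_heads = num_heads
        # Fake scalar tensor; its only job is to satisfy the schema and
        # give register_fake / torch.compile a tensor argument.
        self._device_carrier = torch.empty((), dtype=dtype, device="cuda")

    def forward(self) -> torch.Tensor:
        # The carrier is passed as the op's sole "input"; the manifest
        # mirrors this with a device_carrier entry under inputs.
        return torch.ops.tileops.alibi_fwd(self._device_carrier, self.num_heads)
```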
Related Files
scripts/validate_manifest.py
tileops/ops/elementwise/_base.py
tileops/ops/elementwise/alibi.py
tileops/ops/elementwise/sinusoidal.py
tileops/manifest/elementwise_generative.yaml
tests/test_validate_manifest.py
tests/ops/test_special_elementwise.py
Goal
Replace the carrier-scalar hack with the minimum-correct implementation that allows signature.inputs: {} for ops whose output is fully derived from construction-time params. After this change, no "generative-op" carve-out concept exists anywhere — the schema is permissive enough that the alignment checks become natural no-ops on empty input lists.
Plan
- Relax scripts/validate_manifest.py:test_every_signature_has_inputs_and_outputs from len(signature.inputs) >= 1 to len(signature.outputs) >= 1 AND (len(signature.inputs) >= 1 OR len(signature.params) >= 1). Add a test that an op with inputs: {} plus non-empty params is accepted. (Sketched after this list.)
- Remove the is_generative parameter from check_l1 and L4 in scripts/validate_manifest.py. Remove the ref_api == "none" + 0 positional args heuristic detection (around lines 836-862 and 3398-3408). The inputs↔forward positional alignment loop becomes a natural no-op when inputs == [].
- Remove all "Generative-op carve-out" comments from scripts/validate_manifest.py.
- Update tileops/manifest/elementwise_generative.yaml: change inputs: { device_carrier: ... } to inputs: {} on both entries. Promote dtype from a fake same_as(device_carrier) channel to an explicit params: dtype: {type: torch.dtype} entry. Update the output dtype expression accordingly (e.g., same_as(dtype) or a direct reference). Delete the file's preamble comment block describing the carve-out. (Sketched after this list.)
- Replace tileops/ops/elementwise/_base.py:_register_generative_custom_op with a tensor-input-less impl. Two acceptable approaches: (a) drop the custom_op registration entirely and have forward() call the kernel directly (eager-only; the ops cannot be torch.compile-graph-captured); or (b) register via torch.library.impl directly without going through dispatcher tracing. Pick (a) for this PR unless graph capture is required by a downstream consumer.
- Delete the self._device_carrier allocation in tileops/ops/elementwise/alibi.py:__init__ and tileops/ops/elementwise/sinusoidal.py:__init__. Simplify forward() to call the kernel directly without passing a carrier. (Sketched after this list.)
- Delete validator tests that assert the carve-out fires (in tests/test_validate_manifest.py); add one test confirming inputs: {} is accepted by the relaxed invariant. (Sketched after this list.)
- Run pytest tests/ops/test_special_elementwise.py to confirm AlibiFwdOp / SinusoidalFwdOp still produce correct output without the carrier path.
- Run python scripts/validate_manifest.py and pytest tests/test_validate_manifest.py — both pass.
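A minimal sketch of the relaxed invariant from the first Plan step, assuming each signature exposes inputs/outputs/params mappings (the validator's real data model may differ):

```python
# Sketch of the relaxed invariant. Assumes `sig` exposes inputs/outputs/
# params mappings; the validator's real data model may differ.
def test_every_signature_has_inputs_and_outputs(manifest):
    for op_name, sig in manifest.signatures.items():
        # Every op must produce something.
        assert len(sig.outputs) >= 1, f"{op_name}: signature has no outputs"
        # ...and must be driven by something: tensor inputs or
        # construction-time params. inputs: {} alone is now legal as long
        # as params is non-empty, so no generative carve-out is needed.
        assert len(sig.inputs) >= 1 or len(sig.params) >= 1, (
            f"{op_name}: signature has neither inputs nor params"
        )
```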
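The manifest edit might then look like the following. Field spellings other than inputs, params, outputs, and the same_as(...) expressions quoted above are illustrative, not the real schema:

```yaml
# Sketch of the post-change entry; SinusoidalFwdOp changes identically.
AlibiFwdOp:
  signature:
    inputs: {}                      # was: a fake device_carrier 0-dim tensor
    params:
      dtype: {type: torch.dtype}    # promoted from the same_as(device_carrier) channel
    outputs:
      out:
        dtype: same_as(dtype)       # dtype now flows from the param, not the carrier
```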
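Under approach (a), the op class shrinks to a direct kernel call. A sketch, where the kernel import and constructor params are assumptions:

```python
# Sketch of the simplified op under approach (a): no custom-op
# registration, no carrier; forward() calls the kernel directly. This
# path is eager-only, which the constraints explicitly allow.
import torch

from tileops.ops.elementwise import alibi_fwd_kernel  # assumed entry point

class AlibiFwdOp(torch.nn.Module):
    def __init__(self, num_heads: int, dtype: torch.dtype):
        super().__init__()
        # The op's real inputs: construction-time scalars, not tensors.
        self.num_heads = num_heads
        self.dtype = dtype  # replaces the dtype channel through the carrier

    def forward(self) -> torch.Tensor:
        return alibi_fwd_kernel(self.num_heads, dtype=self.dtype)
```

SinusoidalFwdOp simplifies the same way.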
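And the new validator test, sketched against a hypothetical make_manifest helper that wraps raw signature dicts (the real fixtures in tests/test_validate_manifest.py will differ):

```python
# Sketch of the new acceptance test plus its negative counterpart.
# make_manifest is a hypothetical helper, not a real fixture.
import pytest

def test_signature_inputs_may_be_empty_when_params_present():
    manifest = make_manifest({
        "AlibiFwdOp": {
            "inputs": {},                                   # zero tensor inputs
            "params": {"dtype": {"type": "torch.dtype"}},   # real inputs live here
            "outputs": {"out": {"dtype": "same_as(dtype)"}},
        }
    })
    # Must not raise under the relaxed invariant.
    test_every_signature_has_inputs_and_outputs(manifest)

def test_signature_with_neither_inputs_nor_params_is_rejected():
    manifest = make_manifest(
        {"BadOp": {"inputs": {}, "params": {}, "outputs": {"out": {}}}}
    )
    with pytest.raises(AssertionError):
        test_every_signature_has_inputs_and_outputs(manifest)
```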
Constraints
- Joint manifest + validator + ops change. The schema relaxation and the impl changes that depend on it land in the same PR; splitting them would leave the manifest in an inconsistent intermediate state for one round.
- MUST NOT touch other manifest entries that currently use ref_api: "none" with real tensor inputs (the 11 attention.yaml entries and 3 elementwise_fused_gated.yaml entries). They are not affected.
- MUST NOT introduce a new "generative op kind" concept anywhere. The fix is schema relaxation, not categorization.
- MUST keep AlibiFwdOp / SinusoidalFwdOp working end-to-end. tests/ops/test_special_elementwise.py must stay green.
- Dropping torch.compile graph capture for ALiBi / sinusoidal is acceptable in this PR. If a downstream consumer requires it, file a separate follow-up; do not solve it here with another carrier-style placeholder.
Acceptance Criteria
- pytest tests/test_validate_manifest.py tests/ops/test_special_elementwise.py runs all green (on CUDA).
- python scripts/validate_manifest.py exits 0.
- grep -rn 'device_carrier\|_register_generative_custom_op\|is_generative\|Generative-op carve-out\|generative-op carve-out' tileops/ scripts/ tests/ returns no matches (every trace of the hack is gone).
- grep -rn 'signature.inputs' scripts/validate_manifest.py tests/test_validate_manifest.py shows no >= 1 invariant; the relaxed outputs >= 1 AND (inputs >= 1 OR params >= 1) check is in place.
- tileops/manifest/elementwise_generative.yaml declares inputs: {} on AlibiFwdOp and SinusoidalFwdOp; dtype is a param; the file preamble has no carve-out narrative.
- test_signature_inputs_may_be_empty_when_params_present (or equivalent) is added to tests/test_validate_manifest.py and passes.