Description
Symptom / Motivation
The current manifest schema enforces len(signature.inputs) >= 1 via test_every_signature_has_inputs_and_outputs. Two elementwise-family ops (AlibiFwdOp, SinusoidalFwdOp) synthesize their output entirely from construction-time scalar parameters — they take zero tensor inputs. To satisfy the schema, both manifest entries declare a fake device_carrier 0-dim scalar tensor as their sole input; the carrier exists only as a workaround. The carrier is also threaded through _register_generative_custom_op, and the validator carries an is_generative carve-out that detects the pattern via ref_api == "none" + 0 positional args and skips the L1/C4 alignment checks. The hack documents itself in the manifest YAML preamble; the design doc previously enshrined it as well (removed in #1439).
The real defect is one schema invariant being too strict, not a missing "generative op kind". Every op has inputs — they are not always tensors. ALiBi and sinusoidal have integer + dtype params as their real inputs.
Root Cause Analysis
scripts/validate_manifest.py:test_every_signature_has_inputs_and_outputs requires len(signature.inputs) >= 1. The generative ops inherently violate this.
scripts/validate_manifest.py threads is_generative through check_l1 and L4 (lines around 542, 836-862, 3398-3408) to skip the inputs↔forward positional alignment. The flag is derived from the ref_api == "none" + 0 positional args heuristic.
tileops/ops/elementwise/_base.py:_register_generative_custom_op registers the op with device_carrier: Tensor as its first arg so torch.compile / register_fake has a tensor to trace.
tileops/ops/elementwise/alibi.py and tileops/ops/elementwise/sinusoidal.py allocate self._device_carrier = torch.empty((), dtype=dtype, device="cuda") in __init__ and pass it to the custom op call in forward().
tileops/manifest/elementwise_generative.yaml declares device_carrier as the sole input on both entries; output dtype uses same_as(device_carrier) to channel dtype through the carrier.
The cleanup is therefore: relax the schema invariant, drop the carrier and its plumbing, and simplify the validator.
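For concreteness, a condensed sketch of the carrier plumbing described above. The constructor params (num_heads) and the custom-op name (torch.ops.tileops.alibi_fwd) are illustrative assumptions; only the _device_carrier allocation is quoted from alibi.py.

```python
# Condensed, illustrative sketch of the current carrier hack; exact
# signatures in the repo differ. The 0-dim tensor exists only so the
# custom-op machinery has a tensor to trace and a dtype/device channel.
import torch

class AlibiFwdOp(torch.nn.Module):
    def __init__(self, num_heads: int, dtype: torch.dtype):
        super().__init__()
        self.num_heads = num_heads
        # Fake scalar tensor; its only job is to satisfy the schema and
        # give register_fake / torch.compile a tensor argument.
        self._device_carrier = torch.empty((), dtype=dtype, device="cuda")

    def forward(self) -> torch.Tensor:
        # The carrier is passed as the op's sole "input"; the manifest
        # mirrors this with a device_carrier entry under inputs.
        return torch.ops.tileops.alibi_fwd(self._device_carrier, self.num_heads)
```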
Related Files
scripts/validate_manifest.py
tileops/ops/elementwise/_base.py
tileops/ops/elementwise/alibi.py
tileops/ops/elementwise/sinusoidal.py
tileops/manifest/elementwise_generative.yaml
tests/test_validate_manifest.py
tests/ops/test_special_elementwise.py
Goal
Replace the carrier-scalar hack with the minimum-correct implementation that allows signature.inputs: {} for ops whose output is fully derived from construction-time params. After this change, no "generative-op" carve-out concept exists anywhere — the schema is permissive enough that the alignment checks become natural no-ops on empty input lists.
Plan
- Relax scripts/validate_manifest.py:test_every_signature_has_inputs_and_outputs from len(signature.inputs) >= 1 to len(signature.outputs) >= 1 AND (len(signature.inputs) >= 1 OR len(signature.params) >= 1). Add a test that an op with inputs: {} plus non-empty params is accepted. (Sketched after this list.)
- Remove the is_generative parameter from check_l1 and L4 in scripts/validate_manifest.py. Remove the ref_api == "none" + 0 positional args heuristic detection (around lines 836-862 and 3398-3408). The inputs↔forward positional alignment loop becomes a natural no-op when inputs == [].
- Remove all "Generative-op carve-out" comments from scripts/validate_manifest.py.
- Update tileops/manifest/elementwise_generative.yaml: change inputs: { device_carrier: ... } to inputs: {} on both entries. Promote dtype from a fake same_as(device_carrier) channel to an explicit params: dtype: {type: torch.dtype} entry. Update the output dtype expression accordingly (e.g., same_as(dtype) or a direct reference). Delete the file's preamble comment block describing the carve-out. (Sketched after this list.)
- Replace tileops/ops/elementwise/_base.py:_register_generative_custom_op with a tensor-input-less impl. Two acceptable approaches: (a) drop the custom_op registration entirely and have forward() call the kernel directly (eager-only; the ops cannot be torch.compile-graph-captured); or (b) register via torch.library.impl directly without going through dispatcher tracing. Pick (a) for this PR unless graph capture is required by a downstream consumer.
- Delete the self._device_carrier allocation in tileops/ops/elementwise/alibi.py:__init__ and tileops/ops/elementwise/sinusoidal.py:__init__. Simplify forward() to call the kernel directly without passing a carrier. (Sketched after this list.)
- Delete validator tests that assert the carve-out fires (in tests/test_validate_manifest.py); add one test confirming inputs: {} is accepted by the relaxed invariant. (Sketched after this list.)
- Run pytest tests/ops/test_special_elementwise.py to confirm AlibiFwdOp / SinusoidalFwdOp still produce correct output without the carrier path.
- Run python scripts/validate_manifest.py and pytest tests/test_validate_manifest.py — both pass.
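A minimal sketch of the relaxed invariant from the first Plan step, assuming each signature exposes inputs/outputs/params mappings (the validator's real data model may differ):

```python
# Sketch of the relaxed invariant. Assumes `sig` exposes inputs/outputs/
# params mappings; the validator's real data model may differ.
def test_every_signature_has_inputs_and_outputs(manifest):
    for op_name, sig in manifest.signatures.items():
        # Every op must produce something.
        assert len(sig.outputs) >= 1, f"{op_name}: signature has no outputs"
        # ...and must be driven by something: tensor inputs or
        # construction-time params. inputs: {} alone is now legal as long
        # as params is non-empty, so no generative carve-out is needed.
        assert len(sig.inputs) >= 1 or len(sig.params) >= 1, (
            f"{op_name}: signature has neither inputs nor params"
        )
```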
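The manifest edit might then look like the following. Field spellings other than inputs, params, outputs, and the same_as(...) expressions quoted above are illustrative, not the real schema:

```yaml
# Sketch of the post-change entry; SinusoidalFwdOp changes identically.
AlibiFwdOp:
  signature:
    inputs: {}                      # was: a fake device_carrier 0-dim tensor
    params:
      dtype: {type: torch.dtype}    # promoted from the same_as(device_carrier) channel
    outputs:
      out:
        dtype: same_as(dtype)       # dtype now flows from the param, not the carrier
```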
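Under approach (a), the op class shrinks to a direct kernel call. A sketch, where the kernel import and constructor params are assumptions:

```python
# Sketch of the simplified op under approach (a): no custom-op
# registration, no carrier; forward() calls the kernel directly. This
# path is eager-only, which the constraints explicitly allow.
import torch

from tileops.ops.elementwise import alibi_fwd_kernel  # assumed entry point

class AlibiFwdOp(torch.nn.Module):
    def __init__(self, num_heads: int, dtype: torch.dtype):
        super().__init__()
        # The op's real inputs: construction-time scalars, not tensors.
        self.num_heads = num_heads
        self.dtype = dtype  # replaces the dtype channel through the carrier

    def forward(self) -> torch.Tensor:
        return alibi_fwd_kernel(self.num_heads, dtype=self.dtype)
```

SinusoidalFwdOp simplifies the same way.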
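And the new validator test, sketched against a hypothetical make_manifest helper that wraps raw signature dicts (the real fixtures in tests/test_validate_manifest.py will differ):

```python
# Sketch of the new acceptance test plus its negative counterpart.
# make_manifest is a hypothetical helper, not a real fixture.
import pytest

def test_signature_inputs_may_be_empty_when_params_present():
    manifest = make_manifest({
        "AlibiFwdOp": {
            "inputs": {},                                   # zero tensor inputs
            "params": {"dtype": {"type": "torch.dtype"}},   # real inputs live here
            "outputs": {"out": {"dtype": "same_as(dtype)"}},
        }
    })
    # Must not raise under the relaxed invariant.
    test_every_signature_has_inputs_and_outputs(manifest)

def test_signature_with_neither_inputs_nor_params_is_rejected():
    manifest = make_manifest(
        {"BadOp": {"inputs": {}, "params": {}, "outputs": {"out": {}}}}
    )
    with pytest.raises(AssertionError):
        test_every_signature_has_inputs_and_outputs(manifest)
```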
Constraints
- Joint manifest + validator + ops change. The schema relaxation and the impl changes that depend on it land in the same PR; splitting them would leave the manifest in an inconsistent intermediate state for one round.
- MUST NOT touch other manifest entries that currently use ref_api: "none" with real tensor inputs (the 11 attention.yaml entries and 3 elementwise_fused_gated.yaml entries). They are not affected.
- MUST NOT introduce a new "generative op kind" concept anywhere. The fix is schema relaxation, not categorization.
- MUST keep AlibiFwdOp / SinusoidalFwdOp working end-to-end. tests/ops/test_special_elementwise.py must stay green.
- Dropping torch.compile graph capture for ALiBi / sinusoidal is acceptable in this PR. If a downstream consumer requires it, file a separate follow-up; do not solve it here with another carrier-style placeholder.
Acceptance Criteria
- pytest tests/test_validate_manifest.py tests/ops/test_special_elementwise.py runs all green (on CUDA).
- python scripts/validate_manifest.py exits 0.
- grep -rn 'device_carrier\|_register_generative_custom_op\|is_generative\|Generative-op carve-out\|generative-op carve-out' tileops/ scripts/ tests/ returns no matches (every trace of the hack is gone).
- grep -rn 'signature.inputs' scripts/validate_manifest.py tests/test_validate_manifest.py shows no >= 1 invariant; the relaxed outputs >= 1 AND (inputs >= 1 OR params >= 1) check is in place.
- tileops/manifest/elementwise_generative.yaml declares inputs: {} on AlibiFwdOp and SinusoidalFwdOp; dtype is a param; the file preamble has no carve-out narrative.
- test_signature_inputs_may_be_empty_when_params_present (or equivalent) is added to tests/test_validate_manifest.py and passes.