[Refactor][OPS] split tileops/ops/elementwise.py into per-cluster package#1388
[Refactor][OPS] split tileops/ops/elementwise.py into per-cluster package#1388lcy-seso merged 2 commits intotile-ai:testbedfrom
Conversation
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a modular elementwise operation package, refactoring monolithic logic into specialized modules for activations, arithmetic, bitwise, and other operations. It implements a robust infrastructure with base classes and registration factories to support torch.compile. Feedback focuses on improving maintainability by reducing code duplication, specifically suggesting the centralization of the _expand_flat helper, simplifying the BinaryOp initialization to support flexible kernel arguments, and removing redundant method overrides. A simplification for the inner_size calculation in the PReLU operation was also proposed.
There was a problem hiding this comment.
Pull request overview
Refactors the elementwise ops family by replacing the large tileops/ops/elementwise.py monolith with a structured tileops/ops/elementwise/ package while preserving the existing public import surface (from tileops.ops.elementwise import ...) and torch.compile custom-op registrations.
Changes:
- Split elementwise ops into per-cluster modules under
tileops/ops/elementwise/and consolidated shared infrastructure into_base.py. - Added package-level
__init__.pythat explicitly re-exports previously-public symbols via__all__and registers all custom ops at import time. - Removed the legacy
tileops/ops/elementwise.pyfile.
Reviewed changes
Copilot reviewed 18 out of 19 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| tileops/ops/elementwise/init.py | Public re-exports + torch.compile custom-op registration loops for all elementwise ops. |
| tileops/ops/elementwise/_base.py | Shared umbrella Op bases, broadcasting helpers, fp8 helpers, and registration factory functions. |
| tileops/ops/elementwise/activations.py | Activation op classes split out of the monolith. |
| tileops/ops/elementwise/alibi.py | ALiBi generative op module. |
| tileops/ops/elementwise/arithmetic.py | Binary arithmetic ops module. |
| tileops/ops/elementwise/bitwise.py | Bitwise op module. |
| tileops/ops/elementwise/clamp.py | Clamp variants module (tensor-bound + scalar-bound). |
| tileops/ops/elementwise/comparison.py | Comparison ops module (bool outputs via Op-layer cast). |
| tileops/ops/elementwise/fused_gated.py | Fused gated ops module. |
| tileops/ops/elementwise/lerp_tensor.py | Tensor-weight lerp op module. |
| tileops/ops/elementwise/logical.py | Logical ops module. |
| tileops/ops/elementwise/masked_fill.py | MaskedFill variants module (scalar and tensor-value forms). |
| tileops/ops/elementwise/math_unary.py | Unary math ops module. |
| tileops/ops/elementwise/nan_to_num.py | NanToNum op module. |
| tileops/ops/elementwise/predicates.py | isnan/isinf/isfinite predicate ops module. |
| tileops/ops/elementwise/prelu.py | PReLU op module. |
| tileops/ops/elementwise/sinusoidal.py | Sinusoidal generative op module. |
| tileops/ops/elementwise/where.py | Where op module. |
| tileops/ops/elementwise.py | Deleted legacy monolithic module (replaced by package). |
There was a problem hiding this comment.
Code Review
This pull request introduces a new tileops/ops/elementwise package, refactoring elementwise operations into a modular structure with support for torch.compile via custom op registration. It includes various activation, arithmetic, bitwise, and comparison operations, along with shared infrastructure for broadcast coalescing and kernel registration. Feedback suggests consolidating duplicated _expand_flat utility methods and refactoring LerpFwdOp to reduce code duplication in __init__.
Ibuki-wind
left a comment
There was a problem hiding this comment.
Approved. The remaining review comments were scope-creep suggestions that would change constructor behavior or broaden the refactor beyond the file-layout-only constraint, so they were declined as out of scope.
Ibuki-wind
left a comment
There was a problem hiding this comment.
Approved. The remaining review comments were scope-creep suggestions that would change constructor behavior or broaden the refactor beyond the file-layout-only constraint, so they were declined as out of scope.
|
Nightshift orchestrator note: PR is technically ready (Reviewer PASS T015, Gatekeeper resolved all 9 unresolved bot threads). gpu-smoke result: 1908 pass, 1 fail, 46 skip — sole failure is the same pre-existing testbed break also blocking #1387 ( Verified not caused by this PR (PreluFwdOp source.op regression from initial split was fixed in commit Pipeline rounds used: 1 (after the round-0 PreluFwdOp fix). Scope: pure file-layout split (66 ops + 3 umbrellas → 16 cluster modules + _base.py). |
…e ops (#1392) ## Summary Fix gpu-smoke `test_every_op_has_at_least_two_workloads` failure by populating empty `workloads:` lists on 21 implemented binary elementwise ops (\`AddFwdOp\`, \`SubFwdOp\`, \`MulFwdOp\`, \`DivFwdOp\`, \`RemainderFwdOp\`, \`PowFwdOp\`, \`FloorDivideFwdOp\`, \`LerpFwdOp\`, \`MaximumFwdOp\`, \`MinimumFwdOp\`, \`Eq/Ne/Gt/Lt/Ge/LeFwdOp\`, \`LogicalAnd/OrFwdOp\`, \`BitwiseAnd/Or/XorFwdOp\`). This pre-existing testbed CI break has been blocking nightshift auto-merge for PRs #1387, #1388, #1389, #1391. Filed as issue #1390 — this PR resolves it. ## Plan executed Each affected op now declares two representative workloads: - **LLM hidden-state prefill**: \`input/other [2048, 4096]\` (no broadcast) - **CNN feature map**: \`input [16, 256, 56, 56]\`, \`other [256, 1, 1]\` (channel-wise broadcast) Dtype matrix matches each op's signature: - Float-only ops (\`Div\`, \`Pow\`, \`Lerp\`, \`Remainder\`, \`FloorDivide\`): \`float16, bfloat16\` - Bitwise ops (\`BitwiseAnd/Or/Xor\`): \`int32, int64\` - Logical ops (\`LogicalAnd/Or\`): \`bool\` - Default (other arithmetic + comparison): \`float16, bfloat16\` ## Test plan - [x] \`pytest tests/test_ops_manifest.py::TestOpSchema::test_every_op_has_at_least_two_workloads\` — passes - [x] \`pytest tests/test_validate_manifest.py tests/test_ops_manifest.py\` — 239 passed - [x] \`python scripts/validate_manifest.py\` — \"All manifest checks passed\" - [x] Pre-commit clean ## Acceptance Criteria Closes #1390. - AC-1: ✅ Modified files pass unit tests. - AC-2: ✅ \`AddFwdOp.workloads\` (and 20 siblings) now have 2 entries each, satisfying signature + broadcasting rules. - AC-3: ✅ \`scripts/validate_manifest.py\` exits 0 with no new warnings on the affected ops. - AC-4: To be verified by gpu-smoke on this PR's run. ## Trust model Manifest-only PR. No \`signature\`, \`roofline\`, \`params\`, \`shape_rules\`, or \`dtype_combos\` edits. Falls within the trust-model rule that workload edits without a status flip require a separate manifest-only PR with human review — this is that PR. --------- Co-authored-by: Ibuki 🍃 — a wind born from GPTs <Ibuki-wind@users.noreply.github.com>
b1d056b to
d068c02
Compare
Replace the 3415-line tileops/ops/elementwise.py monolith with a tileops/ops/elementwise/ package, one module per tight op cluster. Umbrella template classes (UnaryOp, BinaryOp, FusedGatedOp) and the shared registration / broadcast infrastructure live in _base.py. Package __init__.py re-exports every previously-public symbol with explicit __all__ + per-symbol relative imports, then runs the torch.library custom_op registration loops at import time so existing behaviour is preserved. Pure file-layout move: no class signature, kernel binding, or runtime behaviour change. Every previously-importable symbol stays reachable via from tileops.ops.elementwise import <Symbol>. Co-Authored-By: Ibuki 🍃 — a wind born from GPTs <Ibuki-wind@users.noreply.github.com>
Update 66 manifest entries' source.op to point at the new per-cluster files under tileops/ops/elementwise/. Move LerpTensorFwdOp into arithmetic.py so it shares source.op with its variant_of primary LerpFwdOp (R16 cross-entry consistency). Co-Authored-By: Ibuki 🍃 — a wind born from GPTs <Ibuki-wind@users.noreply.github.com>
d068c02 to
25131e0
Compare
Closes #1378
Summary
tileops/ops/elementwise.pywith atileops/ops/elementwise/package, one module per cluster (16 cluster modules +_base.py).UnaryOp,BinaryOp,FusedGatedOpmoved to_base.py;__init__.pyre-exports every previously-public symbol via explicit__all__.Test plan