Operation Level
Other (Tensor-level and Block-level/tile-level)
Proposed Name & Signature
pl.cos(x: pl.Tensor) -> pl.Tensor
pl.sin(x: pl.Tensor) -> pl.Tensor
pl.tile.cos(x: pl.Tile) -> pl.Tile
pl.tile.sin(x: pl.Tile) -> pl.Tile
Semantics Description
Introduce first-class cos and sin interfaces for both tensor and tile values.
- Input requirements:
  - Floating-point tensor/tile inputs (at least FP16/FP32); integer inputs should be rejected or explicitly cast.
- Output shape/type:
  - Same shape/layout as input; dtype follows existing unary-op promotion rules.
- Mathematical definition:
  - y = cos(x) and y = sin(x) element-wise.
- Lowering requirement:
  - During pass expansion, replace these ops with compositions of existing primitive ops instead of introducing mandatory backend intrinsics.
  - A practical baseline is polynomial/Taylor-style approximation with range reduction (for numerical stability), implemented as reusable pass patterns.
- Edge cases:
  - Define behavior for NaN/Inf consistently with existing math-op policy.
  - Document approximation error bounds per dtype/range and fallback strategy when precision target cannot be met.
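To make the lowering requirement concrete, here is a minimal sketch of the kind of decomposition a pass could emit: quadrant-based range reduction followed by truncated Taylor polynomials, built only from multiply, add, subtract, round, floor, compare and select. NumPy stands in for the primitive ops, and `sincos_sketch` is a hypothetical name for illustration, not part of the `pl` API. The polynomial degrees and the single-constant reduction are assumptions; a real pass would likely use a higher-precision reduction for large arguments.

```python
import numpy as np

TWO_OVER_PI = 2.0 / np.pi
HALF_PI = np.pi / 2.0


def sincos_sketch(x):
    """Hypothetical reference decomposition; returns (sin(x), cos(x))."""
    x = np.asarray(x, dtype=np.float64)
    # Inf inputs produce inf - inf below; silence the warning and let NaN propagate.
    with np.errstate(invalid="ignore"):
        # Range reduction: x = k * (pi/2) + r with |r| <= pi/4.
        k = np.rint(x * TWO_OVER_PI)
        r = x - k * HALF_PI
        # Quadrant index k mod 4, computed in floating point (0, 1, 2 or 3).
        q = k - 4.0 * np.floor(k / 4.0)

        # Truncated Taylor polynomials in Horner form on the reduced range.
        r2 = r * r
        s = r * (1.0 + r2 * (-1.0 / 6 + r2 * (1.0 / 120
                 + r2 * (-1.0 / 5040 + r2 * (1.0 / 362880)))))
        c = 1.0 + r2 * (-1.0 / 2 + r2 * (1.0 / 24 + r2 * (-1.0 / 720
                 + r2 * (1.0 / 40320 + r2 * (-1.0 / 3628800)))))

    # Quadrant fix-up via selects:
    #   q=0: (s, c)   q=1: (c, -s)   q=2: (-s, -c)   q=3: (-c, s)
    swap = (q == 1) | (q == 3)
    sin_base = np.where(swap, c, s)
    cos_base = np.where(swap, s, c)
    sin_out = np.where((q == 2) | (q == 3), -sin_base, sin_base)
    cos_out = np.where((q == 1) | (q == 2), -cos_base, cos_base)
    return sin_out, cos_out
```

In this sketch NaN and Inf inputs both yield NaN, which is one possible choice for the edge-case policy; the actual behavior should follow whatever the existing math-op policy specifies.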
Example Usage
@pl.program
class TrigExample:
    @pl.function
    def main(self, x: pl.Tensor[[128, 128], pl.FP32]) -> pl.Tensor[[128, 128], pl.FP32]:
        a = pl.cos(x)
        b = pl.sin(x)
        return a + b

# Tile-level usage sketch in a tile compute region
t = ...  # pl.Tile[...] produced by tile.load or block compute
u = pl.tile.cos(t)
v = pl.tile.sin(t)
out = u + v
Hardware Support Considerations
- Should not require dedicated trig ISA support to be usable.
- Prefer pass-time decomposition to existing arithmetic primitives so CPU fallback and multiple NPUs can share semantics.
- If backend has native trig support, allow optional later canonicalization to backend-specific intrinsics.
Motivation / Use Case
- Trigonometric ops are fundamental for many kernels (signal processing, positional encoding variants, scientific workloads).
- Users currently need manual decomposition or out-of-graph preprocessing, which hurts readability and optimization opportunities.
- Providing both tensor and tile interfaces keeps API coverage consistent across frontend expression levels.
Additional Context
- Related implementation strategy: range reduction + polynomial approximation (e.g., truncated Taylor/Chebyshev-style forms).
- The exact approximation family and degree can be selected by pass policy (accuracy vs throughput).
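As a rough illustration of the accuracy-versus-degree trade-off a pass policy would weigh, the sketch below measures the maximum error of sin on the reduced range [-pi/4, pi/4] for a truncated Taylor polynomial and for a Chebyshev interpolant of the same degree. It uses NumPy's `Chebyshev.interpolate`; the helper names `taylor_sin` and `max_errors` are hypothetical. Chebyshev interpolation is used here as a convenient stand-in and is not the true minimax polynomial.

```python
import math

import numpy as np
from numpy.polynomial.chebyshev import Chebyshev

A = np.pi / 4.0                      # half-width of the reduced range
GRID = np.linspace(-A, A, 4001)      # dense grid for measuring error


def taylor_sin(r, degree):
    """Truncated Taylor series of sin around 0, keeping odd terms up to `degree`."""
    total = np.zeros_like(r)
    for n in range(1, degree + 1, 2):
        sign = -1.0 if (n // 2) % 2 else 1.0
        total = total + sign * r**n / math.factorial(n)
    return total


def max_errors(degree):
    """Return (taylor_error, chebyshev_error) as max absolute error on GRID."""
    exact = np.sin(GRID)
    taylor_err = np.max(np.abs(taylor_sin(GRID, degree) - exact))
    cheb = Chebyshev.interpolate(np.sin, degree, domain=[-A, A])
    cheb_err = np.max(np.abs(cheb(GRID) - exact))
    return taylor_err, cheb_err


if __name__ == "__main__":
    for degree in (3, 5, 7):
        t_err, c_err = max_errors(degree)
        print(f"degree {degree}: taylor {t_err:.2e}, chebyshev {c_err:.2e}")
```

A policy could use measurements like these to pick the lowest degree that meets a per-dtype error target, for example a lower degree for FP16 than for FP32.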