[New Op] Add tensor/tile cos and sin APIs with pass-time lowering to primitive ops #1289

@wuzhf9

Description

Operation Level

Other (Tensor-level and Block-level/tile-level)

Proposed Name & Signature

pl.cos(x: pl.Tensor) -> pl.Tensor
pl.sin(x: pl.Tensor) -> pl.Tensor

pl.tile.cos(x: pl.Tile) -> pl.Tile
pl.tile.sin(x: pl.Tile) -> pl.Tile

Semantics Description

Introduce first-class cos and sin interfaces for both tensor and tile values.

  • Input requirements:
    • Floating-point tensor/tile inputs; FP16 and FP32 must be supported at minimum. Integer inputs should be rejected or explicitly cast to a floating-point dtype first.
  • Output shape/type:
    • Same shape/layout as input; dtype follows existing unary-op promotion rules.
  • Mathematical definition:
    • y = cos(x) and y = sin(x) element-wise.
  • Lowering requirement:
    • During pass expansion, replace these ops with compositions of existing primitive ops instead of introducing mandatory backend intrinsics.
    • A practical baseline is a polynomial (Taylor-style) approximation applied after range reduction, which keeps the polynomial argument in a small interval where the approximation stays accurate. Implement it as reusable pass patterns.
  • Edge cases:
    • Define behavior for NaN/Inf consistently with existing math-op policy.
    • Document approximation error bounds per dtype/range and fallback strategy when precision target cannot be met.
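The lowering requirement above can be sketched as a scalar reference model in plain Python. This is not the pass itself, and several choices are assumptions for illustration only: reduction to [-π/4, π/4], Taylor terms up to r^9 for sin and r^8 for cos, the quadrant table, and returning NaN for non-finite inputs. Each step corresponds to an element-wise primitive (multiply, add, floor, select), which is what would let a pass emit it without a trig intrinsic.

```python
import math

# Hypothetical reference model: range reduction + short Taylor polynomials.
# Every step is an element-wise mul/add/floor/select, i.e. ops a pass could
# emit instead of a dedicated trig intrinsic.

HALF_PI = math.pi / 2
INV_HALF_PI = 2 / math.pi


def _sin_poly(r: float) -> float:
    # Taylor series of sin up to r^9 in Horner form; assumes |r| <= pi/4.
    r2 = r * r
    return r * (1 + r2 * (-1 / 6 + r2 * (1 / 120 + r2 * (-1 / 5040 + r2 * (1 / 362880)))))


def _cos_poly(r: float) -> float:
    # Taylor series of cos up to r^8 in Horner form; assumes |r| <= pi/4.
    r2 = r * r
    return 1 + r2 * (-1 / 2 + r2 * (1 / 24 + r2 * (-1 / 720 + r2 * (1 / 40320))))


def _reduce(x: float) -> tuple[int, float]:
    # Write x = k * (pi/2) + r with |r| <= pi/4; k mod 4 selects the quadrant.
    k = math.floor(x * INV_HALF_PI + 0.5)
    r = x - k * HALF_PI  # single-constant reduction: loses accuracy for very large |x|
    return k % 4, r


def lowered_sin(x: float) -> float:
    if not math.isfinite(x):
        return math.nan  # assumed policy: NaN/Inf inputs produce NaN
    q, r = _reduce(x)
    s, c = _sin_poly(r), _cos_poly(r)
    # sin(r + q*pi/2) for q = 0..3; the tuple lookup stands in for a masked select.
    return (s, c, -s, -c)[q]


def lowered_cos(x: float) -> float:
    if not math.isfinite(x):
        return math.nan  # assumed policy: NaN/Inf inputs produce NaN
    q, r = _reduce(x)
    s, c = _sin_poly(r), _cos_poly(r)
    # cos(r + q*pi/2) for q = 0..3.
    return (c, -s, -c, s)[q]
```

Evaluating both polynomials and selecting by quadrant keeps the data path branch-free, which suits vector/tile hardware at the cost of one extra polynomial. The single-constant reduction `x - k * HALF_PI` degrades as |x| grows; a real pass would likely need a split-constant (Cody-Waite) or Payne-Hanek reduction for large arguments.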

Example Usage

```python
@pl.program
class TrigExample:
    @pl.function
    def main(self, x: pl.Tensor[[128, 128], pl.FP32]) -> pl.Tensor[[128, 128], pl.FP32]:
        a = pl.cos(x)  # element-wise cosine, same shape as x
        b = pl.sin(x)  # element-wise sine, same shape as x
        return a + b
```

```python
# Tile-level usage sketch in a tile compute region
t = ...  # pl.Tile[...] produced by tile.load or block compute
u = pl.tile.cos(t)
v = pl.tile.sin(t)
out = u + v
```

Hardware Support Considerations

  • Should not require dedicated trig ISA support to be usable.
  • Prefer pass-time decomposition into existing arithmetic primitives so that the CPU fallback and multiple NPU backends share the same semantics.
  • If a backend has native trig support, allow an optional later canonicalization to backend-specific intrinsics.
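The decompose-by-default, keep-native-if-available policy above can be sketched as a rewrite over a toy expression tree. Everything here is hypothetical and is not the project's real IR or pass API: `Var`, `Const`, `Call`, `lower_trig`, and the `native_ops` set are illustrative names. Only `cos` is shown, and range reduction is omitted, so the argument is assumed to already lie in [-π/4, π/4].

```python
import math
from dataclasses import dataclass
from typing import Union

# Hypothetical mini-IR, just enough to show a pass that expands `cos`
# into primitive mul/add nodes.


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Const:
    value: float


@dataclass(frozen=True)
class Call:
    op: str      # "cos", "mul", or "add"
    args: tuple


Expr = Union[Var, Const, Call]

# Taylor coefficients of cos in powers of x^2 (argument assumed range-reduced).
COS_COEFFS = (1.0, -1 / 2, 1 / 24, -1 / 720, 1 / 40320)


def expand_cos(x: Expr) -> Expr:
    # Horner form c0 + x2*(c1 + x2*(c2 + ...)) built from mul/add nodes only.
    x2 = Call("mul", (x, x))
    acc: Expr = Const(COS_COEFFS[-1])
    for c in reversed(COS_COEFFS[:-1]):
        acc = Call("add", (Const(c), Call("mul", (x2, acc))))
    return acc


def lower_trig(e: Expr, native_ops: frozenset = frozenset()) -> Expr:
    # Bottom-up rewrite; `cos` survives only if the backend declares it native.
    if isinstance(e, Call):
        args = tuple(lower_trig(a, native_ops) for a in e.args)
        if e.op == "cos" and "cos" not in native_ops:
            return expand_cos(args[0])
        return Call(e.op, args)
    return e


def ops_in(e: Expr) -> set:
    # Collect the op names that appear anywhere in the tree.
    if isinstance(e, Call):
        return {e.op}.union(*(ops_in(a) for a in e.args))
    return set()


def evaluate(e: Expr, env: dict) -> float:
    # Tiny interpreter used to check that the rewrite preserves values.
    if isinstance(e, Var):
        return env[e.name]
    if isinstance(e, Const):
        return e.value
    vals = [evaluate(a, env) for a in e.args]
    if e.op == "mul":
        return vals[0] * vals[1]
    if e.op == "add":
        return vals[0] + vals[1]
    if e.op == "cos":
        return math.cos(vals[0])
    raise ValueError(f"unknown op {e.op}")
```

For example, `lower_trig(Call("cos", (Var("x"),)))` yields a tree containing only `mul` and `add`, while passing `native_ops=frozenset({"cos"})` leaves the `cos` node untouched for a backend-specific canonicalization.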

Motivation / Use Case

  • Trigonometric ops are fundamental for many kernels (signal processing, positional encoding variants, scientific workloads).
  • Users currently need manual decomposition or out-of-graph preprocessing, which hurts readability and hides optimization opportunities from the compiler.
  • Providing both tensor and tile interfaces keeps API coverage consistent across frontend expression levels.

Additional Context

  • Related implementation strategy: range reduction + polynomial approximation (e.g., truncated Taylor/Chebyshev-style forms).
  • The exact approximation family and degree can be selected by pass policy (accuracy vs throughput).
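One way a pass policy could trade accuracy for throughput is to pick the smallest polynomial degree whose error on the reduced interval meets a target. The sketch below assumes Taylor series for sin on [-π/4, π/4] and estimates the error by dense sampling against `math.sin`, which is an estimate rather than a proof; `choose_terms` and the tolerances are hypothetical. A Chebyshev or minimax fit would generally reach the same tolerance with fewer terms.

```python
import math

# Hypothetical pass policy: choose the smallest number of Taylor terms for
# sin on the reduced interval [-pi/4, pi/4] that meets an error tolerance.


def sin_taylor(r: float, terms: int) -> float:
    # Sum of the first `terms` odd terms: r - r^3/3! + r^5/5! - ...
    total, term = 0.0, r
    for n in range(terms):
        total += term
        k = 2 * n + 2
        term *= -r * r / (k * (k + 1))  # next term from the previous one
    return total


def max_error(terms: int, samples: int = 2001) -> float:
    # Sampled max absolute error against math.sin (an estimate, not a bound).
    lo, hi = -math.pi / 4, math.pi / 4
    worst = 0.0
    for i in range(samples):
        r = lo + (hi - lo) * i / (samples - 1)
        worst = max(worst, abs(sin_taylor(r, terms) - math.sin(r)))
    return worst


def choose_terms(tolerance: float, max_terms: int = 10) -> int:
    for terms in range(1, max_terms + 1):
        if max_error(terms) <= tolerance:
            return terms
    # Corresponds to the "fallback strategy when precision target cannot be met".
    raise ValueError("tolerance not reachable; fall back to a higher-precision path")
```

A per-dtype policy might call `choose_terms` with a looser tolerance for FP16 (for example 1e-3) than for FP32 (for example 1e-7); those targets are assumptions, and the real bounds should come from the documented accuracy requirements.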

Metadata

    Labels

    enhancement: New feature or request
    new-operation: New tensor or block-level operation