[PerfXLab] Add and optimize gemm op#2220

Open
bin913 wants to merge 3 commits into flagos-ai:master from bin913:gemm_ok

Conversation

Contributor

@bin913 bin913 commented Apr 2, 2026

PR Category

[ Operator]

Type of Change

[Performance Optimization]

Description

Add the gemm op and optimize its performance.
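For reference, gemm follows the standard BLAS semantics C ← alpha·A·B + beta·C. A minimal pure-Python sketch of that contract (illustrative only, not the PR's kernel implementation):

```python
def gemm(alpha, a, b, beta, c):
    """Reference GEMM: returns alpha * (A @ B) + beta * C for row-major nested lists."""
    m, k, n = len(a), len(b), len(b[0])
    return [
        [beta * c[i][j] + alpha * sum(a[i][p] * b[p][j] for p in range(k))
         for j in range(n)]
        for i in range(m)
    ]

a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
c = [[1.0, 1.0], [1.0, 1.0]]
print(gemm(2.0, a, b, 0.5, c))  # [[38.5, 44.5], [86.5, 100.5]]
```

The actual op is a GPU kernel; this sketch only pins down the expected numerics that a unit test would compare against.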

Issue

Progress

  • Change is properly reviewed (1 reviewer required, 2 recommended).
  • Change responds to an issue.
  • Change is fully covered by unit tests (UT).

Performance

test_blas_perf.py::test_gemm_benchmark 
Operator: gemm  Performance Test (dtype=torch.float16, mode=kernel,level=core)
Status       Torch Latency (ms)    Gems Latency (ms)         Gems Speedup          Size Detail
-----------------------------------------------------------------------------------------------
SUCCESS               0.007616            0.008032               0.948          [torch.Size([384, 384]), torch.Size([384, 384])]
SUCCESS               0.231040            0.225984               1.022          [torch.Size([4096, 4096]), torch.Size([4096, 4096])]
SUCCESS               0.010784            0.011232               0.960          [torch.Size([1024, 1024]), torch.Size([1024, 1024])]
SUCCESS               0.030272            0.030352               0.997          [torch.Size([2048, 2048]), torch.Size([2048, 2048])]
SUCCESS               0.230784            0.233760               0.987          [torch.Size([4096, 4096]), torch.Size([4096, 4096])]


Operator: gemm  Performance Test (dtype=torch.float32, mode=kernel,level=core)
Status       Torch Latency (ms)    Gems Latency (ms)         Gems Speedup          Size Detail
-----------------------------------------------------------------------------------------------
SUCCESS               0.017024            0.012448               1.368          [torch.Size([384, 384]), torch.Size([384, 384])]
SUCCESS               2.660368            2.578048               1.032          [torch.Size([4096, 4096]), torch.Size([4096, 4096])]
SUCCESS               0.064992            0.045856               1.417          [torch.Size([1024, 1024]), torch.Size([1024, 1024])]
SUCCESS               0.350976            0.331296               1.059          [torch.Size([2048, 2048]), torch.Size([2048, 2048])]
SUCCESS               2.667088            2.624864               1.016          [torch.Size([4096, 4096]), torch.Size([4096, 4096])]


Operator: gemm  Performance Test (dtype=torch.bfloat16, mode=kernel,level=core)
Status       Torch Latency (ms)    Gems Latency (ms)         Gems Speedup          Size Detail
-----------------------------------------------------------------------------------------------
SUCCESS               0.007616            0.008160               0.933          [torch.Size([384, 384]), torch.Size([384, 384])]
SUCCESS               0.213376            0.218624               0.976          [torch.Size([4096, 4096]), torch.Size([4096, 4096])]
SUCCESS               0.010720            0.011200               0.957          [torch.Size([1024, 1024]), torch.Size([1024, 1024])]
SUCCESS               0.029600            0.029792               0.994          [torch.Size([2048, 2048]), torch.Size([2048, 2048])]
SUCCESS               0.212736            0.218960               0.972          [torch.Size([4096, 4096]), torch.Size([4096, 4096])]
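In the tables above, the Gems Speedup column is the ratio of Torch latency to Gems latency (values above 1.0 mean the Gems kernel is faster). A quick sanity check against the float32 rows, using a hypothetical helper not part of the PR:

```python
def speedup(torch_ms, gems_ms):
    """Speedup of the Gems kernel relative to Torch: > 1.0 means Gems is faster."""
    return torch_ms / gems_ms

# float32, 384x384 row from the table
print(round(speedup(0.017024, 0.012448), 3))  # 1.368
# float16, 4096x4096 row
print(round(speedup(0.231040, 0.225984), 3))  # 1.022
```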

"""
benchmark for gemm
"""

Collaborator


Why is GemmBenchmark set to pass? Does the base class BlasBenchmark, with its default generated input shapes and calling method, achieve 100% compatibility with the standard gemm interface requirements? If it is compatible, it is recommended to add a comment line to clarify this.


4 participants