[PerfXLab] Add and optimize gemm op#2220

Open
bin913 wants to merge 3 commits into flagos-ai:master from bin913:gemm_ok

Conversation

Contributor

@bin913 bin913 commented Apr 2, 2026

PR Category

[ Operator]

Type of Change

[Performance Optimization]

Description

Add the gemm op and optimize its performance.
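For reference, gemm follows the standard BLAS semantics C ← alpha·A·B + beta·C. A minimal pure-Python sketch of that contract (illustrative only, not the PR's kernel implementation):

```python
def gemm(alpha, a, b, beta, c):
    """Reference GEMM: returns alpha * (A @ B) + beta * C for row-major nested lists."""
    m, k, n = len(a), len(b), len(b[0])
    return [
        [beta * c[i][j] + alpha * sum(a[i][p] * b[p][j] for p in range(k))
         for j in range(n)]
        for i in range(m)
    ]

a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
c = [[1.0, 1.0], [1.0, 1.0]]
print(gemm(2.0, a, b, 0.5, c))  # [[38.5, 44.5], [86.5, 100.5]]
```

The actual op is a GPU kernel; this sketch only pins down the expected numerics that a unit test would compare against.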

Issue

Progress

  • Change is properly reviewed (1 reviewer required, 2 recommended).
  • Change responds to an issue.
  • Change is fully covered by unit tests (UT).

Performance

test_blas_perf.py::test_gemm_benchmark 
Operator: gemm  Performance Test (dtype=torch.float16, mode=kernel,level=core)
Status       Torch Latency (ms)    Gems Latency (ms)         Gems Speedup          Size Detail
-----------------------------------------------------------------------------------------------
SUCCESS               0.007616            0.008032               0.948          [torch.Size([384, 384]), torch.Size([384, 384])]
SUCCESS               0.231040            0.225984               1.022          [torch.Size([4096, 4096]), torch.Size([4096, 4096])]
SUCCESS               0.010784            0.011232               0.960          [torch.Size([1024, 1024]), torch.Size([1024, 1024])]
SUCCESS               0.030272            0.030352               0.997          [torch.Size([2048, 2048]), torch.Size([2048, 2048])]
SUCCESS               0.230784            0.233760               0.987          [torch.Size([4096, 4096]), torch.Size([4096, 4096])]


Operator: gemm  Performance Test (dtype=torch.float32, mode=kernel,level=core)
Status       Torch Latency (ms)    Gems Latency (ms)         Gems Speedup          Size Detail
-----------------------------------------------------------------------------------------------
SUCCESS               0.017024            0.012448               1.368          [torch.Size([384, 384]), torch.Size([384, 384])]
SUCCESS               2.660368            2.578048               1.032          [torch.Size([4096, 4096]), torch.Size([4096, 4096])]
SUCCESS               0.064992            0.045856               1.417          [torch.Size([1024, 1024]), torch.Size([1024, 1024])]
SUCCESS               0.350976            0.331296               1.059          [torch.Size([2048, 2048]), torch.Size([2048, 2048])]
SUCCESS               2.667088            2.624864               1.016          [torch.Size([4096, 4096]), torch.Size([4096, 4096])]


Operator: gemm  Performance Test (dtype=torch.bfloat16, mode=kernel,level=core)
Status       Torch Latency (ms)    Gems Latency (ms)         Gems Speedup          Size Detail
-----------------------------------------------------------------------------------------------
SUCCESS               0.007616            0.008160               0.933          [torch.Size([384, 384]), torch.Size([384, 384])]
SUCCESS               0.213376            0.218624               0.976          [torch.Size([4096, 4096]), torch.Size([4096, 4096])]
SUCCESS               0.010720            0.011200               0.957          [torch.Size([1024, 1024]), torch.Size([1024, 1024])]
SUCCESS               0.029600            0.029792               0.994          [torch.Size([2048, 2048]), torch.Size([2048, 2048])]
SUCCESS               0.212736            0.218960               0.972          [torch.Size([4096, 4096]), torch.Size([4096, 4096])]
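In the tables above, the Gems Speedup column is the ratio of Torch latency to Gems latency (values above 1.0 mean the Gems kernel is faster). A quick sanity check against the float32 rows, using a hypothetical helper not part of the PR:

```python
def speedup(torch_ms, gems_ms):
    """Speedup of the Gems kernel relative to Torch: > 1.0 means Gems is faster."""
    return torch_ms / gems_ms

# float32, 384x384 row from the table
print(round(speedup(0.017024, 0.012448), 3))  # 1.368
# float16, 4096x4096 row
print(round(speedup(0.231040, 0.225984), 3))  # 1.022
```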

"""
benchmark for gemm
"""

Collaborator


Why is GemmBenchmark set to pass? Does the base class BlasBenchmark, with its default generated input shapes and calling method, achieve 100% compatibility with the standard gemm interface requirements? If it is compatible, it is recommended to add a comment line to clarify this.


4 participants