[PerfXLab] optimize fill performance #2216

Open — bin913 wants to merge 3 commits into flagos-ai:master from bin913:fill
Conversation

@bin913 (Contributor) commented Apr 2, 2026

PR Category

[ Operator]

Type of Change

[ Performance Optimization]

Description

Optimize the performance of the in-place scalar fill operator, fill.fill_scalar_.
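For context on what such a kernel does: a scalar fill typically launches a 1-D grid in which each program instance writes one BLOCK_SIZE chunk of the flattened tensor, masking the out-of-bounds tail. The sketch below is a hypothetical pure-Python illustration of that block-wise structure (the names `fill_scalar` and `block_size` are illustrative, not the PR's actual Triton code):

```python
# Hypothetical illustration of a block-wise fill over a flat buffer,
# mirroring how a 1-D GPU grid carves a tensor into BLOCK_SIZE chunks.
# Not the PR's actual Triton kernel.

def fill_scalar(buf, value, block_size=4):
    """Fill `buf` in place with `value`, one block per 'program' instance."""
    n = len(buf)
    num_blocks = (n + block_size - 1) // block_size  # ceil-division grid size
    for pid in range(num_blocks):            # each pid models one kernel instance
        start = pid * block_size
        end = min(start + block_size, n)     # mask the out-of-bounds tail block
        for i in range(start, end):
            buf[i] = value
    return buf

buf = [0.0] * 10
fill_scalar(buf, 3.14159)
```

A fill is memory-bandwidth-bound, which is consistent with the near-1.0 speedups on large sizes in the tables below: once the kernel saturates write bandwidth, there is little headroom left to optimize.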

Issue

Progress

  • Change is properly reviewed (1 reviewer required, 2 recommended).
  • Change responds to an issue.
  • Change is fully covered by a UT.

Performance

Operator: fill_scalar_  Performance Test (dtype=torch.float16, mode=kernel, level=core)
Status       Torch Latency (ms)    Gems Latency (ms)         Gems Speedup          Size Detail
-----------------------------------------------------------------------------------------------
SUCCESS               0.653856            0.653792               1.000          [torch.Size([1073741824]), 3.14159]
SUCCESS               0.005008            0.004992               1.003          [torch.Size([64, 64]), 3.14159]
SUCCESS               0.015104            0.015296               0.987          [torch.Size([4096, 4096]), 3.14159]
SUCCESS               0.015328            0.015328               1.000          [torch.Size([64, 512, 512]), 3.14159]
SUCCESS               0.654080            0.654224               1.000          [torch.Size([1024, 1024, 1024]), 3.14159]


Operator: fill_scalar_  Performance Test (dtype=torch.float32, mode=kernel, level=core)
Status       Torch Latency (ms)    Gems Latency (ms)         Gems Speedup          Size Detail
-----------------------------------------------------------------------------------------------
SUCCESS               1.305952            1.303360               1.002          [torch.Size([1073741824]), 3.14159]
SUCCESS               0.005120            0.005024               1.019          [torch.Size([64, 64]), 3.14159]
SUCCESS               0.025280            0.025376               0.996          [torch.Size([4096, 4096]), 3.14159]
SUCCESS               0.025120            0.025216               0.996          [torch.Size([64, 512, 512]), 3.14159]
SUCCESS               1.306016            1.303264               1.002          [torch.Size([1024, 1024, 1024]), 3.14159]


Operator: fill_scalar_  Performance Test (dtype=torch.bfloat16, mode=kernel, level=core)
Status       Torch Latency (ms)    Gems Latency (ms)         Gems Speedup          Size Detail
-----------------------------------------------------------------------------------------------
SUCCESS               0.654176            0.654080               1.000          [torch.Size([1073741824]), 3.14159]
SUCCESS               0.004992            0.005152               0.969          [torch.Size([64, 64]), 3.14159]
SUCCESS               0.015232            0.015296               0.996          [torch.Size([4096, 4096]), 3.14159]
SUCCESS               0.015488            0.015328               1.010          [torch.Size([64, 512, 512]), 3.14159]
SUCCESS               0.653888            0.653904               1.000          [torch.Size([1024, 1024, 1024]), 3.14159]
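The Gems Speedup column appears to be Torch latency divided by Gems latency (values above 1.0 mean FlagGems is faster). A quick sanity check against the largest float32 row, assuming that definition:

```python
# Speedup as reported in the tables above: Torch latency / Gems latency.
# Values are taken from the float32, 1073741824-element row.
torch_ms = 1.305952
gems_ms = 1.303360
speedup = round(torch_ms / gems_ms, 3)
print(speedup)  # matches the reported 1.002
```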

# tensor constructor with given value
("fill_", torch.fill_, fill_input_fn),
("fill_scalar_", torch.ops.aten.fill_.Scalar, fill_input_fn),
# ("fill_scalar_", flag_gems.ops.fill.fill_scalar_, fill_input_fn),
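Each entry above pairs an operator name, a callable, and an input-constructor function. A hypothetical sketch of how such (name, op, input_fn) tuples might be iterated by a harness (this is illustrative only, not FlagGems' actual benchmark code; `fill_input_fn` here is a stand-in):

```python
# Hypothetical sketch of consuming (name, op, input_fn) registry tuples,
# in the spirit of the benchmark list above. Not FlagGems' real harness.

def fill_input_fn(size, value):
    # Stand-in for the real input constructor: returns a buffer and a fill value.
    return [0.0] * size, value

def fill_(args):
    buf, value = args
    buf[:] = [value] * len(buf)   # in-place fill, like aten's fill_
    return buf

REGISTRY = [
    ("fill_", fill_, fill_input_fn),
]

results = {}
for name, op, input_fn in REGISTRY:
    results[name] = op(input_fn(4, 3.14159))
```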
Collaborator
Why is the FlagGems benchmark for fill_scalar_ commented out?

3 participants