Commit db3ba66

Bug fix and optimisation for persistent reduction kernel tuning (#2596)
The original PR (#2417) had incorrect indentation. This update changes the logic so that autotuning always adds the tiny configs; without autotuning, only the hinted configs are used.

Tested locally on test_torchinductor:

Ran 894 tests in 952.242s
FAILED (failures=1, skipped=28)

Autotune runs also completed for the microbench models:

Microbenchmark for network : resnet152
Num devices: 1
Dtype: FP32
Mini batch size [img] : 64
Time per mini-batch : 0.09107530117034912
Throughput [img/sec] : 702.7152167226226
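The selection rule described in the commit message can be sketched in isolation. This is a minimal illustration, not the Inductor implementation: `select_persistent_configs` is a hypothetical helper, the `ReductionHint` enum below is a local stand-in with only a few members, and plain tuples stand in for Triton config objects.

```python
from enum import Enum, auto


class ReductionHint(Enum):
    # Local stand-in for Inductor's ReductionHint (assumption: only the
    # members needed for this sketch are listed).
    INNER = auto()
    OUTER = auto()
    OUTER_TINY = auto()
    DEFAULT = auto()


def select_persistent_configs(configs, tiny_configs, hint, max_autotune_enabled):
    """Hypothetical helper mirroring the updated branch structure.

    configs      -- candidates already narrowed by the reduction hint
    tiny_configs -- the extra "tiny" candidates
    """
    configs = list(configs)  # copy so the caller's list is not mutated
    if max_autotune_enabled:
        # Autotuning: always add the tiny configs, skipping duplicates.
        for conf in tiny_configs:
            if conf not in configs:
                configs.append(conf)
    elif hint == ReductionHint.OUTER_TINY:
        # No autotuning: the OUTER_TINY hint selects the tiny configs only.
        configs = list(tiny_configs)
    return configs


if __name__ == "__main__":
    base = [("XBLOCK", 1), ("XBLOCK", 8)]
    tiny = [("XBLOCK", 32)]
    for hint in (ReductionHint.INNER, ReductionHint.OUTER_TINY):
        for autotune in (True, False):
            print(hint.name, autotune,
                  select_persistent_configs(base, tiny, hint, autotune))
```

With autotuning on, the tiny candidates are appended whatever the hint; with it off, only the `OUTER_TINY` hint changes the candidate list.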
1 parent 675f868 commit db3ba66

1 file changed: +14 -14 lines changed


torch/_inductor/runtime/triton_heuristics.py

Lines changed: 14 additions & 14 deletions
@@ -2595,20 +2595,20 @@ def _persistent_reduction_configs(
     elif reduction_hint == ReductionHint.OUTER:
         configs = configs[-1:]
 
-    if reduction_hint == ReductionHint.OUTER_TINY:
-        tiny_configs = [
-            triton_config_reduction(
-                size_hints,
-                2 * (256 // rnumel) if rnumel <= 256 else 1,
-                rnumel,
-            )
-        ]
-        if max_autotune_enabled:
-            for tconfig in tiny_configs:
-                if tconfig not in configs:
-                    configs.append(tconfig)
-        else:
-            configs = tiny_configs
+    tiny_configs = [
+        triton_config_reduction(
+            size_hints,
+            2 * (256 // rnumel) if rnumel <= 256 else 1,
+            rnumel,
+        )
+    ]
+
+    if max_autotune_enabled:
+        for conf in tiny_configs:
+            if conf not in configs:
+                configs.append(conf)
+    elif reduction_hint == ReductionHint.OUTER_TINY:
+        configs = tiny_configs
 
     for c in configs:
         # we don't need Rn_BLOCK for persistent reduction
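The second argument passed to `triton_config_reduction` for the tiny config is the expression `2 * (256 // rnumel) if rnumel <= 256 else 1`. A short worked example shows what it evaluates to for a few reduction sizes. `tiny_x_request` is a hypothetical name used only here, and reading the value as a requested block size along the non-reduction dimension is an assumption; the real helper may round or clamp the value further.

```python
def tiny_x_request(rnumel: int) -> int:
    # Same expression as in the diff. The conditional expression binds
    # loosest, so this reads as (2 * (256 // rnumel)) if rnumel <= 256 else 1.
    # Assumes rnumel >= 1; rnumel == 0 would raise ZeroDivisionError.
    return 2 * (256 // rnumel) if rnumel <= 256 else 1


if __name__ == "__main__":
    for rnumel in (1, 16, 64, 100, 256, 512):
        x = tiny_x_request(rnumel)
        # x * rnumel is the number of elements one such tile would cover.
        print(f"rnumel={rnumel:4d}  x={x:4d}  x*rnumel={x * rnumel}")
```

For power-of-two `rnumel` up to 256 the product of the two values is 512; above 256 the request drops to 1, so the tile covers a single row of `rnumel` elements.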
