Bug fix and optimisation for persistent reduction kernel tuning (#2596)
The original PR (#2417) had incorrect indentation. This PR fixes it so
that autotune always adds the tiny configs; otherwise only the hinted
configs are used.
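
A minimal sketch of the intended selection logic, assuming the sentence above means "with autotuning enabled, append the tiny configs to the hinted ones; with it disabled, keep the hinted ones only". The names `select_configs`, `hinted`, `tiny`, and `autotune` are hypothetical and are not taken from the actual patch.

```python
from typing import Dict, List

Config = Dict[str, int]


def select_configs(hinted: List[Config], tiny: List[Config], autotune: bool) -> List[Config]:
    """Hypothetical helper, not the real inductor code.

    Assumption: when autotuning is on, the tiny configs are always added
    to the candidate list; when it is off, only the hinted configs are used.
    """
    if not autotune:
        # No autotuning: keep the hinted configs only.
        return list(hinted)

    # Autotuning: start from the hinted configs and always add the tiny ones,
    # skipping exact duplicates so no config is benchmarked twice.
    merged = list(hinted)
    for cfg in tiny:
        if cfg not in merged:
            merged.append(cfg)
    return merged


if __name__ == "__main__":
    hinted = [{"XBLOCK": 32}]
    tiny = [{"XBLOCK": 1}, {"XBLOCK": 32}]
    print(select_configs(hinted, tiny, autotune=True))
    print(select_configs(hinted, tiny, autotune=False))
```

An indentation error in code shaped like this could move the "add tiny configs" step into the wrong branch, which would be consistent with the bug described for #2417.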
Tested locally on test_torchinductor:
Ran 894 tests in 952.242s
FAILED (failures=1, skipped=28)
Autotune runs also completed for the microbenchmark models:
Microbenchmark for network : resnet152
Num devices: 1
Dtype: FP32
Mini batch size [img] : 64
Time per mini-batch : 0.09107530117034912
Throughput [img/sec] : 702.7152167226226