perf: optimize sample_floyd by unsafe APIs #1622

Closed

@Unparalleled-Calvin Unparalleled-Calvin commented Apr 3, 2025

  • Added a CHANGELOG.md entry

Summary

This PR uses unsafe APIs to speed up sample_floyd. The optimization is sound because every index used is bounded by the length of the Vec.
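The PR's diff is not reproduced in this thread, so the following is only a minimal sketch of the idea: Floyd's sampling written once with a bounds-checked write and once with `get_unchecked_mut`. The `XorShift` generator, `below_inclusive`, and both function names are hypothetical stand-ins for illustration, not rand's actual API.

```rust
/// Tiny xorshift generator used as a stand-in RNG (hypothetical, not part of rand).
/// The seed must be non-zero.
struct XorShift(u64);

impl XorShift {
    fn next_u32(&mut self) -> u32 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        (self.0 >> 32) as u32
    }

    /// Returns a value in `0..=upper` (modulo bias is ignored in this sketch).
    fn below_inclusive(&mut self, upper: u32) -> u32 {
        (u64::from(self.next_u32()) % (u64::from(upper) + 1)) as u32
    }
}

/// Floyd's algorithm with an ordinary, bounds-checked index write.
fn floyd_checked(rng: &mut XorShift, length: u32, amount: u32) -> Vec<u32> {
    assert!(amount <= length);
    let mut indices: Vec<u32> = Vec::with_capacity(amount as usize);
    for j in length - amount..length {
        let t = rng.below_inclusive(j);
        if let Some(pos) = indices.iter().position(|&x| x == t) {
            indices[pos] = j; // bounds check happens here
        }
        indices.push(t);
    }
    indices
}

/// Same algorithm, but the write skips the bounds check.
fn floyd_unchecked(rng: &mut XorShift, length: u32, amount: u32) -> Vec<u32> {
    assert!(amount <= length);
    let mut indices: Vec<u32> = Vec::with_capacity(amount as usize);
    for j in length - amount..length {
        let t = rng.below_inclusive(j);
        if let Some(pos) = indices.iter().position(|&x| x == t) {
            // SAFETY: `pos` was just returned by `position` on `indices`,
            // and nothing has modified the Vec since, so `pos < indices.len()`.
            unsafe {
                *indices.get_unchecked_mut(pos) = j;
            }
        }
        indices.push(t);
    }
    indices
}

fn main() {
    let seed = 0x9E37_79B9_7F4A_7C15;
    let a = floyd_checked(&mut XorShift(seed), 100, 10);
    let b = floyd_unchecked(&mut XorShift(seed), 100, 10);
    // Both variants must agree for the same seed.
    assert_eq!(a, b);
    println!("{a:?}");
}
```

The soundness argument rests entirely on `pos` coming from `position` on the same Vec; any later refactor that changes the Vec between the search and the write would invalidate it.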

Motivation

Rust's bounds checks are sometimes unnecessary. Removing them with unsafe APIs can improve performance. This optimization makes the affected functions faster while preserving safety.

Details

The benchmark results from my environment are listed below.

seq_slice_choose_multiple_1_of_1000
                        time:   [17.377 ns 17.480 ns 17.597 ns]
                        change: [-5.6211% -4.8379% -4.0867%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  5 (5.00%) high mild

seq_slice_choose_multiple_10_of_100
                        time:   [53.166 ns 53.654 ns 54.192 ns]
                        change: [-7.2861% -6.4089% -5.4998%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  5 (5.00%) high mild
  1 (1.00%) high severe

@dhardy
Member

dhardy commented Apr 5, 2025

Thanks for the PR.

My main concern here is simply: should we be adding more unsafe code for ~5% perf gains?

CC @RalfJung

@Unparalleled-Calvin
Author

Thanks for considering!
The unsafe block is minimized with clear // safety comments and encapsulated by the safe sample_floyd function. And tests/MIRI can confirm no UB. So I think it is ok to use unsafe code and gain more performance.

Member

@dhardy dhardy left a comment

There are two unsafe operations here; I'd like to see the perf impact of each.

@RalfJung
Contributor

RalfJung commented Apr 7, 2025

> Thanks for the PR.
>
> My main concern here is simply: should we be adding more unsafe code for ~5% perf gains?
>
> CC @RalfJung

Not sure what exactly you want my input on here. :) Happy to consult on whether some use of unsafe is sound or not, but that doesn't seem to be the question here? As to whether you think the bit of unsafe is worth the perf gain -- that's a maintainer decision. There are absolutely cases where the perf gain is important enough to justify a bit of unsafe, and there are other cases where it's not worth it. I don't have to maintain this code going forward so I can't make this decision for you. :)

> And tests/MIRI can confirm no UB.

Of course, testing != verification, so there could still be UB in edge cases not covered by the tests.

@Unparalleled-Calvin
Author

Unparalleled-Calvin commented Apr 12, 2025

Thank you for your review! Here are the benchmark results of using the unsafe functions.

Using only *indices.get_unchecked_mut(pos) = j; [compared to the original version]

seq_slice_choose_multiple_1_of_1000
                        time:   [19.089 ns 19.212 ns 19.346 ns]
                        change: [-0.8332% +0.7077% +2.5236%] (p = 0.46 > 0.05)
                        No change in performance detected.

seq_slice_choose_multiple_10_of_100
                        time:   [58.523 ns 58.777 ns 59.060 ns]
                        change: [-6.5131% -5.2792% -4.0724%] (p = 0.00 < 0.05)
                        Performance has improved.

Additionally use ptr::write instead of push [compared to above]

seq_slice_choose_multiple_1_of_1000
                        time:   [18.896 ns 19.158 ns 19.446 ns]
                        change: [-5.8287% -3.9616% -2.4368%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)

seq_slice_choose_multiple_10_of_100
                        time:   [56.478 ns 56.843 ns 57.254 ns]
                        change: [-4.5315% -3.8087% -3.0675%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)

From my perspective, eliminating the bounds check in the [] operation contributes more of the improvement in this PR. Rewriting push also helps.
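For readers unfamiliar with the second change, here is a minimal sketch of what replacing `push` with `ptr::write` can look like. The exact code in the PR is not shown in this thread, so the function names and the loop body below are assumptions for illustration only.

```rust
use std::ptr;

/// Baseline: `push` checks the remaining capacity on every call.
fn fill_with_push(n: u32) -> Vec<u32> {
    let mut v: Vec<u32> = Vec::with_capacity(n as usize);
    for i in 0..n {
        v.push(i);
    }
    v
}

/// Variant that writes directly into the spare capacity (hypothetical sketch).
fn fill_with_write(n: u32) -> Vec<u32> {
    let mut v: Vec<u32> = Vec::with_capacity(n as usize);
    for i in 0..n {
        let len = v.len();
        // SAFETY: `with_capacity(n)` guarantees room for at least `n` elements
        // and `len < n` inside this loop, so slot `len` lies within the
        // allocation. The slot is initialized before `set_len` exposes it.
        unsafe {
            ptr::write(v.as_mut_ptr().add(len), i);
            v.set_len(len + 1);
        }
    }
    v
}

fn main() {
    // Both variants must build the same Vec.
    assert_eq!(fill_with_push(8), fill_with_write(8));
    println!("{:?}", fill_with_write(8));
}
```

Unlike the unchecked index write, this variant depends on a capacity invariant established several lines earlier, which is part of what a reviewer has to weigh against the measured gain.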

@dhardy
Member

dhardy commented Jun 17, 2025

Sorry for the delay; I finally got around to running benches on my 5800X desktop. This is 07d4e92 vs d468501.

Full results

seq_slice_choose_1_of_100
                        time:   [2.6486 ns 2.6497 ns 2.6513 ns]
                        change: [+10.827% +10.904% +10.979%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
  5 (5.00%) high mild
  5 (5.00%) high severe

seq_slice_choose_multiple_1_of_1000
time: [14.917 ns 14.966 ns 15.020 ns]
change: [-3.7378% -3.2299% -2.7506%] (p = 0.00 < 0.05)
Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
4 (4.00%) high mild
1 (1.00%) high severe

seq_slice_choose_multiple_950_of_1000
time: [2.1368 µs 2.1378 µs 2.1389 µs]
change: [-0.5592% -0.4433% -0.3273%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 15 outliers among 100 measurements (15.00%)
4 (4.00%) low mild
2 (2.00%) high mild
9 (9.00%) high severe

seq_slice_choose_multiple_10_of_100
time: [49.832 ns 49.943 ns 50.089 ns]
change: [-5.1852% -4.6156% -4.0766%] (p = 0.00 < 0.05)
Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
3 (3.00%) high mild
9 (9.00%) high severe

seq_slice_choose_multiple_90_of_100
time: [221.04 ns 221.16 ns 221.31 ns]
change: [-3.8395% -3.2941% -2.7416%] (p = 0.00 < 0.05)
Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
6 (6.00%) high mild
5 (5.00%) high severe

seq_slice_choose_multiple_weighted_1_of_1000
time: [1.3422 µs 1.3436 µs 1.3456 µs]
change: [-1.1407% -0.9365% -0.7424%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 12 outliers among 100 measurements (12.00%)
1 (1.00%) low severe
3 (3.00%) low mild
4 (4.00%) high mild
4 (4.00%) high severe

seq_slice_choose_multiple_weighted_950_of_1000
time: [37.589 µs 37.598 µs 37.608 µs]
change: [-0.0223% +0.1557% +0.3243%] (p = 0.08 > 0.05)
No change in performance detected.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) low severe
1 (1.00%) high mild

seq_slice_choose_multiple_weighted_10_of_100
time: [2.2598 µs 4.4461 µs 9.4964 µs]
change: [-1.3195% +71.001% +287.04%] (p = 0.60 > 0.05)
No change in performance detected.
Found 13 outliers among 100 measurements (13.00%)
3 (3.00%) high mild
10 (10.00%) high severe

seq_slice_choose_multiple_weighted_90_of_100
time: [3.9278 µs 3.9714 µs 4.0205 µs]
change: [+5.9603% +6.6836% +7.3714%] (p = 0.00 < 0.05)
Performance has regressed.
Found 26 outliers among 100 measurements (26.00%)
17 (17.00%) low severe
1 (1.00%) low mild
1 (1.00%) high mild
7 (7.00%) high severe

seq_iter_choose_multiple_10_of_100
time: [409.17 ns 409.60 ns 410.00 ns]
change: [+7.8477% +8.1398% +8.4209%] (p = 0.00 < 0.05)
Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
2 (2.00%) high mild
1 (1.00%) high severe

seq_iter_choose_multiple_fill_10_of_100
time: [371.32 ns 372.18 ns 373.14 ns]
change: [+1.9152% +2.1495% +2.3769%] (p = 0.00 < 0.05)
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
2 (2.00%) low mild
2 (2.00%) high mild
7 (7.00%) high severe

choose_size-hinted_from_1_ChaCha20
time: [533.14 ps 533.41 ps 533.72 ps]
change: [+0.8776% +1.0959% +1.3212%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 11 outliers among 100 measurements (11.00%)
6 (6.00%) high mild
5 (5.00%) high severe

choose_stable_from_1_ChaCha20
time: [8.6338 ns 8.8176 ns 8.9933 ns]
change: [-12.136% -9.5246% -6.6585%] (p = 0.00 < 0.05)
Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild

choose_unhinted_from_1_ChaCha20
time: [5.6307 ns 5.6463 ns 5.6624 ns]
change: [+0.7961% +1.1273% +1.4456%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high mild

choose_windowed_from_1_ChaCha20
time: [7.0225 ns 7.1445 ns 7.2883 ns]
change: [-12.509% -10.517% -8.5681%] (p = 0.00 < 0.05)
Performance has improved.
Found 17 outliers among 100 measurements (17.00%)
4 (4.00%) high mild
13 (13.00%) high severe

choose_size-hinted_from_2_ChaCha20
time: [4.4525 ns 4.4739 ns 4.4975 ns]
change: [-4.3195% -4.0243% -3.6848%] (p = 0.00 < 0.05)
Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
6 (6.00%) high mild

choose_stable_from_2_ChaCha20
time: [17.758 ns 17.813 ns 17.872 ns]
change: [-0.1884% +0.1905% +0.5192%] (p = 0.30 > 0.05)
No change in performance detected.
Found 4 outliers among 100 measurements (4.00%)
3 (3.00%) high mild
1 (1.00%) high severe

choose_unhinted_from_2_ChaCha20
time: [15.527 ns 15.559 ns 15.593 ns]
change: [+1.9199% +2.1527% +2.3974%] (p = 0.00 < 0.05)
Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
4 (4.00%) high mild
4 (4.00%) high severe

choose_windowed_from_2_ChaCha20
time: [12.766 ns 12.776 ns 12.788 ns]
change: [-1.5386% -1.2811% -1.0575%] (p = 0.00 < 0.05)
Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high mild

choose_size-hinted_from_3_ChaCha20
time: [4.4146 ns 4.4229 ns 4.4323 ns]
change: [-5.4112% -5.0060% -4.6991%] (p = 0.00 < 0.05)
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
2 (2.00%) high mild
6 (6.00%) high severe

choose_stable_from_3_ChaCha20
time: [31.406 ns 31.463 ns 31.523 ns]
change: [+1.5129% +1.6787% +1.8203%] (p = 0.00 < 0.05)
Performance has regressed.

choose_unhinted_from_3_ChaCha20
time: [28.585 ns 28.643 ns 28.695 ns]
change: [-0.9503% -0.8059% -0.6268%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild

choose_windowed_from_3_ChaCha20
time: [14.593 ns 14.617 ns 14.645 ns]
change: [+0.0893% +0.3011% +0.5185%] (p = 0.01 < 0.05)
Change within noise threshold.

choose_size-hinted_from_10_ChaCha20
time: [4.4173 ns 4.4245 ns 4.4325 ns]
change: [-5.2512% -5.0591% -4.8575%] (p = 0.00 < 0.05)
Performance has improved.

choose_stable_from_10_ChaCha20
time: [77.354 ns 77.598 ns 77.883 ns]
change: [-0.4206% -0.1263% +0.1400%] (p = 0.40 > 0.05)
No change in performance detected.
Found 12 outliers among 100 measurements (12.00%)
9 (9.00%) high mild
3 (3.00%) high severe

choose_unhinted_from_10_ChaCha20
time: [71.879 ns 72.142 ns 72.458 ns]
change: [-2.6380% -2.1592% -1.6905%] (p = 0.00 < 0.05)
Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
10 (10.00%) high mild
2 (2.00%) high severe

choose_windowed_from_10_ChaCha20
time: [27.757 ns 27.791 ns 27.823 ns]
change: [-2.0492% -1.8655% -1.6903%] (p = 0.00 < 0.05)
Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
6 (6.00%) low mild
3 (3.00%) high mild

choose_size-hinted_from_100_ChaCha20
time: [4.4348 ns 4.4618 ns 4.4907 ns]
change: [-5.1198% -4.7793% -4.4258%] (p = 0.00 < 0.05)
Performance has improved.
Found 18 outliers among 100 measurements (18.00%)
3 (3.00%) high mild
15 (15.00%) high severe

choose_stable_from_100_ChaCha20
time: [471.23 ns 472.81 ns 474.74 ns]
change: [+0.2505% +0.6019% +0.9250%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 11 outliers among 100 measurements (11.00%)
3 (3.00%) high mild
8 (8.00%) high severe

choose_unhinted_from_100_ChaCha20
time: [429.48 ns 429.88 ns 430.37 ns]
change: [-4.5379% -4.4257% -4.3012%] (p = 0.00 < 0.05)
Performance has improved.

choose_windowed_from_100_ChaCha20
time: [152.45 ns 152.63 ns 152.81 ns]
change: [-2.1996% -1.9982% -1.8082%] (p = 0.00 < 0.05)
Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
1 (1.00%) high mild
2 (2.00%) high severe

choose_size-hinted_from_1000_ChaCha20
time: [4.4321 ns 4.4401 ns 4.4499 ns]
change: [-4.8297% -4.6664% -4.4985%] (p = 0.00 < 0.05)
Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
4 (4.00%) high mild

choose_stable_from_1000_ChaCha20
time: [3.5763 µs 3.5825 µs 3.5891 µs]
change: [-1.0493% -0.8017% -0.5696%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
4 (4.00%) high mild
4 (4.00%) high severe

choose_unhinted_from_1000_ChaCha20
time: [3.2282 µs 3.2311 µs 3.2346 µs]
change: [-7.1662% -6.9770% -6.8021%] (p = 0.00 < 0.05)
Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
3 (3.00%) high mild
2 (2.00%) high severe

choose_windowed_from_1000_ChaCha20
time: [1.0882 µs 1.0898 µs 1.0914 µs]
change: [-1.6050% -1.1989% -0.8548%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
5 (5.00%) low severe
4 (4.00%) low mild
1 (1.00%) high mild

choose_size-hinted_from_1_Pcg32
time: [530.23 ps 530.50 ps 530.82 ps]
change: [-0.5104% -0.4148% -0.3119%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high mild

choose_stable_from_1_Pcg32
time: [10.419 ns 10.592 ns 10.740 ns]
change: [+1.7287% +4.3907% +6.9825%] (p = 0.00 < 0.05)
Performance has regressed.

choose_unhinted_from_1_Pcg32
time: [5.3097 ns 5.3168 ns 5.3242 ns]
change: [-5.7915% -5.5652% -5.3655%] (p = 0.00 < 0.05)
Performance has improved.

choose_windowed_from_1_Pcg32
time: [7.9914 ns 8.1458 ns 8.2921 ns]
change: [-6.1896% -4.9242% -3.7086%] (p = 0.00 < 0.05)
Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
5 (5.00%) high mild

choose_size-hinted_from_2_Pcg32
time: [2.6646 ns 2.6695 ns 2.6752 ns]
change: [-0.4699% -0.2344% -0.0000%] (p = 0.06 > 0.05)
No change in performance detected.

choose_stable_from_2_Pcg32
time: [15.605 ns 15.635 ns 15.663 ns]
change: [+1.1656% +1.4562% +1.8266%] (p = 0.00 < 0.05)
Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high severe

choose_unhinted_from_2_Pcg32
time: [13.376 ns 13.407 ns 13.443 ns]
change: [+2.0915% +2.6592% +3.2658%] (p = 0.00 < 0.05)
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
10 (10.00%) high mild
1 (1.00%) high severe

choose_windowed_from_2_Pcg32
time: [10.816 ns 10.831 ns 10.847 ns]
change: [-1.1565% -0.8701% -0.5873%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
8 (8.00%) high mild
2 (2.00%) high severe

choose_size-hinted_from_3_Pcg32
time: [2.6576 ns 2.6633 ns 2.6705 ns]
change: [-0.0831% +0.1218% +0.3585%] (p = 0.31 > 0.05)
No change in performance detected.
Found 11 outliers among 100 measurements (11.00%)
9 (9.00%) high mild
2 (2.00%) high severe

choose_stable_from_3_Pcg32
time: [28.924 ns 29.026 ns 29.138 ns]
change: [+1.0419% +1.2694% +1.5207%] (p = 0.00 < 0.05)
Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
4 (4.00%) high mild
2 (2.00%) high severe

choose_unhinted_from_3_Pcg32
time: [26.324 ns 26.374 ns 26.426 ns]
change: [-0.3882% -0.1439% +0.1104%] (p = 0.27 > 0.05)
No change in performance detected.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe

choose_windowed_from_3_Pcg32
time: [12.880 ns 12.900 ns 12.923 ns]
change: [+2.6186% +2.8269% +3.0184%] (p = 0.00 < 0.05)
Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high mild

choose_size-hinted_from_10_Pcg32
time: [2.6587 ns 2.6640 ns 2.6699 ns]
change: [-0.5874% -0.3001% -0.0189%] (p = 0.04 < 0.05)
Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
2 (2.00%) high mild
1 (1.00%) high severe

choose_stable_from_10_Pcg32
time: [73.983 ns 74.182 ns 74.416 ns]
change: [+0.6416% +0.8830% +1.1465%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 6 outliers among 100 measurements (6.00%)
4 (4.00%) high mild
2 (2.00%) high severe

choose_unhinted_from_10_Pcg32
time: [68.973 ns 69.035 ns 69.100 ns]
change: [-0.8100% -0.4104% -0.1083%] (p = 0.01 < 0.05)
Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severe

choose_windowed_from_10_Pcg32
time: [24.306 ns 24.335 ns 24.369 ns]
change: [-0.0080% +0.4368% +0.8851%] (p = 0.06 > 0.05)
No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
3 (3.00%) high mild
5 (5.00%) high severe

choose_size-hinted_from_100_Pcg32
time: [2.6640 ns 2.6747 ns 2.6890 ns]
change: [-0.3722% -0.0756% +0.2830%] (p = 0.65 > 0.05)
No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
4 (4.00%) high mild
3 (3.00%) high severe

choose_stable_from_100_Pcg32
time: [462.04 ns 462.59 ns 463.19 ns]
change: [-1.5476% -1.2043% -0.8930%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
2 (2.00%) high mild
1 (1.00%) high severe

choose_unhinted_from_100_Pcg32
time: [424.52 ns 425.35 ns 426.24 ns]
change: [+0.8196% +1.0394% +1.2658%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high mild

choose_windowed_from_100_Pcg32
time: [128.68 ns 128.87 ns 129.08 ns]
change: [+2.3181% +2.4577% +2.6024%] (p = 0.00 < 0.05)
Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
3 (3.00%) high mild
1 (1.00%) high severe

choose_size-hinted_from_1000_Pcg32
time: [2.6733 ns 2.6817 ns 2.6907 ns]
change: [-0.6667% -0.4014% -0.0932%] (p = 0.01 < 0.05)
Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
3 (3.00%) high mild
1 (1.00%) high severe

choose_stable_from_1000_Pcg32
time: [3.5208 µs 3.5233 µs 3.5265 µs]
change: [-0.9690% -0.7959% -0.6314%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
3 (3.00%) high mild
1 (1.00%) high severe

choose_unhinted_from_1000_Pcg32
time: [3.1779 µs 3.1842 µs 3.1905 µs]
change: [+1.8954% +2.0685% +2.2204%] (p = 0.00 < 0.05)
Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild

choose_windowed_from_1000_Pcg32
time: [845.19 ns 847.21 ns 849.67 ns]
change: [+3.7596% +4.2007% +4.6226%] (p = 0.00 < 0.05)
Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
4 (4.00%) high mild

choose_size-hinted_from_1_Pcg64
time: [532.71 ps 533.89 ps 535.21 ps]
change: [-0.9249% -0.5829% -0.2582%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
6 (6.00%) high mild
4 (4.00%) high severe

choose_stable_from_1_Pcg64
time: [8.6180 ns 8.6683 ns 8.7227 ns]
change: [-7.3260% -6.0438% -4.7466%] (p = 0.00 < 0.05)
Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild

choose_unhinted_from_1_Pcg64
time: [5.8759 ns 5.8879 ns 5.9007 ns]
change: [-0.6008% -0.4198% -0.2278%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high mild

choose_windowed_from_1_Pcg64
time: [7.3120 ns 7.3835 ns 7.4550 ns]
change: [+0.3778% +2.0338% +3.4672%] (p = 0.01 < 0.05)
Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild

choose_size-hinted_from_2_Pcg64
time: [3.1898 ns 3.1937 ns 3.1982 ns]
change: [+7.9230% +8.3100% +8.6936%] (p = 0.00 < 0.05)
Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
3 (3.00%) high mild
7 (7.00%) high severe

choose_stable_from_2_Pcg64
time: [17.301 ns 17.352 ns 17.405 ns]
change: [-0.1188% +0.1582% +0.4491%] (p = 0.28 > 0.05)
No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
6 (6.00%) high mild
1 (1.00%) high severe

choose_unhinted_from_2_Pcg64
time: [14.620 ns 14.647 ns 14.678 ns]
change: [-1.3772% -1.0038% -0.4491%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 14 outliers among 100 measurements (14.00%)
9 (9.00%) high mild
5 (5.00%) high severe

choose_windowed_from_2_Pcg64
time: [11.943 ns 11.966 ns 11.989 ns]
change: [+1.0566% +1.3383% +1.6054%] (p = 0.00 < 0.05)
Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
5 (5.00%) high mild
7 (7.00%) high severe

choose_size-hinted_from_3_Pcg64
time: [3.1938 ns 3.1946 ns 3.1954 ns]
change: [+8.2477% +8.4764% +8.7045%] (p = 0.00 < 0.05)
Performance has regressed.
Found 24 outliers among 100 measurements (24.00%)
16 (16.00%) low mild
4 (4.00%) high mild
4 (4.00%) high severe

choose_stable_from_3_Pcg64
time: [30.372 ns 30.388 ns 30.402 ns]
change: [-0.3199% -0.2350% -0.1464%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 12 outliers among 100 measurements (12.00%)
2 (2.00%) low severe
6 (6.00%) low mild
2 (2.00%) high mild
2 (2.00%) high severe

choose_unhinted_from_3_Pcg64
time: [27.919 ns 27.955 ns 27.999 ns]
change: [-1.1647% -0.8801% -0.6087%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 9 outliers among 100 measurements (9.00%)
2 (2.00%) high mild
7 (7.00%) high severe

choose_windowed_from_3_Pcg64
time: [13.925 ns 13.984 ns 14.057 ns]
change: [+1.8663% +2.1689% +2.5540%] (p = 0.00 < 0.05)
Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
1 (1.00%) high mild
2 (2.00%) high severe

choose_size-hinted_from_10_Pcg64
time: [3.2195 ns 3.2348 ns 3.2512 ns]
change: [+8.8768% +9.2586% +9.6379%] (p = 0.00 < 0.05)
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
9 (9.00%) high mild
2 (2.00%) high severe

choose_stable_from_10_Pcg64
time: [76.719 ns 76.944 ns 77.176 ns]
change: [+1.0071% +1.2420% +1.4907%] (p = 0.00 < 0.05)
Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild

choose_unhinted_from_10_Pcg64
time: [71.071 ns 71.257 ns 71.481 ns]
change: [-1.3636% -1.1698% -0.9684%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
3 (3.00%) high mild
2 (2.00%) high severe

choose_windowed_from_10_Pcg64
time: [26.847 ns 26.951 ns 27.067 ns]
change: [+3.5211% +3.9402% +4.3993%] (p = 0.00 < 0.05)
Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe

choose_size-hinted_from_100_Pcg64
time: [3.1996 ns 3.2071 ns 3.2170 ns]
change: [+8.6437% +8.9352% +9.2191%] (p = 0.00 < 0.05)
Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
2 (2.00%) high mild
3 (3.00%) high severe

choose_stable_from_100_Pcg64
time: [471.35 ns 472.14 ns 473.05 ns]
change: [+0.3193% +0.4841% +0.6685%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 12 outliers among 100 measurements (12.00%)
2 (2.00%) low mild
5 (5.00%) high mild
5 (5.00%) high severe

choose_unhinted_from_100_Pcg64
time: [423.57 ns 424.32 ns 425.20 ns]
change: [-0.9277% -0.6107% -0.3160%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
6 (6.00%) high mild
2 (2.00%) high severe

choose_windowed_from_100_Pcg64
time: [140.44 ns 140.71 ns 141.02 ns]
change: [+2.3933% +2.6175% +2.8512%] (p = 0.00 < 0.05)
Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
5 (5.00%) high mild
4 (4.00%) high severe

choose_size-hinted_from_1000_Pcg64
time: [3.1980 ns 3.2012 ns 3.2044 ns]
change: [+8.7161% +8.8721% +9.0295%] (p = 0.00 < 0.05)
Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild

choose_stable_from_1000_Pcg64
time: [3.6179 µs 3.6188 µs 3.6197 µs]
change: [-0.1558% -0.0882% -0.0232%] (p = 0.01 < 0.05)
Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
1 (1.00%) low severe
3 (3.00%) low mild
1 (1.00%) high mild

choose_unhinted_from_1000_Pcg64
time: [3.1599 µs 3.1621 µs 3.1643 µs]
change: [-0.2891% -0.1276% +0.0231%] (p = 0.11 > 0.05)
No change in performance detected.
Found 4 outliers among 100 measurements (4.00%)
3 (3.00%) low mild
1 (1.00%) high severe

choose_windowed_from_1000_Pcg64
time: [925.38 ns 925.75 ns 926.10 ns]
change: [+3.5396% +3.6956% +3.8390%] (p = 0.00 < 0.05)
Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
1 (1.00%) low mild
3 (3.00%) high mild
1 (1.00%) high severe

On average, that's +1% (range -11% to +71%).

Yes, there are caveats to this type of benchmarking: variance (I repeated one test a few times and had less than 1% change so probably okay), relevance (and weighting), but on the available evidence I don't see any significant benefit to this change.

@dhardy dhardy closed this Jun 17, 2025