perf: optimize sample_floyd by unsafe APIs #1622

Unparalleled-Calvin · 2025-04-03T07:05:30Z

Added a CHANGELOG.md entry

Summary

This PR uses unsafe APIs to speed up sample_floyd. The unsafe accesses are sound because every index is bounded by the length of the vec.

Motivation

Rust's bounds checks are sometimes unnecessary. Removing them with unsafe APIs can improve performance. This optimization makes the related functions faster while preserving safety.
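As a rough illustration of the idea, here is a minimal sketch of Floyd's sampling with a checked and an unchecked write. It is not the code from this PR: `Lcg`, `floyd_checked` and `floyd_unchecked` are hypothetical names, and the tiny generator only stands in for a real RNG so that the example has no dependencies.

```rust
/// Hypothetical stand-in for a real RNG, used only to keep the sketch
/// dependency-free. Not suitable for actual sampling.
struct Lcg(u64);

impl Lcg {
    /// Returns a value in `0..=upper` (modulo bias is ignored here).
    fn next_inclusive(&mut self, upper: u32) -> u32 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        ((self.0 >> 33) % (upper as u64 + 1)) as u32
    }
}

/// Floyd's sampling: `amount` distinct values from `0..length`,
/// using ordinary bounds-checked indexing.
fn floyd_checked(rng: &mut Lcg, length: u32, amount: u32) -> Vec<u32> {
    assert!(amount <= length);
    let mut indices: Vec<u32> = Vec::with_capacity(amount as usize);
    for j in length - amount..length {
        let t = rng.next_inclusive(j);
        if let Some(pos) = indices.iter().position(|&x| x == t) {
            indices[pos] = j; // bounds-checked write
        }
        indices.push(t);
    }
    indices
}

/// Same algorithm, but the write skips the bounds check.
fn floyd_unchecked(rng: &mut Lcg, length: u32, amount: u32) -> Vec<u32> {
    assert!(amount <= length);
    let mut indices: Vec<u32> = Vec::with_capacity(amount as usize);
    for j in length - amount..length {
        let t = rng.next_inclusive(j);
        if let Some(pos) = indices.iter().position(|&x| x == t) {
            // SAFETY: `pos` was returned by `position` on this same vector,
            // which has not been modified since, so `pos < indices.len()`.
            unsafe { *indices.get_unchecked_mut(pos) = j };
        }
        indices.push(t);
    }
    indices
}
```

Both functions return the same vector for the same seed; the only difference is whether the compiler has to prove or check `pos < indices.len()` at the write.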

Details

The benchmark results from my environment are listed below.

seq_slice_choose_multiple_1_of_1000
                        time:   [17.377 ns 17.480 ns 17.597 ns]
                        change: [-5.6211% -4.8379% -4.0867%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  5 (5.00%) high mild

seq_slice_choose_multiple_10_of_100
                        time:   [53.166 ns 53.654 ns 54.192 ns]
                        change: [-7.2861% -6.4089% -5.4998%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  5 (5.00%) high mild
  1 (1.00%) high severe

dhardy · 2025-04-05T07:56:09Z

Thanks for the PR.

My main concern here is simply: should we be adding more unsafe code for ~5% perf gains?

CC @RalfJung

Unparalleled-Calvin · 2025-04-07T08:15:27Z

Thanks for considering!
The unsafe block is minimized with clear // safety comments and encapsulated by the safe sample_floyd function. And tests/MIRI can confirm no UB. So I think it is ok to use unsafe code and gain more performance.

dhardy

There are two unsafe operations here; I'd like to see the perf impact of each.

src/seq/index.rs

RalfJung · 2025-04-07T17:22:25Z

> Thanks for the PR.
>
> My main concern here is simply: should we be adding more unsafe code for ~5% perf gains?
>
> CC @RalfJung

Not sure what exactly you want my input on here. :) Happy to consult on whether some use of unsafe is sound or not, but that doesn't seem to be the question here? As to whether you think the bit of unsafe is worth the perf gain -- that's a maintainer decision. There's absolutely cases where the perf gain is important enough to justify a bit of unsafe and there are other cases where it's not worth it. I don't have to maintain this code going forward so I can't make this decision for you. :)

> And tests/MIRI can confirm no UB.

Of course, testing != verification, so there could still be UB in edge cases not covered by the tests.
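To make the distinction concrete, the sketch below shows the kind of small exhaustive check that could be run as a test (for example under `cargo miri test`). The helper names `is_valid_sample` and `check_small_inputs` are hypothetical and not part of the crate. Such a check only exercises the inputs it enumerates, so it can raise confidence but cannot prove the absence of UB.

```rust
/// Returns true if `sample` holds exactly `amount` distinct values,
/// all of them below `length`.
fn is_valid_sample(sample: &[u32], length: u32, amount: u32) -> bool {
    if sample.len() != amount as usize {
        return false;
    }
    let mut seen = vec![false; length as usize];
    for &x in sample {
        match seen.get_mut(x as usize) {
            // First time this in-range value appears: mark it.
            Some(slot) if !*slot => *slot = true,
            // Out of range, or a duplicate.
            _ => return false,
        }
    }
    true
}

/// Calls `sampler(seed, length, amount)` for every `amount <= length <= max_length`
/// and every seed in `0..seeds`, and validates each result.
fn check_small_inputs<F>(max_length: u32, seeds: u64, mut sampler: F) -> bool
where
    F: FnMut(u64, u32, u32) -> Vec<u32>,
{
    for length in 0..=max_length {
        for amount in 0..=length {
            for seed in 0..seeds {
                let sample = sampler(seed, length, amount);
                if !is_valid_sample(&sample, length, amount) {
                    return false;
                }
            }
        }
    }
    true
}
```

A sampler under test would be passed in as the closure; anything outside the enumerated lengths, amounts and seeds remains unchecked.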

Unparalleled-Calvin · 2025-04-12T15:01:09Z

Thank you for your review! Here are the benchmark results of using the unsafe functions.

Only using *indices.get_unchecked_mut(pos) = j; [compared to the original version]

seq_slice_choose_multiple_1_of_1000
                        time:   [19.089 ns 19.212 ns 19.346 ns]
                        change: [-0.8332% +0.7077% +2.5236%] (p = 0.46 > 0.05)
                        No change in performance detected.

seq_slice_choose_multiple_10_of_100
                        time:   [58.523 ns 58.777 ns 59.060 ns]
                        change: [-6.5131% -5.2792% -4.0724%] (p = 0.00 < 0.05)
                        Performance has improved.

Additionally using ptr::write instead of push [compared to the above]

seq_slice_choose_multiple_1_of_1000
                        time:   [18.896 ns 19.158 ns 19.446 ns]
                        change: [-5.8287% -3.9616% -2.4368%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)

seq_slice_choose_multiple_10_of_100
                        time:   [56.478 ns 56.843 ns 57.254 ns]
                        change: [-4.5315% -3.8087% -3.0675%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)

From my perspective, eliminating the bounds check in the [] operation contributes more of the improvement in this PR, and rewriting push also has an effect.
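For readers unfamiliar with the second change, replacing `push` with `ptr::write` generally looks like the sketch below. This is an illustration under assumptions, not the PR's diff: `push_unchecked`, `fill_with_push` and `fill_unchecked` are hypothetical names. The point is that `push` must check for spare capacity (and may reallocate), while a raw write skips that check when the capacity was reserved up front.

```rust
use std::ptr;

/// Appends `value` without the capacity check that `Vec::push` performs.
///
/// # Safety
/// The caller must guarantee `v.len() < v.capacity()`.
unsafe fn push_unchecked(v: &mut Vec<u32>, value: u32) {
    let len = v.len();
    debug_assert!(len < v.capacity());
    // SAFETY: the caller guarantees spare capacity, so slot `len` lies inside
    // the allocation; it is initialized by the write before `set_len` exposes it.
    unsafe {
        ptr::write(v.as_mut_ptr().add(len), value);
        v.set_len(len + 1);
    }
}

/// Baseline: fill a pre-sized vector with ordinary `push`.
fn fill_with_push(n: u32) -> Vec<u32> {
    let mut v = Vec::with_capacity(n as usize);
    for i in 0..n {
        v.push(i);
    }
    v
}

/// Same result, written through `push_unchecked`.
fn fill_unchecked(n: u32) -> Vec<u32> {
    let mut v = Vec::with_capacity(n as usize);
    for i in 0..n {
        // SAFETY: exactly `n` elements are written into a vector whose
        // capacity is at least `n`, so there is always a free slot.
        unsafe { push_unchecked(&mut v, i) };
    }
    v
}
```

The soundness argument rests entirely on the loop bound matching the reserved capacity, which is the invariant a reviewer has to re-check whenever the surrounding code changes.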

dhardy · 2025-06-17T10:41:24Z

Sorry for the delay; I finally got around to running benches on my 5800X desktop. This is 07d4e92 vs d468501.

Full results


seq_slice_choose_1_of_100
                        time:   [2.6486 ns 2.6497 ns 2.6513 ns]
                        change: [+10.827% +10.904% +10.979%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
  5 (5.00%) high mild
  5 (5.00%) high severe
seq_slice_choose_multiple_1_of_1000

time:   [14.917 ns 14.966 ns 15.020 ns]

change: [-3.7378% -3.2299% -2.7506%] (p = 0.00 < 0.05)

Performance has improved.

Found 5 outliers among 100 measurements (5.00%)

4 (4.00%) high mild

1 (1.00%) high severe
seq_slice_choose_multiple_950_of_1000

time:   [2.1368 µs 2.1378 µs 2.1389 µs]

change: [-0.5592% -0.4433% -0.3273%] (p = 0.00 < 0.05)

Change within noise threshold.

Found 15 outliers among 100 measurements (15.00%)

4 (4.00%) low mild

2 (2.00%) high mild

9 (9.00%) high severe
seq_slice_choose_multiple_10_of_100

time:   [49.832 ns 49.943 ns 50.089 ns]

change: [-5.1852% -4.6156% -4.0766%] (p = 0.00 < 0.05)

Performance has improved.

Found 12 outliers among 100 measurements (12.00%)

3 (3.00%) high mild

9 (9.00%) high severe
seq_slice_choose_multiple_90_of_100

time:   [221.04 ns 221.16 ns 221.31 ns]

change: [-3.8395% -3.2941% -2.7416%] (p = 0.00 < 0.05)

Performance has improved.

Found 11 outliers among 100 measurements (11.00%)

6 (6.00%) high mild

5 (5.00%) high severe
seq_slice_choose_multiple_weighted_1_of_1000

time:   [1.3422 µs 1.3436 µs 1.3456 µs]

change: [-1.1407% -0.9365% -0.7424%] (p = 0.00 < 0.05)

Change within noise threshold.

Found 12 outliers among 100 measurements (12.00%)

1 (1.00%) low severe

3 (3.00%) low mild

4 (4.00%) high mild

4 (4.00%) high severe
seq_slice_choose_multiple_weighted_950_of_1000

time:   [37.589 µs 37.598 µs 37.608 µs]

change: [-0.0223% +0.1557% +0.3243%] (p = 0.08 > 0.05)

No change in performance detected.

Found 2 outliers among 100 measurements (2.00%)

1 (1.00%) low severe

1 (1.00%) high mild
seq_slice_choose_multiple_weighted_10_of_100

time:   [2.2598 µs 4.4461 µs 9.4964 µs]

change: [-1.3195% +71.001% +287.04%] (p = 0.60 > 0.05)

No change in performance detected.

Found 13 outliers among 100 measurements (13.00%)

3 (3.00%) high mild

10 (10.00%) high severe
seq_slice_choose_multiple_weighted_90_of_100

time:   [3.9278 µs 3.9714 µs 4.0205 µs]

change: [+5.9603% +6.6836% +7.3714%] (p = 0.00 < 0.05)

Performance has regressed.

Found 26 outliers among 100 measurements (26.00%)

17 (17.00%) low severe

1 (1.00%) low mild

1 (1.00%) high mild

7 (7.00%) high severe
seq_iter_choose_multiple_10_of_100

time:   [409.17 ns 409.60 ns 410.00 ns]

change: [+7.8477% +8.1398% +8.4209%] (p = 0.00 < 0.05)

Performance has regressed.

Found 3 outliers among 100 measurements (3.00%)

2 (2.00%) high mild

1 (1.00%) high severe
seq_iter_choose_multiple_fill_10_of_100

time:   [371.32 ns 372.18 ns 373.14 ns]

change: [+1.9152% +2.1495% +2.3769%] (p = 0.00 < 0.05)

Performance has regressed.

Found 11 outliers among 100 measurements (11.00%)

2 (2.00%) low mild

2 (2.00%) high mild

7 (7.00%) high severe
choose_size-hinted_from_1_ChaCha20

time:   [533.14 ps 533.41 ps 533.72 ps]

change: [+0.8776% +1.0959% +1.3212%] (p = 0.00 < 0.05)

Change within noise threshold.

Found 11 outliers among 100 measurements (11.00%)

6 (6.00%) high mild

5 (5.00%) high severe
choose_stable_from_1_ChaCha20

time:   [8.6338 ns 8.8176 ns 8.9933 ns]

change: [-12.136% -9.5246% -6.6585%] (p = 0.00 < 0.05)

Performance has improved.

Found 2 outliers among 100 measurements (2.00%)

2 (2.00%) high mild
choose_unhinted_from_1_ChaCha20

time:   [5.6307 ns 5.6463 ns 5.6624 ns]

change: [+0.7961% +1.1273% +1.4456%] (p = 0.00 < 0.05)

Change within noise threshold.

Found 3 outliers among 100 measurements (3.00%)

3 (3.00%) high mild
choose_windowed_from_1_ChaCha20

time:   [7.0225 ns 7.1445 ns 7.2883 ns]

change: [-12.509% -10.517% -8.5681%] (p = 0.00 < 0.05)

Performance has improved.

Found 17 outliers among 100 measurements (17.00%)

4 (4.00%) high mild

13 (13.00%) high severe
choose_size-hinted_from_2_ChaCha20

time:   [4.4525 ns 4.4739 ns 4.4975 ns]

change: [-4.3195% -4.0243% -3.6848%] (p = 0.00 < 0.05)

Performance has improved.

Found 6 outliers among 100 measurements (6.00%)

6 (6.00%) high mild
choose_stable_from_2_ChaCha20

time:   [17.758 ns 17.813 ns 17.872 ns]

change: [-0.1884% +0.1905% +0.5192%] (p = 0.30 > 0.05)

No change in performance detected.

Found 4 outliers among 100 measurements (4.00%)

3 (3.00%) high mild

1 (1.00%) high severe
choose_unhinted_from_2_ChaCha20

time:   [15.527 ns 15.559 ns 15.593 ns]

change: [+1.9199% +2.1527% +2.3974%] (p = 0.00 < 0.05)

Performance has regressed.

Found 8 outliers among 100 measurements (8.00%)

4 (4.00%) high mild

4 (4.00%) high severe
choose_windowed_from_2_ChaCha20

time:   [12.766 ns 12.776 ns 12.788 ns]

change: [-1.5386% -1.2811% -1.0575%] (p = 0.00 < 0.05)

Performance has improved.

Found 3 outliers among 100 measurements (3.00%)

3 (3.00%) high mild
choose_size-hinted_from_3_ChaCha20

time:   [4.4146 ns 4.4229 ns 4.4323 ns]

change: [-5.4112% -5.0060% -4.6991%] (p = 0.00 < 0.05)

Performance has improved.

Found 8 outliers among 100 measurements (8.00%)

2 (2.00%) high mild

6 (6.00%) high severe
choose_stable_from_3_ChaCha20

time:   [31.406 ns 31.463 ns 31.523 ns]

change: [+1.5129% +1.6787% +1.8203%] (p = 0.00 < 0.05)

Performance has regressed.
choose_unhinted_from_3_ChaCha20

time:   [28.585 ns 28.643 ns 28.695 ns]

change: [-0.9503% -0.8059% -0.6268%] (p = 0.00 < 0.05)

Change within noise threshold.

Found 1 outliers among 100 measurements (1.00%)

1 (1.00%) high mild
choose_windowed_from_3_ChaCha20

time:   [14.593 ns 14.617 ns 14.645 ns]

change: [+0.0893% +0.3011% +0.5185%] (p = 0.01 < 0.05)

Change within noise threshold.
choose_size-hinted_from_10_ChaCha20

time:   [4.4173 ns 4.4245 ns 4.4325 ns]

change: [-5.2512% -5.0591% -4.8575%] (p = 0.00 < 0.05)

Performance has improved.
choose_stable_from_10_ChaCha20

time:   [77.354 ns 77.598 ns 77.883 ns]

change: [-0.4206% -0.1263% +0.1400%] (p = 0.40 > 0.05)

No change in performance detected.

Found 12 outliers among 100 measurements (12.00%)

9 (9.00%) high mild

3 (3.00%) high severe
choose_unhinted_from_10_ChaCha20

time:   [71.879 ns 72.142 ns 72.458 ns]

change: [-2.6380% -2.1592% -1.6905%] (p = 0.00 < 0.05)

Performance has improved.

Found 12 outliers among 100 measurements (12.00%)

10 (10.00%) high mild

2 (2.00%) high severe
choose_windowed_from_10_ChaCha20

time:   [27.757 ns 27.791 ns 27.823 ns]

change: [-2.0492% -1.8655% -1.6903%] (p = 0.00 < 0.05)

Performance has improved.

Found 9 outliers among 100 measurements (9.00%)

6 (6.00%) low mild

3 (3.00%) high mild
choose_size-hinted_from_100_ChaCha20

time:   [4.4348 ns 4.4618 ns 4.4907 ns]

change: [-5.1198% -4.7793% -4.4258%] (p = 0.00 < 0.05)

Performance has improved.

Found 18 outliers among 100 measurements (18.00%)

3 (3.00%) high mild

15 (15.00%) high severe
choose_stable_from_100_ChaCha20

time:   [471.23 ns 472.81 ns 474.74 ns]

change: [+0.2505% +0.6019% +0.9250%] (p = 0.00 < 0.05)

Change within noise threshold.

Found 11 outliers among 100 measurements (11.00%)

3 (3.00%) high mild

8 (8.00%) high severe
choose_unhinted_from_100_ChaCha20

time:   [429.48 ns 429.88 ns 430.37 ns]

change: [-4.5379% -4.4257% -4.3012%] (p = 0.00 < 0.05)

Performance has improved.
choose_windowed_from_100_ChaCha20

time:   [152.45 ns 152.63 ns 152.81 ns]

change: [-2.1996% -1.9982% -1.8082%] (p = 0.00 < 0.05)

Performance has improved.

Found 3 outliers among 100 measurements (3.00%)

1 (1.00%) high mild

2 (2.00%) high severe
choose_size-hinted_from_1000_ChaCha20

time:   [4.4321 ns 4.4401 ns 4.4499 ns]

change: [-4.8297% -4.6664% -4.4985%] (p = 0.00 < 0.05)

Performance has improved.

Found 4 outliers among 100 measurements (4.00%)

4 (4.00%) high mild
choose_stable_from_1000_ChaCha20

time:   [3.5763 µs 3.5825 µs 3.5891 µs]

change: [-1.0493% -0.8017% -0.5696%] (p = 0.00 < 0.05)

Change within noise threshold.

Found 8 outliers among 100 measurements (8.00%)

4 (4.00%) high mild

4 (4.00%) high severe
choose_unhinted_from_1000_ChaCha20

time:   [3.2282 µs 3.2311 µs 3.2346 µs]

change: [-7.1662% -6.9770% -6.8021%] (p = 0.00 < 0.05)

Performance has improved.

Found 5 outliers among 100 measurements (5.00%)

3 (3.00%) high mild

2 (2.00%) high severe
choose_windowed_from_1000_ChaCha20

time:   [1.0882 µs 1.0898 µs 1.0914 µs]

change: [-1.6050% -1.1989% -0.8548%] (p = 0.00 < 0.05)

Change within noise threshold.

Found 10 outliers among 100 measurements (10.00%)

5 (5.00%) low severe

4 (4.00%) low mild

1 (1.00%) high mild
choose_size-hinted_from_1_Pcg32

time:   [530.23 ps 530.50 ps 530.82 ps]

change: [-0.5104% -0.4148% -0.3119%] (p = 0.00 < 0.05)

Change within noise threshold.

Found 3 outliers among 100 measurements (3.00%)

3 (3.00%) high mild
choose_stable_from_1_Pcg32

time:   [10.419 ns 10.592 ns 10.740 ns]

change: [+1.7287% +4.3907% +6.9825%] (p = 0.00 < 0.05)

Performance has regressed.
choose_unhinted_from_1_Pcg32

time:   [5.3097 ns 5.3168 ns 5.3242 ns]

change: [-5.7915% -5.5652% -5.3655%] (p = 0.00 < 0.05)

Performance has improved.
choose_windowed_from_1_Pcg32

time:   [7.9914 ns 8.1458 ns 8.2921 ns]

change: [-6.1896% -4.9242% -3.7086%] (p = 0.00 < 0.05)

Performance has improved.

Found 5 outliers among 100 measurements (5.00%)

5 (5.00%) high mild
choose_size-hinted_from_2_Pcg32

time:   [2.6646 ns 2.6695 ns 2.6752 ns]

change: [-0.4699% -0.2344% -0.0000%] (p = 0.06 > 0.05)

No change in performance detected.
choose_stable_from_2_Pcg32

time:   [15.605 ns 15.635 ns 15.663 ns]

change: [+1.1656% +1.4562% +1.8266%] (p = 0.00 < 0.05)

Performance has regressed.

Found 2 outliers among 100 measurements (2.00%)

2 (2.00%) high severe
choose_unhinted_from_2_Pcg32

time:   [13.376 ns 13.407 ns 13.443 ns]

change: [+2.0915% +2.6592% +3.2658%] (p = 0.00 < 0.05)

Performance has regressed.

Found 11 outliers among 100 measurements (11.00%)

10 (10.00%) high mild

1 (1.00%) high severe
choose_windowed_from_2_Pcg32

time:   [10.816 ns 10.831 ns 10.847 ns]

change: [-1.1565% -0.8701% -0.5873%] (p = 0.00 < 0.05)

Change within noise threshold.

Found 10 outliers among 100 measurements (10.00%)

8 (8.00%) high mild

2 (2.00%) high severe
choose_size-hinted_from_3_Pcg32

time:   [2.6576 ns 2.6633 ns 2.6705 ns]

change: [-0.0831% +0.1218% +0.3585%] (p = 0.31 > 0.05)

No change in performance detected.

Found 11 outliers among 100 measurements (11.00%)

9 (9.00%) high mild

2 (2.00%) high severe
choose_stable_from_3_Pcg32

time:   [28.924 ns 29.026 ns 29.138 ns]

change: [+1.0419% +1.2694% +1.5207%] (p = 0.00 < 0.05)

Performance has regressed.

Found 6 outliers among 100 measurements (6.00%)

4 (4.00%) high mild

2 (2.00%) high severe
choose_unhinted_from_3_Pcg32

time:   [26.324 ns 26.374 ns 26.426 ns]

change: [-0.3882% -0.1439% +0.1104%] (p = 0.27 > 0.05)

No change in performance detected.

Found 2 outliers among 100 measurements (2.00%)

1 (1.00%) high mild

1 (1.00%) high severe
choose_windowed_from_3_Pcg32

time:   [12.880 ns 12.900 ns 12.923 ns]

change: [+2.6186% +2.8269% +3.0184%] (p = 0.00 < 0.05)

Performance has regressed.

Found 3 outliers among 100 measurements (3.00%)

3 (3.00%) high mild
choose_size-hinted_from_10_Pcg32

time:   [2.6587 ns 2.6640 ns 2.6699 ns]

change: [-0.5874% -0.3001% -0.0189%] (p = 0.04 < 0.05)

Change within noise threshold.

Found 3 outliers among 100 measurements (3.00%)

2 (2.00%) high mild

1 (1.00%) high severe
choose_stable_from_10_Pcg32

time:   [73.983 ns 74.182 ns 74.416 ns]

change: [+0.6416% +0.8830% +1.1465%] (p = 0.00 < 0.05)

Change within noise threshold.

Found 6 outliers among 100 measurements (6.00%)

4 (4.00%) high mild

2 (2.00%) high severe
choose_unhinted_from_10_Pcg32

time:   [68.973 ns 69.035 ns 69.100 ns]

change: [-0.8100% -0.4104% -0.1083%] (p = 0.01 < 0.05)

Change within noise threshold.

Found 1 outliers among 100 measurements (1.00%)

1 (1.00%) high severe
choose_windowed_from_10_Pcg32

time:   [24.306 ns 24.335 ns 24.369 ns]

change: [-0.0080% +0.4368% +0.8851%] (p = 0.06 > 0.05)

No change in performance detected.

Found 8 outliers among 100 measurements (8.00%)

3 (3.00%) high mild

5 (5.00%) high severe
choose_size-hinted_from_100_Pcg32

time:   [2.6640 ns 2.6747 ns 2.6890 ns]

change: [-0.3722% -0.0756% +0.2830%] (p = 0.65 > 0.05)

No change in performance detected.

Found 7 outliers among 100 measurements (7.00%)

4 (4.00%) high mild

3 (3.00%) high severe
choose_stable_from_100_Pcg32

time:   [462.04 ns 462.59 ns 463.19 ns]

change: [-1.5476% -1.2043% -0.8930%] (p = 0.00 < 0.05)

Change within noise threshold.

Found 3 outliers among 100 measurements (3.00%)

2 (2.00%) high mild

1 (1.00%) high severe
choose_unhinted_from_100_Pcg32

time:   [424.52 ns 425.35 ns 426.24 ns]

change: [+0.8196% +1.0394% +1.2658%] (p = 0.00 < 0.05)

Change within noise threshold.

Found 3 outliers among 100 measurements (3.00%)

3 (3.00%) high mild
choose_windowed_from_100_Pcg32

time:   [128.68 ns 128.87 ns 129.08 ns]

change: [+2.3181% +2.4577% +2.6024%] (p = 0.00 < 0.05)

Performance has regressed.

Found 4 outliers among 100 measurements (4.00%)

3 (3.00%) high mild

1 (1.00%) high severe
choose_size-hinted_from_1000_Pcg32

time:   [2.6733 ns 2.6817 ns 2.6907 ns]

change: [-0.6667% -0.4014% -0.0932%] (p = 0.01 < 0.05)

Change within noise threshold.

Found 4 outliers among 100 measurements (4.00%)

3 (3.00%) high mild

1 (1.00%) high severe
choose_stable_from_1000_Pcg32

time:   [3.5208 µs 3.5233 µs 3.5265 µs]

change: [-0.9690% -0.7959% -0.6314%] (p = 0.00 < 0.05)

Change within noise threshold.

Found 4 outliers among 100 measurements (4.00%)

3 (3.00%) high mild

1 (1.00%) high severe
choose_unhinted_from_1000_Pcg32

time:   [3.1779 µs 3.1842 µs 3.1905 µs]

change: [+1.8954% +2.0685% +2.2204%] (p = 0.00 < 0.05)

Performance has regressed.

Found 2 outliers among 100 measurements (2.00%)

2 (2.00%) high mild
choose_windowed_from_1000_Pcg32

time:   [845.19 ns 847.21 ns 849.67 ns]

change: [+3.7596% +4.2007% +4.6226%] (p = 0.00 < 0.05)

Performance has regressed.

Found 4 outliers among 100 measurements (4.00%)

4 (4.00%) high mild
choose_size-hinted_from_1_Pcg64

time:   [532.71 ps 533.89 ps 535.21 ps]

change: [-0.9249% -0.5829% -0.2582%] (p = 0.00 < 0.05)

Change within noise threshold.

Found 10 outliers among 100 measurements (10.00%)

6 (6.00%) high mild

4 (4.00%) high severe
choose_stable_from_1_Pcg64

time:   [8.6180 ns 8.6683 ns 8.7227 ns]

change: [-7.3260% -6.0438% -4.7466%] (p = 0.00 < 0.05)

Performance has improved.

Found 1 outliers among 100 measurements (1.00%)

1 (1.00%) high mild
choose_unhinted_from_1_Pcg64

time:   [5.8759 ns 5.8879 ns 5.9007 ns]

change: [-0.6008% -0.4198% -0.2278%] (p = 0.00 < 0.05)

Change within noise threshold.

Found 3 outliers among 100 measurements (3.00%)

3 (3.00%) high mild
choose_windowed_from_1_Pcg64

time:   [7.3120 ns 7.3835 ns 7.4550 ns]

change: [+0.3778% +2.0338% +3.4672%] (p = 0.01 < 0.05)

Change within noise threshold.

Found 1 outliers among 100 measurements (1.00%)

1 (1.00%) high mild
choose_size-hinted_from_2_Pcg64

time:   [3.1898 ns 3.1937 ns 3.1982 ns]

change: [+7.9230% +8.3100% +8.6936%] (p = 0.00 < 0.05)

Performance has regressed.

Found 10 outliers among 100 measurements (10.00%)

3 (3.00%) high mild

7 (7.00%) high severe
choose_stable_from_2_Pcg64

time:   [17.301 ns 17.352 ns 17.405 ns]

change: [-0.1188% +0.1582% +0.4491%] (p = 0.28 > 0.05)

No change in performance detected.

Found 7 outliers among 100 measurements (7.00%)

6 (6.00%) high mild

1 (1.00%) high severe
choose_unhinted_from_2_Pcg64

time:   [14.620 ns 14.647 ns 14.678 ns]

change: [-1.3772% -1.0038% -0.4491%] (p = 0.00 < 0.05)

Change within noise threshold.

Found 14 outliers among 100 measurements (14.00%)

9 (9.00%) high mild

5 (5.00%) high severe
choose_windowed_from_2_Pcg64

time:   [11.943 ns 11.966 ns 11.989 ns]

change: [+1.0566% +1.3383% +1.6054%] (p = 0.00 < 0.05)

Performance has regressed.

Found 12 outliers among 100 measurements (12.00%)

5 (5.00%) high mild

7 (7.00%) high severe
choose_size-hinted_from_3_Pcg64

time:   [3.1938 ns 3.1946 ns 3.1954 ns]

change: [+8.2477% +8.4764% +8.7045%] (p = 0.00 < 0.05)

Performance has regressed.

Found 24 outliers among 100 measurements (24.00%)

16 (16.00%) low mild

4 (4.00%) high mild

4 (4.00%) high severe
choose_stable_from_3_Pcg64

time:   [30.372 ns 30.388 ns 30.402 ns]

change: [-0.3199% -0.2350% -0.1464%] (p = 0.00 < 0.05)

Change within noise threshold.

Found 12 outliers among 100 measurements (12.00%)

2 (2.00%) low severe

6 (6.00%) low mild

2 (2.00%) high mild

2 (2.00%) high severe
choose_unhinted_from_3_Pcg64

time:   [27.919 ns 27.955 ns 27.999 ns]

change: [-1.1647% -0.8801% -0.6087%] (p = 0.00 < 0.05)

Change within noise threshold.

Found 9 outliers among 100 measurements (9.00%)

2 (2.00%) high mild

7 (7.00%) high severe
choose_windowed_from_3_Pcg64

time:   [13.925 ns 13.984 ns 14.057 ns]

change: [+1.8663% +2.1689% +2.5540%] (p = 0.00 < 0.05)

Performance has regressed.

Found 3 outliers among 100 measurements (3.00%)

1 (1.00%) high mild

2 (2.00%) high severe
choose_size-hinted_from_10_Pcg64

time:   [3.2195 ns 3.2348 ns 3.2512 ns]

change: [+8.8768% +9.2586% +9.6379%] (p = 0.00 < 0.05)

Performance has regressed.

Found 11 outliers among 100 measurements (11.00%)

9 (9.00%) high mild

2 (2.00%) high severe
choose_stable_from_10_Pcg64

time:   [76.719 ns 76.944 ns 77.176 ns]

change: [+1.0071% +1.2420% +1.4907%] (p = 0.00 < 0.05)

Performance has regressed.

Found 1 outliers among 100 measurements (1.00%)

1 (1.00%) high mild
choose_unhinted_from_10_Pcg64

time:   [71.071 ns 71.257 ns 71.481 ns]

change: [-1.3636% -1.1698% -0.9684%] (p = 0.00 < 0.05)

Change within noise threshold.

Found 5 outliers among 100 measurements (5.00%)

3 (3.00%) high mild

2 (2.00%) high severe
choose_windowed_from_10_Pcg64

time:   [26.847 ns 26.951 ns 27.067 ns]

change: [+3.5211% +3.9402% +4.3993%] (p = 0.00 < 0.05)

Performance has regressed.

Found 2 outliers among 100 measurements (2.00%)

1 (1.00%) high mild

1 (1.00%) high severe
choose_size-hinted_from_100_Pcg64

time:   [3.1996 ns 3.2071 ns 3.2170 ns]

change: [+8.6437% +8.9352% +9.2191%] (p = 0.00 < 0.05)

Performance has regressed.

Found 5 outliers among 100 measurements (5.00%)

2 (2.00%) high mild

3 (3.00%) high severe
choose_stable_from_100_Pcg64

time:   [471.35 ns 472.14 ns 473.05 ns]

change: [+0.3193% +0.4841% +0.6685%] (p = 0.00 < 0.05)

Change within noise threshold.

Found 12 outliers among 100 measurements (12.00%)

2 (2.00%) low mild

5 (5.00%) high mild

5 (5.00%) high severe
choose_unhinted_from_100_Pcg64

time:   [423.57 ns 424.32 ns 425.20 ns]

change: [-0.9277% -0.6107% -0.3160%] (p = 0.00 < 0.05)

Change within noise threshold.

Found 8 outliers among 100 measurements (8.00%)

6 (6.00%) high mild

2 (2.00%) high severe
choose_windowed_from_100_Pcg64

time:   [140.44 ns 140.71 ns 141.02 ns]

change: [+2.3933% +2.6175% +2.8512%] (p = 0.00 < 0.05)

Performance has regressed.

Found 9 outliers among 100 measurements (9.00%)

5 (5.00%) high mild

4 (4.00%) high severe
choose_size-hinted_from_1000_Pcg64

time:   [3.1980 ns 3.2012 ns 3.2044 ns]

change: [+8.7161% +8.8721% +9.0295%] (p = 0.00 < 0.05)

Performance has regressed.

Found 2 outliers among 100 measurements (2.00%)

2 (2.00%) high mild
choose_stable_from_1000_Pcg64

time:   [3.6179 µs 3.6188 µs 3.6197 µs]

change: [-0.1558% -0.0882% -0.0232%] (p = 0.01 < 0.05)

Change within noise threshold.

Found 5 outliers among 100 measurements (5.00%)

1 (1.00%) low severe

3 (3.00%) low mild

1 (1.00%) high mild
choose_unhinted_from_1000_Pcg64

time:   [3.1599 µs 3.1621 µs 3.1643 µs]

change: [-0.2891% -0.1276% +0.0231%] (p = 0.11 > 0.05)

No change in performance detected.

Found 4 outliers among 100 measurements (4.00%)

3 (3.00%) low mild

1 (1.00%) high severe
choose_windowed_from_1000_Pcg64

time:   [925.38 ns 925.75 ns 926.10 ns]

change: [+3.5396% +3.6956% +3.8390%] (p = 0.00 < 0.05)

Performance has regressed.

Found 5 outliers among 100 measurements (5.00%)

1 (1.00%) low mild

3 (3.00%) high mild

1 (1.00%) high severe

On average, that's +1% (range -11% to +71%).

Yes, there are caveats to this type of benchmarking: variance (I repeated one test a few times and had less than 1% change so probably okay), relevance (and weighting), but on the available evidence I don't see any significant benefit to this change.

Boost performance for sample_floyd

07d4e92

Unparalleled-Calvin force-pushed the master branch from 2bdea23 to 07d4e92 · April 3, 2025 07:21

dhardy reviewed Apr 7, 2025

Merge branch 'master' into master

5e03158

dhardy closed this Jun 17, 2025
