rope: Micro optimize the creation of masks (#41132)
Using Compiler Explorer, I saw that the compiler wasn't clever enough to
optimise away the branches in the masking code. I thought the compiler
would have a better chance if we always branched, which [turned out to
be the case](https://godbolt.org/z/PM594Pz18).
Running the benchmarks, the biggest benefit I saw was:
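For readers unfamiliar with this kind of masking code, here is a minimal sketch of two ways to build a "low `n` bits set" mask. It is only an illustration: the names `low_bits_mask_branchy` and `low_bits_mask_checked` and the `u64` width are assumptions, not the code from this change, and which variant compiles to fewer branches depends on the compiler and target, so it is worth checking in Compiler Explorer.

```rust
/// Hypothetical helper: mask with the low `n` bits set, written with an
/// explicit special case because `1u64 << 64` would overflow.
pub fn low_bits_mask_branchy(n: u32) -> u64 {
    if n >= u64::BITS {
        u64::MAX
    } else {
        (1u64 << n) - 1
    }
}

/// Hypothetical helper: the same mask expressed through `checked_shl`, so
/// every input goes through one code path. When the shift overflows we fall
/// back to 0, and `0.wrapping_sub(1)` yields an all-ones mask.
pub fn low_bits_mask_checked(n: u32) -> u64 {
    1u64.checked_shl(n).unwrap_or(0).wrapping_sub(1)
}
```

Both functions return the same value for every `n`, so any difference is purely in the generated code.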
```
push/65536 time: [2.9067 ms 2.9243 ms 2.9417 ms]
thrpt: [21.246 MiB/s 21.373 MiB/s 21.502 MiB/s]
change:
time: [-8.3452% -7.2617% -6.2009%] (p = 0.00 < 0.05)
thrpt: [+6.6108% +7.8303% +9.1050%]
Performance has improved.
```
But I did also see some regressions:
```
slice/4096 time: [66.195 µs 66.815 µs 67.448 µs]
thrpt: [57.915 MiB/s 58.464 MiB/s 59.012 MiB/s]
change:
time: [+3.7131% +5.1698% +6.6971%] (p = 0.00 < 0.05)
thrpt: [-6.2768% -4.9157% -3.5802%]
Performance has regressed.
```
Release Notes:
- N/A