-
Notifications
You must be signed in to change notification settings - Fork 385
Commit cc5cbb0
committed
Auto merge of #130733 - okaneco:is_ascii, r=scottmcm
Optimize `is_ascii` for `str` and `[u8]` further
Replace the existing optimized function with one that enables auto-vectorization.
This is especially beneficial on x86-64 as `pmovmskb` can be emitted with careful structuring of the code. The instruction can detect non-ASCII characters one vector register width at a time instead of the current `usize` at a time check.
The resulting implementation is completely safe.
`case00_libcore` is the current implementation, `case04_while_loop` is this PR.
```
benchmarks:
ascii::is_ascii_slice::long::case00_libcore 22.25/iter +/- 1.09
ascii::is_ascii_slice::long::case04_while_loop 6.78/iter +/- 0.92
ascii::is_ascii_slice::medium::case00_libcore 2.81/iter +/- 0.39
ascii::is_ascii_slice::medium::case04_while_loop 1.56/iter +/- 0.78
ascii::is_ascii_slice::short::case00_libcore 5.55/iter +/- 0.85
ascii::is_ascii_slice::short::case04_while_loop 3.75/iter +/- 0.22
ascii::is_ascii_slice::unaligned_both_long::case00_libcore 26.59/iter +/- 0.66
ascii::is_ascii_slice::unaligned_both_long::case04_while_loop 5.78/iter +/- 0.16
ascii::is_ascii_slice::unaligned_both_medium::case00_libcore 2.97/iter +/- 0.32
ascii::is_ascii_slice::unaligned_both_medium::case04_while_loop 2.41/iter +/- 0.10
ascii::is_ascii_slice::unaligned_head_long::case00_libcore 23.71/iter +/- 0.79
ascii::is_ascii_slice::unaligned_head_long::case04_while_loop 7.83/iter +/- 1.31
ascii::is_ascii_slice::unaligned_head_medium::case00_libcore 3.69/iter +/- 0.54
ascii::is_ascii_slice::unaligned_head_medium::case04_while_loop 7.05/iter +/- 0.32
ascii::is_ascii_slice::unaligned_tail_long::case00_libcore 24.44/iter +/- 1.41
ascii::is_ascii_slice::unaligned_tail_long::case04_while_loop 5.12/iter +/- 0.18
ascii::is_ascii_slice::unaligned_tail_medium::case00_libcore 3.24/iter +/- 0.40
ascii::is_ascii_slice::unaligned_tail_medium::case04_while_loop 2.86/iter +/- 0.14
```
`unaligned_head_medium` is the main regression in the benchmarks. It is a 32 byte string being sliced `bytes[1..]`.
The first commit can be used to run the benchmarks against the current core implementation.
Previous implementation was done in #74066
---
Two potential drawbacks of this implementation are that it increases instruction count and may regress other platforms/architectures. The benches here may also be too artificial to glean much insight from.
https://rust.godbolt.org/z/G9znGfY36File tree
0 file changed
+0
-0
lines changedFilter options
0 file changed
+0
-0
lines changed
0 commit comments