Commit 90c02c1

Auto merge of #111296 - Sp00ph:const_gcd, r=nagisa,Mark-Simulacrum
Always const-evaluate the GCD in `slice::align_to_offsets`

Use an inline `const`-block to force the compiler to calculate the GCD at compile time, even in debug mode. This shouldn't affect the behavior of the program at all, but it drastically cuts down on the number of instructions emitted with optimizations disabled.

With the current implementation, a single `slice::align_to` instantiation (specifically `<[u8]>::align_to::<u128>()`) generates 676 instructions (on x86-64). Forcing the GCD computation to be const cuts it down to 327 instructions, a reduction of just over 50%. This is obviously not representative of actual runtime gains, but I still see it as a significant win as long as it doesn't degrade compile times.

Not having to worry about LLVM const-evaluating the GCD function also allows it to use the textbook recursive Euclidean algorithm instead of a much more complicated iterative implementation with multiple `unsafe`-blocks.
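The technique described in the commit message can be sketched outside of `core` as follows. This is a minimal illustration, not the library code: `offsets_ratio` is a hypothetical helper name, and the sketch assumes a toolchain on which inline `const` blocks are stable (Rust 1.79 or later). It also assumes `T` and `U` are not both zero-sized, since that case would divide by zero here.

```rust
use std::mem;

/// Textbook recursive Euclidean algorithm, usable in const contexts.
const fn gcd(a: usize, b: usize) -> usize {
    if b == 0 { a } else { gcd(b, a % b) }
}

/// Hypothetical helper (not part of `core`): returns `(ts, us)` such that
/// `ts` values of `T` span exactly as many bytes as `us` values of `U`.
/// Assumes `T` and `U` are not both zero-sized.
fn offsets_ratio<T, U>() -> (usize, usize) {
    // The inline `const` block forces the GCD to be computed at compile
    // time for each `(T, U)` instantiation, even without optimizations.
    let g: usize = const { gcd(mem::size_of::<T>(), mem::size_of::<U>()) };
    let ts = mem::size_of::<U>() / g;
    let us = mem::size_of::<T>() / g;
    (ts, us)
}
```

For example, `offsets_ratio::<u8, u128>()` yields `(16, 1)`: sixteen `u8` values cover the same bytes as one `u128`. Without the `const` block the same call would be an ordinary function call that a debug build is free to leave in the generated code.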
2 parents 2f2c438 + b5ee324 commit 90c02c1

1 file changed: +6 -37 lines changed

library/core/src/slice/mod.rs

@@ -3478,44 +3478,13 @@ impl<T> [T] {
         // Ts = size_of::<U> / gcd(size_of::<T>, size_of::<U>)
         //
         // Luckily since all this is constant-evaluated... performance here matters not!
-        #[inline]
-        fn gcd(a: usize, b: usize) -> usize {
-            use crate::intrinsics;
-            // iterative stein’s algorithm
-            // We should still make this `const fn` (and revert to recursive algorithm if we do)
-            // because relying on llvm to consteval all this is… well, it makes me uncomfortable.
-
-            // SAFETY: `a` and `b` are checked to be non-zero values.
-            let (ctz_a, mut ctz_b) = unsafe {
-                if a == 0 {
-                    return b;
-                }
-                if b == 0 {
-                    return a;
-                }
-                (intrinsics::cttz_nonzero(a), intrinsics::cttz_nonzero(b))
-            };
-            let k = ctz_a.min(ctz_b);
-            let mut a = a >> ctz_a;
-            let mut b = b;
-            loop {
-                // remove all factors of 2 from b
-                b >>= ctz_b;
-                if a > b {
-                    mem::swap(&mut a, &mut b);
-                }
-                b = b - a;
-                // SAFETY: `b` is checked to be non-zero.
-                unsafe {
-                    if b == 0 {
-                        break;
-                    }
-                    ctz_b = intrinsics::cttz_nonzero(b);
-                }
-            }
-            a << k
+        const fn gcd(a: usize, b: usize) -> usize {
+            if b == 0 { a } else { gcd(b, a % b) }
         }
-        let gcd: usize = gcd(mem::size_of::<T>(), mem::size_of::<U>());
+
+        // Explicitly wrap the function call in a const block so it gets
+        // constant-evaluated even in debug mode.
+        let gcd: usize = const { gcd(mem::size_of::<T>(), mem::size_of::<U>()) };
         let ts: usize = mem::size_of::<U>() / gcd;
         let us: usize = mem::size_of::<T>() / gcd;
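As a sanity check on the claim that the change does not affect behavior, the two GCD implementations in the diff can be compared directly. The sketch below is an approximation of the removed code rather than a copy: it substitutes the stable `usize::trailing_zeros` method for the internal `intrinsics::cttz_nonzero`, which removes the need for `unsafe`, and the function names `gcd_euclid` and `gcd_stein` are chosen here for illustration.

```rust
/// Recursive Euclidean algorithm, as in the added lines of the diff.
const fn gcd_euclid(a: usize, b: usize) -> usize {
    if b == 0 { a } else { gcd_euclid(b, a % b) }
}

/// Iterative binary (Stein's) GCD, following the removed lines but using
/// the stable `trailing_zeros` instead of `intrinsics::cttz_nonzero`.
fn gcd_stein(a: usize, b: usize) -> usize {
    if a == 0 {
        return b;
    }
    if b == 0 {
        return a;
    }
    let ctz_a = a.trailing_zeros();
    let mut ctz_b = b.trailing_zeros();
    // Power of two shared by both inputs; restored at the end.
    let k = ctz_a.min(ctz_b);
    let mut a = a >> ctz_a;
    let mut b = b;
    loop {
        // Remove all factors of 2 from b, so that a and b are both odd.
        b >>= ctz_b;
        if a > b {
            std::mem::swap(&mut a, &mut b);
        }
        b -= a;
        if b == 0 {
            break;
        }
        ctz_b = b.trailing_zeros();
    }
    a << k
}
```

Both functions return 2 for the inputs `(6, 4)` and 6 for `(12, 18)`. Checking them against each other over a range of small inputs is a cheap way to gain confidence that swapping the algorithm leaves `ts` and `us` unchanged.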
