-
Notifications
You must be signed in to change notification settings - Fork 13.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ensure swap_nonoverlapping
is really always untyped
#137412
base: master
Are you sure you want to change the base?
Conversation
r? @ibraheemdev rustbot has assigned @ibraheemdev. Use |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
The Miri subtree was changed cc @rust-lang/miri |
src/tools/miri/tests/pass/stdlib/swap_nonoverlapping_is_untyped.rs
Outdated
Show resolved
Hide resolved
Passing this one along because I'm not the best person to review this. r? libs |
I'm mostly sticking to the trivial PRs at the moment as my review capacity is limited. Re-rolling again. r? libs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code changes LGTM for the non-const
path; see the comment for the const
path.
I did not look at the tests; I think that needs an LLVM/codegen expert. @nikic maybe?
To be clear I only reviewed for correctness, I cannot judge whether all the tricks to get better performance actually work.
|
// Ensure we do better than a long run of byte copies, | ||
// see <https://github.com/rust-lang/rust/issues/134946> | ||
|
||
// CHECK-NOT: movb | ||
// CHECK-COUNT-8: movups{{.+}}xmm | ||
// CHECK-NOT: movb | ||
// CHECK-COUNT-4: movq | ||
// CHECK-NOT: movb | ||
// CHECK-COUNT-4: movl | ||
// CHECK-NOT: movb | ||
// CHECK: retq |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note for reviewers: the codegen tests here are more about demonstrating what actually happens on a variety of types, and the exact details don't matter that much.
Reviewing the rust code is enough to know that LLVM will swap it, but for example here what we're trying to see is that it's not just a huge row of movb
s like you can see in https://rust.godbolt.org/z/MKfxn1Tjr
@@ -52,39 +49,61 @@ pub fn swap_slice(x: &mut [KeccakBuffer], y: &mut [KeccakBuffer]) { | |||
} | |||
} | |||
|
|||
// But for a large align-1 type, vectorized byte copying is what we want. | |||
|
|||
type OneKilobyteBuffer = [u8; 1024]; | |||
|
|||
// CHECK-LABEL: @swap_1kb_slices | |||
#[no_mangle] | |||
pub fn swap_1kb_slices(x: &mut [OneKilobyteBuffer], y: &mut [OneKilobyteBuffer]) { | |||
// CHECK-NOT: alloca |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
similarly, the goal here isn't to check the exact codegen, but two main things:
- it's not just working by copying the big buffer to the stack and back out
- it's doing some kind of vectorization so it's loading things smarter than just byte-by-byte.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't that goal deserve an inline comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the comment block for the test file.
c1b9092
to
9d37b4b
Compare
☔ The latest upstream changes (presumably #137848) made this pull request unmergeable. Please resolve the merge conflicts. |
This comment has been minimized.
This comment has been minimized.
☔ The latest upstream changes (presumably #138155) made this pull request unmergeable. Please resolve the merge conflicts. |
This replaces #134954, which was arguably overcomplicated.
Fixes #134713
Actually using the type passed to
ptr::swap_nonoverlapping
for anything other than its size + align turns out to not work, so this goes back to always erasing the types down to just bytes.(Except in
const
, which keeps doing the same thing as before to preserve @RalfJung's fix from #134689)Fixes #134946
I'd previously moved the swapping to use auto-vectorization on bytes, but someone pointed out on Discord that the tail loop handling from that left a whole bunch of byte-by-byte swapping around. This goes back to manual tail handling to avoid that, then still triggers auto-vectorization on pointer-width values. (So you'll see
<4 x i64>
onx86-64-v3
for example.)