Ensure swap_nonoverlapping is really always untyped #137412

Open: wants to merge 2 commits into master

scottmcm
Member

This replaces #134954, which was arguably overcomplicated.

Fixes #134713

Using the type passed to ptr::swap_nonoverlapping for anything other than its size and alignment turns out not to work, so this goes back to always erasing the types down to just bytes.

(Except in const, which keeps doing the same thing as before to preserve @RalfJung's fix from #134689)
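To illustrate what "erasing the types down to just bytes" means, here is a minimal sketch. It is not the code in this PR: `swap_untyped_sketch` and `Padded` are hypothetical names, and the byte-at-a-time loop is chosen for clarity rather than speed. The point is only that the swap consults nothing about `T` except its size, so every byte (including padding, which a typed copy is not required to preserve) is carried across.

```rust
use std::mem::{size_of, MaybeUninit};

/// Hypothetical helper (not the code in this PR): swaps `count` values of `T`
/// using only the size of `T`, one byte at a time.
///
/// # Safety
/// Same requirements as `ptr::swap_nonoverlapping`: both regions must be valid
/// for reads and writes of `count` values of `T` and must not overlap.
pub unsafe fn swap_untyped_sketch<T>(x: *mut T, y: *mut T, count: usize) {
    let len = size_of::<T>() * count;
    // `MaybeUninit<u8>` lets us move bytes that may be uninitialized (padding).
    let x = x.cast::<MaybeUninit<u8>>();
    let y = y.cast::<MaybeUninit<u8>>();
    for i in 0..len {
        unsafe {
            let a = x.add(i).read();
            let b = y.add(i).read();
            x.add(i).write(b);
            y.add(i).write(a);
        }
    }
}

/// Example type: under `repr(C)` there are padding bytes between the fields.
#[repr(C)]
#[derive(Debug, PartialEq)]
pub struct Padded {
    pub tag: u8,
    pub value: u32,
}
```

Calling `swap_untyped_sketch(a.as_mut_ptr(), b.as_mut_ptr(), n)` on two non-overlapping arrays of `Padded` exchanges their contents without ever reading a `Padded` value as such.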

Fixes #134946

I'd previously moved the swapping to use auto-vectorization on bytes, but someone pointed out on Discord that the tail loop handling from that left a whole bunch of byte-by-byte swapping around. This goes back to manual tail handling to avoid that, then still triggers auto-vectorization on pointer-width values. (So you'll see <4 x i64> on x86-64-v3 for example.)
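As a rough sketch of the shape described above (a bulk loop over pointer-width chunks followed by manual tail handling), something like the following could be written. This is an assumption-laden illustration, not the implementation in this PR: the names `swap_chunk_at` and `swap_bytes_sketch` are hypothetical, and unaligned reads and writes are used so the sketch makes no alignment assumptions.

```rust
use std::mem::{size_of, MaybeUninit};

/// Hypothetical helper: swaps one chunk the size of `C` at byte offset `off`.
/// Uses unaligned accesses so no alignment is required of `x` or `y`.
unsafe fn swap_chunk_at<C>(x: *mut u8, y: *mut u8, off: usize) {
    unsafe {
        let px = x.add(off).cast::<MaybeUninit<C>>();
        let py = y.add(off).cast::<MaybeUninit<C>>();
        let a = px.read_unaligned();
        let b = py.read_unaligned();
        px.write_unaligned(b);
        py.write_unaligned(a);
    }
}

/// Hypothetical sketch: swaps `len` bytes between two non-overlapping regions.
///
/// # Safety
/// Both regions must be valid for reads and writes of `len` bytes and must
/// not overlap.
pub unsafe fn swap_bytes_sketch(x: *mut u8, y: *mut u8, len: usize) {
    const W: usize = size_of::<usize>();
    let mut off = 0;
    // Bulk loop over pointer-width chunks; this is the part an optimizer
    // may be able to vectorize.
    while off + W <= len {
        unsafe { swap_chunk_at::<usize>(x, y, off) };
        off += W;
    }
    // Manual tail: fewer than W bytes remain, so at most one step per
    // power of two instead of a byte-by-byte loop.
    let rem = len - off;
    if rem & 4 != 0 {
        unsafe { swap_chunk_at::<u32>(x, y, off) };
        off += 4;
    }
    if rem & 2 != 0 {
        unsafe { swap_chunk_at::<u16>(x, y, off) };
        off += 2;
    }
    if rem & 1 != 0 {
        unsafe { swap_chunk_at::<u8>(x, y, off) };
    }
}
```

With a 13-byte input on a 64-bit target, this does one 8-byte swap, one 4-byte swap, and one 1-byte swap, rather than five single-byte swaps for the tail.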

@rustbot
Collaborator

rustbot commented Feb 22, 2025

r? @ibraheemdev

rustbot has assigned @ibraheemdev.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Feb 22, 2025
@rustbot
Collaborator

rustbot commented Feb 22, 2025

The Miri subtree was changed

cc @rust-lang/miri

@ibraheemdev
Member

Passing this one along because I'm not the best person to review this. r? libs

@rustbot rustbot assigned jhpratt and unassigned ibraheemdev Feb 26, 2025
@jhpratt
Member

jhpratt commented Feb 26, 2025

I'm mostly sticking to the trivial PRs at the moment as my review capacity is limited. Re-rolling again.

r? libs

@rustbot rustbot assigned cuviper and unassigned jhpratt Feb 26, 2025
Member

@RalfJung RalfJung left a comment

The code changes LGTM for the non-const path; see the comment for the const path.

I did not look at the tests; I think that needs an LLVM/codegen expert. @nikic maybe?

@RalfJung
Member

RalfJung commented Feb 27, 2025 via email

Comment on lines +69 to +79
// Ensure we do better than a long run of byte copies,
// see <https://github.com/rust-lang/rust/issues/134946>

// CHECK-NOT: movb
// CHECK-COUNT-8: movups{{.+}}xmm
// CHECK-NOT: movb
// CHECK-COUNT-4: movq
// CHECK-NOT: movb
// CHECK-COUNT-4: movl
// CHECK-NOT: movb
// CHECK: retq
Member Author

Note for reviewers: the codegen tests here are mostly about demonstrating what actually happens on a variety of types; the exact details don't matter that much.

Reviewing the Rust code is enough to know that LLVM will swap it. Here, for example, what we're trying to see is that it's not just a huge row of movb instructions like you can see in https://rust.godbolt.org/z/MKfxn1Tjr

@@ -52,39 +49,61 @@ pub fn swap_slice(x: &mut [KeccakBuffer], y: &mut [KeccakBuffer]) {
}
}

// But for a large align-1 type, vectorized byte copying is what we want.

type OneKilobyteBuffer = [u8; 1024];

// CHECK-LABEL: @swap_1kb_slices
#[no_mangle]
pub fn swap_1kb_slices(x: &mut [OneKilobyteBuffer], y: &mut [OneKilobyteBuffer]) {
// CHECK-NOT: alloca
Member Author

Similarly, the goal here isn't to check the exact codegen, but two main things:

  1. it's not just working by copying the big buffer to the stack and back out
  2. it's doing some kind of vectorization, so it's loading things smarter than just byte-by-byte.
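For readers who want to see the two shapes being contrasted in the first point, here is a small sketch. The function bodies are assumptions for illustration (the diff excerpt above does not show the body of `swap_1kb_slices`), and the names `swap_via_temporary` and `swap_slices_in_place` are hypothetical. The first function moves a whole buffer through a local temporary, which is the kind of pattern a `CHECK-NOT: alloca` line would be guarding against if it survived optimization; the second uses `slice::swap_with_slice`, which swaps the two slices element for element.

```rust
type OneKilobyteBuffer = [u8; 1024];

/// Illustrative only: swaps through an explicit 1 KiB temporary.
pub fn swap_via_temporary(x: &mut OneKilobyteBuffer, y: &mut OneKilobyteBuffer) {
    let tmp = *x;
    *x = *y;
    *y = tmp;
}

/// Illustrative only: swaps two equal-length slices of buffers in place.
/// `swap_with_slice` panics on a length mismatch, hence the guard.
pub fn swap_slices_in_place(x: &mut [OneKilobyteBuffer], y: &mut [OneKilobyteBuffer]) {
    if x.len() == y.len() {
        x.swap_with_slice(y);
    }
}
```

Both produce the same result; whether the generated code differs depends on the compiler version and target, which is what the codegen test is there to observe.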

Member

Doesn't that goal deserve an inline comment?

Member Author

Updated the comment block for the test file.

@scottmcm scottmcm force-pushed the redo-swap branch 2 times, most recently from c1b9092 to 9d37b4b Compare March 1, 2025 05:57
@bors
Contributor

bors commented Mar 1, 2025

☔ The latest upstream changes (presumably #137848) made this pull request unmergeable. Please resolve the merge conflicts.

@bors
Contributor

bors commented Mar 7, 2025

☔ The latest upstream changes (presumably #138155) made this pull request unmergeable. Please resolve the merge conflicts.
