perf: optimize sample_floyd by unsafe APIs #1622
2bdea23 to 07d4e92
Thanks for the PR. My main concern here is simply: should we be adding more unsafe code? CC @RalfJung
Thanks for considering!
There are two unsafe operations here; I'd like to see the perf impact of each.
if let Some(pos) = indices.iter().position(|&x| x == t) {
    *indices.get_unchecked_mut(pos) = j;
}
I don't think we need to use an unsafe function here at all. Try rewriting the iterator with a simple for loop:
for pos in 0..indices.len() {
    if indices[pos] == t {
        indices[pos] = j;
        break;
    }
}
Sure, that uses index operations but the compiler should be able to remove the range checks. (I'm a little surprised if it can't here.)
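As a further safe variant (not one proposed in this thread), the replacement can be written without any index at all, so there is no bounds check left to eliminate. This is only a sketch: the helper name `replace_first` and the `u32` element type are assumptions made for illustration, not part of the PR.

```rust
/// Replaces the first element equal to `t` with `j`.
/// Returns `true` if a replacement happened.
/// (`replace_first` is a hypothetical helper name, not from the PR.)
fn replace_first(indices: &mut [u32], t: u32, j: u32) -> bool {
    // `iter_mut().find` yields a mutable reference to the matching slot,
    // so no second, index-based access is required.
    if let Some(slot) = indices.iter_mut().find(|x| **x == t) {
        *slot = j;
        true
    } else {
        false
    }
}
```

Whether this is faster than the indexed loop would need to be benchmarked; it only removes the indexing from the source, not necessarily from the generated code.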
I tried rewriting this part with a for loop. The benchmark shows that it performs about the same as the original code. It seems the compiler cannot easily achieve the optimization that the unsafe APIs provide.
ptr::write(ptr.add(len), t);
len += 1;
indices.set_len(len);
Is this actually any faster than push? I suppose it may be (eliminating the capacity check).
Yes for sure :)
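For readers unfamiliar with the pattern under discussion, the following sketch shows what writing past `len` and then calling `set_len` looks like when wrapped in a helper with an explicit safety contract. The names `push_unchecked` and `fill` are hypothetical and the `u32` element type is an assumption; the PR itself inlines the three statements quoted above.

```rust
/// Appends `t` without the capacity check that `Vec::push` performs.
/// (`push_unchecked` is a hypothetical helper, not a std or rand API.)
///
/// # Safety
/// The caller must guarantee `indices.len() < indices.capacity()`.
unsafe fn push_unchecked(indices: &mut Vec<u32>, t: u32) {
    let len = indices.len();
    // SAFETY: `len < capacity` by the caller's contract, so the slot at
    // offset `len` lies inside the allocation and may be written to.
    unsafe {
        std::ptr::write(indices.as_mut_ptr().add(len), t);
        indices.set_len(len + 1);
    }
}

/// Fills a vector with `0..amount`, reserving all capacity up front.
fn fill(amount: usize) -> Vec<u32> {
    let mut indices = Vec::with_capacity(amount);
    for t in 0..amount as u32 {
        // `with_capacity(amount)` guarantees room for `amount` elements.
        debug_assert!(indices.len() < indices.capacity());
        unsafe { push_unchecked(&mut indices, t) };
    }
    indices
}
```

The soundness argument rests entirely on the capacity reserved before the loop, which is why a `debug_assert!` on that invariant is a cheap safeguard.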
Not sure what exactly you want my input on here. :) Happy to consult on whether some use of unsafe is sound or not, but that doesn't seem to be the question here? As to whether you think the bit of unsafe is worth the perf gain -- that's a maintainer decision. There's absolutely cases where the perf gain is important enough to justify a bit of unsafe and there are other cases where it's not worth it. I don't have to maintain this code going forward so I can't make this decision for you. :)
Of course, testing != verification, so there could still be UB in edge cases not covered by the tests.
Thank you for your review! Here are the benchmark results of using the unsafe functions. Only use
Additionally use
From my perspective, the elimination of bounds checking in
CHANGELOG.md entry
Summary
This PR uses unsafe APIs to boost the performance of sample_floyd. The optimization is safe because every index is bounded by the length of the vec.
Motivation
Rust's bounds checks are sometimes unnecessary. Removing them with unsafe APIs can improve performance. This optimization makes the related functions faster while keeping them safe.
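For context, here is a minimal safe sketch of Floyd's sampling algorithm, reconstructed from the fragments quoted in the review comments. It is not the crate's actual source: the function name `sample_floyd_safe`, the `u32` index type, and the tiny `XorShift64` generator (a stand-in for a real RNG) are all assumptions for illustration.

```rust
/// Tiny xorshift generator, a hypothetical stand-in for a real RNG.
/// The seed must be non-zero.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Returns a value in `0..=upper` (slightly biased; fine for a sketch).
    fn below_inclusive(&mut self, upper: u32) -> u32 {
        (self.next_u64() % (upper as u64 + 1)) as u32
    }
}

/// Samples `amount` distinct indices from `0..length` with Floyd's algorithm.
fn sample_floyd_safe(rng: &mut XorShift64, length: u32, amount: u32) -> Vec<u32> {
    assert!(amount <= length);
    let mut indices = Vec::with_capacity(amount as usize);
    for j in (length - amount)..length {
        let t = rng.below_inclusive(j);
        // If `t` was already chosen, overwrite that occurrence with `j`,
        // which cannot be present yet; either way `t` is appended.
        if let Some(pos) = indices.iter().position(|&x| x == t) {
            indices[pos] = j; // bounds-checked access
        }
        indices.push(t); // capacity-checked append
    }
    indices
}
```

The two commented accesses are the ones the PR replaces with unchecked variants: `pos` comes from `position`, so it is always less than `indices.len()`, and the vector never grows beyond the `amount` slots reserved up front.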
Details
The benchmark results from my environment are listed below.