Skip to content

perf: optimize sample_floyd by unsafe APIs #1622

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

Unparalleled-Calvin
Copy link

@Unparalleled-Calvin Unparalleled-Calvin commented Apr 3, 2025

  • Added a CHANGELOG.md entry

Summary

This PR uses unsafe APIs to boost performance of sample_floyd. The optimization is totally safe because the index is bounded by the length of the vec.

Motivation

Rust's bounds checking are sometimes unnecessary. Removing bounds checking by unsafe APIs can boost its performance.This optimization makes related functions more faster with safety ensured.

Details

The benchmark results from my environment is listed as below.

seq_slice_choose_multiple_1_of_1000
                        time:   [17.377 ns 17.480 ns 17.597 ns]
                        change: [-5.6211% -4.8379% -4.0867%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  5 (5.00%) high mild

seq_slice_choose_multiple_10_of_100
                        time:   [53.166 ns 53.654 ns 54.192 ns]
                        change: [-7.2861% -6.4089% -5.4998%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  5 (5.00%) high mild
  1 (1.00%) high severe

@dhardy
Copy link
Member

dhardy commented Apr 5, 2025

Thanks for the PR.

My main concern here is simply: should we be adding more unsafe code for ~5% perf gains?

CC @RalfJung

@Unparalleled-Calvin
Copy link
Author

Thanks for considering!
The unsafe block is minimized with clear // safety comments and encapsulated by the safe sample_floyd function. And tests/MIRI can confirm no UB. So I think it is ok to use unsafe code and gain more performance.

Copy link
Member

@dhardy dhardy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two unsafe operations here; I'd like to see the perf impact of each.

Comment on lines +457 to +459
if let Some(pos) = indices.iter().position(|&x| x == t) {
*indices.get_unchecked_mut(pos) = j;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to use an unsafe function here at all. Try re-writing the iterator with a simple for loop:

for pos in 0..indices.len() {
    if indices[pos] == t {
        indices[pos] = j;
        break;
    }
}

Sure, that uses index operations but the compiler should be able to remove the range checks. (I'm a little surprised if it can't here.)

Copy link
Author

@Unparalleled-Calvin Unparalleled-Calvin Apr 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Emm, I try to rewrite this part with for loop. The benchmark result shows that it has similar performance to the original one.. It seems that the part optimized by unsafe APIs can not be achieved by the compiler easily.

Comment on lines +460 to +462
ptr::write(ptr.add(len), t);
len += 1;
indices.set_len(len);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this actually any faster than push? I suppose it may be (eliminating the capacity check).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes for sure :)

@RalfJung
Copy link
Contributor

RalfJung commented Apr 7, 2025

Thanks for the PR.

My main concern here is simply: should we be adding more unsafe code for ~5% perf gains?

CC @RalfJung

Not sure what exactly you want my input on here. :) Happy to consult on whether some use of unsafe is sound or not, but that doesn't seem to be the question here? As to whether you think the bit of unsafe is worth the perf gain -- that's a maintainer decision. There's absolutely cases where the perf gain is important enough to justify a bit of unsafe and there are other cases where it's not worth it. I don't have to maintain this code going forward so I can't make this decision for you. :)

And tests/MIRI can confirm no UB.

Of course, testing != verification, so there could still be UB in edge cases not covered by the tests.

@Unparalleled-Calvin
Copy link
Author

Unparalleled-Calvin commented Apr 12, 2025

Thank you for your review! Here are the benchmark results of using the unsafe functions.

Only use *indices.get_unchecked_mut(pos) = j; [Compared to the original version]

seq_slice_choose_multiple_1_of_1000
                        time:   [19.089 ns 19.212 ns 19.346 ns]
                        change: [-0.8332% +0.7077% +2.5236%] (p = 0.46 > 0.05)
                        No change in performance detected.

seq_slice_choose_multiple_10_of_100
                        time:   [58.523 ns 58.777 ns 59.060 ns]
                        change: [-6.5131% -5.2792% -4.0724%] (p = 0.00 < 0.05)
                        Performance has improved.

Additionally use ptr::write instead of push [compared to above]

seq_slice_choose_multiple_1_of_1000
                        time:   [18.896 ns 19.158 ns 19.446 ns]
                        change: [-5.8287% -3.9616% -2.4368%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)

seq_slice_choose_multiple_10_of_100
                        time:   [56.478 ns 56.843 ns 57.254 ns]
                        change: [-4.5315% -3.8087% -3.0675%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)

From my perspective, the elimination of bounds checking in [] operation contributes more optimization in this PR. And the rewriting of push also brings effect.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants