-
Notifications
You must be signed in to change notification settings - Fork 258
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sha1: implement collision detection and mitigation #566
sha1: implement collision detection and mitigation #566
Conversation
1077909
to
5088ffe
Compare
sha1/tests/mod.rs
Outdated
hex!("e1761773e6a35916d99f891b77663e6405313587"), | ||
) | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think an explicit test for the default values of checked::Config
, somewhere, would have helped me in understanding the collision_test!
macro, as it wasn't obvious to me that many of these flags that are turned off, are on by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update the api to use a builder pattern, which should make it clearer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately I am spoiled already, knowing what I know. However, since it's the 'checked' version of the hash, I think it's absolutely fair to assume that all checking features are enabled by default unless they are disabled explicitly. The builder pattern doesn't make the more discoverable in tests I think (but an assertion for the desired value of the fields would have made it clear).
Using a builder for that might be a matter of whether that pattern was used in this crate before or not. I'd personally lean towards fields in a struct mainly because they are well-known, won't change (it's all part of the spec implemented here), and are easy enough to use.
Benchmarks on my macbook (arm M1), using the defaults, so both detection and mitigation enabled.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Benchmarks on my macbook (arm M1), using the defaults, so both detection and mitigation enabled.
test sha1_10 ... bench: 12 ns/iter (+/- 0) = 833 MB/s test sha1_100 ... bench: 107 ns/iter (+/- 3) = 934 MB/s test sha1_1000 ... bench: 1,055 ns/iter (+/- 48) = 947 MB/s test sha1_10000 ... bench: 10,521 ns/iter (+/- 659) = 950 MB/s test sha1_collision_10 ... bench: 19 ns/iter (+/- 1) = 526 MB/s test sha1_collision_100 ... bench: 180 ns/iter (+/- 6) = 555 MB/s test sha1_collision_1000 ... bench: 1,660 ns/iter (+/- 55) = 602 MB/s test sha1_collision_10000 ... bench: 16,414 ns/iter (+/- 509) = 609 MB/s
Thanks a lot for adding a benchmark, great for a better feel for what to expect.
I think Git gets about 1.2GB/s, and gitoxide
is in the same ballpark when the ASM version of the implementation is used (which is great). Somehow, Git manages to do that with collision detection (probably), it's quite a bit magical.
However, it's without any doubt great to have the implementation and probably it also has to be the new default as correctness has to win, and those who can or want will get a switch to turn it off to get back to the previous (current) behaviour where collisions aren't checked for at all.
This will probably mean that a typical kernel pack, that decompresses to ~100GB, will need ~80s more CPU time. But probably I should actually test this with the version from this branch as other factors may matter as well, like maintaining the additional state that can start to accumulate if there is more than 1m streams hashed per second (but on 10 cores with one hasher per core).
I shall be back (but that really shouldn't hold up this PR at all).
sha1/tests/mod.rs
Outdated
hex!("e1761773e6a35916d99f891b77663e6405313587"), | ||
) | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately I am spoiled already, knowing what I know. However, since it's the 'checked' version of the hash, I think it's absolutely fair to assume that all checking features are enabled by default unless they are disabled explicitly. The builder pattern doesn't make the more discoverable in tests I think (but an assertion for the desired value of the fields would have made it clear).
Using a builder for that might be a matter of whether that pattern was used in this crate before or not. I'd personally lean towards fields in a struct mainly because they are well-known, won't change (it's all part of the spec implemented here), and are easy enough to use.
Here is my measurements, for informative purposes only, to demonstrate the pack resolution performance - an operation that happens for every clone or fetch. BaselineThis is comparing the assembly version with the one without assembly, with the default sha1 0.10.6 crate from crates.io.
With default features, ASM didn't really kick in apparently. Collision DetectionWith the
Comparison by hand
We are seeing a little less than 1/3rd of the ASM performance. SummaryHere I ran out of time as for some reason, when patching in the modified version of the crate alone (which required a clone and a version modification to match the one available locally), it triggered a lot of compile errors across the codebase that had nothing to do with it. Usually along the line of However, if the benchmarks above are an indication, we might be talking 609/950 (roughly 64%) reduction of this particular operation. I truly wonder how Git can be this fast, as its hash performance doesn't seem to need ASM and seemingly supports collision detection as well. Usability in
|
Do you have a link to that ASM code, would love to check out how they do it, after simplifying the code a bit, I think there is probably more that can be done, but adding an optimised version certainly can come later. |
This is quite surprising indeed, as the implementation here is based on the exact same code |
When running on my linux machine with an AMD that has Sha intrinsics
|
These can be activated by adding the |
Ah yeah, I know more about these than I want to actually 🤣 , on The regular ASM implementation could be ported to do detection I believe, as it does rounds manually (just checked). But the overall work is still around 1.5x at least, soo... |
There can only be one ;)! But in all seriousness, it's great to hear as it would remove one reason for incompatibility that was plaguing From my tests, I have filled them into the original comment, performance (with collision-checks) looks like close to 1/3rd of what it is with ASM, while the latest version, 0.11 with ASM feature detection, seems to not kick in ASM on my machine (M1) or with the But there is no way around biting that bullet, even though I will definitely put in an escape hatch for those who want to live like it's 2016😅. In any case, and in case it's not clear over my ramblings about performance: Absolutely stunning work, I am loving it! (it's just terrible to realize that a lot of the advertised performance came from not comparing apples to apples) In case it matters, I also had to make these unrelated changes because the Rust compiler didn't like it anymore when using the code in this PR. I have no explanation. commit 798b11a57977e353ecb53cd7b85f0525e74c15d0
Author: Sebastian Thiel <[email protected]>
Date: Fri Feb 23 19:00:41 2024 +0100
TBD
diff --git a/gix-object/src/commit/message/body.rs b/gix-object/src/commit/message/body.rs
index 0da6d1d14..6e84f3e2e 100644
--- a/gix-object/src/commit/message/body.rs
+++ b/gix-object/src/commit/message/body.rs
@@ -33,7 +33,7 @@ pub struct TrailerRef<'a> {
fn parse_single_line_trailer<'a, E: ParserError<&'a [u8]>>(i: &mut &'a [u8]) -> PResult<(&'a BStr, &'a BStr), E> {
*i = i.trim_end();
- let (token, value) = separated_pair(take_until(1.., b":".as_ref()), b": ", rest).parse_next(i)?;
+ let (token, value) = separated_pair(take_until(1.., &b":"[..]), b": ", rest).parse_next(i)?;
if token.trim_end().len() != token.len() || value.trim_start().len() != value.len() {
Err(winnow::error::ErrMode::from_error_kind(i, ErrorKind::Fail).cut())
diff --git a/gix-pack/src/data/input/bytes_to_entries.rs b/gix-pack/src/data/input/bytes_to_entries.rs
index 7450e9134..e089250eb 100644
--- a/gix-pack/src/data/input/bytes_to_entries.rs
+++ b/gix-pack/src/data/input/bytes_to_entries.rs
@@ -138,7 +138,7 @@ where
let crc32 = if self.compressed.crc32() {
let mut header_buf = [0u8; 12 + gix_hash::Kind::longest().len_in_bytes()];
- let header_len = entry.header.write_to(bytes_copied, &mut header_buf.as_mut())?;
+ let header_len = entry.header.write_to(bytes_copied, &mut header_buf.as_mut_slice())?;
let state = gix_features::hash::crc32_update(0, &header_buf[..header_len]);
Some(gix_features::hash::crc32_update(state, &compressed))
} else {
diff --git a/gix-pack/src/data/input/entry.rs b/gix-pack/src/data/input/entry.rs
index 7d3d9b3cb..cb5c2af80 100644
--- a/gix-pack/src/data/input/entry.rs
+++ b/gix-pack/src/data/input/entry.rs
@@ -33,7 +33,7 @@ impl input::Entry {
let mut header_buf = [0u8; 12 + gix_hash::Kind::longest().len_in_bytes()];
let header_len = self
.header
- .write_to(self.decompressed_size, &mut header_buf.as_mut())
+ .write_to(self.decompressed_size, &mut header_buf.as_mut_slice())
.expect("write to memory will not fail");
let state = gix_features::hash::crc32_update(0, &header_buf[..header_len]);
gix_features::hash::crc32_update(state, self.compressed.as_ref().expect("we always set it"))
diff --git a/gix-worktree/src/stack/state/mod.rs b/gix-worktree/src/stack/state/mod.rs
index 30e3c609f..07bffbfbc 100644
--- a/gix-worktree/src/stack/state/mod.rs
+++ b/gix-worktree/src/stack/state/mod.rs
@@ -96,7 +96,7 @@ impl State {
let a1_backing;
#[cfg(feature = "attributes")]
let a2_backing;
- let names = match self {
+ let names: &[(&bstr::BStr, Option<_>)] = match self {
State::IgnoreStack(ignore) => {
a1_backing = [(
ignore.exclude_file_name_for_directories.as_bytes().as_bstr(), |
Two questions, (1) do you happen to have performance numbers for the version |
Regarding 1): No, and it's not truly comparable as Git has a lot of extra overhead and it doesn't scale with cores at all. Maybe in single-core mode it can be somewhat comparable though, especially with a profiler to see where it spends time.
If that is normalized to a single core with So I think the 1.5x (better 1.4x) number for the ASM version seems to be what Git has, more or less, assuming all else equal. It's a respectable result that Git presents on a single core. Regarding 2) All I know is the source in Just to be sure it wasn't laced with too much fairy dust, I built e02ecfcc534e2021aae29077a958dd11c3897e4c of |
have you tried building with rustflags="target-cpu=native" |
That's a valid point! It doesn't make a difference for this branch, it's virtually unchanged. After updating to the most recent version in this branch I did notice a 5% speedup though. I think what I will end up with is to use v0.10 for the performance-toggle, and the checked version here for everything else. Also, I couldn't resist to make measurements on Git's hashing alone, using a 200MB file as input.
If that is extrapolated, we get to a whopping 1.5GB/s - 1.7GB/s. With a bigger file of 1326MB, I get:
Which puts it at around 1.64GB/s . And because I couldn't believe that a bunch of C in a file with a casual build can be this fast, I profiled the run with the 1.3GB file, and here is the answer to all of our questions. The entrypoint in Git seems to be here, which clearly talks about platform specific implementations. All the configuration for these seems to come from here. Maybe they deal with shattered, maybe they don't? I don't know how to validate that as unfortunately, That was an interesting excursion, I learned something. Maybe in the end, in order to be competitive, I'd need a |
Okay, that hash is not using collision detection, that is using the regular accelerated sha1 routines, which is why it can be so fast. I suspect what git is doing is that it only uses the hash collision version on very specific points, and not internally, to avoid the slow down. |
The regression comes from here: #542 the asm versions are actually fully removed :( I guess it really is time to port the assembly to |
I looked for 15 minutes and could only conclude that this seems to be a compile-time-only affair. Git doesn't always get compiled with its own implementation, but if enabled it would detect collisions and die when one was found. But I might well be wrong with that. It feels a bit like the mitigation was quickly made available, but over time was supplanted by faster implementations again. Git also doesn't have a choice for this beyond compile time it seems (or I can't find it), which makes me think that if, for some reason, the runtime switching doesn't work or reduces performance, I could also go back to a compile time switch and probably would reduce complexity that way. And I am saying this in case it would simplify things here as well to go compile-time only. |
This is really great work. Thank you both for the thoughtful back-and-forth, for documenting your sleuthing here, and for valuing correctness over speed. |
sha1/src/checked/ubc_check.rs
Outdated
//! Copyright 2017 Marc Stevens <[email protected]>, Dan Shumow <[email protected]> | ||
//! Distributed under the MIT Software License. | ||
//! See accompanying file LICENSE.txt or copy at | ||
//! https://opensource.org/licenses/MIT |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code is MIT only which slightly changes the licensing situation for this crate, which is otherwise Apache 2.0+MIT.
We could note that the code for the collision
feature is MIT-only in the toplevel README.md, or potentially ask the original authors if they would be okay with the resulting translation being dual licensed as Apache 2.0 in addition to MIT.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cr-marcstevens according to the license file you are the original license holder, would you be okay with us relicensing under apache 2.0 + MIT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also @shumow
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Friedel, Tony,
Thanks for your nice work making our code also available in rust!
And I'm certainly fine with dual licensing Apache 2.0+MIT (with the proper attribution and copyright notices).
But wouldn't the same situation also apply to the rust translation of our sha1.c
code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And I'm certainly fine with dual licensing Apache 2.0+MIT (with the proper attribution and copyright notices).
awesome, thank you!
But wouldn't the same situation also apply to the rust translation of our sha1.c code?
Yes very much, I will be doing a final pass of notes for attribution on the relevant files in a bit. I'll ping you when it is ready for review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added a detailed reference to src/checked.rs
and pointers to this and the original files in both checked/ubc_check.rs
and checked/compress.rs
. Please let me know if this is good for you
98e498e
to
d2afd9e
Compare
Hi! Is there anything left to get this merged? Sadly some of us are stuck with SHA-1 and this PR would make it a little bit less... explosive. Thanks for your time! |
Sorry, I forgot to return to this PR. I will merge it and release the crate shortly. |
This implements sha1 collision detection, including rehashing for mitigation.
As the code is 1-1 based on the version that git uses, the mitigation hashes should match as well.
Closes #204
Open TODOs
Limitations
Can only be used with the pure software implementation, asfaiu. The reason for this is, that the algorithm needs access to all intermediary states, and so processing 4 rounds at once through the intrinsics will screw things up.
For that reason I have made it it's own implementation, instead of adapting the existing
compress
implementations.It might be possible to add support for the "simpler" assembly implementations that do round for round processing, but I think this could be a follow up in the future, if this is too slow for these platforms.
Prior art