Avx2 decode fix #31
@Nick-Nuon I simplified our benchmarks (so that they run faster) and I have added a table to our README. It is not bad.
Thanks for the help and for pointing these out: it didn't occur to me I might still have bugs.
I get similar results on my rig and otherwise I couldn't find any bugbears so I'm cool with merging. :)
To be fair, they were minor bugs. Everything else was fine.
Merging.
This is basically PR #30 with various minor fixes.
@Nick-Nuon you had two bugs:
- Instead of `Vector256<sbyte> chkVector = Avx2.AddSaturate(Avx2.Shuffle(checkValues.AsByte(), checkHash).AsSByte(), src.AsSByte());`, you had `Vector256<sbyte> chkVector = Avx2.AddSaturate(Avx2.Shuffle(checkValues.AsByte(), checkHash).AsByte(), src.AsByte());`.

The AVX results should be at least 3 times faster, but this PR helps a great deal because we no longer fall back to the scalar path.
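
To make the `AsSByte()` versus `AsByte()` distinction concrete, here is a minimal scalar sketch of what a saturating add does per lane in each interpretation. `Avx2.AddSaturate` has separate signed and unsigned byte overloads, so the reinterpretation applied to the operands selects which one runs. The class and method names below are hypothetical and only emulate one lane; this is not the decoder's code.

```csharp
using System;

public static class SaturatingAddDemo
{
    // Hypothetical helper: one lane of a signed saturating add,
    // clamped to the sbyte range [-128, 127].
    public static sbyte AddSaturateSigned(sbyte a, sbyte b) =>
        (sbyte)Math.Clamp(a + b, sbyte.MinValue, sbyte.MaxValue);

    // Hypothetical helper: one lane of an unsigned saturating add,
    // clamped to the byte range [0, 255].
    public static byte AddSaturateUnsigned(byte a, byte b) =>
        (byte)Math.Clamp(a + b, byte.MinValue, byte.MaxValue);

    public static void Main()
    {
        // The same two bit patterns, read as signed or as unsigned.
        byte x = 0x80, y = 0x90;

        sbyte s = AddSaturateSigned(unchecked((sbyte)x), unchecked((sbyte)y));
        byte u = AddSaturateUnsigned(x, y);

        // Prints "signed: 0x80, unsigned: 0xFF".
        Console.WriteLine($"signed: 0x{unchecked((byte)s):X2}, unsigned: 0x{u:X2}");
    }
}
```

The same input bytes yield different output bytes depending on the interpretation, which is why a one-word change in the reinterpretation call can alter the result of the check.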
I am not sure I understand what our performance limitation is, but something is hurting us. Still, the results are not bad.
I invite you to run a benchmark.