VAES support #372
Comments
Note that AES-NI can already process more than one block at a time by leveraging Instruction Level Parallelism (ILP). We have separate benchmarks for serial That said, it would probably be good to add VAES support for microarchitectures where it does provide performance benefits beyond what's possible with ILP.
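To illustrate the ILP point, here is a rough sketch (not the `aes` crate's actual code) of eight independent blocks going through AES-128 rounds interleaved, next to a one-block serial path. The module and function names are hypothetical, and the "round keys" are placeholder values rather than a real key schedule. Whether the `AESENC` instructions actually overlap depends on the microarchitecture and on the code the compiler emits.

```rust
#[cfg(target_arch = "x86_64")]
mod aesni_sketch {
    use core::arch::x86_64::*;

    /// One block: every AESENC depends on the result of the previous one.
    #[target_feature(enable = "aes")]
    pub unsafe fn encrypt_serial(keys: &[__m128i; 11], block: __m128i) -> __m128i {
        let mut b = _mm_xor_si128(block, keys[0]);
        for k in &keys[1..10] {
            b = _mm_aesenc_si128(b, *k);
        }
        _mm_aesenclast_si128(b, keys[10])
    }

    /// Eight blocks, round by round: the eight AESENCs within a round are
    /// independent of each other, so a pipelined AES unit can overlap them.
    #[target_feature(enable = "aes")]
    pub unsafe fn encrypt_interleaved(keys: &[__m128i; 11], blocks: &mut [__m128i; 8]) {
        for b in blocks.iter_mut() {
            *b = _mm_xor_si128(*b, keys[0]);
        }
        for k in &keys[1..10] {
            for b in blocks.iter_mut() {
                *b = _mm_aesenc_si128(*b, *k);
            }
        }
        for b in blocks.iter_mut() {
            *b = _mm_aesenclast_si128(*b, keys[10]);
        }
    }

    /// Runs both paths on placeholder data and checks that they agree.
    #[target_feature(enable = "aes")]
    pub unsafe fn paths_agree() -> bool {
        // Placeholder round keys, NOT a real AES key schedule.
        let mut keys = [_mm_setzero_si128(); 11];
        for (i, k) in keys.iter_mut().enumerate() {
            *k = _mm_set1_epi32(0x0101_0101 * (i as i32 + 1));
        }
        let mut blocks = [_mm_setzero_si128(); 8];
        for (i, b) in blocks.iter_mut().enumerate() {
            *b = _mm_set1_epi32(i as i32);
        }
        let input = blocks;
        encrypt_interleaved(&keys, &mut blocks);
        for i in 0..8 {
            let expected: u128 = core::mem::transmute(encrypt_serial(&keys, input[i]));
            let actual: u128 = core::mem::transmute(blocks[i]);
            if expected != actual {
                return false;
            }
        }
        true
    }
}

/// Returns `None` when AES-NI is unavailable (or on non-x86_64 targets).
pub fn interleaved_matches_serial() -> Option<bool> {
    #[cfg(target_arch = "x86_64")]
    {
        if std::arch::is_x86_feature_detected!("aes") {
            // SAFETY: the `aes` feature was just detected at runtime.
            return Some(unsafe { aesni_sketch::paths_agree() });
        }
    }
    None
}
```

Both paths compute the same result; the interleaved one only changes the order in which independent instructions are issued. VAES would go a step further by packing several blocks into a single wider register.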
Absolutely, I am aware of that. Already using Those methods are the reason I looked inside wondering if it is also VAES-capable on top of instruction-level parallelism.
Unfortunately, I do not currently have access to a machine with AVX-512, so I cannot work on it myself. If someone does work on this, it could also be worthwhile to add VPCLMULQDQ support to the
It shouldn't be difficult to get a server VM with AVX-512 support; I can help with that if you'd like.
@newpavlov ...though perhaps we could be explicit about it.
Also, I have several servers with AVX-512 support that I can test on.
@tarcieri The only annoying part should be the plumbing in the autodetection module, but we can leave that for later, since VAES support will be gated either way (the relevant intrinsics are Nightly-only right now). I may draft a PR this weekend or next week, if you do not get to it before then.
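For reference, a minimal sketch of what runtime detection could look like using only the standard library's `is_x86_feature_detected!` macro. This is an assumption for illustration, not the crate's autodetection module: the `Backend` enum and `detect_backend` function are hypothetical names, and a real implementation may also want to check the vector width it intends to use alongside `vaes`.

```rust
/// Hypothetical backend selector, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    /// Vectorized AES (VAES) is available.
    Vaes,
    /// Plain AES-NI is available.
    AesNi,
    /// Neither was detected; fall back to a portable implementation.
    Soft,
}

/// Picks the widest backend the current CPU reports at runtime.
pub fn detect_backend() -> Backend {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if std::arch::is_x86_feature_detected!("vaes") {
            return Backend::Vaes;
        }
        if std::arch::is_x86_feature_detected!("aes") {
            return Backend::AesNi;
        }
    }
    Backend::Soft
}
```

Detection itself works on stable Rust; it is the intrinsics behind the VAES path that would need the gating mentioned above.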
I thought about using parallelism (VPCLMULQDQ can process four 128-bit blocks at once). But I forgot that GHASH/POLYVAL is inherently sequential, so we cannot utilize it while processing one text...
@newpavlov I opened a separate issue for VPCLMULQDQ here, I think that should be (potentially) fairly easy: RustCrypto/universal-hashes#184. POLYVAL/GHASH can be broken down into a parallelizable portion and a sequential portion... there's an accumulation of the output that is inherently sequential, but multiplication of the inputs can be performed in parallel.
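A small sketch of that split, assuming a plain software GF(2^128) multiply in "natural" bit order (bit i is the coefficient of x^i, reduced by x^128 + x^7 + x^2 + x + 1). This is not GHASH's or POLYVAL's exact bit convention and is not constant-time; the function names are hypothetical. It only shows that the serial Horner-style update and a four-block aggregated update with precomputed powers of H give the same result.

```rust
/// Shift-and-add multiplication in GF(2^128), natural bit order.
/// Illustrative only: not constant-time, not GHASH's bit ordering.
fn gf_mul(a: u128, b: u128) -> u128 {
    let (mut a, mut b, mut acc) = (a, b, 0u128);
    while b != 0 {
        if b & 1 == 1 {
            acc ^= a;
        }
        let carry = a >> 127;
        a <<= 1;
        if carry == 1 {
            a ^= 0x87; // x^128 = x^7 + x^2 + x + 1 (mod the field polynomial)
        }
        b >>= 1;
    }
    acc
}

/// Serial form: each step needs the previous accumulator value.
fn hash_serial(h: u128, blocks: &[u128]) -> u128 {
    blocks.iter().fold(0, |y, &x| gf_mul(y ^ x, h))
}

/// Aggregated form: within a group of four blocks the multiplications are
/// independent of each other; only the XOR into `y` between groups is sequential.
fn hash_aggregated(h: u128, blocks: &[u128]) -> u128 {
    let h2 = gf_mul(h, h);
    let h3 = gf_mul(h2, h);
    let h4 = gf_mul(h3, h);
    let mut y = 0u128;
    let mut chunks = blocks.chunks_exact(4);
    for c in chunks.by_ref() {
        let p0 = gf_mul(y ^ c[0], h4);
        let p1 = gf_mul(c[1], h3);
        let p2 = gf_mul(c[2], h2);
        let p3 = gf_mul(c[3], h);
        y = p0 ^ p1 ^ p2 ^ p3;
    }
    // Leftover blocks (fewer than four) fall back to the serial update.
    for &x in chunks.remainder() {
        y = gf_mul(y ^ x, h);
    }
    y
}
```

The four products in each group are the part a wide carry-less multiply could, in principle, compute side by side; the equivalence relies only on the field's distributive law, so it carries over to the real GHASH/POLYVAL conventions.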
Vectorized AES can process more than one block at a time, greatly improving throughput, but it doesn't appear to be used by the aes crate yet, which is unfortunate.