-
-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Vectorize base64 decoding #27
Comments
@k0ekk0ek Are spaces allowed within the codes? That's a big limiting factor pre-AVX-512. |
Sort-of, but I don't think they're an issue(?) Spaces (among others) are delimiters within zone files so we get a separate token for each block of characters (we require this data to be specified as contiguous, so no spaces inside the token). However, we will need to track state because there may be multiple tokens (at least for RRSIG (presentation format)). An example of how this occurs in zone data:
Looking at |
This is an important problem. Node.js has a related issue. I want to work on solving this issue as soon as I can. |
Sounds good @lemire! The client9 decoders are probably good to have as fallback(?) At least the base64 one got adopted and modified by @aklomp. The same can be done for base16 (hex). At least, I am planning to do so. The stream decoding interface fits simdzone nicely. Something I noticed in the modified version is that whole blocks need to be written. So the base64 AVX2 version requires at least 45 bytes of input (8 bytes for the fallback) to ensure proper output. Since simdzone guarantees enough padding in input and output buffers, we can perhaps let that requirement go. That may make things speedier/easier(?) Of course, this may or may not apply to Node.js. I'm currently focused on getting the SVCB RRTYPE done, after that testing and api cleanup to get to a first release, but if you need me to look at something, I'd be happy to help! |
The main challenge I am concerned with are the spaces in the input… The overflows can be handled in some way but I agree that simdzone can benefit of faster routines than other systems. |
I am not sure how much of a difference this makes. It is an empirical issue and it is easy to implement both padded and strict versions. |
I/we'd have to test. Adding the faster base64 decoder made a big difference for the |
I'd have to study the library a bit better, but the loop currently selects for rounds 8, 4, 2 and 1. As 8x4 = 32 and 4x4 = 16, we could simply add AVX2 and SSE versions there. |
I'll be working on this (starting 'now'). My current 'vision' is to make it configurable so one can explore the usage space. |
Sounds good 👍 |
So I think I solved the problem, although maybe not for simdzone directly. The way the problem appears in simdzone is somewhat non-standard and I wanted to work on a more generic problem. So I have worked on the WHATWG forgiving-base64 problem: you must decode base64 in a string but there might be arbitrary ASCII spaces all over the string. It is part of the latest release of simdutf. We have currently integrating it in Node.js with decent results: nodejs/node#52428 It passes all the tests in simdutf and in Node.js, so it is pretty solid and it is fast. Whether it applies to simdzone as is is trickier because simdzone has the pattern where the location of the white spaces are known before you start parsing. So maybe it is worth experimenting. |
Thanks for the update @lemire. Looked at the numbers, that's another impressive boost! (sorry for the overlong/overcomplete reply) Indeed, simdzone tokenizes based on whitespace (amongst other delimiters), so situation is a little different. The work may still be very useful. There are quite some RR TYPEs where the RDATA section contains base64 data. e.g.
However, the RDATA section may use parentheses, without nesting, anywhere. i.e. this is valid too:
Of course, no one ever writes/generates the latter, but it is something to consider. |
Base64 encoding is used quite a lot. i.e. in
RRSIG
andDNSKEY
records. Wojciech Muła and Daniel Lemire wrote the paper "Faster Base64 Encoding and Decoding Using AVX2 Instructions" in which they outline use of SIMD instructions to improve performance. The current implementation is very basic and improving it is extremely likely to result in a significant performance boost. More research is available too and even improving the current scalar implementation is a big gain.The text was updated successfully, but these errors were encountered: