Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vectorize base64 decoding #27

Open
k0ekk0ek opened this issue Feb 20, 2023 · 12 comments
Open

Vectorize base64 decoding #27

k0ekk0ek opened this issue Feb 20, 2023 · 12 comments
Labels
enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed
Milestone

Comments

@k0ekk0ek
Copy link
Contributor

k0ekk0ek commented Feb 20, 2023

Base64 encoding is used quite a lot. i.e. in RRSIG and DNSKEY records. Wojciech Muła and Daniel Lemire wrote the paper "Faster Base64 Encoding and Decoding Using AVX2 Instructions" in which they outline use of SIMD instructions to improve performance. The current implementation is very basic and improving it is extremely likely to result in a significant performance boost. More research is available too and even improving the current scalar implementation is a big gain.

The flamegraph for parsing an .se zone looks nice and balanced, except for base64 data as the parser spends about 30% of the time in parse_base64.

@k0ekk0ek k0ekk0ek added enhancement New feature or request good first issue Good for newcomers labels Feb 20, 2023
@k0ekk0ek k0ekk0ek added this to the Release 0.2.0 milestone Feb 20, 2023
@k0ekk0ek k0ekk0ek added the help wanted Extra attention is needed label Feb 20, 2023
@lemire
Copy link
Collaborator

lemire commented Jun 20, 2023

@k0ekk0ek Are spaces allowed within the codes? That's a big limiting factor pre-AVX-512.

@k0ekk0ek
Copy link
Contributor Author

Sort-of, but I don't think they're an issue(?) Spaces (among others) are delimiters within zone files so we get a separate token for each block of characters (we require this data to be specified as contiguous, so no spaces inside the token). However, we will need to track state because there may be multiple tokens (at least for RRSIG (presentation format)).

An example of how this occurs in zone data:

host.example.com. 86400 IN RRSIG A 5 3 86400 20030322173103 (
                               20030220173103 2642 example.com.
                               oJB1W6WNGv+ldvQ3WDG0MQkg5IEhjRip8WTr
                               PYGv07h108dUKGMeDPKijVCHX3DDKdfb+v6o
                               B9wfuh3DTJXUAfI/M0zmO/zz8bW0Rznl8O3t
                               GNazPwQKkRN20XPXV6nwwfoXmJQbsLNrLfkG
                               J5D6fwFm8nN+6pBzeDQfsS3Ap3o= )

Looking at .se they cut the signature up into blocks of 56 characters (without ( ... ), so one line), while .com seems to keep it as one set of contiguous characters.

@lemire
Copy link
Collaborator

lemire commented Nov 11, 2023

This is an important problem. Node.js has a related issue.

See
nodejs/performance#128

I want to work on solving this issue as soon as I can.

@k0ekk0ek
Copy link
Contributor Author

Sounds good @lemire! The client9 decoders are probably good to have as fallback(?) At least the base64 one got adopted and modified by @aklomp. The same can be done for base16 (hex). At least, I am planning to do so. The stream decoding interface fits simdzone nicely. Something I noticed in the modified version is that whole blocks need to be written. So the base64 AVX2 version requires at least 45 bytes of input (8 bytes for the fallback) to ensure proper output. Since simdzone guarantees enough padding in input and output buffers, we can perhaps let that requirement go. That may make things speedier/easier(?) Of course, this may or may not apply to Node.js. I'm currently focused on getting the SVCB RRTYPE done, after that testing and api cleanup to get to a first release, but if you need me to look at something, I'd be happy to help!

@lemire
Copy link
Collaborator

lemire commented Nov 12, 2023

The main challenge I am concerned with are the spaces in the input… The overflows can be handled in some way but I agree that simdzone can benefit of faster routines than other systems.

@lemire
Copy link
Collaborator

lemire commented Dec 13, 2023

Of course, this may or may not apply to Node.js

I am not sure how much of a difference this makes. It is an empirical issue and it is easy to implement both padded and strict versions.

@k0ekk0ek
Copy link
Contributor Author

I/we'd have to test. Adding the faster base64 decoder made a big difference for the .se zone. Going by that example, any TLD zone that has widespread DNSSEC usage (such as .nl) will benefit. A really simple test would be to replace consecutive dec_loop_generic_32_inner calls in dec_loop_generic_32 by one (or more) SIMD function replacements. Also note that we don't have to account for spaces in the input. That part is handled automatically because they're separate tokens (base64 is to be written as <contiguous> not <quoted>, except for svcb parameters, but there we simply do not allow spaces), so we can quite easily select the right decoder based on token length.

@k0ekk0ek
Copy link
Contributor Author

I'd have to study the library a bit better, but the loop currently selects for rounds 8, 4, 2 and 1. As 8x4 = 32 and 4x4 = 16, we could simply add AVX2 and SSE versions there.

@lemire
Copy link
Collaborator

lemire commented Dec 14, 2023

I'll be working on this (starting 'now'). My current 'vision' is to make it configurable so one can explore the usage space.

@k0ekk0ek
Copy link
Contributor Author

Sounds good 👍

@lemire
Copy link
Collaborator

lemire commented Apr 10, 2024

So I think I solved the problem, although maybe not for simdzone directly.

The way the problem appears in simdzone is somewhat non-standard and I wanted to work on a more generic problem. So I have worked on the WHATWG forgiving-base64 problem: you must decode base64 in a string but there might be arbitrary ASCII spaces all over the string. It is part of the latest release of simdutf.

We have currently integrating it in Node.js with decent results: nodejs/node#52428 It passes all the tests in simdutf and in Node.js, so it is pretty solid and it is fast.

Whether it applies to simdzone as is is trickier because simdzone has the pattern where the location of the white spaces are known before you start parsing.

So maybe it is worth experimenting.

@k0ekk0ek
Copy link
Contributor Author

k0ekk0ek commented Apr 11, 2024

Thanks for the update @lemire. Looked at the numbers, that's another impressive boost!

(sorry for the overlong/overcomplete reply)

Indeed, simdzone tokenizes based on whitespace (amongst other delimiters), so situation is a little different. The work may still be very useful. There are quite some RR TYPEs where the RDATA section contains base64 data. e.g. OPENPGPKEY, DNSKEY, RRSIG, etc. While it may be possible to pop all tokens of the tape and use the work to decode those sequences, simdzone does not read the entire file at once, but in blocks. Popping a token may lead to the parser flushing the buffer and proceding to read the next section. The scanner merely guarantees the complete token is available until the next one is read. Of course, it's entirely possible to alter tape operation for these types of sequences. Whether that's worth the effort is something to investigate. Another complexity is the possible uses of parentheses to wrap newlines. For example, this is a valid RRSIG RR (RFC4034 section 3.2):

host.example.com. 86400 IN RRSIG A 5 3 86400 20030322173103 (
                                  20030220173103 2642 example.com.
                                  oJB1W6WNGv+ldvQ3WDG0MQkg5IEhjRip8WTr
                                  PYGv07h108dUKGMeDPKijVCHX3DDKdfb+v6o
                                  B9wfuh3DTJXUAfI/M0zmO/zz8bW0Rznl8O3t
                                  GNazPwQKkRN20XPXV6nwwfoXmJQbsLNrLfkG
                                  J5D6fwFm8nN+6pBzeDQfsS3Ap3o= )

However, the RDATA section may use parentheses, without nesting, anywhere. i.e. this is valid too:

host.example.com. 86400 IN RRSIG A 5 3 86400 20030322173103 (
                                  20030220173103 2642 example.com.
                                  oJB1W6WNGv+ldvQ3WDG0MQkg5IEhjRip8WTr
                                  ) ( PYGv07h108dUKGMeDPKijVCHX3DDKdfb+v6o
                                  ) ( B9wfuh3DTJXUAfI/M0zmO/zz8bW0Rznl8O3t
                                  ) ( GNazPwQKkRN20XPXV6nwwfoXmJQbsLNrLfkG
                                  ) ( J5D6fwFm8nN+6pBzeDQfsS3Ap3o= )

Of course, no one ever writes/generates the latter, but it is something to consider.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

2 participants