Skip to content

Conversation

yawkat
Copy link
Member

@yawkat yawkat commented Feb 20, 2023

Currently, we use a non-blocking parser to transform input JSON to a JsonNode, then send it downstream, and finally convert it to the desired request type when the route arguments are bound. This patch delays parsing until the conversion to the route type. It consists of these parts:

  • A change to JsonContentProcessor. The JsonContentProcessor has to handle certain scenarios (application/x-json-stream, or our support for streaming a json array as a Publisher) where the input contains multiple json tokens, Instead of forwarding the input bytes to an async parser, they are first split into individual json values, that are then forwarded or parsed in bulk.
  • A new JsonCounter class. This class handles splitting the input into its individual json nodes. It is basically a limited JSON parser that counts braces. It is already very optimized, but it also has a special optimization when the content type is application/json, in which case it assumes only a single json value in the input (this is technically a breaking change, but breaks no tests). I also wrote fuzz tests for JsonCounter, I will submit them in a separate PR to https://github.com/micronaut-projects/micronaut-fuzzing .
  • A benchmark. It's not run by the build tools, but it is good to have around.
  • An option in NettyHttpServerConfiguration to re-enable eager parsing to JsonNode. JsonCounter is still used, so it's not 100% the old behavior, but it should be more compatible in some respects because it doesn't change the conversion logic anymore.
  • A new LazyJsonNode class. It contains a ByteBuffer with the actual unparsed bytes. It's a bit complicated because it contains a reference counted buffer that has to be released after conversion, but also sometimes multiple converters are called for the same LazyJsonNode (once to JsonNode, once to a user-defined type). To solve this, when there is a conversion to JsonNode, it keeps that JsonNode around (and releases the buffer). If there are future conversions to a specific type, it will use the JsonNode as the source instead.
  • A new JsonSyntaxException class. JsonMappers can throw this to signal a syntax error in the JSON, which will be reported differently than a data binding error.
  • Changes to JsonMapper API: The asynchronous parser is deprecated. There is a new readValue method that takes a ByteBuffer. The jackson implementation has an optimization for netty ByteBuffers.
  • New converters for LazyJsonNode. Also removed some old superfluous converters for JsonNode.
  • A change to RequestLifecycle to show JSON syntax error messages like we did previously, instead of the opaque 400 that we usually send for conversion errors.
  • Some test changes to reflect changes in error messages. Because we now parse the input in bulk, in some cases jackson can decorate errors with short snippets of the failing input data. Is this an issue?

There are a few potential incompatibilities with this change:

  • Non-standard JSON features when combined with JsonCounter (i.e. when streaming a JSON array, or when using application/x-json-stream) will break even when the JSON parser is configured to support them. e.g. if the user configured jackson to ignore comments, that may fail now.
  • When the input body is never bound, JSON syntax errors may not be caught.
  • JsonCounter only supports UTF-8, by design. This is permitted by the JSON standard, however Jackson also supports UTF-16 and UTF-32. imo this is a net benefit, to avoid potential parser differential vulnerabilities. I also made a jackson feature to match this, though JsonCounter is sufficient now: Add JsonFactory.Feature.CHARSET_DETECTION to disable charset detection (default to UTF-8) FasterXML/jackson-core#921
  • JsonMapper implementations that do not use JsonSyntaxException yet will have less verbose HTTP errors until they switch to JsonSyntaxException.

@github-actions
Copy link

❌ GraalVM CE CI 19 dev failed: https://ge.micronaut.io/s/7svqiea27dciy

@github-actions
Copy link

❌ Java CI failed: https://ge.micronaut.io/s/7stdqwoinjbsq

@yawkat
Copy link
Member Author

yawkat commented Feb 21, 2023

oh there is a simple solution: don't scan when the content type is application/json, because for that type, only a single root value is valid anyway.

@github-actions
Copy link

❌ GraalVM CE CI 19 dev failed:

@graemerocher
Copy link
Contributor

It isn't clear to me from your outline if there are benefits to this change.

@yawkat
Copy link
Member Author

yawkat commented Feb 21, 2023

@graemerocher pretty much just performance. but until it's faster in benchmarks, it stays draft.

@yawkat
Copy link
Member Author

yawkat commented Feb 21, 2023

and tbh if this turns out to be worth it with the application/json exception, then most of the optimizations in JsonCounter are probably not worth it anymore.

@github-actions
Copy link

❌ Java CI failed: https://ge.micronaut.io/s/vpp22bmohb2ko

@github-actions
Copy link

❌ GraalVM CE CI 19 dev failed:

@github-actions
Copy link

❌ GraalVM CE CI 19 dev failed:

@yawkat
Copy link
Member Author

yawkat commented Feb 22, 2023

Okay with this change, the results look a lot better:

image

Top is before this change, bottom is after this change.

The benchmark basically parses an array of strings. The green runs have 6 strings of varying length (length=6 for light green, length=1000 for normal green, length=100000 for dark green). The other two runs have the same string length 6, but different array lengths (1000 elements for blue, 100000 elements for pink).

As expected, this patch shows an improvement when there are many JSON tokens (the blue and pink cases). This is also imo the more realistic scenario. At this point, this PR is still slightly worse for extremely long strings, but only maybe 10%, and this is an extreme case (100000 bytes per string). I think this is acceptable to get a >30% speedup for the many-tokens case.

WDYT @graemerocher ?

@graemerocher
Copy link
Contributor

Looks good 👍

@yawkat yawkat changed the title Buffered parsing draft Buffered JSON parsing Feb 22, 2023
@yawkat yawkat marked this pull request as ready for review February 22, 2023 09:16
@github-actions
Copy link

❌ GraalVM CE CI 19 dev failed:

@github-actions
Copy link

❌ Java CI failed: https://ge.micronaut.io/s/mte7ykpbtddze

@github-actions
Copy link

❌ GraalVM CE CI 19 dev failed:

@github-actions
Copy link

❌ Java CI failed: https://ge.micronaut.io/s/ifia44jk6duim

@graemerocher
Copy link
Contributor

not compiling

# Conflicts:
#	buffer-netty/src/main/java/io/micronaut/buffer/netty/NettyFeature.java
@sonarqubecloud
Copy link

SonarCloud Quality Gate failed.    Quality Gate failed

Bug C 2 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 15 Code Smells

80.7% 80.7% Coverage
0.0% 0.0% Duplication

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants