Perf: Add BufferedIoRead for ~28.6% faster from_reader parsing #1294
+742 −3
This PR introduces a new, high-performance `Read` implementation, `BufferedIoRead`, to significantly speed up deserialization from `std::io::Read` sources.

## The Problem
The current `IoRead` implementation is a simple byte-by-byte iterator over an `io::Read` source. Because it operates one byte at a time, it has two major performance drawbacks:

1. Every byte pays the per-byte bookkeeping and dispatch overhead of the iterator path.
2. It cannot use `SliceRead`'s powerful, `memchr`-based optimizations (like `skip_to_escape`), because it doesn't have a slice to operate on.

This remains true even when a user manually wraps their reader in a `std::io::BufReader`: `IoRead` is unaware of the underlying buffer and cannot take advantage of it, so the per-byte overhead remains.
## The Solution: `BufferedIoRead`

The new implementation, `BufferedIoRead<R, B>`, wraps an `io::Read` source and uses an internal buffer (generic over `AsMut<[u8]>`).

Its core optimization is simple but powerful: it creates a temporary `SliceRead` over its internal buffer. This allows the deserializer to use the hyper-optimized `SliceRead` paths (like `skip_to_escape`) on large chunks of data at a time; the per-byte bookkeeping logic is deferred and runs only once per buffer refill, rather than once per byte.

The implementation handles all parsing logic (including strings, escape sequences, and `raw_value`) across buffer boundaries, ensuring correctness while maximizing performance.
## Performance Benchmarks

Benchmarking against the `canada.json` (2.2 MB) file shows a ~28.6% speedup for streaming deserialization compared to the current `from_reader` + `BufReader` approach. `BufferedIoRead` closes a significant portion of the gap between streaming parsing (`from_reader`) and non-streaming parsing (`from_slice`).

Benchmarked configurations:

- `from_slice`
- `read_to_end_then_slice`
- `from_reader` (std `BufReader`, 8k)
- `BufferedIoRead` (ours, 8k buffer)
- `BufferedIoRead` (ours, 16k buffer)
## Testing

### Benchmark Methodology

### Correctness Tests
This PR includes extensive new integration tests (`tests/buffered_io.rs`) that torture-test the buffer boundary logic. These tests use a custom `SlowReader` to force buffer refills at critical parsing points (e.g., in the middle of a string, during a `\u` escape sequence, and while parsing a `RawValue`) to ensure correctness.