@veecore veecore commented Oct 29, 2025

This PR introduces BufferedIoRead, a new high-performance Read implementation that significantly speeds up deserialization from std::io::Read sources.

The Problem

The current IoRead implementation is a simple byte-by-byte iterator over an io::Read source. Because it operates one byte at a time, it has two major performance drawbacks:

  1. Per-Byte Bookkeeping: It must check for IO errors and update position tracking (line/column) for every single byte processed.
  2. Inability to Optimize: It cannot use SliceRead's powerful, memchr-based optimizations (like skip_to_escape) because it doesn't have a slice to operate on.

This remains true even when a user manually wraps their reader in a std::io::BufReader. IoRead is unaware of the underlying buffer and cannot take advantage of it, so the per-byte overhead remains.
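The cost of per-byte bookkeeping can be illustrated with a standalone sketch (plain std, not serde_json internals): the first function checks an io::Result and updates tracking state for every single byte, the way IoRead does, while the second operates on whole buffered chunks so that error handling and bookkeeping run once per refill.

```rust
use std::io::{BufRead, BufReader, Read};

// Per-byte approach: an io::Result check and bookkeeping update
// for every single byte, mirroring IoRead's overhead.
fn count_lines_per_byte<R: Read>(reader: R) -> std::io::Result<usize> {
    let mut lines = 0;
    for byte in reader.bytes() {
        if byte? == b'\n' {
            lines += 1;
        }
    }
    Ok(lines)
}

// Chunk approach: scan a whole buffered slice at a time, so error
// checks and bookkeeping run once per refill, not once per byte.
fn count_lines_chunked<R: Read>(reader: R) -> std::io::Result<usize> {
    let mut reader = BufReader::new(reader);
    let mut lines = 0;
    loop {
        let chunk = reader.fill_buf()?;
        if chunk.is_empty() {
            break;
        }
        lines += chunk.iter().filter(|&&b| b == b'\n').count();
        let len = chunk.len();
        reader.consume(len);
    }
    Ok(lines)
}

fn main() {
    let data: &[u8] = b"{\"a\": 1}\n{\"b\": 2}\n";
    assert_eq!(count_lines_per_byte(data).unwrap(), 2);
    assert_eq!(count_lines_chunked(data).unwrap(), 2);
}
```

Both functions compute the same answer; the chunked version simply pays the bookkeeping cost far less often, which is the gap this PR targets.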

The Solution: BufferedIoRead

This new implementation, BufferedIoRead<R, B>, wraps an io::Read source and uses an internal buffer (generic over AsMut<[u8]>).

Its core optimization is simple but powerful: it creates a temporary SliceRead over its internal buffer.

This allows the deserializer to use the hyper-optimized SliceRead paths (like skip_to_escape) for large chunks of data at a time. The per-byte bookkeeping logic is now deferred and runs only once per buffer refill, rather than once per byte.

The implementation intelligently handles all parsing logic (including strings, escape sequences, and raw_value) across buffer boundaries, ensuring correctness while maximizing performance.
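A rough sketch of the core idea (the names here are hypothetical, not the PR's actual internals): keep a fixed buffer, refill it from the reader, and hand the filled portion to slice-oriented scanning code, deferring position bookkeeping to the refill boundary.

```rust
use std::io::Read;

/// Hypothetical sketch of the refill-and-scan pattern. The real
/// BufferedIoRead in the PR is more involved (escape sequences,
/// raw_value, tokens spanning buffer boundaries).
struct ChunkScanner<R> {
    reader: R,
    buf: Vec<u8>,
    filled: usize, // number of valid bytes in `buf`
    line: usize,   // bookkeeping, updated once per refill
}

impl<R: Read> ChunkScanner<R> {
    fn new(reader: R, capacity: usize) -> Self {
        ChunkScanner { reader, buf: vec![0; capacity], filled: 0, line: 1 }
    }

    /// Refill the buffer and return the filled slice. Bookkeeping
    /// (here, line counting) happens once over the whole chunk, so a
    /// SliceRead-style fast path can then scan it with memchr-like code.
    fn refill(&mut self) -> std::io::Result<&[u8]> {
        self.filled = self.reader.read(&mut self.buf)?;
        let chunk = &self.buf[..self.filled];
        self.line += chunk.iter().filter(|&&b| b == b'\n').count();
        Ok(chunk)
    }
}

fn main() -> std::io::Result<()> {
    let mut scanner = ChunkScanner::new(&b"[1,\n2,\n3]"[..], 4);
    let mut total = 0;
    loop {
        let chunk = scanner.refill()?;
        if chunk.is_empty() {
            break;
        }
        total += chunk.len();
        // ...a temporary SliceRead over `chunk` would parse here...
    }
    assert_eq!(total, 9);
    assert_eq!(scanner.line, 3);
    Ok(())
}
```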

Performance Benchmarks

The results speak for themselves. Benchmarking against the canada.json (2.2MB) file shows a ~28.6% speedup for streaming deserialization compared to the current from_reader + BufReader approach.

BufferedIoRead closes a significant portion of the gap between streaming parsing (from_reader) and non-streaming parsing (from_slice).

| Method | Time (canada.json, 2.2MB) | Throughput | Notes |
|---|---|---|---|
| `from_slice` | ~65.0 ms | ~33.0 MiB/s | Theoretical ideal (baseline) |
| `read_to_end_then_slice` | ~68.9 ms | ~31.1 MiB/s | Not a streaming solution (loads all to RAM) |
| `from_reader` (std `BufReader` 8k) | ~112.2 ms | ~19.1 MiB/s | Current streaming perf |
| `BufferedIoRead` (ours, 8k buffer) | ~80.0 ms | ~26.8 MiB/s | New streaming perf (~28.6% faster) |
| `BufferedIoRead` (ours, 16k buffer) | ~79.5 ms | ~27.0 MiB/s | Slightly faster with a larger buffer |

Testing

Benchmark Methodology


use criterion::{black_box, criterion_group, criterion_main, BenchmarkId, Criterion, Throughput};
use serde::Deserialize;
use serde_json::{
    de::{BufferedIoRead, Deserializer, IoRead},
    value::Value,
};
use std::io::{self, BufReader, Read};

// Load the 2.2MB canada.json file.
const DATA: &[u8] = include_bytes!("canada.json");

/// Construct a Deserializer over our BufferedIoRead.
/// The buffer is moved into the reader, so it must be
/// re-created for every benchmark iteration.
fn from_reader_buffered<'de, R, B>(reader: R, buffer: B) -> Deserializer<BufferedIoRead<R, B>>
where
    R: io::Read,
    B: AsMut<[u8]> + AsRef<[u8]> + 'de,
{
    Deserializer::new(BufferedIoRead::new(reader, buffer))
}

fn benchmark_group(c: &mut Criterion) {
    let mut group = c.benchmark_group("canada.json (2.2MB)");
    group.throughput(Throughput::Bytes(DATA.len() as u64));

    // ----- Benchmark 1: The "Goal" (fastest possible) -----
    // This is `from_slice`, which is our ideal baseline.
    group.bench_function("from_slice", |b| {
        b.iter(|| {
            let val: Value = serde_json::from_slice(DATA).unwrap();
            black_box(val);
        })
    });

    // ----- Benchmark 2: "Collect all, then parse" -----
    // This is the common "buffer everything to a Vec" strategy.
    // This *includes* the I/O cost of reading to the Vec.
    group.bench_function("read_to_end_then_slice", |b| {
        b.iter(|| {
            // We use `DATA` as the "reader"
            let mut reader = DATA;
            let mut vec = Vec::with_capacity(DATA.len());
            reader.read_to_end(&mut vec).unwrap();

            let val: Value = serde_json::from_slice(&vec).unwrap();
            black_box(val);
        })
    });

    // ----- Benchmark 3: The "Status Quo" -----
    // This is `from_reader` with the standard library's `BufReader`.
    // This highlights the per-byte bookkeeping cost of `IoRead`.
    group.bench_function("from_reader (std BufReader 8k)", |b| {
        b.iter(|| {
            // `DATA` acts as the underlying `Read`
            let reader = BufReader::new(DATA);
            let mut de = Deserializer::new(IoRead::new(reader));
            let val: Value = Value::deserialize(&mut de).unwrap();
            black_box(val);
        })
    });

    // ----- Benchmark 4: Our `BufferedIoRead` -----
    // We parameterize this to test different internal buffer sizes.
    // The *IO* buffer is irrelevant since `DATA` is in memory,
    // but the *internal* buffer size is critical.
    let buffer_sizes = [
        128,   // Default
        1024,  // 1 KiB
        8192,  // 8 KiB (matches std BufReader)
        16384, // 16 KiB
    ];

    for size in buffer_sizes.iter() {
        group.bench_with_input(
            BenchmarkId::new("from_reader_buffered (ours)", *size),
            size,
            |b, &size| {
                b.iter(|| {
                    // We must create a new buffer *inside* the iter
                    // because `BufferedIoRead` consumes it.
                    let buffer: Vec<u8> = vec![0; size];

                    let mut de = from_reader_buffered(DATA, buffer);
                    let val: Value = Value::deserialize(&mut de).unwrap();
                    black_box(val);
                });
            },
        );
    }

    group.finish();
}

criterion_group!(benches, benchmark_group);
criterion_main!(benches);

Correctness Tests

This PR includes extensive new integration tests (tests/buffered_io.rs) that "torture test" the buffer boundary logic. These tests use a custom SlowReader to force buffer refills at critical parsing points (e.g., in the middle of a string, during a \u escape sequence, and while parsing a RawValue) to ensure correctness.
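The PR's SlowReader is not reproduced here, but a minimal version of the idea (hypothetical code, not the exact test helper) is a reader that yields at most one byte per read call, so every byte boundary becomes a buffer-refill boundary for the code under test:

```rust
use std::io::{self, Read};

/// Hypothetical minimal version of the test helper: yields at most
/// one byte per `read` call, forcing the buffered reader under test
/// to hit a refill at every position in the input.
struct SlowReader<'a> {
    data: &'a [u8],
    pos: usize,
}

impl<'a> Read for SlowReader<'a> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        if self.pos >= self.data.len() || buf.is_empty() {
            return Ok(0);
        }
        buf[0] = self.data[self.pos];
        self.pos += 1;
        Ok(1)
    }
}

fn main() -> io::Result<()> {
    // A string with an escape and a \u sequence: the kind of input
    // the boundary tests force refills in the middle of.
    let json = br#""hello \n \u00e9 world""#;
    let mut out = Vec::new();
    SlowReader { data: json, pos: 0 }.read_to_end(&mut out)?;
    assert_eq!(out.as_slice(), &json[..]);
    Ok(())
}
```

Driving the deserializer through such a reader exercises every refill path; if parsing still round-trips correctly, the boundary logic holds.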

@veecore force-pushed the perf-buffered-io-read branch from 767dbba to fdfea72 on October 29, 2025 02:25