@veecore veecore commented Oct 29, 2025

This PR introduces BufferedIoRead, a new high-performance Read implementation that significantly speeds up deserialization from std::io::Read sources.

The Problem

The current IoRead implementation is a simple byte-by-byte iterator over an io::Read source. Because it operates one byte at a time, it has two major performance drawbacks:

  1. Per-Byte Bookkeeping: It must check for IO errors and update position tracking (line/column) for every single byte processed.
  2. Inability to Optimize: It cannot use SliceRead's powerful, memchr-based optimizations (like skip_to_escape) because it doesn't have a slice to operate on.

This remains true even when a user manually wraps their reader in a std::io::BufReader. IoRead is unaware of the underlying buffer and cannot take advantage of it, so the per-byte overhead remains.
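The cost of per-byte bookkeeping can be illustrated with a standalone sketch (plain std, not serde_json internals): the first function checks an io::Result and updates tracking state for every single byte, the way IoRead does, while the second operates on whole buffered chunks so that error handling and bookkeeping run once per refill.

```rust
use std::io::{BufRead, BufReader, Read};

// Per-byte approach: an io::Result check and bookkeeping update
// for every single byte, mirroring IoRead's overhead.
fn count_lines_per_byte<R: Read>(reader: R) -> std::io::Result<usize> {
    let mut lines = 0;
    for byte in reader.bytes() {
        if byte? == b'\n' {
            lines += 1;
        }
    }
    Ok(lines)
}

// Chunk approach: scan a whole buffered slice at a time, so error
// checks and bookkeeping run once per refill, not once per byte.
fn count_lines_chunked<R: Read>(reader: R) -> std::io::Result<usize> {
    let mut reader = BufReader::new(reader);
    let mut lines = 0;
    loop {
        let chunk = reader.fill_buf()?;
        if chunk.is_empty() {
            break;
        }
        lines += chunk.iter().filter(|&&b| b == b'\n').count();
        let len = chunk.len();
        reader.consume(len);
    }
    Ok(lines)
}

fn main() {
    let data: &[u8] = b"{\"a\": 1}\n{\"b\": 2}\n";
    assert_eq!(count_lines_per_byte(data).unwrap(), 2);
    assert_eq!(count_lines_chunked(data).unwrap(), 2);
}
```

Both functions compute the same answer; the chunked version simply pays the bookkeeping cost far less often, which is the gap this PR targets.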

The Solution: BufferedIoRead

This new implementation, BufferedIoRead<R, B>, wraps an io::Read source and uses an internal buffer (generic over AsMut<[u8]>).

Its core optimization is simple but powerful: it creates a temporary SliceRead over its internal buffer.

This allows the deserializer to use the hyper-optimized SliceRead paths (like skip_to_escape) for large chunks of data at a time. The per-byte bookkeeping logic is now deferred and runs only once per buffer refill, rather than once per byte.

The implementation intelligently handles all parsing logic (including strings, escape sequences, and raw_value) across buffer boundaries, ensuring correctness while maximizing performance.
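A rough sketch of the core idea (the names here are hypothetical, not the PR's actual internals): keep a fixed buffer, refill it from the reader, and hand the filled portion to slice-oriented scanning code, deferring position bookkeeping to the refill boundary.

```rust
use std::io::Read;

/// Hypothetical sketch of the refill-and-scan pattern. The real
/// BufferedIoRead in the PR is more involved (escape sequences,
/// raw_value, tokens spanning buffer boundaries).
struct ChunkScanner<R> {
    reader: R,
    buf: Vec<u8>,
    filled: usize, // number of valid bytes in `buf`
    line: usize,   // bookkeeping, updated once per refill
}

impl<R: Read> ChunkScanner<R> {
    fn new(reader: R, capacity: usize) -> Self {
        ChunkScanner { reader, buf: vec![0; capacity], filled: 0, line: 1 }
    }

    /// Refill the buffer and return the filled slice. Bookkeeping
    /// (here, line counting) happens once over the whole chunk, so a
    /// SliceRead-style fast path can then scan it with memchr-like code.
    fn refill(&mut self) -> std::io::Result<&[u8]> {
        self.filled = self.reader.read(&mut self.buf)?;
        let chunk = &self.buf[..self.filled];
        self.line += chunk.iter().filter(|&&b| b == b'\n').count();
        Ok(chunk)
    }
}

fn main() -> std::io::Result<()> {
    let mut scanner = ChunkScanner::new(&b"[1,\n2,\n3]"[..], 4);
    let mut total = 0;
    loop {
        let chunk = scanner.refill()?;
        if chunk.is_empty() {
            break;
        }
        total += chunk.len();
        // ...a temporary SliceRead over `chunk` would parse here...
    }
    assert_eq!(total, 9);
    assert_eq!(scanner.line, 3);
    Ok(())
}
```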

Performance Benchmarks

The results speak for themselves. Benchmarking against the canada.json (2.2MB) file shows a ~28.6% speedup for streaming deserialization compared to the current from_reader + BufReader approach.

BufferedIoRead closes a significant portion of the gap between streaming parsing (from_reader) and non-streaming parsing (from_slice).

| Method | Time (canada.json, 2.2MB) | Throughput | Notes |
|---|---|---|---|
| `from_slice` | ~65.0 ms | ~33.0 MiB/s | Theoretical ideal (baseline) |
| `read_to_end_then_slice` | ~68.9 ms | ~31.1 MiB/s | Not a streaming solution (loads all to RAM) |
| `from_reader` (std `BufReader` 8k) | ~112.2 ms | ~19.1 MiB/s | Current streaming perf |
| `BufferedIoRead` (ours, 8k buffer) | ~80.0 ms | ~26.8 MiB/s | New streaming perf (~28.6% faster) |
| `BufferedIoRead` (ours, 16k buffer) | ~79.5 ms | ~27.0 MiB/s | Slightly faster with a larger buffer |

Testing

Benchmark Methodology


use criterion::{black_box, criterion_group, criterion_main, BenchmarkId, Criterion, Throughput};
use serde::Deserialize;
use serde_json::{
    de::{BufferedIoRead, Deserializer, IoRead},
    value::Value,
};
use std::io::{self, BufReader, Read};

// Load the 2.2MB canada.json file.
const DATA: &[u8] = include_bytes!("canada.json");

/// Construct a Deserializer over our BufferedIoRead.
/// The buffer is moved into the reader, so it must be
/// re-created for every benchmark iteration.
fn from_reader_buffered<'de, R, B>(reader: R, buffer: B) -> Deserializer<BufferedIoRead<R, B>>
where
    R: io::Read,
    B: AsMut<[u8]> + AsRef<[u8]> + 'de,
{
    Deserializer::new(BufferedIoRead::new(reader, buffer))
}

fn benchmark_group(c: &mut Criterion) {
    let mut group = c.benchmark_group("canada.json (2.2MB)");
    group.throughput(Throughput::Bytes(DATA.len() as u64));

    // ----- Benchmark 1: The "Goal" (fastest possible) -----
    // This is `from_slice`, which is our ideal baseline.
    group.bench_function("from_slice", |b| {
        b.iter(|| {
            let val: Value = serde_json::from_slice(DATA).unwrap();
            black_box(val);
        })
    });

    // ----- Benchmark 2: "Collect all, then parse" -----
    // This is the common "buffer everything to a Vec" strategy.
    // This *includes* the I/O cost of reading to the Vec.
    group.bench_function("read_to_end_then_slice", |b| {
        b.iter(|| {
            // We use `DATA` as the "reader"
            let mut reader = DATA;
            let mut vec = Vec::with_capacity(DATA.len());
            reader.read_to_end(&mut vec).unwrap();

            let val: Value = serde_json::from_slice(&vec).unwrap();
            black_box(val);
        })
    });

    // ----- Benchmark 3: The "Status Quo" -----
    // This is `from_reader` with the standard library's `BufReader`.
    // This highlights the per-byte bookkeeping cost of `IoRead`.
    group.bench_function("from_reader (std BufReader 8k)", |b| {
        b.iter(|| {
            // `DATA` acts as the underlying `Read`
            let reader = BufReader::new(DATA);
            let mut de = Deserializer::new(IoRead::new(reader));
            let val: Value = Value::deserialize(&mut de).unwrap();
            black_box(val);
        })
    });

    // ----- Benchmark 4: Our `BufferedIoRead` -----
    // We parameterize this to test different internal buffer sizes.
    // The *IO* buffer is irrelevant since `DATA` is in memory,
    // but the *internal* buffer size is critical.
    let buffer_sizes = [
        128,   // Default
        1024,  // 1 KiB
        8192,  // 8 KiB (matches std BufReader)
        16384, // 16 KiB
    ];

    for size in buffer_sizes.iter() {
        group.bench_with_input(
            BenchmarkId::new("from_reader_buffered (ours)", *size),
            size,
            |b, &size| {
                b.iter(|| {
                    // We must create a new buffer *inside* the iter
                    // because `BufferedIoRead` consumes it.
                    let buffer: Vec<u8> = vec![0; size];

                    let mut de = from_reader_buffered(DATA, buffer);
                    let val: Value = Value::deserialize(&mut de).unwrap();
                    black_box(val);
                });
            },
        );
    }

    group.finish();
}

criterion_group!(benches, benchmark_group);
criterion_main!(benches);

Correctness Tests

This PR includes extensive new integration tests (tests/buffered_io.rs) that "torture test" the buffer boundary logic. These tests use a custom SlowReader to force buffer refills at critical parsing points (e.g., in the middle of a string, during a \u escape sequence, and while parsing a RawValue) to ensure correctness.
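The PR's SlowReader is not reproduced here, but a minimal version of the idea (hypothetical code, not the exact test helper) is a reader that yields at most one byte per read call, so every byte boundary becomes a buffer-refill boundary for the code under test:

```rust
use std::io::{self, Read};

/// Hypothetical minimal version of the test helper: yields at most
/// one byte per `read` call, forcing the buffered reader under test
/// to hit a refill at every position in the input.
struct SlowReader<'a> {
    data: &'a [u8],
    pos: usize,
}

impl<'a> Read for SlowReader<'a> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        if self.pos >= self.data.len() || buf.is_empty() {
            return Ok(0);
        }
        buf[0] = self.data[self.pos];
        self.pos += 1;
        Ok(1)
    }
}

fn main() -> io::Result<()> {
    // A string with an escape and a \u sequence: the kind of input
    // the boundary tests force refills in the middle of.
    let json = br#""hello \n \u00e9 world""#;
    let mut out = Vec::new();
    SlowReader { data: json, pos: 0 }.read_to_end(&mut out)?;
    assert_eq!(out.as_slice(), &json[..]);
    Ok(())
}
```

Driving the deserializer through such a reader exercises every refill path; if parsing still round-trips correctly, the boundary logic holds.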

@veecore force-pushed the perf-buffered-io-read branch from 767dbba to fdfea72 on October 29, 2025 02:25