
feat: add blob encoding benchmark and profiling harness#3085

Open
wbbradley wants to merge 3 commits into main from wbbradley/profiling-encoding

Conversation

Contributor

@wbbradley wbbradley commented Mar 8, 2026

Description

Adds phase-level benchmarking and profiling infrastructure for the blob encoding pipeline (encode_with_metadata()) at production parameters (n_shards=1000).

  • examples/profile_encoding.rs — Standalone profiling binary for use with samply record or cargo flamegraph. Accepts --size, --shards, --iterations flags. Reports wall-clock time and throughput (MiB/s).
  • benches/encoding_phases.rs — Criterion benchmark measuring individual phases: secondary encoding, primary encoding, primary encoding + hashing, metadata/Merkle tree construction, and full pipeline. Blob sizes: 1MiB, 32MiB, 256MiB.
  • Makes leaf_hash public in merkle.rs so benchmarks can measure hashing independently.
  • Adds clap dev-dependency for the profiling binary's CLI.

Test plan

  • Verified compilation via chk (formatting, clippy).
  • Ran cargo nextest run -p walrus-core (232 tests pass).
  • Ran cargo bench -p walrus-core --bench encoding_phases to verify benchmarks execute correctly.
  • Ran cargo build --release --example profile_encoding && ./target/release/examples/profile_encoding to verify the profiling binary works.

Release notes

  • Storage node:
  • Aggregator:
  • Publisher:
  • CLI:

@wbbradley wbbradley force-pushed the wbbradley/profiling-encoding branch from 89f3b76 to 19db448 on March 9, 2026 03:53
Contributor

@mlegner mlegner left a comment


Thanks a lot for expanding our benchmarking/profiling toolbox. A few questions, mainly about code duplication.

> Verified compilation via chk (formatting, clippy).

Where is that chk defined?

Comment on lines 47 to +54
[[bench]]
name = "blob_encoding"
harness = false

[[bench]]
name = "encoding_phases"
harness = false
Contributor


Question: What is the relationship to the existing benchmarks? Can we combine them?

Comment on lines +39 to +52
fn parse_size(s: &str) -> Result<usize, String> {
    let s = s.to_lowercase();
    let (num, mult) = if let Some(n) = s.strip_suffix('g') {
        (n, 1 << 30)
    } else if let Some(n) = s.strip_suffix('m') {
        (n, 1 << 20)
    } else if let Some(n) = s.strip_suffix('k') {
        (n, 1 << 10)
    } else {
        (s.as_str(), 1)
    };
    let n: usize = num.parse().map_err(|e| format!("invalid size: {e}"))?;
    Ok(n * mult)
}
Contributor


Hint: We already have a struct that does this in the walrus-service crate. That could be moved to walrus-core or walrus-utils.

Contributor


AFAICT, many of the sub-benchmarks here mostly copy some code from crates/walrus-core/src/encoding/blob_encoding.rs. Can we instead create some functions there that are called in both the production code and here in the benchmarks? In that case we probably also don't have to export the leaf_hash function.

Add phase-level criterion benchmarks (encoding_phases) that measure
secondary encoding, primary encoding, hashing, and metadata construction
independently at production parameters (n_shards=1000). Add a standalone
profiling binary (profile_encoding) designed for use with samply/flamegraph
without criterion overhead. Make leaf_hash public to support external
benchmarking of hashing costs.
@wbbradley wbbradley force-pushed the wbbradley/profiling-encoding branch from 19db448 to f13ab19 on March 10, 2026 00:51
Add heap peak tracking via peakmem-alloc and RSS peak via
libc::getrusage() to the profiling binary. Each iteration now reports
peak_heap, peak_rss, and heap expansion ratio. Multi-iteration runs
report max_peak_heap in the summary.
Add --concurrent-blobs N flag that encodes N blobs simultaneously
using std::thread::scope, simulating multi-blob uploads. Reports
per-blob latency, total wall time, and peak memory with per-blob
expansion ratio for direct comparison with single-blob runs.
