feat: serialize `BasicBlock`s in padded representation (1/3) #2466

huitseeker · 2025-12-12T18:33:15Z

As a prelude to #2448, this changes the serialization of `BasicBlock` to reflect the padded contents, so that those blocks no longer need to be re-batched and padded during deserialization.

The goal of this PR is twofold:

- Experiment with and analyze the padded on-disk representation, unlocked by this comment 🎉, and get the conversation started about how permissible it is to serialize metadata.
  - One important sub-part of this: in this PR, we're over-consuming bits in the indptr representation. Each index is stored as one byte (although its value lies in [0, 72], which fits in 7 bits), and because the groups contain at most 8 ops, a simple delta-encoding with an implicit start at 0 would need only $8 \times 4 = 32$ bits instead of $9 \times 8 = 72$ bits. This is not yet implemented, but would cut the overhead numbers below roughly in half.
- Lay the groundwork for "Simplify MastForest serialization by directly serializing DebugInfo" (#2448), which needs to change the over-the-wire representation of decorator info anyway, and which has a long paragraph on the difficulty of un-padding decorators ("The Padding Wrinkle") that disappears in a puff of simplification after this work.
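The delta-encoding idea mentioned above is not implemented in this PR. A minimal sketch of what it could look like, assuming a 9-entry indptr array that starts at 0 and never decreases (the function names `pack_indptr` and `unpack_indptr` are hypothetical, not part of the codebase):

```rust
/// Packs a 9-entry indptr array into eight 4-bit deltas (32 bits total).
/// Returns `None` if the array does not start at 0, decreases somewhere,
/// or contains a step larger than 15 (the most a 4-bit delta can hold).
fn pack_indptr(indptr: &[u8; 9]) -> Option<u32> {
    if indptr[0] != 0 {
        return None; // the start is implicit, so it must be 0
    }
    let mut packed = 0u32;
    for i in 0..8 {
        // checked_sub rejects non-monotone arrays
        let delta = indptr[i + 1].checked_sub(indptr[i])?;
        if delta > 0xF {
            return None;
        }
        packed |= (delta as u32) << (4 * i);
    }
    Some(packed)
}

/// Rebuilds the 9-entry indptr array from the packed deltas.
fn unpack_indptr(packed: u32) -> [u8; 9] {
    let mut indptr = [0u8; 9];
    for i in 0..8 {
        let delta = ((packed >> (4 * i)) & 0xF) as u8;
        // the sum is at most 8 * 15 = 120, so it cannot overflow a u8
        indptr[i + 1] = indptr[i] + delta;
    }
    indptr
}
```

Under these assumptions, each batch would spend 4 bytes on indptr instead of 9, which is where the "roughly in half" estimate for the metadata comes from.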

Because this PR bumps the serialization version number of MastForest, it is intended to stay open until we land on the right over-the-wire format: the version bump lives here, and I don't intend to repeat it in the PRs stacked on top of this one. The stack will be three PRs: this one, one PR for delta-encoding, and one PR for #2448.

Test Data: Miden Standard Library

727 total nodes
439 basic block nodes (60.4%)
284 procedures
94,744 operations (with padding)

Summary

The padded format adds 4.05% size overhead:

0.67% from NOOP padding (633 ops)
3.38% from batch metadata (33,316 bytes)

Most blocks (92.7%) add ≤34 bytes.

Size comparison between unpadded and padded serialization formats for MastForest.

Size Comparison

| Format | Size | Overhead |
|---|---|---|
| Unpadded (`next`) | 838,400 bytes (818.75 KB) | baseline |
| Padded (`serialize-padded-opbatches`) | 872,375 bytes (851.93 KB) | +33,975 bytes (+4.05%) |
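The overhead column can be reproduced from the two sizes. A small sketch of that arithmetic (the helper name `overhead` is hypothetical and only serves this illustration):

```rust
/// Returns the size difference in bytes and as a percentage of the baseline.
fn overhead(baseline_bytes: i64, padded_bytes: i64) -> (i64, f64) {
    let extra = padded_bytes - baseline_bytes;
    let percent = 100.0 * extra as f64 / baseline_bytes as f64;
    (extra, percent)
}
```

Calling `overhead(838_400, 872_375)` gives 33,975 extra bytes, which rounds to 4.05% of the unpadded baseline.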

The unpadded format cannot guarantee exact OpBatch reconstruction after deserialization.

Overhead Sources

NOOP Padding: 633 operations (0.67%)

| Metric | Count |
|---|---|
| Total operations | 94,744 |
| NOOP padding | 633 |
| Real operations | 94,111 |

Batch Metadata: 33,316 bytes (98% of overhead)

| Component | Per Unit | Count | Total |
|---|---|---|---|
| Indptr array | 9 bytes | 3,156 batches | 28,404 bytes |
| Padding flags | 1 byte | 3,156 batches | 3,156 bytes |
| Batch count | 4 bytes | 439 blocks | 1,756 bytes |

Total metadata: 33,316 bytes (32.54 KB)

Distribution summary:

92.7% of blocks: ≤34 bytes overhead (1-3 batches)
2.3% of blocks: >100 bytes overhead (9+ batches)
Average: 75.89 bytes per block (skewed by outliers)

Metadata per block: 4 + (num_batches × 10) bytes
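As a cross-check, the per-block formula can be applied to the counts reported above. A short sketch (both function names are hypothetical):

```rust
/// Metadata for one block: a 4-byte batch count plus 10 bytes per batch
/// (9 bytes of indptr and 1 byte of padding flags).
fn block_metadata_bytes(num_batches: usize) -> usize {
    4 + 10 * num_batches
}

/// Metadata for a whole forest, given its block and batch totals.
fn total_metadata_bytes(num_blocks: usize, num_batches: usize) -> usize {
    4 * num_blocks + 10 * num_batches
}
```

With these definitions, a block of 3 batches carries 34 bytes of metadata, and `total_metadata_bytes(439, 3_156)` yields the 33,316 bytes reported for the standard library.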

Wire Format

Each basic block stores:

┌─────────────────────────────────────────────────┐
│ Padded Operations (variable)                    │
├─────────────────────────────────────────────────┤
│ Batch Count (u32, 4 bytes)                      │
├─────────────────────────────────────────────────┤
│ Indptr Arrays (9 × u8 per batch)                │
├─────────────────────────────────────────────────┤
│ Padding Flags (1 byte per batch, bit-packed)    │
└─────────────────────────────────────────────────┘

Size: ops_size + 4 + (10 × num_batches) bytes
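To make the layout concrete, here is a rough sketch of writing the metadata portion of one block in the order shown in the diagram. It is not the code from this PR: the `BatchMeta` type and `write_block_metadata` function are hypothetical, and the little-endian batch count and the mapping of bit `i` to group `i` in the flag byte are assumptions made for illustration.

```rust
/// Hypothetical per-batch metadata: 9 indptr entries and one padding flag per group.
struct BatchMeta {
    indptr: [u8; 9],
    padding: [bool; 8],
}

/// Appends the metadata section of a block to `out`:
/// batch count, then all indptr arrays, then all padding flag bytes.
fn write_block_metadata(batches: &[BatchMeta], out: &mut Vec<u8>) {
    // Batch count as a u32 (assumed little-endian; assumes the count fits in 32 bits).
    out.extend_from_slice(&(batches.len() as u32).to_le_bytes());

    // Indptr arrays: 9 bytes per batch.
    for batch in batches {
        out.extend_from_slice(&batch.indptr);
    }

    // Padding flags: one bit per group, packed into one byte per batch.
    for batch in batches {
        let mut flags = 0u8;
        for (i, &is_padded) in batch.padding.iter().enumerate() {
            if is_padded {
                flags |= 1 << i;
            }
        }
        out.push(flags);
    }
}
```

For `n` batches this writes `4 + 10 * n` bytes, matching the size formula above once the operations themselves are added.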

Script: https://gist.github.com/huitseeker/f957014b15b0ed08a36b7f936079b698

bobbinth

This is a very shallow review from me - but looks great! Thank you!

bobbinth · 2025-12-15T01:04:49Z

core/src/mast/node/basic_block_node/mod.rs

+/// Represents the operation data for a [`BasicBlockNodeBuilder`].
+///
+/// The decorators are bundled with the operation data to maintain the invariant that
+/// decorator indices match the format of the operations:
+/// - `Raw`: decorators have raw (unpadded) indices
+/// - `Batched`: decorators have padded indices
+#[derive(Debug)]
+enum OperationData {


Question: do we need this "duality" mostly to support the slow processor? That is, is the slow processor the only reason why we need to keep track of the raw (unpadded) indexes?

huitseeker · 2025-12-15

No, we still support raw indexes for these reasons:

- Assembly block merging (raw ops and raw decorator indexes)
- Node creation (raw → padded conversion in initial batching)
- Fingerprinting (padded → raw conversion for stability, which I will remove after this stack of PRs)

This stack of PRs also removes the use of raw indexes in serialization.

plafer

LGTM!

CHANGELOG.md

plafer · 2026-01-07T18:04:02Z

I think the stack of 3 PRs is good to merge - any reason to hold off?

Serialize BasicBlockNode operations in padded form with batch metadata to enable exact OpBatch reconstruction during deserialization.

Changes:
- Add batch metadata to serialization format (indptr, padding, groups)
- Add OperationData enum to bundle operations with matching decorator indices
- Add from_op_batches constructor to BasicBlockNodeBuilder
- Serialize decorators with padded indices to match padded operations
- Bump serialization version from [0,0,0] to [0,0,1]
- Add comprehensive tests including 7 unit tests and 3 proptests

This preserves the exact OpBatch structure across serialization boundaries, eliminating the need for re-batching during deserialization.

Add module-level documentation showing the over-the-wire format for basic blocks with byte consumption formula.

Modified BasicBlockNode::to_builder to directly use pre-batched operations and padded decorators already stored in the node, eliminating redundant re-batching and decorator adjustment. The implementation now:
- Uses from_op_batches constructor to preserve existing op_batches
- Extracts padded decorators directly from Owned or Linked storage
- Avoids wasteful extraction of unpadded operations followed by re-batching

- Merged test_to_builder_identity_{owned,linked} into single test covering both storage types
- Simplified OpBatch roundtrip tests by using PartialEq instead of checking each field individually
- Simplified proptest assertions to compare OpBatch directly instead of checking ops, indptr, padding, groups, num_groups separately

Add comprehensive serialization round-trip test using the standard library to verify multi-batch basic block serialization.

huitseeker force-pushed the serialize-padded-opbatches branch 3 times, most recently from 41d5579 to 20b4415, on December 12, 2025 22:17

huitseeker changed the title from "feat: serialize BasicBlocks in padded representation (1/2 or 1/3)" to "feat: serialize BasicBlocks in padded representation (1/3)" on Dec 12, 2025

huitseeker force-pushed the serialize-padded-opbatches branch from 20b4415 to 99bb3be on December 12, 2025 23:07

huitseeker mentioned this pull request Dec 13, 2025

feat: delta-encode BasicBlockNode metadata (2/3) #2469

Merged

huitseeker requested review from adr1anh, bobbinth and plafer and removed request for adr1anh December 13, 2025 16:38

bobbinth approved these changes Dec 15, 2025

plafer approved these changes Dec 15, 2025

huitseeker force-pushed the serialize-padded-opbatches branch from 99bb3be to 2103969 on December 16, 2025 21:07

huitseeker force-pushed the serialize-padded-opbatches branch 2 times, most recently from cd5c728 to db92d2d, on December 30, 2025 04:04

huitseeker force-pushed the serialize-padded-opbatches branch from db92d2d to 9204e3d on January 5, 2026 13:53

This was referenced Jan 6, 2026

remove Owned variant of DecoratorStorage #2411

Open

Allow to truncate decorator data when serializing MAST #1580

Closed

plafer reviewed Jan 6, 2026

huitseeker force-pushed the serialize-padded-opbatches branch from 9204e3d to edea9b7 on January 7, 2026 07:10

huitseeker mentioned this pull request Jan 7, 2026

feat(core): add stripped MastForest serialization #2549

Merged

huitseeker force-pushed the serialize-padded-opbatches branch from edea9b7 to f0f925e on January 7, 2026 18:31

huitseeker added 7 commits January 7, 2026 13:42

chore: allow bincode exception in deny.toml

bff7607

docs(core): add wire format schematic for basic block serialization

144d2eb

chore: Changelog

62361a4

test(core): add stdlib integration test for multi-batch serialization

b383cbf

huitseeker force-pushed the serialize-padded-opbatches branch from f0f925e to b383cbf on January 7, 2026 18:42

huitseeker mentioned this pull request Jan 7, 2026

Change stack ordering through unified LE convention and sponge state remapping #2547

Merged

huitseeker merged commit 3dc9dd5 into next Jan 7, 2026
16 checks passed

huitseeker deleted the serialize-padded-opbatches branch January 7, 2026 18:56
