-
Notifications
You must be signed in to change notification settings - Fork 250
feat: serialize BasicBlocks in padded representation (1/3)
#2466
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
41d5579 to
20b4415
Compare
BasicBlocks in padded representation (1/2 or 1/3)BasicBlocks in padded representation (1/3)
20b4415 to
99bb3be
Compare
bobbinth
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a very shallow review from me - but looks great! Thank you!
| /// Represents the operation data for a [`BasicBlockNodeBuilder`]. | ||
| /// | ||
| /// The decorators are bundled with the operation data to maintain the invariant that | ||
| /// decorator indices match the format of the operations: | ||
| /// - `Raw`: decorators have raw (unpadded) indices | ||
| /// - `Batched`: decorators have padded indices | ||
| #[derive(Debug)] | ||
| enum OperationData { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: do we need this "duality" mostly to support the slow processor? That is, is the slow processor the only reason why we need to keep track of the raw (unpadded) indexes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, we still support raw indexes for those reasons:
- Assembly block merging (raw ops & raw decorator indexes)
- Node creation (raw → padded conversion in initial batching)
- Fingerprinting (padded → raw conversion for stability, which I will remove, after this stack of PRs)
This stack of PRs also removes the use of raw indexes in serialization.
plafer
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
99bb3be to
2103969
Compare
cd5c728 to
db92d2d
Compare
db92d2d to
9204e3d
Compare
9204e3d to
edea9b7
Compare
|
I think the stack of 3 PRs are good to merge - any reason to hold off? |
edea9b7 to
f0f925e
Compare
Serialize BasicBlockNode operations in padded form with batch metadata to enable exact OpBatch reconstruction during deserialization. Changes: - Add batch metadata to serialization format (indptr, padding, groups) - Add OperationData enum to bundle operations with matching decorator indices - Add from_op_batches constructor to BasicBlockNodeBuilder - Serialize decorators with padded indices to match padded operations - Bump serialization version from [0,0,0] to [0,0,1] - Add comprehensive tests including 7 unit tests and 3 proptests This preserves the exact OpBatch structure across serialization boundaries, eliminating the need for re-batching during deserialization.
Add module-level documentation showing the over-the-wire format for basic blocks with byte consumption formula.
Modified BasicBlockNode::to_builder to directly use pre-batched operations and padded decorators already stored in the node, eliminating redundant re-batching and decorator adjustment. The implementation now: - Uses from_op_batches constructor to preserve existing op_batches - Extracts padded decorators directly from Owned or Linked storage - Avoids wasteful extraction of unpadded operations followed by re-batching
- Merged test_to_builder_identity_{owned,linked} into single test
covering both storage types
- Simplified OpBatch roundtrip tests by using PartialEq instead of
checking each field individually
- Simplified proptest assertions to compare OpBatch directly instead of
checking ops, indptr, padding, groups, num_groups separately
Add comprehensive serialization round-trip test using the standard library to verify multi-batch basic block serialization.
f0f925e to
b383cbf
Compare
As a prelude to #2448, this changes the serialization of
BasicBlockto reflect the padded contents, to not need to re-batch and pad those blocks again.The goal of this PR is twofold:
indptrrepresentation, not only because each index is represented with one byte (the value is in [0, 72] which fits in 7 bits) but because the groups contain at most 8 ops, so a simple delta-encoding would give usAs a consequence of bumping the version number for serialization of
MastForest, this PR is intended to stay open until we land on the right over-the-wire format. I.e. there's a version bump in there which I don't intend to have in the PRs which will live on top of this. This stack will be three PRs (one PR delta-encoding, one PR #2448)Test Data: Miden Standard Library
Summary
The padded format adds 4.05% size overhead:
Most blocks (92.7%) add ≤34 bytes.
Size comparison between unpadded and padded serialization formats for MastForest.
Size Comparison
next)serialize-padded-opbatches)The unpadded format cannot guarantee exact OpBatch reconstruction after deserialization.
Overhead Sources
NOOP Padding: 633 operations (0.67%)
Batch Metadata: 33,316 bytes (98% of overhead)
Total metadata: 33,316 bytes (32.54 KB)
Distribution summary:
Metadata per block:
4 + (num_batches × 10)bytesWire Format
Each basic block stores:
Size:
ops_size + 4 + (10 × num_batches)bytesScript: https://gist.github.com/huitseeker/f957014b15b0ed08a36b7f936079b698