-
Notifications
You must be signed in to change notification settings - Fork 247
feat: delta-encode BasicBlockNode metadata (2/3) #2469
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
5ec380d to
55fd826
Compare
bobbinth
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also not a very deep review - but looks good! Thank you!
| //! - Padding flags per batch (1 byte each, bit-packed) | ||
| //! | ||
| //! **Total**: `ops_size + 4 + (10 * num_batches)` bytes | ||
| //! **Total**: `ops_size + 4 + (5 * num_batches)` bytes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A thought for the future about serialization of operations:
- Currently, we use 1 byte to serialize each opcode - though, technically, each opcode is only 7 bits.
- For operations like
Assert,MpVerify, andU32assert2, we also serialize the error code (8 bytes) which is not semantically-relevant and ideally would somehow go into the debug info section. - For the
Pushoperation we always serialize the immediate values as 8 bytes. We could optimize this in the future by using a variable-length encoding, and also maybe building an immediate values table (since there is probably quite a lot of duplication of immediate values - though, with variable length encoding, the benefit of the table is somewhat questionable).
plafer
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
99bb3be to
2103969
Compare
55fd826 to
53ca96f
Compare
2103969 to
cd5c728
Compare
84aca58 to
59bd1b9
Compare
cd5c728 to
db92d2d
Compare
59bd1b9 to
778060f
Compare
db92d2d to
9204e3d
Compare
778060f to
7724be5
Compare
9204e3d to
edea9b7
Compare
7724be5 to
aad51b4
Compare
bobbinth
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Thank you!
edea9b7 to
f0f925e
Compare
aad51b4 to
ce2fad2
Compare
f0f925e to
b383cbf
Compare
ce2fad2 to
842094c
Compare
1. Updated decoder to use unpack_indptr_deltas (4 bytes per batch) 2. Fixed OpBatchAccumulator::into_batch to fill indptr tail with final value, ensuring monotonicity required by delta encoding 3. Updated snapshot tests to reflect new indptr representation All serialization tests now pass with full round-trip working.
Add documentation explaining that indptr/padding/groups arrays are partial structures where only prefixes are semantically valid. Add debug assertions to validate invariants including full array monotonicity required for serialization.
Add validation for full indptr array monotonicity and final value in MastForest validation. Call validate() after deserialization to ensure reconstructed forests meet all invariants required for serialization.
Use rstest in unit tests.
842094c to
1aa41a1
Compare
This PR implements delta encoding for
BasicBlockNodeindptr arrays, building on the padded serialization format from #2466. It reduces batch metadata from 10 bytes to 5 bytes per batch through 4-bit delta encoding.The delta-encoding mentioned in the previous PR is now implemented:
Clarified the partial validity semantics of OpBatch arrays:
indptr[0..num_groups+1]- semantically valid prefixpadding[0..num_groups]- semantically valid prefixgroups[0..num_groups]- semantically valid prefixThe entire
indptrarray must be monotonically non-decreasing for delta encoding.OpBatchAccumulatorfills the tail with the final ops count to maintain this invariant while keeping the tail semantically meaningless.Summary
The delta-encoded format adds 2.20% size overhead compared to unpadded baseline:
Savings vs padded format: 15,780 bytes (-1.84%)
Most blocks (92.7%) add ≤22 bytes.
Size comparison across serialization formats for MastForest.
Size Comparison
next)Overhead Sources
NOOP Padding: 633 operations (0.67%)
Batch Metadata: 17,536 bytes (with delta encoding)
Total metadata: 17,536 bytes (17.12 KB)
Metadata per block:
4 + (num_batches × 5)bytesDelta Encoding Details
Each indptr array is encoded as:
indptr[0](always 0)indptr[i+1] - indptr[i]for i ∈ [0,7]Valid delta range: [0, 9] (max 9 ops per group)
Tail semantics: Elements beyond
[0..num_groups]must maintain monotonicity for serialization but are semantically undefined. The tail is filled with the final ops count during construction.Wire Format
Each basic block stores:
Size:
ops_size + 4 + (5 × num_batches)bytesPrevious format size:
ops_size + 4 + (10 × num_batches)bytes