PartialSmt serialization and deserialization, extract re-usable primitives for protobuf #746

Description
Context
When sharing SmtProofs - and many of them - we want to avoid duplication of the InnerNodes being sent (those with two children). The first step of optimization here is to construct a PartialSmt from the SmtProofs and, with increasing likelihood towards the root of the tree, deduplicate them.
Status Quo to_bytes/from_bytes
Currently we serialize all inner nodes of a PartialSmt and re-construct it by using all of these inner nodes.
Problem
1. the current serialization is unversioned
2. the current serialization is not done at the protobuf level (our current serialization solution) but is represented as an opaque byte blob (minor; only a problem in conjunction with 1.)
3. we still leave a lot on the table when it comes to space efficiency
Proposal
1. punting, since we implement 2.
2. move to protobuf as the serialization format, but keep the current to_bytes/from_bytes as is.
3. we can be more optimal, particularly regarding inner nodes: we can re-construct a lot of them algorithmically and will never be worse than either the set of SmtProofs or the full dense set of InnerNodes.
Details Re 3
High-Level Flow
We traverse as follows:
Leaves → Intermediate Nodes → Root
Why? It gives us the ability to locally decide which entries need to be transferred for the tree to be fully re-constructable.
Assumption: We are guaranteed to have a PartialSmt in front of us that is the superposition of a number of SmtProofs of the same tree/same root.
Algorithm Steps
Before diving in, we want a notion of re-constructability: a node is re-constructable if both of its children are either re-constructable themselves, or are stored deliberately because they were not re-constructable.
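As a naive formalization of that predicate (for clarity only; the bottom-up algorithm sketched further below computes the same thing without recursion, and the address type here is an illustrative stand-in, not the crate's actual index type):

```rust
use std::collections::HashSet;

/// Illustrative node address: (depth, index within that depth). A stand-in
/// for whatever node index type the crate uses.
type Addr = (u8, u64);

const LEAF_DEPTH: u8 = 64;

/// A node is re-constructable iff each of its children is either sent
/// deliberately or re-constructable itself; at the leaf layer this
/// degenerates to "the leaf is known".
fn is_reconstructable(node: Addr, sent: &HashSet<Addr>, known_leaves: &HashSet<u64>) -> bool {
    let (depth, index) = node;
    if depth == LEAF_DEPTH {
        return known_leaves.contains(&index);
    }
    let children = [(depth + 1, index * 2), (depth + 1, index * 2 + 1)];
    children
        .into_iter()
        .all(|child| sent.contains(&child) || is_reconstructable(child, sent, known_leaves))
}
```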
1. Initialize state
Known leaves are accessible via index; we store their indices in a set.
These are the nodes we can fall back to and send deliberately when something is not re-constructable. If we find ourselves needing to deliberately send a node that is outside this set, that is a bug in the algorithm.
2. BFS backtrack
Now we start from the leaves and backtrack towards the root. For every inner node we check which other inner nodes (or leaves, in the first step) we need to send in order for it to be re-constructable. Collect those that we need to send deliberately. Enqueue the parent, dedup, rinse and repeat.
3. Collect
We only need to define an encoding for the collected indices; we can continue to use the (index, digest) approach. A sketch of steps 1-3 follows below.
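Putting steps 1-3 together, a rough sketch of the sender-side collection pass, assuming a simple (depth, index) addressing scheme and a map of all node digests the PartialSmt holds (both are illustrative stand-ins, not the crate's actual types):

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Illustrative node address: (depth, index); depth 0 is the root, depth 64 the leaves.
type Addr = (u8, u64);
/// Stand-in for a node digest.
type Digest = [u8; 32];

const LEAF_DEPTH: u8 = 64;

/// Steps 1-3: starting from the known leaves, walk towards the root and collect
/// every node that is not re-constructable and therefore has to be sent deliberately.
///
/// `stored_nodes` holds every digest the PartialSmt knows (inner nodes and
/// leaf-level siblings), `known_leaves` the indices of the leaves we hold.
fn collect_nodes_to_send(
    stored_nodes: &BTreeMap<Addr, Digest>,
    known_leaves: &BTreeSet<u64>,
) -> BTreeMap<Addr, Digest> {
    // Step 1: initialize state. Everything in `available` can be produced on
    // the receiving side (known leaf, deliberately sent node, or re-constructable).
    let mut available: BTreeSet<Addr> =
        known_leaves.iter().map(|&i| (LEAF_DEPTH, i)).collect();
    let mut to_send: BTreeMap<Addr, Digest> = BTreeMap::new();

    // Step 2: BFS backtrack. The frontier starts at the parents of the known
    // leaves and moves one level up per iteration; the set gives us dedup for free.
    let mut frontier: BTreeSet<Addr> =
        known_leaves.iter().map(|&i| (LEAF_DEPTH - 1, i / 2)).collect();

    while !frontier.is_empty() {
        let mut next: BTreeSet<Addr> = BTreeSet::new();
        for &(depth, index) in &frontier {
            for child in [(depth + 1, index * 2), (depth + 1, index * 2 + 1)] {
                if !available.contains(&child) {
                    // Step 3: collect. The child is not re-constructable, so it
                    // has to be transferred as an (index, digest) entry. If the
                    // PartialSmt does not hold it, that is a bug in the algorithm.
                    let digest = *stored_nodes
                        .get(&child)
                        .expect("bug: node required for transfer is missing from the PartialSmt");
                    to_send.insert(child, digest);
                    available.insert(child);
                }
            }
            // Both children are available now, so this node is re-constructable.
            available.insert((depth, index));
            if depth > 0 {
                next.insert((depth - 1, index / 2));
            }
        }
        frontier = next;
    }
    to_send
}
```

A further refinement, not shown above, would be to also skip siblings whose digest equals the empty-subtree hash for their depth, since the receiver can compute those without any transfer.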
---{transfer}-->
Reconstruction
1. BFS backtrack
Start from the leaves again and reconstruct all parent nodes from the given children. If at any point a child node is not known (either received or reconstructed), it's an error. The result is again a PartialSmt. (See the sketch after the failure cases below.)
Common failure cases:
- Missing minimal inner nodes (incomplete data)
- Inconsistent node hashes (corrupted data)
- Unreachable nodes (structural inconsistency)
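A minimal sketch of this reconstruction pass, with the first two failure cases mapped to explicit error variants (the address and digest types are the same illustrative stand-ins as above; the actual hash function would come from the crate, and a check for unreachable/unused received nodes is omitted):

```rust
use std::collections::BTreeMap;

/// Illustrative node address: (depth, index); depth 0 is the root, depth 64 the leaves.
type Addr = (u8, u64);
/// Stand-in for a node digest.
type Digest = [u8; 32];

const LEAF_DEPTH: u8 = 64;

#[derive(Debug)]
enum ReconstructError {
    /// A required child was neither received nor reconstructed (incomplete data).
    MissingNode(Addr),
    /// The reconstructed root does not match the advertised root (corrupted data).
    RootMismatch,
}

/// Stand-in for the tree's two-to-one hash; the real crate's digest merge goes here.
fn merge(_left: &Digest, _right: &Digest) -> Digest {
    unimplemented!("use the crate's hash function")
}

/// Rebuild all inner nodes bottom-up from the received leaves and the
/// deliberately sent nodes; the result is the node set of a PartialSmt again.
fn reconstruct(
    expected_root: &Digest,
    received: &BTreeMap<Addr, Digest>, // known leaves + deliberately sent nodes
    known_leaves: &[u64],
) -> Result<BTreeMap<Addr, Digest>, ReconstructError> {
    let mut nodes = received.clone();

    // Frontier: parents of the received leaves, moving one level up per iteration.
    let mut frontier: Vec<Addr> =
        known_leaves.iter().map(|&i| (LEAF_DEPTH - 1, i / 2)).collect();
    frontier.sort_unstable();
    frontier.dedup();

    while !frontier.is_empty() {
        let mut next: Vec<Addr> = Vec::new();
        for &(depth, index) in &frontier {
            let left_addr = (depth + 1, index * 2);
            let right_addr = (depth + 1, index * 2 + 1);
            // Missing children mean the transferred data was incomplete.
            let left = *nodes.get(&left_addr).ok_or(ReconstructError::MissingNode(left_addr))?;
            let right = *nodes.get(&right_addr).ok_or(ReconstructError::MissingNode(right_addr))?;
            nodes.insert((depth, index), merge(&left, &right));
            if depth > 0 {
                next.push((depth - 1, index / 2));
            }
        }
        next.sort_unstable();
        next.dedup();
        frontier = next;
    }

    // Inconsistent node hashes surface here as a root mismatch.
    if nodes.get(&(0, 0)) != Some(expected_root) {
        return Err(ReconstructError::RootMismatch);
    }
    Ok(nodes)
}
```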
Complexity Analysis
Time Complexity
- Worst case: O(n * d) where n = number of leaves, d = tree depth (typically 64)
- Each iteration processes at most O(n) nodes
- A maximum of d iterations is needed to reach the root from the leaves
Practical concerns
The re-construction algorithm will by definition still succeed if excess inner nodes are transferred. This allows us to expose primitives to construct and re-construct, and to use them both for protobuf and for to_bytes/from_bytes without further concerns for backward compat, assuming we retain the general structure of (root count entries...) in the latter.
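One possible shape for those shared primitives (a sketch only: PartialSmt comes from the crate, while TransferEntries, the function names, and the digest/index/error types referenced here are illustrative assumptions, with imports omitted):

```rust
// Hypothetical shared module; PartialSmt is the crate's type, everything else
// below (TransferEntries, function names, referenced digest/index/error types)
// is an assumption sketching the shape of the re-usable primitives.

/// What actually crosses the wire, mirroring the (root count entries...) layout:
/// the root plus a list of (index, digest) entries for leaves and deliberately
/// sent inner nodes. The count falls out of the Vec length.
pub struct TransferEntries {
    pub root: RpoDigest,
    pub entries: Vec<(NodeIndex, RpoDigest)>,
}

/// Sender side: run steps 1-3 above and return the minimal entries.
pub fn to_transfer_entries(smt: &PartialSmt) -> TransferEntries {
    todo!("collect leaves + non-re-constructable inner nodes")
}

/// Receiver side: run the reconstruction pass above, verifying the root.
pub fn from_transfer_entries(entries: TransferEntries) -> Result<PartialSmt, MerkleError> {
    todo!("rebuild parents bottom-up, fail on missing or inconsistent nodes")
}
```

Both the protobuf message and the to_bytes/from_bytes encoding would then be thin wrappers around this single pair of functions.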