optimize PartialSmt serialization and deserialization, extract re-usable primitives for protobuf #746

@drahnr

Description

Context

When sharing SmtProofs - and many of them - we want to avoid duplicating the InnerNodes being sent (those with two children). The first step of optimization here is to construct a PartialSmt from the SmtProofs; with increasing likelihood towards the root of the tree, nodes are shared between proofs and can be deduplicated.

Status Quo to_bytes/from_bytes

Currently we serialize all inner nodes of a PartialSmt and reconstruct it from that full set of inner nodes.

Problem

  1. the current serialization is unversioned
  2. the current serialization is not done at the protobuf level (our current serialization solution) but represented as an opaque byte blob (minor; only a problem in conjunction with 1.)
  3. we still leave a lot on the table when it comes to space efficiency

Proposal

  1. punting, since implementing 2. resolves this
  2. move to protobuf as the serialization format, but keep the current to_bytes/from_bytes as is
  3. we can be more space-efficient, particularly re inner nodes - a lot of them can be re-constructed algorithmically, and the result will never be worse than either the set of SmtProofs or the full dense set of InnerNodes

Details Re 3

High-Level Flow

We traverse as follows:

Leaves → Intermediate Nodes → Root

Why? Because it lets us decide locally which entries need to be transferred for the result to be fully re-constructable.

Assumption: We are guaranteed to have a PartialSmt in front of us that is the superposition of a number of SmtProofs of the same tree/same root.

Algorithm Steps

Before diving in, we want a notion of re-constructability: an inner node is re-constructable if both of its children are either re-constructable themselves, or are stored deliberately because they were not re-constructable.

1. Initialize state

Known leaves are accessible via index; we store the indices in a set.

These are the nodes we can fall back to and send deliberately when something is not re-constructable. If we find ourselves needing a node outside this set to be sent deliberately, that is a bug in the algorithm.

2. BFS backtrack

Now we start from the leaves and backtrack towards the root. For every inner node we check which other inner nodes (or leaves, in the first step) we need to send for it to be re-constructable, and collect those that must be sent deliberately. Enqueue the parent, dedup, rinse and repeat.
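
The backtracking step might look like the following toy Rust sketch. The `(depth, index)` addressing, the `select_nodes_to_send` name, and the standalone types are illustrative assumptions, not miden-crypto's actual API:

```rust
use std::collections::BTreeSet;

/// (depth, index-within-depth); toy stand-in for a node index type.
type NodeIdx = (u8, u64);

/// Sender side: given the node indices present in the superposed PartialSmt
/// and the indices of known leaves at `max_depth`, return the inner nodes
/// whose digests must be sent deliberately because the receiver cannot
/// recompute them from their children.
fn select_nodes_to_send(
    present: &BTreeSet<NodeIdx>,
    leaves: &BTreeSet<u64>,
    max_depth: u8,
) -> BTreeSet<NodeIdx> {
    let mut to_send = BTreeSet::new();
    // Frontier: indices one level below the current depth that the receiver
    // will have, either sent or re-constructable. Leaves are always sent,
    // so they seed the frontier.
    let mut available: BTreeSet<u64> = leaves.clone();
    for depth in (0..max_depth).rev() {
        let mut next_available = BTreeSet::new();
        for &(_, index) in present.iter().filter(|&&(d, _)| d == depth) {
            let (left, right) = (index * 2, index * 2 + 1);
            if !(available.contains(&left) && available.contains(&right)) {
                // Not re-constructable: its digest travels explicitly.
                to_send.insert((depth, index));
            }
            next_available.insert(index);
        }
        available = next_available;
    }
    to_send
}
```

For example, superposing the proofs of adjacent leaves 0 and 1 in a depth-3 tree, only the two off-path siblings (2,1) and (1,1) need to be sent deliberately; the on-path nodes and the root are re-constructable.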

3. Collect

We only need to define an encoding for the given indices; we can continue to use the (index, digest) approach.
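
One possible byte layout, as a toy Rust sketch; the field widths and the `encode_entries` name are assumptions, chosen only to mirror the existing `(root, count, entries...)` shape:

```rust
/// Toy digest; stands in for a 32-byte digest.
type Digest = [u8; 32];

/// Encode the deliberately sent nodes as `root || count || (depth, index, digest)*`,
/// keeping the `(root, count, entries...)` shape of the existing format.
/// Field widths (little-endian) are purely illustrative.
fn encode_entries(root: Digest, entries: &[(u8, u64, Digest)]) -> Vec<u8> {
    let mut out = Vec::with_capacity(32 + 8 + entries.len() * (1 + 8 + 32));
    out.extend_from_slice(&root);
    out.extend_from_slice(&(entries.len() as u64).to_le_bytes());
    for (depth, index, digest) in entries {
        out.push(*depth);
        out.extend_from_slice(&index.to_le_bytes());
        out.extend_from_slice(digest);
    }
    out
}
```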


---{transfer}-->


Reconstruction

1. BFS backtrack

Start from the leaves again and reconstruct all parent nodes from the given children. If at any point a child node is not known (neither received nor reconstructed), it's an error. The result is again a PartialSmt.
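
The reconstruction pass might look like this toy Rust sketch; the `u64` digest, the `merge` function, and the map-based frontier are illustrative stand-ins, not the real hashing or storage:

```rust
use std::collections::{BTreeMap, BTreeSet};
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy digest and merge function; stand-ins for real digests and hashing.
type Digest = u64;

fn merge(left: Digest, right: Digest) -> Digest {
    let mut h = DefaultHasher::new();
    (left, right).hash(&mut h);
    h.finish()
}

/// Receiver side: rebuild all parents bottom-up from leaf digests (keyed by
/// leaf index at `max_depth`) plus the deliberately sent digests keyed by
/// (depth, index). Returns the recomputed root, or the first missing child.
fn reconstruct(
    leaves: &BTreeMap<u64, Digest>,
    received: &BTreeMap<(u8, u64), Digest>,
    max_depth: u8,
) -> Result<Digest, (u8, u64)> {
    // Frontier of digests known at the depth currently being consumed.
    let mut known: BTreeMap<u64, Digest> = leaves.clone();
    for depth in (0..max_depth).rev() {
        let mut next: BTreeMap<u64, Digest> = BTreeMap::new();
        // Every parent with at least one known child must be recomputed.
        let parents: BTreeSet<u64> = known.keys().map(|i| i / 2).collect();
        for parent in parents {
            let child = |idx: u64| {
                known
                    .get(&idx)
                    .copied()
                    .or_else(|| received.get(&(depth + 1, idx)).copied())
            };
            let left = child(parent * 2).ok_or((depth + 1, parent * 2))?;
            let right = child(parent * 2 + 1).ok_or((depth + 1, parent * 2 + 1))?;
            next.insert(parent, merge(left, right));
        }
        // Deliberately sent digests at this depth join the frontier too, so
        // their parents get recomputed in the next iteration.
        for (&(_, idx), &digest) in received.iter().filter(|&(&(d, _), _)| d == depth) {
            next.entry(idx).or_insert(digest);
        }
        known = next;
    }
    known.get(&0).copied().ok_or((0, 0))
}
```

With the two-leaf example from the selection step, dropping one deliberately sent node makes `reconstruct` surface exactly the missing `(depth, index)`, matching the "missing minimal inner nodes" failure case below.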


Common failure cases:

  • Missing minimal inner nodes (incomplete data)
  • Inconsistent node hashes (corrupted data)
  • Unreachable nodes (structural inconsistency)

Complexity Analysis

Time Complexity

  • Worst case: O(n * d) where n = number of leaves, d = tree depth (typically 64)
  • Each iteration processes at most O(n) nodes
  • Maximum d iterations needed to reach root from leaves

Practical concerns

The re-construction algorithm will, by definition, succeed even with excess inner nodes transferred. This allows us to expose primitives for construction and re-construction and use them both in protobuf and in to_bytes/from_bytes without further backward-compat concerns, assuming the latter retains the general structure of (root, count, entries...).
