optimize PartialSmt serialization and deserialization, extract re-usable primitives for protobuf #746

@drahnr

Description

Context

When sharing SmtProofs - and many of them - we want to avoid duplicating the InnerNodes being sent (those with two children). The first step of optimization here is to construct a PartialSmt from the SmtProofs; with increasing likelihood towards the root of the tree, nodes are shared between proofs and can be deduplicated.

Status Quo to_bytes/from_bytes

Currently we serialize all inner nodes of a PartialSmt and reconstruct it from that full set of inner nodes.

Problem

  1. the current serialization is unversioned
  2. the current serialization is not done at the protobuf level (our current serialization solution) but represented as an opaque byte blob (minor; only a problem in conjunction with 1.)
  3. we still leave a lot on the table when it comes to space efficiency

Proposal

  1. punting, since implementing 2. resolves this
  2. move to protobuf as the serialization format, but keep the current to_bytes/from_bytes as is
  3. we can be more space-efficient, particularly re inner nodes - a lot of them can be re-constructed algorithmically, and the result will never be worse than either the set of SmtProofs or the full dense set of InnerNodes

Details Re 3

High-Level Flow

We traverse as follows:

Leaves → Intermediate Nodes → Root

Why? Because it lets us decide locally which entries need to be transferred for the result to be fully re-constructable.

Assumption: We are guaranteed to have a PartialSmt in front of us that is the superposition of a number of SmtProofs of the same tree/same root.

Algorithm Steps

Before diving in, we want a notion of re-constructability: an inner node is re-constructable if both of its children are either re-constructable themselves, or are stored deliberately because they were not re-constructable.

1. Initialize state

Known leaves are accessible via index; we store the indices in a set.

These are the nodes we can fall back to and send deliberately when something is not re-constructable. If we find ourselves needing a node outside this set to be sent deliberately, that is a bug in the algorithm.

2. BFS backtrack

Now we start from the leaves and backtrack towards the root. For every inner node we check which other inner nodes (or leaves, in the first step) we need to send for it to be re-constructable, and collect those that must be sent deliberately. Enqueue the parent, dedup, rinse and repeat.
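
The backtracking step might look like the following toy Rust sketch. The `(depth, index)` addressing, the `select_nodes_to_send` name, and the standalone types are illustrative assumptions, not miden-crypto's actual API:

```rust
use std::collections::BTreeSet;

/// (depth, index-within-depth); toy stand-in for a node index type.
type NodeIdx = (u8, u64);

/// Sender side: given the node indices present in the superposed PartialSmt
/// and the indices of known leaves at `max_depth`, return the inner nodes
/// whose digests must be sent deliberately because the receiver cannot
/// recompute them from their children.
fn select_nodes_to_send(
    present: &BTreeSet<NodeIdx>,
    leaves: &BTreeSet<u64>,
    max_depth: u8,
) -> BTreeSet<NodeIdx> {
    let mut to_send = BTreeSet::new();
    // Frontier: indices one level below the current depth that the receiver
    // will have, either sent or re-constructable. Leaves are always sent,
    // so they seed the frontier.
    let mut available: BTreeSet<u64> = leaves.clone();
    for depth in (0..max_depth).rev() {
        let mut next_available = BTreeSet::new();
        for &(_, index) in present.iter().filter(|&&(d, _)| d == depth) {
            let (left, right) = (index * 2, index * 2 + 1);
            if !(available.contains(&left) && available.contains(&right)) {
                // Not re-constructable: its digest travels explicitly.
                to_send.insert((depth, index));
            }
            next_available.insert(index);
        }
        available = next_available;
    }
    to_send
}
```

For example, superposing the proofs of adjacent leaves 0 and 1 in a depth-3 tree, only the two off-path siblings (2,1) and (1,1) need to be sent deliberately; the on-path nodes and the root are re-constructable.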

3. Collect

We only need to define an encoding for the given indices; we can continue to use the (index, digest) approach.
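
One possible byte layout, as a toy Rust sketch; the field widths and the `encode_entries` name are assumptions, chosen only to mirror the existing `(root, count, entries...)` shape:

```rust
/// Toy digest; stands in for a 32-byte digest.
type Digest = [u8; 32];

/// Encode the deliberately sent nodes as `root || count || (depth, index, digest)*`,
/// keeping the `(root, count, entries...)` shape of the existing format.
/// Field widths (little-endian) are purely illustrative.
fn encode_entries(root: Digest, entries: &[(u8, u64, Digest)]) -> Vec<u8> {
    let mut out = Vec::with_capacity(32 + 8 + entries.len() * (1 + 8 + 32));
    out.extend_from_slice(&root);
    out.extend_from_slice(&(entries.len() as u64).to_le_bytes());
    for (depth, index, digest) in entries {
        out.push(*depth);
        out.extend_from_slice(&index.to_le_bytes());
        out.extend_from_slice(digest);
    }
    out
}
```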


---{transfer}-->


Reconstruction

1. BFS backtrack

Start from the leaves again and reconstruct all parent nodes from the given children. If at any point a child node is not known (neither received nor reconstructed), it's an error. The result is again a PartialSmt.
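
The reconstruction pass might look like this toy Rust sketch; the `u64` digest, the `merge` function, and the map-based frontier are illustrative stand-ins, not the real hashing or storage:

```rust
use std::collections::{BTreeMap, BTreeSet};
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy digest and merge function; stand-ins for real digests and hashing.
type Digest = u64;

fn merge(left: Digest, right: Digest) -> Digest {
    let mut h = DefaultHasher::new();
    (left, right).hash(&mut h);
    h.finish()
}

/// Receiver side: rebuild all parents bottom-up from leaf digests (keyed by
/// leaf index at `max_depth`) plus the deliberately sent digests keyed by
/// (depth, index). Returns the recomputed root, or the first missing child.
fn reconstruct(
    leaves: &BTreeMap<u64, Digest>,
    received: &BTreeMap<(u8, u64), Digest>,
    max_depth: u8,
) -> Result<Digest, (u8, u64)> {
    // Frontier of digests known at the depth currently being consumed.
    let mut known: BTreeMap<u64, Digest> = leaves.clone();
    for depth in (0..max_depth).rev() {
        let mut next: BTreeMap<u64, Digest> = BTreeMap::new();
        // Every parent with at least one known child must be recomputed.
        let parents: BTreeSet<u64> = known.keys().map(|i| i / 2).collect();
        for parent in parents {
            let child = |idx: u64| {
                known
                    .get(&idx)
                    .copied()
                    .or_else(|| received.get(&(depth + 1, idx)).copied())
            };
            let left = child(parent * 2).ok_or((depth + 1, parent * 2))?;
            let right = child(parent * 2 + 1).ok_or((depth + 1, parent * 2 + 1))?;
            next.insert(parent, merge(left, right));
        }
        // Deliberately sent digests at this depth join the frontier too, so
        // their parents get recomputed in the next iteration.
        for (&(_, idx), &digest) in received.iter().filter(|&(&(d, _), _)| d == depth) {
            next.entry(idx).or_insert(digest);
        }
        known = next;
    }
    known.get(&0).copied().ok_or((0, 0))
}
```

With the two-leaf example from the selection step, dropping one deliberately sent node makes `reconstruct` surface exactly the missing `(depth, index)`, matching the "missing minimal inner nodes" failure case below.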


Common failure cases:

  • Missing minimal inner nodes (incomplete data)
  • Inconsistent node hashes (corrupted data)
  • Unreachable nodes (structural inconsistency)

Complexity Analysis

Time Complexity

  • Worst case: O(n * d) where n = number of leaves, d = tree depth (typically 64)
  • Each iteration processes at most O(n) nodes
  • Maximum d iterations needed to reach root from leaves

Practical concerns

The re-construction algorithm will, by definition, succeed even with excess inner nodes transferred. This allows us to expose primitives for construction and re-construction and use them both in protobuf and in to_bytes/from_bytes without further backward-compat concerns, assuming the latter retains the general structure of (root, count, entries...).
