Skip to content

Conversation

@iamrecursion
Copy link
Collaborator

@iamrecursion iamrecursion commented Jan 9, 2026

Describe your changes

This commit primarily exists to sketch out the interface for the new forest implementation, aimed at getting agreement on the interface and basic architecture before implementation begins in earnest. It does this directly on the LargeSmtForest type, which will be the user-facing portion of the API and likely later renamed to simply SmtForest.

As part of doing this, it includes a number of important utility types, including:

  • Backend, which defines the minimal set of features required for the SMT forest's pluggable backends. Backends are aware of tree semantics.
  • Operation, contained in SmtUpdateBatch and SmtForestUpdateBatch which define sets of modifications to be made to individual trees in the forest and the entire forest respectively.
  • InMemoryPrefix, which implements the in-memory portion of a tree for the backends that keep only some prefix in memory. This was originally implemented for a different approach, but the code is still valid and will be useful in the future, and so is still included.
  • A small set of utility types: SubtreeLevels, RootInfo, and LinearIndex. These were also implemented for a different approach but will be useful in the future.
  • An assortment of error types to ensure strongly-typed error handling throughout the SMT forest API.

Contains work toward the completion of #670. I have set it as "no changelog" as it is not yet intended to be part of the public API.

Checklist before requesting a review

  • Repo forked and branch created from next according to naming convention.
  • Commit messages and codestyle follow conventions.
  • Relevant issues are linked in the PR description.
  • Tests added for new functionality.
  • Documentation/comments updated according to changes.

@iamrecursion iamrecursion self-assigned this Jan 9, 2026
@iamrecursion iamrecursion added the merkle Related to Merkle trees or MMRs label Jan 9, 2026
@iamrecursion iamrecursion force-pushed the forest-skeleton branch 2 times, most recently from 8316321 to a67dff8 Compare January 9, 2026 11:23
@iamrecursion iamrecursion added the no changelog This PR does not require an entry in the `CHANGELOG.md` file label Jan 9, 2026
This commit primarily exists to sketch out the interface for the new
forest implementation, aimed at getting agreement on the interface and
basic architecture before implementation begins in earnest. It does this
directly on the `LargeSmtForest` type, which will be the user-facing
portion of the API and likely later renamed to simply `SmtForest`.

As part of doing this, it includes a number of important utility types,
including:

- `Backend`, which defines the minimal set of features required for the
  SMT forest's pluggable backends. Backends are aware of tree semantics.
- `Operation`, contained in `SmtUpdateBatch` and `SmtForestUpdateBatch`
  which define sets of modifications to be made to individual trees in
  the forest and the entire forest respectively.
- `InMemoryPrefix`, which implements the in-memory portion of a tree for
  the backends that keep only some prefix in memory. This was originally
  implemented for a different approach, but the code is still valid and
  will be useful in the future, and so is still included.
- A small set of utility types: `SubtreeLevels`, `RootInfo`, and
  `LinearIndex`. These were also implemented for a different approach
  but will be useful in the future.
- An assortment of error types to ensure strongly-typed error handling
  throughout the SMT forest API.
@iamrecursion iamrecursion marked this pull request as ready for review January 9, 2026 11:55
@iamrecursion
Copy link
Collaborator Author

I wasn't sure if it made sense to include the extra bits of implementation that will be useful in the future. If you don't want them as part of this PR I can pull them out until they are needed; my main worry was just them bitrotting and ending up with a difficult merge/rebase when they do get used. Let me know!

@iamrecursion
Copy link
Collaborator Author

@drahnr, does the API for LargeSmtForest satisfy what you were asking for in #670?

Copy link
Contributor

@huitseeker huitseeker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a nice simplification compared to the Storage trait from PR #700.

The decision to make backends "aware of tree semantics" (as the module doc says) but not responsible for the actual tree logic seems like a good middle ground.

pub fn knows_root(&self, root: Word) -> RootInfo {
if self.histories.contains_key(&root) {
RootInfo::LatestVersion
} else if let Some(h) = self.histories.get(&root)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there might be a logic issue here. On line 175 we check if root is NOT a key in self.histories (which would make it a latest version). Then on line 177 we call self.histories.get(&root) again, but since we already know root is not a key, this will always return None.

So the HistoricalVersion branch seems unreachable. Did you mean to iterate over all the history values to see if any of them contain this root? Something like:

} else if self.histories.values().any(|h| h.is_known_root(root)) {
    RootInfo::HistoricalVersion

Or maybe I am missing something about how the histories map is keyed?

/// the provided `backend`.
pub fn new(_backend: B) -> Result<Self, B::Error> {
todo!("LargeSmtForest::new")
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at issue #670, the design sketch shows an add_tree method for adding new independent trees to the forest:

fn add_tree(&self, leaves: LeafMap) -> Result<Root>;

I do not see an equivalent in this skeleton. How would a caller add a brand new tree (not derived from modifying an existing one) to the forest? Is the idea that you can call modify_tree on the empty tree root to create one, or is add_tree planned for a follow-up?

/// `operations`.
/// - If applying the provided `operations` results in no changes to a given lineage of trees in
/// the forest, then no new tree must be allocated in that lineage.
fn modify_forest(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This new Backend trait is simpler and does not have explicit begin/commit methods. But I am curious how error recovery would work for persistent backends. If modify_forest partially succeeds (say, updates 3 of 5 trees) and then hits an error, what is the expected state? Should the trait doc clarify whether implementations must be atomic (all-or-nothing) or whether partial failures are allowed?

(this is the thrust of my comments on abort /rollback in #700, BTW — though I'm more interested in the atomicity contract in the abstract sense than how that's conveyed in the API, here)

Copy link
Contributor

@bobbinth bobbinth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Thank you! I left some comments inline - but we are pretty close.

I did not review InMemoryPrefix or subtree-related code. I'd probably remove them from this PR because this is more about aligning on the LargeSmtForest interface, and it would speed up review/merge process.

Comment on lines +57 to +63
/// Returns the version of the tree with the provided `root`.
fn version(&self, root: Word) -> Result<VersionId, Self::Error>;

/// Returns an iterator over all the tree roots and versions that the backend knows about.
///
/// The iteration order is unspecified.
fn versions(&self) -> Result<impl Iterator<Item = (Word, VersionId)>, Self::Error>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The backend doesn't need to know anything about the versions - it should store only the latest states of the trees. So, I don't think these methods are needed.

Comment on lines +68 to +82
/// Performs the provided `operations` on the tree with the provided `root`, returning the new
/// root.
///
/// Implementations must guarantee the following behavior, with non-conforming implementations
/// considered to be a bug:
///
/// - At most one new root must be added to the forest for the entire batch.
/// - If applying the provided `operations` results in no changes to the tree, no new tree must
/// be allocated.
fn modify_tree(
&mut self,
root: Word,
new_version: VersionId,
operations: SmtUpdateBatch,
) -> Result<Word, Self::Error>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should return a MutationSet that describes "reverse mutations" - similar to what we get from LargeSmt::apply_mutations_with_reversion().

Also, nit: I'd rename operation -> updates and the method name to update_tree. So, the signature would look something like:

fn update_tree(
    &mut self,
    root: Word,
    new_version: VersionId,
    updates: SmtUpdateBatch,
) -> Result<MutationSet, Self::Error>;

Comment on lines +97 to +101
fn modify_forest(
&mut self,
operations: SmtForestUpdateBatch,
) -> Result<Map<Word, Word>, Self::Error>;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to the above comment, we should be returning mutation sets from here. The signature could look something like:

fn update_forest(
    &mut self,
    upsates: SmtForestUpdateBatch,
) -> Result<Vec<MutationSet>, Self::Error>;

Comment on lines +338 to +350
// We start by clearing any history for which the `version` corresponds to the latest
// version and hence the full tree.
self.backend.versions()?.for_each(|(root, v)| {
if v == version {
self.histories
.get_mut(&root)
.expect(
"A full tree did not have a corresponding history, but is required
to",
)
.clear();
}
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not clear to me why we need this. Wouldn't code on lines 354 - 357 below do the same thing?

Comment on lines +352 to +356
// Then we just run through all the histories and truncate them to this version if needed,
// which provides the correct behaviour.
self.histories.values_mut().for_each(|h| {
h.truncate(version);
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the history becomes empty post truncation, should we remove it from self.histories map. Or is the idea to keep histories for all roots managed by the forest - even if some histories are empty?

If we do keep entries for all roots, it may be good to have a separate map which keeps track of trees with non-empty histories. This way, we won't need to iterate over all histories here (which could be expensive because there could be many millions of trees in the forest).

Comment on lines +141 to +146
pub fn roots(&self) -> Set<Word> {
let mut roots: Set<Word> = self.histories.keys().cloned().collect();
self.histories.values().for_each(|h| roots.extend(h.roots()));
roots.insert(*EmptySubtreeRoots::entry(SMT_DEPTH, 0));
roots
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related to one of my previous comments: this would work only if we keep histories for all latest roots (even if histories are empty). This is probably a good idea and should simplify some things.

Assuming the above, this should probably return an iterator as there could be many millions of roots in the forest.

Separately, it would also be good to have a method that returns a set (or an iterator) of all current roots (omitting the historical ones).

Comment on lines +48 to +56
/// A batch of operations that can be performed on an arbitrary tree in a forest.
#[derive(Clone, Debug, Eq, PartialEq)]
pub struct SmtUpdateBatch {
/// The operations to be performed on a tree.
operations: Vec<Operation>,

/// The version that corresponds to the tree produced by applying these `operations`.
version: VersionId,
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned in one of the comments, I don't think we need to keep track of the version here. Or if we do, we don't need to pass the version into LargeSmtForest::modify_tree() method separately.

Comment on lines +122 to +125
pub struct SmtForestUpdateBatch {
/// The operations associated with each targeted tree in the forest.
operations: Map<Word, SmtUpdateBatch>,
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we keep the version in the SmtUpdateBatch struct, then we should probably add version here too. We'll also need to make sure that all updates in SmtForestUpdateBatch have the same version.

Comment on lines +152 to +156
pub fn operations(&mut self, root: Word, version_if_ne: VersionId) -> &mut SmtUpdateBatch {
self.operations
.entry(root)
.or_insert_with(|| SmtUpdateBatch::empty(version_if_ne))
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned in the previous comment, all tree updates that are a part of SmtForestUpdateBatch should have the same version. So, we don't need to pass version_if_ne parameter here.

Comment on lines +10 to +12
/// The queried root corresponds to a tree that is _not_ the latest version of a given tree in
/// the forest.
HistoricalVersion,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be HistoricalVersion(VersionId).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

merkle Related to Merkle trees or MMRs no changelog This PR does not require an entry in the `CHANGELOG.md` file

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants