Skip to content

Conversation

@iamrecursion
Copy link
Collaborator

@iamrecursion iamrecursion commented Jan 9, 2026

Describe your changes

This commit primarily exists to sketch out the interface for the new forest implementation, aimed at getting agreement on the interface and basic architecture before implementation begins in earnest. It does this directly on the LargeSmtForest type, which will be the user-facing portion of the API and likely later renamed to simply SmtForest.

As part of doing this, it includes a number of important utility types, including:

  • Backend, which defines the minimal set of features required for the SMT forest's pluggable backends. Backends are aware of tree semantics.
  • Operation, contained in SmtUpdateBatch and SmtForestUpdateBatch which define sets of modifications to be made to individual trees in the forest and the entire forest respectively.
  • InMemoryPrefix, which implements the in-memory portion of a tree for the backends that keep only some prefix in memory. This was originally implemented for a different approach, but the code is still valid and will be useful in the future, and so is still included.
  • A small set of utility types: SubtreeLevels, RootInfo, and LinearIndex. These were also implemented for a different approach but will be useful in the future.
  • An assortment of error types to ensure strongly-typed error handling throughout the SMT forest API.

Contains work toward the completion of #670. I have set it as "no changelog" as it is not yet intended to be part of the public API.

Checklist before requesting a review

  • Repo forked and branch created from next according to naming convention.
  • Commit messages and codestyle follow conventions.
  • Relevant issues are linked in the PR description.
  • Tests added for new functionality.
  • Documentation/comments updated according to changes.

@iamrecursion iamrecursion self-assigned this Jan 9, 2026
@iamrecursion iamrecursion added the merkle Related to Merkle trees or MMRs label Jan 9, 2026
@iamrecursion iamrecursion force-pushed the forest-skeleton branch 2 times, most recently from 8316321 to a67dff8 Compare January 9, 2026 11:23
@iamrecursion iamrecursion added the no changelog This PR does not require an entry in the `CHANGELOG.md` file label Jan 9, 2026
This commit primarily exists to sketch out the interface for the new
forest implementation, aimed at getting agreement on the interface and
basic architecture before implementation begins in earnest. It does this
directly on the `LargeSmtForest` type, which will be the user-facing
portion of the API and likely later renamed to simply `SmtForest`.

As part of doing this, it includes a number of important utility types,
including:

- `Backend`, which defines the minimal set of features required for the
  SMT forest's pluggable backends. Backends are aware of tree semantics.
- `Operation`, contained in `SmtUpdateBatch` and `SmtForestUpdateBatch`
  which define sets of modifications to be made to individual trees in
  the forest and the entire forest respectively.
- `InMemoryPrefix`, which implements the in-memory portion of a tree for
  the backends that keep only some prefix in memory. This was originally
  implemented for a different approach, but the code is still valid and
  will be useful in the future, and so is still included.
- A small set of utility types: `SubtreeLevels`, `RootInfo`, and
  `LinearIndex`. These were also implemented for a different approach
  but will be useful in the future.
- An assortment of error types to ensure strongly-typed error handling
  throughout the SMT forest API.
@iamrecursion iamrecursion marked this pull request as ready for review January 9, 2026 11:55
@iamrecursion
Copy link
Collaborator Author

I wasn't sure if it made sense to include the extra bits of implementation that will be useful in the future. If you don't want them as part of this PR I can pull them out until they are needed; my main worry was just them bitrotting and ending up with a difficult merge/rebase when they do get used. Let me know!

@iamrecursion
Copy link
Collaborator Author

@drahnr, does the API for LargeSmtForest satisfy what you were asking for in #670?

Copy link
Contributor

@huitseeker huitseeker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a nice simplification compared to the Storage trait from PR #700.

The decision to make backends "aware of tree semantics" (as the module doc says) but not responsible for the actual tree logic seems like a good middle ground.

/// the provided `backend`.
pub fn new(_backend: B) -> Result<Self, B::Error> {
todo!("LargeSmtForest::new")
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at issue #670, the design sketch shows an add_tree method for adding new independent trees to the forest:

fn add_tree(&self, leaves: LeafMap) -> Result<Root>;

I do not see an equivalent in this skeleton. How would a caller add a brand new tree (not derived from modifying an existing one) to the forest? Is the idea that you can call modify_tree on the empty tree root to create one, or is add_tree planned for a follow-up?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The plan is currently that you just modify tree on an empty root. Is add_tree needed or useful?

/// `operations`.
/// - If applying the provided `operations` results in no changes to a given lineage of trees in
/// the forest, then no new tree must be allocated in that lineage.
fn modify_forest(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This new Backend trait is simpler and does not have explicit begin/commit methods. But I am curious how error recovery would work for persistent backends. If modify_forest partially succeeds (say, updates 3 of 5 trees) and then hits an error, what is the expected state? Should the trait doc clarify whether implementations must be atomic (all-or-nothing) or whether partial failures are allowed?

(this is the thrust of my comments on abort /rollback in #700, BTW — though I'm more interested in the atomicity contract in the abstract sense than how that's conveyed in the API, here)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thinking on this is that atomicity is now an internal concept to the Backend. It was never intended to be user-facing in the previous design, just exposed to the forest itself, as I don't think we ever want to contend with user-facing rollback.

The main reason the concept exists is to prevent data corruption in the persistent backend. If something like your above scenario occurs in the persistent backend the whole transaction will be backed out.

That said, I think documenting something is probably valuable. What would you expect to happen for the in-memory backend should updating the 4th tree fail? To roll back to the previous state, or allow the partial delta? The same question applies if the backend is part-way through applying operations and fails.

I'm not really an API client here, so understanding how this should behave is important to me!

Copy link
Contributor

@bobbinth bobbinth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Thank you! I left some comments inline - but we are pretty close.

I did not review InMemoryPrefix or subtree-related code. I'd probably remove them from this PR because this is more about aligning on the LargeSmtForest interface, and it would speed up review/merge process.

Comment on lines 338 to 350
// We start by clearing any history for which the `version` corresponds to the latest
// version and hence the full tree.
self.backend.versions()?.for_each(|(root, v)| {
if v == version {
self.histories
.get_mut(&root)
.expect(
"A full tree did not have a corresponding history, but is required
to",
)
.clear();
}
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not clear to me why we need this. Wouldn't code on lines 354 - 357 below do the same thing?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not currently. The history will refuse to remove the last delta as it can't know if the truncation point it is given is between the latest delta and the current version, or whether it is newer than the current version.

Comment on lines 352 to 356
// Then we just run through all the histories and truncate them to this version if needed,
// which provides the correct behaviour.
self.histories.values_mut().for_each(|h| {
h.truncate(version);
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the history becomes empty post truncation, should we remove it from self.histories map. Or is the idea to keep histories for all roots managed by the forest - even if some histories are empty?

If we do keep entries for all roots, it may be good to have a separate map which keeps track of trees with non-empty histories. This way, we won't need to iterate over all histories here (which could be expensive because there could be many millions of trees in the forest).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had intended that we keep histories for all lineages in the forest as it simplifies history operation by not having to deallocate/reallocate histories. I had a comment stating this, but it clearly got lost in the shuffle! I've added it back, and also added a set tracking which histories actually contain deltas.

@drahnr
Copy link
Contributor

drahnr commented Jan 12, 2026

@drahnr, does the API for LargeSmtForest satisfy what you were asking for in #670?

Looks good, left 👍 at the specific APIs that 'd be needed.

This commit predominantly cleans up a few elements of the forest design,
but also removes some code that was originally included for later usage.
@iamrecursion
Copy link
Collaborator Author

I've pushed a commit that addresses the majority of comments left above. There are some open discussions in my mind with regards to particular behaviours, so if we could clarify those that would be great!

Copy link
Contributor

@bobbinth bobbinth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Thank you! I left a coupe more small comments inline.

But, I just realized that I missed one major thing: I don't think we can actually use tree roots to identify trees. This was possible in the old implementation of the SmtForest where subtrees between trees were shared. But in this implementation, it is perfectly possible for two (or more) trees to exist with the same root but actually be distinct logical trees.

The main example of this is that we have many accounts and it may so happen that two accounts end up having the same exact storage trees. This could happen at the same version or at different versions. For example, account A could have tree with root R1 at block 10, and account B could have a tree with root R1 at block 20. Or both of them could have trees with roots R2 at block 30.

The fact that there are multiple trees doesn't really matter for querying, but it does matter for updates. For example, if both accounts have the R2 tree at block 30 and account B gets updated, it's tree may become R3 at block 31 - but this should affect account As tree.

I haven't thought too much about it yet, but I think a solution could be to just pass something like context or domain together with a tree root. So, instead of saying "I want to do something with a tree with root R1" we'd be saying "I want to do something with a tree with root R1 in the context A".

I believe this should address the use cases of miden-node as we always know in the context of which account we are querying the SmtForest (but @drahnr, please confirm).

So, there are two ways to proceed:

  1. We can merge this PR and then make a follow-up PR to fix the shortcomings.
  2. We can fix things in this PR.

I have a slight preference for the first approach.

@iamrecursion iamrecursion merged commit 4be02a0 into next Jan 13, 2026
27 checks passed
@iamrecursion iamrecursion deleted the forest-skeleton branch January 13, 2026 10:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

merkle Related to Merkle trees or MMRs no changelog This PR does not require an entry in the `CHANGELOG.md` file

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants