Create the storage abstraction for the new forest #700

iamrecursion · 2025-12-10T10:43:26Z

Describe your changes

This commit contains the Storage and StoredTreeHandle traits that encapsulate the operations that must be provided by a storage backend. They are designed so that the backend provides simple data operations, while the actual logic resides in the forest itself. This gives possible implementations enough leeway to be efficient while also providing the necessary functionality.

With no concrete implementations included in this commit, this serves primarily as documentation of the design space. The first concrete implementation of Storage and StoredTreeHandle will come in the commit that contains the SmtForest type itself.

A few places in the current code are marked as #[allow(dead_code)]. This is to prevent warnings from failing the build while still allowing the private code to be merged incrementally.

I also argue that, as this is internal code, it does not require a changelog entry. Please let me know if you disagree.

Checklist before requesting a review

Repo forked and branch created from next according to naming convention.
Commit messages and codestyle follow conventions.
Relevant issues are linked in the PR description.
Tests added for new functionality.
Documentation/comments updated according to changes.

bobbinth · 2026-01-05T08:46:15Z

miden-crypto/src/merkle/smt/large_forest/storage/mod.rs

+/// All methods are intended to handle potential storage errors by returning the trait's [`Result`]
+/// type. Suggested errors are documented for each method, but these may be changed by any concrete
+/// implementation of the trait.
+pub trait Storage


I'm wondering if we are overcomplicating this interface quite a bit. Specifically, I think we should be able to leave it up to the specific storage implementation to decide how to manage in-memory vs. on-disk data and how to implement transaction boundaries.

I'd rather make the storage interface simpler and move the complexity into the storage implementations. There are very few things that I think we need from the storage:

Batch mutation: given a tree root and a list of updates (both insertions and removals), update the underlying tree and return MutationSet with inverse mutations. This could be equivalent to a combination of LargeSmt::compute_mutations() and LargeSmt::apply_mutations_with_reversion().

Lookups: ability to get an SmtProof for a given key in a given tree - basically equivalent to LargeSmt::open().

There may be other utility methods that could be required (e.g., getting all tree roots or something similar), but these would be the main ones.

Maybe I'm misunderstanding your point here, but that really seems to me like effectively pushing all of the logic for the forest itself into the storage. I was under the impression that we wanted to keep storage dumb as rocks, where providing these operations requires the storage to be more complex, which goes against our previous discussions.

Maybe "storage" is not the right term here. Basically, at the SmtForest level, I'd like to end up with two high-level components:

In-memory histories - i.e., histories: Map<Word, History>

Latest versions of SMTs. This component would have the interface similar I described above.

Maybe there would be another layer within the second component that makes it customizable - but we could also decide on the interface for this later. For example, the simple in-memory implementation of the second component could be:

pub struct InMemorySmtStorage { maps: Map<Word, Smt>, }

The RocksDB-based one would look quite different of course - but I'm not sure we need to re-invent much for the in-memory approach.

I will summarise the discussion on Slack here when we get somewhere, but my instinct here is that this separation will reduce the available scope for data-parallelism, which is something that the future persistence will rely on heavily.

More to come...

huitseeker · 2026-01-05T10:29:36Z

miden-crypto/src/merkle/smt/large_forest/storage/mod.rs

+
+/// The backing storage for the large SMT forest, providing the necessary methods for reading
+/// from and writing to an arbitrary storage structure while allowing the forest itself to be
+/// storage agnostic.


I wonder if this interface could be simplified by pushing more logic into the forest and keeping the storage trait minimal.

For comparison, the minibytes crate from Sapling (https://github.com/facebook/sapling/blob/13b6958fbbf863cb4b08d09d158caff95e834518/eden/scm/lib/minibytes/src/lib.rs) uses a tiny BytesOwner trait that's basically just AsRef<[u8]> + Send + Sync + 'static. All the slicing, cloning, and lifetime logic lives in the Bytes wrapper type, not the trait. The owner just provides raw bytes.

That pattern seems to align with what @bobbinth is suggesting above: let the storage be a simple data store, and let the forest handle tree semantics. The existing SmtStorage trait in large/storage/mod.rs does something similar with its apply(StorageUpdates) method, it takes a batch of changes rather than exposing fine-grained primitives.

Unfortunately treating storage simply as bytes is not sufficient here. Storage has to have some knowledge of what operations are required to be able to efficiently perform access patterns for performance.

I want to keep storage as stupid as possible, and indeed I think what I have proposed here is that. It doesn't perform any actual tree operations—which is why the equivalent to apply cannot exist and we use transactions instead—but it provides semantically-relevant operations to allow the actual forest to perform the tree operations in a performant way.

You point to SmtStorage, but it also provides a whole host of finer-grained methods that are used by LargeSmt where relevant.

huitseeker · 2026-01-05T10:30:10Z

miden-crypto/src/merkle/smt/large_forest/storage/mod.rs

+    /// - [`StorageError::Leaf`] if a malformed leaf is found during the query.
+    fn entry_count(&self) -> Result<Word>;
+
+    /// Sets the number of **unique entries** that are currently stored across all leaf nodes for


Should this use usize instead of Word? Everywhere else in the codebase entry_count is a usize:

large/storage/mod.rs:49: fn entry_count(&self) -> Result<usize, StorageError>

large/mod.rs:286: entry_count: usize

large/batch_ops.rs: uses isize for deltas

A Word is 256 bits which seems like overkill for a count.

I picked Word because technically there can be $2^{256}$ entries in the forest, but if you think that ever exceeding $2^{64}$ is effectively impossible I'm happy to change it to usize.

huitseeker · 2026-01-05T10:30:17Z

miden-crypto/src/merkle/smt/large_forest/storage/mod.rs

+    ///   did not allocate to the caller.
+    fn commit(&self, handle: Self::TransactionHandle) -> Result<()>;
+
+    /// Gets a handle to the storage as if it only consists of the data for the tree with the


Is there intentionally no rollback or abort method? The docs mention that backends supporting transactions should guard operations on there being an active transaction, but if something fails between begin() and commit(), how does the caller signal that the transaction should be abandoned?

I'm mostly considering data-consistency for persistent data stores in the case of crashes, rather than any kind of transaction failure that could occur at runtime. Can you explain the kind of scenarios that you are envisioning?

This commit contains the `Storage` and `StoredTreeHandle` traits that encapsulate the operations that must be provided by a storage backend. They are designed so that the backend provides simple data operations, while the actual logic resides in the forest itself. This gives possible implementations enough leeway to be efficient while also providing the necessary functionality. With no concrete implementations included in this commit, this serves primarily as documentation of the design space. The first concrete implementation of `Storage` and `StoredTreeHandle` will come in the commit that contains the `SmtForest` type itself.

iamrecursion · 2026-01-08T11:31:17Z

Closing in favour of an upcoming PR with a different approach.

iamrecursion requested a review from bobbinth December 10, 2025 10:43

iamrecursion self-assigned this Dec 10, 2025

iamrecursion added no changelog This PR does not require an entry in the `CHANGELOG.md` file merkle Related to Merkle trees or MMRs labels Dec 10, 2025

iamrecursion force-pushed the forest-storage-interface branch from 6f31889 to 042b935 Compare December 10, 2025 10:43

iamrecursion marked this pull request as draft December 10, 2025 10:44

iamrecursion force-pushed the forest-storage-interface branch 6 times, most recently from 265ab72 to 84056a0 Compare December 10, 2025 11:57

iamrecursion marked this pull request as ready for review December 10, 2025 13:57

This was referenced Dec 10, 2025

Implement the Prefix type for the new forest #701

Closed

Sketch out the forest interface #702

Closed

iamrecursion force-pushed the forest-storage-interface branch from 84056a0 to 5556cd8 Compare December 29, 2025 09:37

bobbinth reviewed Jan 5, 2026

View reviewed changes

huitseeker reviewed Jan 5, 2026

View reviewed changes

iamrecursion force-pushed the forest-storage-interface branch 2 times, most recently from 6ada5f6 to 1571406 Compare January 7, 2026 10:18

iamrecursion force-pushed the forest-storage-interface branch from 1571406 to 416bd71 Compare January 8, 2026 11:28

iamrecursion closed this Jan 8, 2026

iamrecursion deleted the forest-storage-interface branch January 8, 2026 11:32

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Create the storage abstraction for the new forest #700

Create the storage abstraction for the new forest #700

Uh oh!

iamrecursion commented Dec 10, 2025

Uh oh!

bobbinth Jan 5, 2026

Uh oh!

iamrecursion Jan 6, 2026

Uh oh!

bobbinth Jan 6, 2026

Uh oh!

iamrecursion Jan 7, 2026 •

edited

Loading

Uh oh!

huitseeker Jan 5, 2026 •

edited

Loading

Uh oh!

iamrecursion Jan 6, 2026

Uh oh!

huitseeker Jan 5, 2026

Uh oh!

iamrecursion Jan 6, 2026 •

edited

Loading

Uh oh!

huitseeker Jan 5, 2026

Uh oh!

iamrecursion Jan 6, 2026

Uh oh!

iamrecursion commented Jan 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Create the storage abstraction for the new forest #700

Create the storage abstraction for the new forest #700

Uh oh!

Conversation

iamrecursion commented Dec 10, 2025

Describe your changes

Checklist before requesting a review

Uh oh!

bobbinth Jan 5, 2026

Choose a reason for hiding this comment

Uh oh!

iamrecursion Jan 6, 2026

Choose a reason for hiding this comment

Uh oh!

bobbinth Jan 6, 2026

Choose a reason for hiding this comment

Uh oh!

iamrecursion Jan 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

huitseeker Jan 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

iamrecursion Jan 6, 2026

Choose a reason for hiding this comment

Uh oh!

huitseeker Jan 5, 2026

Choose a reason for hiding this comment

Uh oh!

iamrecursion Jan 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

huitseeker Jan 5, 2026

Choose a reason for hiding this comment

Uh oh!

iamrecursion Jan 6, 2026

Choose a reason for hiding this comment

Uh oh!

iamrecursion commented Jan 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

iamrecursion Jan 7, 2026 •

edited

Loading

huitseeker Jan 5, 2026 •

edited

Loading

iamrecursion Jan 6, 2026 •

edited

Loading