Skip to content

Conversation

@iamrecursion
Copy link
Collaborator

Describe your changes

This commit contains the Storage and StoredTreeHandle traits that encapsulate the operations that must be provided by a storage backend. They are designed so that the backend provides simple data operations, while the actual logic resides in the forest itself. This gives possible implementations enough leeway to be efficient while also providing the necessary functionality.

With no concrete implementations included in this commit, this serves primarily as documentation of the design space. The first concrete implementation of Storage and StoredTreeHandle will come in the commit that contains the SmtForest type itself.

A few places in the current code are marked as #[allow(dead_code)]. This is to prevent warnings from failing the build while still allowing the private code to be merged incrementally.

I also argue that, as this is internal code, it does not require a changelog entry. Please let me know if you disagree.

Checklist before requesting a review

  • Repo forked and branch created from next according to naming convention.
  • Commit messages and codestyle follow conventions.
  • Relevant issues are linked in the PR description.
  • Tests added for new functionality.
  • Documentation/comments updated according to changes.

@iamrecursion iamrecursion self-assigned this Dec 10, 2025
@iamrecursion iamrecursion added no changelog This PR does not require an entry in the `CHANGELOG.md` file merkle Related to Merkle trees or MMRs labels Dec 10, 2025
@iamrecursion iamrecursion force-pushed the forest-storage-interface branch from 6f31889 to 042b935 Compare December 10, 2025 10:43
@iamrecursion iamrecursion marked this pull request as draft December 10, 2025 10:44
@iamrecursion iamrecursion force-pushed the forest-storage-interface branch 6 times, most recently from 265ab72 to 84056a0 Compare December 10, 2025 11:57
@iamrecursion iamrecursion marked this pull request as ready for review December 10, 2025 13:57
@iamrecursion iamrecursion force-pushed the forest-storage-interface branch from 84056a0 to 5556cd8 Compare December 29, 2025 09:37
/// All methods are intended to handle potential storage errors by returning the trait's [`Result`]
/// type. Suggested errors are documented for each method, but these may be changed by any concrete
/// implementation of the trait.
pub trait Storage
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if we are overcomplicating this interface quite a bit. Specifically, I think we should be able to leave it up to the specific storage implementation to decide how to manage in-memory vs. on-disk data and how to implement transaction boundaries.

I'd rather make the storage interface simpler and move the complexity into the storage implementations. There are very few things that I think we need from the storage:

  1. Batch mutation: given a tree root and a list of updates (both insertions and removals), update the underlying tree and return MutationSet with inverse mutations. This could be equivalent to a combination of LargeSmt::compute_mutations() and LargeSmt::apply_mutations_with_reversion().
  2. Lookups: ability to get an SmtProof for a given key in a given tree - basically equivalent to LargeSmt::open().

There may be other utility methods that could be required (e.g., getting all tree roots or something similar), but these would be the main ones.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I'm misunderstanding your point here, but that really seems to me like effectively pushing all of the logic for the forest itself into the storage. I was under the impression that we wanted to keep storage dumb as rocks, where providing these operations requires the storage to be more complex, which goes against our previous discussions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe "storage" is not the right term here. Basically, at the SmtForest level, I'd like to end up with two high-level components:

  1. In-memory histories - i.e., histories: Map<Word, History>
  2. Latest versions of SMTs. This component would have the interface similar I described above.

Maybe there would be another layer within the second component that makes it customizable - but we could also decide on the interface for this later. For example, the simple in-memory implementation of the second component could be:

pub struct InMemorySmtStorage {
    maps: Map<Word, Smt>,
}

The RocksDB-based one would look quite different of course - but I'm not sure we need to re-invent much for the in-memory approach.

Copy link
Collaborator Author

@iamrecursion iamrecursion Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will summarise the discussion on Slack here when we get somewhere, but my instinct here is that this separation will reduce the available scope for data-parallelism, which is something that the future persistence will rely on heavily.

More to come...


/// The backing storage for the large SMT forest, providing the necessary methods for reading
/// from and writing to an arbitrary storage structure while allowing the forest itself to be
/// storage agnostic.
Copy link
Contributor

@huitseeker huitseeker Jan 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this interface could be simplified by pushing more logic into the forest and keeping the storage trait minimal.

For comparison, the minibytes crate from Sapling (https://github.com/facebook/sapling/blob/13b6958fbbf863cb4b08d09d158caff95e834518/eden/scm/lib/minibytes/src/lib.rs) uses a tiny BytesOwner trait that's basically just AsRef<[u8]> + Send + Sync + 'static. All the slicing, cloning, and lifetime logic lives in the Bytes wrapper type, not the trait. The owner just provides raw bytes.

That pattern seems to align with what @bobbinth is suggesting above: let the storage be a simple data store, and let the forest handle tree semantics. The existing SmtStorage trait in large/storage/mod.rs does something similar with its apply(StorageUpdates) method, it takes a batch of changes rather than exposing fine-grained primitives.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately treating storage simply as bytes is not sufficient here. Storage has to have some knowledge of what operations are required to be able to efficiently perform access patterns for performance.

I want to keep storage as stupid as possible, and indeed I think what I have proposed here is that. It doesn't perform any actual tree operations—which is why the equivalent to apply cannot exist and we use transactions instead—but it provides semantically-relevant operations to allow the actual forest to perform the tree operations in a performant way.

You point to SmtStorage, but it also provides a whole host of finer-grained methods that are used by LargeSmt where relevant.

/// - [`StorageError::Leaf`] if a malformed leaf is found during the query.
fn entry_count(&self) -> Result<Word>;

/// Sets the number of **unique entries** that are currently stored across all leaf nodes for
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this use usize instead of Word? Everywhere else in the codebase entry_count is a usize:

  • large/storage/mod.rs:49: fn entry_count(&self) -> Result<usize, StorageError>
  • large/mod.rs:286: entry_count: usize
  • large/batch_ops.rs: uses isize for deltas

A Word is 256 bits which seems like overkill for a count.

Copy link
Collaborator Author

@iamrecursion iamrecursion Jan 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I picked Word because technically there can be $2^{256}$ entries in the forest, but if you think that ever exceeding $2^{64}$ is effectively impossible I'm happy to change it to usize.

/// did not allocate to the caller.
fn commit(&self, handle: Self::TransactionHandle) -> Result<()>;

/// Gets a handle to the storage as if it only consists of the data for the tree with the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there intentionally no rollback or abort method? The docs mention that backends supporting transactions should guard operations on there being an active transaction, but if something fails between begin() and commit(), how does the caller signal that the transaction should be abandoned?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm mostly considering data-consistency for persistent data stores in the case of crashes, rather than any kind of transaction failure that could occur at runtime. Can you explain the kind of scenarios that you are envisioning?

@iamrecursion iamrecursion force-pushed the forest-storage-interface branch 2 times, most recently from 6ada5f6 to 1571406 Compare January 7, 2026 10:18
This commit contains the `Storage` and `StoredTreeHandle` traits that
encapsulate the operations that must be provided by a storage backend.
They are designed so that the backend provides simple data operations,
while the actual logic resides in the forest itself. This gives possible
implementations enough leeway to be efficient while also providing the
necessary functionality.

With no concrete implementations included in this commit, this serves
primarily as documentation of the design space. The first concrete
implementation of `Storage` and `StoredTreeHandle` will come in the
commit that contains the `SmtForest` type itself.
@iamrecursion iamrecursion force-pushed the forest-storage-interface branch from 1571406 to 416bd71 Compare January 8, 2026 11:28
@iamrecursion
Copy link
Collaborator Author

Closing in favour of an upcoming PR with a different approach.

@iamrecursion iamrecursion deleted the forest-storage-interface branch January 8, 2026 11:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

merkle Related to Merkle trees or MMRs no changelog This PR does not require an entry in the `CHANGELOG.md` file

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants