bernard-avalabs
Contributor

Second implementation of adding parallel updates to a Merkle trie. In this approach, each child of the root node is a separate trie that can be modified independently by a separate worker thread. Four steps are needed to create a parallel proposal:

  1. Prepare: The trie is modified such that the root is always a branch with an empty partial path. This prevents any update from replacing the root node with one of its children, which is necessary to allow the sub-tries to be operated on independently.
  2. Split: The updates from a batch are split and sent to different workers based on the first nibble of their corresponding keys.
  3. Merge: After completing the batch, each worker returns the root of its sub-trie to the main thread. The main thread attaches the sub-trie roots to the children array of the root node.
  4. Post-process: The Merkle trie is returned to its canonical form. This eliminates two non-canonical cases: a root node with no value and exactly one child, and a root branch node with no value and no children.
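
The Split step can be sketched as below. This is a minimal illustration, not the code in parallel.rs: `BatchOp`, `first_nibble`, and `split_by_first_nibble` are hypothetical names, and the sketch assumes the first nibble is the high four bits of the first key byte and that operations on the empty key are kept back for the main thread.

```rust
/// Hypothetical batch operation; the PR's real type is not shown in this thread.
#[derive(Debug, Clone, PartialEq)]
pub enum BatchOp {
    Put { key: Vec<u8>, value: Vec<u8> },
    Delete { key: Vec<u8> },
}

impl BatchOp {
    fn key(&self) -> &[u8] {
        match self {
            BatchOp::Put { key, .. } | BatchOp::Delete { key } => key.as_slice(),
        }
    }
}

/// Number of children of a branch node in a nibble-keyed trie.
pub const MAX_CHILDREN: usize = 16;

/// First nibble of a key (assumed: high four bits of the first byte),
/// or `None` for the empty key.
pub fn first_nibble(key: &[u8]) -> Option<usize> {
    key.first().map(|b| usize::from(*b >> 4))
}

/// Splits a batch into one bucket per first nibble, preserving the order of
/// operations within each bucket. Operations on the empty key address the
/// root itself, so they are returned separately (an assumption of this sketch).
pub fn split_by_first_nibble(
    ops: Vec<BatchOp>,
) -> ([Vec<BatchOp>; MAX_CHILDREN], Vec<BatchOp>) {
    let mut buckets: [Vec<BatchOp>; MAX_CHILDREN] = std::array::from_fn(|_| Vec::new());
    let mut root_ops = Vec::new();
    for op in ops {
        match first_nibble(op.key()) {
            Some(nibble) => buckets[nibble].push(op),
            None => root_ops.push(op),
        }
    }
    (buckets, root_ops)
}
```

Each bucket would then go to the worker that owns the matching child of the root, which is why the Prepare step has to guarantee that the root stays a branch with an empty partial path.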

The main changes are in parallel.rs. A test case (test_propose_parallel) has been added to test parallel insertion/deletion.

…s/firewood into bernard/insert-worker-pool-rayon
A new branch node with an empty partial path is added to support
parallel insertion. Appears to be working.
rkuris removed the request for review from demosdemon on September 6, 2025 16:59
Member

rkuris left a comment

Pull out the changes to insert/delete/delete_range by Path and let's review that separately. I care a lot about reducing allocations in that code.

Member

rkuris commented Oct 6, 2025

This needs the lockfile updated. Please see #1321 for some instructions as this changed fairly recently.

// Check if the root has a value or if there is more than one child. If yes, then
// just return the root unmodified
if branch.value.is_some() || children_iter.next().is_some() {
return Ok(Some((*branch).into()));
Member

This unboxes and re-boxes the branch, which means allocate, copy, free. Use a move instead, which might mean you need a new method on Node (maybe impl From<Box<BranchNode>>), or just Node::Branch(branch) might work.
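
A minimal sketch of the difference, using simplified stand-ins for `Node` and `BranchNode` (the real definitions are not shown in this thread, so the shapes below are assumptions). The by-value conversion has to allocate a new box, while the `From<Box<BranchNode>>` conversion only moves the existing pointer.

```rust
/// Simplified stand-in for the real branch node (assumed shape).
#[derive(Debug, PartialEq)]
pub struct BranchNode {
    pub value: Option<Vec<u8>>,
}

/// Simplified stand-in for the real node enum (assumed shape).
#[derive(Debug, PartialEq)]
pub enum Node {
    Branch(Box<BranchNode>),
    Leaf(Vec<u8>),
}

/// By-value conversion: callers holding a `Box<BranchNode>` must first move
/// the node out of its box (`*branch`), and this impl then allocates a new one.
impl From<BranchNode> for Node {
    fn from(branch: BranchNode) -> Self {
        Node::Branch(Box::new(branch))
    }
}

/// Moving conversion: the existing heap allocation is reused as-is.
impl From<Box<BranchNode>> for Node {
    fn from(branch: Box<BranchNode>) -> Self {
        Node::Branch(branch)
    }
}
```

With the second impl in place, `branch.into()` on a `Box<BranchNode>` keeps the node at the same heap address, whereas `(*branch).into()` goes through the first impl and re-boxes it.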

let mut merkle = Merkle::from(worker_nodestore);

// Wait for a message on the receiver child channel. Break out of loop if there is an error.
while let Ok(request) = child_receiver.recv() {
Member

Closing the sender without sending Done should be caught here, with at least a log entry.
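
One way to make the two exits distinguishable is sketched below, assuming a `std::sync::mpsc` channel (the channel type used in the PR is not visible here). `Request::Insert`, `WorkerExit`, and `run_worker` are hypothetical; only `Done` is named in this thread. A `while let Ok(..)` loop ends silently when the sender is dropped, while an explicit `match` lets the worker log and report that case.

```rust
use std::sync::mpsc::Receiver;

/// Hypothetical request type; only `Done` is named in the review comment.
#[derive(Debug)]
pub enum Request {
    Insert { key: Vec<u8>, value: Vec<u8> },
    Done,
}

/// Why the worker loop stopped.
#[derive(Debug, PartialEq)]
pub enum WorkerExit {
    /// The sender signalled completion explicitly.
    Done { handled: usize },
    /// Every sender was dropped before `Done` arrived.
    SenderDropped { handled: usize },
}

pub fn run_worker(rx: &Receiver<Request>) -> WorkerExit {
    let mut handled = 0;
    loop {
        match rx.recv() {
            Ok(Request::Done) => return WorkerExit::Done { handled },
            // A real worker would apply the update to its sub-trie here.
            Ok(Request::Insert { .. }) => handled += 1,
            // `recv` only fails once all senders are gone and the queue is empty.
            Err(_) => {
                eprintln!("worker: channel closed before Done after {handled} requests");
                return WorkerExit::SenderDropped { handled };
            }
        }
    }
}
```

In a real code base the `eprintln!` would presumably be replaced by whatever logging facility the project already uses.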


/// Get a worker from the worker pool based on the `first_nibble` value. Create a worker if
/// it doesn't exist already.
fn get_worker(
Member

nit: Rust avoids naming things "get". Just "worker" would be fine here.

Why this isn't caught by clippy is a good question.

.send(Request::DeleteRange {
prefix: Box::default(), // Empty prefix
})
.expect("TODO: handle error");
Member

Let's handle this error with something better than a panic.

Comment on lines +343 to +344
.build()
.expect("Error in creating threadpool")
Member

Error handling can be better here, maybe:

Suggested change
- .build()
- .expect("Error in creating threadpool")
+ .build()?

Comment on lines +357 to +359
.expect("Should have a root node after prepare step")
.into_branch()
.expect("Root should be a branch after prepare step");
Member

This raises some concerns about the type system but I don't have a good easy solution.

key: op.key().as_ref().into(),
value: value.as_ref().into(),
})
.expect("send to worker error");
Member

These expects should also get cleaned up. We want to avoid panics and instead return errors back. Otherwise, the root cause of the problem will be hard to identify.

Same for the other cases below.
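
A minimal sketch of returning an error instead of panicking when a send fails, again assuming `std::sync::mpsc`. `ParallelError`, its `WorkerDisconnected` variant, and `send_insert` are hypothetical names introduced for illustration; the PR's actual error type is not shown here.

```rust
use std::fmt;
use std::sync::mpsc::Sender;

/// Hypothetical request type, shaped like the snippet under review.
#[derive(Debug)]
pub enum Request {
    Insert { key: Box<[u8]>, value: Box<[u8]> },
}

/// Hypothetical error type for the parallel code path.
#[derive(Debug, PartialEq)]
pub enum ParallelError {
    /// The worker for this first nibble dropped its receiver.
    WorkerDisconnected { first_nibble: u8 },
}

impl fmt::Display for ParallelError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ParallelError::WorkerDisconnected { first_nibble } => {
                write!(f, "worker for nibble {first_nibble:#x} disconnected")
            }
        }
    }
}

impl std::error::Error for ParallelError {}

/// Sends an insert to a worker and reports a disconnected worker as an error
/// that records which worker failed, instead of calling `expect`.
pub fn send_insert(
    worker: &Sender<Request>,
    first_nibble: u8,
    key: &[u8],
    value: &[u8],
) -> Result<(), ParallelError> {
    worker
        .send(Request::Insert {
            key: key.into(),
            value: value.into(),
        })
        .map_err(|_| ParallelError::WorkerDisconnected { first_nibble })
}
```

Callers can then propagate the failure with `?`, so the error that reaches the top level still identifies the worker involved.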

self.workers = [(); BranchNode::MAX_CHILDREN].map(|()| None);

let immutable: Arc<NodeStore<Arc<ImmutableProposal>, FileBacked>> =
Arc::new(proposal.try_into().expect("error creating immutable"));
Member

This can fail, and we should return that error. One example is an oversized node caused by a value that is too large.
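
The shape of that fix can be sketched with toy types. Everything below is hypothetical: `MutableProposal`, `ImmutableProposal`, `ProposalError`, the `MAX_NODE_BYTES` limit, and `freeze` are stand-ins chosen only to show a fallible conversion whose error is propagated with `?` rather than unwrapped.

```rust
use std::convert::TryFrom;
use std::fmt;
use std::sync::Arc;

/// Hypothetical size limit, used only to give the conversion a way to fail.
pub const MAX_NODE_BYTES: usize = 1024;

/// Toy stand-in for a mutable proposal holding a single value.
pub struct MutableProposal {
    pub value: Vec<u8>,
}

/// Toy stand-in for the immutable form.
#[derive(Debug, PartialEq)]
pub struct ImmutableProposal {
    pub node_bytes: usize,
}

#[derive(Debug, PartialEq)]
pub enum ProposalError {
    NodeTooLarge { size: usize, max: usize },
}

impl fmt::Display for ProposalError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ProposalError::NodeTooLarge { size, max } => {
                write!(f, "node of {size} bytes exceeds the {max}-byte limit")
            }
        }
    }
}

impl std::error::Error for ProposalError {}

impl TryFrom<MutableProposal> for ImmutableProposal {
    type Error = ProposalError;

    fn try_from(proposal: MutableProposal) -> Result<Self, Self::Error> {
        let size = proposal.value.len();
        if size > MAX_NODE_BYTES {
            return Err(ProposalError::NodeTooLarge { size, max: MAX_NODE_BYTES });
        }
        Ok(ImmutableProposal { node_bytes: size })
    }
}

/// Converts and wraps the proposal, returning the conversion error to the
/// caller instead of panicking.
pub fn freeze(proposal: MutableProposal) -> Result<Arc<ImmutableProposal>, ProposalError> {
    let immutable = ImmutableProposal::try_from(proposal)?;
    Ok(Arc::new(immutable))
}
```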

Comment on lines +442 to +447
pub fn take_child(&mut self, child_index: u8) -> Option<Child> {
self.children
.get_mut(child_index as usize)
.expect("index error")
.take()
}
Member

Ideally, we don't call expect here. Brandon recently made some changes here that might make it easier.
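
One panic-free variant is sketched below with a simplified `BranchNode` (the `Child` alias and field layout are assumptions, and this is not necessarily what the recent changes mentioned above provide). Using `?` on the result of `get_mut` turns an out-of-range index into `None` instead of a panic.

```rust
/// Stand-in for the real child type (assumption for this sketch).
pub type Child = u64;

/// Simplified branch node with a fixed-size children array.
pub struct BranchNode {
    pub children: [Option<Child>; 16],
}

impl BranchNode {
    /// Removes and returns the child at `child_index`.
    /// Returns `None` both for an empty slot and for an out-of-range index.
    pub fn take_child(&mut self, child_index: u8) -> Option<Child> {
        self.children.get_mut(usize::from(child_index))?.take()
    }
}
```

The trade-off is that an invalid index becomes indistinguishable from an empty slot. If callers need to tell the two apart, a `Result` return type or an index type that cannot exceed 15 would be alternatives.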

2 participants