Conversation

@berZKerk (Contributor)
When recording replay data for Hasher chiplet trace generation, the operation batches of a basic block were cloned on every entry into that block, so programs that entered the same basic block multiple times accumulated redundant copies.

This change introduces a BTreeMap<Word, Vec<OpBatch>> to store each unique set of operation batches only once, keyed by the basic block digest.
The HasherOp enum now stores only the digest instead of the full batches, reducing memory usage and improving performance for programs with repeated basic block executions.

Fixes #2250 (HasherOp: improve handling of basic blocks).
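
For orientation, a minimal sketch of the resulting shape. The variant and field layout here are assumptions inferred from this description (Word, OpBatch, HasherOp, HasherRequestReplay, and op_batches_map appear in the surrounding code; the exact definitions may differ):

use alloc::collections::BTreeMap;
use alloc::vec::Vec;

// Sketch: basic-block requests now carry only the digest.
pub enum HasherOp {
    HashBasicBlock(Word),
    // ... other hasher request variants
}

// Sketch: each unique batch set is stored once, keyed by digest.
pub struct HasherRequestReplay {
    hasher_ops: Vec<HasherOp>,
    op_batches_map: BTreeMap<Word, Vec<OpBatch>>,
}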

/// Deduplicates operation batches by storing them in a map keyed by the basic block digest.
/// If the same basic block is entered multiple times, only one copy of the operation batches
/// is stored.
pub fn record_hash_basic_block(&mut self, op_batches: Vec<OpBatch>, expected_hash: Word) {
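
The hunk above shows only the signature. A plausible body, assuming the layout sketched earlier and that Word implements Ord (a sketch, not the PR's verbatim implementation):

pub fn record_hash_basic_block(&mut self, op_batches: Vec<OpBatch>, expected_hash: Word) {
    // Per-entry cost is just the digest.
    self.hasher_ops.push(HasherOp::HashBasicBlock(expected_hash));
    // Store the batch set only on the first entry of this block.
    match self.op_batches_map.entry(expected_hash) {
        alloc::collections::btree_map::Entry::Vacant(entry) => {
            entry.insert(op_batches);
        },
        alloc::collections::btree_map::Entry::Occupied(_) => {
            // Same digest already recorded; we don't verify to avoid overhead.
        },
    }
}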
Contributor

Consider adding tests that verify deduplication works when the same basic block is executed multiple times.

The existing tests pass, but none specifically check that op_batches_map deduplicates correctly. A simple test could record the same digest three times and verify the map has one entry while hasher_ops has three.

Example:

#[test]
fn test_hasher_request_replay_deduplicates_basic_blocks() {
    let mut replay = HasherRequestReplay::default();
    let digest: Word = [Felt::new(1), Felt::new(2), Felt::new(3), Felt::new(4)];
    let op_batches = vec![OpBatch::new(vec![Operation::Add], Vec::new()).unwrap()];

    replay.record_hash_basic_block(op_batches.clone(), digest);
    replay.record_hash_basic_block(op_batches.clone(), digest);
    replay.record_hash_basic_block(op_batches.clone(), digest);

    assert_eq!(replay.op_batches_map().len(), 1);

    let (hasher_ops, _) = replay.into_parts();
    assert_eq!(hasher_ops.len(), 3);
}

alloc::collections::btree_map::Entry::Vacant(entry) => {
    entry.insert(op_batches);
},
alloc::collections::btree_map::Entry::Occupied(_) => {
    // We don't verify to avoid overhead.
},
Contributor

The comment says "we don't verify to avoid overhead", but a debug_assert_eq! would catch bugs during testing without runtime cost in release builds.

Consider:

alloc::collections::btree_map::Entry::Occupied(entry) => {
    debug_assert_eq!(
        entry.get(),
        &op_batches,
        "Same digest should always map to same op_batches"
    );
},

This guards against implementation bugs where different batches produce the same digest.

Contributor

The impl IntoIterator for HasherRequestReplay still exists but now appears unused. It returns only hasher_ops and drops the op_batches_map, so we should delete it to avoid confusion with into_parts.
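
For context, into_parts presumably looks something like this (return type assumed from the test above; field names as in the earlier sketch), which is what makes the lossy IntoIterator impl redundant:

impl HasherRequestReplay {
    /// Consumes the replay and returns both the recorded ops and the
    /// deduplicated batch map (unlike IntoIterator, which dropped the map).
    pub fn into_parts(self) -> (Vec<HasherOp>, BTreeMap<Word, Vec<OpBatch>>) {
        (self.hasher_ops, self.op_batches_map)
    }
}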

@berZKerk marked this pull request as draft on January 5, 2026 at 21:51.