fix: bypass decorator retrieval in release mode #2529

huitseeker · 2026-01-04T08:21:33Z

This replaces #2524.

This fixes 4 issues in both the legacy and fast processors; see the individual commits:

1. when in_debug_mode == false, decorators should not be accessed, never mind executed,
2. when strip_decorators is false, the DebugInfo provided as replacement should not panic on access,
3. ~~adds a --strip-decorators CLI option to the prove and run commands which strips decorators from the MastForest before operation,~~
4. the err_ctx! macro should respect in_debug_mode, that is, it should not perform decorator access outside debug mode.

Even without stripping decorators, the combination of 1. and 4. yields a substantive performance improvement:

Baseline (main branch)

Total Time: 6906 ms
Execution Time: 4015 ms (58.14%)
Trace Generation: 4015 ms for 1,048,576 steps (48% padded)

Optimized (bypass-decorator-retrieval-in-release-mode branch)

Total Time: 3533 ms
Execution Time: 289 ms (8.17%)
Trace Generation: 288 ms for 1,048,576 steps (48% padded)

Gate all decorator retrieval calls behind `in_debug_mode` checks, ensuring zero overhead when debugging is disabled.

Processor changes:
- before_enter/after_exit decorator loops
- decorators_for_op in basic block execution

FastProcessor changes:
- execute_before_enter_decorators early return
- execute_after_exit_decorators early return
- decorators_for_op in basic block execution

Includes spy tests to verify retrieval is bypassed.
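As a rough illustration of the gating and the spy-test idea described in this commit, here is a minimal sketch. `SpyForest` and its `lookups` counter are hypothetical stand-ins rather than the actual miden-vm types; only the function name `execute_before_enter_decorators` and the `in_debug_mode` flag come from the commit message.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for a MAST forest that counts decorator lookups.
struct SpyForest {
    decorators: Vec<&'static str>,
    lookups: Cell<usize>,
}

impl SpyForest {
    fn new(decorators: Vec<&'static str>) -> Self {
        Self { decorators, lookups: Cell::new(0) }
    }

    /// Every call is recorded so a test can show the lookup never happened.
    fn before_enter_decorators(&self) -> &[&'static str] {
        self.lookups.set(self.lookups.get() + 1);
        &self.decorators
    }
}

/// Returns the number of decorators that were executed.
fn execute_before_enter_decorators(forest: &SpyForest, in_debug_mode: bool) -> usize {
    // Early return: outside debug mode the decorator storage is never touched.
    if !in_debug_mode {
        return 0;
    }
    let mut executed = 0;
    for _decorator in forest.before_enter_decorators() {
        executed += 1; // placeholder for real decorator execution
    }
    executed
}
```

With `in_debug_mode == false` the spy counter stays at zero, which is the property a spy test would assert.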

Update strip_decorators() to create an empty but valid CSR structure instead of calling clear(), which removed the structure entirely and caused panics when accessing decorator information after stripping.

- Add DebugInfo::empty_for_nodes(num_nodes) to create valid empty CSR
- Update from_components to accept empty structures
- Add edge case tests for empty forest, idempotency, multiple node types
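To make the "empty but valid CSR" point concrete, here is a simplified sketch. `DecoratorCsr` and `decorators_for_node` are hypothetical and much smaller than the real `DebugInfo`; only the name `empty_for_nodes` is borrowed from the commit message. The key invariant is that a CSR index over `num_nodes` nodes needs `num_nodes + 1` offsets even when it holds no data.

```rust
/// Hypothetical, simplified CSR index: decorators of node `i` live in
/// `data[indptr[i]..indptr[i + 1]]`.
#[derive(Debug, Default)]
struct DecoratorCsr {
    indptr: Vec<usize>,
    data: Vec<u32>,
}

impl DecoratorCsr {
    /// Empty but valid: `num_nodes + 1` zero offsets and no decorator data.
    fn empty_for_nodes(num_nodes: usize) -> Self {
        Self { indptr: vec![0; num_nodes + 1], data: Vec::new() }
    }

    /// Returns `None` instead of panicking when the offsets are missing.
    fn decorators_for_node(&self, node: usize) -> Option<&[u32]> {
        let start = *self.indptr.get(node)?;
        let end = *self.indptr.get(node + 1)?;
        self.data.get(start..end)
    }
}
```

A structure built with `empty_for_nodes(3)` answers every node query with an empty slice, whereas a fully cleared structure (`DecoratorCsr::default()` here) has no offsets at all, so a direct `indptr[node]` access would be out of bounds.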

bobbinth

Looks good! Thank you! I reviewed all non-test code and left a few comments inline (mostly for the future).

One other note: in this branch, we are not really differentiating execution vs. trace generation time. I think the benchmarks in the PR description listing execution around 300 ms also include trace generation (execution alone should be about 10x faster than that). In next, I believe we already have this broken out, so we can double-check when we apply these changes there.

miden-vm/src/cli/prove.rs

miden-vm/src/cli/run.rs

bobbinth · 2026-01-05T05:33:42Z

processor/src/fast/basic_block.rs

+        // Get the node ID once since it doesn't change within the loop
+        let node_id = basic_block
+            .linked_id()
+            .expect("basic block node should be linked when executing operations");


Not for this PR, but getting the node ID from the node in this way feels a bit backwards to me. When we apply these changes to next, we could probably just pass in node_id instead of basic_block into this function (and in general, we should think about how to reduce the number of parameters passed into this function). This will require refactoring how we build the error context, but it should be relatively easy (e.g., instead of node.get_assembly_op(mast_forest, target_op_idx) we should be able to do something like mast.get_assembly_op(node_id, target_op_idx)).

You're right, but a couple of nuances:

- obviously, this is just moving a local variable definition out of a hot loop, not changing the definition (in a PR to main),
- there's upcoming work on removing err_ctx as part of "Investigate removing ErrorContext" #1978 that may make this obsolete.

This fix resolves a 20x performance degradation in release mode when decorators were present in the MastForest but not being executed.

Root Cause: The err_ctx! macro was unconditionally calling node.get_assembly_op(), which traversed the CSR decorator storage on every operation (522,059 times in blake3 benchmark), even when in_debug_mode was false. This caused execution time to increase from 191ms to 3,884ms.

Solution: Modified the error context creation pipeline to accept and respect the in_debug_mode flag:
- Updated err_ctx! macro to require in_debug_mode parameter
- Updated ErrorContextImpl::new() and new_with_op_idx() to accept in_debug_mode
- Modified precalc_label_and_source_file() to return early when !in_debug_mode, avoiding expensive decorator traversal
- Updated all err_ctx!() call sites in Process and FastProcessor to pass in_debug_mode flag

Performance Impact:
- Before: Op execution time 3,884ms (7,439ns/op)
- After: Op execution time 191ms (366ns/op)
- Improvement: 20.3x faster

When in_debug_mode is false, decorators (including AsmOp decorators used for error context) are no longer accessed, even if present in the MastForest.
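The early-return idea in this commit can be sketched as follows. `ErrorContext`, `SourceInfo`, and the `lookup` closure are hypothetical simplifications, not the real `ErrorContextImpl` or `err_ctx!` macro; the closure stands in for the decorator traversal done by `get_assembly_op`.

```rust
/// Hypothetical stand-in for the source information carried by an AsmOp decorator.
#[derive(Debug, PartialEq)]
struct SourceInfo {
    label: String,
}

/// Hypothetical error context holding optional source information.
struct ErrorContext {
    info: Option<SourceInfo>,
}

impl ErrorContext {
    /// `lookup` models the expensive decorator traversal.
    fn new<F>(in_debug_mode: bool, lookup: F) -> Self
    where
        F: FnOnce() -> Option<SourceInfo>,
    {
        // Outside debug mode, return early so the lookup is never invoked.
        if !in_debug_mode {
            return Self { info: None };
        }
        Self { info: lookup() }
    }
}
```

Because the flag is checked before the closure runs, the per-operation cost in release mode reduces to a single branch, at the price of errors carrying no source label in that mode.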

adr1anh

Looks good, just left a minor nit.

adr1anh · 2026-01-05T10:02:01Z

core/src/mast/debuginfo/mod.rs

+        let mut node_indptr_for_op_idx = IndexVec::new();
+        for _ in 0..=num_nodes {
+            let _ = node_indptr_for_op_idx.push(0);
+        }


Can we initialize the vector with zeros directly, rather than pushing one by one?

Not an API that's available on IndexVec. Should be fixed, just not on a PR to main. #2532
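For context, the kind of conversion tracked in #2532 might look roughly like the sketch below. The `IndexVec` here is a hypothetical minimal typed-index vector written for illustration, not the real miden type, and `zeroed_indptr` and `NodeId` are made-up names. The `let _ =` in the diff suggests the real `push` returns a value that can signal failure, which this sketch ignores.

```rust
use std::marker::PhantomData;

/// Hypothetical, minimal typed-index vector; not the real miden `IndexVec`.
struct IndexVec<I, T> {
    raw: Vec<T>,
    _index: PhantomData<I>,
}

impl<I, T> From<Vec<T>> for IndexVec<I, T> {
    fn from(raw: Vec<T>) -> Self {
        Self { raw, _index: PhantomData }
    }
}

impl<I, T> IndexVec<I, T> {
    fn len(&self) -> usize {
        self.raw.len()
    }
}

/// Marker type standing in for a node identifier.
struct NodeId;

/// Builds the `num_nodes + 1` zero offsets in one allocation instead of a push loop.
fn zeroed_indptr(num_nodes: usize) -> IndexVec<NodeId, usize> {
    vec![0usize; num_nodes + 1].into()
}
```

With such a conversion, the loop in the diff would collapse to a single `vec![0; num_nodes + 1].into()` expression.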

huitseeker · 2026-01-05T10:17:54Z

@bobbinth
Re: confusion between trace generation and execution time, here are sequential runs rebased on next:

Baseline (origin/next) - 3 Runs

  | Run     | execute_for_trace | build_trace | Total Time |
  |---------|-------------------|-------------|------------|
  | 1       | 3.58s             | 70.9ms      | 34,991ms   |
  | 2       | 3.57s             | 73.8ms      | 35,100ms   |
  | 3       | 3.56s             | 76.2ms      | 35,001ms   |
  | Average | 3.57s             | 73.6ms      | 35,031ms   |

  Decorator Bypass (decorator-bypass-on-next) - 3 Runs

  | Run     | execute_for_trace | build_trace | Total Time |
  |---------|-------------------|-------------|------------|
  | 1       | 10.7ms            | 86.0ms      | 31,428ms   |
  | 2       | 7.34ms            | 78.9ms      | 31,688ms   |
  | 3       | 11.3ms            | 83.4ms      | 31,857ms   |
  | Average | 9.78ms            | 82.8ms      | 31,658ms   |

This merge brings the decorator bypass optimization from main (#2529) into next.

The changes adapt the decorator bypass optimization to next's architecture:
- No changes to legacy Process (removed in next)
- All changes apply to FastProcessor only
- Decorator retrieval gated behind in_debug_mode checks
- Error context creation respects in_debug_mode flag

Performance impact: ~10% overall speedup, 99.7% reduction in trace execution time.

Conflicts resolved using solutions from huitseeker/decorator-bypass-on-next rebase work.
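The two summary figures can be re-derived from the per-run numbers in the benchmark tables earlier in the thread. The snippet below is only an arithmetic check; the helper names `mean`, `reduction_pct`, and `summarize` are made up for the example, and all times are converted to milliseconds.

```rust
/// Arithmetic mean of a slice of measurements.
fn mean(values: &[f64]) -> f64 {
    values.iter().sum::<f64>() / values.len() as f64
}

/// Percentage reduction going from `before` to `after`.
fn reduction_pct(before: f64, after: f64) -> f64 {
    (before - after) / before * 100.0
}

/// Returns (total-time reduction %, execute_for_trace reduction %).
fn summarize() -> (f64, f64) {
    // Values in milliseconds, copied from the two three-run tables.
    let base_exec = mean(&[3580.0, 3570.0, 3560.0]);
    let opt_exec = mean(&[10.7, 7.34, 11.3]);
    let base_total = mean(&[34991.0, 35100.0, 35001.0]);
    let opt_total = mean(&[31428.0, 31688.0, 31857.0]);
    (
        reduction_pct(base_total, opt_total),
        reduction_pct(base_exec, opt_exec),
    )
}
```

The total-time reduction comes out at roughly 9.6% and the `execute_for_trace` reduction at roughly 99.7%, consistent with the rounded figures quoted in the merge message.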

* fix: bypass decorator retrieval in release mode
* fix(core): strip decorators while maintaining valid CSR structure
* fix(processor): gate error context decorator access with in_debug_mode

bobbinth · 2026-01-05T19:48:33Z

Nice! Thank you for running these!

So, it seems like our current execution runs roughly at 100 MHz, right? I think that's pretty nice, and in the future we could probably try to optimize it to get closer to 200 MHz.

Re: trace generation, it should be able to take advantage of multi-threading, so if we run with the concurrent feature, I'd expect it to be closer to the 20 ms range.

huitseeker changed the title ~~Huitseeker/bypass decorator retrieval in release mode~~ fix: bypass decorator retrieval in release mode Jan 4, 2026

huitseeker added 2 commits January 4, 2026 03:29

huitseeker force-pushed the huitseeker/bypass-decorator-retrieval-in-release-mode branch from 1421357 to b817703 Compare January 4, 2026 08:30

huitseeker mentioned this pull request Jan 4, 2026

fix(core): strip decorators while maintaining valid CSR structure #2524

Closed

huitseeker marked this pull request as ready for review January 4, 2026 08:36

huitseeker requested review from Al-Kindi-0, adr1anh, bobbinth and plafer January 4, 2026 08:36

huitseeker force-pushed the huitseeker/bypass-decorator-retrieval-in-release-mode branch from b817703 to a704496 Compare January 4, 2026 08:59

bobbinth approved these changes Jan 5, 2026

huitseeker added 2 commits January 5, 2026 03:49

chore: Changelog

07031b0

huitseeker force-pushed the huitseeker/bypass-decorator-retrieval-in-release-mode branch from a704496 to 07031b0 Compare January 5, 2026 08:49

adr1anh approved these changes Jan 5, 2026

huitseeker mentioned this pull request Jan 5, 2026

feat: add From<Vec<T>> for IndexVec<I, T> #2532

Closed

huitseeker merged commit 097626f into main Jan 5, 2026
16 checks passed

huitseeker deleted the huitseeker/bypass-decorator-retrieval-in-release-mode branch January 5, 2026 10:57

huitseeker mentioned this pull request Jan 5, 2026

Merge main into next #2534

Merged

huitseeker mentioned this pull request Jan 6, 2026

MastForest deserialization fills index array with zeroes #2538

Closed
