feat(fuzz): ast-seeded dictionary #12015

0xrusowsky · 2025-10-08T08:08:16Z

Motivation

closes #10233

Solution

leverage solar::sema::Compiler to collect all relevant AST literals found in the sources (excluding libs and scripts) and seed the FuzzerDictionary with them at initialization.
modify the strategies to source from a new pool of values (AST literals), when available
modify the string strategy to not always generate random strings, but also source from the string literals pool

TODO

fix unit tests, as they expect a different number of runs before breaking. It seemed better to get feedback on the implementation first, rather than fix all the tests now and have to fix them again later

Future improvements

explore literal value folding, as suggested in a comment on this PR (#12015). Tracked in a separate issue: feat(fuzz): impl constant folding for typical expressions #12044
use HIR rather than AST to figure out the type of the numeric literals, so that we can minimize dict size
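
A rough illustration of why the literal's type matters for dictionary size: without type information, a numeric literal has to be stored under every uintN that can hold it (compare the commit "feat: insert all possible uint types that fit"). The sketch below is std-only Rust. `fitting_uint_widths` is a hypothetical helper, not code from this PR, and it uses u128 as a stand-in for a 256-bit literal.

```rust
/// Hypothetical helper: bit widths of every `uintN` type that can hold `value`.
/// `u128` stands in for a full 256-bit literal to keep the sketch std-only.
fn fitting_uint_widths(value: u128) -> Vec<u32> {
    // Number of significant bits in the literal (0 for the value 0).
    let bits = 128 - value.leading_zeros();
    // Round up to the next multiple of 8; the smallest Solidity type is uint8.
    let min_width = ((bits + 7) / 8 * 8).max(8);
    // Every wider type up to uint256 can hold the value as well.
    (min_width..=256).step_by(8).collect()
}
```

With this scheme a small literal such as 255 lands in all 32 uintN buckets, which is the growth that resolving the literal's actual type would avoid.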

PR Checklist

Added Tests
Added Documentation
Breaking changes

crates/cli/src/opts/build/utils.rs

crates/evm/fuzz/src/strategies/param.rs

0xrusowsky · 2025-10-09T08:10:52Z

crates/evm/fuzz/src/strategies/param.rs

+                .prop_flat_map(move |(use_ast_index, select_index)| {
+                    let dict = state_clone.dictionary_read();
+
+                    // AST string literals available: use 30/70 allocation


arbitrary value, we could change it if you are opinionated.

grandizzy · 2025-10-09T08:22:55Z

crates/evm/fuzz/src/strategies/state.rs

+
+        // Seed dict with AST literals if analysis is available.
+        if let Some(literals) = analysis {
+            dictionary.ast_values = Some(literals);


IMO it's better to keep this simpler and just reuse the existing fuzz dict samples: insert the literals with insert_sample_values, which stores values by type, and use them during fuzz runs instead of having a new strategy / weights and ast_values. We probably need to make the samples limit configurable and bump the default value

foundry/crates/evm/fuzz/src/strategies/state.rs

Lines 374 to 377 in 020d515

    /// Insert sample values that are reused across multiple runs.
    /// The number of samples is limited to invariant run depth.
    /// If collected samples limit is reached then values are inserted as regular values.
    pub fn insert_sample_values(

0xalpharush · 2025-10-09T13:15:33Z

Not blocking but I would recommend implementing constant folding to some degree i.e. evaluate 2 * 2 ether, evaluate bytes32 IMPLEMENTATION_SLOT = bytes32(uint256(keccak256('eip1967.proxy.implementation')) - 1);, evaluate uint(-2). Arguably, solar will need this and the code could live there

See crytic/echidna#636
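
A minimal sketch of what folding two of these examples could look like, assuming a toy expression tree. `Expr`, `fold` and the helpers are hypothetical names, not solar or Foundry APIs. Results are 32-byte big-endian words, products are limited to values that fit in u128, and keccak256 is left out because it is not available in std.

```rust
/// 32-byte big-endian EVM word.
type Word = [u8; 32];

/// Toy expression tree (hypothetical, not the solar AST).
enum Expr {
    Lit(u128),
    /// `x ether`, i.e. x * 10^18.
    Ether(Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
    /// `uint(-x)`: two's-complement negation over 256 bits.
    Neg(Box<Expr>),
}

fn word_from_u128(value: u128) -> Word {
    let mut word = [0u8; 32];
    word[16..].copy_from_slice(&value.to_be_bytes());
    word
}

/// Returns `None` when the word does not fit in 128 bits.
fn word_to_u128(word: &Word) -> Option<u128> {
    if word[..16].iter().any(|b| *b != 0) {
        return None;
    }
    let mut low = [0u8; 16];
    low.copy_from_slice(&word[16..]);
    Some(u128::from_be_bytes(low))
}

/// Two's-complement negation: invert every byte, then add one with carry.
fn negate(word: &Word) -> Word {
    let mut out = [0u8; 32];
    let mut carry = true;
    for i in (0..32).rev() {
        let (byte, overflow) = (!word[i]).overflowing_add(carry as u8);
        out[i] = byte;
        carry = overflow;
    }
    out
}

/// Folds an expression into a word, or gives up with `None` when an
/// intermediate value exceeds what this sketch supports.
fn fold(expr: &Expr) -> Option<Word> {
    match expr {
        Expr::Lit(value) => Some(word_from_u128(*value)),
        Expr::Ether(inner) => {
            let value = word_to_u128(&fold(inner)?)?;
            Some(word_from_u128(value.checked_mul(1_000_000_000_000_000_000)?))
        }
        Expr::Mul(lhs, rhs) => {
            let lhs = word_to_u128(&fold(lhs)?)?;
            let rhs = word_to_u128(&fold(rhs)?)?;
            Some(word_from_u128(lhs.checked_mul(rhs)?))
        }
        Expr::Neg(inner) => Some(negate(&fold(inner)?)),
    }
}
```

Under these assumptions, 2 * 2 ether folds to the word for 4 * 10^18, and uint(-2) folds to a word of 0xff bytes ending in 0xfe, so both derived values could be seeded instead of only the raw literal 2.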

grandizzy · 2025-10-09T13:20:56Z

> Not blocking but I would recommend implementing constant folding to some degree i.e. evaluate 2 * 2 ether, perform the keccak hash of a string, evaluate uint(-2).
>
> See crytic/echidna#636

thanks! One thing here: this means we should also collect from tests, which we don't do in this PR. Is this correct?

0xalpharush · 2025-10-09T13:26:35Z

AFAIK neither Echidna nor Slither's printer filters tests out. I think not including forge-std makes sense.

Also, I am not sure how the push/pop/log dictionary is currently managed in Foundry, but I think Echidna always keeps the constant pool around and ejects the dynamically collected values after running a full sequence. For example, a user's balance that is emitted in one run may help within the same sequence, but is unlikely to help in a totally unrelated sequence.

grandizzy · 2025-10-09T13:51:21Z

> AFAIK neither Echidna or slither's printer filters tests out. I think not including forge-std makes sense.

👍 @0xrusowsky let's include tests too

> Also, I am not sure how the push/pop/log dictionary is managed currently in Foundry, but I think Echidna will always keep the constant pool around and eject the dynamically collected values after running a full sequence. For example, a user's balance that is emitted in one run may help within the same sequence but probably unlikely to help in a totally unrelated sequence.

The push / pop dictionary and the db addresses / storage values are populated when the test starts and are used across all runs, without being evicted.
These are defined as

foundry/crates/evm/fuzz/src/strategies/state.rs

Lines 129 to 134 in b823ae0

    /// Number of state values initially collected from db.
    /// Used to revert new collected values at the end of each run.
    db_state_values: usize,
    /// Number of address values initially collected from db.
    /// Used to revert new collected addresses at the end of each run.
    db_addresses: usize,

and collected when dict is created

foundry/crates/evm/fuzz/src/strategies/state.rs

Lines 52 to 54 in b823ae0

    // Create fuzz dictionary and insert values from db state.
    let mut dictionary = FuzzDictionary::new(config);
    dictionary.insert_db_values(accs);

We also maintain a dict of so-called sample values, dynamically collected from the logs, return values and state changes of runs, up to a limit (currently set to the test depth). These are also reused across all runs

foundry/crates/evm/fuzz/src/strategies/state.rs

Lines 135 to 136 in b823ae0

    /// Sample typed values that are collected from call result and used across invariant runs.
    sample_values: HashMap<DynSolType, B256IndexSet>,

Then there are the regular values dynamically collected from runs that are not shared between runs

foundry/crates/evm/fuzz/src/strategies/state.rs

Lines 123 to 124 in b823ae0

    /// Collected state values.
    state_values: B256IndexSet,

Please let us know if you see any redundant data / ways to improve the dict. Thank you!

0xrusowsky · 2025-10-09T14:21:45Z

^ note that AST literals are injected into sample_values
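
The value lifetimes described in this thread can be sketched with std-only Rust. This is a toy model, not the Foundry implementation: `ToyDictionary` and `OrderedSet` are hypothetical, type names are plain strings instead of DynSolType, and the per-type sample cap is an assumption. Values present at creation survive every run, run-collected values are truncated away on revert, and samples (where AST literals would be injected) persist up to the cap.

```rust
use std::collections::{HashMap, HashSet};

type Word = [u8; 32];

/// Insertion-ordered set, a stand-in for an index set (hypothetical).
#[derive(Default)]
struct OrderedSet {
    items: Vec<Word>,
    seen: HashSet<Word>,
}

impl OrderedSet {
    fn insert(&mut self, word: Word) -> bool {
        let is_new = self.seen.insert(word);
        if is_new {
            self.items.push(word);
        }
        is_new
    }

    /// Drops everything inserted after the first `len` items.
    fn truncate(&mut self, len: usize) {
        if len < self.items.len() {
            for word in self.items.drain(len..) {
                self.seen.remove(&word);
            }
        }
    }

    fn len(&self) -> usize {
        self.items.len()
    }
}

struct ToyDictionary {
    /// Regular values: db values first, then values collected during a run.
    state_values: OrderedSet,
    /// Number of values that came from the db; used as the revert point.
    db_state_values: usize,
    /// Typed samples kept across runs (AST literals would be seeded here).
    sample_values: HashMap<String, OrderedSet>,
    /// Assumed per-type cap on samples.
    max_samples: usize,
}

impl ToyDictionary {
    fn new(db_values: &[Word], max_samples: usize) -> Self {
        let mut state_values = OrderedSet::default();
        for word in db_values {
            state_values.insert(*word);
        }
        let db_state_values = state_values.len();
        Self { state_values, db_state_values, sample_values: HashMap::new(), max_samples }
    }

    /// Stores a typed sample; once the cap is hit it becomes a regular value.
    fn insert_sample(&mut self, ty: &str, word: Word) {
        let samples = self.sample_values.entry(ty.to_string()).or_default();
        if samples.len() < self.max_samples {
            samples.insert(word);
        } else {
            self.state_values.insert(word);
        }
    }

    fn insert_run_value(&mut self, word: Word) {
        self.state_values.insert(word);
    }

    /// End of run: forget run-collected values, keep db values and samples.
    fn revert(&mut self) {
        self.state_values.truncate(self.db_state_values);
    }
}
```

In this model, seeding literals through `insert_sample` before the first run makes them available to every run, which is the behaviour the note above describes.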

0xrusowsky · 2025-10-10T06:18:34Z

> Not blocking but I would recommend implementing constant folding to some degree i.e. evaluate 2 * 2 ether, evaluate bytes32 IMPLEMENTATION_SLOT = bytes32(uint256(keccak256('eip1967.proxy.implementation')) - 1);, evaluate uint(-2). Arguably, solar will need this and the code could live there
>
> See crytic/echidna#636

thanks for the advice! this will be tackled next in a follow-up PR:

feat(fuzz): impl constant folding for typical expressions #12044

grandizzy

thank you, looks good! left some comments / nits, pls check

grandizzy · 2025-10-10T06:58:08Z

crates/evm/evm/src/inspectors/stack.rs

        if let Some(config) = cheatcodes {
            let mut cheatcodes = Cheatcodes::new(config);
            // Set analysis capabilities if they are provided
            if let Some(analysis) = analysis {


the analysis here is technically the compiler, not the analysis per se. Should we process / analyze it already, something like

    if let Some(compiler) = compiler {
        let ast_analysis = AstAnalysis::new(compiler);
        cheatcodes.set_struct_defs(ast_analysis.get_struct_defs().clone());
        stack.set_ast_analysis(ast_analysis);
    }

and consolidate the AST analysis in a single place, instead of having parts of it in cheatcodes and parts in the stack? Then in EvmFuzzState::new we just pass AstAnalysis and populate the dict with the AstAnalysis words, strings and bytes. Side note: this way we could also pass the enums to the fuzzer, to be used for #6623, but that's a different scope and complex (it affects mutations as well), as @DaniPopes pointed out

0xrusowsky · 2025-10-10

i was also thinking about where to place the LiteralsCollector, and i guess it could also make sense to upstream it to the inspector stack so that other inspectors could benefit from it.

however, i wouldn't eagerly perform the analysis as you suggest here, as most of the time you won't need the analysis capabilities (i.e. only a small subset of tests will use the cheatcodes that require struct defs).

also, i expect each consumer (inspector) to have different needs, which is why i thought a more granular approach, implementing the actual analysis capabilities on each inspector (i.e. crates/evm/fuzz/src/strategies/state.rs, crates/cheatcodes/src/inspector/analysis.rs), would make more sense 🤔

let's see what @DaniPopes prefers and we can do what the majority thinks is best? haha

grandizzy · 2025-10-10

should we then analyze at build time and cache the values / write them to disk, then lazy-load what's needed in the different components, only when / where needed? this would also mean we don't need to analyze each time we run forge test
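
One way to reconcile "analyze only once" with "don't analyze eagerly" is to compute each analysis on first use and cache it. The sketch below is std-only Rust built on std::sync::OnceLock. `LazyAnalysis` and `collect_string_literals` are hypothetical names, and the quote-splitting scanner is a naive stand-in for a real AST pass (it ignores escapes and comments).

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Naive stand-in for an AST pass: takes the text between pairs of double quotes.
fn collect_string_literals(sources: &[String]) -> Vec<String> {
    sources
        .iter()
        .flat_map(|src| src.split('"').skip(1).step_by(2))
        .map(str::to_string)
        .collect()
}

/// Hypothetical shared analysis handle: each result is computed on first use.
struct LazyAnalysis {
    sources: Vec<String>,
    string_literals: OnceLock<Vec<String>>,
    /// Counts how often the (expensive) pass actually ran.
    passes: AtomicUsize,
}

impl LazyAnalysis {
    fn new(sources: Vec<String>) -> Self {
        Self { sources, string_literals: OnceLock::new(), passes: AtomicUsize::new(0) }
    }

    /// Runs the pass at most once, no matter how many consumers ask.
    fn string_literals(&self) -> &[String] {
        self.string_literals.get_or_init(|| {
            self.passes.fetch_add(1, Ordering::Relaxed);
            collect_string_literals(&self.sources)
        })
    }

    fn passes(&self) -> usize {
        self.passes.load(Ordering::Relaxed)
    }
}
```

A consumer that never asks for string literals pays nothing, and consumers that do ask share a single pass. Caching to disk at build time, as suggested above, would be a further step that this sketch does not cover.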

grandizzy · 2025-10-10T07:23:26Z

crates/evm/fuzz/src/strategies/param.rs

+                .prop_flat_map(move |(use_ast_index, select_index)| {
+                    let dict = state_clone.dictionary_read();
+
+                    // AST string literals available: use 30/70 allocation


IMO this should follow the sample rules, we already have logic / bias to select them

https://github.com/foundry-rs/foundry/pull/12015/files#diff-d37d278bbc4bfc5240900ba4963f1a0f562f98808670ca692658aed9e0fdf624R128-R130

maybe we could reuse the same logic and return DynSolValues from the AST-analyzed string / bytes here?

0xrusowsky · 2025-10-10

what do you mean exactly?

bias is a randomly generated bool (50-50), but allocating 50% to AST-seeded literals feels like a lot (before it was 0-100).

my idea was that by using Index we can allocate a smaller percentage to AST string literals, but we are already using them (30% of the time)
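
A minimal sketch of the 30/70 split discussed here, in std-only Rust. `Index` below is a hand-rolled stand-in for a proptest-style index (a raw random value mapped onto a range), and `pick_string` is a hypothetical helper; the real strategy in param.rs is not reproduced.

```rust
/// Stand-in for a proptest-style index: a raw random value mapped onto `0..size`.
#[derive(Clone, Copy)]
struct Index(u64);

impl Index {
    /// Scales the raw value into `0..size`.
    fn index(self, size: usize) -> usize {
        assert!(size > 0, "size must be non-zero");
        ((self.0 as u128 * size as u128) >> 64) as usize
    }
}

/// Share of draws served from the AST string pool (the "30" in 30/70).
const AST_PERCENT: usize = 30;

/// Returns an AST literal about 30% of the time when the pool is non-empty,
/// and otherwise falls back to the randomly generated string.
fn pick_string(use_ast: Index, select: Index, ast_pool: &[&str], random: String) -> String {
    if !ast_pool.is_empty() && use_ast.index(100) < AST_PERCENT {
        ast_pool[select.index(ast_pool.len())].to_string()
    } else {
        random
    }
}
```

Compared with a 50-50 boolean bias, mapping an index onto 0..100 lets the share be any percentage, which is the flexibility argued for above.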

grandizzy · 2025-10-10T07:30:14Z

crates/evm/fuzz/src/strategies/param.rs

-                    let max_int_plus1 = U256::from(1).wrapping_shl(n - 1);
-                    let num = I256::from_raw(uint.wrapping_sub(max_int_plus1));
+                    // Extract lower N bits
+                    let uint_n = U256::from_be_bytes(value.0) % U256::from(1).wrapping_shl(n);


good catch, need to make some more tests to see how this affects overall perf
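
The diff above replaces the offset subtraction with an explicit extraction of the lower N bits. A minimal sketch of that idea, assuming the extracted bits are then read as an N-bit two's-complement value (an assumption, since the rest of the diff is not shown). `low_bits_as_signed` is a hypothetical helper that uses u128 / i128 in place of U256 / I256 and only supports widths up to 64 bits.

```rust
/// Reads the lower `n` bits of `value` as an `n`-bit two's-complement integer.
/// Hypothetical helper; `u128`/`i128` stand in for 256-bit types.
fn low_bits_as_signed(value: u128, n: u32) -> i128 {
    assert!((1..=64).contains(&n), "sketch only supports 1..=64 bits");
    // Extract the lower N bits, discarding everything above them.
    let low = value % (1u128 << n);
    // If the sign bit is set, the value represents `low - 2^n`.
    let sign_bit = 1u128 << (n - 1);
    if low >= sign_bit {
        low as i128 - (1i128 << n)
    } else {
        low as i128
    }
}
```

Because the upper bits are discarded first, any dictionary word maps to a valid intN value, including negative ones.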

grandizzy · 2025-10-10T07:32:36Z

crates/evm/fuzz/src/strategies/state.rs

+}
+
+#[derive(Clone, Default, Debug)]
+pub struct LiteralMaps {


as in comment above, would be nice to have all AST analysis consolidated and performed only once, these could be good candidates to move there. Let's add comments to the enum / structs and their members too

works

github-project-automation bot added this to Foundry Oct 8, 2025

0xrusowsky marked this pull request as ready for review October 9, 2025 08:05

0xrusowsky requested review from DaniPopes, grandizzy, mattsse, onbjerg and zerosnacks as code owners October 9, 2025 08:05

grandizzy reviewed Oct 9, 2025

0xrusowsky added 11 commits October 10, 2025 07:46

feat(test): ast-seeded fuzzer dictionary

7f0d2ba

test: add unit tests

2809ba5

test: add unit tests

9bea99a

fix: default config test

83542a4

fix: merge sample_values and ast_values.words

2a6c8c4

fix: typos

738a7f6

better test

951e0cd

chore: move LiteralsCollector to the fuzz crate

07a5dca

feat: bytes support

7316df5

feat: int support

dc063ca

style: clippy

fc4f3d4

0xrusowsky force-pushed the rusowsky/ast-fuzz-dict branch from 6f35d1b to fc4f3d4 Compare October 10, 2025 05:46

fix: bump max dict values

3440429

0xrusowsky force-pushed the rusowsky/ast-fuzz-dict branch from a9373d6 to 3440429 Compare October 10, 2025 05:51

0xrusowsky added 2 commits October 10, 2025 07:58

style: simplify tests

5ea2f8d

style: cmnts

bf49a49

0xrusowsky changed the title from "feat(test): ast-seeded fuzzer dictionary" to "feat(fuzz): ast-seeded dictionary" Oct 10, 2025

0xrusowsky mentioned this pull request Oct 10, 2025

feat(fuzz): impl constant folding for typical expressions #12044

Open

0xrusowsky added 2 commits October 10, 2025 08:25

fix: test

6477f11

fix: test

58e45f9

grandizzy reviewed Oct 10, 2025

0xrusowsky added 6 commits October 10, 2025 09:59

feat: insert all possible uint types that fit

e4e5b4d

test: turn unit256 to uint64 to ensure discovery of smaller uints

6aaa729

works

test: add LiteralCollector coverage and size tests

cba4f57

test: simplify

b56e2ae

style: avoid typo error

c7ff115

test: revert should_fuzz_literals changes

935c435

3 participants
