-
-
Notifications
You must be signed in to change notification settings - Fork 3.1k
rework fuzz testing to be smith based #25893
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
gooncreeper
wants to merge
5
commits into
ziglang:master
Choose a base branch
from
gooncreeper:integrated-smith
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
+3,333
−1,491
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Obtaining it is expensive and usually redundent in the case of no leaks. Notably, between each unit test run the test runner resets the debug allocator's state, causing this to slow it down by a notable margin.
NicoElbers
reviewed
Nov 11, 2025
The end of the archive needs to also be aligned to a two-byte boundary, not just the start of records. This was causing lld to reject archives. Notably, this was happening with compiler_rt when rebuilding in fuzz mode, which is why this commit is included in this patchset.
SeanTUT
reviewed
Nov 11, 2025
7325b28 to
dd9d383
Compare
-- On the standard library side: The `input: []const u8` parameter of functions passed to `testing.fuzz` has changed to `smith: *testing.Smith`. `Smith` is used to generate values from libfuzzer or input bytes generated by libfuzzer. `Smith` contains the following base methods: * `value` as a generic method for generating any type * `eos` for generating end-of-stream markers. Provides the additional guarantee `true` will eventually by provided. * `bytes` for filling a byte array. * `slice` for filling part of a buffer and providing the length. `Smith.Weight` is used for giving value ranges a higher probability of being selected. By default, every value has a weight of zero (i.e. they will not be selected). Weights can only apply to values that fit within a u64. The above functions have corresponding ones that accept weights. Additionally, the following functions are provided: * `baselineWeights` which provides a set of weights containing every possible value of a type. * `eosSimpleWeighted` for unique weights for `true` and `false` * `valueRangeAtMost` and `valueRangeLessThan` for weighing only a range of values. -- On the libfuzzer and abi side: --- Uids These are u32s which are used to classify requested values. This solves the problem of a mutation causing a new value to be requested and shifting all future values; for example: 1. An initial input contains the values 1, 2, 3 which are interpreted as a, b, and c respectively by the test. 2. The 1 is mutated to a 4 which causes the test to request an extra value interpreted as d. The input is now 4, 2, 3, 5 (new value) which the test corresponds to a, d, b, c; however, b and c no longer correspond to their original values. Uids contain a hash component and type component. The hash component is currently determined in `Smith` by taking a hash of the calling `@returnAddress()` or via an argument in the corresponding `WithHash` functions. The type component is used extensively in libfuzzer with its hashmaps. --- Mutations At the start of a cycle (a run), a random number of values to mutate is selected with less being exponentially more likely. The indexes of the values are selected from a selected uid with a logarithmic bias to uids with more values. Mutations may change a single values, several consecutive values in a uid, or several consecutive values in the uid-independent order they were requested. They may generate random values, mutate from previous ones, or copy from other values in the same uid from the same input or spliced from another. For integers, mutations from previous ones currently only generates random values. For bytes, mutations from previous mix new random data and previous bytes with a set number of mutations. --- Passive Minimization A different approach has been taken for minimizing inputs: instead of trying a fixed set of mutations when a fresh input is found, the input is instead simply added to the corpus and removed when it is no longer valuable. The quality of an input is measured based off how many unique pcs it hit and how many values it needed from the fuzzer. It is tracked which inputs hold the best qualities for each pc for hitting the minimum and maximum unique pcs while needing the least values. Once all an input's qualities have been superseded for the pcs it hit, it is removed from the corpus. -- Comparison to byte-based smith A byte-based smith would be much more inefficient and complex than this solution. It would be unable to solve the shifting problem that Uids do. It is unable to provide values from the fuzzer past end-of-stream. Even with feedback, it would be unable to act on dynamic weights which have proven essential with the updated tests (e.g. to constrain values to a range). -- Test updates All the standard library tests have been updated to use the new smith interface. For `Deque`, an ad hoc allocator was written to improve performance and remove reliance on heap allocation. `TokenSmith` has been added to aid in testing Ast and help inform decisions on the smith interface.
dd9d383 to
6373532
Compare
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Reworks fuzz testing to be smith-based.
Closes #25281
Closes #20816
On the standard library side:
The
input: []const u8parameter of functions passed totesting.fuzzhas changed tosmith: *testing.Smith.Smithis used to generate values from libfuzzer or input bytes generated by libfuzzer.Smithcontains the following base methods:valueas a generic method for generating any typeeosfor generating end-of-stream markers. Provides the additional guaranteetruewill eventually by provided.bytesfor filling a byte array.slicefor filling part of a buffer and providing the length.Smith.Weightis used for giving value ranges a higher probability of being selected. By default, every value has a weight of zero (i.e. they will not be selected). Weights can only apply to values that fit within a u64. The above functions have corresponding ones that accept weights. Additionally, the following functions are provided:baselineWeightswhich provides a set of weights containing everypossible value of a type.
eosSimpleWeightedfor unique weights fortrueandfalsevalueRangeAtMostandvalueRangeLessThanfor weighing only a rangeof values.
On the libfuzzer and abi side:
Uids
These are u32s which are used to classify requested values. This solves the problem of a mutation causing a new value to be requested and shifting all future values; for example:
Uids contain a hash component and type component. The hash component is currently determined in
Smithby taking a hash of the calling@returnAddress()or via an argument in the correspondingWithHashfunctions. The type component is used extensively in libfuzzer with its hashmaps.Mutations
At the start of a cycle (a run), a random number of values to mutate is selected with less being exponentially more likely. The indexes of the values are selected from a selected uid with a logarithmic bias to uids with more values.
Mutations may change a single values, several consecutive values in a uid, or several consecutive values in the uid-independent order they were requested. They may generate random values, mutate from previous ones, or copy from other values in the same uid from the same input or spliced from another.
For integers, mutations from previous ones currently only generates random values. For bytes, mutations from previous mix new random data and previous bytes with a set number of mutations.
Passive Minimization
A different approach has been taken for minimizing inputs: instead of trying a fixed set of mutations when a fresh input is found, the input is instead simply added to the corpus and removed when it is no longer valuable.
The quality of an input is measured based off how many unique pcs it hit and how many values it needed from the fuzzer. It is tracked which inputs hold the best qualities for each pc for hitting the minimum and maximum unique pcs while needing the least values.
Once all an input's qualities have been superseded for the pcs it hit, it is removed from the corpus.
Comparison to byte-based smith
A byte-based smith would be much more inefficient and complex than this solution. It would be unable to solve the shifting problem that Uids do. It is unable to provide values from the fuzzer past end-of-stream. Even with feedback, it would be unable to act on dynamic weights which have proven essential with the updated tests (e.g. to constrain values to a range).
Test updates
All the standard library tests have been updated to use the new smith interface. For
Deque, an ad hoc allocator was written to improve performance and remove reliance on heap allocation.TokenSmithhas been added to aid in testing Ast and help inform decisions on the smith interface.