Skip to content

Conversation

@gooncreeper
Copy link
Contributor

Reworks fuzz testing to be smith-based.

Closes #25281
Closes #20816

On the standard library side:

The input: []const u8 parameter of functions passed to testing.fuzz has changed to smith: *testing.Smith. Smith is used to generate values from libfuzzer or input bytes generated by libfuzzer.

Smith contains the following base methods:

  • value as a generic method for generating any type
  • eos for generating end-of-stream markers. Provides the additional guarantee true will eventually by provided.
  • bytes for filling a byte array.
  • slice for filling part of a buffer and providing the length.

Smith.Weight is used for giving value ranges a higher probability of being selected. By default, every value has a weight of zero (i.e. they will not be selected). Weights can only apply to values that fit within a u64. The above functions have corresponding ones that accept weights. Additionally, the following functions are provided:

  • baselineWeights which provides a set of weights containing every
    possible value of a type.
  • eosSimpleWeighted for unique weights for true and false
  • valueRangeAtMost and valueRangeLessThan for weighing only a range
    of values.

On the libfuzzer and abi side:

Uids

These are u32s which are used to classify requested values. This solves the problem of a mutation causing a new value to be requested and shifting all future values; for example:

  1. An initial input contains the values 1, 2, 3 which are interpreted as a, b, and c respectively by the test.
  2. The 1 is mutated to a 4 which causes the test to request an extra value interpreted as d. The input is now 4, 2, 3, 5 (new value) which the test corresponds to a, d, b, c; however, b and c no longer correspond to their original values.

Uids contain a hash component and type component. The hash component is currently determined in Smith by taking a hash of the calling @returnAddress() or via an argument in the corresponding WithHash functions. The type component is used extensively in libfuzzer with its hashmaps.

Mutations

At the start of a cycle (a run), a random number of values to mutate is selected with less being exponentially more likely. The indexes of the values are selected from a selected uid with a logarithmic bias to uids with more values.

Mutations may change a single values, several consecutive values in a uid, or several consecutive values in the uid-independent order they were requested. They may generate random values, mutate from previous ones, or copy from other values in the same uid from the same input or spliced from another.

For integers, mutations from previous ones currently only generates random values. For bytes, mutations from previous mix new random data and previous bytes with a set number of mutations.

Passive Minimization

A different approach has been taken for minimizing inputs: instead of trying a fixed set of mutations when a fresh input is found, the input is instead simply added to the corpus and removed when it is no longer valuable.

The quality of an input is measured based off how many unique pcs it hit and how many values it needed from the fuzzer. It is tracked which inputs hold the best qualities for each pc for hitting the minimum and maximum unique pcs while needing the least values.

Once all an input's qualities have been superseded for the pcs it hit, it is removed from the corpus.

Comparison to byte-based smith

A byte-based smith would be much more inefficient and complex than this solution. It would be unable to solve the shifting problem that Uids do. It is unable to provide values from the fuzzer past end-of-stream. Even with feedback, it would be unable to act on dynamic weights which have proven essential with the updated tests (e.g. to constrain values to a range).

Test updates

All the standard library tests have been updated to use the new smith interface. For Deque, an ad hoc allocator was written to improve performance and remove reliance on heap allocation. TokenSmith has been added to aid in testing Ast and help inform decisions on the smith interface.

Obtaining it is expensive and usually redundent in the case of no leaks.

Notably, between each unit test run the test runner resets the debug
allocator's state, causing this to slow it down by a notable margin.
The end of the archive needs to also be aligned to a two-byte boundary,
not just the start of records. This was causing lld to reject archives.

Notably, this was happening with compiler_rt when rebuilding in fuzz
mode, which is why this commit is included in this patchset.
@gooncreeper gooncreeper force-pushed the integrated-smith branch 3 times, most recently from 7325b28 to dd9d383 Compare November 12, 2025 00:17
-- On the standard library side:

The `input: []const u8` parameter of functions passed to `testing.fuzz`
has changed to `smith: *testing.Smith`. `Smith` is used to generate
values from libfuzzer or input bytes generated by libfuzzer.

`Smith` contains the following base methods:
* `value` as a generic method for generating any type
* `eos` for generating end-of-stream markers. Provides the additional
  guarantee `true` will eventually by provided.
* `bytes` for filling a byte array.
* `slice` for filling part of a buffer and providing the length.

`Smith.Weight` is used for giving value ranges a higher probability of
being selected. By default, every value has a weight of zero (i.e. they
will not be selected). Weights can only apply to values that fit within
a u64. The above functions have corresponding ones that accept weights.
Additionally, the following functions are provided:
* `baselineWeights` which provides a set of weights containing every
  possible value of a type.
* `eosSimpleWeighted` for unique weights for `true` and `false`
* `valueRangeAtMost` and `valueRangeLessThan` for weighing only a range
  of values.

-- On the libfuzzer and abi side:

--- Uids

These are u32s which are used to classify requested values. This solves
the problem of a mutation causing a new value to be requested and
shifting all future values; for example:

1. An initial input contains the values 1, 2, 3 which are interpreted
as a, b, and c respectively by the test.

2. The 1 is mutated to a 4 which causes the test to request an extra
value interpreted as d. The input is now 4, 2, 3, 5 (new value) which
the test corresponds to a, d, b, c; however, b and c no longer
correspond to their original values.

Uids contain a hash component and type component. The hash component
is currently determined in `Smith` by taking a hash of the calling
`@returnAddress()` or via an argument in the corresponding `WithHash`
functions. The type component is used extensively in libfuzzer with its
hashmaps.

--- Mutations

At the start of a cycle (a run), a random number of values to mutate is
selected with less being exponentially more likely. The indexes of the
values are selected from a selected uid with a logarithmic bias to uids
with more values.

Mutations may change a single values, several consecutive values in a
uid, or several consecutive values in the uid-independent order they
were requested. They may generate random values, mutate from previous
ones, or copy from other values in the same uid from the same input or
spliced from another.

For integers, mutations from previous ones currently only generates
random values. For bytes, mutations from previous mix new random data
and previous bytes with a set number of mutations.

--- Passive Minimization

A different approach has been taken for minimizing inputs: instead of
trying a fixed set of mutations when a fresh input is found, the input
is instead simply added to the corpus and removed when it is no longer
valuable.

The quality of an input is measured based off how many unique pcs it
hit and how many values it needed from the fuzzer. It is tracked which
inputs hold the best qualities for each pc for hitting the minimum and
maximum unique pcs while needing the least values.

Once all an input's qualities have been superseded for the pcs it hit,
it is removed from the corpus.

-- Comparison to byte-based smith

A byte-based smith would be much more inefficient and complex than this
solution. It would be unable to solve the shifting problem that Uids
do. It is unable to provide values from the fuzzer past end-of-stream.
Even with feedback, it would be unable to act on dynamic weights which
have proven essential with the updated tests (e.g. to constrain values
to a range).

-- Test updates

All the standard library tests have been updated to use the new smith
interface. For `Deque`, an ad hoc allocator was written to improve
performance and remove reliance on heap allocation. `TokenSmith` has
been added to aid in testing Ast and help inform decisions on the smith
interface.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fuzzing smith API plus feedback std.testing.fuzzInput: introduce a length range option

3 participants