Sampling distribution documentation and tailoring #221

Dietr1ch · 2025-04-25T04:25:22Z

I played with this library this afternoon and noticed a bias towards edge values like 0, ?::MAX, and ?::MIN. I get that they work wonders in fuzzing, but in my case the bias ended up producing overly simple test cases.

I was generating a sequence of push(value)/pop operations on a heap. Simplified, my approach was:

#[derive(Arbitrary)]
struct OperationBatch {
  seed: u64,  // Fixes the push/pop sequence. (I biased towards pushing small batches)
  numbers_to_push: Vec<u16>,
}

The bias resulted in my operation sequences pushing mostly the same numbers onto the heap, which doesn't stress the heap much.

I ended up implementing OperationBatch::from_seed(u64) and customising the number distribution myself, but I would appreciate documentation of the default distributions, plus a mention of helpers for tailoring the distribution of values when the defaults are a bad fit.
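
For anyone hitting the same problem, here is a rough sketch of what such a from_seed workaround could look like. It is not my exact code: the SplitMix64 generator, the 1..=8 batch length, and the helper name splitmix64 are assumptions made for illustration, and it uses only the standard library rather than anything from this crate.

```rust
// Hypothetical sketch of the workaround; names and constants are illustrative.
#[derive(Debug, Clone, PartialEq)]
struct OperationBatch {
    seed: u64,
    numbers_to_push: Vec<u16>,
}

/// One SplitMix64 step: advances `state` and returns a well-mixed 64-bit value.
fn splitmix64(state: &mut u64) -> u64 {
    *state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

impl OperationBatch {
    /// Deterministically expands a seed into a small batch of values
    /// spread over the whole u16 range, with no weighting towards 0 or MAX.
    fn from_seed(seed: u64) -> Self {
        let mut state = seed;
        // Assumed batch size: between 1 and 8 values (small batches).
        let len = 1 + (splitmix64(&mut state) % 8) as usize;
        // Keep the top 16 bits of each 64-bit output as the value to push.
        let numbers_to_push = (0..len)
            .map(|_| (splitmix64(&mut state) >> 48) as u16)
            .collect();
        OperationBatch { seed, numbers_to_push }
    }
}
```

With something like this, the fuzzer only has to supply the u64 seed, and calling OperationBatch::from_seed with the same seed always reproduces the same batch.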

At least from the docs around output distributions, it wasn't clear to me that this bias exists or how sharp it is. I feel that I'd have had an easier time if I had run into the arbitrary_len docs, but from the README I initially thought that sprinkling a few attributes would be all I needed.

Maybe I just ran out of entropy because of a poor size_hint, but I'd expect errors instead of silently generated bad samples (from looking around, this might be related to #219 (comment)). Also, there seems to be a missing set of attributes for specifying collection sizes that would use arbitrary_len underneath.
