int_in_range (and others) reports success even when running out of entropy #219
Comments
This is intended behavior. We have found empirically that it is better (in terms of fuzzing coverage and efficiency) to produce dummy values when we run out of data than to return errors. Intuitively, this helps the fuzzer avoid exploring all error paths in the That said, improvements towards uniformity (especially when we have only partial bytes from the underlying data) are very welcome, as are documentation improvements. The reason the signature still returns a `Result` is to avoid API-breaking changes. If you do not want this behavior, you can check the length of the data before making the call.
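A minimal sketch of such a length check, assuming the byte-concatenation scheme described in this issue (one byte per 8 bits of range width, capped at the integer's size). `bytes_needed` and `ensure_enough` are hypothetical helpers, not part of `arbitrary`; in real code the `remaining` argument would come from `Unstructured::len()`.

```rust
/// Hypothetical helper (not part of `arbitrary`): how many bytes a draw from
/// `start..=end` would want under the scheme described in this issue.
fn bytes_needed(start: u32, end: u32) -> usize {
    assert!(start <= end, "range must be non-empty");
    let delta = end - start;
    let mut n = 0usize;
    // One byte for every 8 bits needed to cover `delta`, at most 4 for a u32.
    while n < 4 && (delta >> (n * 8)) > 0 {
        n += 1;
    }
    n
}

/// Hypothetical guard: `Ok(())` if `remaining` bytes are enough for the draw,
/// otherwise `Err` carrying the number of missing bytes.
fn ensure_enough(remaining: usize, start: u32, end: u32) -> Result<(), usize> {
    let needed = bytes_needed(start, end);
    if remaining >= needed {
        Ok(())
    } else {
        Err(needed - remaining)
    }
}
```

A caller that wants a hard failure could run `ensure_enough(u.len(), n, m)` before `u.int_in_range(n..=m)` and map the `Err` case to its own error type.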
Hi, thanks for the clarification. I'm currently having a look at adjusting If API compatibility is the only reason for it to return an error, would it be worth putting a change to remove that under #217 to better reflect the semantics? Alternatively, the various methods in In terms of documentation, #184 appears to have been ready to go apart from a clippy lint that's since been addressed in the mainline. Does anything still need to be addressed on that?
I added the last checkbox in that OP with the intention of cleaning this stuff up. Does that item not align with what you are thinking?
Sorry, that was my mistake. That's exactly what I meant.
Firstly, I feel like I'm missing something here: this is such a fundamental piece of functionality that it's implausible to me this problem isn't described anywhere. I couldn't see anything about it in the docs, and the only reference I've found is in a unit test buried in #184. I'm raising it because it nonetheless caused problems for me.
In `Unstructured`, a call `int_in_range(n..=m)` consumes some bytes of the seed data to construct an index between 0 and the smallest power of 256 larger than `(m-n)`. It does this by simply concatenating the bytes together, before mod-reducing and shifting this value into the desired output range. As mentioned in #184, this leads to biased results and inefficient use of the entropy if `m-n` is not a power of 256, but it will nonetheless work if the data is available. However, if there aren't enough bytes available, `int_in_range` does not return an `Err` - instead, if it can't get a next byte, it stops constructing the large index value and uses it as is in the rest of the calculation, returning an `Ok`. IOW, if we have less than the required number of bytes left, we interpret whatever's left as a single integer (which will be by definition less than `(m-n)`, possibly as little as 1/256th of that if the range is a very large one) and use that for the reduction and shifting. In the particular case that there's no data left at all, we "successfully" return exactly `n`.

While I appreciate that this is technically compliant with the documentation, it's extremely counterintuitive and I can't tell if it's intended behavior or not. If you enable `clippy::pedantic` and configure `avoid-breaking-exported-api = false`, it will indeed flag `int_in_range` returning a `Result` as redundant because it cannot fail. If this isn't intended, can it be fixed to return `NotEnoughData` appropriately? (It also maybe looks like it should return `EmptyChoose` for an empty range, instead of panicking as it does currently)

Alternatively, if this is the intended behavior, perhaps I've misunderstood and none of `Unstructured` should be expected to be checking for this case, because only one method, `bytes()`, actually does sometimes return `NotEnoughData`. I assume this is because it returns a reference to the underlying data in order to facilitate `&[u8]: Arbitrary`, and so can't invent a value out of thin air like the other methods. If this is the idea, I'd appreciate if it could be included in the documentation somewhere that none of the methods are intended to be checking for a lack of data except by necessity.

FWIW I'm happy to submit a PR to help fix this, or do any work needed to push #184 over the line if this is an issue of documentation, I'd just need someone to clarify the intended design.
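To make the report concrete, here is a simplified, self-contained model of the behaviour described in this issue, restricted to `u32`. It is an illustration of the described scheme, not the crate's actual source; `int_in_range_model` and `single_byte_histogram` are hypothetical names introduced for this sketch.

```rust
/// Simplified model of the behaviour described in this issue (NOT the real
/// `arbitrary` implementation). Returns the chosen value together with the
/// number of bytes consumed from `data`.
fn int_in_range_model(data: &[u8], start: u32, end: u32) -> (u32, usize) {
    assert!(start <= end, "range must be non-empty");
    let delta = end - start;
    let mut raw: u32 = 0;
    let mut consumed = 0usize;
    // Concatenate one byte per 8 bits needed to cover `delta`.
    while consumed < 4 && (delta >> (consumed * 8)) > 0 {
        match data.get(consumed) {
            Some(&byte) => {
                raw = (raw << 8) | u32::from(byte);
                consumed += 1;
            }
            // Out of data: stop early and use whatever was gathered so far,
            // rather than reporting an error.
            None => break,
        }
    }
    // Mod-reduce into the range width, then shift by `start`.
    let offset = if delta == u32::MAX { raw } else { raw % (delta + 1) };
    (start + offset, consumed)
}

/// Counts how often each value of `0..=2` comes out across all 256 possible
/// single-byte inputs, to show the modulo bias.
fn single_byte_histogram() -> [u32; 3] {
    let mut counts = [0u32; 3];
    for byte in 0..=255u8 {
        let (value, _) = int_in_range_model(&[byte], 0, 2);
        counts[value as usize] += 1;
    }
    counts
}
```

Under this model, `int_in_range_model(&[], 10, 1_000_010)` yields `(10, 0)`, i.e. exactly `n` with nothing consumed, and `int_in_range_model(&[0x12], 0, 1_000_000)` yields `(18, 1)` even though the range wants three bytes. `single_byte_histogram()` yields `[86, 85, 85]`, a small instance of the bias that arises when the range width is not a power of 256.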