-
Notifications
You must be signed in to change notification settings - Fork 13.3k
Clarify MaybeUninit docs #136689
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Clarify MaybeUninit docs #136689
Conversation
This comment has been minimized.
This comment has been minimized.
@hkBst could you talk a bit about what issue with the docs you are trying to fix here? Cc @rust-lang/opsem in case someone has an opinion. I don't have a lot of time rn unfortunately. |
@RalfJung that seems fair. Issue #66845 talks about misunderstanding So then we have "uninitialized Then I tried to write an intro that tries to make the rules around drop seem less magical by putting more emphasis on the fact that a I believe the rest of the changes are mostly copy-editing based on the above. This PR is not time sensitive, so there is no rush. |
The type of To me this feels like rewording for the sake of rewording, I don't see a clear improvement that would justify the effort of making sure that every single sentence doesn't potentially imply something we don't want it to imply. These docs are hard to get right, they shouldn't be rewrittien gratuitously.
Indeed, the very first chunk of the diff is already not an improvement IMO. I appreciate your attempt to improve the docs, but currently I don't quite see which issue in the existing overarching docs this fixes -- an issue with the |
Also, apparently the plan is to remove |
Couldn't help myself while fixing rust-lang#66845.
I've restored the summary line. |
Thanks, that doesn't resolve any of the other concerns I mentioned though. :) |
I think that "Creates a new array of uninitialized In addition to that, I think the following does much work to dispell some magic surrounding this type and drop: |
I can agree to a high-level paragraph being added before "Initialization invariant". What about the rest of this PR?
Yeah as I said I'm fine improving the docs of that function. Or, well, I'd suggest first waiting for the outcome of #134585 as it seems likely the function will be removed. It was added because we didn't have |
I've read over most of this PR and I agree. I can't find an improvement that justifies the churn and review effort. |
@@ -308,7 +312,7 @@ impl<T> MaybeUninit<T> { | |||
MaybeUninit { value: ManuallyDrop::new(val) } | |||
} | |||
|
|||
/// Creates a new `MaybeUninit<T>` in an uninitialized state. | |||
/// Creates a new uninitialized `MaybeUninit<T>`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So you agree with this change and others like it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see nothing fundamentally wrong with the old wording. 🤷
I'm not saying every line of your diff is wrong, but it's easier for me to rewrite the docs myself than to sieve out the clear-improvement changes in your PR from the ones that make things less clear or that just use different words to say the same thing. If you have some core changes here you think are most important, please reduce the PR to just those so that the diff becomes obvious.
@@ -367,8 +371,7 @@ impl<T> MaybeUninit<T> { | |||
[const { MaybeUninit::uninit() }; N] | |||
} | |||
|
|||
/// Creates a new `MaybeUninit<T>` in an uninitialized state, with the memory being | |||
/// filled with `0` bytes. It depends on `T` whether that already makes for | |||
/// Creates a new zero-filled `MaybeUninit<T>`. It depends on `T` whether that already makes for | |||
/// proper initialization. For example, `MaybeUninit<usize>::zeroed()` is initialized, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about this one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The old text is even trying to claim an uninitialized state, even though it depends on T
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, yeah I could see how this could be confusing.
@@ -545,7 +545,7 @@ impl<T> MaybeUninit<T> { | |||
/// ``` | |||
/// | |||
/// (Notice that the rules around references to uninitialized data are not finalized yet, but | |||
/// until they are, it is advisable to avoid them.) | |||
/// until they are, it is advisable to avoid references to uninitialized data.) | |||
#[stable(feature = "maybe_uninit", since = "1.36.0")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This "them" is ambiguous, as it could refer to "references", or to "unintialized data" or "references to uninitialized data" or even to "the rules"... Probably needs more rewording to avoid the repetition though.
@@ -25,7 +29,7 @@ use crate::{fmt, intrinsics, ptr, slice}; | |||
/// This is exploited by the compiler for various optimizations, such as eliding | |||
/// run-time checks and optimizing `enum` layout. | |||
/// | |||
/// Similarly, entirely uninitialized memory may have any content, while a `bool` must | |||
/// Similarly, entirely uninitialized memory may have any value, while a `bool` must |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"value" has a specific technical meaning in Rust semantics, and I don't see how this is the right use of the term. What's wrong with "content"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does a bool have to do with uninitialzed memory if a bool has values, but memory has content? I think it makes more sense if they both have the same things, since memory consists of bits, or bytes, or words which all also have values. "content" is perhaps not wrong, but it is suboptimal IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Memory doesn't contain values, memory contains bytes. And no, bytes don't have a "value". Bytes can represent a value at a given type, but not all sequences of bytes represent a value at each type, and different sequences of bytes can represent the same value at the same type.
@@ -726,16 +729,16 @@ impl<T> MaybeUninit<T> { | |||
} | |||
} | |||
|
|||
/// Drops the contained value in place. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is another example of a magic phrase: "drop in place". Does that mean to drop it without moving it first, or is it equivalent to just "drop". When does the difference (if there is one) matter? I honestly don't know the answer, which I why I did not yet propose a change for that part, but it could use some clarification.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does that mean to drop it without moving it first
Yes, what else should it mean?
See https://doc.rust-lang.org/nightly/std/ptr/fn.drop_in_place.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it should refer to that function then...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, maybe it should.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in the other PR.
/// (unchanged unless written to) value . Reading the same uninitialized byte | ||
/// multiple times can give different results. This even makes it undefined | ||
/// behavior to have uninitialized data in a variable of integer type, | ||
/// which otherwise could hold any *fixed* bit pattern: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For instance, the old paragraph here is entirely fine, whereas the new version is unnecessarily terse while also using the term "value" in questionable ways.
I think you are underestimating how much work it is to review changes to natural-language technical writing that is highly sensitive to its exact wording. Every single change you make needs to be clearly motivated or it's just not worth the effort.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one is definitely more opinion-based, and I definitely like to remove extraneous words, as it improves the signal-to-noise ratio. I believe this is useful when you want to slowly read something a few times, when trying to understand it. But I'll happily retract this one.
I'm not quite sure how to respond to how much work reviewing changes is. Yes, it is a lot of work. So is polishing existing text, especially when having to justify each change. I still think there is value in doing it, as there can be a lot of papercuts hiding in documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I definitely like to remove extraneous words, as it improves the signal-to-noise ratio
Sorry, no, please don't. Just because you prefer super terse text where every single word counts, doesn't mean that everyone prefers that. You are likely making this text less readable for others with your proposed changes, so in 6 months we'll get the next PR by someone arguing for a slightly different style.
There's value in fixing papercuts, yes. There is no value in rewording based on subjective ideas about which way of saying the same thing is more clear, that's just useless churn -- and it will be endless churn as there'll always be the next PR arguing for a different wording.
Therefore, I'll only spend my time on changes that are clearly motivated by objective criteria. And I don't have time to tease out every such justification from you one-by-one, I need you to frontload the explanations.
/// initialized causes undefined behavior. | ||
/// It is up to the caller to ensure that the `MaybeUninit<T>` has been fully initialized, | ||
/// before dropping the `T` value of the `MaybeUninit<T>`. Calling this when the `T` value | ||
/// of the `MaybeUninit<T>` is not yet fully initialized causes immediate undefined behavior. | ||
/// |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose this is what you both mean by churn, but is an attempt at making the phrasing more uniform:
- initialization: initialized vs fully initialized vs properly initialized
- UB: causes UB vs causes immediate UB
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See, that's a justification. Why did I have to argue for days before you told me the reasons for this change? I can't read your mind and just know that this is the point of this diff!
But if we do want to treat "initialized" as a technical term here, then we should properly define it in the type-level docs first. Also, I see no reason to have a qualification -- if we define the term, there's no reason to repeat "fully" everywhere.
Such a change would ideally be done separately from other changes, at least in a separate commit, so that one can tell apart changes due to this from other changes.
Regarding UB vs immediate UB, that's something we do all over the libs docs, so there's no point in trying to be consistent within this one file. So please don't touch that for now; if you do feel like making this consistent across the standard library, file an issue to discuss with the libs-api team which wording they prefer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See, that's a justification. Why did I have to argue for days before you told me the reasons for this change? I can't read your mind and just know that this is the point of this diff!
I guess this was just a minor change in my mind, and it was also not clear to me which parts you were objecting to for which reasons. Mind reading is definitely a useful skill, but I can't say I excel at it either.
But if we do want to treat "initialized" as a technical term here, then we should properly define it in the type-level docs first. Also, I see no reason to have a qualification -- if we define the term, there's no reason to repeat "fully" everywhere.
I don't disagree, and I'm also not sure if there is any merit to qualifying "initialized". It makes me think of a MaybeUninit<(u8, u8)>
, initializing one of the u8
and then transmuting to (u8, MaybeUninit<u8>)
. You can't say the (u8, u8)
was (fully) initialized, but it was initialized enough that...
Such a change would ideally be done separately from other changes, at least in a separate commit, so that one can tell apart changes due to this from other changes.
Fair enough.
Regarding UB vs immediate UB, that's something we do all over the libs docs, so there's no point in trying to be consistent within this one file. So please don't touch that for now; if you do feel like making this consistent across the standard library, file an issue to discuss with the libs-api team which wording they prefer.
Thanks for the clarification. I'll keep it under consideration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One point that makes this more tricky is that we don't have a stable way to refer to "validity" vs "safety" (or "language" vs "library") invariants yet; see rust-lang/unsafe-code-guidelines#539.
Most of the docs here seem to use "initialized" as referring to validity invariants, not safety invariants.
/// is in an initialized state. | ||
/// It is up to the caller to ensure that the `MaybeUninit<T>` has been fully initialized, | ||
/// before getting a reference to the `T` value of the `MaybeUninit<T>`. Calling this when the `T` value | ||
/// of the `MaybeUninit<T>` is not yet fully initialized causes immediate undefined behavior. | ||
/// |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one, for example, says the same things, but in a different way...
/// initialize a `MaybeUninit`. | ||
/// It is up to the caller to ensure that the `MaybeUninit<T>` has been fully initialized, | ||
/// before getting a mutable reference to the `T` value of the `MaybeUninit<T>`. Calling this when the `T` value | ||
/// of the `MaybeUninit<T>` is not yet fully initialized causes immediate undefined behavior. | ||
/// |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And a third way...
Should this be closed in favor of #138005? |
I've turned this into a draft for now, but would prefer it not be closed yet. |
Couldn't help myself while fixing #66845.