Skip to content

Clarify MaybeUninit docs #136689

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from
Draft

Clarify MaybeUninit docs #136689

wants to merge 1 commit into from

Conversation

hkBst
Copy link
Member

@hkBst hkBst commented Feb 7, 2025

Couldn't help myself while fixing #66845.

@rustbot
Copy link
Collaborator

rustbot commented Feb 7, 2025

r? @thomcc

rustbot has assigned @thomcc.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Feb 7, 2025
@rust-log-analyzer

This comment has been minimized.

@thomcc
Copy link
Member

thomcc commented Feb 19, 2025

I don't feel strongly about these changes but they should probably be reviewed by @RalfJung, who I believe wrote these docs in the first place.

r? @RalfJung

@rustbot rustbot assigned RalfJung and unassigned thomcc Feb 19, 2025
@RalfJung
Copy link
Member

@hkBst could you talk a bit about what issue with the docs you are trying to fix here?

Cc @rust-lang/opsem in case someone has an opinion. I don't have a lot of time rn unfortunately.

@hkBst
Copy link
Member Author

hkBst commented Feb 19, 2025

@RalfJung that seems fair.

Issue #66845 talks about misunderstanding MaybeUninit::uninit_array() to return MaybeUninit<[MaybeUninit<T>; LEN]> because of "Create a new array of MaybeUninit items, in an uninitialized state.". I think because "a new array of MaybeUninit items" is already [MaybeUninit<T>; LEN] and then "in an uninitialized state" could apply to either the elements or the array. This is why the proposed version says "Creates a new array of uninitialized MaybeUninit<T> elements." to make it clear what "uninitialized" refers to. Then I've tried to remove "in an uninitialized state" everywhere to prevent the same confusion.

So then we have "uninitialized MaybeUninit<T>" and "initialized MaybeUninit<T>" and so it seemed slightly awkward to me to talk about the MaybeUninit<T> containing a value and about its content, and better to talk about it being a wrapper for a value and its value. The original first line even is "A wrapper type to construct uninitialized instances of T." (And I realize that here I have perhaps sinned against the rule to have a short one-sentence summary at the top.)

Then I tried to write an intro that tries to make the rules around drop seem less magical by putting more emphasis on the fact that a MaybeUninit<T> is just a type-level construct for a T which may or may not be initialized. This would hopefully also make it more intuitive that there is little value in a MaybeUninit<[MaybeUninit<T>; LEN]>.

I believe the rest of the changes are mostly copy-editing based on the above.

This PR is not time sensitive, so there is no rush.

@RalfJung
Copy link
Member

RalfJung commented Feb 21, 2025

The type of uninit_array is given right there in the docs, so I don't quite follow why that would trigger a half-rewrite of the docs in that module.^^ And I don't understand how you want from "the docs of that function are implicitly talking about the state of N elements in an array, rather than the state of the array" to "we should get rid of the terminology 'uninitialized state'". It is pretty standard in English to say "a list of X, having property Y" in a case where Y is a property of the elements X, nor a property of the list. If this needs disambiguation, adding a single 'each' would fix that: "a list of X, each having property Y".

To me this feels like rewording for the sake of rewording, I don't see a clear improvement that would justify the effort of making sure that every single sentence doesn't potentially imply something we don't want it to imply. These docs are hard to get right, they shouldn't be rewrittien gratuitously.

(And I realize that here I have perhaps sinned against the rule to have a short one-sentence summary at the top.)

Indeed, the very first chunk of the diff is already not an improvement IMO.

I appreciate your attempt to improve the docs, but currently I don't quite see which issue in the existing overarching docs this fixes -- an issue with the uninit_array docs should be fixed locally there, making this a lot easier to review. :D And if it doesn't fix anything, just makes it different, then I'm afraid I don't think it is worth it.

@RalfJung
Copy link
Member

Also, apparently the plan is to remove uninit_array anyway: #134585.

Couldn't help myself while fixing rust-lang#66845.
@hkBst
Copy link
Member Author

hkBst commented Feb 21, 2025

(And I realize that here I have perhaps sinned against the rule to have a short one-sentence summary at the top.)

Indeed, the very first chunk of the diff is already not an improvement IMO.

I've restored the summary line.

@RalfJung
Copy link
Member

RalfJung commented Feb 21, 2025

Thanks, that doesn't resolve any of the other concerns I mentioned though. :)

@hkBst
Copy link
Member Author

hkBst commented Feb 21, 2025

I think that "Creates a new array of uninitialized MaybeUninit<T> elements." is just simpler than "Creates a new array of MaybeUninit<T> elements, each in an uninitialized state.", and thus will be more readily understood by more people in less time. But even if you disagree there, then surely it is at least as simple as the latter, no? So at the least you are using fewer words to say the same thing.

In addition to that, I think the following does much work to dispell some magic surrounding this type and drop:
/// A MaybeUninit<T> is like a T, but without the requirement that it is properly initialized as a T.
/// Dropping a MaybeUninit<T> does nothing, even if properly initialized as a T, because
/// the compiler relies on the type system to decide how to drop variables. Thus, if a MaybeUninit<T>
/// should be dropped like a T, it should be converted to a T with assume_init or similar.

@RalfJung
Copy link
Member

I can agree to a high-level paragraph being added before "Initialization invariant". What about the rest of this PR?

I think that "Creates a new array of uninitialized MaybeUninit elements." is just simpler than "Creates a new array of MaybeUninit elements, each in an uninitialized state.", and thus will be more readily understood by more people in less time. But even if you disagree there, then surely it is at least as simple as the latter, no? So at the least you are using fewer words to say the same thing.

Yeah as I said I'm fine improving the docs of that function. Or, well, I'd suggest first waiting for the outcome of #134585 as it seems likely the function will be removed. It was added because we didn't have const blocks yet so there was no other easy way to crate a [MaybeUninit<T>; N] for a non-Copy T.

@saethlin
Copy link
Member

To me this feels like rewording for the sake of rewording, I don't see a clear improvement that would justify the effort of making sure that every single sentence doesn't potentially imply something we don't want it to imply. These docs are hard to get right, they shouldn't be rewrittien gratuitously.

I've read over most of this PR and I agree. I can't find an improvement that justifies the churn and review effort.

@@ -308,7 +312,7 @@ impl<T> MaybeUninit<T> {
MaybeUninit { value: ManuallyDrop::new(val) }
}

/// Creates a new `MaybeUninit<T>` in an uninitialized state.
/// Creates a new uninitialized `MaybeUninit<T>`.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you agree with this change and others like it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see nothing fundamentally wrong with the old wording. 🤷

I'm not saying every line of your diff is wrong, but it's easier for me to rewrite the docs myself than to sieve out the clear-improvement changes in your PR from the ones that make things less clear or that just use different words to say the same thing. If you have some core changes here you think are most important, please reduce the PR to just those so that the diff becomes obvious.

@@ -367,8 +371,7 @@ impl<T> MaybeUninit<T> {
[const { MaybeUninit::uninit() }; N]
}

/// Creates a new `MaybeUninit<T>` in an uninitialized state, with the memory being
/// filled with `0` bytes. It depends on `T` whether that already makes for
/// Creates a new zero-filled `MaybeUninit<T>`. It depends on `T` whether that already makes for
/// proper initialization. For example, `MaybeUninit<usize>::zeroed()` is initialized,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about this one?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The old text is even trying to claim an uninitialized state, even though it depends on T.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, yeah I could see how this could be confusing.

@@ -545,7 +545,7 @@ impl<T> MaybeUninit<T> {
/// ```
///
/// (Notice that the rules around references to uninitialized data are not finalized yet, but
/// until they are, it is advisable to avoid them.)
/// until they are, it is advisable to avoid references to uninitialized data.)
#[stable(feature = "maybe_uninit", since = "1.36.0")]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This "them" is ambiguous, as it could refer to "references", or to "unintialized data" or "references to uninitialized data" or even to "the rules"... Probably needs more rewording to avoid the repetition though.

@@ -25,7 +29,7 @@ use crate::{fmt, intrinsics, ptr, slice};
/// This is exploited by the compiler for various optimizations, such as eliding
/// run-time checks and optimizing `enum` layout.
///
/// Similarly, entirely uninitialized memory may have any content, while a `bool` must
/// Similarly, entirely uninitialized memory may have any value, while a `bool` must
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"value" has a specific technical meaning in Rust semantics, and I don't see how this is the right use of the term. What's wrong with "content"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does a bool have to do with uninitialzed memory if a bool has values, but memory has content? I think it makes more sense if they both have the same things, since memory consists of bits, or bytes, or words which all also have values. "content" is perhaps not wrong, but it is suboptimal IMO.

Copy link
Member

@RalfJung RalfJung Mar 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Memory doesn't contain values, memory contains bytes. And no, bytes don't have a "value". Bytes can represent a value at a given type, but not all sequences of bytes represent a value at each type, and different sequences of bytes can represent the same value at the same type.

@@ -726,16 +729,16 @@ impl<T> MaybeUninit<T> {
}
}

/// Drops the contained value in place.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is another example of a magic phrase: "drop in place". Does that mean to drop it without moving it first, or is it equivalent to just "drop". When does the difference (if there is one) matter? I honestly don't know the answer, which I why I did not yet propose a change for that part, but it could use some clarification.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does that mean to drop it without moving it first

Yes, what else should it mean?
See https://doc.rust-lang.org/nightly/std/ptr/fn.drop_in_place.html

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it should refer to that function then...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, maybe it should.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in the other PR.

/// (unchanged unless written to) value . Reading the same uninitialized byte
/// multiple times can give different results. This even makes it undefined
/// behavior to have uninitialized data in a variable of integer type,
/// which otherwise could hold any *fixed* bit pattern:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For instance, the old paragraph here is entirely fine, whereas the new version is unnecessarily terse while also using the term "value" in questionable ways.

I think you are underestimating how much work it is to review changes to natural-language technical writing that is highly sensitive to its exact wording. Every single change you make needs to be clearly motivated or it's just not worth the effort.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is definitely more opinion-based, and I definitely like to remove extraneous words, as it improves the signal-to-noise ratio. I believe this is useful when you want to slowly read something a few times, when trying to understand it. But I'll happily retract this one.

I'm not quite sure how to respond to how much work reviewing changes is. Yes, it is a lot of work. So is polishing existing text, especially when having to justify each change. I still think there is value in doing it, as there can be a lot of papercuts hiding in documentation.

Copy link
Member

@RalfJung RalfJung Mar 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I definitely like to remove extraneous words, as it improves the signal-to-noise ratio

Sorry, no, please don't. Just because you prefer super terse text where every single word counts, doesn't mean that everyone prefers that. You are likely making this text less readable for others with your proposed changes, so in 6 months we'll get the next PR by someone arguing for a slightly different style.

There's value in fixing papercuts, yes. There is no value in rewording based on subjective ideas about which way of saying the same thing is more clear, that's just useless churn -- and it will be endless churn as there'll always be the next PR arguing for a different wording.

Therefore, I'll only spend my time on changes that are clearly motivated by objective criteria. And I don't have time to tease out every such justification from you one-by-one, I need you to frontload the explanations.

/// initialized causes undefined behavior.
/// It is up to the caller to ensure that the `MaybeUninit<T>` has been fully initialized,
/// before dropping the `T` value of the `MaybeUninit<T>`. Calling this when the `T` value
/// of the `MaybeUninit<T>` is not yet fully initialized causes immediate undefined behavior.
///
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose this is what you both mean by churn, but is an attempt at making the phrasing more uniform:

  • initialization: initialized vs fully initialized vs properly initialized
  • UB: causes UB vs causes immediate UB

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See, that's a justification. Why did I have to argue for days before you told me the reasons for this change? I can't read your mind and just know that this is the point of this diff!

But if we do want to treat "initialized" as a technical term here, then we should properly define it in the type-level docs first. Also, I see no reason to have a qualification -- if we define the term, there's no reason to repeat "fully" everywhere.

Such a change would ideally be done separately from other changes, at least in a separate commit, so that one can tell apart changes due to this from other changes.

Regarding UB vs immediate UB, that's something we do all over the libs docs, so there's no point in trying to be consistent within this one file. So please don't touch that for now; if you do feel like making this consistent across the standard library, file an issue to discuss with the libs-api team which wording they prefer.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See, that's a justification. Why did I have to argue for days before you told me the reasons for this change? I can't read your mind and just know that this is the point of this diff!

I guess this was just a minor change in my mind, and it was also not clear to me which parts you were objecting to for which reasons. Mind reading is definitely a useful skill, but I can't say I excel at it either.

But if we do want to treat "initialized" as a technical term here, then we should properly define it in the type-level docs first. Also, I see no reason to have a qualification -- if we define the term, there's no reason to repeat "fully" everywhere.

I don't disagree, and I'm also not sure if there is any merit to qualifying "initialized". It makes me think of a MaybeUninit<(u8, u8)>, initializing one of the u8 and then transmuting to (u8, MaybeUninit<u8>). You can't say the (u8, u8) was (fully) initialized, but it was initialized enough that...

Such a change would ideally be done separately from other changes, at least in a separate commit, so that one can tell apart changes due to this from other changes.

Fair enough.

Regarding UB vs immediate UB, that's something we do all over the libs docs, so there's no point in trying to be consistent within this one file. So please don't touch that for now; if you do feel like making this consistent across the standard library, file an issue to discuss with the libs-api team which wording they prefer.

Thanks for the clarification. I'll keep it under consideration.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One point that makes this more tricky is that we don't have a stable way to refer to "validity" vs "safety" (or "language" vs "library") invariants yet; see rust-lang/unsafe-code-guidelines#539.

Most of the docs here seem to use "initialized" as referring to validity invariants, not safety invariants.

/// is in an initialized state.
/// It is up to the caller to ensure that the `MaybeUninit<T>` has been fully initialized,
/// before getting a reference to the `T` value of the `MaybeUninit<T>`. Calling this when the `T` value
/// of the `MaybeUninit<T>` is not yet fully initialized causes immediate undefined behavior.
///
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one, for example, says the same things, but in a different way...

/// initialize a `MaybeUninit`.
/// It is up to the caller to ensure that the `MaybeUninit<T>` has been fully initialized,
/// before getting a mutable reference to the `T` value of the `MaybeUninit<T>`. Calling this when the `T` value
/// of the `MaybeUninit<T>` is not yet fully initialized causes immediate undefined behavior.
///
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And a third way...

@RalfJung
Copy link
Member

RalfJung commented Mar 5, 2025

Should this be closed in favor of #138005?

@hkBst hkBst marked this pull request as draft March 5, 2025 13:33
@hkBst
Copy link
Member Author

hkBst commented Mar 5, 2025

Should this be closed in favor of #138005?

I've turned this into a draft for now, but would prefer it not be closed yet.

@RalfJung
Copy link
Member

@rustbot author

(see here for latest status from my side)

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 31, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants