Should we / can we make MaybeUninit<T> always preserve all bytes of T (including padding)? #518

Open
RalfJung opened this issue Jul 26, 2024 · 71 comments

Comments

@RalfJung
Member

It is a frequent source of confusion that MaybeUninit<T> is not just preserving all the underlying bytes of storage, but actually if T has padding then those bytes are lost on copies/moves of MaybeUninit<T>.

This is currently pretty much a necessary consequence of the promise that MaybeUninit<T> is ABI-compatible with T: some ABIs don't preserve the padding of T when it is passed to a function. However, this was not part of the intention with MaybeUninit at all, it is something we discovered later.

Maybe we should try to take this back, and make the guarantee only for types without padding?

I am not even sure why we made this a guarantee. We made the type repr(transparent) because for performance it is quite important that MaybeUninit<$int> becomes just an iN in LLVM. But that doesn't require a stable guarantee. And in fact it seems like it would almost always be a bug if the caller and callee disagree about whether the value has to be initialized. So I would be curious about real-world examples where this guarantee is needed.
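To make the padding concrete, here is a minimal sketch. `Pair` and the helper functions are hypothetical names chosen for illustration, not anything from the standard library. The sketch locates the single padding byte of a `repr(C)` struct and shows a typed copy of `MaybeUninit<Pair>`, after which only the field bytes may be relied upon.

```rust
use std::mem::{offset_of, size_of, MaybeUninit};

/// Hypothetical example type: `a` at offset 0, one padding byte at offset 1,
/// `b` at offset 2 (because `u16` needs 2-byte alignment under `repr(C)`).
#[repr(C)]
#[derive(Clone, Copy)]
pub struct Pair {
    pub a: u8,
    pub b: u16,
}

/// Number of bytes in `Pair` that belong to no field, i.e. padding.
pub const fn pair_padding_bytes() -> usize {
    size_of::<Pair>() - (size_of::<u8>() + size_of::<u16>())
}

/// Offset of the padding byte that directly follows field `a`.
pub const fn pair_padding_offset() -> usize {
    offset_of!(Pair, a) + size_of::<u8>()
}

/// A typed copy (move) of `MaybeUninit<Pair>`. The bytes of `a` and `b` are
/// carried over; the byte at `pair_padding_offset()` is not guaranteed to be,
/// so this sketch never reads it afterwards.
pub fn copy_pair(src: MaybeUninit<Pair>) -> MaybeUninit<Pair> {
    let dst = src;
    dst
}
```

The sketch deliberately avoids inspecting the padding byte after the copy: under the semantics described above that byte may be uninitialized, and reading it as a `u8` would be undefined behavior.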

@elichai

elichai commented Jul 26, 2024

I'll add that a typed copy of an uninitialized variable is UB in C, so there's no need to promise any ABI for FFI compatibility.
So this leaves us with the "Rust" ABI, which isn't stable anyway.

@RalfJung
Member Author

RalfJung commented Jul 26, 2024 via email

@Diggsey

This comment was marked as off-topic.

@RalfJung

This comment was marked as off-topic.

@carbotaniuman

The PR that introduced the guarantees does not talk about padding, and it seems like that wasn't really understood back then. The t-lang minutes discussing this are lost to time and reorganizations, but it seems doubtful that such a consideration was raised. Discussions from 2018 raise a lack of real-world use cases for ABI compatibility, and I agree with such a sentiment in the present.

I don't think this would be approved nowadays, but I am incredibly apprehensive about removing it. There are few places in the Rust documentation that use always for guarantees like this, and the use cases for some weird FFI thunks or bindings would be nigh-impossible to properly test with crater or similar...

@Diggsey

This comment was marked as off-topic.

@RalfJung
Member Author

RalfJung commented Jul 26, 2024

@carbotaniuman I think we should consider removing it. If we can't come up with any legitimate usecase, I think we should definitely remove it. I don't like going back on a promise like this, but if we don't have a usecase that could be broken by taking back this promise, then the chances that someone is affected should be very slim.

@Diggsey thanks for explaining why you think this belongs in this thread. But I disagree. "MaybeUninit preserves provenance" is not relevant here. You will note that provenance does not appear in the issue description. Furthermore, provenance on CHERI works like it does everywhere else, so even if provenance were relevant, CHERI wouldn't change anything. It is true that you can write code with MaybeUninit that will work everywhere but not on CHERI; discussing that is off-topic here as the reasons are completely different from what this thread is about and also different from what you mentioned -- it is caused not by padding and not by provenance, but by capabilities. So please take this elsewhere, e.g. Zulip or a new issue where you explain why you think MaybeUninit's provenance behavior is incompatible with CHERI, but I'd prefer not to see yet another thread derailed by CHERI. I like CHERI and want to see it work in Rust, but our main task here is to figure out Rust for the existing targets we already support. CHERI support is a nice extra that I'll happily discuss, as long as it doesn't distract from our core task.

@carbotaniuman

If we do agree to remove the guarantee, I expect it to break 0 uses in practice. My only other concern would be the performance impact of having to copy more bytes. It probably won't affect SIMD or buffers though, so I don't think it's really an issue.

@bjorn3
Member

bjorn3 commented Jul 26, 2024

I think we should preserve the memory layout compatibility, but drop the calling convention compatibility. That could be done using repr(C) instead of repr(transparent) I think.

@chorman0773
Contributor

FTR, I use MaybeUninit<T> in the signatures of lccc's libatomic and libsfp ABI-level routines. This is because they get called from xlang's codegen, and xlang allows uninit for those operations. Though in these cases, they are types without padding in the signature.

However, for compatibility with gcc/clang, they have to expose an ABI equal to the routines using primitives.

@chorman0773
Contributor

(And in general, I agree with @carbotaniuman - unless crater is testing all kinds of targets, I'm betting it primarily tests x86_64, where aggregate-of-one-field will get passed the same way as that one field*, so without using miri-crater, the ABI checks won't be found by crater. If the code is used on something like arm32 though, it's going to be very visibly broken)

@RalfJung
Member Author

Yes, this is super hard to test for. I wonder if it's worth having a blog post asking people whether they need this guarantee...

I think we should preserve the memory layout compatibility, but drop the calling convention compatibility. That could be done using repr(C) instead of repr(transparent) I think.

Yes, concretely the proposal would be:

  • T and MaybeUninit<T> always have the same size and alignment
  • they have the same ABI if T has no padding

Or maybe "no padding" should be restricted a bit further, like "if T is a primitive integer/float/pointer type" or so. Note that some non-power-of-2 SIMD types have padding so we have to be careful if we want to talk about those types.
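The two bullet points of the proposal can be sketched as follows. `same_layout`, `double` and `call_through_maybe_uninit` are hypothetical names used only for this illustration. The first part checks the size/alignment guarantee at compile time, also for a type with padding. The second part exercises ABI compatibility for a scalar, which stays guaranteed under the proposal, by calling an `extern "C" fn(u32)` through a pointer typed with `MaybeUninit<u32>`.

```rust
use std::mem::{align_of, size_of, MaybeUninit};

/// True if `T` and `MaybeUninit<T>` have the same size and alignment.
pub const fn same_layout<T>() -> bool {
    size_of::<T>() == size_of::<MaybeUninit<T>>()
        && align_of::<T>() == align_of::<MaybeUninit<T>>()
}

// Compile-time checks: the layout guarantee holds with or without padding.
const _: () = assert!(same_layout::<u32>());
const _: () = assert!(same_layout::<(u8, u16)>());
const _: () = assert!(same_layout::<[u8; 3]>());

/// A callee that expects a fully initialized scalar.
extern "C" fn double(x: u32) -> u32 {
    x.wrapping_mul(2)
}

/// Calls `double` through a pointer whose argument type is `MaybeUninit<u32>`.
/// This relies on `u32` and `MaybeUninit<u32>` being ABI-compatible, and the
/// value passed is initialized, so it is valid at the callee's type.
pub fn call_through_maybe_uninit(x: u32) -> u32 {
    let f: extern "C" fn(MaybeUninit<u32>) -> u32 =
        unsafe { std::mem::transmute(double as extern "C" fn(u32) -> u32) };
    f(MaybeUninit::new(x))
}
```

Doing the same with a padded aggregate in place of `u32` is exactly the case whose guarantee is under discussion here.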

@Ddystopia

Is the only motivation for backing out of that promise the fact that this is a frequent source of confusion? What benefits besides clarity can Rust gain?

@RalfJung
Member Author

We never intended MaybeUninit<(u8, u16)> to have a padding byte that would be lost on copies. This is a complete accident. It's not just a source of confusion, it's not the semantics we want. It came up in #517 where it means that returning a MaybeUninit<T> from an atomic compare-exchange actually doesn't work since padding bytes still get lost so we won't end up having the same bit pattern as what is stored in the atomic location.

@jamesmunns
Member

jamesmunns commented Jul 26, 2024

I'm pretty sure this is still the case, but it might be worth it to enumerate things that ARE still allowed for this wrt FFI/ABI concerns. My primary use of MaybeUninit<T> in FFI is for "outptr" usages (edit: specifically &mut MaybeUninit<T> or *mut MaybeUninit<T>), which seems to be still good (because we never copy/pass by value - the part that is discussed by this issue), but for folks like myself might be worth spelling out clearly/contrasting what is no longer allowed.
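A minimal sketch of the "outptr" pattern described above, assuming a hypothetical `Point` type and a hypothetical `point_init` function that stands in for a real C routine (it is written in Rust here so the example is self-contained). Only a pointer to the `MaybeUninit<Point>` crosses the call boundary, so nothing is passed by value.

```rust
use std::mem::MaybeUninit;

/// Hypothetical C-compatible struct.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Point {
    pub x: i32,
    pub y: i32,
}

/// Stand-in for a C function that fills an out-parameter and returns a status
/// code (0 on success, -1 if `out` is null).
///
/// # Safety
/// `out` must be null or valid for writing one `Point`.
pub unsafe extern "C" fn point_init(out: *mut MaybeUninit<Point>, x: i32, y: i32) -> i32 {
    if out.is_null() {
        return -1;
    }
    // Write through the raw pointer; no reference to uninitialized data is needed.
    unsafe { out.cast::<Point>().write(Point { x, y }) };
    0
}

/// Caller side: reserve uninitialized storage, let the callee fill it, and
/// only then assert that it is initialized.
pub fn make_point(x: i32, y: i32) -> Option<Point> {
    let mut slot = MaybeUninit::<Point>::uninit();
    let rc = unsafe { point_init(&mut slot, x, y) };
    if rc == 0 {
        // SAFETY: `point_init` returned 0, so it wrote a full `Point`.
        Some(unsafe { slot.assume_init() })
    } else {
        None
    }
}
```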

@RalfJung
Member Author

RalfJung commented Jul 26, 2024

Yes, ABI compatibility is about the "by-value" part of a function argument or return type. That's how we've consistently been using this term for a while now, also see our glossary and the documentation on ABI compatibility.

In public communication we'll obviously spell out the details more than in internal discussion. ("Internal" not as in "private" but as in "among the team members and anyone else who's willing to participate".)

@chorman0773
Contributor

Or maybe "no padding" should be restricted a bit further, like "if T is a primitive integer/float/pointer type" or so. Note that some non-power-of-2 SIMD types have padding so we have to be careful if we want to talk about those types.

I at the very least need target simd types as well - for floating-point types that aren't directly supported by rust (e.g. f2x64_t), I wrap them in a target-specific ABI type, which on x86_64, is mostly __m128.

@RalfJung
Member Author

Since that is a compiler-internal concern, you could also do this by providing more ABI guarantees than what Rust provides in general.

But that case would be covered by "types without padding", or we could explicitly mention the stdarch SIMD types (since they are all powers of 2).

@chorman0773
Contributor

Since that is a compiler-internal concern, you could also do this by providing more ABI guarantees than what Rust provides in general.

Not fully - you don't necessarily need to compile the rtlibs with lccc themselves, they're written in mostly portable rust, and quite deliberately. I'd like to be able to continue providing that guarantee.

Yes, ABI compatibility is about the "by-value" part of a function argument or return type. That's how we've consistently been using this term for a while now, also see our glossary and the documentation on ABI compatibility.

You can also now see a formalization in reference#1545, as a note.

@RalfJung
Member Author

This has caused an actual soundness bug now: rust-lang/rust#134713.

I think we should seriously consider restricting the ABI compatibility guarantee to scalar and SIMD types.

@chorman0773
Contributor

The issue with that is that will make it hard to represent "Maybe Initialized" aggregate types in ABIs.
I already know I need this for fn() since I use this (via a MaybeValid wrapper, see footnote 1), in place of void(*)(int) in signal. Depending on some other design details of my OS (which I'm currently in the process of implementing a wine-like compatibility layer for running programs for it in Rust), I might need this behaviour (either via the MaybeValid wrapper or via MaybeUninit itself) for at least one aggregate type (at the very least, if I want to keep the strong typing my OS's API has been exporting).

Footnotes

  1. MaybeValid is a repr(transparent) wrapper around MaybeUninit<T> that disallows uninit bytes wherever T does as a safety invariant. In particular, the type is allowed to contain "Any Initialized Bit pattern" or any safe value of T, minus padding bytes of T.

@RalfJung
Member Author

Yeah I don't think a by-value ABI-compatible "maybe init" aggregate is common enough to justify the constant stream of surprises and UB that this problem causes. I would suggest not designing your OS around such a facility.

@RalfJung RalfJung changed the title Should we / can we make MaybeUninit<T> always preserve all bytes of T? Should we / can we make MaybeUninit<T> always preserve all bytes of T (including padding)? Feb 23, 2025
@carbotaniuman

I had a longer post here that I since felt was too confrontational, but my thoughts have changed and I do not believe that solely changing this is justifiable given that this is not a breaking change for soundness, but merely to make the use of the API better for users. Unsafe Rust already has multiple sharp edges (SB/TB, Box noalias, provenance), and I feel like this is not a particularly sharp one once users understand it.

I would also like to echo the alternative of a new type BikeshedMaybeUninitBagOfBits, or if desire is expressed to retake the better name for the more useful use-case, BikeshedMaybeUninitIgnoringPadding. This to me feels similar to the exposed provenance methods in that we may not like them, but they've ossified (in this case for nearly 6 years), so providing both a good option with the obvious semantics as well as a way to express the use-cases others may care about with regards to ABI is a good compromise. We could even hang this on an edition!

@ia0

ia0 commented Feb 23, 2025

I might be missing something obvious, but isn't BikeshedMaybeUninitBagOfBits<T> the same as [MaybeUninit<u8>; size_of::<T>()]? This has limitations today because of generic_const_exprs (in particular you can't fix the stdlib soundness bug with this yet), but ultimately it seems to me the current definition of MaybeUninit seems the most expressive one (making it irrelevant to also have BikeshedMaybeUninitBagOfBits).
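For a concrete (non-generic) type, the byte-array formulation already works on stable Rust. The sketch below assumes a hypothetical `Pair` struct with one padding byte, and all function names are invented for illustration. Every element of the array is a padding-free `MaybeUninit<u8>`, so a copy of the array carries over all four bytes, including the one at the padding offset.

```rust
use std::mem::{size_of, MaybeUninit};

/// Hypothetical type with one padding byte at offset 1.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Pair {
    pub a: u8,
    pub b: u16,
}

/// Byte-wise storage for one `Pair`. Note that its alignment is 1, not 2.
pub type PairBytes = [MaybeUninit<u8>; size_of::<Pair>()];

/// Build a fully initialized bag from raw bytes.
pub fn bytes_to_bag(bytes: [u8; size_of::<Pair>()]) -> PairBytes {
    bytes.map(MaybeUninit::new)
}

/// Copy the bag; each byte is copied as its own `MaybeUninit<u8>`.
pub fn copy_bag(src: &PairBytes) -> PairBytes {
    *src
}

/// Read a `Pair` out of the bag.
///
/// # Safety
/// The bytes of fields `a` and `b` must be initialized.
pub unsafe fn read_pair(bag: &PairBytes) -> Pair {
    // The bag is only 1-aligned, so an unaligned read is required.
    unsafe { bag.as_ptr().cast::<Pair>().read_unaligned() }
}

/// Turn the bag back into plain bytes.
///
/// # Safety
/// Every byte of the bag must be initialized.
pub unsafe fn bag_to_bytes(bag: PairBytes) -> [u8; size_of::<Pair>()] {
    bag.map(|b| unsafe { b.assume_init() })
}
```

The lower alignment is one practical difference from `MaybeUninit<Pair>`, in addition to the `generic_const_exprs` limitation mentioned above for the generic case.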

@saethlin
Member

I would also like to echo the alternative of a new type BikeshedMaybeUninitBagOfBits

What properties does this type have?

@carbotaniuman

BikeshedMaybeUninitBagOfBits is the hypothetical MaybeUninit that preserves all bytes of T like is being proposed in this issue. This was called bag of bits semantics in the past which is why the bikeshed is named that.

@chorman0773
Contributor

chorman0773 commented Feb 23, 2025

I would suggest not designing your OS around such a facility.

The issue is that the design isn't simply "Whether or not to use MaybeUninit in signatures". The issue that comes up is "What do I have to do on the rust side to match this C API while being compatible with other design decisions of the OS". Much of the SCI (System Call Interface) surface for the OS takes two-pointer aggregates by value instead of by pointer. In the past, when signals were a native part of the OS (and not emulated atop uSEH), sigset_t was defined as a struct wrapping a [u64; 2]. I'd rather those decisions not make it impossible to define other APIs in Rust.
Berkeley sockets will also be fun given that SOCKET likely needs to be a struct of a handle + metadata. And I may need to handle uninitialized SOCKETs - or at the very least, MaybeValid<SOCKET>s.

@chorman0773
Contributor

Also I speak of a position of being involved in the discussion here - a third party having absolutely zero idea this is happening may have just as much cause to rely on a stable language guarantee. And given much of this code just won't even run in miri (miri won't even run winter-lily, which is my current project touching Lilium), this is probably in the realm of "Breaks silently, until it doesn't".

@Lokathor
Contributor

Changing how pointers to MaybeUninit work does not appear to be in the proposal, and MaybeUninit integers would also specifically not be affected, so I guess you're concerned that people are passing MaybeUninit<SomeStruct> by value to/from an extern "C" function? Do I have that right?

@RalfJung
Member Author

RalfJung commented Feb 28, 2025

@chorman0773

Yes I am proposing to take back a documented language guarantee, and replace it with a different, more useful guarantee. I think long-term this will cause less harm. So I was asking if there's any cases where the ABI guarantee is useful or even needed.

I am still extremely confused about your example. You keep bringing up more and more concepts you're not explaining and there are too many parties calling each other, so it's not even clear which call you are talking about when. What I think I understand is that there's a particular function call where the caller uses signature extern "C" fn(i32, siginfo_t, ucontext_t) (but you never even stated that signature! you stated a different one, and then later said that's not the real signature). But then based on some ambient information the caller might know that the callee is actually declared as extern "C" fn(i32, siginfo_t) and therefore you want to leave the last argument uninitialized. In other cases all 3 arguments exist and you want to actually pass the data, and then the ABI must of course match. But there's also a "Rust trampoline" involved somehow and now you completely lost me again.
There should be a single function call that matters, where caller and callee use different but ABI-compatible types. Please explain everything about that call, and leave away everything else.

Could you achieve your goals without the ABI guarantee (and without worsening performance)? And if yes, would that solution be any less "natural" than what you are currently doing? Frankly, based on what you said so far, any possible alternative seems more natural to me. ;)

@Lokathor
Contributor

(sorry for any confusion Ralf, but my own question was directed at chorman)

@chorman0773
Contributor

chorman0773 commented Feb 28, 2025

Ok, I'll restate what the flow is:

  • The Lilium Kernel has a core concept called an exception handler. This is invoked by the kernel (in much the same manner that a POSIX signal handler is invoked - asynchronously or synchronously, and assume same sort of ordering constraints on an asynchronous exception)
  • For certain exception types, the userspace runtime invokes a C signal handler. In order to do this, it first invokes a trampoline function. This trampoline always has signature extern "C" fn(i32, MaybeUninit<siginfo_t>, MaybeUninit<ucontext_t>). The exception handler knows when to initialize siginfo_t and ucontext_t but always passes them. This is because ucontext_t in particular is partially initialized with a handle needed to resume handling the previous exception.
    • The trampoline is invoked first because the exception handler is actually "Returning" to it. It needs to resume from the Exception Handling state so that the signal handler can call raise (if it was asynchronous), which in turn is translated to the synchronous entry point to exceptions, being ExceptionHandleSynchronous (calling this on a thread that is currently handling an exception causes the thread to exit with an unmanaged exception)
    • The Trampoline "call" is setup manually, so this relies on knowing (and matching) the ABI for MaybeUninit<siginfo_t> and MaybeUninit<ucontext_t>. This in turn requires an ABI guarantee for those types.
  • The trampoline then knows whether or not to call sa_sigaction (with a signature of unsafe extern "C" fn (i32, *mut siginfo_t, *mut ucontext_t)) or sa_sighandler (with a signature of unsafe extern "C" fn(i32)) (the trampoline also has a bunch of other setup to do to fully support the full set of sigaction options from POSIX). For efficiency reasons, the call is just merged into one without a branch (though this can easily be done from asm).

ucontext_t and siginfo_t are defined in another library that's also used by consumers of the API. The types there probably don't want to let the callee deinit things (especially in ucontext_t which is then read back by the trampoline, before the trampoline goes back into the Exception Handling context to resume handling the exception through other means).

@workingjubilee
Member

@RalfJung You are the Pope of Rust, or at least the Pope of Rust Safety Models. Whenever you say something, even if you say something blatantly wrong, almost no one calls you out on it. You are the proxy author and reviewer of all std code because everyone is reading everything you are writing and thinking about it when writing unsafe code. You are repeatedly cited in these discussions. If you repeatedly say something wrong, you can convince other people it's true, simply by repeating the wrong thing.

@RalfJung
Member Author

RalfJung commented Feb 28, 2025 via email

@workingjubilee
Member

workingjubilee commented Feb 28, 2025

That is a reasonable stance, honestly.

I just am not surprised incorrect code is written based on something you say again and again, and don't think it should be taken as evidence the confusion is that widespread if it might instead be your confusion spreading widely.

@RalfJung
Member Author

RalfJung commented Feb 28, 2025 via email

@comex

comex commented Feb 28, 2025

@chorman0773

Ok, I'll restate what the flow is:

Okay. It sounds to me like even if MaybeUninit's ABI compatibility guarantee is restricted to primitive types and its ABI for other types is unspecified, you have some options:

  1. Define MaybeUninitSiginfo and MaybeUninitUcontext structs that are like siginfo_t and ucontext_t but wrap the individual fields in MaybeUninit.

    This would still be guaranteed ABI-compatible with siginfo_t/ucontext_t by-value parameters in C.

    Maybe this is an abstraction violation, but it sounds like it shouldn’t be the end of the world since this is such a special case.

  2. Change the trampoline to take siginfo_t and ucontext_t by pointer, i.e. extern "C" fn(i32, *const siginfo_t, *const ucontext_t).

    After all, it's not like this is some syscall interface where you can't pass parameters via the stack. siginfo_t and ucontext_t are both large structs that will be passed on the stack anyway. Is the issue that the trampoline itself is a stable ABI boundary you don't want to break (separately from the stable POSIX API)? If so, then I think you've already made a mistake. ucontext_t may need to grow over time to account for future architecture extensions, so it should never be passed by value across a stable ABI boundary.
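Option 1 above can be sketched as follows. `Info` is a hypothetical stand-in with invented fields (the real `siginfo_t` and `ucontext_t` look different), and `MaybeUninitInfo` wraps each field individually. The checks only confirm that size, alignment and field offsets agree; they do not by themselves demonstrate the calling-convention claim.

```rust
use std::mem::{align_of, offset_of, size_of, MaybeUninit};

/// Hypothetical stand-in for a C struct such as `siginfo_t`.
#[repr(C)]
pub struct Info {
    pub signo: i32,
    pub code: u8,
    pub addr: u64,
}

/// Same fields in the same order, each wrapped in `MaybeUninit`.
#[repr(C)]
pub struct MaybeUninitInfo {
    pub signo: MaybeUninit<i32>,
    pub code: MaybeUninit<u8>,
    pub addr: MaybeUninit<u64>,
}

/// True if both structs agree on size, alignment and every field offset.
pub const fn layouts_match() -> bool {
    size_of::<Info>() == size_of::<MaybeUninitInfo>()
        && align_of::<Info>() == align_of::<MaybeUninitInfo>()
        && offset_of!(Info, signo) == offset_of!(MaybeUninitInfo, signo)
        && offset_of!(Info, code) == offset_of!(MaybeUninitInfo, code)
        && offset_of!(Info, addr) == offset_of!(MaybeUninitInfo, addr)
}

// Fail the build if the two layouts ever diverge.
const _: () = assert!(layouts_match());

/// Build a value where only `signo` is initialized.
pub fn partially_init(signo: i32) -> MaybeUninitInfo {
    MaybeUninitInfo {
        signo: MaybeUninit::new(signo),
        code: MaybeUninit::uninit(),
        addr: MaybeUninit::uninit(),
    }
}
```

The cost of this approach is the duplication of every struct definition, which is the "abstraction violation" mentioned above.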

However, I think we can do better. Even if MaybeUninit<T> is not guaranteed to be ABI-compatible with T, it definitely should be guaranteed to have a stable ABI (when passed to extern "C" functions). It shouldn't be something that can change on rustc upgrades.

That should be enough of a guarantee that you could keep doing what you're currently doing. Since you're setting up the call manually, all that matters is that there is some stable ABI, not what it is exactly. The only issue would be if (a) you are worried about breaking an existing stable ABI boundary (for your apparently still-in-development OS), and (b) the stable ABI rustc adopts for MaybeUninit<siginfo_t> or MaybeUninit<ucontext_t> doesn't match what rustc does today. But in practice it is going to match, because the structs were already being passed on the stack.

Beyond that, if MaybeUninit has a stable ABI, then we may as well define it as being ABI-compatible with something. For example, if T is a struct, MaybeUninit<T> could be documented as ABI-compatible with an equivalent struct where u8 fields are added to fill all padding bytes. (This would need to be fleshed out a bit more to deal with nested structs/unions/whatever, but the basic idea should work.) This would ensure that functions taking MaybeUninit by-value can still be called from C or whatever.
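The "equivalent struct with u8 fields filling the padding" idea can be illustrated with a small sketch. Both struct names and the `_pad` field names are hypothetical, and the padding positions are written out by hand for this one type rather than derived automatically.

```rust
use std::mem::{align_of, offset_of, size_of};

/// Hypothetical struct with two padding bytes under `repr(C)`:
/// one after `a` (offset 1) and one trailing byte after `c` (offset 5).
#[repr(C)]
pub struct Padded {
    pub a: u8,
    pub b: u16,
    pub c: u8,
}

/// The same struct with every padding byte spelled out as a `u8` field.
#[repr(C)]
pub struct Explicit {
    pub a: u8,
    pub _pad0: u8,
    pub b: u16,
    pub c: u8,
    pub _pad1: u8,
}

/// True if the explicit version has no padding left: its size equals the sum
/// of its field sizes.
pub const fn explicit_has_no_padding() -> bool {
    size_of::<Explicit>() == 3 * size_of::<u8>() + size_of::<u16>() + size_of::<u8>()
}

/// True if the named fields sit at the same offsets in both structs and the
/// overall size and alignment agree.
pub const fn same_shape() -> bool {
    size_of::<Padded>() == size_of::<Explicit>()
        && align_of::<Padded>() == align_of::<Explicit>()
        && offset_of!(Padded, a) == offset_of!(Explicit, a)
        && offset_of!(Padded, b) == offset_of!(Explicit, b)
        && offset_of!(Padded, c) == offset_of!(Explicit, c)
}
```

As noted above, nested structs and unions would need more care than this flat example shows.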

…With all that said, I am definitely sympathetic to the alternative view that MaybeUninit<T> ought to remain ABI-compatible with T. It's definitely surprising if they're incompatible. But I also don't want Rust to have to add yet another wrapper type.

@carbotaniuman

carbotaniuman commented Feb 28, 2025

I think this breaking change is not justified by the stated goal, which is expressly not any inherent unsoundness, but just a desire to reduce a paper cut for unsafe Rust users. I think that's a great goal (and to be clear, I support it myself), but unsafe code is tricky, with many corner cases. Box noalias remains on the books today despite being a far larger footgun. We have not yet ruled out Stacked Borrows as an aliasing model! MaybeUninit not keeping padding may be surprising, but I think it can also be explained relatively easily and is a small footgun compared to other complexities.

In addition, this creates a special case in the ABI compatibility rules. Windows, until a few months ago, passed MaybeUninit by value across FFI boundaries. Now, I'm 95% sure that all of those types are simple pointers, but we've now added a footgun in cases where they aren't. And any cases which relied on this ABI compatibility will likely break in weird and spectacular ways only at runtime - these are the types of cases likely to be underindexed on Crater.

Making MaybeUninit<T> preserve the padding bytes of T also comes with a performance downside - previously these padding bytes were garbage, and the compiler could not pass it across function boundaries (especially in extern "Rust", where this guarantee would definitely apply). Making the padding worthwhile would likely regress common use cases of storing or transferring a maybe uninitialized T in favor of use cases that do need this bag of bits functionality.

I think it's also disingenuous to act like the use cases here are contrived and useless - this is a documented language feature, not RUSTC_BOOTSTRAP. And yes, while niche use cases are indeed niche and some of us here may consider the stabilization to be a mistake, we (speaking as a community) did make the mistake.

I am also confused as to why my compatibility ideas seemed to have been (intentionally?) ignored. MaybeUninit is a vocabulary type today - you cannot write the semantics of it in library code. #[repr(transparent)] on unions has real problems with the design, and may not be stabilized in a "reasonable" amount of time. Migration to something else for the use cases that need it will be all but impossible, or users will just take the hit and write T instead, willfully invoking undefined behavior due to the lack of an alternative.

To me, MaybeUninitIgnoringPadding and MaybeUninitBagOfBits are different types with different use cases. What they may be named or how those capabilities are actually written are immaterial to me, what is important is that we do not throw away a core guaranteed capability without sufficient migration patterns and justification.

@comex

comex commented Mar 2, 2025

Making MaybeUninit<T> preserve the padding bytes of T also comes with a performance downside - previously these padding bytes were garbage, and the compiler could not pass it across function boundaries (especially in extern "Rust", where this guarantee would definitely apply). Making the padding worthwhile would likely regress common use cases of storing or transferring a maybe uninitialized T in favor of use cases that do need this bag of bits functionality.

I think you will be hard-pressed to find a case where this makes a measurable difference, even if you specifically microbenchmark the function call.

I'm neutral regarding the rest of your post.

@RalfJung
Member Author

I think it's also disingenuous to act like the use cases here are contrived and useless - this is a documented language feature, not RUSTC_BOOTSTRAP. And yes, while niche use cases are indeed niche and some of us here may consider the stabilization to be a mistake, we (speaking as a community) did make the mistake.

It took us like 2 weeks to even get to the bottom of the one use case that was brought up. And then it turns out it's not actually a use case for the ABI guarantee that is documented, but for a weaker guarantee that is entirely compatible with MaybeUninitBagOfBits. So I think "contrived" is a pretty accurate description. "useless" was not used in this discussion; please don't put words in other people's mouths.

To me, MaybeUninitIgnoringPadding and MaybeUninitBagOfBits are different types with different use cases.

We haven't yet seen a single use case for MaybeUninitIgnoringPadding. @chorman0773, it turns out, doesn't need MaybeUninitIgnoringPadding, they just need some stable ABI for MaybeUninitUcontext and MaybeUninitSiginfo.

All evidence points towards MaybeUninitIgnoringPadding being a type that nobody needs. I will also point out that nobody wanted to add that type, it was created by accident.

@carbotaniuman

All evidence points towards MaybeUninitIgnoringPadding being a type that nobody needs. I will also point out that nobody wanted to add that type, it was created by accident.

I think this is an example of overindexing on the responses present. The vast majority of use cases for this will be low-level, or a workaround for some legacy code, or maybe to provide some potentially uninitialized data to an assembly function for math or similar. You can probably find issues (undocumented guarantees, weird code, not technically supported) with any or all of these potential use cases. I might even agree with those issues.

I personally do have code running that uses this ABI guarantee, but I don't particularly care about how this issue resolves wrt that code. If the capability goes away, I will just remove the MaybeUninit and go on with my life. Nor do I really have a desire to litigate the "validity" or non-contrivedness of how my code is written.

As I have said, I am not saying that we should freeze MaybeUninit<T> in amber because of the past, only that the capabilities not be lost.

But maybe it is decided that these capabilities are not worth keeping around. The obvious next step will be a crater run. Much of this code will not be present in crater. It may be private, internal, or using FFI, such that crater cannot really test it. I expect there to be ~0 breakage on said run. Such a number will not accurately reflect the breakage. And as the capabilities are taken away, there will be no way to migrate without willfully invoking UB.

Again, maybe that much breakage will be tolerated. We broke an inordinate amount of the ecosystem in the time debacle. This would be significantly smaller and far less impactful. But from my reading of your comments, I think you seem to be thinking that there will be ~0 actual breakages from this change. To that, I completely disagree.

@CAD97

CAD97 commented Mar 12, 2025

No fundamental capabilities are lost, as you can use a compound type of MaybeUninit instead of a MaybeUninit of compound type to express the same exact ABI as is "lost" by this change. It's perhaps not as compositional, but the ability to write the ABI is still present.

And note that for any repr(C) type which is pass-in-memory, likely nothing even changes! To have a case which is broken, you need a case which doesn't preserve padding over some stable ABI, or to be a Rust-Rust call, which is immediate UB if there are any uninitialized bytes in an initialized receiving argument type anyway.

If the capability goes away, I will just remove the MaybeUninit and go on with my life

Even if you don't want to debate the "validity" of whatever hacks you needed to use, it would still be good to see an example of a case where you really do want the ABI for MaybeUninit of a non-scalar to match that of the initialized type. Even if it's just “I have a cursed legacy C API where a correct call provides an uninitialized struct lvalue as a parameter,” that's better than just saying “code could be relying on this.”

I really am sympathetic to being perfectly strict about avoiding language breaking changes. But this really does feel like a case of accidental stabilization that just makes things worse for everyone. Casting fn(MaybeUninit<T>) -> U to fn(T) -> MaybeUninit<U> isn't useful enough to justify nonscalar ABI equivalency nor a second flavor of MaybeUninit.

@RalfJung
Member Author

RalfJung commented Mar 12, 2025

@carbotaniuman you are basically saying "trust me I have a usecase but I'm not interested in telling you about it". That's not a constructive contribution to this discussion.

As you said, a crater run is not very useful. The next step is an RFC to get wider awareness of this proposal and make it more likely that if there is some usecase relying on this accidental guarantee out there, we will hear about it.

@carbotaniuman

Given your behavior to the other use case presented, I do not believe you are asking this in good faith. My main use cases are effectively what CAD97's described, where I have C (and assembly) code that returns various complex structs that are potentially uninitialized. I would justify this with ASM freeze, but that's verboten, so I am being technically correct by using MaybeUninit<T> here and operating solely on that.

@CAD97

CAD97 commented Mar 13, 2025

Could it make sense to specify MaybeUninit's ABI a bit more specifically than just for scalars? Specifically, to guarantee that:

  • for MaybeUninit of a scalar, MaybeUninit has the ABI of that scalar; and
  • for MaybeUninit of a type guaranteed to have an indirect (i.e. in memory, whether at a known stack position or behind a pointer) ABI, MaybeUninit has that same indirect ABI.

Some version of this, if practical to specify and provide, would serve the needs of the existing code using MaybeUninit's ABI, as it keeps the ABI matching in the scenarios that don't cause issues, only changing it for types shaped like #[repr(Rust)] (scalar, scalar) (ScalarPair ABI).
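
For the scalar case in the first bullet, here is a minimal sketch of what the guarantee permits: a function pointer taking and returning `u32` is reinterpreted as one taking and returning `MaybeUninit<u32>`. The names `double_it` and `call_through_wrapped` are hypothetical, and the Rust-defined `extern "C"` function is only a stand-in for a real foreign function.

```rust
use std::mem::{self, MaybeUninit};

/// Stand-in for a foreign function (hypothetical); in real FFI this
/// would be implemented in C or assembly.
extern "C" fn double_it(x: u32) -> u32 {
    x.wrapping_mul(2)
}

type Plain = extern "C" fn(u32) -> u32;
type Wrapped = extern "C" fn(MaybeUninit<u32>) -> MaybeUninit<u32>;

/// Calls `double_it` through a signature that spells the scalar as
/// `MaybeUninit<u32>` on both the argument and the return side.
fn call_through_wrapped(x: u32) -> u32 {
    // SAFETY: `MaybeUninit<u32>` is documented to have the same size,
    // alignment, and ABI as `u32`, so both pointer types describe the
    // same call. The argument passed below is initialized.
    let f: Wrapped = unsafe { mem::transmute::<Plain, Wrapped>(double_it as Plain) };
    let out = f(MaybeUninit::new(x));
    // SAFETY: `double_it` always returns an initialized `u32`.
    unsafe { out.assume_init() }
}
```

Nothing in this sketch involves padding, which is why the scalar case is the uncontroversial one.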


Aside: C code which converts an uninitialized lvalue into an rvalue has undefined behavior by the standard for any type which is not excluded from having trap values (which is essentially LLVM's undef), i.e. is not unsigned char. But with the exception of LTO, Rust does FFI across the OS object file semantics (i.e. machine code semantics), not the C language semantics.

@RalfJung
Member Author

RalfJung commented Mar 21, 2025

Given your behavior to the other use case presented, I do not believe you are asking this in good faith.

I am sorry you feel that way. I don't know what I could have done differently to avoid this. Maybe I could have been a bit more patient in how I extracted the details of the use case; I admit I was frustrated since the first explanations we were given were just not useful. But ultimately, if getting to the bottom of a technical question is considered acting in bad faith, we may as well stop having technical discussions altogether. I won't just take it on faith that someone has a use case that they are unwilling or unable to properly describe.

And it turned out I was right in getting to the bottom of this, since "T and MaybeUninit<T> are ABI compatible" turned out to be entirely irrelevant for that use case, as I suspected. What really matters is "MaybeUninit<T> has some well-defined ABI".

The proposed migration plan for cases like that is to move the MaybeUninit down to the fields, so that it only ever wraps scalars. I hope that would also cover your use case. (And as CAD says, it is nearly impossible to return potentially uninitialized structs from C without causing UB. clang in fact adds noundef to all arguments and return values, even character types.)

Could it make sense to specify MaybeUninit's ABI a bit more specifically than just for scalars? Specifically, to guarantee that:

This is getting very close to the edge of my knowledge of ABI details, so I feel uncomfortable making definite statements here. Deciding ABI things without knowing enough about ABI is what got us into this situation in the first place.

In particular, there's somewhat of a layering violation here: the Rust compiler and language, and the docs, don't really have a concept of which types would have an indirect ABI. So there's no proper way we could even set up that definition in the current framework -- we'd have to make that framework a lot more complicated first.

@carbotaniuman

carbotaniuman commented Mar 21, 2025

since "T and MaybeUninit<T> are ABI compatible" turned out to be entirely irrelevant for that use case

I think this is misleading, and being able to spell out the struct as a bunch of MaybeUninit scalar fields to match the underlying struct is useful as a migration, but to call this an alternative is the peak of malicious compliance :/. There's also no guarantee that the structs will be laid out in the way I expect. Fundamentally, on one side, I have a function fn(Args) -> Foo, (where uninitness is not a factor because we are on the ABI level), and on the Rust side I would like to call it like fn(Args) -> MaybeUninit<Foo> (because uninitness does matter in Rust).

In my opinion, I think that relies on the ABI compatibility guarantee that was made. My other alternative is a type that can represent this ABI (MaybeUninitIgnoringPadding) that is not tied to the MaybeUninit name, but that suggestion has been repeatedly ignored so whatever. This to me epitomizes the bad faith argument undertaken here, where the use cases provided are being ignored and minimized as much as reasonably possible, while potential compromises are ignored and not even addressed.

Ultimately I no longer really have an opinion on this change - it will likely be rammed through anyways. I would like to say if we don't make it easy (or even possible) for people to do the use cases that they want in the correct way, I suspect that they will just not. Personally, it's looking like the best "migration" if this were to occur is simply willfully invoking UB, and I'll likely be doing that for my code in order to immunize myself against this change.

@comex

comex commented Mar 23, 2025

Fundamentally, on one side, I have a function fn(Args) -> Foo, (where uninitness is not a factor because we are on the ABI level)

One of the points Ralf is trying to make is that that's not a thing.

If you're interoperating with raw assembly, then there is no uninit, but there is also no such thing as fn(Args) -> Foo, only registers and stack. You have to think about ABI lowering manually on a function-by-function basis. So if you are starting from scratch, there is no reason why generic ABI guarantees of the form "Foo is the same as MaybeUninit<Foo>" would be useful at all. Now, in practice, nobody is starting from scratch, so maybe you have a pile of existing assembly functions with existing C or Rust function signatures, and now you want to change the signatures to be more uninit-aware. In this case, generic guarantees would help but they're not really necessary; you would just need to verify that the ABI still matches for each of the specific functions you're looking at. Most of the time it will.

Is that your use case?

If so, I can understand how going through assembly functions would be a pain, and the backwards-compat break is inherently a pain, but I don't understand how it becomes as huge a problem as you're suggesting.

On the other hand, if you're interoperating with C, then there is such a thing as function signatures, but there is also such a thing as uninit and UB. Which gets into this awkward situation where there is a ton of C code that either (a) is UB but nobody cares, or (b) is not UB according to the spec but the compiler optimizes it as if it were UB. And Rust is trying to be stricter on that front. So perhaps you want to add MaybeUninit on the Rust side but not the C side (because C doesn't have this concept), even though from the compiler's perspective that's pretty weird because the optimizations are the same on both ends.

Is that your use case?

If so, then I actually do understand how this could potentially be a huge problem. The C side may be UB, but it works (presumably), and the function signatures may be baked into legacy code, so it makes sense to want to only use MaybeUninit on the Rust side. And if the code is portable, then analyzing the ABI on a case-by-case basis is not possible, so the only solution is to push MaybeUninit down to struct fields. Which is possible but very nasty.

Though the scope of the nastiness is still unclear, since most C libraries don't do a lot of passing structs by value.

Ultimately, I'm wildly speculating here, because you haven't explained your use case. You need to stop with the charged language and explain, or else we will all continue to not understand each other.

@carbotaniuman

carbotaniuman commented Mar 24, 2025

I apologize, given the complexity of the project I have tried to give the guarantee I actually need, but I'll provide as much detail as possible here. The project I used for this has several layers, some of which are in C, some of which are in assembly, some of which are in a custom glorified macro assembler, and some of them are in a custom DSL written in C. The usage of these languages is pretty normal, with a portable C implementation alongside some custom assembly (and DSL) implementations to better take advantage of the hardware. The actual purpose of the library is DSP-y things, so performance is relatively important.

The actual interface that a user would use is of course a relatively normal C interface. Given the maintainability of the tech stack however, it would be nice to move some of this to Rust. Unfortunately this exposes us to the bad internal interfaces. For instance, some callsites in C pass along something like:

#include <stddef.h> // for size_t

// I believe this would be directly broken by this change
struct Renamed {
    char foo;
    unsigned long long int bar;
};
struct Trimmed {
    long int foo, bar, baz;
    int num[16384];
};
// Renamed is passed uninit into this function
struct Renamed process(struct Renamed a);
// Example is passed uninit into this function
struct Trimmed do_stuff(struct Trimmed a, size_t in, size_t *out);

Trimmed is passed indirectly (I think on all ABIs, but I'm decidedly not an ABI expert), and while it really should be a pointer, making that change is not really easy given the current state of that codebase.

If you're interoperating with raw assembly, then there is no uninit, but there is also no such thing as fn(Args) -> Foo, only registers and stack. You have to think about ABI lowering manually on a function-by-function basis.

Sure, but ultimately an assembly function can fulfill(?) some ABI lowering such that it is the same as a fn(Args) -> Foo. Technically speaking, it is an assembly function that for instance expects certain parameters to be passed in registers, but I think that's relatively unhelpful for wanting to actually call this function from C or Rust.

On the other hand, if you're interoperating with C, then there is such a thing as function signatures, but there is also such a thing as uninit and UB. Which gets into this awkward situation where there is a ton of C code that either (a) is UB but nobody cares, or (b) is not UB according to the spec but the compiler optimizes it as if it were UB. And Rust is trying to be stricter on that front. So perhaps you want to add MaybeUninit on the Rust side but not the C side (because C doesn't have this concept), even though from the compiler's perspective that's pretty weird because the optimizations are the same on both ends.

The C code is compiled with a legacy compiler that does not optimize usage of uninit memory, or else I doubt the code would currently be working.

Is that your use case?

Yes, this is basically the use-case, with some extra context given by the responses above.

And if the code is portable, then analyzing the ABI on a case-by-case basis is not possible, so the only solution is to push MaybeUninit down to struct fields. Which is possible but very nasty.

Ultimately I think this is the main contention I have here. Writing MaybeUninitIgnoringPadding<T> means I am 100% confident that no matter what T is, the ABI will be the same, except that uninitialized memory will be allowed in the type. Any carve-out means that I have to go through and audit the actual type, and potentially write custom structs where MaybeUninit<T> is pushed to the scalars.

Pragmatically, does it matter that I used MaybeUninit<T> in the Rust interfaces instead of just T? Probably not, beyond being "more correct" and self-documenting, but given my involvement in these sorts of discussions I am the type to try to make my code as correct as possible. And if I had chosen T, I think the compiler would be far less likely to break my code than this change would be. I think that having taken the time out to be more careful by using MaybeUninit, this sort of code should not be broken wholesale without keeping the underlying ability to write these ABIs.

@RalfJung
Member Author

// Example is passed uninit into this function

There is no type Example in your example, do you mean Trimmed?

@carbotaniuman

Yes, that is supposed to be Trimmed.

@RalfJung
Member Author

RalfJung commented Mar 28, 2025

Okay. Thank you for getting over the part where you are insulting me, to an actual example.

I did not talk about MaybeUninitIgnoringPadding because having that type presupposes there's a motivation for it, and it is exactly that motivation we were looking for. (I also find the name confusing since "ignoring padding" could also mean "treat padding like all other bytes" which this type does not.)


Regarding your example, IIUC the summary is that you are interfacing with C code where types like Renamed and Trimmed are used with uninitialized fields. This is UB in C. However, you want to hold the Rust side of this to a higher standard than the C side and have it be UB-free.

This could be done even after the proposed change by wrapping the leafs of the type, as follows:

struct Renamed {
    foo: MaybeUninit<libc::c_char>,
    bar: MaybeUninit<libc::c_longlong>,
}
struct Trimmed {
    foo: MaybeUninit<libc::c_long>,
    bar: MaybeUninit<libc::c_long>,
    baz: MaybeUninit<libc::c_long>,
    num: [MaybeUninit<libc::c_int>; 16384],
}

However, the code already exists, and the proposed change would break it since it changes the ABI of these functions that you already wrote. And frustratingly, if you hadn't cared about making the Rust side properly sound, you wouldn't be affected by the ABI change.
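
As a rough check that pushing MaybeUninit down to the fields keeps the in-memory layout, the sketch below compares a plain `#[repr(C)]` mirror of the C `Renamed` struct with a field-wise wrapped one. The name `RenamedUninit` and the helper `layouts_match` are hypothetical, `std::ffi` types are used instead of `libc` to keep it self-contained, and `c_ulonglong` is chosen to mirror the C declaration. It only checks size, alignment, and field offsets; it says nothing about how either type is passed by value.

```rust
use std::ffi::{c_char, c_ulonglong};
use std::mem::{align_of, offset_of, size_of, MaybeUninit};

/// Plain mirror of the C struct, every field assumed initialized.
#[repr(C)]
#[allow(dead_code)]
struct Renamed {
    foo: c_char,
    bar: c_ulonglong,
}

/// Hypothetical field-wise variant: each scalar may be uninitialized.
#[repr(C)]
#[allow(dead_code)]
struct RenamedUninit {
    foo: MaybeUninit<c_char>,
    bar: MaybeUninit<c_ulonglong>,
}

/// Returns true if both structs agree on size, alignment, and the
/// offset of every field.
fn layouts_match() -> bool {
    size_of::<Renamed>() == size_of::<RenamedUninit>()
        && align_of::<Renamed>() == align_of::<RenamedUninit>()
        && offset_of!(Renamed, foo) == offset_of!(RenamedUninit, foo)
        && offset_of!(Renamed, bar) == offset_of!(RenamedUninit, bar)
}
```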

@CAD97

CAD97 commented Mar 28, 2025

This change shouldn't need to be a break to the C FFI usage, and avoiding breaking it potentially may not require the specification to look at the low-level ABI choices either.

Are we aware of any existing cases where a repr(C) ABI passes an object type and doesn't maintain the full object representation? By my reading, the C standard definitely does permit such; function calls do assignment to copy the arguments, and assignment and return statements both say they move a value, not the object (which identifies the region of data storage wherein a value is represented).

Aside about uninit in C

I've now seen the requirement that the value of a struct is not a trap representation even if any member is one, so my prior argument that an argument or return of an uninitialized lvalue is UB by the current standard doesn't work for any nonscalar types. However, one step removed, reading the aggregate value from the object may require reading the scalar field value which can trap and cause UB; it's just a less direct argument that I'm not as strongly confident in.

It's probably not ideal to say that MaybeUninit is transparent for #[repr(C)] types without any caveats, because that makes the "preserves all bytes" quality conditional on subtle target ABI details. But it probably is okay if we also require that the target ABI does preserve padding bytes for a target to be tier-2, or state it as a truth with "broken ABI" caveats like with floating point on i686 targets.

It's definitely not desirable to break incremental porting and improving soundness w.r.t. uninit values over FFI. But this only holds for repr(C) and scalars, since containing #[repr(Rust)] in an extern "C" signature is intended [1] to raise the warning for improper_ctypes_definitions to inform you that you can't rely on the ABI beyond Rust-Rust linkage with a single compiler.

Footnotes

  1. There are a lot of issues with how the FFI safety lints function, and any type that isn't fully concrete can easily result in the lint failing to see "improper" FFI. But generally it's understood that this is a limitation of the compiler: a false negative still uses an unstable ABI that can't be relied on beyond Rust-Rust linkage with a single rustc configuration.

@comex

comex commented Mar 28, 2025

@CAD97

Are we aware of any existing cases where a repr(C) ABI passes an object type and doesn't maintain the full object representation?

Yes. (edit: specifically, both x86-64 and RISC-V ELF ABIs.)

Edit 2: To be clear, this is only for types passed in registers.

@carbotaniuman

Regarding your example, IIUC the summary is that you are interfacing with C code where types like Renamed and Trimmed are used with uninitialized fields. This is UB in C. However, you want to hold the Rust side of this to a higher standard than the C side and have it be UB-free.

Yes, effectively. I would prefer the thinking that the custom compiler has defined the behavior of uninitialized reads to be a freeze (I'm going to ignore the "can ASM freeze" tangent), but it's not standard C either way so this feels like debating semantics.

But this only holds for repr(C) and scalars, since containing #[repr(Rust)] in an extern "C" signature is intended.

The structs I am dealing with are indeed #[repr(C)] in the Rust code and I suspect Ralf just forgot that when writing out his examples.

It's probably not ideal to say that MaybeUninit is transparent for #[repr(C)] types without any caveats, because that makes the "preserves all bytes" quality conditional on subtle target ABI details. But it probably is okay if we also require that the target ABI does preserve padding bytes for a target to be tier-2, or state it as a truth with "broken ABI" caveats like with floating point on i686 targets.

I'm not sure this is possible, afaik this padding issue is a thing on x86_64-pc-windows-gnu, which is probably the most Tier 1 of a target that you can get? I also think that making the semantics of MaybeUninit differ based on extern "C" vs extern "Rust" or whether the contained struct is repr(C) vs repr(Rust) allows this pattern to be expressed, but at a real cost to complexity and mental burden.

@comex

comex commented Mar 28, 2025

@carbotaniuman

The C code is compiled with a legacy compiler that does not optimize usage of uninit memory

I'll note one thing.

If you're sure you're never going to use LTO between the C and Rust code, nor other exotica like running optimizations on already-compiled assembly, then you should be justified in saying: I don't need to use MaybeUninit, because none of the concrete machines I'm targeting have a concept of uninit bytes.

I think even @RalfJung would agree with me on this, if I flesh out the argument a bit more.

Under Ralf's approach, to formally model an FFI call, you first come up with some hypothetical equivalent pure-Rust code as the 'spec' for the call. Then prove that the FFI call actually satisfies that spec using "the implementation-specific relation between concrete machine states and AM states". In practice, this is more of a thought experiment; nobody goes around writing pure-Rust reimplementations of all the FFI calls they make. But an important aspect of this thought experiment is that you have flexibility in choosing the spec. For any given piece of concrete code, there are multiple possible specs that can justify it.

For calls into C, usually we want the spec to treat Rust uninit bytes as equivalent to C uninit bytes, because that way the model is compatible with optimizations that can happen after cross-lang LTO. But if you are not going to do LTO, you could also say that the spec is, broadly speaking, "when C passes structs as arguments to Rust functions, all the uninit bytes are replaced by arbitrary initialized bytes".

Well, technically that doesn't work because Rust doesn't have a freeze operation yet; see this recent discussion. But Rust is probably getting freeze soon. And even without freeze, you can typically still justify it with a more elaborate spec. Instead of "all the uninit bytes are clobbered", say "all the bytes which are supposed to be uninit, based on the API contract for when this particular struct is passed into this particular function, are clobbered". As long as it's hypothetically possible to determine which bytes can be clobbered and which must be preserved, either statically or based on other program data, then the hypothetical pure-Rust spec can make that determination and clobber those bytes.
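
The "more elaborate spec" can be pictured as a small pure-Rust function. This is only a sketch of the thought experiment, assuming the API contract can be written down as a per-byte mask; `clobber_uninit`, the `known_init` mask, and the `fill` byte are all hypothetical.

```rust
use std::mem::MaybeUninit;

/// Hypothetical model of the FFI boundary: every byte the contract does
/// not promise to be initialized is replaced by an arbitrary initialized
/// value (`fill`); every other byte is preserved.
///
/// # Safety
/// `known_init[i]` may only be `true` if `bytes[i]` really is initialized.
unsafe fn clobber_uninit<const N: usize>(
    bytes: [MaybeUninit<u8>; N],
    known_init: [bool; N],
    fill: u8,
) -> [u8; N] {
    let mut out = [0u8; N];
    for i in 0..N {
        out[i] = if known_init[i] {
            // SAFETY: the caller promised that this byte is initialized.
            unsafe { bytes[i].assume_init() }
        } else {
            // Never read the possibly-uninit byte; overwrite it instead.
            fill
        };
    }
    out
}
```

The point of the sketch is that no freeze operation is needed: the function never reads a byte that might be uninit, it decides from the mask alone.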

@CAD97

CAD97 commented Mar 28, 2025

I also think that making the semantics of MaybeUninit differ based on extern "C" vs extern "Rust" or whether the contained struct is repr(C) vs repr(Rust) allows this pattern to be expressed, but at a real cost to complexity and mental burden.

The only semantic which differs is the ABI for passing the type as a function argument or return value, which fundamentally depends both on the type's "ABI" and the function's calling convention. For example, for struct X(f64) on a Linux target, repr(Rust) extern "Rust", repr(Rust) extern "C", and repr(C) extern "C" all lower to the LLVM function signature as double, but repr(C) extern "Rust" lowers to LLVM i64 instead.
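
To look at the four combinations yourself, a sketch like the following can be compiled with `rustc --emit=llvm-ir` and the resulting signatures compared. The type and function names are hypothetical, and the lowering described above is not verified here; it may differ by target and compiler version.

```rust
/// Default representation, i.e. repr(Rust).
struct RustX(f64);

/// Same single field, but with the C representation.
#[repr(C)]
struct CX(f64);

// repr(Rust) type, extern "Rust" calling convention.
pub fn rust_rust(x: RustX) -> f64 {
    x.0
}

// repr(Rust) type, extern "C" calling convention. This is the case the
// improper_ctypes_definitions lint is meant to flag.
#[allow(improper_ctypes_definitions)]
pub extern "C" fn rust_c(x: RustX) -> f64 {
    x.0
}

// repr(C) type, extern "Rust" calling convention.
pub fn c_rust(x: CX) -> f64 {
    x.0
}

// repr(C) type, extern "C" calling convention.
pub extern "C" fn c_c(x: CX) -> f64 {
    x.0
}
```

All four behave identically when called from Rust; any difference is only visible in the lowered signatures.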

We currently document function signature ABI compatibility as requiring that the caller and callee agree on the calling convention and that the argument/return types be ABI compatible with their counterparts, which is in turn defined nominally for repr(Rust) and structurally for repr(C).

It's already the case that MaybeUninit<Rust> and MaybeUninit<C> aren't compatible. My pitch keeps compatibility between C and MaybeUninit<C> but doesn't provide it for Rust and MaybeUninit<Rust>.

The existence of sees-meaningful-usage ABIs in the wild that don't pass all aggregate types as all-bits-meaningful does make that line of thinking a lot less appealing an option, though.

afaik this padding issue is a thing on x86_64-pc-windows-gnu,

specifically, both x86-64 and RISC-V ELF ABIs

I was under the impression from previous tries at reading them that both win64 and sysv64 calling conventions always passed aggregate types by register only if the entire size. But that seems to be incorrect; from some quick tests, it does seem that "scalar pair" types get passed in two registers. If the ABI mandates sign or zero extension, like apparently the RISC-V psABI does, then that does indeed mandate losing the padding bytes.
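
To make the bytes in question concrete, this sketch counts the padding between the two fields of a `#[repr(C)]` mirror of the `Renamed` struct from earlier in the thread. The helper name `padding_between_fields` is hypothetical. On x86-64 the result is 7; whether those bytes survive a by-value call is exactly the ABI question here, and the code only shows where they sit in memory.

```rust
use std::ffi::{c_char, c_ulonglong};
use std::mem::{offset_of, size_of};

/// Rust mirror of the C `Renamed` struct.
#[repr(C)]
#[allow(dead_code)]
struct Renamed {
    foo: c_char,
    bar: c_ulonglong,
}

/// Number of padding bytes between the end of `foo` and the start of `bar`.
fn padding_between_fields() -> usize {
    let end_of_foo = offset_of!(Renamed, foo) + size_of::<c_char>();
    offset_of!(Renamed, bar) - end_of_foo
}
```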

To be fair, I did recall the examples with #[repr(C)] union discarding padding, so I honestly have no idea why I never connected that to union being an aggregate type split between registers in the calling convention. (Perhaps I was thinking that it was fieldwise copied to a stack slot.)
