Reading Pointer bytes as Integers #547

Open
chorman0773 opened this issue Dec 6, 2024 · 16 comments

Comments

@chorman0773
Contributor

chorman0773 commented Dec 6, 2024

This came up in rust-lang/reference#1664. I wanted to ask what T-opsem thinks about the behaviour of reading pointer bytes as integer types (or as char/bool/etc.).

As far as I can tell, there are two "sensible" behaviours, given that integers themselves do not carry provenance:

  • The pointer fragment is ignored,
  • Decoding error (thus undefined behaviour).

Given provenance monotonicity, which would be violated by the decoding error, it seems like the best option is that the fragments are ignored. Is there anything missed here? If not, can we get a formal sign-off on this behaviour?

Note that I'm only considering the runtime behaviour, which can be a point against adopting the behaviour. Given that it's impossible to get the address of certain pointers in const-eval, it does need to be undefined behaviour (or otherwise an error) to read pointer bytes (to at least symbolic allocations) as integer types.
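For readers skimming the thread, the runtime operation in question can be sketched roughly as below. This is an illustration only: the helper name `pointer_bytes_as_usize` is made up for this sketch, and whether the transmute is defined behaviour (with the provenance simply dropped) is exactly what the issue asks T-opsem to decide. The comparison with `p as usize` reflects what current compilers do in practice, not a settled guarantee.

```rust
use std::mem;

/// Hypothetical helper: reads the bytes of a pointer as a `usize`
/// without going through an `as` cast.
fn pointer_bytes_as_usize(p: *const i32) -> usize {
    // ASSUMPTION (not settled by this thread): the provenance of `p` is
    // ignored and only the address bytes are decoded as an integer.
    // The other candidate behaviour would make this a decoding error (UB).
    unsafe { mem::transmute::<*const i32, usize>(p) }
}

fn main() {
    let x = 42i32;
    let p: *const i32 = &x;

    let via_transmute = pointer_bytes_as_usize(p);
    // An explicit cast, for comparison; this one also exposes the provenance.
    let via_cast = p as usize;

    assert_eq!(via_transmute, via_cast);
    println!("address: {via_transmute:#x}");
}
```

Under the "fragment is ignored" option, the integer produced by the transmute carries no provenance, so it could not later be used to recover a dereferenceable pointer unless the provenance was exposed some other way.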

@saethlin
Member

saethlin commented Dec 6, 2024

> Given that it's impossible to get the address of certain pointers

Which pointers?

@chorman0773
Contributor Author

I failed to clarify that. It was referring to the consteval AM, where allocations that exist outside of the particular constant evaluation (what I call symbolic pointers) can't be assigned an address.

@RalfJung
Member

RalfJung commented Dec 6, 2024

Const-eval can't assign an address to any allocation, "inside" or "outside". (Not sure what you mean by that distinction.)

@RalfJung
Member

RalfJung commented Dec 6, 2024

Anyway that sub-discussion seems off-topic here, please move it to Zulip. And please update the issue description to clarify that "certain pointers" refers to const-eval.

@chorman0773
Contributor Author

I suppose the third alternative that should be addressed is that the read exposes the pointer bytes, but I don't like that suggestion (and I recall few people did), as it means that reads can result in a side effect, and such reads as an integer type can never be elided.

Is there any other alternative I'm missing?

@RalfJung
Member

Yeah I definitely don't like that suggestion, it pessimizes optimization too much. It is worth mentioning that that third alternative is basically what PNVI-ae-udi mandates for C. I am curious if compilers will actually implement that, though.
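To make the cost of that third alternative concrete, here is a small sketch of what "exposing" means today with explicit casts. The function name `round_trip` is hypothetical. In current Rust the `as usize` cast is the operation with the exposure side effect; the third alternative would, as I understand it, give every integer-typed read of pointer bytes that same side effect, which is why such reads could no longer be removed by the optimizer.

```rust
/// Hypothetical helper: sends a pointer through an integer and back.
fn round_trip(value: &i32) -> i32 {
    let p: *const i32 = value;

    // Explicit pointer-to-integer cast: yields the address and, as a side
    // effect, marks the provenance of `p` as exposed.
    let addr = p as usize;

    // Integer-to-pointer cast: may pick up provenance that was exposed earlier.
    let q = addr as *const i32;

    // SAFETY: `q` has the same address as `p`, the provenance of `p` was
    // exposed by the cast above, and `value` is still live.
    unsafe { *q }
}

fn main() {
    assert_eq!(round_trip(&7), 7);
}
```

If the exposure were attached to plain loads instead of to the cast, a dead load of pointer bytes at integer type would still have an observable effect on which later integer-to-pointer casts are valid.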

@RalfJung
Member

RalfJung commented Dec 17, 2024

> Note that I'm only considering the runtime behaviour, which can be a point against adopting the behaviour. Given that it's impossible to get the address of certain pointers in const-eval, it does need to be undefined behaviour (or otherwise an error) to read pointer bytes (to at least symbolic allocations) as integer types.

We could characterize this as an "unsupported in const-eval" error rather than a UB error. (Internally in rustc this is already what we do, ReadPointerAsInt is a variant of UnsupportedOpInfo. However we don't clearly distinguish those cases in the error message AFAIK, and we do call this UB in the transmute docs.)

That would be similar to how is_null is sometimes unsupported in const-eval.
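A minimal sketch of the const-eval side, assuming current rustc behaviour: a pointer can live in a constant as long as it stays a pointer, while turning it into an integer during constant evaluation is rejected. The rejected line is left commented out so the sketch compiles; the names `PTR`, `ADDR`, and `runtime_address` are only for illustration.

```rust
// A pointer value is fine in a constant as long as it stays a pointer.
const PTR: *const i32 = &7;

// Expected to be rejected during constant evaluation, because const-eval
// cannot assign an address to the allocation (commented out on purpose):
// const ADDR: usize = unsafe { std::mem::transmute(PTR) };

/// At runtime the allocation has a concrete address, so the cast is fine.
fn runtime_address() -> usize {
    PTR as usize
}

fn main() {
    // SAFETY: `PTR` points to an immutable `i32` with static lifetime.
    assert_eq!(unsafe { *PTR }, 7);
    assert_ne!(runtime_address(), 0);
}
```

Whether the commented-out line is reported as UB or as an unsupported operation is the distinction being drawn here; the sketch does not depend on the exact wording of the diagnostic.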

@RalfJung
Member

@joshlf says they have a use case for these transmutes in zerocopy. Or, to be more precise -- they have a use case for making these transmutes not be UB. The goal isn't actually to ever run these operations, but having them be well-defined allows soundly adding some IntoBytes trait instances that would be useful even if the transmute is never actually executed. I'll let him fill out the details. :)

@CatsAreFluffy

This sounds a lot like #286.

@RalfJung
Member

RalfJung commented Mar 3, 2025

True, those are discussing the same thing.

@joshlf

joshlf commented Mar 6, 2025

I wrote up an example use case in rust-lang/rust#137323 (comment), but the very brief TLDR is that we've designed zerocopy's API so that as many operations as possible "fall out naturally" from a base set of composable atoms. Having to special-case things makes it so that we can't express some operations that way, and as a result, it means that we have to either decide not to support certain operations, or instead create one-off APIs that don't compose with the rest of our machinery. Since most of zerocopy's internals are unsafe, changes take a long time to implement since we move very slowly to make sure we haven't made any mistakes. As a consequence, we often end up just not supporting these operations, despite having users who want them supported.

@RalfJung
Member

RalfJung commented Mar 6, 2025 via email

@joshlf

joshlf commented Mar 6, 2025

Yeah, we do intend to reflect provenance. But we'd prefer to reflect that "ptr-to-int is valid but strips provenance" rather than have to not support ptr-to-int because it's UB.
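As a rough illustration of why the answer matters for a crate like this, consider the sketch below. `BytesViewSketch` is a hypothetical stand-in written for this example and is not zerocopy's actual trait or API. The integer impl is uncontroversial; the commented-out pointer impl would only be sound if viewing pointer bytes as `u8` is defined (valid, but stripping provenance) rather than UB.

```rust
use std::{mem, slice};

/// Hypothetical stand-in for a "view this value as bytes" trait.
///
/// # Safety
/// Implementors must contain no padding or uninitialized bytes, and reading
/// each of their bytes as `u8` must be defined behaviour.
unsafe trait BytesViewSketch: Sized {
    fn bytes_view(&self) -> &[u8] {
        // SAFETY: guaranteed by the implementor, see the trait contract.
        unsafe {
            slice::from_raw_parts((self as *const Self).cast::<u8>(), mem::size_of::<Self>())
        }
    }
}

// Plain integers: every byte is initialized and carries no provenance.
unsafe impl BytesViewSketch for u32 {}

// Only sound if reading pointer bytes at integer type is defined behaviour,
// which is the open question in this issue (commented out on purpose):
// unsafe impl<T> BytesViewSketch for *const T {}

fn main() {
    let v = 0x0102_0304u32;
    assert_eq!(v.bytes_view(), &v.to_ne_bytes()[..]);
}
```

With "ptr-to-int is valid but strips provenance", the pointer impl could be added alongside the others without special-casing; with UB, it has to be left out entirely.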
