Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rustc: don't use Abi::ScalarPair for unions when padding is involved. #62298

Closed
wants to merge 1 commit into from

Conversation

eddyb
Copy link
Member

@eddyb eddyb commented Jul 2, 2019

See #60405 (comment) for the discussion (no issue was opened for that bug).
Assuming that we want to treat unions as opaque bags of bytes and preserve all of them uniformly, this PR disables the "scalar pair" optimization when it would cause some of those bytes to be ignored.

In the example given by @gnzlbg, (u8, u32), bytes 1, 2 and 3 are interior padding and Abi::ScalarPair, even if treated as aggregate by non-Rust calling conventions, will cause Rust copies of the whole value to only copy the u8 and u32 components separately, losing the original values in those 3 padding bytes (effectively replacing them with undef).

r? @rkruppe or @nagisa cc @RalfJung

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jul 2, 2019
@nagisa
Copy link
Member

nagisa commented Jul 2, 2019

The implementation seems fine but I’m not too pumped about the test. It seems like an extremely convulted way to test what’s being tested. I’m partial to codegen tests and there should probably also be a test which checks that ScalarPair still gets used where we want it to be used.

@eddyb
Copy link
Member Author

eddyb commented Jul 2, 2019

@nagisa The annoying thing about a codegen test is that the calling conventions can make the code wildly different, and if you rely on optimizations to normalize things, the bug can disappear.

FWIW, I've updated the test since, to be more thorough (it now tests both Rust and C ABIs, with more types, and every byte of every type is checked).

Also, ideally, I'd use something like the proposed LLVM instruction freeze, instead of black_box, since what I want is to be able to read any byte of the union without triggering UB if it's undef.

@Centril
Copy link
Contributor

Centril commented Jul 2, 2019

Assuming that we want to treat unions as opaque bags of bytes and preserve all of them uniformly, this PR disables the "scalar pair" optimization when it would cause some of those bytes to be ignored.

Note that this is by no means a settled matter and people shouldn't start assuming it is.

It seems to me that having the optimizations rather than disabling them is good in terms of having people not assume things because they just "seems to work". (In general, I think where rustc can be aggressive while preserving soundness, it should -- but that is not to say that this is easy.)

@RalfJung
Copy link
Member

RalfJung commented Jul 2, 2019

It seems to me that having the optimizations rather than disabling them is good in terms of having people not assume things because they just "seems to work". (In general, I think where rustc can be aggressive while preserving soundness, it should -- but that is not to say that this is easy.)

The way I see it, this is part of an experiment to see if the semantics of unions being a bag of bytes is even achievable.


fn main() {
// NOTE(eddyb) we can't error for the `extern "C"` ABI in these cases, as
// the x86_64 SysV calling convention will drop padding bytes inside unions.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is there no .stderr file then if we expect warnings to be shown?

Also, would be nice to still test the first N bytes that we assume not to be part of the padding.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we capture runtime stderr, only rustc stderr.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not to mention that only some targets will hit that.

@Centril
Copy link
Contributor

Centril commented Jul 2, 2019

The way I see it, this is part of an experiment to see if the semantics of unions being a bag of bytes is even achievable.

@RalfJung I do like experimentation and data gathering! :) Could we perhaps add some explicit disclaimers around the code and the tests however to make the experimental nature clearer?

@RalfJung
Copy link
Member

RalfJung commented Jul 2, 2019

Sure, that seems fair.

@hanna-kruppe
Copy link
Contributor

So, considering rust-lang/unsafe-code-guidelines#156, do we still want to do this?

@RalfJung
Copy link
Member

I honestly don't know.^^ @eddyb @gnzlbg what difference does this make for rust-lang/unsafe-code-guidelines#156 ?

@gnzlbg
Copy link
Contributor

gnzlbg commented Jul 16, 2019

Without this PR, #[repr(Rust)] union U { x: (u8, u32) } has 3 bytes of padding that are not copied when the union is copied / moved.

This means that code that assumes that these bytes are copied on move is broken.

This is more restrictive than copying these bytes, and I think it is the conservative thing to do until rust-lang/unsafe-code-guidelines#156 is resolved. That is, I'd rather not merge this.


For #[repr(C)] union U { x: (u8, u32) }, the issue was trickier. C does not copy the padding, so in a round trip Rust->C->Rust it cannot be preserved. The open question is whether it is even possible to try to copy the padding here in the Rust side, e.g., when dealing with extern "C" fn foo(U) -> U { ... } definitions, that are potentially only ever called from Rust code. I recall @eddyb mentioning that the calling convention should not interact with this, and that maybe we could give these the same layout as a [u8; size_of::<U>()] to work around that, but I don't recall the details of why this was hard to achieve.

I think all of this should inform rust-lang/unsafe-code-guidelines#156 , but we should pick there the semantics that make most sense for Rust first, and worry about how to implement them later.

@RalfJung
Copy link
Member

but we should pick there the semantics that make most sense for Rust first, and worry about how to implement them later.

I'd love to do that, and fight fiercely for "bag of bits no matter the repr", but that seems to me like ignoring reality if we come up with an unimplementable semantics...

@JohnTitor
Copy link
Member

Ping from triage: any updates? @eddyb

@JohnTitor
Copy link
Member

Ping from triage: @eddyb and @rkruppe, any updates on this?

@hanna-kruppe
Copy link
Contributor

This PR is sort-of blocked on the discussion of unions and paddings, and that discussion is currently stalled. We should probably close this PR and revisit when that discussion is settled. @RalfJung what do you think?

@RalfJung
Copy link
Member

RalfJung commented Aug 4, 2019

I don't have a good handle on the union situation. Does rust-lang/unsafe-code-guidelines#156 mean we already know the result of the experiment?

But anyway this PR only affects MaybeUninit-like unions, if I understood the code correctly? Seems fair to close then. We first have to figure out some way to describe what happens for general unions (or lobby with LLVM to make our desired semantics implementable).

IIRC @eddyb mentioned another experiment where we basically add a [u8; size_of::<T>()] variant to the union, to "teach" LLVM the semantics we want. That sounds like a more interesting experiment to me. And maybe that needs something like this PR to make MaybeUninit-like unions not "slip through"?

@hanna-kruppe
Copy link
Contributor

My understanding was that the main concern is that the desired "unions are bags of bits without padding" is not implementable at all, not just because C code may stomp over parts it considers padding, but also because some C ABIs (which pure Rust programs can use without any FFI or unsafe) may not even have a way to pass all the bytes of certain unions. The question to answer, then, is whether we want to give up on the dream entirely and have a less satisfying but reasonable uniform semantics of unions, or try to salvage unions-as-bags-of-bits in some subset of cases (presumably at the cost of more complicated semantics).

@RalfJung
Copy link
Member

RalfJung commented Aug 4, 2019

Ah, right. Yes. So an option might be to make repr(Rust) unions bags-of-bits and do something more messy for others, or so. But that's indeed a topic for the UCG thread.

I think we also nee more data on what exactly does (not) get preserved for unions with more than one variant in C.

@joelpalmer
Copy link

Ping from Triage. Looks like this PR is still "sort-of blocked". I am going to change the label to waiting-on-author since a reviewer was the last to respond here. @RalfJung @rkruppe

@joelpalmer joelpalmer added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Aug 12, 2019
@joelpalmer joelpalmer added S-inactive Status: Inactive and waiting on the author. This is often applied to closed PRs. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Aug 19, 2019
@joelpalmer
Copy link

Ping from Triage: Closing due to inactivity @eddyb. Looks like changes or updates could still be in the works at some point. If so, please re-open when there are changes or updates. Thanks!

@joelpalmer joelpalmer closed this Aug 19, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-inactive Status: Inactive and waiting on the author. This is often applied to closed PRs.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants