Should we / can we make MaybeUninit<T> always preserve all bytes of T (including padding)? #518
Comments
I'll add that a typed copy of an uninitialized variable is UB in C, so there's no need to promise any ABI for FFI compatibility, |
In C I think this is not true for char types. But yeah for most types you cannot pass them uninit by value.
|
This comment was marked as off-topic.
The PR that introduced the guarantees does not talk about padding, and it seems like that wasn't really understood back then. The t-lang minutes discussing this are lost to time and reorganizations, but it seems doubtful that such a consideration was raised. Discussions from 2018 raise a lack of real-world use cases for ABI compatibility, and I agree with such a sentiment in the present. I don't think this would be approved nowadays, but I am incredibly apprehensive about removing it. There are few places in the Rust documentation that use always for guarantees like this, and the use cases for some weird FFI thunks or bindings would be nigh-impossible to properly test with crater or similar... |
This comment was marked as off-topic.
@carbotaniuman I think we should consider removing it. If we can't come up with any legitimate usecase, I think we should definitely remove it. I don't like going back on a promise like this, but if we don't have a usecase that could be broken by taking back this promise, then the chances that someone is affected should be very slim. @Diggsey thanks for explaining why you think this belongs in this thread. But I disagree. "MaybeUninit preserves provenance" is not relevant here. You will note that provenance does not appear in the issue description. Furthermore, provenance on CHERI works like it does everywhere else, so even if provenance were relevant, CHERI wouldn't change anything. It is true that you can write code with |
If we do agree to remove the guarantee, I expect it to break 0 uses in practice. My only other concern would be the performance impact of having to copy more bytes. It probably won't affect SIMD or buffers though, so I don't really think it's an issue. |
I think we should preserve the memory layout compatibility, but drop the calling convention compatibility. That could be done using |
FTR, I use However, for compatibility with gcc/clang, they have to expose an ABI equal to the routines using primitives. |
(And in general, I agree with @carbotaniuman - unless crater is testing all kinds of targets, I'm betting it primarily tests x86_64, where aggregate-of-one-field will get passed the same way as that one field*, so without using miri-crater, the ABI mismatches won't be caught by crater. If the code is used on something like arm32 though, it's going to be very visibly broken) |
Yes, this is super hard to test for. I wonder if it's worth having a blog post asking people whether they need this guarantee...
Yes, concretely the proposal would be:
Or maybe "no padding" should be restricted a bit further, like "if |
Is the only motivation for backing out of that promise the fact that this is a frequent source of confusion? What benefits besides clarity can Rust gain? |
We never intended |
I'm pretty sure this is still the case, but it might be worth it to enumerate things that ARE still allowed for this wrt FFI/ABI concerns. My primary use of |
Yes, ABI compatibility is about the "by-value" part of a function argument or return type. That's how we've consistently been using this term for a while now, also see our glossary and the documentation on ABI compatibility. In public communication we'll obviously spell out the details more than in internal discussion. ("Internal" not as in "private" but as in "among the team members and anyone else who's willing to participate".) |
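To make the "by-value" meaning concrete, here is a minimal sketch of the scalar case, which every proposal in this thread keeps. The function `add_one` and the helper name are hypothetical and exist only for illustration. The callee is declared with `u32`, while the caller invokes it through a function pointer whose signature uses `MaybeUninit<u32>`. This relies on the documented guarantee that `MaybeUninit<T>` has the same size, alignment, and ABI as `T`.

```rust
use std::mem::{transmute, MaybeUninit};

// Hypothetical callee, declared with plain scalar types.
extern "C" fn add_one(x: u32) -> u32 {
    x.wrapping_add(1)
}

// Calls `add_one` through a pointer whose signature wraps the by-value
// argument and return value in MaybeUninit.
fn call_through_maybe_uninit_signature(x: u32) -> u32 {
    type Plain = extern "C" fn(u32) -> u32;
    type Wrapped = extern "C" fn(MaybeUninit<u32>) -> MaybeUninit<u32>;

    let plain: Plain = add_one;
    // The two signatures differ only in ABI-compatible argument/return types.
    let wrapped: Wrapped = unsafe { transmute::<Plain, Wrapped>(plain) };

    // The callee expects an initialized u32, so we must pass one.
    let out = wrapped(MaybeUninit::new(x));
    // SAFETY: `add_one` always returns an initialized u32.
    unsafe { out.assume_init() }
}

fn main() {
    println!("{}", call_through_maybe_uninit_signature(41));
}
```

Note that the caller still has to pass an initialized value here, because the callee's type requires it. The signature mismatch only changes how the value is spelled at the call site, not what the callee is allowed to assume.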
I at the very least need target simd types as well - for floating-point types that aren't directly supported by rust (e.g. |
Since that is a compiler-internal concern, you could also do this by providing more ABI guarantees than what Rust provides in general. But that case would be covered by "types without padding", or we could explicitly mention the stdarch SIMD types (since they are all powers of 2). |
Not fully - you don't necessarily need to compile the rtlibs with lccc themselves, they're written in mostly portable rust, and quite deliberately. I'd like to be able to continue providing that guarantee.
You can also now see a formalization in reference#1545, as a note. |
This has caused an actual soundness bug now: rust-lang/rust#134713. I think we should seriously consider restricting the ABI compatibility guarantee to scalar and SIMD types. |
The issue with that is that will make it hard to represent "Maybe Initialized" aggregate types in ABIs. Footnotes
|
Yeah I don't think a by-value ABI-compatible "maybe init" aggregate is common enough to justify the constant stream of surprises and UB that this problem causes. I would suggest not designing your OS around such a facility. |
I had a longer post here that I since felt was too confrontational, but my thoughts have changed and I do not believe that solely changing this is justifiable given that this is not a breaking change for soundness, but merely to make the use of the API better for users. Unsafe Rust already has multiple sharp edges (SB/TB, Box noalias, provenance), and I feel like this is not a particularly sharp one once users understand it. I would also like to echo the alternative of a new type |
I might be missing something obvious, but isn't |
What properties does this type have? |
|
The issue is that the design isn't simply "Whether or not to use |
Sorry, I don't follow. Please strip all the detail that's unnecessary for this thread and focus on the actual question: passing some uninitialized data across the ABI. You seem to be saying you want the If it's just about leaving the data the pointer points to uninitialized, then that's entirely off-topic for this discussion as the ABI does not care about that.
We don't document this. But people still think it is true, I've myself thought it is true, and it is quite reasonable to think it is true. It's also a much more useful guarantee than the ABI compatibility one. We're still looking for even a single use-case that needs the ABI-compatibility for non-scalar types, whereas we already had a standard library correctness bug due to the lack of padding byte preservation. |
So the signature I need to actually call looks more like |
Uh, then, I'd suggest you call it with a Rust function pointer / declaration that has the matching number of parameters? It's anyway UB if caller and callee disagree on the number of parameters. |
It's in the same vein as calling |
If you are anyway already causing technical UB by having the signature not match, you shouldn't be worried about venturing into "unspecified" land here either. Just have a version of |
Did these standard library implementations surface this bug before or after you recommended changing the implementations of these things to use MaybeUninit? |
That sounds like you are trying to insinuate something -- I don't know exactly what you are referring I don't know what I recommended when, but using Let's not discuss the history of rust-lang/rust#134713 here please; there's already a separate issue for that. The fact is that that implementation made it in (neither authored nor reviewed by me), which is a pretty clear sign that "bag of bytes" semantics are (a) useful, and (b) easy to assume to already be the current semantics even if they are not. |
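As a rough illustration of what "bag of bytes" semantics look like with today's rules, the sketch below uses an array of `MaybeUninit<u8>` instead of `MaybeUninit<T>`. The `ByteBag` type is hypothetical and is not how any standard library code is written. Because `u8` has no padding, a typed copy of the array carries over every byte.

```rust
use std::mem::MaybeUninit;

// Hypothetical "bag of bytes" container. An array of MaybeUninit<u8> has no
// padding, so a typed copy of it preserves every byte it holds.
#[derive(Clone, Copy)]
struct ByteBag([MaybeUninit<u8>; 4]);

// Fills the bag, copies it by value, and reads all bytes back from the copy.
fn roundtrip(fill: u8) -> [u8; 4] {
    let bag = ByteBag([MaybeUninit::new(fill); 4]);
    let copied = bag; // typed copy of the whole container

    let mut out = [0u8; 4];
    for (dst, src) in out.iter_mut().zip(copied.0.iter()) {
        // SAFETY: every byte of `bag` was initialized above, and the copy
        // of a padding-free array preserves all of them.
        *dst = unsafe { (*src).assume_init() };
    }
    out
}

fn main() {
    println!("{:?}", roundtrip(0xAB));
}
```

The cost of this workaround is that the container loses the alignment and type information of `T`, which is part of why people reach for `MaybeUninit<T>` and assume it behaves the same way.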
For the record here - the call that I'm worried about is going to be to the Rust trampoline - which does always have the same parameters. Then I call the actual But in the case of handler, the ABI level is very well-defined (even if the Rust Level isn't). Whereas wrapping |
Also I speak from a position of being involved in the discussion here - a third party having absolutely zero idea this is happening may have just as much cause to rely on a stable language guarantee. And given much of this code just won't even run in miri (miri won't even run winter-lily, which is my current project touching Lilium), this is probably in the realm of "Breaks silently, until it doesn't". |
Changing how pointers to MaybeUninit works does appear to be in the proposal, and MaybeUninit integers would also specifically not be affected, so I guess you're concerned that people are passing |
Yes I am proposing to take back a documented language guarantee, and replace it with a different, more useful guarantee. I think long-term this will cause less harm. So I was asking if there are any cases where the ABI guarantee is useful or even needed. I am still extremely confused about your example. You keep bringing up more and more concepts you're not explaining and there are too many parties calling each other so it's not even clear which call you are talking about when. What I think I understand is that there's a particular function call where the caller uses signature Could you achieve your goals without the ABI guarantee (and without worsening performance)? And if yes, would that solution be any less "natural" than what you are currently doing? Frankly, based on what you said so far, any possible alternative seems more natural to me. ;) |
(sorry for any confusion Ralf, but my own question was directed at chorman) |
Ok, I'll restate what the flow is:
|
@RalfJung You are the Pope of Rust, or at least the Pope of Rust Safety Models. Whenever you say something, even if you say something blatantly wrong, almost no one calls you out on it. You are the proxy author and reviewer of all std code because everyone is reading everything you are writing and thinking about it when writing unsafe code. You are repeatedly cited in these discussions. If you repeatedly say something wrong, you can convince other people it's true, simply by repeating the wrong thing. |
I do hope people call me out for my nonsense when I lose my marbles. ;) I am confident our libs reviewers don't just take my word as gospel.
Or should I make the argument that if the semantics are too surprising for the Pope, maybe we should change them? ;)
|
That is a reasonable stance, honestly. I just am not surprised incorrect code is written based on something you say again and again, and don't think it should be taken as evidence the confusion is that widespread if it might instead be your confusion spreading widely. |
I don't think I said this "again and again". But maybe I purged that from my memory?
|
Okay. It sounds to me like even if
However, I think we can do better. Even if That should be enough of a guarantee that you could keep doing what you're currently doing. Since you're setting up the call manually, all that matters is that there is some stable ABI, not what it is exactly. The only issue would be if (a) you are worried about breaking an existing stable ABI boundary (for your apparently still-in-development OS), and (b) the stable ABI rustc adopts for Beyond that, if …With all that said, I am definitely sympathetic to the alternative view that |
I think this breaking change is not justified by the stated goal, which is expressly not any inherent unsoundness, but just a desire to reduce a paper cut for unsafe Rust users. I think that's a great goal (and to be clear, I support it myself), but unsafe code is tricky, with many corner cases. Box noalias remains on the books today despite being a far larger footgun. We have not yet ruled out Stacked Borrows as an aliasing model! In addition, this creates a special case in the ABI compatibility rules. Windows until a few months ago, passed Making I think it's also disingenuous to act like the use cases here are contrived and useless - this is a documented language feature, not I am also confused as to why my compatibility ideas seemed to have been (intentionally?) ignored. To me, |
I think you will be hard-pressed to find a case where this makes a measurable difference, even if you specifically microbenchmark the function call. I'm neutral regarding the rest of your post. |
It took us like 2 weeks to even get to the bottom of the one use case that was brought up. And then it turns out it's not actually a use case for the ABI guarantee that is documented, but for a weaker guarantee that is entirely compatible with
We haven't yet seen a single use case for All evidence points towards |
I think this is an example of overindexing on the responses present. The vast majority of use cases for this will be low-level, or a workaround for some legacy code, or maybe to provide some potentially uninitialized data to an assembly function for math or similar. You can probably find issues (undocumented guarantees, weird code, not technically supported) with any or all of these potential use cases. I might even agree with those issues. I personally do have code running that uses this ABI guarantee, but I don't particularly care about how this issue resolves wrt that code. If the capability goes away, I will just remove the As I have said, I am not saying that we should freeze But maybe it is decided that these capabilities are not worth keeping around. The obvious next step will be a crater run. Much of this code will not be present in crater. It may be private, internal, or using FFI, such that crater cannot really test it. I expect there to be ~0 breakage on said run. Such a number will not accurately reflect the breakage. And as the capabilities are taken away, there will be no way to migrate without willfully invoking UB. Again, maybe that much breakage will be tolerated. We broke an inordinate amount of the ecosystem in the |
No fundamental capabilities are lost, as you can use a compound type of And note that for any
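A small sketch of what such a compound type could look like, assuming it means a struct whose fields are each wrapped in `MaybeUninit`. The types `Pair` and `PairFields` are hypothetical. The code only checks in-memory layout (size, alignment, field offset). Whether the two types are passed identically by value is exactly the question under discussion here, so the sketch makes no claim about that.

```rust
use std::mem::{align_of, size_of, MaybeUninit};

// Hypothetical aggregate with 3 padding bytes between `a` and `b`
// on targets where u32 is 4-byte aligned.
#[allow(dead_code)]
#[repr(C)]
struct Pair {
    a: u8,
    b: u32,
}

// Field-wise "maybe initialized" variant: each field is wrapped on its own,
// rather than wrapping the whole struct in one MaybeUninit.
#[allow(dead_code)]
#[repr(C)]
struct PairFields {
    a: MaybeUninit<u8>,
    b: MaybeUninit<u32>,
}

// Byte offset of `b` inside Pair, measured on a live value.
fn offset_of_b_in_pair() -> usize {
    let v = Pair { a: 0, b: 0 };
    (&v.b as *const u32 as usize) - (&v as *const Pair as usize)
}

// Byte offset of `b` inside PairFields, measured the same way.
fn offset_of_b_in_pair_fields() -> usize {
    let v = PairFields { a: MaybeUninit::uninit(), b: MaybeUninit::uninit() };
    (&v.b as *const MaybeUninit<u32> as usize) - (&v as *const PairFields as usize)
}

fn main() {
    println!("size:   {} vs {}", size_of::<Pair>(), size_of::<PairFields>());
    println!("align:  {} vs {}", align_of::<Pair>(), align_of::<PairFields>());
    println!("offset: {} vs {}", offset_of_b_in_pair(), offset_of_b_in_pair_fields());
}
```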
Even if you don't want to debate the "validity" of whatever hacks you needed to use, it would still be good to see an example of a case where you really do want the ABI for I really am sympathetic to being perfectly strict about avoiding language breaking changes. But this really does feel like a case of accidental stabilization that just makes things worse for everyone. Casting |
@carbotaniuman you are basically saying "trust me I have a usecase but I'm not interested in telling you about it". That's not a constructive contribution to this discussion. As you said, a crater run is not very useful. The next step is an RFC to get wider awareness of this proposal and make it more likely that if there is some usecase relying on this accidental guarantee out there, we will hear about it. |
Given your behavior to the other use case presented, I do not believe you are asking this in good faith. My main use cases are effectively what CAD97's described, where I have C (and assembly) code that returns various complex structs that are potentially uninitialized. I would justify this with ASM freeze, but that's verboten, so I am being technically correct by using |
Could it make sense to specify
Some version of this, if practical to specify and provide, would serve the needs of the existing code using Aside: C code which converts an uninitialized lvalue into an rvalue has undefined behavior by the standard for any type which is not excluded from having trap values (which is essentially LLVM's |
I am sorry you feel that way. I don't know what I could have done differently to avoid this. Maybe I could have been a bit more patient in how I extracted the details of the use case; I admit I was frustrated since the first explanations we were given were just not useful. But ultimately, if getting to the bottom of a technical question is considered acting in bad faith, we may as well stop having technical discussions altogether. I won't just take it on faith that someone has a use case that they are unwilling or unable to properly describe. And it turned out I was right in getting to the bottom of this, since " The proposed migration plan for cases like that is to move the
This is getting very close to the edge of my knowledge of ABI details, so I feel uncomfortable making definite statements here. Deciding ABI things without knowing enough about ABI is what got us into this situation in the first place. In particular, there's somewhat of a layering violation here: the Rust compiler and language, and the docs, don't really have a concept of which types would have an indirect ABI. So there's no proper way we could even set up that definition in the current framework -- we'd have to make that framework a lot more complicated first. |
I think this is misleading, and being able to spell out the struct as a bunch of In my opinion, I think that relies on the ABI compatibility guarantee that was made. My other alternative is a type that can represent this ABI ( Ultimately I no longer really have an opinion on this change - it will likely be rammed through anyways. I would like to say if we don't make it easy (or even possible) for people to do the use cases that they want in the correct way, I suspect that they will just not. Personally, it's looking like the best "migration" if this were to occur is simply willfully invoking UB, and I'll likely be doing that for my code in order to immunize myself against this change. |
One of the points Ralf is trying to make is that that's not a thing. If you're interoperating with raw assembly, then there is no uninit, but there is also no such thing as Is that your use case? If so, I can understand how going through assembly functions would be a pain, and the backwards-compat break is inherently a pain, but I don't understand how it becomes as huge a problem as you're suggesting. On the other hand, if you're interoperating with C, then there is such a thing as function signatures, but there is also such a thing as uninit and UB. Which gets into this awkward situation where there is a ton of C code that either (a) is UB but nobody cares, or (b) is not UB according to the spec but the compiler optimizes it as if it were UB. And Rust is trying to be stricter on that front. So perhaps you want to add Is that your use case? If so, then I actually do understand how this could potentially be a huge problem. The C side may be UB, but it works (presumably), and the function signatures may be baked into legacy code, so it makes sense to want to only use Though the scope of the nastiness is still unclear, since most C libraries don't do a lot of passing structs by value. Ultimately, I'm wildly speculating here, because you haven't explained your use case. You need to stop with the charged language and explain, or else we will all continue to not understand each other. |
I apologize, given the complexity of the project I have tried to give the guarantee I actually need, but I'll provide as much detail as possible here. The project I used for this has several layers, some of which are in C, some of which are in assembly, some of which are in a custom glorified macro assembler, and some of them are in a custom DSL written in C. The usage of these languages is pretty normal, with a portable C implementation alongside some custom assembly (and DSL) implementations to better take advantage of the hardware. The actual purpose of the library is DSP-y things, so performance is relatively important. The actual interface that a user would use is of course a relatively normal C interface. Given the maintainability of the tech stack however, it would be nice to move some of this to Rust. Unfortunately this exposes us to the bad internal interfaces. For instance, some callsites in C pass along something like:
These args are passed indirectly (I think on all ABIs, but I'm decidedly not an ABI expert), and while they really should be a pointer, making that change is not really easy given the current state of that codebase.
Sure, but ultimately an assembly function can fulfill(?) some ABI lowering such that it is the same as a
The C code is compiled with a legacy compiler that does not optimize usage of uninit memory, or else I doubt the code would currently be working.
Yes, this is basically the use-case, with some extra context given by the responses above.
Ultimately I think this is the main contention I have here. Writing Pragmatically, does it matter that I used |
It is a frequent source of confusion that MaybeUninit<T> is not just preserving all the underlying bytes of storage, but actually if T has padding then those bytes are lost on copies/moves of MaybeUninit<T>.
This is currently pretty much a necessary consequence of the promise that MaybeUninit<T> is ABI-compatible with T: some ABIs don't preserve the padding of T when it is passed to a function. However, this was not part of the intention with MaybeUninit at all, it is something we discovered later.
Maybe we should try to take this back, and make the guarantee only for types without padding?
I am not even sure why we made this a guarantee. We made the type repr(transparent) because for performance it is quite important that MaybeUninit<$int> becomes just an iN in LLVM. But that doesn't require a stable guarantee. And in fact it seems like it would almost always be a bug if the caller and callee disagree about whether the value has to be initialized. So I would be curious about real-world examples where this guarantee is needed.
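The following sketch illustrates the distinction drawn in the issue description, using a hypothetical `Padded` type with one padding byte. Bytes written into a `MaybeUninit<Padded>` slot can be read back in place, padding included. After a typed copy or move of the `MaybeUninit<Padded>` value, only the bytes that belong to fields are guaranteed to survive, so the second helper deliberately reads nothing but the fields.

```rust
use std::mem::{align_of, size_of, MaybeUninit};

// Hypothetical example type: with repr(C), `a` sits at offset 0 and `b` at
// offset 2, which leaves one padding byte at offset 1.
#[repr(C)]
#[derive(Clone, Copy)]
struct Padded {
    a: u8,
    b: u16,
}

// Fills every byte of the slot (padding included) and reads the bytes back
// in place, without any typed copy of the MaybeUninit value in between.
fn fill_and_read_in_place(fill: u8) -> [u8; 4] {
    assert_eq!(size_of::<Padded>(), 4);
    let mut slot = MaybeUninit::<Padded>::uninit();
    let p = slot.as_mut_ptr() as *mut u8;
    let mut out = [0u8; 4];
    unsafe {
        std::ptr::write_bytes(p, fill, 4);
        for i in 0..4 {
            out[i] = *p.add(i);
        }
    }
    out
}

// Fills the slot, then moves it. Only the field bytes are relied upon after
// the move; the padding byte at offset 1 may no longer hold `fill`.
fn field_bytes_after_move(fill: u8) -> (u8, u16) {
    let mut slot = MaybeUninit::<Padded>::uninit();
    unsafe { std::ptr::write_bytes(slot.as_mut_ptr() as *mut u8, fill, 4) };
    let moved = slot; // typed copy/move of MaybeUninit<Padded>
    // SAFETY: both fields were fully initialized by write_bytes.
    let v = unsafe { moved.assume_init() };
    (v.a, v.b)
}

fn main() {
    println!("size {} align {}", size_of::<MaybeUninit<Padded>>(), align_of::<MaybeUninit<Padded>>());
    println!("{:?}", fill_and_read_in_place(0xAB));
    println!("{:?}", field_bytes_after_move(0xAB));
}
```

Reading the padding byte of `moved` as a `u8` would be the mistake this issue is about: nothing guarantees that byte is still initialized after the move.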