Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC for const-UB #58

Merged
merged 4 commits into from
Oct 26, 2020
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
88 changes: 88 additions & 0 deletions rfcs/0000-const-ub.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,88 @@
- Feature Name: `const_ub`
- Start Date: 2020-10-10
- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000)
- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000)

# Summary
[summary]: #summary

Define UB during const evaluation to lead to an unspecified result for the affected CTFE query, but not otherwise infect the compilation process.

# Motivation
[motivation]: #motivation

So far, nothing is specified about what happens when `unsafe` code leads to UB during CTFE.
This is a major blocker for stabilizing `unsafe` operations in const-contexts.

# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

There are some values that Rust needs to compute at compile-time.
This includes the initial value of a `const`/`static`, and array lengths (and more general, const generics).
Computing these initial values is called compile-time function evaluation (CTFE).
CTFE in Rust is very powerful and permits running almost arbitrary Rust code.
This begs the question, what happens when there is `unsafe` code and it causes Undefined Behavior (UB)?

The answer is that in this case, the final value that is currently being executed is arbitrary.
For example, when UB arises while computing an array length, then the final array length can be any `usize`, or it can be (partially) uninitialized memory.
No guarantees are made about this final value, and it can be different depending on host and target architecture, compiler flags, and more.
However, UB will not otherwise adversely affect the currently running compiler; type-checking and lints and everything else will work correctly given whatever the result of the CTFE computation is.

Note, however, that this means compile-time UB can later cause runtime UB when the program is actually executed:
for example, if there is UB while computing the initial value of a `Vec<i32>`, the result might be a completely invalid vector that causes UB at runtime when used in the program.

Sometimes, the compiler might be able to detect such problems and show an error or warning about CTFE computation having gone wrong (for example, the compiler might detect when the array length ends up being uninitialized).
But other times, this might not be the case -- UB is not reliably detected during CTFE.
This can change from compiler version to compiler version: CTFE code that causes UB could build fine with one compiler and fail to build with another.
(This is in accordance with the general policy that unsound code is not subject to strict stability guarantees.)

# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

When UB arises as part of CTFE, the result of this evaluation is an unspecified constant.
The compiler might be able to detect that UB occurred and raise an error or a warning, but this is not mandated, and absence of lints does not imply absence of UB.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in this section we should explain the various mechanisms for detecting UB and their respective "detection rate" of UB.

  1. Pointer dereferences
    • Dangling, null and allocation out of bounds pointers are reliably detected
    • Out of bounds pointers that are within the allocation but outside any valid object are not detected
    • Generally no guarantees are made for dereferencing arbitrary pointers with arbitrary types
  2. Uninitialized memory
    • Values may be backed by an uninitialized value, any use of that value will cause an error
    • Uninitialized memory in the final value of a constant is reliably detected, but intermediate computations may do arbitrary things
  3. Pointer comparisons
    • Pointers cannot be compared, this is prevented statically and dynamically.
    • The guaranteed_eq method on pointers allows fuzzy comparison of pointers, where a valid implementation is to always returns false and the code relying on the return value must still be valid if false were returned in all cases. It is only to be used for performance reasons.
  4. Intrinsics
    • Compiler intrinsics often have soundness rules for their arguments. e.g. exact_div for exact_div(x, y) results in UB where
      • x % y != 0 or,
      • y == 0 or,
      • x == T::MIN && y == -1
        but we reliably detect this.
    • All intrinsics must detect all the UB that they can cause.
    • Intrinsics that cannot have UB should be marked as safe
  5. transmute/union access/pointer dereferences
    • It is entirely up to the user to ensure that the results of type punning derefs or actual transmutes result in valid values and does not depend on repr(Rust) types or other unspecified layouts.

Copy link
Member Author

@RalfJung RalfJung Oct 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of bounds pointers that are within the allocation but outside any valid object are not detected

I am not entirely sure what this even means, this is using C terminology ("object").

Values may be backed by an uninitialized value, any use of that value will cause an error

Here, "use" means strictly "actually performing an arithmetic or logic operation that this value as a input". A mere copy will not be detected (since we do not check validity).

Copy link
Member Author

@RalfJung RalfJung Oct 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in this section we should explain the various mechanisms for detecting UB and their respective "detection rate" of UB.

I have to say this list seems to be awfully geared around what happens to be the current implementation... in particular if we consider RFCs as applying to all Rust compilers, this seems unnecessarily specific.

I'd say something much shorter, more like:

  • UB caused by inrinsics is detected.
  • Dereferencing dangling pointers is detected. (But misalignment is not!)
  • All the other UB listed here is not detected.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not entirely sure what this even means, this is using C terminology ("object").

I tried to say we don't detect references that point to uninitialized memory. Like the extra capacity in a vector could be accessed by using set_len wrongly and then indexing into the uninitialized parts.

Which kind of relates to your next point. Yes, I did mean actual operations, not just copying values.

I have to say this list seems to be awfully geared around what happens to be the current implementation... in particular if we consider RFCs as applying to all Rust compilers, this seems unnecessarily specific.

Do we want other Rust compilers to be more permissive? Your list is only missing uninit memory in final values and basic operations and pointer comparisons. While I get that pointer comparisons aren't really UB, they just aren't possible, at least doing math or so with uninitialized memory/undefined values should be UB. Or is there a difference between intrinsics and basic ops?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right we could also mention "uninit memory occurring as an input to an arithmetic, logical or control-flow operation".

While I get that pointer comparisons aren't really UB, they just aren't possible, at least doing math or so with uninitialized memory/undefined values should be UB. Or is there a difference between intrinsics and basic ops?

I'd like not to touch on unconst operations in this RFC; that is a more tricky subject that I planned to leave for the future.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea ,but shouldn't we do the const roadmap blogpost first?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we? Not sure how the blog post would block even proposing the RFC.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't block it, it's more... I was under the impression that the lang team kinda expects the next const thing to be given to them to be the roadmap that they'd sign off.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This RFC was at least mentioned at the last meeting so I feel it is fine to be doing it in parallel. We'd still need this even after they signed off the road-map so I do not see a dependency either way here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true.


# Drawbacks
[drawbacks]: #drawbacks

This means UB during CTFE can silently "corrupt" the build in a way that the final program has UB when being executed
(but not more so than if the CTFE code would instead have been run at runtime).

# Rationale and alternatives
[rationale-and-alternatives]: #rationale-and-alternatives

The most obvious alternative is to say that UB during CTFE will definitely be detected.
However, that is expensive and might even be impossible.
Even Miri does not currently detect all UB, and Miri is already performing many additional checks that would significantly slow down CTFE.
Furthermore, since optimizations can "hide" UB (an optimization can turn a program with UB into one without), this means we would have to run CTFE on unoptimized MIR.
And finally, implementing these checks requires a more precise understanding of UB than we currently have; basically, this would block having any potentially-UB operations at const-time on having a spec for Rust that precisely describes their UB in a checkable way.
In particular, this would mean we need to decide on an aliasing model before permitting raw pointers in CTFE.

Another extreme alternative would be to say that UB during CTFE may have arbitrary effects in the host compiler, including host-level UB.
Basically this would mean that CTFE would be allowed to "leave its sandbox".
This would allow JIT'ing CTFE and running the resulting code unchecked.
While compiling untrusted code should only be done with care (including additional sandboxing), this seems like an unnecessary extra footgun.

# Prior art
[prior-art]: #prior-art

C++ requires compilers to detect UB in `constexpr`.
However, the fragment of C++ that is available to `constexpr` excludes pointer casts, pointer arithmetic (beyond array bounds), and union-based type punning, which makes such checks not very complicated and avoids most of the poorly specified parts of UB.

If we found a way to run CTFE on unoptimized MIR, then detecting UB for programs that do not use unions, `transmute`, or raw pointers is not very hard.
CTFE already has almost all the checks required for this, except for alignment checks which are disabled during CTFE.
(Disabling them was the easiest way forward to solve some issues around packed structs in patterns, but we could use a different solution and reinstate CTFE alignment checks.
The relevant code paths still exist for Miri.)

# Unresolved questions
[unresolved-questions]: #unresolved-questions

Currently none.

# Future possibilities
[future-possibilities]: #future-possibilities

This RFC provides an easy way forward for "unconst" operations, i.e., operations that are safe at run-time but not at compile-time.
Primary examples of such operations are anything involving the integer representation of pointers, which cannot be known at compile-time.
If this RFC were accepted, we could declare such operations UB during CTFE (and thus naturally they would only be permitted in an `unsafe` block).
This still leaves the door open for providing better guarantees in the future.