|
| 1 | +# `unsafe_aliased` |
| 2 | + |
| 3 | +- Feature Name: `unsafe_aliased` |
| 4 | +- Start Date: 2022-11-05 |
| 5 | +- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000) |
| 6 | +- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000) |
| 7 | + |
| 8 | +# Summary |
| 9 | +[summary]: #summary |
| 10 | + |
| 11 | +Add a type `UnsafeAliased` that acts on `&mut` in a similar way to how `UnsafeCell` acts on `&`: it opts-out of the aliasing guarantees. |
| 12 | +However, `&mut UnsafeAliased` can still be `mem::swap`ed, so this is not a free ticket for arbitrary aliasing of mutable references. |
| 13 | +Abstractions built on top of `UnsafeAliased` must ensure proper encapsulation when handing such aliases references to clients (e.g. via pinning). |
| 14 | + |
| 15 | +This type is then used in generator lowering, finally fixing [#63818](https://github.com/rust-lang/rust/issues/63818). |
| 16 | + |
| 17 | +# Motivation |
| 18 | +[motivation]: #motivation |
| 19 | + |
| 20 | +Let's say you want to write a type with a self-referential pointer: |
| 21 | + |
| 22 | +```rust |
| 23 | +#![feature(pin_macro)] |
| 24 | +#![feature(negative_impls)] |
| 25 | +use std::ptr; |
| 26 | +use std::pin::{pin, Pin}; |
| 27 | + |
| 28 | +pub struct S { |
| 29 | + data: i32, |
| 30 | + ptr_to_data: *mut i32, |
| 31 | +} |
| 32 | + |
| 33 | +impl !Unpin for S {} |
| 34 | + |
| 35 | +impl S { |
| 36 | + pub fn new() -> Self { |
| 37 | + S { data: 42, ptr_to_data: ptr::null_mut() } |
| 38 | + } |
| 39 | + |
| 40 | + pub fn get_data(self: Pin<&mut Self>) -> i32 { |
| 41 | + // SAFETY: We're not moving anything. |
| 42 | + let this = unsafe { Pin::get_unchecked_mut(self) }; |
| 43 | + if this.ptr_to_data.is_null() { |
| 44 | + this.ptr_to_data = ptr::addr_of_mut!(this.data); |
| 45 | + } |
| 46 | + // SAFETY: if the pointer is non-null, then we are pinned and it points to the `data` field. |
| 47 | + unsafe { this.ptr_to_data.read() } |
| 48 | + } |
| 49 | + |
| 50 | + pub fn set_data(self: Pin<&mut Self>, data: i32) { |
| 51 | + // SAFETY: We're not moving anything. |
| 52 | + let this = unsafe { Pin::get_unchecked_mut(self) }; |
| 53 | + if this.ptr_to_data.is_null() { |
| 54 | + this.ptr_to_data = ptr::addr_of_mut!(this.data); |
| 55 | + } |
| 56 | + // SAFETY: if the pointer is non-null, then we are pinned and it points to the `data` field. |
| 57 | + unsafe { this.ptr_to_data.write(data) } |
| 58 | + } |
| 59 | +} |
| 60 | + |
| 61 | +fn main() { |
| 62 | + let mut s = pin!(S::new()); |
| 63 | + s.as_mut().set_data(42); |
| 64 | + println!("{}", s.as_mut().get_data()); |
| 65 | +} |
| 66 | +``` |
| 67 | + |
| 68 | +This kind of code is implicitly generated by rustc all the time when an `async fn` has a local variable of reference type that is live across a yield point. |
| 69 | +The problem is that this code has UB under our aliasing rules: the `&mut S` inside the `self` argument of `get_data` aliases with `ptr_to_data`! |
| 70 | + |
| 71 | +This simple code only has UB under Stacked Borrows but not under the LLVM aliasing rules; more complex variants of this -- still in the realm of what `async fn` generates -- also have UB under the LLVM aliasing rules. |
| 72 | + |
| 73 | +<details> |
| 74 | + |
| 75 | +<summary>A more complex variant</summary> |
| 76 | + |
| 77 | +```rust |
| 78 | +#![feature(pin_macro)] |
| 79 | +#![feature(negative_impls)] |
| 80 | +use std::ptr; |
| 81 | +use std::pin::{pin, Pin}; |
| 82 | +use std::task::Poll; |
| 83 | + |
| 84 | +pub struct S { |
| 85 | + state: i32, |
| 86 | + data: i32, |
| 87 | + ptr_to_data: *mut i32, |
| 88 | +} |
| 89 | + |
| 90 | +impl !Unpin for S {} |
| 91 | + |
| 92 | +impl S { |
| 93 | + pub fn new() -> Self { |
| 94 | + S { state: 0, data: 42, ptr_to_data: ptr::null_mut() } |
| 95 | + } |
| 96 | + |
| 97 | + fn poll(self: Pin<&mut Self>) -> Poll<()> { |
| 98 | + // SAFETY: We're not moving anything. |
| 99 | + let this = unsafe { Pin::get_unchecked_mut(self) }; |
| 100 | + match this.state { |
| 101 | + 0 => { |
| 102 | + // The first time, set up the pointer. |
| 103 | + this.ptr_to_data = ptr::addr_of_mut!(this.data); |
| 104 | + // Now yield. |
| 105 | + this.state += 1; |
| 106 | + Poll::Pending |
| 107 | + } |
| 108 | + 1 => { |
| 109 | + // After coming back from the yield, write to the pointer. |
| 110 | + unsafe { this.ptr_to_data.write(42) }; |
| 111 | + // And read our local variable `data`. |
| 112 | + // THIS IS UB! `this` is derived from the `noalias` pointer |
| 113 | + // `self` but we did a write to `this.data` in the previous |
| 114 | + // line when writing to `ptr_to_data`. The compiler is allowed |
| 115 | + // to reorder this and the previous line and then the output |
| 116 | + // would change. |
| 117 | + println!("{}", this.data); |
| 118 | + // Done! |
| 119 | + Poll::Ready(()) |
| 120 | + } |
| 121 | + _ => unreachable!(), |
| 122 | + } |
| 123 | + } |
| 124 | +} |
| 125 | + |
| 126 | +fn main() { |
| 127 | + let mut s = pin!(S::new()); |
| 128 | + while let Poll::Pending = s.as_mut().poll() {} |
| 129 | +} |
| 130 | +``` |
| 131 | + |
| 132 | +</details> |
| 133 | + |
| 134 | +<br> |
| 135 | + |
| 136 | +Beyond self-referential types, a similar problem also comes up with intrusive linked lists: the nodes of such a list often live on the stack frames of the functions participating in the list, but also have incoming pointers from other list elements. |
| 137 | +When a function takes a mutable reference to its stack-allocated node, that will alias the pointers from the neighboring elements. |
| 138 | +`Pin` is used to ensure that the list elements don't just move elsewhere (which would invalidate those incoming pointers), but there still is the problem that an `&mut Node` is actually not a unique pointer due to these aliases -- so we need a way for the to opt-out of the aliasing rules. |
| 139 | + |
| 140 | +<br> |
| 141 | + |
| 142 | +The goal of this RFC is to offer a way of writing such self-referential types and intrusive collections without UB. |
| 143 | +We don't want to change the rules for mutable references in general (that would also affect all the code that doesn't do anything self-referential), instad we want to be able to tell the compiler that this code is doing funky aliasing and that should be taken into account for optimizations. |
| 144 | + |
| 145 | +# Guide-level explanation |
| 146 | +[guide-level-explanation]: #guide-level-explanation |
| 147 | + |
| 148 | +To write this code in a UB-free way, wrap the fields that are *targets* of self-referential pointers in an `UnsafeAliased`: |
| 149 | + |
| 150 | +```rust |
| 151 | +pub struct S { |
| 152 | + data: UnsafeAliased<i32>, // <!---- here |
| 153 | + ptr_to_data: *mut i32, |
| 154 | +} |
| 155 | +``` |
| 156 | + |
| 157 | +This lets the compiler know that mutable references to `data` might still have aliases, and so optimizations cannot assume that no aliases exist. That's entirely analogous to how `UnsafeCell` lets the compiler know that *shared* references to this field might undergo mutation. |
| 158 | + |
| 159 | +However, what is *not* analogous is that `&mut S`, when handed to safe code *you do not control*, must still be unique pointers! |
| 160 | +This is because of methods like `mem::swap` that can still assume that their two arguments are non-overlapping. |
| 161 | +(`mem::swap` on two `&mut UnsafeAliased<i32>` may soundly assume that they do not alias.) |
| 162 | +In other words, the safety invariant of `&mut S` still requires full ownership of the entire memory range `S` is stored at; for the duration that a function holds on to the borrow, nobody else may read and write this memory. |
| 163 | +But again, this is a *safety invariant*; it only applies to safe code you do not control. You can write your own code handling `&mut S` and as long as that code is careful not to make use of this memory in the wrong way, potential aliasing is fine. |
| 164 | +You can also hand out a `Pin<&mut S>` to external safe code; the `Pin` wrapper imposes exactly the restrictions needed to ensure that this remains sound (e.g., it prevents `mem::swap`). |
| 165 | + |
| 166 | + |
| 167 | +# Reference-level explanation |
| 168 | +[reference-level-explanation]: #reference-level-explanation |
| 169 | + |
| 170 | +API sketch: |
| 171 | + |
| 172 | +```rust |
| 173 | +/// The type `UnsafeAliased<T>` lets unsafe code violate |
| 174 | +/// the rule that `&mut UnsafeAliased<T>` may never alias anything else. |
| 175 | +/// |
| 176 | +/// However, it is still very risky to have two `&mut UnsafeAliased<T>` that alias |
| 177 | +/// each other. For instance, passing those two references to `mem::swap` |
| 178 | +/// would cause UB. The general advice is to still have a single mutable |
| 179 | +/// reference, and then a bunch of raw pointers that alias it. |
| 180 | +/// If you want to pass such a reference to external safe code, |
| 181 | +/// you need to use techniques such as pinning (`Pin<&mut TypeWithUnsafeAliased>`) |
| 182 | +/// which ensure that both the mutable reference and its aliases remain valid. |
| 183 | +/// |
| 184 | +/// Further note that this does *not* lift the requirement that shared references |
| 185 | +/// must be read-only! Use `UnsafeCell` for that. |
| 186 | +#[lang = "unsafe_cell_mut"] |
| 187 | +#[repr(transparent)] |
| 188 | +struct UnsafeAliased<T: ?Sized> { |
| 189 | + value: T, |
| 190 | +} |
| 191 | + |
| 192 | +impl<T: ?Sized> !Send for UnsafeAliased<T> {} |
| 193 | + |
| 194 | +impl<T> UnsafeCell<T> { |
| 195 | + /// Constructs a new instance of `UnsafeCell` which will wrap the specified |
| 196 | + /// value. |
| 197 | + pub fn new(value: T) -> UnsafeAliased<T> { |
| 198 | + UnsafeAliased { value } |
| 199 | + } |
| 200 | + |
| 201 | + pub fn into_inner(self) -> T { |
| 202 | + self.value |
| 203 | + } |
| 204 | + |
| 205 | + /// Get read-write access to the contents of an `UnsafeAliased`. |
| 206 | + pub fn get_mut(&mut self) -> *mut T { |
| 207 | + self as *mut _ as *mut T |
| 208 | + } |
| 209 | + |
| 210 | + /// Get read-only access to the contents of a shared `UnsafeAliased`. |
| 211 | + /// Note that `&UnsafeAliased<T>` is read-only if `&T` is read-only. |
| 212 | + /// This means that if there is mutation of the `T`, future reads from the |
| 213 | + /// `*const T` returned here are UB! |
| 214 | + /// |
| 215 | + /// ```rust |
| 216 | + /// unsafe { |
| 217 | + /// let mut x = UnsafeAliased::new(0); |
| 218 | + /// let ref1 = &mut *addr_of_mut!(x); |
| 219 | + /// let ref2 = &mut *addr_of_mut!(x); |
| 220 | + /// let ptr = ref1.get(); // read-only pointer, assumes immutability |
| 221 | + /// ref2.get_mut.write(1); |
| 222 | + /// ptr.read(); // UB! |
| 223 | + /// } |
| 224 | + /// ``` |
| 225 | + pub fn get(&self) -> *const T { |
| 226 | + self as *const _ as *const T |
| 227 | + } |
| 228 | + |
| 229 | + pub fn raw_get_mut(this: *mut Self) -> *mut T { |
| 230 | + this as *mut T |
| 231 | + } |
| 232 | + |
| 233 | + pub fn raw_get(this: *const Self) -> *const T { |
| 234 | + this as *const T |
| 235 | + } |
| 236 | +} |
| 237 | +``` |
| 238 | + |
| 239 | +The comment about aliasing `&mut` being "risky" refers to the fact that their safety invariant still asserts exclusive ownership. |
| 240 | +This implies that `duplicate` in the following example, while not causing immediate UB, is still unsound: |
| 241 | + |
| 242 | +```rust |
| 243 | +pub struct S { |
| 244 | + data: UnsafeAliased<i32>, |
| 245 | +} |
| 246 | + |
| 247 | +impl S { |
| 248 | + fn new(x: i32) -> Self { |
| 249 | + S { data: UnsafeAliased::new(x) } |
| 250 | + } |
| 251 | + |
| 252 | + fn duplicate<'a>(s: &'a mut S) -> (&'a mut S, &'a mut S) { |
| 253 | + let s1 = unsafe { (&s).transmute_copy() }; |
| 254 | + let s2 = s; |
| 255 | + (s1, s2) |
| 256 | + } |
| 257 | +} |
| 258 | +``` |
| 259 | + |
| 260 | +The unsoundness is easily demonstrated by using safe code to cause UB: |
| 261 | + |
| 262 | +```rust |
| 263 | +let mut s = S::new(42); |
| 264 | +let (s1, s2) = s.duplicate(); // no UB |
| 265 | +mem::swap(s1, s2); // UB |
| 266 | +``` |
| 267 | + |
| 268 | +We could even soundly make `get_mut` return an `&mut T`, given that the safety invariant is not affected by `UnsafeAliased`. |
| 269 | +But that would probably not be useful and only cause confusion. |
| 270 | + |
| 271 | +Reference diff: |
| 272 | + |
| 273 | +```diff |
| 274 | + * Breaking the [pointer aliasing rules]. `&mut T` and `&T` follow LLVM’s scoped |
| 275 | +- [noalias] model, except if the `&T` contains an [`UnsafeCell<U>`]. |
| 276 | ++ [noalias] model, except if the `&T` contains an [`UnsafeCell<U>`] or |
| 277 | ++ the `&mut T` contains an [`UnsafeAliased<U>`]. |
| 278 | +``` |
| 279 | + |
| 280 | +Async generator lowering changes: |
| 281 | +- Fields that represent local variables whose address is taken across a yield point must be wrapped in `UnsafeAliased`. |
| 282 | + |
| 283 | +Codegen and Miri changes: |
| 284 | + |
| 285 | +- We have a `Unique` auto trait similar to `Freeze` that is implemented if the type does not contain any by-val `UnsafeAliased`. |
| 286 | +- `noalias` on mutable references is only emitted for `Unique` types. (This replaces the current hack where it is only emitted for `Unpin` types.) |
| 287 | +- Miri will do `SharedReadWrite` retagging inside `UnsafeAliased` similar to what it does inside `UnsafeCell` already. (This replaces the current `Unpin`-based hack.) |
| 288 | + |
| 289 | +### Comparison with some other types that affect aliasing |
| 290 | + |
| 291 | +- `UnsafeCell`: disables aliasing (and affects but does not fully disable dereferenceable) behind shared refs, i.e. `&UnsafeCell<T>` is special. `UnsafeCell<&T>` (by-val, fully owned) is not special at all and basically like `&T`; `&mut UnsafeCell<T>` is also not special. |
| 292 | +- `UnsafeAliased`: disables aliasing (and affects but does not fully disable dereferenceable) behind mutable refs, i.e. `&mut UnsafeAliased<T>` is special. `UnsafeAliased<&mut T>` (by-val, fully owned) is not special at all and basically like `&mut T`; `&UnsafeAliased<T>` is also not special. |
| 293 | +- [`MaybeDangling<T>`](https://github.com/rust-lang/rfcs/pull/3336): disables aliasing and dereferencable *of all references (and boxes) directly inside it*, i.e. `MaybeDanling<&[mut] T>` is special. `&[mut] MaybeDangling<T>` is not special at all and basically like `&[mut] T`. |
| 294 | + |
| 295 | +# Drawbacks |
| 296 | +[drawbacks]: #drawbacks |
| 297 | + |
| 298 | +It's yet another wrapper type adjusting our aliasing rules and very easy to mix up with `UnsafeCell` or [`MaybeDangling`](https://github.com/rust-lang/rfcs/pull/3336). |
| 299 | +Furthermore, it is an extremely subtle wrapper type, as the `duplicate` example shows. |
| 300 | +`Unique` is a very strange trait, since it does not come with much of a safety requirement -- unlike `Freeze`, which quite clearly expresses some statement about the invariant of a type when shared, it is much less clear what the corresponding statement would be for `Unique`. (More work on RustBelt is needed to determine the details here.) |
| 301 | + |
| 302 | +# Rationale and alternatives |
| 303 | +[rationale-and-alternatives]: #rationale-and-alternatives |
| 304 | + |
| 305 | +The proposal in this RFC matches [what was discussed with the lang team a long time ago](https://github.com/rust-lang/rust/issues/63818#issuecomment-526579977). |
| 306 | + |
| 307 | +However, of course one could imagine alternatives: |
| 308 | + |
| 309 | +- Keep the status quo. The current sitaution is that we only make aliasing requirements on mutable references if the type they point to is `Unpin`. This is unsatisfying: `Unpin` was never meant to have this job. A consequence is that a stray `impl Unpin` on a `Wrapper<T>`-style type can [lead to subtle miscompilations](https://internals.rust-lang.org/t/surprising-soundness-trouble-around-pollfn/17484/15) since it re-adds aliasing requirements for the inner `T`. Contrast this with the `UnsafeCell` situation, where it is not possible for (stable) code to just `impl Freeze for T` in the wrong way -- `UnsafeCell` is *always* recognized by the compiler. |
| 310 | + |
| 311 | + On the other hand, `UnsafeAliased` is rather quirky in its behavior and having two marker traits (`Unique` and `Unpin`) might be too confusing, so sticking with `Unpin` might not be too bad in comparison. |
| 312 | + |
| 313 | + If we do that, however, it seems preferrable to transition `Unpin` to an unsafe trait. There *is* a clear statement about the types' invariants associated with `Unpin`, so an `impl Unpin` already comes with a proof obligation. It just happens to be the case that in a module without unsafe, one can always arrange all the pieces such that the proof obligation is satisifed. This is mostly a coincidence and related to the fact that we don't have safe field projections on `Pin`. That said, solving this also requires solving the trouble around `Drop` and `Pin`, where effectively an `impl Drop` does an implicit `get_mut_unchecked`, i.e., implicitly assumes the type is `Unpin`. |
| 314 | +- `UnsafeAliased` could affect aliasing guarantees *both* on mutable and shared references. This would avoid the currently rather subtle situation that arises when one of many aliasing `&mut UnsafeAliased<T>` is cast or coerced to `&UnsafeAliased<T>`: that is a read-only shared reference and all aliases must stop writing! It would make this type strictly more 'powerful' than `UnsafeCell` in the sense that replacing `UnsafeCell` by `UnsafeAliased` would always be correct. (Under the RFC as written, `UnsafeCell` and `UnsafeAliased` can be nested to remove aliasing requirements from both shared and mutable references.) |
| 315 | + |
| 316 | + If we don't do this, we could consider removing `get` since since it seems too much like a foot-gun. But that makes shared references to `UnsafeAliased` fairly pointless. Shared references to generators/futures are basically useless so it is unclear what the potential use-cases here are. |
| 317 | +- Instead of introducing a new type, we could say that `UnsafeCell` affects *both* shared and mutable references. That would lose some optimization potential on types like `&mut Cell<T>`, but would avoid the footgun of coercing an `&mut UnsafeAliased<T>` to `&UnsafeAliased<T>`. That said, so far the author is not aware of Miri detecting code that would run into this footgun (and Miri is [able to do detect such issues](https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=aab417b535f7dbd266fbfe470ea208c7)). |
| 318 | + |
| 319 | + |
| 320 | +# Prior art |
| 321 | +[prior-art]: #prior-art |
| 322 | + |
| 323 | +This is somewhat like `UnsafeCell`, but for mutable instead of shared references. |
| 324 | + |
| 325 | +# Unresolved questions |
| 326 | +[unresolved-questions]: #unresolved-questions |
| 327 | + |
| 328 | +- How do we transition code that relies on `Unpin` opting-out of aliasing guarantees, to the new type? Futures and generators just need a compiler patch, but there is probably other code that needs adjusting (e.g., Rust-for-Linux uses pinning to handle all sorts of self-referntial things in the Linux Kernel). Note that all such code is explicitly unsupported right now; the `Unpin` loophole has always explicitly been declared as temporary, unstable, and not something that we promise will actually work. |
| 329 | +- The name of the type needs to be bikeshed. `UnsafeAliased` might be too close to `UnsafeCell`, but that is a deliberate choice to indicate that this type has an effect when it appears in the *pointee*, unlike types like `MaybeUninit` or [`MaybeDangling`](https://github.com/rust-lang/rfcs/pull/3336) that have an effect when wrapped around the *pointer*. Like `UnsafeCell`, the aliasing allowed here is "interior". Other possible names: `UnsafeSelfReferential`, `UnsafePinned`, ... |
| 330 | +- Relatedly, in which module should this type live? |
| 331 | +- Should this type `derive(Copy)`? `UnsafeCell` does not, which is unfortunate because it now means some people might use `T: Copy` as indication that there is no `UnsafeCell` in `T`. |
| 332 | +- `Unpin` [also affects the `dereferenceable` attribute](https://github.com/rust-lang/rust/pull/106180), so the same would happen for this type. Is that something we want to guarantee, or do we hope to get back `dereferenceable` when better semantics for it materialize on the LLVM side? |
| 333 | + |
| 334 | +# Future possibilities |
| 335 | +[future-possibilities]: #future-possibilities |
| 336 | + |
| 337 | +- Similar to how we might want the ability to project from `&UnsafeCell<Struct>` to `&UnsafeCell<Field>`, the same kind of projection could also be interesting for `UnsafeAliased`. |
0 commit comments