Skip to content

Commit 018a865

Browse files
committed
UnsafeAliased: port text from hackmd
1 parent ed4c592 commit 018a865

File tree

1 file changed

+337
-0
lines changed

1 file changed

+337
-0
lines changed

text/0000-unsafe-aliased.md

+337
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,337 @@
1+
# `unsafe_aliased`
2+
3+
- Feature Name: `unsafe_aliased`
4+
- Start Date: 2022-11-05
5+
- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000)
6+
- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000)
7+
8+
# Summary
9+
[summary]: #summary
10+
11+
Add a type `UnsafeAliased` that acts on `&mut` in a similar way to how `UnsafeCell` acts on `&`: it opts-out of the aliasing guarantees.
12+
However, `&mut UnsafeAliased` can still be `mem::swap`ed, so this is not a free ticket for arbitrary aliasing of mutable references.
13+
Abstractions built on top of `UnsafeAliased` must ensure proper encapsulation when handing such aliases references to clients (e.g. via pinning).
14+
15+
This type is then used in generator lowering, finally fixing [#63818](https://github.com/rust-lang/rust/issues/63818).
16+
17+
# Motivation
18+
[motivation]: #motivation
19+
20+
Let's say you want to write a type with a self-referential pointer:
21+
22+
```rust
23+
#![feature(pin_macro)]
24+
#![feature(negative_impls)]
25+
use std::ptr;
26+
use std::pin::{pin, Pin};
27+
28+
pub struct S {
29+
data: i32,
30+
ptr_to_data: *mut i32,
31+
}
32+
33+
impl !Unpin for S {}
34+
35+
impl S {
36+
pub fn new() -> Self {
37+
S { data: 42, ptr_to_data: ptr::null_mut() }
38+
}
39+
40+
pub fn get_data(self: Pin<&mut Self>) -> i32 {
41+
// SAFETY: We're not moving anything.
42+
let this = unsafe { Pin::get_unchecked_mut(self) };
43+
if this.ptr_to_data.is_null() {
44+
this.ptr_to_data = ptr::addr_of_mut!(this.data);
45+
}
46+
// SAFETY: if the pointer is non-null, then we are pinned and it points to the `data` field.
47+
unsafe { this.ptr_to_data.read() }
48+
}
49+
50+
pub fn set_data(self: Pin<&mut Self>, data: i32) {
51+
// SAFETY: We're not moving anything.
52+
let this = unsafe { Pin::get_unchecked_mut(self) };
53+
if this.ptr_to_data.is_null() {
54+
this.ptr_to_data = ptr::addr_of_mut!(this.data);
55+
}
56+
// SAFETY: if the pointer is non-null, then we are pinned and it points to the `data` field.
57+
unsafe { this.ptr_to_data.write(data) }
58+
}
59+
}
60+
61+
fn main() {
62+
let mut s = pin!(S::new());
63+
s.as_mut().set_data(42);
64+
println!("{}", s.as_mut().get_data());
65+
}
66+
```
67+
68+
This kind of code is implicitly generated by rustc all the time when an `async fn` has a local variable of reference type that is live across a yield point.
69+
The problem is that this code has UB under our aliasing rules: the `&mut S` inside the `self` argument of `get_data` aliases with `ptr_to_data`!
70+
71+
This simple code only has UB under Stacked Borrows but not under the LLVM aliasing rules; more complex variants of this -- still in the realm of what `async fn` generates -- also have UB under the LLVM aliasing rules.
72+
73+
<details>
74+
75+
<summary>A more complex variant</summary>
76+
77+
```rust
78+
#![feature(pin_macro)]
79+
#![feature(negative_impls)]
80+
use std::ptr;
81+
use std::pin::{pin, Pin};
82+
use std::task::Poll;
83+
84+
pub struct S {
85+
state: i32,
86+
data: i32,
87+
ptr_to_data: *mut i32,
88+
}
89+
90+
impl !Unpin for S {}
91+
92+
impl S {
93+
pub fn new() -> Self {
94+
S { state: 0, data: 42, ptr_to_data: ptr::null_mut() }
95+
}
96+
97+
fn poll(self: Pin<&mut Self>) -> Poll<()> {
98+
// SAFETY: We're not moving anything.
99+
let this = unsafe { Pin::get_unchecked_mut(self) };
100+
match this.state {
101+
0 => {
102+
// The first time, set up the pointer.
103+
this.ptr_to_data = ptr::addr_of_mut!(this.data);
104+
// Now yield.
105+
this.state += 1;
106+
Poll::Pending
107+
}
108+
1 => {
109+
// After coming back from the yield, write to the pointer.
110+
unsafe { this.ptr_to_data.write(42) };
111+
// And read our local variable `data`.
112+
// THIS IS UB! `this` is derived from the `noalias` pointer
113+
// `self` but we did a write to `this.data` in the previous
114+
// line when writing to `ptr_to_data`. The compiler is allowed
115+
// to reorder this and the previous line and then the output
116+
// would change.
117+
println!("{}", this.data);
118+
// Done!
119+
Poll::Ready(())
120+
}
121+
_ => unreachable!(),
122+
}
123+
}
124+
}
125+
126+
fn main() {
127+
let mut s = pin!(S::new());
128+
while let Poll::Pending = s.as_mut().poll() {}
129+
}
130+
```
131+
132+
</details>
133+
134+
<br>
135+
136+
Beyond self-referential types, a similar problem also comes up with intrusive linked lists: the nodes of such a list often live on the stack frames of the functions participating in the list, but also have incoming pointers from other list elements.
137+
When a function takes a mutable reference to its stack-allocated node, that will alias the pointers from the neighboring elements.
138+
`Pin` is used to ensure that the list elements don't just move elsewhere (which would invalidate those incoming pointers), but there still is the problem that an `&mut Node` is actually not a unique pointer due to these aliases -- so we need a way for the to opt-out of the aliasing rules.
139+
140+
<br>
141+
142+
The goal of this RFC is to offer a way of writing such self-referential types and intrusive collections without UB.
143+
We don't want to change the rules for mutable references in general (that would also affect all the code that doesn't do anything self-referential), instad we want to be able to tell the compiler that this code is doing funky aliasing and that should be taken into account for optimizations.
144+
145+
# Guide-level explanation
146+
[guide-level-explanation]: #guide-level-explanation
147+
148+
To write this code in a UB-free way, wrap the fields that are *targets* of self-referential pointers in an `UnsafeAliased`:
149+
150+
```rust
151+
pub struct S {
152+
data: UnsafeAliased<i32>, // <!---- here
153+
ptr_to_data: *mut i32,
154+
}
155+
```
156+
157+
This lets the compiler know that mutable references to `data` might still have aliases, and so optimizations cannot assume that no aliases exist. That's entirely analogous to how `UnsafeCell` lets the compiler know that *shared* references to this field might undergo mutation.
158+
159+
However, what is *not* analogous is that `&mut S`, when handed to safe code *you do not control*, must still be unique pointers!
160+
This is because of methods like `mem::swap` that can still assume that their two arguments are non-overlapping.
161+
(`mem::swap` on two `&mut UnsafeAliased<i32>` may soundly assume that they do not alias.)
162+
In other words, the safety invariant of `&mut S` still requires full ownership of the entire memory range `S` is stored at; for the duration that a function holds on to the borrow, nobody else may read and write this memory.
163+
But again, this is a *safety invariant*; it only applies to safe code you do not control. You can write your own code handling `&mut S` and as long as that code is careful not to make use of this memory in the wrong way, potential aliasing is fine.
164+
You can also hand out a `Pin<&mut S>` to external safe code; the `Pin` wrapper imposes exactly the restrictions needed to ensure that this remains sound (e.g., it prevents `mem::swap`).
165+
166+
167+
# Reference-level explanation
168+
[reference-level-explanation]: #reference-level-explanation
169+
170+
API sketch:
171+
172+
```rust
173+
/// The type `UnsafeAliased<T>` lets unsafe code violate
174+
/// the rule that `&mut UnsafeAliased<T>` may never alias anything else.
175+
///
176+
/// However, it is still very risky to have two `&mut UnsafeAliased<T>` that alias
177+
/// each other. For instance, passing those two references to `mem::swap`
178+
/// would cause UB. The general advice is to still have a single mutable
179+
/// reference, and then a bunch of raw pointers that alias it.
180+
/// If you want to pass such a reference to external safe code,
181+
/// you need to use techniques such as pinning (`Pin<&mut TypeWithUnsafeAliased>`)
182+
/// which ensure that both the mutable reference and its aliases remain valid.
183+
///
184+
/// Further note that this does *not* lift the requirement that shared references
185+
/// must be read-only! Use `UnsafeCell` for that.
186+
#[lang = "unsafe_cell_mut"]
187+
#[repr(transparent)]
188+
struct UnsafeAliased<T: ?Sized> {
189+
value: T,
190+
}
191+
192+
impl<T: ?Sized> !Send for UnsafeAliased<T> {}
193+
194+
impl<T> UnsafeCell<T> {
195+
/// Constructs a new instance of `UnsafeCell` which will wrap the specified
196+
/// value.
197+
pub fn new(value: T) -> UnsafeAliased<T> {
198+
UnsafeAliased { value }
199+
}
200+
201+
pub fn into_inner(self) -> T {
202+
self.value
203+
}
204+
205+
/// Get read-write access to the contents of an `UnsafeAliased`.
206+
pub fn get_mut(&mut self) -> *mut T {
207+
self as *mut _ as *mut T
208+
}
209+
210+
/// Get read-only access to the contents of a shared `UnsafeAliased`.
211+
/// Note that `&UnsafeAliased<T>` is read-only if `&T` is read-only.
212+
/// This means that if there is mutation of the `T`, future reads from the
213+
/// `*const T` returned here are UB!
214+
///
215+
/// ```rust
216+
/// unsafe {
217+
/// let mut x = UnsafeAliased::new(0);
218+
/// let ref1 = &mut *addr_of_mut!(x);
219+
/// let ref2 = &mut *addr_of_mut!(x);
220+
/// let ptr = ref1.get(); // read-only pointer, assumes immutability
221+
/// ref2.get_mut.write(1);
222+
/// ptr.read(); // UB!
223+
/// }
224+
/// ```
225+
pub fn get(&self) -> *const T {
226+
self as *const _ as *const T
227+
}
228+
229+
pub fn raw_get_mut(this: *mut Self) -> *mut T {
230+
this as *mut T
231+
}
232+
233+
pub fn raw_get(this: *const Self) -> *const T {
234+
this as *const T
235+
}
236+
}
237+
```
238+
239+
The comment about aliasing `&mut` being "risky" refers to the fact that their safety invariant still asserts exclusive ownership.
240+
This implies that `duplicate` in the following example, while not causing immediate UB, is still unsound:
241+
242+
```rust
243+
pub struct S {
244+
data: UnsafeAliased<i32>,
245+
}
246+
247+
impl S {
248+
fn new(x: i32) -> Self {
249+
S { data: UnsafeAliased::new(x) }
250+
}
251+
252+
fn duplicate<'a>(s: &'a mut S) -> (&'a mut S, &'a mut S) {
253+
let s1 = unsafe { (&s).transmute_copy() };
254+
let s2 = s;
255+
(s1, s2)
256+
}
257+
}
258+
```
259+
260+
The unsoundness is easily demonstrated by using safe code to cause UB:
261+
262+
```rust
263+
let mut s = S::new(42);
264+
let (s1, s2) = s.duplicate(); // no UB
265+
mem::swap(s1, s2); // UB
266+
```
267+
268+
We could even soundly make `get_mut` return an `&mut T`, given that the safety invariant is not affected by `UnsafeAliased`.
269+
But that would probably not be useful and only cause confusion.
270+
271+
Reference diff:
272+
273+
```diff
274+
* Breaking the [pointer aliasing rules]. `&mut T` and `&T` follow LLVM’s scoped
275+
- [noalias] model, except if the `&T` contains an [`UnsafeCell<U>`].
276+
+ [noalias] model, except if the `&T` contains an [`UnsafeCell<U>`] or
277+
+ the `&mut T` contains an [`UnsafeAliased<U>`].
278+
```
279+
280+
Async generator lowering changes:
281+
- Fields that represent local variables whose address is taken across a yield point must be wrapped in `UnsafeAliased`.
282+
283+
Codegen and Miri changes:
284+
285+
- We have a `Unique` auto trait similar to `Freeze` that is implemented if the type does not contain any by-val `UnsafeAliased`.
286+
- `noalias` on mutable references is only emitted for `Unique` types. (This replaces the current hack where it is only emitted for `Unpin` types.)
287+
- Miri will do `SharedReadWrite` retagging inside `UnsafeAliased` similar to what it does inside `UnsafeCell` already. (This replaces the current `Unpin`-based hack.)
288+
289+
### Comparison with some other types that affect aliasing
290+
291+
- `UnsafeCell`: disables aliasing (and affects but does not fully disable dereferenceable) behind shared refs, i.e. `&UnsafeCell<T>` is special. `UnsafeCell<&T>` (by-val, fully owned) is not special at all and basically like `&T`; `&mut UnsafeCell<T>` is also not special.
292+
- `UnsafeAliased`: disables aliasing (and affects but does not fully disable dereferenceable) behind mutable refs, i.e. `&mut UnsafeAliased<T>` is special. `UnsafeAliased<&mut T>` (by-val, fully owned) is not special at all and basically like `&mut T`; `&UnsafeAliased<T>` is also not special.
293+
- [`MaybeDangling<T>`](https://github.com/rust-lang/rfcs/pull/3336): disables aliasing and dereferencable *of all references (and boxes) directly inside it*, i.e. `MaybeDanling<&[mut] T>` is special. `&[mut] MaybeDangling<T>` is not special at all and basically like `&[mut] T`.
294+
295+
# Drawbacks
296+
[drawbacks]: #drawbacks
297+
298+
It's yet another wrapper type adjusting our aliasing rules and very easy to mix up with `UnsafeCell` or [`MaybeDangling`](https://github.com/rust-lang/rfcs/pull/3336).
299+
Furthermore, it is an extremely subtle wrapper type, as the `duplicate` example shows.
300+
`Unique` is a very strange trait, since it does not come with much of a safety requirement -- unlike `Freeze`, which quite clearly expresses some statement about the invariant of a type when shared, it is much less clear what the corresponding statement would be for `Unique`. (More work on RustBelt is needed to determine the details here.)
301+
302+
# Rationale and alternatives
303+
[rationale-and-alternatives]: #rationale-and-alternatives
304+
305+
The proposal in this RFC matches [what was discussed with the lang team a long time ago](https://github.com/rust-lang/rust/issues/63818#issuecomment-526579977).
306+
307+
However, of course one could imagine alternatives:
308+
309+
- Keep the status quo. The current sitaution is that we only make aliasing requirements on mutable references if the type they point to is `Unpin`. This is unsatisfying: `Unpin` was never meant to have this job. A consequence is that a stray `impl Unpin` on a `Wrapper<T>`-style type can [lead to subtle miscompilations](https://internals.rust-lang.org/t/surprising-soundness-trouble-around-pollfn/17484/15) since it re-adds aliasing requirements for the inner `T`. Contrast this with the `UnsafeCell` situation, where it is not possible for (stable) code to just `impl Freeze for T` in the wrong way -- `UnsafeCell` is *always* recognized by the compiler.
310+
311+
On the other hand, `UnsafeAliased` is rather quirky in its behavior and having two marker traits (`Unique` and `Unpin`) might be too confusing, so sticking with `Unpin` might not be too bad in comparison.
312+
313+
If we do that, however, it seems preferrable to transition `Unpin` to an unsafe trait. There *is* a clear statement about the types' invariants associated with `Unpin`, so an `impl Unpin` already comes with a proof obligation. It just happens to be the case that in a module without unsafe, one can always arrange all the pieces such that the proof obligation is satisifed. This is mostly a coincidence and related to the fact that we don't have safe field projections on `Pin`. That said, solving this also requires solving the trouble around `Drop` and `Pin`, where effectively an `impl Drop` does an implicit `get_mut_unchecked`, i.e., implicitly assumes the type is `Unpin`.
314+
- `UnsafeAliased` could affect aliasing guarantees *both* on mutable and shared references. This would avoid the currently rather subtle situation that arises when one of many aliasing `&mut UnsafeAliased<T>` is cast or coerced to `&UnsafeAliased<T>`: that is a read-only shared reference and all aliases must stop writing! It would make this type strictly more 'powerful' than `UnsafeCell` in the sense that replacing `UnsafeCell` by `UnsafeAliased` would always be correct. (Under the RFC as written, `UnsafeCell` and `UnsafeAliased` can be nested to remove aliasing requirements from both shared and mutable references.)
315+
316+
If we don't do this, we could consider removing `get` since since it seems too much like a foot-gun. But that makes shared references to `UnsafeAliased` fairly pointless. Shared references to generators/futures are basically useless so it is unclear what the potential use-cases here are.
317+
- Instead of introducing a new type, we could say that `UnsafeCell` affects *both* shared and mutable references. That would lose some optimization potential on types like `&mut Cell<T>`, but would avoid the footgun of coercing an `&mut UnsafeAliased<T>` to `&UnsafeAliased<T>`. That said, so far the author is not aware of Miri detecting code that would run into this footgun (and Miri is [able to do detect such issues](https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=aab417b535f7dbd266fbfe470ea208c7)).
318+
319+
320+
# Prior art
321+
[prior-art]: #prior-art
322+
323+
This is somewhat like `UnsafeCell`, but for mutable instead of shared references.
324+
325+
# Unresolved questions
326+
[unresolved-questions]: #unresolved-questions
327+
328+
- How do we transition code that relies on `Unpin` opting-out of aliasing guarantees, to the new type? Futures and generators just need a compiler patch, but there is probably other code that needs adjusting (e.g., Rust-for-Linux uses pinning to handle all sorts of self-referntial things in the Linux Kernel). Note that all such code is explicitly unsupported right now; the `Unpin` loophole has always explicitly been declared as temporary, unstable, and not something that we promise will actually work.
329+
- The name of the type needs to be bikeshed. `UnsafeAliased` might be too close to `UnsafeCell`, but that is a deliberate choice to indicate that this type has an effect when it appears in the *pointee*, unlike types like `MaybeUninit` or [`MaybeDangling`](https://github.com/rust-lang/rfcs/pull/3336) that have an effect when wrapped around the *pointer*. Like `UnsafeCell`, the aliasing allowed here is "interior". Other possible names: `UnsafeSelfReferential`, `UnsafePinned`, ...
330+
- Relatedly, in which module should this type live?
331+
- Should this type `derive(Copy)`? `UnsafeCell` does not, which is unfortunate because it now means some people might use `T: Copy` as indication that there is no `UnsafeCell` in `T`.
332+
- `Unpin` [also affects the `dereferenceable` attribute](https://github.com/rust-lang/rust/pull/106180), so the same would happen for this type. Is that something we want to guarantee, or do we hope to get back `dereferenceable` when better semantics for it materialize on the LLVM side?
333+
334+
# Future possibilities
335+
[future-possibilities]: #future-possibilities
336+
337+
- Similar to how we might want the ability to project from `&UnsafeCell<Struct>` to `&UnsafeCell<Field>`, the same kind of projection could also be interesting for `UnsafeAliased`.

0 commit comments

Comments
 (0)