Explain more in depth what early and late bound generic parameters are #1732
Changes from 1 commit
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,61 @@ | ||
# Bound vars and parameters | ||
|
||
## Early-bound parameters | ||
|
||
Early-bound parameters in rustc are identified by an index, stored in the | ||
[`ParamTy`] struct for types or the [`EarlyBoundRegion`] struct for lifetimes. | ||
The index counts from the outermost declaration in scope. This means that as you | ||
add more binders inside, the index doesn't change. | ||
|
||
For example, | ||
|
||
```rust,ignore | ||
trait Foo<T> { | ||
type Bar<U> = (Self, T, U); | ||
} | ||
``` | ||
|
||
Here, the type `(Self, T, U)` would be `($0, $1, $2)`, where `$N` means a | ||
[`ParamTy`] with the index of `N`. | ||
|
||
In rustc, the [`Generics`] structure carries this information. So the | |
[`Generics`] for `Bar` above would list just `U` and would point to the | |
'parent' generics of `Foo`, which declares `Self` and `T`. You can read more | |
in [this chapter](./generics.md). | |
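
As a rough illustration of how a nested item's parameters are numbered after those of its parent, here is a toy model. `ToyGenerics`, `own_params`, `index_of`, and `bar_generics` are hypothetical names made up for this sketch; they are not rustc's actual `Generics` API, and the real structure carries much more information.

```rust
/// Toy stand-in for the idea behind rustc's `Generics`
/// (hypothetical; not the real type).
struct ToyGenerics {
    /// Generics of the enclosing item, if any.
    parent: Option<Box<ToyGenerics>>,
    /// Parameters declared directly on this item.
    own_params: Vec<&'static str>,
}

impl ToyGenerics {
    /// Number of parameters declared by all enclosing items.
    fn parent_count(&self) -> usize {
        match &self.parent {
            Some(p) => p.parent_count() + p.own_params.len(),
            None => 0,
        }
    }

    /// Index of `name`, counting from the outermost declaration in scope.
    fn index_of(&self, name: &str) -> Option<usize> {
        if let Some(pos) = self.own_params.iter().position(|p| *p == name) {
            return Some(self.parent_count() + pos);
        }
        // Not declared here, so ask the parent.
        self.parent.as_ref().and_then(|p| p.index_of(name))
    }
}

/// Models `trait Foo<T> { type Bar<U>; }`: `Foo` declares `Self` and `T`,
/// `Bar` declares `U` and points at `Foo` as its parent.
fn bar_generics() -> ToyGenerics {
    let foo = ToyGenerics { parent: None, own_params: vec!["Self", "T"] };
    ToyGenerics { parent: Some(Box::new(foo)), own_params: vec!["U"] }
}
```

Under this model, looking up `U` through `bar_generics()` gives index 2, matching the `$2` in the example above, and adding more nested items never changes the indices of `Self` and `T`.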
|
||
[`ParamTy`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.ParamTy.html | ||
[`EarlyBoundRegion`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.EarlyBoundRegion.html | ||
[`Generics`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.Generics.html | ||
|
||
## Late-bound parameters | ||
|
||
Late-bound parameters in `rustc` are handled quite differently (they are also | ||
specialized to lifetimes since, right now, only late-bound lifetimes are | ||
supported, though with GATs that has to change). We indicate their potential | ||
presence by a [`Binder`] type. The [`Binder`] doesn't know how many variables | ||
there are at that binding level. This can only be determined by walking the | ||
type itself and collecting them. So a type like `for<'a, 'b> ('a, 'b)` would be | ||
`for (^0.a, ^0.b)`. Here, we just write `for` because we don't know the names | ||
Review comment: Worth showing an example with a De Bruijn index > 0?
Reply: Yes, but not in this PR, because I don't really want to touch this too much until we sort out the naming of `RegionKind`'s variants.
of the things bound within. | ||
|
||
Moreover, a reference to a late-bound lifetime is written `^0.a`: | ||
|
||
- The `0` is the index; it identifies that this lifetime is bound in the | ||
innermost binder (the `for`). | ||
- The `a` is the "name"; late-bound lifetimes in rustc are identified by a | ||
"name" -- the [`BoundRegionKind`] enum. This enum can contain a | ||
[`DefId`][defid] or it might have various "anonymous" numbered names. The | ||
latter arise from types like `fn(&u32, &u32)`, which are equivalent to | ||
something like `for<'a, 'b> fn(&'a u32, &'b u32)`, but the names of those | ||
lifetimes must be generated. | ||
|
||
This setup of not knowing the full set of variables at a binding level has some | ||
advantages and some disadvantages. The disadvantage is that you must walk the | ||
type to find out what is bound at the given level and so forth. The advantage | ||
is primarily that, when constructing types from Rust syntax, if we encounter | ||
anonymous regions like in `fn(&u32)`, we just create a fresh index and don't have | ||
to update the binder. | ||
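
To make the "you must walk the type" point concrete, here is a minimal sketch using a toy type representation. `Ty`, `collect`, `vars_bound_by`, and `nested_example` are hypothetical names invented for this illustration and do not correspond to rustc's real data structures. The sketch also assumes the usual De Bruijn convention that a reference to an enclosing binder, seen from inside one nested binder, has index 1.

```rust
/// Toy type representation (hypothetical; not rustc's `Ty`).
enum Ty {
    /// A late-bound variable such as `^0.a`.
    Bound { debruijn: usize, name: &'static str },
    /// A tuple of types.
    Tuple(Vec<Ty>),
    /// A binder (`for`), which does not record what it binds.
    Binder(Box<Ty>),
}

/// Walks `ty` and records every variable that refers to the binder
/// sitting `depth` levels above the current position.
fn collect(ty: &Ty, depth: usize, out: &mut Vec<&'static str>) {
    match ty {
        Ty::Bound { debruijn, name } => {
            if *debruijn == depth && !out.contains(name) {
                out.push(*name);
            }
        }
        Ty::Tuple(elems) => {
            for elem in elems {
                collect(elem, depth, out);
            }
        }
        // Entering a nested binder shifts every outer index up by one.
        Ty::Binder(inner) => collect(inner, depth + 1, out),
    }
}

/// Given the body of a binder, finds the variables bound by that binder.
fn vars_bound_by(body: &Ty) -> Vec<&'static str> {
    let mut out = Vec::new();
    collect(body, 0, &mut out);
    out
}

/// Body of the outer binder in `for (^0.a, for (^1.a, ^0.b))`.
fn nested_example() -> Ty {
    Ty::Tuple(vec![
        Ty::Bound { debruijn: 0, name: "a" },
        Ty::Binder(Box::new(Ty::Tuple(vec![
            Ty::Bound { debruijn: 1, name: "a" },
            Ty::Bound { debruijn: 0, name: "b" },
        ]))),
    ])
}
```

In `nested_example`, the outer binder binds only `a`: the inner `^1.a` refers back to it, while `^0.b` belongs to the inner binder and is skipped.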
|
||
[`Binder`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.Binder.html | ||
[`BoundRegionKind`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/enum.BoundRegionKind.html | ||
[defid]: ./hir.html#identifiers-in-the-hir |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,107 +1,199 @@ | ||
# Early and Late Bound Variables | ||
# Early and Late Bound Parameter Definitions | ||
|
||
In Rust, item definitions (like `fn`) can often have generic parameters, which | ||
are always [_universally_ quantified][quant]. That is, if you have a function | ||
like | ||
Understanding this page likely requires a rudimentary understanding of higher ranked | ||
trait bounds/`for<'a>` and also what types such as `dyn for<'a> Trait<'a>` and | |
`for<'a> fn(&'a u32)` mean. Reading [the nomicon chapter](https://doc.rust-lang.org/nomicon/hrtb.html) | |
on HRTB may be useful for understanding this syntax. The meaning of `for<'a> fn(&'a u32)` | ||
is incredibly similar to the meaning of `T: for<'a> Trait<'a>`. | ||
|
||
If you are looking for information on the `RegionKind` variants `ReLateBound` and `ReEarlyBound`, | |
you should look at the section on [bound vars and params](./bound-vars-and-params.md). This section | |
discusses what makes generic parameters on functions and closures late/early bound, not the general | |
concept of bound vars and generic parameters, which `RegionKind` has named in a way that is easily | |
confused with this topic. | |
|
||
## What does it mean for parameters to be early or late bound | ||
|
||
All function definitions conceptually have a zst (this is represented by `TyKind::FnDef` in rustc). | ||
The only generics on this zst are the early bound parameters of the function definition. e.g. | ||
```rust | ||
fn foo<T>(x: T) { } | ||
fn foo<'a>(_: &'a u32) {} | ||
|
||
fn main() { | ||
let b = foo; | ||
// ^ `b` has type `FnDef(foo, [])` (no substs because `'a` is late bound) | ||
assert!(std::mem::size_of_val(&b) == 0); | ||
} | ||
``` | ||
|
||
this function is defined "for all T" (not "for some specific T", which would be | ||
[_existentially_ quantified][quant]). | ||
In order to call `b`, the late bound parameters do need to be provided; they are inferred at the | |
call site rather than when we refer to `foo`. | |
```rust | ||
fn main() { | ||
let b = foo; | ||
let a: &'static u32 = &10; | ||
b(a); | |
// the lifetime argument for `'a` on `foo` is inferred at the callsite | ||
// the generic parameter `'a` on `foo` is inferred to `'static` here | ||
} | ||
``` | ||
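
A small compilable sketch of the same idea follows. The names `call_with_two_lifetimes`, `fn_item_size`, and `TEN` are made up for this illustration, and `foo` here returns its argument (unlike the `foo` above) so that the inferred lifetime is visible in the result type. The same zero-sized `b` is called twice, with `'a` inferred separately at each call.

```rust
// `'a` only appears in the argument types, so it is late bound.
fn foo<'a>(x: &'a u32) -> &'a u32 {
    x
}

static TEN: u32 = 10;

/// Calls the same fn item value with two different lifetimes.
fn call_with_two_lifetimes() -> (u32, u32) {
    let b = foo;
    // Here `'a` is inferred to `'static`.
    let long: &'static u32 = b(&TEN);
    // Here `'a` is inferred to a much shorter lifetime.
    let short_value = {
        let local = 20;
        *b(&local)
    };
    (*long, short_value)
}

/// Size of the value you get by naming `foo`.
fn fn_item_size() -> usize {
    let b = foo;
    std::mem::size_of_val(&b)
}
```

If `'a` had to be chosen when `b` was created, the two calls could not use different lifetimes through the same `b`.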
|
||
Because late bound parameters are not part of the `FnDef`'s substs, we can prove trait | |
bounds such as `F: for<'a> Fn(&'a u32)` where `F` is `foo`'s `FnDef`. e.g. | |
```rust | ||
fn foo_early<'a, T: Trait<'a>>(_: &'a u32, _: T) {} | ||
fn foo_late<'a, T>(_: &'a u32, _: T) {} | ||
|
||
fn accepts_hr_func<F: for<'a> Fn(&'a u32, u32)>(_: F) {} | ||
|
||
fn main() { | ||
// doesn't work, the substituted bound is `for<'a> FnDef<'?0>: Fn(&'a u32, u32)` | |
// `foo_early` only implements `for<'a> FnDef<'a>: Fn(&'a u32, u32)`: the lifetime | |
// of the borrow in the function argument must be the same as the lifetime | |
// on the `FnDef`. | |
accepts_hr_func(foo_early); | ||
|
||
// works, the substituted bound is `for<'a> FnDef: Fn(&'a u32, u32)` | ||
accepts_hr_func(foo_late); | ||
} | ||
|
||
// the builtin `Fn` impls for `foo_early` and `foo_late` look something like: | ||
// `foo_early` | ||
impl<'a, T: Trait<'a>> Fn(&'a u32, T) for FooEarlyFnDef<'a, T> { ... } | ||
// `foo_late` | ||
impl<'a, T> Fn(&'a u32, T) for FooLateFnDef<T> { ... } | ||
|
||
``` | ||
|
||
Early bound parameters are present on the `FnDef`. Late bound generic parameters are not present | ||
on the `FnDef` but are instead constrained by the builtin `Fn*` impl. | ||
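
One user-visible consequence can be sketched as follows. All names here (`early`, `late`, `accepts_hr_func`, `early_with_turbofish`, `late_as_higher_ranked`) are hypothetical and chosen only for this example. The two commented-out lines are, to the best of my understanding, rejected by current rustc, which is why they are not part of the compiled code.

```rust
// `'a: 'a` puts `'a` in a where clause, which makes it early bound.
fn early<'a: 'a>(x: &'a u32) -> &'a u32 {
    x
}

// `'a` only appears in the argument types, so it is late bound.
fn late<'a>(x: &'a u32) -> &'a u32 {
    x
}

fn accepts_hr_func<F: for<'a> Fn(&'a u32) -> &'a u32>(f: F) -> u32 {
    let local = 5;
    *f(&local)
}

static TEN: u32 = 10;

/// An early bound lifetime is part of the `FnDef`, so it can be supplied
/// explicitly when the function is named.
fn early_with_turbofish() -> u32 {
    let f = early::<'static>;
    *f(&TEN)
}

/// A late bound lifetime is left to the builtin `Fn*` impl, so the fn item
/// satisfies the higher ranked bound.
fn late_as_higher_ranked() -> u32 {
    accepts_hr_func(late)
}

// Expected to be rejected: `early`'s `FnDef` carries one fixed lifetime.
// accepts_hr_func(early);
// Expected to be rejected: a late bound lifetime cannot be named up front.
// let g = late::<'static>;
```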
|
||
The same distinction applies to closures. Instead of `FnDef` we are talking about the anonymous | ||
closure type. Closures are [currently unsound](https://github.com/rust-lang/rust/issues/84366) in | ||
ways that are closely related to the distinction between early/late bound | ||
parameters (more on this later). | |
|
||
The early/late boundness of generic parameters is only relevant for the desugaring of | |
functions/closures into types with builtin `Fn*` impls. It does not make sense to talk about | |
it in other contexts. | |
|
||
The `generics_of` query in rustc only contains early bound parameters. In this way, | |
`generics_of(my_func)` behaves more like the generics of the `FnDef` than the generics provided to | |
the function body, although it is not clear to the author of this section whether this was the | |
actual justification for making `generics_of` behave this way. | |
|
||
[quant]: ./appendix/background.md#quantified | ||
## What parameters are currently late bound | ||
|
||
While Rust *items* can be quantified over types, lifetimes, and constants, the | ||
types of values in Rust are only ever quantified over lifetimes. So you can | ||
have a type like `for<'a> fn(&'a u32)`, which represents a function pointer | ||
that takes a reference with any lifetime, or `for<'a> dyn Trait<'a>`, which is | ||
a `dyn` trait for a trait implemented for any lifetime; but we have no type | ||
like `for<T> fn(T)`, which would be a function that takes a value of *any type* | ||
as a parameter. This is a consequence of monomorphization -- to support a value | ||
of type `for<T> fn(T)`, we would need a single function pointer that can be | ||
used for a parameter of any type, but in Rust we generate customized code for | ||
each parameter type. | ||
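
The asymmetry can be seen directly with function pointers. This is a minimal sketch with made-up names (`first`, `id`, `use_higher_ranked_pointer`, `use_monomorphic_pointers`): one pointer of type `for<'a> fn(&'a u32) -> &'a u32` serves every lifetime, whereas a generic function must be instantiated at a concrete type before a pointer to it can exist.

```rust
fn first<'a>(x: &'a u32) -> &'a u32 {
    x
}

fn id<T>(x: T) -> T {
    x
}

/// One pointer, usable with references of any lifetime.
fn use_higher_ranked_pointer() -> (u32, u32) {
    let p: for<'a> fn(&'a u32) -> &'a u32 = first;
    static S: u32 = 1;
    let a = *p(&S); // `'a` is `'static` for this call
    let local = 2;
    let b = *p(&local); // `'a` is a local lifetime for this call
    (a, b)
}

/// There is no `for<T> fn(T) -> T`; each pointer names one instantiation.
fn use_monomorphic_pointers() -> (u32, bool) {
    let p_u32: fn(u32) -> u32 = id::<u32>;
    let p_bool: fn(bool) -> bool = id::<bool>;
    (p_u32(7), p_bool(true))
}
```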
Below are the current requirements for determining if a generic parameter is late bound. It is worth | ||
keeping in mind that these are not necessarily set in stone and it is almost certainly possible to | ||
be more flexible. | ||
|
||
One consequence of this asymmetry is a weird split in how we represent some | ||
generic types: _early-_ and _late-_ bound parameters. | ||
Basically, if we cannot represent a type (e.g. a universally quantified type), | ||
we have to bind it _early_ so that the unrepresentable type is never around. | ||
### Must be a lifetime parameter | ||
|
||
Consider the following example: | ||
Rust can't support types such as `for<T> dyn Trait<T>` or `for<T> fn(T)`. This is a | |
fundamental limitation of the language, as we are required to monomorphize type/const | |
parameters and cannot do so behind dynamic dispatch. (Technically we could probably | |
support `for<T> dyn MarkerTrait<T>`, as there is nothing to monomorphize.) | |
|
||
```rust,ignore | ||
fn foo<'a, 'b, T>(x: &'a u32, y: &'b T) where T: 'b { ... } | ||
Not being able to support `for<T> dyn Trait<T>` resulted in making all type and const | ||
parameters early bound. Only lifetime parameters can be late bound. | ||
|
||
### Must not appear in the where clauses | ||
|
||
In order for a generic parameter to be late bound it must not appear in any where clauses. | |
This is currently an incredibly simplistic check that causes lifetimes to be early bound even | |
if the where clauses they appear in are always true, or implied by well formedness of function | |
arguments. e.g. | |
```rust | ||
fn foo1<'a: 'a>(_: &'a u32) {} | ||
// ^^ early bound parameter because it's in a `'a: 'a` clause | ||
// even though the bound obviously holds all the time | ||
fn foo2<'a, T: Trait<'a>>(a: T, b: &'a u32) {} | |
// ^^ early bound parameter because it's used in the `T: Trait<'a>` clause | ||
fn foo3<'a, T: 'a>(_: &'a T) {} | ||
// ^^ early bound parameter because it's used in the `T: 'a` clause | ||
// even though that bound is implied by wellformedness of `&'a T` | ||
fn foo4<'a, 'b: 'a>(_: Inv<&'a ()>, _: Inv<&'b ()>) {} | ||
// ^^ ^^ ^^^ note: | ||
// ^^ ^^ `Inv` stands for `Invariant` and is used to | ||
// ^^ ^^ make the type parameter invariant. This | |
// ^^ ^^ is necessary for demonstration purposes as | |
// ^^ ^^ `for<'a, 'b> fn(&'a (), &'b ())` and | |
// ^^ ^^ `for<'a> fn(&'a (), &'a ())` are subtypes | |
// ^^ ^^ of each other, which makes the bound trivially | |
// ^^ ^^ satisfiable when making the fnptr. `Inv` | ||
// ^^ ^^ disables this subtyping. | ||
// ^^ ^^ | ||
// ^^^^^^ both early bound parameters because they are present in the | ||
// `'b: 'a` clause | ||
``` | ||
|
||
We cannot treat `'a`, `'b`, and `T` in the same way. Types in Rust can't have | ||
`for<T> { .. }`, only `for<'a> {...}`, so whenever you reference `foo` the type | ||
you get back can't be `for<'a, 'b, T> fn(&'a u32, y: &'b T)`. Instead, the `T` | ||
must be substituted early. In particular, you have: | ||
The reason for this requirement is that we cannot represent the `T: Trait<'a>` or `'b: 'a` clauses | |
on a function pointer. `for<'a, 'b> fn(Inv<&'a ()>, Inv<&'b ()>)` is not a valid function pointer to | |
represent `foo4`, as it would allow calling the function without `'b: 'a` holding. | |
|
||
```rust,ignore | ||
let x = foo; // T, 'b have to be substituted here | ||
x(...); // 'a substituted here, at the point of call | ||
x(...); // 'a substituted here with a different value | ||
### Must be constrained by where clauses or function argument types | ||
|
||
The builtin impls of the `Fn*` traits for closures and `FnDef`s cannot have any unconstrained | |
parameters. For example, the following impl is illegal: | |
```rust | ||
impl<'a> Trait for u32 { type Assoc = &'a u32; } | ||
``` | ||
We must not end up with a similar impl for the `Fn*` traits e.g. | ||
```rust | ||
impl<'a> Fn<()> for FnDef { type Assoc = &'a u32 } | ||
``` | ||
|
||
## Early-bound parameters | ||
Violating this rule can trivially lead to unsoundness as seen in [#84366](https://github.com/rust-lang/rust/issues/84366). | ||
Additionally if we ever support late bound type params then an impl like: | ||
```rust | ||
impl<T> Fn<()> for FnDef { type Assoc = T; } | ||
``` | ||
would break the compiler in various ways. | ||
|
||
Early-bound parameters in rustc are identified by an index, stored in the | ||
[`ParamTy`] struct for types or the [`EarlyBoundRegion`] struct for lifetimes. | ||
The index counts from the outermost declaration in scope. This means that as you | ||
add more binders inside, the index doesn't change. | ||
In order to ensure that everything functions correctly, we do not allow generic parameters to | ||
be late bound if it would result in a builtin impl that does not constrain all of the generic | ||
parameters on the builtin impl. Making a generic parameter be early bound trivially makes it be | ||
constrained by the builtin impl as it ends up on the self type. | ||
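
The following sketch models this with ordinary user-written code. `Call`, `EarlyDef`, `VALUE`, and `call_early` are hypothetical stand-ins: `Call` plays the role of the `Fn*` traits and `EarlyDef<'a>` plays the role of a `FnDef` whose lifetime parameter is early bound. It is only an analogy for the builtin impls, not how rustc generates them.

```rust
use std::marker::PhantomData;

/// Stand-in for the `Fn*` traits (hypothetical).
trait Call {
    type Output;
    fn call(&self) -> Self::Output;
}

static VALUE: u32 = 7;

// Expected to be rejected: `'a` appears only in the associated type, so
// nothing on the impl constrains it.
//
// struct LateDef;
// impl<'a> Call for LateDef {
//     type Output = &'a u32;
//     fn call(&self) -> &'a u32 { &VALUE }
// }

/// Stand-in for a `FnDef` with an early bound `'a`: the parameter is part
/// of the self type, and the type is still zero-sized.
struct EarlyDef<'a>(PhantomData<&'a ()>);

// Accepted: `'a` is constrained because it appears in the self type.
impl<'a> Call for EarlyDef<'a> {
    type Output = &'a u32;
    fn call(&self) -> &'a u32 {
        &VALUE
    }
}

fn call_early() -> u32 {
    let def: EarlyDef<'static> = EarlyDef(PhantomData);
    *def.call()
}
```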
|
||
For example, | ||
Because of the requirement that late bound parameters must not appear in where clauses, checking | |
this is simpler than the rules for checking that impl headers constrain all the parameters on the impl. | |
We only have to ensure that all late bound parameters appear at least once in the function argument | ||
types outside of an alias (e.g. an associated type). | ||
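
A lifetime that appears only in the return type is the simplest case of this rule. In the sketch below (`only_in_return`, `pick_static`, and `N` are names invented for the example), `'a` is not mentioned by any argument type, so it ends up early bound and can be supplied explicitly when naming the function.

```rust
static N: u32 = 3;

// `'a` appears in no argument type, so it cannot be late bound.
fn only_in_return<'a>() -> &'a u32 {
    &N
}

fn pick_static() -> u32 {
    // Because `'a` is early bound, it is chosen when the function is named.
    let r: &'static u32 = only_in_return::<'static>();
    *r
}
```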
|
||
```rust,ignore | ||
trait Foo<T> { | ||
type Bar<U> = (Self, T, U); | ||
} | ||
A parameter that appears only in the substs of an alias does not count, for the same reason | |
that the following code is forbidden: | |
```rust | ||
impl<T: Trait> OtherTrait for <T as Trait>::Assoc { type Assoc = T } | ||
``` | ||
There is no guarantee that `<T as Trait>::Assoc` will normalize to different types for every | ||
instantiation of `T`. If we were to allow this impl we could get overlapping impls and the | ||
same is true of the builtin `Fn*` impls. | ||
|
||
## Making more generic parameters late bound | ||
|
||
It is generally considered desirable for more parameters to be late bound as it makes | ||
the builtin `Fn*` impls more flexible. Right now many of the requirements for making | ||
a parameter late bound are overly restrictive as they are tied to what we can currently | ||
(or can ever) do with fn ptrs. | ||
|
||
It would be theoretically possible to support late bound params in `where`-clauses in the | ||
language by introducing implication types which would allow us to express types such as: | ||
`for<'a, 'b: 'a> fn(Inv<&'a u32>, Inv<&'b u32>)` which would ensure `'b: 'a` is upheld when | ||
calling the function pointer. | ||
|
||
It would also be theoretically possible to support it by making the coercion to a fn ptr | |
instantiate the parameter with an infer var, while still leaving the generic parameter off | |
the `FnDef`. Trait impls are perfectly capable of representing the function's where clauses | |
on the impl itself. This would also allow us to support late bound type/const | |
vars, allowing bounds like `F: for<T> Fn(T)` to hold. | |
|
||
It is somewhat unclear whether we can change the `Fn` traits to be structured differently | |
so that we never have to make a parameter early bound just to make the builtin impl have all | |
generics be constrained. Of all the possible causes of a generic parameter being early bound, | |
this seems the most difficult to remove. | |
|
||
Whether these would be good ideas to implement is a separate question; they are only brought | |
up to illustrate that the current rules are not necessarily set in stone, nor are they the | |
result of "it's the only way of doing this". | |
|
||
Here, the type `(Self, T, U)` would be `($0, $1, $2)`, where `$N` means a | ||
[`ParamTy`] with the index of `N`. | ||
|
||
In rustc, the [`Generics`] structure carries this information. So the | ||
[`Generics`] for `Bar` above would be just like for `U` and would indicate the | ||
'parent' generics of `Foo`, which declares `Self` and `T`. You can read more | ||
in [this chapter](./generics.md). | ||
|
||
[`ParamTy`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.ParamTy.html | ||
[`EarlyBoundRegion`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.EarlyBoundRegion.html | ||
[`Generics`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.Generics.html | ||
|
||
## Late-bound parameters | ||
|
||
Late-bound parameters in `rustc` are handled quite differently (they are also | ||
specialized to lifetimes since, right now, only late-bound lifetimes are | ||
supported, though with GATs that has to change). We indicate their potential | ||
presence by a [`Binder`] type. The [`Binder`] doesn't know how many variables | ||
there are at that binding level. This can only be determined by walking the | ||
type itself and collecting them. So a type like `for<'a, 'b> ('a, 'b)` would be | ||
`for (^0.a, ^0.b)`. Here, we just write `for` because we don't know the names | ||
of the things bound within. | ||
|
||
Moreover, a reference to a late-bound lifetime is written `^0.a`: | ||
|
||
- The `0` is the index; it identifies that this lifetime is bound in the | ||
innermost binder (the `for`). | ||
- The `a` is the "name"; late-bound lifetimes in rustc are identified by a | ||
"name" -- the [`BoundRegionKind`] enum. This enum can contain a | ||
[`DefId`][defid] or it might have various "anonymous" numbered names. The | ||
latter arise from types like `fn(&u32, &u32)`, which are equivalent to | ||
something like `for<'a, 'b> fn(&'a u32, &'b u32)`, but the names of those | ||
lifetimes must be generated. | ||
|
||
This setup of not knowing the full set of variables at a binding level has some | ||
advantages and some disadvantages. The disadvantage is that you must walk the | ||
type to find out what is bound at the given level and so forth. The advantage | ||
is primarily that, when constructing types from Rust syntax, if we encounter | ||
anonymous regions like in `fn(&u32)`, we just create a fresh index and don't have | ||
to update the binder. | ||
|
||
[`Binder`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.Binder.html | ||
[`BoundRegionKind`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/enum.BoundRegionKind.html | ||
[defid]: ./hir.html#identifiers-in-the-hir |
Review comment: May be worth explicitly stating that `Generics` can be used to map from this index to a parameter definition.
Reply: Probably, but I also don't really want to in this PR.