Skip to content

Commit 28b7fab

Browse files
authored
Merge pull request rust-lang#31 from mark-i-m/markim_readme_01
Copy type inference readme
2 parents 6ccb3b0 + f47633d commit 28b7fab

File tree

1 file changed

+226
-0
lines changed

1 file changed

+226
-0
lines changed

src/type-inference.md

+226
Original file line numberDiff line numberDiff line change
@@ -1 +1,227 @@
11
# Type inference
2+
3+
The type inference is based on standard HM-type inference, but
4+
extended in various way to accommodate subtyping, region inference,
5+
and higher-ranked types.
6+
7+
## A note on terminology
8+
9+
We use the notation `?T` to refer to inference variables, also called
10+
existential variables.
11+
12+
We use the term "region" and "lifetime" interchangeably. Both refer to
13+
the `'a` in `&'a T`.
14+
15+
The term "bound region" refers to regions bound in a function
16+
signature, such as the `'a` in `for<'a> fn(&'a u32)`. A region is
17+
"free" if it is not bound.
18+
19+
## Creating an inference context
20+
21+
You create and "enter" an inference context by doing something like
22+
the following:
23+
24+
```rust
25+
tcx.infer_ctxt().enter(|infcx| {
26+
// use the inference context `infcx` in here
27+
})
28+
```
29+
30+
Each inference context creates a short-lived type arena to store the
31+
fresh types and things that it will create, as described in
32+
[the README in the ty module][ty-readme]. This arena is created by the `enter`
33+
function and disposed after it returns.
34+
35+
[ty-readme]: src/librustc/ty/README.md
36+
37+
Within the closure, the infcx will have the type `InferCtxt<'cx, 'gcx,
38+
'tcx>` for some fresh `'cx` and `'tcx` -- the latter corresponds to
39+
the lifetime of this temporary arena, and the `'cx` is the lifetime of
40+
the `InferCtxt` itself. (Again, see [that ty README][ty-readme] for
41+
more details on this setup.)
42+
43+
The `tcx.infer_ctxt` method actually returns a build, which means
44+
there are some kinds of configuration you can do before the `infcx` is
45+
created. See `InferCtxtBuilder` for more information.
46+
47+
## Inference variables
48+
49+
The main purpose of the inference context is to house a bunch of
50+
**inference variables** -- these represent types or regions whose precise
51+
value is not yet known, but will be uncovered as we perform type-checking.
52+
53+
If you're familiar with the basic ideas of unification from H-M type
54+
systems, or logic languages like Prolog, this is the same concept. If
55+
you're not, you might want to read a tutorial on how H-M type
56+
inference works, or perhaps this blog post on
57+
[unification in the Chalk project].
58+
59+
[Unification in the Chalk project]: http://smallcultfollowing.com/babysteps/blog/2017/03/25/unification-in-chalk-part-1/
60+
61+
All told, the inference context stores four kinds of inference variables as of this
62+
writing:
63+
64+
- Type variables, which come in three varieties:
65+
- General type variables (the most common). These can be unified with any type.
66+
- Integral type variables, which can only be unified with an integral type, and
67+
arise from an integer literal expression like `22`.
68+
- Float type variables, which can only be unified with a float type, and
69+
arise from a float literal expression like `22.0`.
70+
- Region variables, which represent lifetimes, and arise all over the dang place.
71+
72+
All the type variables work in much the same way: you can create a new
73+
type variable, and what you get is `Ty<'tcx>` representing an
74+
unresolved type `?T`. Then later you can apply the various operations
75+
that the inferencer supports, such as equality or subtyping, and it
76+
will possibly **instantiate** (or **bind**) that `?T` to a specific
77+
value as a result.
78+
79+
The region variables work somewhat differently, and are described
80+
below in a separate section.
81+
82+
## Enforcing equality / subtyping
83+
84+
The most basic operations you can perform in the type inferencer is
85+
**equality**, which forces two types `T` and `U` to be the same. The
86+
recommended way to add an equality constraint is using the `at`
87+
method, roughly like so:
88+
89+
```
90+
infcx.at(...).eq(t, u);
91+
```
92+
93+
The first `at()` call provides a bit of context, i.e., why you are
94+
doing this unification, and in what environment, and the `eq` method
95+
performs the actual equality constraint.
96+
97+
When you equate things, you force them to be precisely equal. Equating
98+
returns a `InferResult` -- if it returns `Err(err)`, then equating
99+
failed, and the enclosing `TypeError` will tell you what went wrong.
100+
101+
The success case is perhaps more interesting. The "primary" return
102+
type of `eq` is `()` -- that is, when it succeeds, it doesn't return a
103+
value of any particular interest. Rather, it is executed for its
104+
side-effects of constraining type variables and so forth. However, the
105+
actual return type is not `()`, but rather `InferOk<()>`. The
106+
`InferOk` type is used to carry extra trait obligations -- your job is
107+
to ensure that these are fulfilled (typically by enrolling them in a
108+
fulfillment context). See the [trait README] for more background here.
109+
110+
[trait README]: ../traits/README.md
111+
112+
You can also enforce subtyping through `infcx.at(..).sub(..)`. The same
113+
basic concepts apply as above.
114+
115+
## "Trying" equality
116+
117+
Sometimes you would like to know if it is *possible* to equate two
118+
types without error. You can test that with `infcx.can_eq` (or
119+
`infcx.can_sub` for subtyping). If this returns `Ok`, then equality
120+
is possible -- but in all cases, any side-effects are reversed.
121+
122+
Be aware though that the success or failure of these methods is always
123+
**modulo regions**. That is, two types `&'a u32` and `&'b u32` will
124+
return `Ok` for `can_eq`, even if `'a != 'b`. This falls out from the
125+
"two-phase" nature of how we solve region constraints.
126+
127+
## Snapshots
128+
129+
As described in the previous section on `can_eq`, often it is useful
130+
to be able to do a series of operations and then roll back their
131+
side-effects. This is done for various reasons: one of them is to be
132+
able to backtrack, trying out multiple possibilities before settling
133+
on which path to take. Another is in order to ensure that a series of
134+
smaller changes take place atomically or not at all.
135+
136+
To allow for this, the inference context supports a `snapshot` method.
137+
When you call it, it will start recording changes that occur from the
138+
operations you perform. When you are done, you can either invoke
139+
`rollback_to`, which will undo those changes, or else `confirm`, which
140+
will make the permanent. Snapshots can be nested as long as you follow
141+
a stack-like discipline.
142+
143+
Rather than use snapshots directly, it is often helpful to use the
144+
methods like `commit_if_ok` or `probe` that encapsulate higher-level
145+
patterns.
146+
147+
## Subtyping obligations
148+
149+
One thing worth discussing are subtyping obligations. When you force
150+
two types to be a subtype, like `?T <: i32`, we can often convert those
151+
into equality constraints. This follows from Rust's rather limited notion
152+
of subtyping: so, in the above case, `?T <: i32` is equivalent to `?T = i32`.
153+
154+
However, in some cases we have to be more careful. For example, when
155+
regions are involved. So if you have `?T <: &'a i32`, what we would do
156+
is to first "generalize" `&'a i32` into a type with a region variable:
157+
`&'?b i32`, and then unify `?T` with that (`?T = &'?b i32`). We then
158+
relate this new variable with the original bound:
159+
160+
&'?b i32 <: &'a i32
161+
162+
This will result in a region constraint (see below) of `'?b: 'a`.
163+
164+
One final interesting case is relating two unbound type variables,
165+
like `?T <: ?U`. In that case, we can't make progress, so we enqueue
166+
an obligation `Subtype(?T, ?U)` and return it via the `InferOk`
167+
mechanism. You'll have to try again when more details about `?T` or
168+
`?U` are known.
169+
170+
## Region constraints
171+
172+
Regions are inferred somewhat differently from types. Rather than
173+
eagerly unifying things, we simply collect constraints as we go, but
174+
make (almost) no attempt to solve regions. These constraints have the
175+
form of an outlives constraint:
176+
177+
'a: 'b
178+
179+
Actually the code tends to view them as a subregion relation, but it's the same
180+
idea:
181+
182+
'b <= 'a
183+
184+
(There are various other kinds of constriants, such as "verifys"; see
185+
the `region_constraints` module for details.)
186+
187+
There is one case where we do some amount of eager unification. If you have an equality constraint
188+
between two regions
189+
190+
'a = 'b
191+
192+
we will record that fact in a unification table. You can then use
193+
`opportunistic_resolve_var` to convert `'b` to `'a` (or vice
194+
versa). This is sometimes needed to ensure termination of fixed-point
195+
algorithms.
196+
197+
## Extracting region constraints
198+
199+
Ultimately, region constraints are only solved at the very end of
200+
type-checking, once all other constraints are known. There are two
201+
ways to solve region constraints right now: lexical and
202+
non-lexical. Eventually there will only be one.
203+
204+
To solve **lexical** region constraints, you invoke
205+
`resolve_regions_and_report_errors`. This will "close" the region
206+
constraint process and invoke the `lexical_region_resolve` code. Once
207+
this is done, any further attempt to equate or create a subtyping
208+
relationship will yield an ICE.
209+
210+
Non-lexical region constraints are not handled within the inference
211+
context. Instead, the NLL solver (actually, the MIR type-checker)
212+
invokes `take_and_reset_region_constraints` periodically. This
213+
extracts all of the outlives constraints from the region solver, but
214+
leaves the set of variables intact. This is used to get *just* the
215+
region constraints that resulted from some particular point in the
216+
program, since the NLL solver needs to know not just *what* regions
217+
were subregions but *where*. Finally, the NLL solver invokes
218+
`take_region_var_origins`, which "closes" the region constraint
219+
process in the same way as normal solving.
220+
221+
## Lexical region resolution
222+
223+
Lexical region resolution is done by initially assigning each region
224+
variable to an empty value. We then process each outlives constraint
225+
repeatedly, growing region variables until a fixed-point is reached.
226+
Region variables can be grown using a least-upper-bound relation on
227+
the region lattice in a fairly straight-forward fashion.

0 commit comments

Comments
 (0)