|
1 | 1 | # Type inference
|
| 2 | + |
| 3 | +The type inference is based on standard HM-type inference, but |
| 4 | +extended in various way to accommodate subtyping, region inference, |
| 5 | +and higher-ranked types. |
| 6 | + |
| 7 | +## A note on terminology |
| 8 | + |
| 9 | +We use the notation `?T` to refer to inference variables, also called |
| 10 | +existential variables. |
| 11 | + |
| 12 | +We use the term "region" and "lifetime" interchangeably. Both refer to |
| 13 | +the `'a` in `&'a T`. |
| 14 | + |
| 15 | +The term "bound region" refers to regions bound in a function |
| 16 | +signature, such as the `'a` in `for<'a> fn(&'a u32)`. A region is |
| 17 | +"free" if it is not bound. |
| 18 | + |
| 19 | +## Creating an inference context |
| 20 | + |
| 21 | +You create and "enter" an inference context by doing something like |
| 22 | +the following: |
| 23 | + |
| 24 | +```rust |
| 25 | +tcx.infer_ctxt().enter(|infcx| { |
| 26 | + // use the inference context `infcx` in here |
| 27 | +}) |
| 28 | +``` |
| 29 | + |
| 30 | +Each inference context creates a short-lived type arena to store the |
| 31 | +fresh types and things that it will create, as described in |
| 32 | +[the README in the ty module][ty-readme]. This arena is created by the `enter` |
| 33 | +function and disposed after it returns. |
| 34 | + |
| 35 | +[ty-readme]: src/librustc/ty/README.md |
| 36 | + |
| 37 | +Within the closure, the infcx will have the type `InferCtxt<'cx, 'gcx, |
| 38 | +'tcx>` for some fresh `'cx` and `'tcx` -- the latter corresponds to |
| 39 | +the lifetime of this temporary arena, and the `'cx` is the lifetime of |
| 40 | +the `InferCtxt` itself. (Again, see [that ty README][ty-readme] for |
| 41 | +more details on this setup.) |
| 42 | + |
| 43 | +The `tcx.infer_ctxt` method actually returns a build, which means |
| 44 | +there are some kinds of configuration you can do before the `infcx` is |
| 45 | +created. See `InferCtxtBuilder` for more information. |
| 46 | + |
| 47 | +## Inference variables |
| 48 | + |
| 49 | +The main purpose of the inference context is to house a bunch of |
| 50 | +**inference variables** -- these represent types or regions whose precise |
| 51 | +value is not yet known, but will be uncovered as we perform type-checking. |
| 52 | + |
| 53 | +If you're familiar with the basic ideas of unification from H-M type |
| 54 | +systems, or logic languages like Prolog, this is the same concept. If |
| 55 | +you're not, you might want to read a tutorial on how H-M type |
| 56 | +inference works, or perhaps this blog post on |
| 57 | +[unification in the Chalk project]. |
| 58 | + |
| 59 | +[Unification in the Chalk project]: http://smallcultfollowing.com/babysteps/blog/2017/03/25/unification-in-chalk-part-1/ |
| 60 | + |
| 61 | +All told, the inference context stores four kinds of inference variables as of this |
| 62 | +writing: |
| 63 | + |
| 64 | +- Type variables, which come in three varieties: |
| 65 | + - General type variables (the most common). These can be unified with any type. |
| 66 | + - Integral type variables, which can only be unified with an integral type, and |
| 67 | + arise from an integer literal expression like `22`. |
| 68 | + - Float type variables, which can only be unified with a float type, and |
| 69 | + arise from a float literal expression like `22.0`. |
| 70 | +- Region variables, which represent lifetimes, and arise all over the dang place. |
| 71 | + |
| 72 | +All the type variables work in much the same way: you can create a new |
| 73 | +type variable, and what you get is `Ty<'tcx>` representing an |
| 74 | +unresolved type `?T`. Then later you can apply the various operations |
| 75 | +that the inferencer supports, such as equality or subtyping, and it |
| 76 | +will possibly **instantiate** (or **bind**) that `?T` to a specific |
| 77 | +value as a result. |
| 78 | + |
| 79 | +The region variables work somewhat differently, and are described |
| 80 | +below in a separate section. |
| 81 | + |
| 82 | +## Enforcing equality / subtyping |
| 83 | + |
| 84 | +The most basic operations you can perform in the type inferencer is |
| 85 | +**equality**, which forces two types `T` and `U` to be the same. The |
| 86 | +recommended way to add an equality constraint is using the `at` |
| 87 | +method, roughly like so: |
| 88 | + |
| 89 | +``` |
| 90 | +infcx.at(...).eq(t, u); |
| 91 | +``` |
| 92 | + |
| 93 | +The first `at()` call provides a bit of context, i.e., why you are |
| 94 | +doing this unification, and in what environment, and the `eq` method |
| 95 | +performs the actual equality constraint. |
| 96 | + |
| 97 | +When you equate things, you force them to be precisely equal. Equating |
| 98 | +returns a `InferResult` -- if it returns `Err(err)`, then equating |
| 99 | +failed, and the enclosing `TypeError` will tell you what went wrong. |
| 100 | + |
| 101 | +The success case is perhaps more interesting. The "primary" return |
| 102 | +type of `eq` is `()` -- that is, when it succeeds, it doesn't return a |
| 103 | +value of any particular interest. Rather, it is executed for its |
| 104 | +side-effects of constraining type variables and so forth. However, the |
| 105 | +actual return type is not `()`, but rather `InferOk<()>`. The |
| 106 | +`InferOk` type is used to carry extra trait obligations -- your job is |
| 107 | +to ensure that these are fulfilled (typically by enrolling them in a |
| 108 | +fulfillment context). See the [trait README] for more background here. |
| 109 | + |
| 110 | +[trait README]: ../traits/README.md |
| 111 | + |
| 112 | +You can also enforce subtyping through `infcx.at(..).sub(..)`. The same |
| 113 | +basic concepts apply as above. |
| 114 | + |
| 115 | +## "Trying" equality |
| 116 | + |
| 117 | +Sometimes you would like to know if it is *possible* to equate two |
| 118 | +types without error. You can test that with `infcx.can_eq` (or |
| 119 | +`infcx.can_sub` for subtyping). If this returns `Ok`, then equality |
| 120 | +is possible -- but in all cases, any side-effects are reversed. |
| 121 | + |
| 122 | +Be aware though that the success or failure of these methods is always |
| 123 | +**modulo regions**. That is, two types `&'a u32` and `&'b u32` will |
| 124 | +return `Ok` for `can_eq`, even if `'a != 'b`. This falls out from the |
| 125 | +"two-phase" nature of how we solve region constraints. |
| 126 | + |
| 127 | +## Snapshots |
| 128 | + |
| 129 | +As described in the previous section on `can_eq`, often it is useful |
| 130 | +to be able to do a series of operations and then roll back their |
| 131 | +side-effects. This is done for various reasons: one of them is to be |
| 132 | +able to backtrack, trying out multiple possibilities before settling |
| 133 | +on which path to take. Another is in order to ensure that a series of |
| 134 | +smaller changes take place atomically or not at all. |
| 135 | + |
| 136 | +To allow for this, the inference context supports a `snapshot` method. |
| 137 | +When you call it, it will start recording changes that occur from the |
| 138 | +operations you perform. When you are done, you can either invoke |
| 139 | +`rollback_to`, which will undo those changes, or else `confirm`, which |
| 140 | +will make the permanent. Snapshots can be nested as long as you follow |
| 141 | +a stack-like discipline. |
| 142 | + |
| 143 | +Rather than use snapshots directly, it is often helpful to use the |
| 144 | +methods like `commit_if_ok` or `probe` that encapsulate higher-level |
| 145 | +patterns. |
| 146 | + |
| 147 | +## Subtyping obligations |
| 148 | + |
| 149 | +One thing worth discussing are subtyping obligations. When you force |
| 150 | +two types to be a subtype, like `?T <: i32`, we can often convert those |
| 151 | +into equality constraints. This follows from Rust's rather limited notion |
| 152 | +of subtyping: so, in the above case, `?T <: i32` is equivalent to `?T = i32`. |
| 153 | + |
| 154 | +However, in some cases we have to be more careful. For example, when |
| 155 | +regions are involved. So if you have `?T <: &'a i32`, what we would do |
| 156 | +is to first "generalize" `&'a i32` into a type with a region variable: |
| 157 | +`&'?b i32`, and then unify `?T` with that (`?T = &'?b i32`). We then |
| 158 | +relate this new variable with the original bound: |
| 159 | + |
| 160 | + &'?b i32 <: &'a i32 |
| 161 | + |
| 162 | +This will result in a region constraint (see below) of `'?b: 'a`. |
| 163 | + |
| 164 | +One final interesting case is relating two unbound type variables, |
| 165 | +like `?T <: ?U`. In that case, we can't make progress, so we enqueue |
| 166 | +an obligation `Subtype(?T, ?U)` and return it via the `InferOk` |
| 167 | +mechanism. You'll have to try again when more details about `?T` or |
| 168 | +`?U` are known. |
| 169 | + |
| 170 | +## Region constraints |
| 171 | + |
| 172 | +Regions are inferred somewhat differently from types. Rather than |
| 173 | +eagerly unifying things, we simply collect constraints as we go, but |
| 174 | +make (almost) no attempt to solve regions. These constraints have the |
| 175 | +form of an outlives constraint: |
| 176 | + |
| 177 | + 'a: 'b |
| 178 | + |
| 179 | +Actually the code tends to view them as a subregion relation, but it's the same |
| 180 | +idea: |
| 181 | + |
| 182 | + 'b <= 'a |
| 183 | + |
| 184 | +(There are various other kinds of constriants, such as "verifys"; see |
| 185 | +the `region_constraints` module for details.) |
| 186 | + |
| 187 | +There is one case where we do some amount of eager unification. If you have an equality constraint |
| 188 | +between two regions |
| 189 | + |
| 190 | + 'a = 'b |
| 191 | + |
| 192 | +we will record that fact in a unification table. You can then use |
| 193 | +`opportunistic_resolve_var` to convert `'b` to `'a` (or vice |
| 194 | +versa). This is sometimes needed to ensure termination of fixed-point |
| 195 | +algorithms. |
| 196 | + |
| 197 | +## Extracting region constraints |
| 198 | + |
| 199 | +Ultimately, region constraints are only solved at the very end of |
| 200 | +type-checking, once all other constraints are known. There are two |
| 201 | +ways to solve region constraints right now: lexical and |
| 202 | +non-lexical. Eventually there will only be one. |
| 203 | + |
| 204 | +To solve **lexical** region constraints, you invoke |
| 205 | +`resolve_regions_and_report_errors`. This will "close" the region |
| 206 | +constraint process and invoke the `lexical_region_resolve` code. Once |
| 207 | +this is done, any further attempt to equate or create a subtyping |
| 208 | +relationship will yield an ICE. |
| 209 | + |
| 210 | +Non-lexical region constraints are not handled within the inference |
| 211 | +context. Instead, the NLL solver (actually, the MIR type-checker) |
| 212 | +invokes `take_and_reset_region_constraints` periodically. This |
| 213 | +extracts all of the outlives constraints from the region solver, but |
| 214 | +leaves the set of variables intact. This is used to get *just* the |
| 215 | +region constraints that resulted from some particular point in the |
| 216 | +program, since the NLL solver needs to know not just *what* regions |
| 217 | +were subregions but *where*. Finally, the NLL solver invokes |
| 218 | +`take_region_var_origins`, which "closes" the region constraint |
| 219 | +process in the same way as normal solving. |
| 220 | + |
| 221 | +## Lexical region resolution |
| 222 | + |
| 223 | +Lexical region resolution is done by initially assigning each region |
| 224 | +variable to an empty value. We then process each outlives constraint |
| 225 | +repeatedly, growing region variables until a fixed-point is reached. |
| 226 | +Region variables can be grown using a least-upper-bound relation on |
| 227 | +the region lattice in a fairly straight-forward fashion. |
0 commit comments