diff --git a/src/SUMMARY.md b/src/SUMMARY.md index 273b409ec..3d0180ede 100644 --- a/src/SUMMARY.md +++ b/src/SUMMARY.md @@ -76,6 +76,12 @@ - [Move paths](./borrow_check/moves_and_initialization/move_paths.md) - [MIR type checker](./borrow_check/type_check.md) - [Region inference](./borrow_check/region_inference.md) + - [Constraint propagation](./borrow_check/region_inference/constraint_propagation.md) + - [Lifetime parameters](./borrow_check/region_inference/lifetime_parameters.md) + - [Member constraints](./borrow_check/region_inference/member_constraints.md) + - [Placeholders and universes][pau] + - [Closure constraints](./borrow_check/region_inference/closure_constraints.md) + - [Errror reporting](./borrow_check/region_inference/error_reporting.md) - [Two-phase-borrows](./borrow_check/two_phase_borrows.md) - [Constant evaluation](./const-eval.md) - [miri const evaluator](./miri.md) @@ -92,3 +98,5 @@ [Appendix C: Glossary](./appendix/glossary.md) [Appendix D: Code Index](./appendix/code-index.md) [](./important-links.md) + +[pau]: ./borrow_check/region_inference/placeholders_and_universes.md diff --git a/src/appendix/glossary.md b/src/appendix/glossary.md index 3d77def91..007b1bb4c 100644 --- a/src/appendix/glossary.md +++ b/src/appendix/glossary.md @@ -50,6 +50,7 @@ newtype | a "newtype" is a wrapper around some other type (e.g. NLL | [non-lexical lifetimes](../borrow_check/region_inference.html), an extension to Rust's borrowing system to make it be based on the control-flow graph. node-id or NodeId | an index identifying a particular node in the AST or HIR; gradually being phased out and replaced with `HirId`. obligation | something that must be proven by the trait system ([see more](../traits/resolution.html)) +point | used in the NLL analysis to refer to some particular location in the MIR; typically used to refer to a node in the control-flow graph. projection | a general term for a "relative path", e.g. `x.f` is a "field projection", and `T::Item` is an ["associated type projection"](../traits/goals-and-clauses.html#trait-ref) promoted constants | constants extracted from a function and lifted to static scope; see [this section](../mir/index.html#promoted) for more details. provider | the function that executes a query ([see more](../query.html)) diff --git a/src/borrow_check/region_inference.md b/src/borrow_check/region_inference.md index 7fe50b870..46caab567 100644 --- a/src/borrow_check/region_inference.md +++ b/src/borrow_check/region_inference.md @@ -24,20 +24,19 @@ The MIR-based region analysis consists of two major functions: - [`compute_regions`], invoked second: this is given as argument the results of move analysis. It has the job of computing values for all the inference variables that `replace_regions_in_mir` introduced. - - To do that, it first runs the [MIR type checker]. This - is basically a normal type-checker but specialized to MIR, which - is much simpler than full Rust, of course. Running the MIR type - checker will however create **outlives constraints** between - region variables (e.g., that one variable must outlive another - one) to reflect the subtyping relationships that arise. - - It also adds **liveness constraints** that arise from where variables - are used. - - After this, we create a [`RegionInferenceContext`] with the constraints we - have computed and the inference variables we introduced and use the - [`solve`] method to infer values for all region inference varaibles. + - To do that, it first runs the [MIR type checker]. This is + basically a normal type-checker but specialized to MIR, which is + much simpler than full Rust, of course. Running the MIR type + checker will however create various [constraints][cp] between region + variables, indicating their potential values and relationships to + one another. + - After this, we perform [constraint propagation][cp] by creating a + [`RegionInferenceContext`] and invoking its [`solve`] + method. - The [NLL RFC] also includes fairly thorough (and hopefully readable) coverage. +[cp]: ./region_inference/constraint_propagation.html [fvb]: ../appendix/background.html#free-vs-bound [`replace_regions_in_mir`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/nll/fn.replace_regions_in_mir.html [`compute_regions`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/nll/fn.compute_regions.html @@ -48,7 +47,7 @@ The MIR-based region analysis consists of two major functions: ## Universal regions -The [`UnversalRegions`] type represents a collection of _universal_ regions +The [`UniversalRegions`] type represents a collection of _universal_ regions corresponding to some MIR `DefId`. It is constructed in [`replace_regions_in_mir`] when we replace all regions with fresh inference variables. [`UniversalRegions`] contains indices for all the free regions in @@ -71,6 +70,8 @@ TODO: write about _how_ these regions are computed. [`UniversalRegions`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/nll/universal_regions/struct.UniversalRegions.html + + ## Region variables The value of a region can be thought of as a **set**. This set contains all @@ -103,8 +104,10 @@ The kinds of region elements are as follows: ## Constraints -Before we can infer the value of regions, we need to collect constraints on the -regions. There are two primary types of constraints. +Before we can infer the value of regions, we need to collect +constraints on the regions. The full set of constraints is described +in [the section on constraint propagation][cp], but the two most +common sorts of constraints are: 1. Outlives constraints. These are constraints that one region outlives another (e.g. `'a: 'b`). Outlives constraints are generated by the [MIR type @@ -234,460 +237,3 @@ tests and universal regions, as discussed above. [`check_type_tests`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/nll/region_infer/struct.RegionInferenceContext.html#method.check_type_tests [`check_universal_regions`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/nll/region_infer/struct.RegionInferenceContext.html#method.check_universal_regions -## Closures - -When we are checking the type tests and universal regions, we may come across a -constraint that we can't prove yet if we are in a closure body! However, the -necessary constraints may actually hold (we just don't know it yet). Thus, if -we are inside a closure, we just collect all the constraints we can't prove yet -and return them. Later, when we are borrow check the MIR node that created the -closure, we can also check that these constraints hold. At that time, if we -can't prove they hold, we report an error. - -## Placeholders and universes - -(This section describes ongoing work that hasn't landed yet.) - -From time to time we have to reason about regions that we can't -concretely know. For example, consider this program: - -```rust,ignore -// A function that needs a static reference -fn foo(x: &'static u32) { } - -fn bar(f: for<'a> fn(&'a u32)) { - // ^^^^^^^^^^^^^^^^^^^ a function that can accept **any** reference - let x = 22; - f(&x); -} - -fn main() { - bar(foo); -} -``` - -This program ought not to type-check: `foo` needs a static reference -for its argument, and `bar` wants to be given a function that that -accepts **any** reference (so it can call it with something on its -stack, for example). But *how* do we reject it and *why*? - -### Subtyping and Placeholders - -When we type-check `main`, and in particular the call `bar(foo)`, we -are going to wind up with a subtyping relationship like this one: - -```text -fn(&'static u32) <: for<'a> fn(&'a u32) ----------------- ------------------- -the type of `foo` the type `bar` expects -``` - -We handle this sort of subtyping by taking the variables that are -bound in the supertype and replacing them with -[universally quantified](../appendix/background.html#quantified) -representatives, written like `!1`. We call these regions "placeholder -regions" – they represent, basically, "some unknown region". - -Once we've done that replacement, we have the following relation: - -```text -fn(&'static u32) <: fn(&'!1 u32) -``` - -The key idea here is that this unknown region `'!1` is not related to -any other regions. So if we can prove that the subtyping relationship -is true for `'!1`, then it ought to be true for any region, which is -what we wanted. - -So let's work through what happens next. To check if two functions are -subtypes, we check if their arguments have the desired relationship -(fn arguments are [contravariant](../appendix/background.html#variance), so -we swap the left and right here): - -```text -&'!1 u32 <: &'static u32 -``` - -According to the basic subtyping rules for a reference, this will be -true if `'!1: 'static`. That is – if "some unknown region `!1`" lives -outlives `'static`. Now, this *might* be true – after all, `'!1` -could be `'static` – but we don't *know* that it's true. So this -should yield up an error (eventually). - -### What is a universe - -In the previous section, we introduced the idea of a placeholder -region, and we denoted it `!1`. We call this number `1` the **universe -index**. The idea of a "universe" is that it is a set of names that -are in scope within some type or at some point. Universes are formed -into a tree, where each child extends its parents with some new names. -So the **root universe** conceptually contains global names, such as -the the lifetime `'static` or the type `i32`. In the compiler, we also -put generic type parameters into this root universe (in this sense, -there is not just one root universe, but one per item). So consider -this function `bar`: - -```rust,ignore -struct Foo { } - -fn bar<'a, T>(t: &'a T) { - ... -} -``` - -Here, the root universe would consist of the lifetimes `'static` and -`'a`. In fact, although we're focused on lifetimes here, we can apply -the same concept to types, in which case the types `Foo` and `T` would -be in the root universe (along with other global types, like `i32`). -Basically, the root universe contains all the names that -[appear free](../appendix/background.html#free-vs-bound) in the body of `bar`. - -Now let's extend `bar` a bit by adding a variable `x`: - -```rust,ignore -fn bar<'a, T>(t: &'a T) { - let x: for<'b> fn(&'b u32) = ...; -} -``` - -Here, the name `'b` is not part of the root universe. Instead, when we -"enter" into this `for<'b>` (e.g., by replacing it with a placeholder), we will create -a child universe of the root, let's call it U1: - -```text -U0 (root universe) -│ -└─ U1 (child universe) -``` - -The idea is that this child universe U1 extends the root universe U0 -with a new name, which we are identifying by its universe number: -`!1`. - -Now let's extend `bar` a bit by adding one more variable, `y`: - -```rust,ignore -fn bar<'a, T>(t: &'a T) { - let x: for<'b> fn(&'b u32) = ...; - let y: for<'c> fn(&'b u32) = ...; -} -``` - -When we enter *this* type, we will again create a new universe, which -we'll call `U2`. Its parent will be the root universe, and U1 will be -its sibling: - -```text -U0 (root universe) -│ -├─ U1 (child universe) -│ -└─ U2 (child universe) -``` - -This implies that, while in U2, we can name things from U0 or U2, but -not U1. - -**Giving existential variables a universe.** Now that we have this -notion of universes, we can use it to extend our type-checker and -things to prevent illegal names from leaking out. The idea is that we -give each inference (existential) variable – whether it be a type or -a lifetime – a universe. That variable's value can then only -reference names visible from that universe. So for example is a -lifetime variable is created in U0, then it cannot be assigned a value -of `!1` or `!2`, because those names are not visible from the universe -U0. - -**Representing universes with just a counter.** You might be surprised -to see that the compiler doesn't keep track of a full tree of -universes. Instead, it just keeps a counter – and, to determine if -one universe can see another one, it just checks if the index is -greater. For example, U2 can see U0 because 2 >= 0. But U0 cannot see -U2, because 0 >= 2 is false. - -How can we get away with this? Doesn't this mean that we would allow -U2 to also see U1? The answer is that, yes, we would, **if that -question ever arose**. But because of the structure of our type -checker etc, there is no way for that to happen. In order for -something happening in the universe U1 to "communicate" with something -happening in U2, they would have to have a shared inference variable X -in common. And because everything in U1 is scoped to just U1 and its -children, that inference variable X would have to be in U0. And since -X is in U0, it cannot name anything from U1 (or U2). This is perhaps easiest -to see by using a kind of generic "logic" example: - -```text -exists { - forall { ... /* Y is in U1 ... */ } - forall { ... /* Z is in U2 ... */ } -} -``` - -Here, the only way for the two foralls to interact would be through X, -but neither Y nor Z are in scope when X is declared, so its value -cannot reference either of them. - -### Universes and placeholder region elements - -But where does that error come from? The way it happens is like this. -When we are constructing the region inference context, we can tell -from the type inference context how many placeholder variables exist -(the `InferCtxt` has an internal counter). For each of those, we -create a corresponding universal region variable `!n` and a "region -element" `placeholder(n)`. This corresponds to "some unknown set of other -elements". The value of `!n` is `{placeholder(n)}`. - -At the same time, we also give each existential variable a -**universe** (also taken from the `InferCtxt`). This universe -determines which placeholder elements may appear in its value: For -example, a variable in universe U3 may name `placeholder(1)`, `placeholder(2)`, and -`placeholder(3)`, but not `placeholder(4)`. Note that the universe of an inference -variable controls what region elements **can** appear in its value; it -does not say region elements **will** appear. - -### Placeholders and outlives constraints - -In the region inference engine, outlives constraints have the form: - -```text -V1: V2 @ P -``` - -where `V1` and `V2` are region indices, and hence map to some region -variable (which may be universally or existentially quantified). The -`P` here is a "point" in the control-flow graph; it's not important -for this section. This variable will have a universe, so let's call -those universes `U(V1)` and `U(V2)` respectively. (Actually, the only -one we are going to care about is `U(V1)`.) - -When we encounter this constraint, the ordinary procedure is to start -a DFS from `P`. We keep walking so long as the nodes we are walking -are present in `value(V2)` and we add those nodes to `value(V1)`. If -we reach a return point, we add in any `end(X)` elements. That part -remains unchanged. - -But then *after that* we want to iterate over the placeholder `placeholder(x)` -elements in V2 (each of those must be visible to `U(V2)`, but we -should be able to just assume that is true, we don't have to check -it). We have to ensure that `value(V1)` outlives each of those -placeholder elements. - -Now there are two ways that could happen. First, if `U(V1)` can see -the universe `x` (i.e., `x <= U(V1)`), then we can just add `placeholder(x)` -to `value(V1)` and be done. But if not, then we have to approximate: -we may not know what set of elements `placeholder(x)` represents, but we -should be able to compute some sort of **upper bound** B for it – -some region B that outlives `placeholder(x)`. For now, we'll just use -`'static` for that (since it outlives everything) – in the future, we -can sometimes be smarter here (and in fact we have code for doing this -already in other contexts). Moreover, since `'static` is in the root -universe U0, we know that all variables can see it – so basically if -we find that `value(V2)` contains `placeholder(x)` for some universe `x` -that `V1` can't see, then we force `V1` to `'static`. - -### Extending the "universal regions" check - -After all constraints have been propagated, the NLL region inference -has one final check, where it goes over the values that wound up being -computed for each universal region and checks that they did not get -'too large'. In our case, we will go through each placeholder region -and check that it contains *only* the `placeholder(u)` element it is known to -outlive. (Later, we might be able to know that there are relationships -between two placeholder regions and take those into account, as we do -for universal regions from the fn signature.) - -Put another way, the "universal regions" check can be considered to be -checking constraints like: - -```text -{placeholder(1)}: V1 -``` - -where `{placeholder(1)}` is like a constant set, and V1 is the variable we -made to represent the `!1` region. - -## Back to our example - -OK, so far so good. Now let's walk through what would happen with our -first example: - -```text -fn(&'static u32) <: fn(&'!1 u32) @ P // this point P is not imp't here -``` - -The region inference engine will create a region element domain like this: - -```text -{ CFG; end('static); placeholder(1) } - --- ------------ ------- from the universe `!1` - | 'static is always in scope - all points in the CFG; not especially relevant here -``` - -It will always create two universal variables, one representing -`'static` and one representing `'!1`. Let's call them Vs and V1. They -will have initial values like so: - -```text -Vs = { CFG; end('static) } // it is in U0, so can't name anything else -V1 = { placeholder(1) } -``` - -From the subtyping constraint above, we would have an outlives constraint like - -```text -'!1: 'static @ P -``` - -To process this, we would grow the value of V1 to include all of Vs: - -```text -Vs = { CFG; end('static) } -V1 = { CFG; end('static), placeholder(1) } -``` - -At that point, constraint propagation is complete, because all the -outlives relationships are satisfied. Then we would go to the "check -universal regions" portion of the code, which would test that no -universal region grew too large. - -In this case, `V1` *did* grow too large – it is not known to outlive -`end('static)`, nor any of the CFG – so we would report an error. - -## Another example - -What about this subtyping relationship? - -```text -for<'a> fn(&'a u32, &'a u32) - <: -for<'b, 'c> fn(&'b u32, &'c u32) -``` - -Here we would replace the bound region in the supertype with a placeholder, as before, yielding: - -```text -for<'a> fn(&'a u32, &'a u32) - <: -fn(&'!1 u32, &'!2 u32) -``` - -then we instantiate the variable on the left-hand side with an -existential in universe U2, yielding the following (`?n` is a notation -for an existential variable): - -```text -fn(&'?3 u32, &'?3 u32) - <: -fn(&'!1 u32, &'!2 u32) -``` - -Then we break this down further: - -```text -&'!1 u32 <: &'?3 u32 -&'!2 u32 <: &'?3 u32 -``` - -and even further, yield up our region constraints: - -```text -'!1: '?3 -'!2: '?3 -``` - -Note that, in this case, both `'!1` and `'!2` have to outlive the -variable `'?3`, but the variable `'?3` is not forced to outlive -anything else. Therefore, it simply starts and ends as the empty set -of elements, and hence the type-check succeeds here. - -(This should surprise you a little. It surprised me when I first realized it. -We are saying that if we are a fn that **needs both of its arguments to have -the same region**, we can accept being called with **arguments with two -distinct regions**. That seems intuitively unsound. But in fact, it's fine, as -I tried to explain in [this issue][ohdeargoditsallbroken] on the Rust issue -tracker long ago. The reason is that even if we get called with arguments of -two distinct lifetimes, those two lifetimes have some intersection (the call -itself), and that intersection can be our value of `'a` that we use as the -common lifetime of our arguments. -nmatsakis) - -[ohdeargoditsallbroken]: https://github.com/rust-lang/rust/issues/32330#issuecomment-202536977 - -## Final example - -Let's look at one last example. We'll extend the previous one to have -a return type: - -```text -for<'a> fn(&'a u32, &'a u32) -> &'a u32 - <: -for<'b, 'c> fn(&'b u32, &'c u32) -> &'b u32 -``` - -Despite seeming very similar to the previous example, this case is going to get -an error. That's good: the problem is that we've gone from a fn that promises -to return one of its two arguments, to a fn that is promising to return the -first one. That is unsound. Let's see how it plays out. - -First, we replace the bound region in the supertype with a placeholder: - -```text -for<'a> fn(&'a u32, &'a u32) -> &'a u32 - <: -fn(&'!1 u32, &'!2 u32) -> &'!1 u32 -``` - -Then we instantiate the subtype with existentials (in U2): - -```text -fn(&'?3 u32, &'?3 u32) -> &'?3 u32 - <: -fn(&'!1 u32, &'!2 u32) -> &'!1 u32 -``` - -And now we create the subtyping relationships: - -```text -&'!1 u32 <: &'?3 u32 // arg 1 -&'!2 u32 <: &'?3 u32 // arg 2 -&'?3 u32 <: &'!1 u32 // return type -``` - -And finally the outlives relationships. Here, let V1, V2, and V3 be the -variables we assign to `!1`, `!2`, and `?3` respectively: - -```text -V1: V3 -V2: V3 -V3: V1 -``` - -Those variables will have these initial values: - -```text -V1 in U1 = {placeholder(1)} -V2 in U2 = {placeholder(2)} -V3 in U2 = {} -``` - -Now because of the `V3: V1` constraint, we have to add `placeholder(1)` into `V3` (and -indeed it is visible from `V3`), so we get: - -```text -V3 in U2 = {placeholder(1)} -``` - -then we have this constraint `V2: V3`, so we wind up having to enlarge -`V2` to include `placeholder(1)` (which it can also see): - -```text -V2 in U2 = {placeholder(1), placeholder(2)} -``` - -Now constraint propagation is done, but when we check the outlives -relationships, we find that `V2` includes this new element `placeholder(1)`, -so we report an error. - -## Borrow Checker Errors - -TODO: we should discuss how to generate errors from the results of these analyses. diff --git a/src/borrow_check/region_inference/closure_constraints.md b/src/borrow_check/region_inference/closure_constraints.md new file mode 100644 index 000000000..13230d037 --- /dev/null +++ b/src/borrow_check/region_inference/closure_constraints.md @@ -0,0 +1,10 @@ +# Propagating closure constraints + +When we are checking the type tests and universal regions, we may come +across a constraint that we can't prove yet if we are in a closure +body! However, the necessary constraints may actually hold (we just +don't know it yet). Thus, if we are inside a closure, we just collect +all the constraints we can't prove yet and return them. Later, when we +are borrow check the MIR node that created the closure, we can also +check that these constraints hold. At that time, if we can't prove +they hold, we report an error. diff --git a/src/borrow_check/region_inference/constraint_propagation.md b/src/borrow_check/region_inference/constraint_propagation.md new file mode 100644 index 000000000..7b9eeb13d --- /dev/null +++ b/src/borrow_check/region_inference/constraint_propagation.md @@ -0,0 +1,224 @@ +# Constraint propagation + +The main work of the region inference is **constraint propagation**, +which is done in the [`propagate_constraints`] function. There are +three sorts of constraints that are used in NLL, and we'll explain how +`propagate_constraints` works by "layering" those sorts of constraints +on one at a time (each of them is fairly independent from the others): + +- liveness constraints (`R live at E`), which arise from liveness; +- outlives constraints (`R1: R2`), which arise from subtyping; +- [member constraints][m_c] (`member R_m of [R_c...]`), which arise from impl Trait. + +[`propagate_constraints`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/nll/region_infer/struct.RegionInferenceContext.html#method.propagate_constraints +[m_c]: ./member_constraints.html + +In this chapter, we'll explain the "heart" of constraint propagation, +covering both liveness and outlives constraints. + +## Notation and high-level concepts + +Conceptually, region inference is a "fixed-point" computation. It is +given some set of constraints `{C}` and it computes a set of values +`Values: R -> {E}` that maps each region `R` to a set of elements +`{E}` (see [here][riv] for more notes on region elements): + +- Initially, each region is mapped to an empty set, so `Values(R) = + {}` for all regions `R`. +- Next, we process the constraints repeatedly until a fixed-point is reached: + - For each constraint C: + - Update `Values` as needed to satisfy the constraint + +[riv]: ../region-inference.html#region-variables + +As a simple example, if we have a liveness constraint `R live at E`, +then we can apply `Values(R) = Values(R) union {E}` to make the +constraint be satisfied. Similarly, if we have an outlives constraints +`R1: R2`, we can apply `Values(R1) = Values(R1) union Values(R2)`. +(Member constraints are more complex and we discuss them [in this section][m_c].) + +In practice, however, we are a bit more clever. Instead of applying +the constraints in a loop, we can analyze the constraints and figure +out the correct order to apply them, so that we only have to apply +each constraint once in order to find the final result. + +Similarly, in the implementation, the `Values` set is stored in the +`scc_values` field, but they are indexed not by a *region* but by a +*strongly connected component* (SCC). SCCs are an optimization that +avoids a lot of redundant storage and computation. They are explained +in the section on outlives constraints. + +## Liveness constraints + +A **liveness constraint** arises when some variable whose type +includes a region R is live at some [point] P. This simply means that +the value of R must include the point P. Liveness constraints are +computed by the MIR type checker. + +[point]: ../../appendix/glossary.html + +A liveness constraint `R live at E` is satisfied if `E` is a member of +`Values(R)`. So to "apply" such a constraint to `Values`, we just have +to compute `Values(R) = Values(R) union {E}`. + +The liveness values are computed in the type-check and passed to the +region inference upon creation in the `liveness_constraints` argument. +These are not represented as individual constraints like `R live at E` +though; instead, we store a (sparse) bitset per region variable (of +type [`LivenessValues`]). This way we only need a single bit for each +liveness constraint. + +[`liveness_constraints`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/nll/region_infer/struct.RegionInferenceContext.html#structfield.liveness_constraints +[`LivenessValues`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/nll/region_infer/values/struct.LivenessValues.html + +One thing that is worth mentioning: All lifetime parameters are always +considered to be live over the entire function body. This is because +they correspond to some portion of the *caller's* execution, and that +execution clearly includes the time spent in this function, since the +caller is waiting for us to return. + +## Outlives constraints + +An outlives constraint `'a: 'b` indicates that the value of `'a` must +be a **superset** of the value of `'b`. That is, an outlives +constraint `R1: R2` is satisfied if `Values(R1)` is a superset of +`Values(R2)`. So to "apply" such a constraint to `Values`, we just +have to compute `Values(R1) = Values(R1) union Values(R2)`. + +One observation that follows from this is that if you have `R1: R2` +and `R2: R1`, then `R1 = R2` must be true. Similarly, if you have: + +``` +R1: R2 +R2: R3 +R3: R4 +R4: R1 +``` + +then `R1 = R2 = R3 = R4` follows. We take advantage of this to make things +much faster, as described shortly. + +In the code, the set of outlives constraints is given to the region +inference context on creation in a parameter of type +[`ConstraintSet`]. The constraint set is basically just a list of `'a: +'b` constraints. + +### The outlives constraint graph and SCCs + +In order to work more efficiently with outlives constraints, they are +[converted into the form of a graph][graph-fn], where the nodes of the +graph are region variables (`'a`, `'b`) and each constraint `'a: 'b` +induces an edge `'a -> 'b`. This conversion happens in the +[`RegionInferenceContext::new`] function that creates the inference +context. + +[`ConstraintSet`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/nll/constraints/struct.ConstraintSet.html +[graph-fn]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/nll/constraints/struct.ConstraintSet.html#method.graph +[`RegionInferenceContext::new`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/nll/region_infer/struct.RegionInferenceContext.html#method.new + +When using a graph representation, we can detect regions that must be equal +by looking for cycles. That is, if you have a constraint like + +``` +'a: 'b +'b: 'c +'c: 'd +'d: 'a +``` + +then this will correspond to a cycle in the graph containing the +elements `'a...'d`. + +Therefore, one of the first things that we do in propagating region +values is to compute the **strongly connected components** (SCCs) in +the constraint graph. The result is stored in the [`constraint_sccs`] +field. You can then easily find the SCC that a region `r` is a part of +by invoking `constraint_sccs.scc(r)`. + +Working in terms of SCCs allows us to be more efficient: if we have a +set of regions `'a...'d` that are part of a single SCC, we don't have +to compute/store their values separarely. We can just store one value +**for the SCC**, since they must all be equal. + +If you look over the region inference code, you will see that a number +of fields are defined in terms of SCCs. For example, the +[`scc_values`] field stores the values of each SCC. To get the value +of a specific region `'a` then, we first figure out the SCC that the +region is a part of, and then find the value of that SCC. + +[`constraint_sccs`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/nll/region_infer/struct.RegionInferenceContext.html#structfield.constraint_sccs +[`scc_values`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/nll/region_infer/struct.RegionInferenceContext.html#structfield.scc_values + +When we compute SCCs, we not only figure out which regions are a +member of each SCC, we also figure out the edges between them. So for example +consider this set of outlives constraints: + +``` +'a: 'b +'b: 'a + +'a: 'c + +'c: 'd +'d: 'c +``` + +Here we have two SCCs: S0 contains `'a` and `'b`, and S1 contains `'c` +and `'d`. But these SCCs are not independent: because `'a: 'c`, that +means that `S0: S1` as well. That is -- the value of `S0` must be a +superset of the value of `S1`. One crucial thing is that this graph of +SCCs is always a DAG -- that is, it never has cycles. This is because +all the cycles have been removed to form the SCCs themselves. + +### Applying liveness constraints to SCCs + +The liveness constraints that come in from the type-checker are +expressed in terms of regions -- that is, we have a map like +`Liveness: R -> {E}`. But we want our final result to be expressed +in terms of SCCs -- we can integrate these liveness constraints very +easily just by taking the union: + +``` +for each region R: + let S be the SCC that contains R + Values(S) = Values(S) union Liveness(R) +``` + +In the region inferencer, this step is done in [`RegionInferenceContext::new`]. + +### Applying outlives constraints + +Once we have computed the DAG of SCCs, we use that to structure out +entire computation. If we have an edge `S1 -> S2` between two SCCs, +that means that `Values(S1) >= Values(S2)` must hold. So, to compute +the value of `S1`, we first compute the values of each successor `S2`. +Then we simply union all of those values together. To use a +quasi-iterator-like notation: + +``` +Values(S1) = + s1.successors() + .map(|s2| Values(s2)) + .union() +``` + +In the code, this work starts in the [`propagate_constraints`] +function, which iterates over all the SCCs. For each SCC `S1`, we +compute its value by first computing the value of its +successors. Since SCCs form a DAG, we don't have to be concerned about +cycles, though we do need to keep a set around to track whether we +have already processed a given SCC or not. For each successor `S2`, once +we have computed `S2`'s value, we can union those elements into the +value for `S1`. (Although we have to be careful in this process to +properly handle [higher-ranked +placeholders](./placeholders_and_universes.html). Note that the value +for `S1` already contains the liveness constraints, since they were +added in [`RegionInferenceContext::new`]. + +Once that process is done, we now have the "minimal value" for `S1`, +taking into account all of the liveness and outlives +constraints. However, in order to complete the process, we must also +consider [member constraints][m_c], which are described in [a later +section][m_c]. + + diff --git a/src/borrow_check/region_inference/error_reporting.md b/src/borrow_check/region_inference/error_reporting.md new file mode 100644 index 000000000..79b3e077c --- /dev/null +++ b/src/borrow_check/region_inference/error_reporting.md @@ -0,0 +1,3 @@ +# Reporting region errors + +TODO: we should discuss how to generate errors from the results of these analyses. diff --git a/src/borrow_check/region_inference/lifetime_parameters.md b/src/borrow_check/region_inference/lifetime_parameters.md new file mode 100644 index 000000000..ecd3a19a5 --- /dev/null +++ b/src/borrow_check/region_inference/lifetime_parameters.md @@ -0,0 +1,125 @@ +# Universal regions + +"Universal regions" is the name that the code uses to refer to "named +lifetimes" -- e.g., lifetime parameters and `'static`. The name +derives from the fact that such lifetimes are "universally quantified" +(i.e., we must make sure the code is true for all values of those +lifetimes). It is worth spending a bit of discussing how lifetime +parameters are handled during region inference. Consider this example: + +```rust +fn foo<'a, 'b>(x: &'a u32, y: &'b u32) -> &'b u32 { + x +} +``` + +This example is intended not to compile, because we are returning `x`, +which has type `&'a u32`, but our signature promises that we will +return a `&'b u32` value. But how are lifetimes like `'a` and `'b` +integrated into region inference, and how this error wind up being +detected? + +## Universal regions and their relationships to one another + +Early on in region inference, one of the first things we do is to +construct a [`UniversalRegions`] struct. This struct tracks the +various universal regions in scope on a particular function. We also +create a [`UniversalRegionRelations`] struct, which tracks their +relationships to one another. So if you have e.g. `where 'a: 'b`, then +the [`UniversalRegionRelations`] struct would track that `'a: 'b` is +known to hold (which could be tested with the [`outlives`] function. + +[`UniversalRegions`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/nll/universal_regions/struct.UniversalRegions.html +[`UniversalRegionRelations`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/nll/type_check/free_region_relations/struct.UniversalRegionRelations.html +[`outlives`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/nll/type_check/free_region_relations/struct.UniversalRegionRelations.html#method.outlives + +## Everything is a region variable + +One important aspect of how NLL region inference works is that **all +lifetimes** are represented as numbered variables. This means that the +only variant of [`ty::RegionKind`] that we use is the [`ReVar`] +variant. These region variables are broken into two major categories, +based on their index: + +[`ty::RegionKind`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/enum.RegionKind.html +[`ReVar`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/enum.RegionKind.html#variant.ReVar + +- 0..N: universal regions -- the ones we are discussing here. In this + case, the code must be correct with respect to any value of those + variables that meets the declared relationships. +- N..M: existential regions -- inference variables where the region + inferencer is tasked with finding *some* suitable value. + +In fact, the universal regions can be further subdivided based on +where they were brought into scope (see the [`RegionClassification`] +type). These subdivions are not important for the topics discussed +here, but become important when we consider [closure constraint +propagation](./closure_constraints.html), so we discuss them there. + +[`RegionClassification`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/nll/universal_regions/enum.RegionClassification.html#variant.Local + +## Universal lifetimes as the elements of a region's value + +As noted previously, the value that we infer for each region is a set +`{E}`. The elements of this set can be points in the control-flow +graph, but they can also be an element `end('a)` corresponding to each +universal lifetime `'a`. If the value for some region `R0` includes +`end('a`), then this implies that `R0` must extend until the end of `'a` +in the caller. + +## The "value" of a universal region + +During region inference, we compute a value for each universal region +in the same way as we compute values for other regions. This value +represents, effectively, the **lower bound** on that universal region +-- the things that it must outlive. We now describe how we use this +value to check for errors. + +## Liveness and universal regions + +All universal regions have an initial liveness constraint that +includes the entire function body. This is because lifetime parameters +are defined in the caller and must include the entirety of the +function call that invokes this particular function. In addition, each +universal region `'a` includes itself (that is, `end('a)`) in its +liveness constraint (i.e., `'a` must extend until the end of +itself). In the code, these liveness constraints are setup in +[`init_free_and_bound_regions`]. + +[`init_free_and_bound_regions`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/nll/region_infer/struct.RegionInferenceContext.html#method.init_free_and_bound_regions + +## Propagating outlives constraints for universal regions + +So, consider the first example of this section: + +```rust +fn foo<'a, 'b>(x: &'a u32, y: &'b u32) -> &'b u32 { + x +} +``` + +Here, returning `x` requires that `&'a u32 <: &'b u32`, which gives +rise to an outlives constraint `'a: 'b`. Combined with our default liveness +constraints we get: + +``` +'a live at {B, end('a)} // B represents the "function body" +'b live at {B, end('b)} +'a: 'b +``` + +When we process the `'a: 'b` constraint, therefore, we will add +`end('b)` into the value for `'a`, resulting in a final value of `{B, +end('a), end('b)}`. + +## Detecting errors + +Once we have finished constraint propagation, we then enforce a +constraint that if some universal region `'a` includes an element +`end('b)`, then `'a: 'b` must be declared in the function's bounds. If +not, as in our example, that is an error. This check is done in the +[`check_universal_regions`] function, which simply iterates over all +universal regions, inspects their final value, and tests against the +declared [`UniversalRegionRelations`]. + +[`check_universal_regions`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/nll/region_infer/struct.RegionInferenceContext.html#method.check_universal_regions diff --git a/src/borrow_check/region_inference/member_constraints.md b/src/borrow_check/region_inference/member_constraints.md new file mode 100644 index 000000000..3f934e864 --- /dev/null +++ b/src/borrow_check/region_inference/member_constraints.md @@ -0,0 +1,193 @@ +# Member constraints + +A member constraint `'m member of ['c_1..'c_N]` expresses that the +region `'m` must be *equal* to some **choice regions** `'c_i` (for +some `i`). These constraints cannot be expressed by users, but they +arise from `impl Trait` due to its lifetime capture rules. Consider a +function such as the following: + +```rust +fn make(a: &'a u32, b: &'b u32) -> impl Trait<'a, 'b> { .. } +``` + +Here, the true return type (often called the "hidden type") is only +permitted to capture the lifetimes `'a` or `'b`. You can kind of see +this more clearly by desugaring that `impl Trait` return type into its +more explicit form: + +```rust +type MakeReturn<'x, 'y> = impl Trait<'x, 'y>; +fn make(a: &'a u32, b: &'b u32) -> MakeReturn<'a, 'b> { .. } +``` + +Here, the idea is that the hidden type must be some type that could +have been written in place of the `impl Trait<'x, 'y>` -- but clearly +such a type can only reference the regions `'x` or `'y` (or +`'static`!), as those are the only names in scope. This limitation is +then translated into a restriction to only access `'a` or `'b` because +we are returning `MakeReturn<'a, 'b>`, where `'x` and `'y` have been +replaced with `'a` and `'b` respectively. + +## Detailed example + +To help us explain member constraints in more detail, let's spell out +the `make` example in a bit more detail. First off, let's assume that +you have some dummy trait: + +```rust +trait Trait<'a, 'b> { } +impl Trait<'_, '_> for T { } +``` + +and this is the `make` function (in desugared form): + +```rust +type MakeReturn<'x, 'y> = impl Trait<'x, 'y>; +fn make(a: &'a u32, b: &'b u32) -> MakeReturn<'a, 'b> { + (a, b) +} +``` + +What happens in this case is that the return type will be `(&'0 u32, &'1 u32)`, +where `'0` and `'1` are fresh region variables. We will have the following +region constraints: + +``` +'0 live at {L} +'1 live at {L} +'a: '0 +'b: '1 +'0 member of ['a, 'b, 'static] +'1 member of ['a, 'b, 'static] +``` + +Here the "liveness set" `{L}` corresponds to that subset of the body +where `'0` and `'1` are live -- basically the point from where the +return tuple is constructed to where it is returned (in fact, `'0` and +`'1` might have slightly different liveness sets, but that's not very +interesting to the point we are illustrating here). + +The `'a: '0` and `'b: '1` constraints arise from subtyping. When we +construct the `(a, b)` value, it will be assigned type `(&'0 u32, &'1 +u32)` -- the region variables reflect that the lifetimes of these +references could be made smaller. For this value to be created from +`a` and `b`, however, we do require that: + +``` +(&'a u32, &'b u32) <: (&'0 u32, &'1 u32) +``` + +which means in turn that `&'a u32 <: &'0 u32` and hence that `'a: '0` +(and similarly that `&'b u32 <: &'1 u32`, `'b: '1`). + +Note that if we ignore member constraints, the value of `'0` would be +inferred to some subset of the function body (from the liveness +constraints, which we did not write explicitly). It would never become +`'a`, because there is no need for it too -- we have a constraint that +`'a: '0`, but that just puts a "cap" on how *large* `'0` can grow to +become. Since we compute the *minimal* value that we can, we are happy +to leave `'0` as being just equal to the liveness set. This is where +member constraints come in. + +## Choices are always lifetime parameters + +At present, the "choice" regions from a member constraint are always +lifetime parameters from the current function. This falls out from the +placement of impl Trait, though in the future it may not be the case. +We take some advantage of this fact, as it simplifies the current +code. In particular, we don't have to consider a case like `'0 member +of ['1, 'static]`, in which the value of both `'0` and `'1` are being +inferred and hence changing. See [rust-lang/rust#61773] for more +information. + +[#61773]: https://github.com/rust-lang/rust/issues/61773 + +## Applying member constraints + +Member constraints are a bit more complex than other forms of +constraints. This is because they have a "or" quality to them -- that +is, they describe multiple choices that we must select from. E.g., in +our example constraint `'0 member of ['a, 'b, 'static]`, it might be +that `'0` is equal to `'a`, `'b`, *or* `'static`. How can we pick the +correct one? What we currently do is to look for a *minimal choice* +-- if we find one, then we will grow `'0` to be equal to that minimal +choice. To find that minimal choice, we take two factors into +consideration: lower and upper bounds. + +### Lower bounds + +The *lower bounds* are those lifetimes that `'0` *must outlive* -- +i.e., that `'0` must be larger than. In fact, when it comes time to +apply member constraints, we've already *computed* the lower bounds of +`'0` because we computed its minimal value (or at least, the lower +bounds considering everything but member constraints). + +Let `LB` be the current value of `'0`. We know then that `'0: LB` must +hold, whatever the final value of `'0` is. Therefore, we can rule out +any choice `'choice` where `'choice: LB` does not hold. + +Unfortunately, in our example, this is not very helpful. The lower +bound for `'0` will just be the liveness set `{L}`, and we know that +all the lifetime parameters outlive that set. So we are left with the +same set of choices here. (But in other examples, particularly those +with different variance, lower bound constraints may be relevant.) + +### Upper bounds + +The *upper bounds* are those lifetimes that *must outlive* `'0` -- +i.e., that `'0` must be *smaller* than. In our example, this would be +`'a`, because we have the constraint that `'a: '0`. In more complex +examples, the chain may be more indirect. + +We can use upper bounds to rule out members in a very similar way to +lower lower bounds. If UB is some upper bound, then we know that `UB: +'0` must hold, so we can rule out any choice `'choice` where `UB: +'choice` does not hold. + +In our example, we would be able to reduce our choice set from `['a, +'b, 'static]` to just `['a]`. This is because `'0` has an upper bound +of `'a`, and neither `'a: 'b` nor `'a: 'static` is known to hold. + +(For notes on how we collect upper bounds in the implementation, see +[the section below](#collecting).) + +### Minimal choice + +After applying lower and upper bounds, we can still sometimes have +multiple possibilities. For example, imagine a variant of our example +using types with the opposite variance. In that case, we would have +the constraint `'0: 'a` instead of `'a: '0`. Hence the current value +of `'0` would be `{L, 'a}`. Using this as a lower bound, we would be +able to narrow down the member choices to `['a, 'static]` because `'b: +'a` is not known to hold (but `'a: 'a` and `'static: 'a` do hold). We +would not have any upper bounds, so that would be our final set of choices. + +In that case, we apply the **minimal choice** rule -- basically, if +one of our choices if smaller than the others, we can use that. In +this case, we would opt for `'a` (and not `'static`). + +This choice is consistent with the general 'flow' of region +propagation, which always aims to compute a minimal value for the +region being inferred. However, it is somewhat arbitrary. + + + +### Collecting upper bounds in the implementation + +In practice, computing upper bounds is a bit inconvenient, because our +data structures are setup for the opposite. What we do is to compute +the **reverse SCC graph** (we do this lazilly and cache the result) -- +that is, a graph where `'a: 'b` induces an edge `SCC('b) -> +SCC('a)`. Like the normal SCC graph, this is a DAG. We can then do a +depth-first search starting from `SCC('0)` in this graph. This will +take us to all the SCCs that must outlive `'0`. + +One wrinkle is that, as we walk the "upper bound" SCCs, their values +will not yet have been fully computed. However, we **have** already +applied their liveness constraints, so we have some information about +their value. In particular, for any regions representing lifetime +parameters, their value will contain themselves (i.e., the initial +value for `'a` includes `'a` and the value for `'b` contains `'b`). So +we can collect all of the lifetime parameters that are reachable, +which is precisely what we are interested in. + diff --git a/src/borrow_check/region_inference/placeholders_and_universes.md b/src/borrow_check/region_inference/placeholders_and_universes.md new file mode 100644 index 000000000..e40ee11b1 --- /dev/null +++ b/src/borrow_check/region_inference/placeholders_and_universes.md @@ -0,0 +1,441 @@ +# Placeholders and universes + +From time to time we have to reason about regions that we can't +concretely know. For example, consider this program: + +```rust,ignore +// A function that needs a static reference +fn foo(x: &'static u32) { } + +fn bar(f: for<'a> fn(&'a u32)) { + // ^^^^^^^^^^^^^^^^^^^ a function that can accept **any** reference + let x = 22; + f(&x); +} + +fn main() { + bar(foo); +} +``` + +This program ought not to type-check: `foo` needs a static reference +for its argument, and `bar` wants to be given a function that that +accepts **any** reference (so it can call it with something on its +stack, for example). But *how* do we reject it and *why*? + +## Subtyping and Placeholders + +When we type-check `main`, and in particular the call `bar(foo)`, we +are going to wind up with a subtyping relationship like this one: + +```text +fn(&'static u32) <: for<'a> fn(&'a u32) +---------------- ------------------- +the type of `foo` the type `bar` expects +``` + +We handle this sort of subtyping by taking the variables that are +bound in the supertype and replacing them with +[universally quantified](../appendix/background.html#quantified) +representatives, denoted like `!1` here. We call these regions "placeholder +regions" – they represent, basically, "some unknown region". + +Once we've done that replacement, we have the following relation: + +```text +fn(&'static u32) <: fn(&'!1 u32) +``` + +The key idea here is that this unknown region `'!1` is not related to +any other regions. So if we can prove that the subtyping relationship +is true for `'!1`, then it ought to be true for any region, which is +what we wanted. + +So let's work through what happens next. To check if two functions are +subtypes, we check if their arguments have the desired relationship +(fn arguments are [contravariant](../appendix/background.html#variance), so +we swap the left and right here): + +```text +&'!1 u32 <: &'static u32 +``` + +According to the basic subtyping rules for a reference, this will be +true if `'!1: 'static`. That is – if "some unknown region `!1`" lives +outlives `'static`. Now, this *might* be true – after all, `'!1` +could be `'static` – but we don't *know* that it's true. So this +should yield up an error (eventually). + +## What is a universe? + +In the previous section, we introduced the idea of a placeholder +region, and we denoted it `!1`. We call this number `1` the **universe +index**. The idea of a "universe" is that it is a set of names that +are in scope within some type or at some point. Universes are formed +into a tree, where each child extends its parents with some new names. +So the **root universe** conceptually contains global names, such as +the the lifetime `'static` or the type `i32`. In the compiler, we also +put generic type parameters into this root universe (in this sense, +there is not just one root universe, but one per item). So consider +this function `bar`: + +```rust,ignore +struct Foo { } + +fn bar<'a, T>(t: &'a T) { + ... +} +``` + +Here, the root universe would consist of the lifetimes `'static` and +`'a`. In fact, although we're focused on lifetimes here, we can apply +the same concept to types, in which case the types `Foo` and `T` would +be in the root universe (along with other global types, like `i32`). +Basically, the root universe contains all the names that +[appear free](../appendix/background.html#free-vs-bound) in the body of `bar`. + +Now let's extend `bar` a bit by adding a variable `x`: + +```rust,ignore +fn bar<'a, T>(t: &'a T) { + let x: for<'b> fn(&'b u32) = ...; +} +``` + +Here, the name `'b` is not part of the root universe. Instead, when we +"enter" into this `for<'b>` (e.g., by replacing it with a placeholder), we will create +a child universe of the root, let's call it U1: + +```text +U0 (root universe) +│ +└─ U1 (child universe) +``` + +The idea is that this child universe U1 extends the root universe U0 +with a new name, which we are identifying by its universe number: +`!1`. + +Now let's extend `bar` a bit by adding one more variable, `y`: + +```rust,ignore +fn bar<'a, T>(t: &'a T) { + let x: for<'b> fn(&'b u32) = ...; + let y: for<'c> fn(&'b u32) = ...; +} +``` + +When we enter *this* type, we will again create a new universe, which +we'll call `U2`. Its parent will be the root universe, and U1 will be +its sibling: + +```text +U0 (root universe) +│ +├─ U1 (child universe) +│ +└─ U2 (child universe) +``` + +This implies that, while in U2, we can name things from U0 or U2, but +not U1. + +**Giving existential variables a universe.** Now that we have this +notion of universes, we can use it to extend our type-checker and +things to prevent illegal names from leaking out. The idea is that we +give each inference (existential) variable – whether it be a type or +a lifetime – a universe. That variable's value can then only +reference names visible from that universe. So for example if a +lifetime variable is created in U0, then it cannot be assigned a value +of `!1` or `!2`, because those names are not visible from the universe +U0. + +**Representing universes with just a counter.** You might be surprised +to see that the compiler doesn't keep track of a full tree of +universes. Instead, it just keeps a counter – and, to determine if +one universe can see another one, it just checks if the index is +greater. For example, U2 can see U0 because 2 >= 0. But U0 cannot see +U2, because 0 >= 2 is false. + +How can we get away with this? Doesn't this mean that we would allow +U2 to also see U1? The answer is that, yes, we would, **if that +question ever arose**. But because of the structure of our type +checker etc, there is no way for that to happen. In order for +something happening in the universe U1 to "communicate" with something +happening in U2, they would have to have a shared inference variable X +in common. And because everything in U1 is scoped to just U1 and its +children, that inference variable X would have to be in U0. And since +X is in U0, it cannot name anything from U1 (or U2). This is perhaps easiest +to see by using a kind of generic "logic" example: + +```text +exists { + forall { ... /* Y is in U1 ... */ } + forall { ... /* Z is in U2 ... */ } +} +``` + +Here, the only way for the two foralls to interact would be through X, +but neither Y nor Z are in scope when X is declared, so its value +cannot reference either of them. + +## Universes and placeholder region elements + +But where does that error come from? The way it happens is like this. +When we are constructing the region inference context, we can tell +from the type inference context how many placeholder variables exist +(the `InferCtxt` has an internal counter). For each of those, we +create a corresponding universal region variable `!n` and a "region +element" `placeholder(n)`. This corresponds to "some unknown set of other +elements". The value of `!n` is `{placeholder(n)}`. + +At the same time, we also give each existential variable a +**universe** (also taken from the `InferCtxt`). This universe +determines which placeholder elements may appear in its value: For +example, a variable in universe U3 may name `placeholder(1)`, `placeholder(2)`, and +`placeholder(3)`, but not `placeholder(4)`. Note that the universe of an inference +variable controls what region elements **can** appear in its value; it +does not say region elements **will** appear. + +## Placeholders and outlives constraints + +In the region inference engine, outlives constraints have the form: + +```text +V1: V2 @ P +``` + +where `V1` and `V2` are region indices, and hence map to some region +variable (which may be universally or existentially quantified). The +`P` here is a "point" in the control-flow graph; it's not important +for this section. This variable will have a universe, so let's call +those universes `U(V1)` and `U(V2)` respectively. (Actually, the only +one we are going to care about is `U(V1)`.) + +When we encounter this constraint, the ordinary procedure is to start +a DFS from `P`. We keep walking so long as the nodes we are walking +are present in `value(V2)` and we add those nodes to `value(V1)`. If +we reach a return point, we add in any `end(X)` elements. That part +remains unchanged. + +But then *after that* we want to iterate over the placeholder `placeholder(x)` +elements in V2 (each of those must be visible to `U(V2)`, but we +should be able to just assume that is true, we don't have to check +it). We have to ensure that `value(V1)` outlives each of those +placeholder elements. + +Now there are two ways that could happen. First, if `U(V1)` can see +the universe `x` (i.e., `x <= U(V1)`), then we can just add `placeholder(x)` +to `value(V1)` and be done. But if not, then we have to approximate: +we may not know what set of elements `placeholder(x)` represents, but we +should be able to compute some sort of **upper bound** B for it – +some region B that outlives `placeholder(x)`. For now, we'll just use +`'static` for that (since it outlives everything) – in the future, we +can sometimes be smarter here (and in fact we have code for doing this +already in other contexts). Moreover, since `'static` is in the root +universe U0, we know that all variables can see it – so basically if +we find that `value(V2)` contains `placeholder(x)` for some universe `x` +that `V1` can't see, then we force `V1` to `'static`. + +## Extending the "universal regions" check + +After all constraints have been propagated, the NLL region inference +has one final check, where it goes over the values that wound up being +computed for each universal region and checks that they did not get +'too large'. In our case, we will go through each placeholder region +and check that it contains *only* the `placeholder(u)` element it is known to +outlive. (Later, we might be able to know that there are relationships +between two placeholder regions and take those into account, as we do +for universal regions from the fn signature.) + +Put another way, the "universal regions" check can be considered to be +checking constraints like: + +```text +{placeholder(1)}: V1 +``` + +where `{placeholder(1)}` is like a constant set, and V1 is the variable we +made to represent the `!1` region. + +## Back to our example + +OK, so far so good. Now let's walk through what would happen with our +first example: + +```text +fn(&'static u32) <: fn(&'!1 u32) @ P // this point P is not imp't here +``` + +The region inference engine will create a region element domain like this: + +```text +{ CFG; end('static); placeholder(1) } + --- ------------ ------- from the universe `!1` + | 'static is always in scope + all points in the CFG; not especially relevant here +``` + +It will always create two universal variables, one representing +`'static` and one representing `'!1`. Let's call them Vs and V1. They +will have initial values like so: + +```text +Vs = { CFG; end('static) } // it is in U0, so can't name anything else +V1 = { placeholder(1) } +``` + +From the subtyping constraint above, we would have an outlives constraint like + +```text +'!1: 'static @ P +``` + +To process this, we would grow the value of V1 to include all of Vs: + +```text +Vs = { CFG; end('static) } +V1 = { CFG; end('static), placeholder(1) } +``` + +At that point, constraint propagation is complete, because all the +outlives relationships are satisfied. Then we would go to the "check +universal regions" portion of the code, which would test that no +universal region grew too large. + +In this case, `V1` *did* grow too large – it is not known to outlive +`end('static)`, nor any of the CFG – so we would report an error. + +## Another example + +What about this subtyping relationship? + +```text +for<'a> fn(&'a u32, &'a u32) + <: +for<'b, 'c> fn(&'b u32, &'c u32) +``` + +Here we would replace the bound region in the supertype with a placeholder, as before, yielding: + +```text +for<'a> fn(&'a u32, &'a u32) + <: +fn(&'!1 u32, &'!2 u32) +``` + +then we instantiate the variable on the left-hand side with an +existential in universe U2, yielding the following (`?n` is a notation +for an existential variable): + +```text +fn(&'?3 u32, &'?3 u32) + <: +fn(&'!1 u32, &'!2 u32) +``` + +Then we break this down further: + +```text +&'!1 u32 <: &'?3 u32 +&'!2 u32 <: &'?3 u32 +``` + +and even further, yield up our region constraints: + +```text +'!1: '?3 +'!2: '?3 +``` + +Note that, in this case, both `'!1` and `'!2` have to outlive the +variable `'?3`, but the variable `'?3` is not forced to outlive +anything else. Therefore, it simply starts and ends as the empty set +of elements, and hence the type-check succeeds here. + +(This should surprise you a little. It surprised me when I first realized it. +We are saying that if we are a fn that **needs both of its arguments to have +the same region**, we can accept being called with **arguments with two +distinct regions**. That seems intuitively unsound. But in fact, it's fine, as +I tried to explain in [this issue][ohdeargoditsallbroken] on the Rust issue +tracker long ago. The reason is that even if we get called with arguments of +two distinct lifetimes, those two lifetimes have some intersection (the call +itself), and that intersection can be our value of `'a` that we use as the +common lifetime of our arguments. -nmatsakis) + +[ohdeargoditsallbroken]: https://github.com/rust-lang/rust/issues/32330#issuecomment-202536977 + +## Final example + +Let's look at one last example. We'll extend the previous one to have +a return type: + +```text +for<'a> fn(&'a u32, &'a u32) -> &'a u32 + <: +for<'b, 'c> fn(&'b u32, &'c u32) -> &'b u32 +``` + +Despite seeming very similar to the previous example, this case is going to get +an error. That's good: the problem is that we've gone from a fn that promises +to return one of its two arguments, to a fn that is promising to return the +first one. That is unsound. Let's see how it plays out. + +First, we replace the bound region in the supertype with a placeholder: + +```text +for<'a> fn(&'a u32, &'a u32) -> &'a u32 + <: +fn(&'!1 u32, &'!2 u32) -> &'!1 u32 +``` + +Then we instantiate the subtype with existentials (in U2): + +```text +fn(&'?3 u32, &'?3 u32) -> &'?3 u32 + <: +fn(&'!1 u32, &'!2 u32) -> &'!1 u32 +``` + +And now we create the subtyping relationships: + +```text +&'!1 u32 <: &'?3 u32 // arg 1 +&'!2 u32 <: &'?3 u32 // arg 2 +&'?3 u32 <: &'!1 u32 // return type +``` + +And finally the outlives relationships. Here, let V1, V2, and V3 be the +variables we assign to `!1`, `!2`, and `?3` respectively: + +```text +V1: V3 +V2: V3 +V3: V1 +``` + +Those variables will have these initial values: + +```text +V1 in U1 = {placeholder(1)} +V2 in U2 = {placeholder(2)} +V3 in U2 = {} +``` + +Now because of the `V3: V1` constraint, we have to add `placeholder(1)` into `V3` (and +indeed it is visible from `V3`), so we get: + +```text +V3 in U2 = {placeholder(1)} +``` + +then we have this constraint `V2: V3`, so we wind up having to enlarge +`V2` to include `placeholder(1)` (which it can also see): + +```text +V2 in U2 = {placeholder(1), placeholder(2)} +``` + +Now constraint propagation is done, but when we check the outlives +relationships, we find that `V2` includes this new element `placeholder(1)`, +so we report an error.