-
Notifications
You must be signed in to change notification settings - Fork 243
Notes on optimizations for type inference and memoization
Notes from a discussion with Leo about type inference and memoization of type-checking
- An efficient, optimistic type-checker
When computing a solution to a meta-variable (G |- ?m : t) := e,
Lean 3 uses a fast check to ensure that e has type t in context
G.
In Lean, this fast check is called infer_type with the property that
if infer_type G e = t then if e is well-typed, then it has
type t in context G.
For example, this means that if
e = f e1 .. en
and
f : (x1:t1) -> ... -> (xn:tn) -> t
then infer_type G e returns t [e1..en/x1..xn] after only inferring
the type of f and not inspecting any of the arguments e1 .. en.
This relies on a meta-property of the unification algorithm, which is
that any assignment to a meta-variable computed by the unification
algorithm is well-typed in the environment of the meta-variable---what
needs to be checked is that it is well-typed at the expected type of
the meta-variable and infer_type efficiently computes a type which
can then be compared to the expected type for definitional equality.
Some examples from F*:
a.
t : int -> Type
f : #x:nat -> t x -> Type
z : t -1
====================================
f #?u z
In this context, we would solve ?u := -1
and infer_type _ -1 returns int since - : int -> int.
So, the fast check fails (rightfully) and one can fall back on the current slow path, which should also rightfully fail.
b.
t : int -> Type
f : #x:nat -> t x -> Type
y : int
z : t y
====================================
if (y >= 0) then f #?u z
In this context, (?u := y)
and infer_type _ y returns int since y : int.
So, the fast check fails and one can fall back on the current slow path, which, this time, would succeed, correctly.
c. Consider applying a split tactic to a goal of the form
(a /\ b) /\ (c /\ d) where a, b, c, d are large terms
and split: #p:Type -> #q:Type -> p -> q -> p /\ q. This time,
we have ?p := (a /\ b) and ?q = (c /\ d) and infer_type _ (a /\ b)
quickly returns Type just from the type of /\, independently of a
and b. Likewise for (c /\ d).
- Memoization of type-checking
The optimization above is not applicable in the Lean kernel, since meta-variables do not even appear in the kernel.
Consider again a goal of the form (a /\ b) /\ (c /\ d) for large
terms a, b, c, d.
And consider splitting the goal twice producing a proof term of the form
split #(a /\ b) #(c /\ d) (split #a #b pf_a pf_b) (split #c #d pf_c pf_d)
where `split: #p:Type -> #q:Type -> p -> q -> p /\ q`
Checking the proof term above using F*'s naive strategy is very
inefficient, since it requires type-checking the large term a
multiple times.
The Lean kernel memoizes the type-checking of terms but using a locally nameless representation of terms (like F*) where every name in a term is accompanied by its type (i.e., the sort field of a bv). This means that terms can be type-checked independently of their environment.
Note, this only works for terms that are known to be well-scoped to
begin with. So, when type-checking a term e in a context G, the
free variables of e (which are memoized in the representation of
e, as in F*) are checked to make sure they are included in G and
then e is type-checked independently of G, with memoization.
See this page for more discussions on memoization.
- Representation of meta-variables used in unification
In F* currently (as in Lean 2) a meta-variable introduced in
context is introduced at a pi-type. i.e., in a context G, a
meta-variable ?m is introduced at type G -> t and, typically,
we generate a unification problem of the form
(?m : G -> t) (x0 : t0) ... (xn : tn) =?= e
This is representation, although very general, has many downsides:
-
It is very redundant. The entire context
Gis replicated in the(x0:t0) .. (xn:tn)and, worse, occurrences of other meta-variables of a similar shape intilead to a combinatorial blowup. -
Even a simple solution for
?mproduces a large lambda term, which must be checked. -
Checking for various common special cases, e.g., checking to see if (?m : G -> t) (x0 : t0) ... (xn : tn) is a pattern involves a quadratic check of the
xito check for no duplicates.
Instead, in Lean 3, a meta-variable introduced in a context has
the form G |- ?m : t, i.e., a meta-variable introduced in
context G at type t.
a. This avoids redundancy, since the context G is shared as
usual and no additional occurrences of the bindings in G are
produced.
b. Simple solutions for ?m do not introduce large lambda terms.
c. A meta-variable in the form G |- ?m : t is always a pattern,
no check required.