Bidirectional Type Checkers for λ→ and λΠ
This week I learned that my clever trick for writing a type checker actually has a proper name: bidirectional type checking. In this post I’ll explain what exactly that is and we’ll use it to write a few fun type checkers.
First of all, let’s talk about one of the fundamental conflicts when designing a statically typed language: how much information need we demand from the user? Clearly we can go too far in either direction. Even people who are supposedly against type inference support at least some inference. I’m not aware of a language that requires you to write something like
my_function((my_var : int) + (1 : int) : int) : string
Clearly inferring the types of some expressions are necessary. On the other hand, if we leave out all type annotations then it becomes a lot harder for a human reader to figure out what’s going on! I at least, need to see signatures for top level functions or I become grumpy.
So inside a type checker we always have two sort of processes
- I know this must have the type T, I’ll check to make sure this is the case
- I have no idea what the type of this expression is, I’ll examine the expression to figure it out
In a bidirectional type checker, we acknowledge these two phases by explicitly separating the type checker into two functions
Our type checker thus has two directions, one where we use the type to validate the expression (the type flows in) or we synthesize the type form the expression (the type flows out). That’s all that this is!
It turns out that a technique like this is surprisingly robust. It handles everything from subtyping to simple dependent types! To see how this actually plays out I think it’d be best to just dive in and do something with it.
Laying Out Our Language
Now when we’re building a bidirectional type checker we really want our AST to explicitly indicate inferrable vs checkable types. Clearly the parser might not care so much about this distinction, but prior to type checking it’s helpful to create this polarized tree.
For a simple language you can imagine
data Ty = Bool
| Arr Ty Ty
deriving(Eq, Show)
data IExpr = Var Int
| App IExpr CExpr
| Annot CExpr Ty
| If CExpr IExpr IExpr
| ETrue
| EFalse
data CExpr = Lam CExpr
| CI IExpr
This is just simply typed lambda calculus with booleans. We’re using DeBruijn indices so we need not specify a variable for Lam
. The IExpr
type is for expressions we can infer types for, while CExpr
is for types we can check.
Much this isn’t checking, we can always infer the types of variables, inferring the types of lambdas is hard, etc. Something worth noting is CI
. For any inferrable type, we can make it checkable by inferring a type and checking that it’s equal to what we expected. This is actually how Haskell works, GHC is just inferring type without bothering with your signature and then just checks you were right in the first place!
Now that we’ve separated out our expressions, we can easily define our type checker.
type Env = [Ty]
(?!) :: [a] -> Int -> Maybe a
xs ?! i = if i < length xs then Just (xs !! i) else Nothing
inferType :: Env -> IExpr -> Maybe Ty
inferType env (Var i) = env ?! i
inferType env (App l r) =
case inferType env l of
Just (Arr lTy rTy) -> checkType env r lTy >> return rTy
_ -> Nothing
inferType env (Annot e an) = checkType env e an >> return an
inferType _ ETrue = return Bool
inferType _ EFalse = return Bool
inferType env (If i t e) = do
checkType env i Bool
lTy <- inferType env t
rTy <- inferType env e
guard (lTy == rTy)
return lTy
checkType :: Env -> CExpr -> Ty -> Maybe ()
checkType env (Lam ce) (Arr l r) = checkType (l : env) ce r
checkType env (CI e) t = inferType env e >>= guard . (t ==)
checkType _ _ _ = Nothing
So our type checker doesn’t have many surprises in it. The environment is easy to maintain since DeBruijn indices are easily stored in a list.
Now that we’ve seen how a bidirectional type checker more or less works, let’s kick it up a notch.
Type Checking Dependent Types
Type checking a simple dependently typed language is actually not nearly as bad as you’d expect. The first thing to realize is that since dependent types have only one syntactic category.
We maintain the distinction between inferrable and checkable values, resulting in
data IExpr = Var Int
| App IExpr CExpr
| Annot CExpr CExpr
| ETrue
| EFalse
| Bool
| Star -- New stuff starts here
| Pi CExpr CExpr
| Const String
| Free Int
deriving (Eq, Show, Ord)
data CExpr = Lam CExpr
| CI IExpr
deriving (Eq, Show, Ord)
So you can see we’ve added 4 new expressions, all inferrable. Star
is just the kind of types as it is in Haskell. Pi
is the dependent function type, it’s like Arr
, except the return type can depend on the supplied value.
For example, you can imagine a type like
Which says something like “give me an integer n
and a value and I’ll give you back a list of length n
”.
Interestingly, we’ve introduce constants. These are necessary simply because without them this language is unbelievable boring. Constants would be defined in the environment and they represent constant, irreducible terms. You should think of them almost like constructors in Haskell. For example, one can imagine that 3 constants
Which serve to define the natural numbers.
Last but not least, we’ve added “free variables” as an explicit
Now an important piece of a type checker is comparing types for equality, in STLC, equivalent types are syntactically equal so that was solved with deriving Eq
. Here we need a bit more subtlety. Indeed, now we need to check arbitrary expressions for equality! This is hard. We’ll reduce things as much as possible and then just check syntactic equality. This means that if True then a else b
would equal a
as we’d hope, but \x -> if x then a else a
wouldn’t.
Now in order to facilitate this check we’ll define a type for fully reduced expressions. Since we’re only interested in checking equality on these terms we can toss the inferrable vs checkable division out the window.
data VConst = CAp VConst Val
| CVar String
| CFree Int
data Val = VStar
| VBool
| VTrue
| VFalse
| VConst VConst
| VArr Val Val
| VPi Val (Val -> Val)
| VLam (Val -> Val)
| VGen Int
Now since we have constants we can have chains of application that we can’t reduce, that’s what VConst
is. Notice that this handles the case of just having a constant nicely.
The value dichotomy uses a nice trick from the “Simple Easy!” paper, we use HOAS to have functions that reduce themselves when applied. The downside of this is that we need VGen
to peek inside the now opaque VLam
and VPi
. The idea is we’ll generate a unique Int
and apply the functions to VGen i
.
Now in order to conveniently generate these fresh integers I used monad-gen
(it’s not self promotion if it’s useful :). Equality checking comes to
-- *Whistle and fidget with hands*
instance Enum Val where
toEnum = VGen
fromEnum _ = error "You're a bad person."
eqTerm :: Val -> Val -> Bool
eqTerm l r = runGen $ go l r
where go VStar VStar = return True
go VBool VBool = return True
go VTrue VTrue = return True
go VFalse VFalse = return True
go (VArr f a) (VArr f' a') = (&&) <$> go f f' <*> go a a'
go (VLam f) (VLam g) = gen >>= \v -> go (f v) (g v)
go (VPi f) (VPi g) = gen >>= \v -> go (f v) (g v)
go (VGen i) (VGen j) = return (i == j)
go (VConst c) (VConst c') = case (c, c') of
(CVar v, CVar v') -> return (v == v')
(CAp f a, CAp f' a') ->
(&&) <$> go (VConst f) (VConst f') <*> go a a'
_ -> return False
go _ _ = return False
Basically we just recurse and return true or false at the leaves.
Now that we know how to check equality of values, we actually need to map terms into those values. This involves basically writing a little interpreter.
inf :: [Val] -> IExpr -> Val
inf _ ETrue = VTrue
inf _ EFalse = VFalse
inf _ Bool = VBool
inf _ Star = VStar
inf _ (Free i) = VConst (CFree i)
inf _ (Const s) = VConst (CVar s)
inf env (Annot e _) = cnf env e
inf env (Var i) = env !! i
inf env (Pi l r) = VPi (cnf env l) (\v -> cnf (v : env) r)
inf env (App l r) =
case inf env l of
VLam f -> f (cnf env r)
VConst c -> VConst . CAp c $ cnf env r
_ -> error "Impossible: evaluated ill-typed expression"
cnf :: [Val] -> CExpr -> Val
cnf env (CI e) = inf env e
cnf env (Lam c) = VLam $ \v -> cnf (v : env) c
The interesting cases are for Lam
, Pi
, and App
. For App
we actually do reductions wherever we can, otherwise we know that we’ve just got a constant so we slap that on the front. Lam
and Pi
are basically the same, they wrap the evaluation of the body in a function and evaluate it based on whatever is fed in. This is critical, otherwise App
’s reductions get much more complicated.
We need one final thing. You may have noticed that all Val
’s are closed, there’s no free DeBruijn variables. This means that when we go under a binder we can’t type check open terms since we’re representing types as values and the term we’re checking shares variables with its type.
This means that our type checker when it goes under a binder is going to substitute the now free variable for a fresh Free i
. Frankly, this kinda sucks. I poked about for a better solution but this is what “Simple Easy!” does too..
To do these substitutions we have
ibind :: Int -> IExpr -> IExpr -> IExpr
ibind i e (Var j) | i == j = e
ibind i e (App l r) = App (ibind i e l) (cbind i e r)
ibind i e (Annot l r) = Annot (cbind i e l) (cbind i e r)
ibind i e (Pi l r) = Pi (cbind i e l) (cbind i e r)
ibind _ _ e = e -- Non recursive cases
cbind :: Int -> IExpr -> CExpr -> CExpr
cbind i e (Lam b) = Lam (cbind (i + 1) e b)
cbind i e (CI c) = CI (ibind i e c)
This was a bit more work than I anticipated, but now we’re ready to actually write the type checker!
Since we’re doing bidirectional type checking, we’re once again going to have two functions, inferType
and checkType
. Our environments is now a record
The inferring stage is mostly the same
inferType :: Env -> IExpr -> GenT Int Maybe Val
inferType _ (Var _) = lift Nothing -- The term is open
inferType (Env _ m) (Const s) = lift $ M.lookup s m
inferType (Env m _) (Free i) = lift $ M.lookup i m
inferType _ ETrue = return VBool
inferType _ EFalse = return VBool
inferType _ Bool = return VStar
inferType _ Star = return VStar
inferType env (Annot e ty) = do
checkType env ty VStar
let v = cnf [] ty
checkType env e v >> return v
inferType env (App f a) = do
ty <- inferType env f
case ty of
VPi aTy body -> do
checkType env a aTy
return (body $ cnf [] a)
_ -> lift Nothing
inferType env (Pi ty body) = do
checkType env ty VStar
i <- gen
let v = cnf [] ty
env' = env{locals = M.insert i v (locals env)}
checkType env' (cbind 0 (Free i) body) VStar
return VStar
The biggest difference is that now we have to compute some types on the fly. For example in Annot
we check that we are in fact annotating with a type, then we reduce it to a value. This order is critical! Remember that cnf
requires well typed terms.
Beyond this there are two interesting cases, there’s App
which nicely illustrates what a pi type means and Pi
which demonstrates how to deal with a binder.
For App
we start in the same way, we grab the (function) type of the function. We can then check that the argument has the right type. To produce the output type however, we have to normalize the argument as far as we can and then feed it to body
which computes the return type. Remember that if there’s some free variable in a
then it’ll just be represented as VConst (CFree ...)
.
Pi
checks that we’re quantifying over a type first off. From there it generates a fresh free variable and updates the environment before recursing. We use cbind
to replace all occurrences of the now unbound variable for an explicit Free
.
checkType
is pretty trivial after this. Lam
is almost identical to Pi
and CI
is just eqTerm
.
checkType :: Env -> CExpr -> Val -> GenT Int Maybe ()
checkType env (CI e) v = inferType env e >>= guard . eqTerm v
checkType env (Lam ce) (VPi argTy body) = do
i <- gen
let ce' = cbind 0 (Free i) ce
env' = env{locals = M.insert i argTy (locals env)}
checkType env' ce' (body $ VConst (CFree i))
checkType _ _ _ = lift Nothing
And that’s it!
Wrap Up
So let’s circle back to where we started: bidirectional type checking! Hopefully we’ve seen how structuring a type checker around these two core functions yields something quite pleasant.
What makes this really interesting though is how well it scales. You can use this style type checker to handle subtyping, [dependent] pattern matching, heaps and tons of interesting features.
At 400 lines though, I think I’ll stop here :)
comments powered by Disqus