Cooking λΠ 3 ways
After my last post, I didn’t quite feel like ending there. I was a little dissatisfied with how binding was handled in the type checker, the odd blend of HOAS, GUIDs, and DeBruijn variables was… unique.
In the post I explore 3 versions of the same code
- The original method
- Using
bound
to handle all binding - Full HOAS
There’s a lot of code in this post, enough that I think it’s worth hosting the code on its own. You can find it on github and bitbucket.
The Original
I’ve already described most of the original method here. To recap
- Values were HOAS
- Terms were DeBruijn
- To bridge the gap, we had “free constants” randomly generated
The issue I had with this is we almost got the worst of all 3 worlds! We were constantly bumping a counter to keep up with the free constants we needed to generate. We had to muddy up the types of values with another notion of free constants so we could actually inspect variables under HOAS binders! And finally, we had to do the painful and tedious substitutions on DeBruijn terms.
On the other hand, if you’d never used any of those binding schemes together, you too can go triple or nothing and try to understand that code :)
What I really wanted was to unify how I represented values and terms. I still wanted a clearly correct notion of equality, but in this way I could probably dodge at least two of the above.
The obvious thing to do would be to stick with DeBruijn variables and just instantiate free variables with constants. This is ugly, but it’s moderately less horrible if we use a library to help us with the process.
bound
So my first stab at this approach was with Edward Kmett’s bound. For those who aren’t familiar with this library, it centers around the data type Scope
. Scope b f a
binds variables of type b
in the structure f
with free variables of type a
. The assumption is that f
will be a monad which represents our AST.
Further, f
is parameterized over variables, it doesn’t attempt to distinguish between bound and free ones however. This means that >>=
corresponds to substitution. Then what Scope
does is instantiate these variables to B b a
which is precisely equivalent to Either b a
.
What this results in is that each free variable is a different type from bound ones. Scope
provides various functions for instantiating bound variables and abstracting over free ones. That’s bound
in a nutshell.
It’s a bit easier to grok this by example, here’s our calculus ported to use Scope
data Expr a = Var a
| App (Expr a) (Expr a)
| Annot (Expr a) (Expr a)
| ETrue
| EFalse
| Bool
| Star
| Pi (Expr a) (Scope () Expr a)
| Lam (Scope () Expr a)
| C String
deriving(Functor, Eq)
So the first major difference is that our polarization between inferrable and checkable terms is gone! This wasn’t something I was happy about, but in order to use Scope
we need a monad instance and we can’t define two mutually dependent monad instances without a function from CExpr -> IExpr
, something that clearly doesn’t exist.
Since each binder can only bind one variable at a time, we represent the newly bound variable as just ()
. This would be more complicated if we supported patterns or something similar.
Now in addition to just this, we also need a bunch of boilerplate to define some type class instances for Scope
’s benefit.
instance Eq1 Expr where (==#) = (==)
instance Applicative Expr where
pure = return
(<*>) = ap
instance Monad Expr where
return = Var
Var a >>= f = f a
(App l r) >>= f = App (l >>= f) (r >>= f)
ETrue >>= _ = ETrue
EFalse >>= _ = EFalse
Bool >>= _ = Bool
Star >>= _ = Star
C s >>= _ = C s
Annot l r >>= f = Annot (l >>= f) (r >>= f)
Pi l s >>= f = Pi (l >>= f) (s >>>= f)
Lam e >>= f = Lam (e >>>= f)
That weird >>>=
is just >>=
that works through Scope
s. It’s a little bit frustrating that we need this somewhat boilerplate-y monad instance, but I think the results might be worth it.
From here we completely forgo an explicit Val
type. We’re completely scrapping that whole HOAS and VConst
ordeal. Instead we’ll just trust Scope
’s clever Eq
instance to handle alpha conversion. We do need to implement normalization though
type Val = Expr
nf :: Expr a -> Val a
nf = \case
(Annot e t) -> nf e -- Important, nf'd data throws away annotations
(Lam e) -> Lam (toScope . nf . fromScope $ e)
(Pi l r) -> Pi (nf l) (toScope . nf . fromScope $ r)
(App l r) ->
case l of
Lam f -> nf (instantiate1 r f)
l' -> App l' (nf r)
e -> e
What’s interestingly different is actual work is shifted from within the higher order binders we had before into the case expression in App
.
It’s also worth mentioning the few bound specifics here. toScope
and fromScope
expose the underlying f (V b a)
that a Scope
is hiding. We’re then can polymorphically recur (eat your heart out sml) over the now unbound variables and continue on our way.
Again, notice that I’ve defined nothing to do with substitution or scoping, this is all being handled by bound.
Now our actual type checker is still essentially identical. We’re still using monad-gen
to generate unique variable names, it’s just that now bound
handles the messy substitution. The lack of distinction between inferrable, checkable, and normalized terms did trip me up once our twice though.
data Env = Env { localVars :: M.Map Int (Val Int)
, constants :: M.Map String (Val Int) }
type TyM = ReaderT Env (GenT Int Maybe)
unbind :: (MonadGen a m, Functor m, Monad f) => Scope () f a -> m (a, f a)
unbind scope = ((,) <*> flip instantiate1 scope . return) <$> gen
unbindWith :: Monad f => a -> Scope () f a -> f a
unbindWith = instantiate1 . return
inferType :: Expr Int -> TyM (Val Int)
inferType (Var i) = asks (M.lookup i . localVars) >>= maybe mzero return
inferType (C s) = asks (M.lookup s . constants) >>= maybe mzero return
inferType ETrue = return Bool
inferType EFalse = return Bool
inferType Bool = return Star
inferType Star = return Star
inferType (Lam _) = mzero -- We can only check lambdas
inferType (Annot e ty) = do
checkType ty Star
let v = nf ty
v <$ checkType e v
inferType (App f a) = do
ty <- inferType f
case ty of
Pi aTy body -> nf (App (Lam body) a) <$ checkType a aTy
_ -> mzero
inferType (Pi t s) = do
checkType t Star
(newVar, s') <- unbind s
local (\e -> e{localVars = M.insert newVar (nf t) $ localVars e}) $
Star <$ checkType s' Star
checkType :: Expr Int -> Val Int -> TyM ()
checkType (Lam s) (Pi t ts) = do
(newVar, s') <- unbind s
local (\e -> e{localVars = M.insert newVar (nf t) $ localVars e}) $
checkType s' (nf $ unbindWith newVar ts)
checkType e t = inferType e >>= guard . (== t)
I defined two helper functions unbind
and unbindWith
which both ease the process of opening a scope and introducing a new free variable. I actually split these off into a tiny library, but I haven’t uploaded it to hackage yet.
- Code size decreased by ~50 lines
- No more explicit substitution
- All the annoying plumbing is in the monad instance which is pretty mechanical
- We did lose the really nice separation of terms we had before though :(
I suppose that 4. would be a nonissue for a lot of people who don’t care about bidirectional type checkers.
HOAS
Higher order abstract syntax is a really nifty trick. The idea is that Haskell already has a perfectly good notion of variables and substitution lying around! Let’s just use that. We represent our functions with actual ->
s and we don’t have a constructor for variables anymore.
The only issue is that Haskell doesn’t let us inspect the bodies of functions. We need to do this, however, for a type checker! To deal with this we dirty our AST a bit and add in IGen
’s, placeholders for where normal Haskell variables would normally go. Our new AST looks like this
data Expr = App Expr Expr
| Annot Expr Expr
| ETrue
| EFalse
| Bool
| Star
| Pi Expr (Expr -> Expr)
| Lam (Expr -> Expr)
| C String
| IGen Int
type NF = Expr
Notice how both Pi
and Lam
have functions embedded in them. Now normalization is actually quite slick because functions are easy to work with in Haskell
nf :: Expr -> NF
nf ETrue = ETrue
nf EFalse = EFalse
nf Bool = Bool
nf Star = Star
nf (C s) = C s
nf (IGen i) = IGen i
nf (Annot l _) = nf l
nf (Pi t f) = Pi (nf t) (nf . f)
nf (Lam f) = Lam (nf . f)
nf (App l r) = case nf l of
Lam f -> nf . f $ l
l' -> App l' (nf r)
This is actually quite similar to the Val
type we started with. That was also used HOAS and we end up with a similarly structured normalization.
For the same reason, the equivalence checking procedure is pretty much the same thing
eqTerm :: NF -> NF -> Bool
eqTerm l r = runGenWith (successor s) (IGen 0) $ go l r
where s (IGen i) = IGen (i + 1)
s _ = error "Impossible!"
go Star Star = return True
go Bool Bool = return True
go ETrue ETrue = return True
go EFalse EFalse = return True
go (Annot l r) (Annot l' r') = (&&) <$> go l l' <*> go r r'
go (App l r) (App l' r') = (&&) <$> go l l' <*> go r r'
go (Pi t f) (Pi t' g) =
(&&) <$> go t t' <*> (gen >>= \v -> go (f v) (g v))
go (IGen i) (IGen j) = return (i == j)
go _ _ = return False
In fact, the only differences are that
- There are a few more cases, even though they won’t ever be called
- We don’t need that horrible top level
Enum
instance
The only reason for two is that the amazing maintainer of monad-gen
(hi!) rejiggered some the library to not be so Enum
dependent.
Now from here our type checker is basically what we had before. In the interest of saving time, I’ll highlight the interesting bits: the constructors that bind variables.
data Env = Env { localVars :: M.Map Int NF
, constants :: M.Map String NF }
type TyM = GenT Int (ReaderT Env Maybe)
inferType :: Expr -> TyM NF
inferType (Pi t f) = do
checkType t Star
let t' = nf t
i <- gen
local (\e -> e{localVars = M.insert i t' $ localVars e}) $
Star <$ checkType (f $ IGen i) Star
checkType :: Expr -> NF -> TyM ()
checkType (Lam f) (Pi t g) = do
i <- gen
let t' = nf t
rTy = nf (g $ IGen i)
local (\e -> e{localVars = M.insert i t' $ localVars e}) $
checkType (f $ IGen i) rTy
At this point you may have started to notice the pattern, the only real difference here is that substitution is completely free. Otherwise, I don’t really have much to say about HOAS.
Wrap Up
In conclusion, I think we can all agree that the original version of this type checker was unpleasant to say the least. It did considerably improve with bound
mostly because the normalize-and-compare equivalence checking is really easy since bound
handles alpha conversion. On the other hand, actually doing work beneath a binder is a bit of a pain since we have to take care to never unwrap a binder with a previously bound variable. We handled this with a hacky little trick with monad-gen
, but a permanent and clean solution still seems hard.
We can avoid this fully by hitching a ride on Haskell’s variables and substitution using HOAS, this is wonderful until it’s not. The issue is that comparing functions for equality is still a pain so we ended up with an equivalence check much like what we had in the original version.
In the future it’d be interesting to try this with unbound
, a library in the same domain as bound
with a very different approach.