In this post I’d just like to walk through some fun code, nothing particularly theory-y. The code I’d like to go through is a simple little module in ML that lets you easily construct “dynamic” types. This isn’t through the usual “really big sum of products” approach but instead is completely open: it can be extended with every newly defined type, at runtime.
The basic idea behind this trick hinges on how exceptions work in SML. Well, really it’s not about exceptions so much as what exceptions work with. In ML we can declare new exceptions like this

exception Foo of tyarg

and this gives us a new exception constructor Foo, and we can raise and handle it like you would expect

(raise (Foo 1)) handle Foo x => x

But what’s particularly interesting is that Foo actually has a type. Really it’s just a constructor for a special type exn. This means we can do things like pass around exception constructors, apply them, etc, etc.
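As a quick illustration of constructors-as-values (a sketch; Foo, Bar, and wrapAll here are made-up examples, not from the post):

```sml
exception Foo of int
exception Bar of string

(* exn values built from different constructors live in one type... *)
val stash : exn list = [Foo 1, Bar "hi"]

(* ...and a constructor like Foo is an ordinary function int -> exn. *)
fun wrapAll (mk : int -> exn) (xs : int list) : exn list =
  List.map mk xs

val wrapped = wrapAll Foo [1, 2, 3]
```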
exn is what we might call an extensible data type: we can extend it arbitrarily. We could imagine allowing users to define their own such types, but in SML we’ve just got the one. The reason we even have this one is because it’s a great choice if you can only allow one type to be raised and handled.

What we’re going to do is use the fact that we can generate new extensions to exn at run time to create an exn-based structure providing a way to implement “tags”. Once we have these tags we’ll be able to implement a pair of functions

val tag : 'a tag -> 'a -> dynamic
val untag : 'a tag -> dynamic -> 'a option
So tags let us “forget” the type of some expression and treat it as some dynamic blob to be recovered at some time in the future. Concretely, we’d like to implement this signature
signature TAG =
sig
  type dynamic
  type 'a tag

  val new : unit -> 'a tag
  val tag : 'a tag -> 'a -> dynamic
  val untag : 'a tag -> dynamic -> 'a option
end
So let’s start implementing the thing. First we need to decide what the type dynamic should be. I propose that it should be exn. The reason being that we can always extend exn in various ways, so if we implement things with dynamic = exn we’ll have the ability to make dynamic “grow a new branch” to accommodate whatever we’re working with.
structure Tag :> TAG =
struct
type dynamic = exn
end
Ok, so what should tag be? Well it’s going to be type indexed, obviously, so that we can even talk about the signatures of (un)tag, but more importantly its purpose should be to tell us how to package something up into an exn so we can get it back out. The downside of this whole extensible data type thing is that if we forget about the constructor we used to make an exn, it’s just lost forever! A tag will make sure that once we make a constructor to use with dynamic we won’t find ourselves with a dynamic and no way to inspect it.

The best way I can think of for doing this is to just bake the (un)tag operations straight into the implementation of the type.
structure Tag :> TAG =
struct
  type dynamic = exn
  type 'a tag = {into : 'a -> exn, out : exn -> 'a option}
end
Now this makes it look like tags could perform arbitrary operations in the process of tagging and untagging, but really we’re going to implement it so it’s all very simple and efficient.
In particular, we’re now in a position to define our three core operators
structure Tag :> TAG =
struct
  type dynamic = exn
  type 'a tag = {into : 'a -> exn, out : exn -> 'a option}

  fun new () : 'a tag =
    let
      exception Fresh of 'a
    in
      { into = Fresh
      , out = fn e =>
          case e of Fresh a => SOME a | _ => NONE
      }
    end

  fun tag {into, out} = into
  fun untag {into, out} = out
end
Now tag and untag are pretty simple because we basically implemented them up in new, so let’s look carefully at that. We start by first minting a new constructor for exn. We know that this will not clash with any other exception in existence; no one else can raise it or handle it unless we explicitly give them this constructor. Now, while we have access to it, we bundle the constructor into the tag record we’re making.

into is quite easy to implement because it’s just constructor application. out is also straightforward: all we do is pattern match to see if the given exn is correct. All we do in the actual matching bit is see if we’ve been given something made with our Fresh constructor and return the included a if we did. Handling everything else is important, otherwise this would explode horribly every time we failed to untag something.
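To see why that wildcard case matters, here’s a small usage sketch (assuming the Tag structure above; intTag and strTag are hypothetical names): untagging with the wrong tag just gives NONE instead of raising.

```sml
val intTag : int Tag.tag = Tag.new ()
val strTag : string Tag.tag = Tag.new ()

val d : Tag.dynamic = Tag.tag intTag 42

(* The matching tag recovers the value... *)
val SOME 42 = Tag.untag intTag d
(* ...while a different tag falls through to the wildcard and gives NONE. *)
val NONE = Tag.untag strTag d
```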
And there’s a nice way of implementing the same sort of run time typing you get in dynamic languages in SML. One nice advantage of this over the usual

datatype dynamic = INT of int | STRING of string | ...

approach is we can always extend our dynamic with user defined types. So we can do something like
datatype foo = Foo of int
val fooTag = Tag.new () : foo Tag.tag
val d = Tag.tag fooTag (Foo 2)
val SOME (Foo 2) = Tag.untag fooTag d
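For instance (a hypothetical use of the module above), tags also let us build heterogeneous lists and then filter out just the pieces we recognize:

```sml
val intTag = Tag.new () : int Tag.tag
val strTag = Tag.new () : string Tag.tag

val stuff : Tag.dynamic list =
  [Tag.tag intTag 1, Tag.tag strTag "two", Tag.tag intTag 3]

(* mapPartial keeps only the successfully untagged values: here [1, 3]. *)
val ints : int list = List.mapPartial (Tag.untag intTag) stuff
```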
There you go, this is just a very short post on a very short piece of code that lets us do something fun. Some nice things you can do now

- Use dynamic to write an infinite loop without direct recursion
- Remove our dependence on exn by using the generative effect of allocating a reference instead

So summer seems to be about over. I’m very happy with mine, I learned quite a lot. In particular over the last few months I’ve been reading and fiddling with a different kind of type theory than I was used to: computational type theory. This is the type theory that underlies Nuprl (or JonPRL cough cough).
One thing that stood out to me was that you could do all these absolutely crazy things in this system that seemed impossible after 3 years of Coq and Agda. In this post I’d like to sketch some of the philosophical differences between CTT and a type theory more in the spirit of CiC.
First things first, let’s go over the more familiar notion of type theory. To develop one of these type theories you start by discussing some syntax. You lay out the syntax for some types and some terms
A ::= Σ x : A. A  Π x : A. A  ⊤  ⊥  ...
M ::= M M  λ x : A. M  <M, M>  π₁ M  ⋆  ...
And now we want to describe the all-important M : A relation. This tells us that some term has some type. It is inductively defined from a finite set of inferences. Ideally, it’s even decidable, for philosophical reasons I’ve never cared too much about. In fact, it’s this relation that really governs our whole type theory; everything else is going to stem from this.
As an afterthought, we may decide that we want to identify certain terms with other terms; this is called definitional equality. It’s another inductively defined (and decidable) judgment M ≡ N : A. Two quick things to note here

- M ≡ N : A is inductively defined and decidable
- the complexity of M ≡ N : A is independent of the complexity of A

The last point is some concern because it means that equality for functions is never going to be right for what we want. We have this uniformly complex judgment M ≡ N : A, but when A = Π x : B. C the complexity should be greater and dependent on the complexity of B and C. That’s how it works in math after all: equality at functions is defined pointwise, something we can’t really do here if ≡ is to be decidable or just be of the same complexity no matter the type.
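For contrast, pointwise function equality would look something like the following rule (my rendering, not from the post), and it is exactly the type-dependent complexity a decidable ≡ cannot afford:

```latex
\frac{\Gamma,\, x : B \vdash M\,x \equiv N\,x : C}
     {\Gamma \vdash M \equiv N : \Pi x : B.\, C}
```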
Now we can do lots of things with our theory. One thing we almost always want to do is now go back and build an operational semantics for our terms. This operational semantics should be some judgment M ↦ M' with the property that M ↦ N will imply that M ≡ N. This gives us some computational flavor in our type theory and lets us run the pieces of syntax we carved out with M : A.
But these terms that we’ve written down aren’t really programs. They’re just serializations of the collections of rules we’ve applied to prove a proposition. There’s no ingrained notion of “running” an M since it’s bolted on after the fact. What we have instead is this ≡ relation which just specifies which symbols we consider equivalent, but even it was defined arbitrarily. There’s no reason ≡ needs to be a reasonable term rewriting system or anything. If we’re good at our jobs it will be; sometimes (HoTT) it’s not completely clear what that computation system is even though we’re working to find it. So I’d describe a (good) formal type theory as an axiomatic system like any other that we can add a computational flavor to.

This leads to the first interpretation of the props-as-types correspondence. This states that the inductively defined judgments of a logic give rise to a type theory whose terms are proof terms for those same inductively defined judgments. It’s an identification of similar looking syntactic systems. It’s useful to be sure if you want to develop a formal type theory, but it gives us less insight into the computational nature of a logic, because we’ve reflected into a type theory which we have no reason to suspect has a reasonable computational characterization.
Now we can look at a second flavor of type theory. In this setting the way we order our system is very different. We start with a programming language: a collection of terms and an untyped evaluation relation between them. We don’t necessarily care about all of what’s in the language. As we define types later we’ll say things like “Well, the system has to include at least X”, but we don’t need to exhaustively specify all of the system. It follows that when defining the type theory we actually have no clue how things compute. They just compute somehow. We don’t really even want the system to be strongly normalizing; it’s perfectly valid to take the lambda calculus or Perl (PerlPRL!).
So we have some terms and ↦, and on top of this we start by defining a notion of equality between terms. This equality is purely computational and has no notion of types yet (unlike M ≡ N : A) because we have no types yet. This equality is sometimes denoted ~, and we usually define it as: M ~ N if and only if M ↦ O(Ms) if and only if N ↦ O(Ns), and if they terminate then Ms ~ Ns. By this I mean that two terms are the same if they compute in the same way, either by diverging or running to the same value built from ~-equal components. For more on this, you could read Howe’s paper.
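Spelled out a little more formally (my rendering, not a quote from Howe), that definition reads:

```latex
M \sim N \;\triangleq\;
  (M\Uparrow \wedge N\Uparrow)
  \;\vee\;
  \left(M \mapsto^{*} O(\vec{M}) \;\wedge\; N \mapsto^{*} O(\vec{N})
        \;\wedge\; \vec{M} \sim \vec{N}\right)
```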
So now we still have a type theory with no types. To fix this we go off and define inferences to answer three questions.

1. When are two types equal (A = B)?
2. When is a term a member of a type (a ∈ A)?
3. When are two terms equal members of a type (a = b ∈ A)?

The first question is usually answered in a boring way; for instance, we would say that Π x : A. B = Π x : A'. B' if we know that A = A' and B = B' under the assumption that we have some x ∈ A. We then specify two and three. There we just give the rules for demonstrating that some value, which is a program existing entirely independently of the type we’re building, is in the type. Continuing with functions, we might state that
e x ∈ B (x ∈ A)
———————————————————
e ∈ Π x : A. B
Here I’m using _ (_) as syntax for a hypothetical judgment: we have to know that e x ∈ B under the assumption that we know that x ∈ A. Next we have to decide what it means for two values to be equal as functions. We’re going to do this behaviourally, by specifying that they behave as equal programs when used as functions. Since we use functions by applying them, all we have to do is specify that they behave equally on application
v x = v' x ∈ B (x ∈ A)
————————————————————————
v = v' ∈ Π x : A. B
Equality is determined on a per-type basis. Furthermore, it’s allowed to use the equality of smaller types in its definition. This means that when defining equality for Π x : A. B we get to use the equalities for A and B! We make no attempt to maintain either decidability or uniform complexity in the collections of terms specified by _ = _ ∈ _ as we did with ≡. As another example, let’s have a look at the equality type.
A = A' a = a' ∈ A b = b' ∈ A
————————————————————————————————
I(a; b; A) = I(a'; b'; A')
a = b ∈ A
——————————————
⋆ ∈ I(a; b; A)
a = b ∈ A
——————————————————
⋆ = ⋆ ∈ I(a; b; A)
Things to notice here: first off, the various rules depend on the rules governing membership and equality in A, as we should expect. Secondly, ⋆ (the canonical occupant of I(...)) has no type information. There’s no way to reconstruct whatever reasoning went into proving a = b ∈ A because there’s no computational content in it. The thing on the left of the ∈ only describes the portions of our proof that involve computation, and equalities in computational type theory are always computationally trivial. Therefore, they get the same witness no matter the proof, no matter the types involved. Finally, the infamous equality reflection rule is really just the principle of inversion that we’re allowed to use in reasoning about hypothetical judgments.
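For reference, the equality reflection rule mentioned here can be rendered roughly as follows (my sketch, using the judgments of this section):

```latex
\frac{\Gamma \vdash \star \in I(a;\, b;\, A)}
     {\Gamma \vdash a = b \in A}
```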
This leads us to the second cast of props-as-types. This one states that constructive proof has computational character. Every proof that we write in a logic like this gives us back an (untyped) program which we can run as appropriate for the theorem we’ve proven. This is the idea behind Kleene’s realizability model. Similar to what we’d do with a logical relation, we define what each type means by defining the class of appropriate programs that fit its specification. For example, we defined functions to be the class of things that apply, and proofs of equality are ⋆ when the equality is true while there are no proofs when it’s false. Another way of phrasing this correspondence is types-as-specs. Types are used to identify a collection of terms that may be used in some particular way, instead of merely specifying the syntax of their terms. To read a bit more about this, see Stuart Allen and Bob Harper’s work; they do a good job of explaining how this plays out for type theory.
A lot of the ways we actually interact with type theories is not on the blackboard but through some proof assistant which mechanizes the tedious aspects of using a type theory. For formal type theory this is particularly natural. It’s decidable whether M : A holds, so the user just writes a term and says “Hey, this is a proof of A” and the computer can take care of all the work of checking it. This is the basic experience we get with Coq, Agda, Idris, and others. Even ≡ is handled without us thinking about it.
With computational type theory life is a little sadder. We can’t just write terms like we would for a formal type theory because M ∈ A isn’t decidable! We need to help guide the computer through the process of validating that our term is well typed. This is the price we pay for having an exceptionally rich notion of M = N ∈ A and M ∈ A; there isn’t a snowball’s chance in hell of it being decidable ^{1}. To make this work we switch gears and instead of trying to construct terms we start working with what’s called a program refinement logic, a PRL. A PRL is basically a sequent calculus with a central judgment of
H ≫ A ◁ e
This is going to be set up so that H ⊢ e ∈ A holds, but there’s a crucial difference. With ∈ everything was an input. To mechanize it we would write a function accepting a context and two terms and checking whether one is a member of the other. With H ≫ A ◁ e only H and A are inputs; e should be thought of as an output. What we’ll do with this judgment is work with a tactic language to construct a derivation of H ≫ A without even really thinking about that ◁ e, and the system will use our proof to construct the term for us. So in Agda, when I want to write a sorting function, what I might do is say:
I just give the definition and Agda is going to do the grunt work to make sure that I don’t apply a nat to a string or something equally nutty. In a system like (Jon|Nu|Meta|λ)prl what we do instead is define the type that our sorting function ought to have and use tactics to prove the existence of a realizer for it. By default we don’t really specify what exactly that realizer is. For example, if I was writing JonPRL maybe I’d say
|| Somehow this says a list of nats is a sorted version of another
Operator is-sorting : (0; 0).

Theorem sort : [(xs : List Nat) -> {ys : List Nat | is-sorting(ys; xs)}] {
  || Tactics go here.
}
I specify a sufficiently strong type so that if I can construct a realizer for it then I clearly have constructed a sorting algorithm. Of course we have tactics which let us say things like “I want to use this realizer”, and then we have to go off and show that the candidate realizer is a valid realizer. In that situation we’re actually acting as a type checker, constructing a derivation implying e ∈ A.
Well, that’s this summer in a nutshell. Before I finish I had one more perspective to offer. Computational type theory is not concerned with something being provable in an axiomatic system; rather, it’s about describing constructions. Brouwer’s core idea is that a proof is a mental construction, and computational type theory is a system for proving that a particular computable process actually builds the correct object. It’s a translation of Brouwer’s notion of proof into terms a computer scientist might be interested in.
To be clear, this is the chance of the snowball not melting, not the snowball’s chances of being able to decide whether or not M ∈ A holds. Though I suppose they’re roughly the same.↩
I was reading a recent proposal to merge types and kinds in Haskell to start the transition to dependently typed Haskell. One thing that caught my eye as I was reading it was that this proposal adds * :: *
to the type system. This is of some significance because it means that once this is fully realized, Haskell will be inconsistent (as a logic) in a new way! Of course, this isn’t a huge deal since Haskell is already woefully inconsistent with
unsafePerformIO
So it’s not like we’ll be entering new territory here. All that it means is that there’s a new way to inhabit every type in Haskell. If you were using Haskell as a proof assistant you were already in for a rude awakening I’m afraid :)
This is an issue of significance though for languages like Idris or Agda where such a thing would actually render proofs useless. Famously, Martin-Löf’s original type theory did have Type : Type
(or * :: *
in Haskell spelling) and Girard managed to derive a contradiction (Girard’s paradox). I’ve always been told that the particulars of this construction are a little bit complicated but to remember that Type : Type
is bad.
In this post I’d like to prove that Type : Type is a contradiction in JonPRL. This is a little interesting because in most proof assistants this would work in two steps

1. Add Type : Type to the theory
2. Use it to derive the contradiction

OK, to be fair, in something like Agda you could use the compiler hacking they’ve already done and just say {-# OPTIONS --set-in-set #-} or whatever the flag is. The spirit of the development is the same though.
In JonPRL, I’m just going to prove this as a regular implication. We have a proposition which internalizes membership, and I’ll demonstrate that not(member(U{i}; U{i})) is provable (U{i} is how we say Type in JonPRL). It’s the same logic as we had before.
Before we can really get to the proof we want to talk about, we should go through some of the more advanced features of JonPRL we need to use.
JonPRL is a little different than most proof assistants. For example, we can define a type of all closed terms in our language whose equality is purely computational. This type is base. To prove that =(a; b; base) holds you have to prove ceq(a; b), the finest grained equality in JonPRL. Two terms are ceq if they

- both diverge, or
- both run to values with the same outermost form and ceq components

What’s particularly exciting is that you can substitute any term for any other term ceq to it, no matter at what type it’s being used and under what hypotheses. In fact, the reduce tactic (which performs beta reductions) can conceptually be thought of as substituting a bunch of terms for their weak-head-normal forms, which are ceq to the original terms. The relevant literature behind this is found in Doug Howe’s “Equality in a Lazy Computation System”. There’s more in JonPRL in this regard; we also have the asymmetric version of ceq (called approx) but we won’t need it today.
Next, let’s talk about the image type. This is a type constructor with the following formation rule:
H ⊢ A : U{i} H ⊢ f : base
—————————————————————————————————
H ⊢ image(A; f) : U{i}
So here A is a type and f is anything. Things are going to be equal in image(A; f) if we can prove that they’re of the form f w and f w' where w = w' ∈ A. So image gives us the codomain (range) of a function. What’s pretty crazy about this is that it’s not just the range of some function A → B; we don’t really need a whole new type for that. It’s the range of literally any closed term we can apply. We can take the range of the Y combinator over pi types. We can take the range of lam(x. ⊥) over unit, anything we want!
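The membership/equality half of the image type can be sketched as a rule like this (my paraphrase of the description above, not JonPRL’s literal statement):

```latex
\frac{\Gamma \vdash w = w' \in A \qquad \Gamma \vdash f \in \mathsf{base}}
     {\Gamma \vdash f\,w = f\,w' \in \mathsf{image}(A;\, f)}
```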
This construct lets us define some really incredible things as a user of JonPRL. For example, the “squash” of a type is supposed to be a type which is occupied by <> (and only <>) if and only if there was an occupant of the original type. You can define these in HoTT with higher inductive types. Or, you can define these in this type theory as

Operator squash : (0).
[squash(A)] =def= [image(A; lam(x. <>))]

x ∈ squash(A) if and only if we can construct an a so that a ∈ A and lam(x. <>) a ~ x. Clearly x must be <>, and we can construct such an a if and only if A is nonempty.
We can also define the set-union of two types. Something is supposed to be in the set union if and only if it’s in one or the other. To define such a thing with an image type we have
Operator union : (0).
[union(A; B)] =def= [image((x : unit + unit) * decide(x; _.A; _.B); lam(x.snd(x)))]
This one is a bit more complicated. The domain of things we’re applying our function to this time is

(x : unit + unit) * decide(x; _.A; _.B)

This is a dependent pair, sometimes called a Σ type. The first component is a boolean; if it is true the second component is of type A, and otherwise it’s of type B. So for every term of type A or B, there’s a term of this Σ type. In fact, we can recover that original term of type A or B by just grabbing the second component of the term! We don’t have to worry about the type of such an operation because we’re not creating something with a function type, just something in base.
Unions let us define an absolutely critical admissible rule in our system. JonPRL has this propositional reflection of the equality judgment and membership, but in Martin-Löf’s type theory, membership is non-negatable. By this I mean that if we have some a so that a = a ∈ A doesn’t hold, we won’t be able to prove =(a; a; A) -> void. See, in order to prove such a thing we first have to prove that =(a; a; A) -> void is a type, which means proving that =(a; a; A) is a type.

In order to prove that =(a; b; A) is a proposition we have to prove =(a; a; A), =(b; b; A), and =(A; A; U{i}). The process of proving these will actually also show that the corresponding judgments, a ∈ A, b ∈ A, and A ∈ U{i}, hold.
However, in the case that a and b are the same term, this is just the same as proving =(a; b; A)! So =(a; a; A) is a proposition only if it’s true. However, we can add a rule that says that =(a; b; A) is a proposition if a = a ∈ (A ∪ base), and similarly for b! This fixes our negatability issue because we can just prove =(a; a; base), something that may be true even if a is not equal in A. Before, having a function take a member(...) was useless (member(a; A) is just thin sugar for =(a; a; A)!): member(a; A) is a proposition if and only if a = a ∈ A holds, in other words, it’s a proposition if and only if it’s true! With this new rule, we can prove member(a; A) is a proposition if A ∈ U{i} and a ∈ base, a much weaker set of conditions that are almost always true. We can apply this special rule in JonPRL with eq-eq-base instead of just eq-cd like the rest of our equality rules.
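As a rule, the liberalized well-formedness principle might be drawn like this (my sketch of the rule just described, not JonPRL’s literal statement):

```latex
\frac{\Gamma \vdash a = a \in A \cup \mathsf{base}
      \qquad \Gamma \vdash b = b \in A \cup \mathsf{base}
      \qquad \Gamma \vdash A \in U_i}
     {\Gamma \vdash {=}(a;\, b;\, A) \in U_i}
```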
Now let’s actually begin proving Russell’s paradox. To start with, some notation.

Infix 20 "∈" := member.
Infix 40 "~" := ceq.
Infix 60 "∪" := bunion.
Prefix 40 "¬" := not.

This lets us say a ∈ b instead of member(a; b). JonPRL recently grew this ability to add transparent notation to terms; it makes our theorems a lot prettier.
Next we define the central term to our proof:
Operator Russell : ().
[Russell] =def= [{x : U{i} | ¬ (x ∈ x)}]
Here we’ve defined Russell as shorthand for a subset type, in particular a subset of U{i} (the universe of types). x ∈ Russell if x ∈ U{i} and ¬ (x ∈ x). Now normally we won’t be able to prove that this is a type (specifically x ∈ x is going to be a problem), but in our case we’ll have some help from an assumption that U{i} ∈ U{i}.
Now we begin to define a small set of tactics that we’ll want. These tactics are really where the fiddly bits of using JonPRL’s tactic system come into play. If you’re just reading this for the intuition as to why Type ∈ Type is bad, just skip this. You’ll still understand the construction even if you don’t understand these bits of the proof.
First we have a tactic which finds an occurrence of H : A + B in the context and eliminates it. This gives us two goals, one with an A and one with a B. To do this we use match, which gives us something like match goal with in Coq.
Tactic break-plus {
  @{ [H : _ + _ |- _] => elim <H>; thin <H> }
}.
Note the syntax [H : ... |- ...] to match on a sequent. In particular here we just have _ + _ and _. Next we have a tactic bunion-eq-right. It’s to help us work with bunions (unions). Basically it turns =(M; N; bunion(A; B)) into

=(lam(x.snd(x)) <inr(<>), M>; lam(x.snd(x)) <inr(<>), N>; bunion(A; B))

This is actually helpful because it turns out that once we unfold bunion we have to prove that M and N are in an image type; remember that bunion is just a thin layer of sugar on top of image types. In order to prove something is in the image type it needs to be of the form f a, where f in our case is lam(x. snd(x)).
This is done with
Tactic bunion-eq-right {
  @{ [|- =(M; N; L ∪ R)] =>
       csubst [M ~ lam(x. snd(x)) <inr(<>), M>] [h.=(h;_;_)];
       aux { unfold <snd>; reduce; auto };
       csubst [N ~ lam(x. snd(x)) <inr(<>), N>] [h.=(_;h;_)];
       aux { unfold <snd>; reduce; auto };
  }
}.
The key here is csubst. It takes a ceq as its first argument and a “targeting”. It then tries to replace each occurrence of the left side of the equality with the right. To find each occurrence the targeting maps a variable to each occurrence. We’re allowed to use wildcards in the targeting as well. It also relegates actually proving the equality to a new subgoal. It’s easy enough to prove, so we demonstrate it with aux {unfold <snd>; reduce; auto}.

We only need to apply this tactic after eq-eq-base, which applies that rule I mentioned earlier about proving equalities to be well-formed in a much more liberal environment. Therefore we wrap those two tactics into one more convenient package.
Tactic eq-base-tac {
  @{ [|- =(=(M; N; A); =(M'; N'; A'); _)] =>
       eq-eq-base; auto;
       bunion-eq-right; unfold <bunion>
  }
}.
There is one last tactic in this series, this one to prove that member(X; X) ∈ U{i'} is well-formed (a type). It starts by unfolding member into =(=(X; X; X); =(X; X; X); U{i'}) and then applies the new tactic. Then we do other things. These things aren’t pretty. I suggest we just ignore them.
Tactic impredicativity-wf-tac {
  unfold <member>; eq-base-tac;
  eq-cd; ?{@{[|- =(_; _; base)] => auto}};
  eq-cd @i'; ?{break-plus}; reduce; auto
}.
Finally we have a tactic to prove that having both not(P) and P in the context proves void. This is another nice application of match.
Tactic contradiction {
  unfold <not implies>;
  @{ [H : P -> void, H' : P |- void] =>
       elim <H> [H'];
       unfold <member>;
       auto
  }
}.
We start by unfolding not and implies. This gives us P -> void and P. From there, we just apply one to the other giving us a void as we wanted.
We’re now ready to prove our theorem. We start with
Theorem type-not-in-type : [¬ (U{i} ∈ U{i})] {
}.
We now have the main subgoal
Remaining subgoals:
[main] ⊢ not(member(U{i}; U{i}))
We can start by unfolding not and implies. Remember that not isn’t a built in thing, it’s just sugar. By unfolding it we get the more primitive form, something we can actually apply the intro tactic to.
{
unfold <not implies>; intro
}
Once unfolded, we’d get a goal along the lines of member(U{i}; U{i}) -> void. We immediately apply intro to this though. Now we have two subgoals; one is the result of applying intro, namely a hypothesis x : member(U{i}; U{i}) and a goal void. The second subgoal is the “well-formedness” obligation.
We have to prove that member(U{i}; U{i}) is a type in order to apply the intro tactic. This is a crucial difference between Coq-like systems and these proof-refinement logics. The process of demonstrating that what you’re proving is a proposition is intermingled with actually constructing the proof. It means you get to apply all the normal mathematical tools you have for proving things to be true in order to prove that they’re types. This gives us a lot of flexibility, but at the cost of sometimes annoying subgoals. They’re annotated with [aux] (as opposed to [main]). This means we can target them all at once using the aux tactic.
To summarize that whole paragraph as JonPRL would say it, our proof state is
[main]
1. x : member(U{i}; U{i})
⊢ void
[aux] ⊢ member(member(U{i}; U{i}); U{i'})
Let’s get rid of that auxiliary subgoal using impredicativity-wf-tac; this subgoal is in fact exactly what it was made for.
{
  unfold <not implies>; intro;
  aux { impredicativity-wf-tac };
}
This picks off that [aux] goal, leaving us with just
[main]
1. x : member(U{i}; U{i})
⊢ void
Now we need to prove some lemmas. They state that Russell is actually a type. This is possible to do here and only here because we’ll need to actually use x in the process of proving this. It’s a very nice example of what explicitly proving well-formedness can give you! After all, the process of demonstrating that Russell is a type is nontrivial and only possible in this hypothetical context; rather than just hoping that JonPRL is clever enough to figure that out for itself, we get to demonstrate it locally.

We’re going to use the assert tactic to get these lemmas. This lets us state a term, prove it as a subgoal, and use it as a hypothesis in the main goal. If you’re logically minded, it’s cut.
{
  unfold <not implies>; intro;
  aux { impredicativity-wf-tac };
  assert [Russell ∈ U{i}] <russell-wf>;
}
The thing in angle brackets is the name it will get in our hypothetical context for the main goal. This leaves us with two subgoals, the aux one being the assertion and the main one being allowed to assume it.
[aux]
1. x : member(U{i}; U{i})
⊢ member(Russell; U{i})

[main]
1. x : member(U{i}; U{i})
2. russell-wf : member(Russell; U{i})
⊢ void
We can prove this by basically working our way towards using impredicativity-wf-tac. We’ll use aux again to target the aux subgoal. We’ll start by unfolding everything and applying eq-cd.
{
  unfold <not implies>; intro;
  aux { impredicativity-wf-tac };
  assert [Russell ∈ U{i}] <russell-wf>;
  aux {
    unfold <member Russell>; eq-cd; auto;
  };
}
Remember that Russell is {x : U{i} | ¬ (x ∈ x)}. We just applied eq-cd to a subset type (Russell), so we get two subgoals. One says that U{i} is a type; one says that if x ∈ U{i} then ¬ (x ∈ x) is also a type. In essence this just says that a subset type is a type if both components are types. The former goal is quite straightforward, so we apply auto and take care of it. Now we have one new subgoal to handle
[main]
1. x : =(U{i}; U{i}; U{i})
2. x' : U{i}
⊢ =(not(member(x'; x')); not(member(x'; x')); U{i})

[main]
1. x : member(U{i}; U{i})
2. russell-wf : member(Russell; U{i})
⊢ void
The second subgoal is just the rest of the proof, and the first subgoal is what we want to handle. It says that if we have a type x, then not(member(x; x)) is a type (albeit in ugly notation). To prove this we have to unfold not. So we’ll do this and apply eq-cd again.
{
  unfold <not implies>; intro;
  aux { impredicativity-wf-tac };
  assert [Russell ∈ U{i}] <russell-wf>;
  aux {
    unfold <member Russell>; eq-cd; auto;
    unfold <not implies>; eq-cd; auto;
  };
}
Remember that not(P) desugars to P -> void. Applying eq-cd is going to give us two subgoals: P is a type and void is a type. However, member(void; U{i}) is pretty easy to prove, so we apply auto again, which takes care of one of our two new goals. Now we just have
[main]
1. x : =(U{i}; U{i}; U{i})
2. x' : U{i}
⊢ =(member(x'; x'); member(x'; x'); U{i})
[main]
1. x : member(U{i}; U{i})
2. russellwf : member(Russell; U{i})
⊢ void
Now we’re getting to the root of the issue. We’re trying to prove that member(x'; x')
is a type. This is happily handled by impredicativitywftac
which will use our assumption that U{i} ∈ U{i}
because it’s smart like that.
{
unfold <not implies>; intro;
aux { impredicativitywftac };
assert [Russell ∈ U{i}] <russellwf>;
aux {
unfold <member Russell>; eqcd; auto;
unfold <not implies>; eqcd; auto;
impredicativitywftac
};
}
Now we just have that main goal with the assumption russellwf
added.
[main]
1. x : member(U{i}; U{i})
2. russellwf : member(Russell; U{i})
⊢ void
Now we have a similar wellformedness goal to assert and prove. We want to prove that ∈(Russell; Russell)
is a type. This is easier though; we can prove it easily using impredicativitywftac
.
{
unfold <not implies>; intro;
aux { impredicativitywftac };
assert [Russell ∈ U{i}] <russellwf>;
aux {
unfold <member Russell>; eqcd; auto;
unfold <not implies>; eqcd; auto;
impredicativitywftac
};
assert [(Russell ∈ Russell) ∈ U{i}] <russellinrussellwf>;
aux { impredicativitywftac; cum @i; auto };
}
That cum @i
is a quirk of impredicativitywftac
. It basically means that instead of proving =(...; ...; U{i'})
we can prove =(...; ...; U{i})
since U{i}
is a universe below U{i'}
and all universes are cumulative.
Our goal is now
[main]
1. x : member(U{i}; U{i})
2. russellwf : member(Russell; U{i})
3. russellinrussellwf : member(member(Russell; Russell); U{i})
⊢ void
OK, now that we have all these wellformedness lemmas the real reasoning can start. Our proof sketch is basically as follows:

1. Russell ∈ Russell is false. This is because if Russell were in Russell, then by the definition of Russell it isn’t in Russell.
2. If not(Russell ∈ Russell) holds, then Russell ∈ Russell holds.

Here’s the first assertion:
{
unfold <not implies>; intro;
aux { impredicativitywftac };
assert [Russell ∈ U{i}] <russellwf>;
aux {
unfold <member Russell>; eqcd; auto;
unfold <not implies>; eqcd; auto;
impredicativitywftac
};
assert [(Russell ∈ Russell) ∈ U{i}] <russellinrussellwf>;
aux { impredicativitywftac; cum @i; auto };
assert [¬ (Russell ∈ Russell)] <notrussellinrussell>;
}
Here are our subgoals:
[aux]
1. x : member(U{i}; U{i})
2. russellwf : member(Russell; U{i})
3. russellinrussellwf : member(member(Russell; Russell); U{i})
⊢ not(member(Russell; Russell))
[main]
1. x : member(U{i}; U{i})
2. russellwf : member(Russell; U{i})
3. russellinrussellwf : member(member(Russell; Russell); U{i})
4. russellnotinrussell : not(member(Russell; Russell))
⊢ void
We want to prove that first one. To start, let’s unfold that not
and move member(Russell; Russell)
to the hypothesis and use it to prove void
. We do this with intro
.
{
unfold <not implies>; intro;
aux { ... };
assert [Russell ∈ U{i}] <russellwf>;
aux { ... };
assert [(Russell ∈ Russell) ∈ U{i}] <russellinrussellwf>;
aux { ... };
assert [¬ (Russell ∈ Russell)] <notrussellinrussell>;
aux {
unfold <not implies>;
intro @i; aux {assumption};
}
}
Notice that the wellformedness goal that intro
generated is handled by our assumption! After all, it’s just member(Russell; Russell) ∈ U{i}
, we already proved it. Now our subgoals look like this
[main]
1. x : member(U{i}; U{i})
2. russellwf : member(Russell; U{i})
3. russellinrussellwf : member(member(Russell; Russell); U{i})
4. x' : member(Russell; Russell)
⊢ void
[main]
1. x : member(U{i}; U{i})
2. russellwf : member(Russell; U{i})
3. russellinrussellwf : member(member(Russell; Russell); U{i})
4. russellnotinrussell : not(member(Russell; Russell))
⊢ void
Here’s our clever plan:

1. Since Russell ∈ Russell holds, there’s an X : Russell so that ceq(Russell; X) holds
2. Since X : Russell, we can unfold it to say that X : {x : U{i} | ¬ (x ∈ x)}
3. We then invert on X and derive that ¬ (X ∈ X)
4. Combining this with ceq(Russell; X) gives ¬ (Russell ∈ Russell)
Let’s start explaining this to JonPRL by introducing that X
(here called R
). We’ll assert an R : Russell
such that R ~ Russell
. We do this using dependent pairs (here written (x : A) * B(x)
).
{
unfold <not implies>; intro;
aux { ... };
assert [Russell ∈ U{i}] <russellwf>;
aux { ... };
assert [(Russell ∈ Russell) ∈ U{i}] <russellinrussellwf>;
aux { ... };
assert [¬ (Russell ∈ Russell)] <notrussellinrussell>;
aux {
unfold <not implies>;
intro @i; aux {assumption};
assert [(R : Russell) * R ~ Russell] <Rwithprop>;
aux {
intro [Russell] @i; auto
};
}
}
We’ve proven this by intro
. For proving dependent products we provide an explicit witness for the first component. Basically to prove (x : A) * B(x)
we say intro [Foo]
. We then have a goal Foo ∈ A
and B(Foo)
. Since subgoals are fully independent of each other, we have to give the witness for the first component upfront. It’s a little awkward, Jon’s working on it :).
In this case we use intro [Russell]
. After this we have to prove that this witness has type Russell
and then prove the second component holds. Happily, auto
takes care of both of these obligations so intro [Russell] @i; auto
handles it all.
Now we promptly eliminate this pair. It gives us two new facts, that R : Russell
and R ~ Russell
hold.
{
unfold <not implies>; intro;
aux { ... };
assert [Russell ∈ U{i}] <russellwf>;
aux { ... };
assert [(Russell ∈ Russell) ∈ U{i}] <russellinrussellwf>;
aux { ... };
assert [¬ (Russell ∈ Russell)] <notrussellinrussell>;
aux {
unfold <not implies>;
intro @i; aux {assumption};
assert [(R : Russell) * R ~ Russell] <Rwithprop>;
aux {
intro [Russell] @i; auto
};
elim <Rwithprop>; thin <Rwithprop>
}
}
This leaves our goal as
[main]
1. x : member(U{i}; U{i})
2. russellwf : member(Russell; U{i})
3. russellinrussellwf : member(member(Russell; Russell); U{i})
4. x' : member(Russell; Russell)
5. s : Russell
6. t : ceq(s; Russell)
⊢ void
[main]
1. x : member(U{i}; U{i})
2. russellwf : member(Russell; U{i})
3. russellinrussellwf : member(member(Russell; Russell); U{i})
4. russellnotinrussell : not(member(Russell; Russell))
⊢ void
Now let’s invert on the hypothesis that s : Russell
; we want to use it to conclude that ¬ (s ∈ s)
holds since that will give us ¬ (R ∈ R)
.
{
unfold <not implies>; intro;
aux { ... };
assert [Russell ∈ U{i}] <russellwf>;
aux { ... };
assert [(Russell ∈ Russell) ∈ U{i}] <russellinrussellwf>;
aux { ... };
assert [¬ (Russell ∈ Russell)] <notrussellinrussell>;
aux {
unfold <not implies>;
intro @i; aux {assumption};
assert [(R : Russell) * R ~ Russell] <Rwithprop>;
aux {
intro [Russell] @i; auto
};
elim <Rwithprop>; thin <Rwithprop>;
unfold <Russell>; elim #5;
}
}
Now that we’ve unfolded all of those Russell
s our goal is a little bit harder to read; remember to mentally read {x : U{i} | not(member(x; x))}
as Russell
.
[main]
1. x : member(U{i}; U{i})
2. russellwf : member({x:U{i} | not(member(x; x))}; U{i})
3. russellinrussellwf : member(member({x:U{i} | not(member(x; x))}; {x:U{i} | not(member(x; x))}); U{i})
4. x' : member({x:U{i} | not(member(x; x))}; {x:U{i} | not(member(x; x))})
5. s : {x:U{i} | not(member(x; x))}
6. x'' : U{i}
7. [t'] : not(member(x''; x''))
8. t : ceq(x''; {x:U{i} | not(member(x; x))})
⊢ void
[main]
1. x : member(U{i}; U{i})
2. russellwf : member(Russell; U{i})
3. russellinrussellwf : member(member(Russell; Russell); U{i})
4. russellnotinrussell : not(member(Russell; Russell))
⊢ void
Now we use #7 to derive that not(member(Russell; Russell))
holds.
{
unfold <not implies>; intro;
aux { ... };
assert [Russell ∈ U{i}] <russellwf>;
aux { ... };
assert [(Russell ∈ Russell) ∈ U{i}] <russellinrussellwf>;
aux { ... };
assert [¬ (Russell ∈ Russell)] <notrussellinrussell>;
aux {
unfold <not implies>;
intro @i; aux {assumption};
assert [(R : Russell) * R ~ Russell] <Rwithprop>;
aux {
intro [Russell] @i; auto
};
elim <Rwithprop>; thin <Rwithprop>;
unfold <Russell>; elim #5;
assert [¬ member(Russell; Russell)];
aux {
unfold <Russell>;
};
}
}
This leaves us with 3 subgoals, the first one being the assertion.
[aux]
1. x : member(U{i}; U{i})
2. russellwf : member({x:U{i} | not(member(x; x))}; U{i})
3. russellinrussellwf : member(member({x:U{i} | not(member(x; x))}; {x:U{i} | not(member(x; x))}); U{i})
4. x' : member({x:U{i} | not(member(x; x))}; {x:U{i} | not(member(x; x))})
5. s : {x:U{i} | not(member(x; x))}
6. x'' : U{i}
7. [t'] : not(member(x''; x''))
8. t : ceq(x''; {x:U{i} | not(member(x; x))})
⊢ not(member({x:U{i} | not(member(x; x))}; {x:U{i} | not(member(x; x))}))
[main]
1. x : member(U{i}; U{i})
2. russellwf : member({x:U{i} | not(member(x; x))}; U{i})
3. russellinrussellwf : member(member({x:U{i} | not(member(x; x))}; {x:U{i} | not(member(x; x))}); U{i})
4. x' : member({x:U{i} | not(member(x; x))}; {x:U{i} | not(member(x; x))})
5. s : {x:U{i} | not(member(x; x))}
6. x'' : U{i}
7. [t'] : not(member(x''; x''))
8. t : ceq(x''; {x:U{i} | not(member(x; x))})
9. H : not(member(Russell; Russell))
⊢ void
[main]
1. x : member(U{i}; U{i})
2. russellwf : member(Russell; U{i})
3. russellinrussellwf : member(member(Russell; Russell); U{i})
4. russellnotinrussell : not(member(Russell; Russell))
⊢ void
Now to prove this, what we need to do is substitute the unfolded Russell
for x''
; from there it’s immediate by assumption. We perform the substitution with chypsubst
. This takes a direction in which to substitute, which hypothesis to use, and another target telling us where to apply the substitution.
{
unfold <not implies>; intro;
aux { ... };
assert [Russell ∈ U{i}] <russellwf>;
aux { ... };
assert [(Russell ∈ Russell) ∈ U{i}] <russellinrussellwf>;
aux { ... };
assert [¬ (Russell ∈ Russell)] <notrussellinrussell>;
aux {
unfold <not implies>;
intro @i; aux {assumption};
assert [(R : Russell) * R ~ Russell] <Rwithprop>;
aux {
intro [Russell] @i; auto
};
elim <Rwithprop>; thin <Rwithprop>;
unfold <Russell>; elim #5;
assert [¬ member(Russell; Russell)];
aux {
unfold <Russell>;
chypsubst ← #8 [h. ¬ (h ∈ h)];
};
}
}
This leaves us with a much more tractable goal.
[main]
1. x : member(U{i}; U{i})
2. russellwf : member({x:U{i} | not(member(x; x))}; U{i})
3. russellinrussellwf : member(member({x:U{i} | not(member(x; x))}; {x:U{i} | not(member(x; x))}); U{i})
4. x' : member({x:U{i} | not(member(x; x))}; {x:U{i} | not(member(x; x))})
5. s : {x:U{i} | not(member(x; x))}
6. x'' : U{i}
7. [t'] : not(member(x''; x''))
8. t : ceq(x''; {x:U{i} | not(member(x; x))})
⊢ not(member(x''; x''))
[main]
1. x : member(U{i}; U{i})
2. russellwf : member({x:U{i} | not(member(x; x))}; U{i})
3. russellinrussellwf : member(member({x:U{i} | not(member(x; x))}; {x:U{i} | not(member(x; x))}); U{i})
4. x' : member({x:U{i} | not(member(x; x))}; {x:U{i} | not(member(x; x))})
5. s : {x:U{i} | not(member(x; x))}
6. x'' : U{i}
7. [t'] : not(member(x''; x''))
8. t : ceq(x''; {x:U{i} | not(member(x; x))})
9. H : not(member(Russell; Russell))
⊢ void
[main]
1. x : member(U{i}; U{i})
2. russellwf : member(Russell; U{i})
3. russellinrussellwf : member(member(Russell; Russell); U{i})
4. russellnotinrussell : not(member(Russell; Russell))
⊢ void
We’d like to just apply assumption
but it’s not immediately applicable due to some technical details (basically we can only apply an assumption in a proof irrelevant context, but we have to unfold Russell
and introduce it to demonstrate that it’s irrelevant). So just read what’s left as a (very) convoluted assumption
.
{
unfold <not implies>; intro;
aux { ... };
assert [Russell ∈ U{i}] <russellwf>;
aux { ... };
assert [(Russell ∈ Russell) ∈ U{i}] <russellinrussellwf>;
aux { ... };
assert [¬ (Russell ∈ Russell)] <notrussellinrussell>;
aux {
unfold <not implies>;
intro @i; aux {assumption};
assert [(R : Russell) * R ~ Russell] <Rwithprop>;
aux {
intro [Russell] @i; auto
};
elim <Rwithprop>; thin <Rwithprop>;
unfold <Russell>; elim #5;
assert [¬ member(Russell; Russell)];
aux {
unfold <Russell>;
chypsubst ← #8 [h. ¬ (h ∈ h)];
unfold <not implies>;
intro; aux { impredicativitywftac };
contradiction
};
}
}
Now we’re almost through this assertion, our subgoals look like this (pay attention to 9 and 4)
[main]
1. x : member(U{i}; U{i})
2. russellwf : member({x:U{i} | not(member(x; x))}; U{i})
3. russellinrussellwf : member(member({x:U{i} | not(member(x; x))}; {x:U{i} | not(member(x; x))}); U{i})
4. x' : member({x:U{i} | not(member(x; x))}; {x:U{i} | not(member(x; x))})
5. s : {x:U{i} | not(member(x; x))}
6. x'' : U{i}
7. [t'] : not(member(x''; x''))
8. t : ceq(x''; {x:U{i} | not(member(x; x))})
9. H : not(member(Russell; Russell))
⊢ void
[main]
1. x : member(U{i}; U{i})
2. russellwf : member(Russell; U{i})
3. russellinrussellwf : member(member(Russell; Russell); U{i})
4. russellnotinrussell : not(member(Russell; Russell))
⊢ void
Once we unfold that Russell
we have an immediate contradiction so unfold <Russell>; contradiction
solves it.
{
unfold <not implies>; intro;
aux { ... };
assert [Russell ∈ U{i}] <russellwf>;
aux { ... };
assert [(Russell ∈ Russell) ∈ U{i}] <russellinrussellwf>;
aux { ... };
assert [¬ (Russell ∈ Russell)] <notrussellinrussell>;
aux {
unfold <not implies>;
intro @i; aux {assumption};
assert [(R : Russell) * R ~ Russell] <Rwithprop>;
aux {
intro [Russell] @i; auto
};
elim <Rwithprop>; thin <Rwithprop>;
unfold <Russell>; elim #5;
assert [¬ member(Russell; Russell)];
aux {
unfold <Russell>;
chypsubst ← #8 [h. ¬ (h ∈ h)];
unfold <not implies>;
intro; aux { impredicativitywftac };
contradiction
};
unfold <Russell>; contradiction
}
}
This takes care of this subgoal, so now we’re back on the main goal. This time though we have an extra hypothesis which will provide the leverage we need to prove our next assertion.
[main]
1. x : member(U{i}; U{i})
2. russellwf : member(Russell; U{i})
3. russellinrussellwf : member(member(Russell; Russell); U{i})
4. russellnotinrussell : not(member(Russell; Russell))
⊢ void
Now we’re going to claim that Russell
is in fact a member of Russell
. This will follow from the fact that we’ve proved already that Russell
isn’t in Russell
(yeah, it seems pretty paradoxical already).
{
unfold <not implies>; intro;
aux { ... };
assert [Russell ∈ U{i}] <russellwf>;
aux { ... };
assert [(Russell ∈ Russell) ∈ U{i}] <russellinrussellwf>;
aux { ... };
assert [¬ (Russell ∈ Russell)] <notrussellinrussell>;
aux { ... };
assert [Russell ∈ Russell];
}
Giving us
[aux]
1. x : member(U{i}; U{i})
2. russellwf : member(Russell; U{i})
3. russellinrussellwf : member(member(Russell; Russell); U{i})
4. russellnotinrussell : not(member(Russell; Russell))
⊢ member(Russell; Russell)
[main]
1. x : member(U{i}; U{i})
2. russellwf : member(Russell; U{i})
3. russellinrussellwf : member(member(Russell; Russell); U{i})
4. russellnotinrussell : not(member(Russell; Russell))
5. H : member(Russell; Russell)
⊢ void
Proving this is pretty straightforward, we only have to demonstrate that not(Russell ∈ Russell)
and Russell ∈ U{i}
, both of which we have as assumptions. The rest of the proof is just more wellformedness goals.
First we unfold everything and apply eqcd
. This gives us 3 subgoals, the first two are Russell ∈ U{i}
and ¬(Russell ∈ Russell)
. Since we have these as assumptions we’ll use main {assumption}
. That will target both these goals and prove them immediately. Here by using main
we avoid applying this to the wellformedness goal, which in this case isn’t among our assumptions.
{
unfold <not implies>; intro;
aux { ... };
assert [Russell ∈ U{i}] <russellwf>;
aux { ... };
assert [(Russell ∈ Russell) ∈ U{i}] <russellinrussellwf>;
aux { ... };
assert [¬ (Russell ∈ Russell)] <notrussellinrussell>;
aux { ... };
assert [Russell ∈ Russell];
aux {
unfold <member Russell>; eqcd;
unfold <member>;
main { assumption };
};
}
This just leaves us with one awful wellformedness goal requiring us to prove that not(=(x; x; x))
is a type if x
is a type. We actually proved something similar back when we proved that Russell
was wellformed. The proof is the same as then, just unfold, eqcd
and impredicativitywftac
. We use ?{!{auto}}
to only apply auto
in a subgoal where it immediately proves it. Here ?{}
says “run this or do nothing” and !{}
says “run this, if it succeeds stop, if it does anything else, fail”. This is not an interesting portion of the proof, don’t burn too many cycles trying to figure this out.
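These two combinators have analogues in most LCF-style tactic libraries, and it can help to see them spelled out. Here is a toy Python rendering (all names invented for illustration, not JonPRL’s actual implementation) in which a tactic maps a goal to a list of remaining subgoals or fails:

```python
# Toy LCF-style tactic combinators mirroring JonPRL's ?{...} and !{...}.
class TacticFailure(Exception):
    pass

def try_(t):
    """?{t}: run t; if it fails, leave the goal untouched."""
    def run(goal):
        try:
            return t(goal)
        except TacticFailure:
            return [goal]
    return run

def complete(t):
    """!{t}: run t; succeed only if it leaves no subgoals behind."""
    def run(goal):
        subgoals = t(goal)
        if subgoals:
            raise TacticFailure("tactic made progress but did not finish")
        return []
    return run

def solve_trivial(goal):
    """A toy 'auto': closes exactly the goal 'trivial'."""
    if goal != "trivial":
        raise TacticFailure("cannot solve " + goal)
    return []

maybe_auto = try_(complete(solve_trivial))   # behaves like ?{!{auto}}
print(maybe_auto("trivial"))   # []       (goal solved)
print(maybe_auto("hard"))      # ['hard'] (goal left alone)
```

So `?{!{t}}` either finishes a goal outright or leaves it exactly as it was, which is why it’s safe to sprinkle into a script.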
{
unfold <not implies>; intro;
aux { ... };
assert [Russell ∈ U{i}] <russellwf>;
aux { ... };
assert [(Russell ∈ Russell) ∈ U{i}] <russellinrussellwf>;
aux { ... };
assert [¬ (Russell ∈ Russell)] <notrussellinrussell>;
aux { ... };
assert [Russell ∈ Russell] <russellinrussell>;
aux {
unfold <member Russell>; eqcd;
unfold <member>;
main { assumption };
unfold <not implies>; eqcd; ?{!{auto}};
impredicativitywftac;
};
}
Now we just have the final subgoal to prove. We’re actually in a position to do so now.
[main]
1. x : member(U{i}; U{i})
2. russellwf : member(Russell; U{i})
3. russellinrussellwf : member(member(Russell; Russell); U{i})
4. russellnotinrussell : not(member(Russell; Russell))
5. russellinrussell : member(Russell; Russell)
⊢ void
Now that we’ve shown P
and not(P)
hold at the same time all we need to do is apply contradiction
and we’re done.
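Stepping back from the tactic scripts, the logical skeleton of the whole proof fits in a few lines (writing R for Russell and U_i for U{i}):

```latex
\begin{aligned}
R \;&=\; \{\, x : U_i \mid \neg (x \in x) \,\}\\[4pt]
\textbf{1. } & \neg(R \in R)\text{: assume } R \in R;\
  \text{unfolding the subset type gives } \neg(R \in R),\ \text{a contradiction.}\\
\textbf{2. } & R \in R\text{: we have } R \in U_i \text{ and, by step 1, } \neg(R \in R),\
  \text{so } R \text{ satisfies its own defining property.}\\
\textbf{3. } & \text{Steps 1 and 2 together inhabit } \mathsf{void}.
\end{aligned}
```

Everything else in the script is the wellformedness bookkeeping needed to make those three steps precise.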
Theorem typenotintype [¬ (U{i} ∈ U{i})] {
unfold <not implies>; intro;
aux { ... };
assert [Russell ∈ U{i}] <russellwf>;
aux { ... };
assert [(Russell ∈ Russell) ∈ U{i}] <russellinrussellwf>;
aux { ... };
assert [¬ (Russell ∈ Russell)] <notrussellinrussell>;
aux { ... };
assert [Russell ∈ Russell] <russellinrussell>;
aux { ... };
contradiction
}.
And there you have it, a complete proof of Russell’s paradox fully formalized in JonPRL! We actually proved a slightly stronger result than just that the type of types cannot be in itself, we proved that at any point in the hierarchy of universes (the first of which is Type
/*
/whatever) if you tie it off, you’ll get a contradiction.
I hope you found this proof interesting. Even if you’re not at all interested in JonPRL, it’s nice to see that allowing one to have U{i} ∈ U{i}
or * :: *
gives you the ability to have a type like Russell
and with it, inhabit void
. I also find it especially pleasing that we can prove something like this in JonPRL; it’s growing up so fast.
Thanks to Jon for greatly improving the original proof we had.
I wanted to write about something related to all the stuff I’ve been reading for research lately. I decided to talk about a super cool trick in a field called domain theory. It’s a method of generating a solution to a large class of recursive equations.
In order to go through this idea we’ve got some background to cover. I wanted to make this post readable even if you haven’t read too much domain theory (you do need to know what a functor/colimit is, but nothing crazy). We’ll start with a whirlwind tutorial of the math behind domain theory. From there we’ll transform the problem of finding a solution to an equation into something categorically tractable. Finally, I’ll walk through the construction of a solution.
I decided not to show an example of applying this technique to model a language because that would warrant its own post, hopefully I’ll write about that soon :)
The basic idea with domain theory comes from a simple problem. Suppose we want to model the lambda calculus. We want a collection of mathematical objects D
so that we can treat each element of D
as a function D → D
and each function D → D
as an element of D
. To see why this is natural, remember that we want to turn each program E
into d ∈ D
. If E = λ x. E'
then we need to turn the function e ↦ [e/x]E'
into a term. This means D → D
needs to be embeddable in D
. On the other hand, we might have E = E' E''
in which case we need to turn E'
into a function D → D
so that we can apply it. This means we need to be able to embed D
into D → D
.
After this we can turn a lambda calculus program into a specific element of D
and reason about its properties using the ambient mathematical tools for D
. This is semantics, understanding programs by studying their meaning in some mathematical structure. In our specific case that structure is D
with the isomorphism D ≅ D → D
. However, there’s an issue! We know that D
can’t just be a set because then there cannot be such an isomorphism! In the case where D ≅ N
, then D → D ≅ R
and there’s a nice proof by diagonalization that such an isomorphism cannot exist.
So what can we do? We know there are only countably many programs, but we’re trying to state that there exists an isomorphism between our programs (countable) and functions on them (uncountable). Well the issue is that we don’t really mean all functions on D
, just the ones we can model as lambda terms. For example, the function which maps all divergent programs to 1
and all terminating ones to 0
need not be considered because there’s no lambda term for it! How do we consider “computable” functions though? It’s not obvious since we define computable functions using the lambda calculus, what we’re trying to model here. Let’s set aside this question for a moment.
Another question is how do we handle this program: (λ x. x x) (λ x. x x)
? It doesn’t have a value after all! It doesn’t behave like a normal mathematical function because applying it to something doesn’t give us back a new term, it just runs forever! To handle this we do something really clever. We stop considering just a collection of terms and instead look at terms with an ordering relation ⊑
! The idea is that ⊑ represents definedness. A program which runs to a value is more defined than a program which just loops forever. Similarly, if two functions behave the same on all inputs except for 0
, where one of them loops, we could say the other is more defined. What we’ll do is define ⊑ abstractly and then model programs into sets with such a relation defined upon them. In order to build up this theory we need a few definitions
A partially ordered set (poset) is a set A
and a binary relation ⊑
where
1. a ⊑ a
2. a ⊑ b and b ⊑ c implies a ⊑ c
3. a ⊑ b and b ⊑ a implies a = b
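To make the definition concrete, here is a small Python sketch (the names are mine, not from any library) that mechanically checks the three axioms for an explicitly given order on a finite set:

```python
# Check the three poset axioms for an explicit order on a finite set.
def is_poset(elems, leq):
    reflexive = all(leq(a, a) for a in elems)
    transitive = all(leq(a, c)
                     for a in elems for b in elems for c in elems
                     if leq(a, b) and leq(b, c))
    antisymmetric = all(a == b
                        for a in elems for b in elems
                        if leq(a, b) and leq(b, a))
    return reflexive and transitive and antisymmetric

divides = lambda a, b: b % a == 0
print(is_poset([1, 2, 3, 6], divides))           # True: divisibility orders it
print(is_poset([1, 2, 3], lambda a, b: a != b))  # False: not reflexive
```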
We often just denote the pair <A, ⊑>
as A
when the ordering is clear. With a poset A
, of particular interest are chains in it. A chain is a collection of elements aᵢ
so that aᵢ ⊑ aⱼ
if i ≤ j
. For example, in the partial order of natural numbers and ≤
, a chain is just a run of ascending numbers. Another fundamental concept is called a least upper bound (lub). A lub of a subset P ⊆ A
is an element x ∈ A
so that y ∈ P
implies y ⊑ x
and if this property holds for some z
also in A
, then x ⊑ z
. So a least upper bound is just the smallest thing bigger than the subset. This isn’t always guaranteed to exist, for example, in our poset of natural numbers N
, the subset N
has no upper bounds at all! When such a lub does exist, we denote it with ⊔P
. Some partial orders have an interesting property: all chains in them have least upper bounds. We call such posets complete partial orders, or cpos.
For example while N
isn’t a cpo, ω
(the natural numbers + an element greater than all of them) is! As a quick puzzle, can you show that all finite partial orders are in fact CPOs?
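As a hint for the puzzle, the key fact can even be run as code: every chain in a finite poset is finite and totally ordered, so its greatest element exists and is exactly its lub. A hypothetical sketch:

```python
# In a finite poset every chain is finite and totally ordered, so its
# greatest element is its least upper bound.
def lub_of_finite_chain(chain, leq):
    top = chain[0]
    for x in chain[1:]:
        if leq(top, x):   # chain elements are comparable, so x is
            top = x       # either above or below the current top
    return top

divides = lambda a, b: b % a == 0
print(lub_of_finite_chain([1, 2, 6], divides))   # 6
print(lub_of_finite_chain([1, 3], divides))      # 3
```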
We can define a number of basic constructions on cpos. The most common is the “lifting” operation which takes a cpo D
and returns D⊥
, a cpo with a least element ⊥
. A cpo with such a least element is called “pointed” and I’ll write that as cppo (complete pointed partial order). Another common example, given two cppos, D
and E
, we can construct D ⊗ E
. An element of this cppo is either ⊥
or <l, r>
where l ∈ D − {⊥}
and r ∈ E − {⊥}
. This is called the smash product because it “smashes” the ⊥s out of the components. Similarly, there’s smash sums D ⊕ E
.
The next question is the classic algebraic question to ask about a structure: what are the interesting functions on it? We’ll in particular be interested in functions which preserve the ⊑ relation and the taking of lub’s on chains. For this we have two more definitions:
1. A function f is monotone if x ⊑ y implies f(x) ⊑ f(y)
2. A function f is continuous if it is monotone and for every chain C, ⊔ f(C) = f(⊔ C)

Notably, the collection of cppos and continuous functions form a category! This is because clearly x ↦ x
is continuous and the composition of two continuous functions is continuous. This category is called Cpo
. It’s here that we’re going to do most of our interesting constructions.
Finally, we have to discuss one important construction on Cpo
: D → E
. This is the set of continuous functions from D
to E
. The ordering on this is pointwise, meaning that f ⊑ g
if for all x ∈ D
, f(x) ⊑ g(x)
. This is a cppo where ⊥
is x ↦ ⊥
and all the lubs are determined pointwise.
This gives us most of the mathematics we need to do the constructions we’re going to want, to demonstrate something cool here’s a fun theorem which turns out to be incredibly useful: Any continuous function f : D → D
on a cppo D
has a least fixed point.
To construct this least fixed point we need to find an x
so that x = f(x)
. To do this, note first that ⊥ ⊑ f(⊥)
by the definition of ⊥ and by the monotonicity of f
: f(x) ⊑ f(y)
if x ⊑ y
. This means that the collection of elements fⁱ(⊥)
forms a chain with the ith element being the ith iteration of f
! Since D
is a cppo, this chain has an upper bound: ⊔ fⁱ(⊥)
. Moreover, f(⊔ fⁱ(⊥)) = ⊔ f(fⁱ(⊥))
by the continuity of f
, but ⊔ fⁱ(⊥) = ⊥ ⊔ (⊔ f(fⁱ(⊥))) = ⊔ f(fⁱ(⊥))
so this is a fixed point! The proof that it’s a least fixed point is elided because typesetting in markdown is a bit of a bother.
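When the cppo happens to be finite we can actually run this construction: iterate f from ⊥ and the chain ⊥ ⊑ f(⊥) ⊑ f²(⊥) ⊑ … must stabilize, and where it stabilizes is ⊔ fⁱ(⊥). The sketch below (the encoding is my own invention) uses tuples-with-None as a finite cppo of partial maps:

```python
# Kleene iteration: the least fixed point of a monotone f on a finite
# cppo, computed by iterating from bottom until the chain stabilizes.
def lfp(f, bottom):
    x = bottom
    while f(x) != x:
        x = f(x)
    return x

# Example cppo: partial maps {0..4} -> bool encoded as 5-tuples, with
# None standing in for "undefined"; bottom is the all-None tuple.  The
# functional defines "is n even?" by recursion on n - 2, so each
# iteration yields a strictly more defined partial map.
def even_step(g):
    def at(n):
        if n == 0:
            return True
        if n == 1:
            return False
        return g[n - 2]        # may still be None, i.e. undefined
    return tuple(at(n) for n in range(5))

bottom = (None,) * 5
print(lfp(even_step, bottom))  # (True, False, True, False, True)
```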
So there you have it, very, very basic domain theory. I can now answer the question we weren’t sure about before, the slogan is “computable functions are continuous functions”.
Cpo
So now we can get to the result showing domain theory incredibly useful. Remember our problem before? We wanted to find a collection D
so that
D ≅ D → D
However it wasn’t clear how to do this due to size issues. In Cpo
however, we can absolutely solve this. This huge result was due to Dana Scott. First, we make a small transformation to the problem that’s very common in these scenarios. Instead of trying to solve this equation (something we don’t have very many tools for) we’re going to instead look for the fixpoint of this functor
F(X) = X → X
The idea here is that we’re going to prove that all well behaved endofunctors on Cpo have fixpoints. By using this viewpoint we get all the powerful tools we normally have for reasoning about functors in category theory. However, there’s a problem: the above isn’t a functor! It has both positive and negative occurrences of X
so it’s neither a covariant nor a contravariant functor. To handle this we apply another clever trick. Let’s not look at endofunctors, but rather functors Cpoᵒᵖ × Cpo → Cpo
(I believe this should be attributed to Freyd). This is a binary functor which is covariant in the second argument and contravariant in the first. We’ll use the first argument everywhere there’s a negative occurrence of X
and the second for every positive occurrence. Take note: we need things to be contravariant in the first argument because we’re using that first argument negatively: if we didn’t do that we wouldn’t have a functor.
Now we have
F(X⁻, X⁺) = X⁻ → X⁺
This is functorial. We can also always recover the original map simply by diagonalizing: F(X) = F(X, X)
. We’ll now look for an object D
so that F(D, D) ≅ D
. Not quite a fixed point, but still equivalent to the equation we were looking at earlier.
Furthermore, we need one last critical property, we want F
to be locally continuous. This means that the maps on morphisms determined by F
should be continuous so F(⊔ P, g) = ⊔ F(P, g)
and viceversa (here P
is a set of functions). Note that such morphisms have an ordering because they belong to the pointwise ordered cppo we talked about earlier.
We have one final thing to set up before this proof: what if there are multiple non-isomorphic solutions for F
? We want a further coherence condition that picks out a canonical one.
What we want is called minimal invariance. Suppose we have a D
and an i : D ≅ F(D, D)
. This is the minimal invariant solution if and only if the least fixed point of f(e) = i⁻ ∘ F(e, e) ∘ i
is id
. In other words, we want it to be the case that
d = ⊔ₓ fˣ(⊥)(d) (d ∈ D)
I mentally picture this as saying that the isomorphism is set up so that for any particular d
we choose, if we apply i
, fmap
over it, apply i
again, repeat and repeat, eventually this process will halt and we’ll run out of things to fmap
over. It’s a sort of a statement that each d ∈ D
is “finite” in a very, very handwavy sense. Don’t worry if that didn’t make much sense, it’s helpful to me but it’s just my intuition. This property has some interesting effects though: it means that if we find such a D
then (D, D)
is going to be both the initial algebra and final coalgebra of F
.
Without further ado, let’s prove that every locally continuous functor F
has such a minimal invariant. We start by defining the following
D₀ = {⊥}
Dᵢ = F(Dᵢ₋₁, Dᵢ₋₁)
This gives us a chain of cppos that gradually get larger. How do we show that they’re getting larger? By defining a section from Dᵢ
to Dⱼ
where j = i + 1
. A section is a function f
which is paired with a (unique) function f⁰
so that f⁰f = id
and ff⁰ ⊑ id
. In other words, f
embeds its domain into the codomain and f⁰
tells us how to get it out. Putting something in and taking it out is a round trip. Since the codomain may be bigger though taking something out and putting it back only approximates a round trip. Our sections are defined thusly
s₀ = x ↦ ⊥ r₀ = x ↦ ⊥
sᵢ = F(rᵢ₋₁, sᵢ₋₁) rᵢ = F(sᵢ₋₁, rᵢ₋₁)
It would be very instructive to work out that these definitions are actually sections and retractions. Since typesetting these subscripts is a little rough, if it’s clear from context I’ll just write r
and s
. Now we’ve got this increasing chain, we define an interesting object
D = {x ∈ Πᵢ Dᵢ | x.(i−1) = r(x.i)}
In other words, D
is the collection of infinite tuples. Each component is from one of those Dᵢ
s above, and they cohere with each other, so using s
and r
to step up and down the chain takes you from one component to the next.
to a D
: upᵢ : Dᵢ → D
where
upᵢ(x).j = x     if i = j
         = rᵈ(x) if i − j = d > 0
         = sᵈ(x) if j − i = d > 0
Interestingly, note that πᵢ ∘ upᵢ = id
(easy proof) and that upᵢ ∘ πᵢ ⊑ id
(slightly harder proof). This means that we’ve got more sections lying around: every Dᵢ
can be fed into D
. Consider the following diagram
s s s
D0 ——> D1 ——> D2 ——> ...
I claim that D
is the colimit to this diagram where the collection of arrows mapping into it are given with upᵢ
. Seeing this is a colimit follows from the fact that πᵢ ∘ upᵢ
is just id
. Specifically, suppose we have some object C
and a family of morphisms cᵢ : Dᵢ → C
which commute properly with s
. We need to find a unique morphism h
so that cᵢ = h ∘ upᵢ
. Define h
as ⊔ᵢ cᵢπᵢ
. Then
h ∘ upᵢ = (⊔_{j<i} cᵢsʲrʲ) ⊔ cᵢ ⊔ (⊔_{j>i} cᵢrʲsʲ) = (⊔_{j<i} cᵢsʲrʲ) ⊔ cᵢ
The last step follows from the fact that rʲsʲ = id
. Furthermore, sʲrʲ ⊑ id
so cᵢsʲrʲ ⊑ cᵢ
so that whole massive term just evaluates to cᵢ
as required. So we have a colimit. Notice that if we apply F
to each Dᵢ
in the diagram we end up with a new diagram.
s s s
D1 ——> D2 ——> D3 ——> ...
D
is still the colimit (all we’ve done is shift the diagram over by one) but by identical reasoning to D
being a colimit, so is F(D, D)
. This means we have a unique isomorphism i : D ≅ F(D, D)
. The fact that i
is the minimal invariant follows from the properties we get from the fact that i
comes from a colimit.
With this construction we can construct our model of the lambda calculus simply by finding the minimal invariant of the locally continuous functor F(D⁻, D⁺) = D⁻ → D⁺
(it’s worth proving it’s locally continuous). Our denotation is defined as [e]ρ ∈ D
where e
is a lambda term and ρ
is a map of the free variables of e
to other elements of D
. This is inductively defined as
[λx. e]ρ = i⁻(d ↦ [e]ρ[x ↦ d])
[e e']ρ = i([e]ρ)([e']ρ)
[x]ρ = ρ(x)
Notice here that for the two main constructions we just use i
and i⁻
to fold and unfold the denotations to treat them as functions. We could go on to prove that this denotation is sound and complete but that’s something for another post.
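To see what these equations are doing operationally, here is a hypothetical Python transcription of the denotation function, with the caveat that it collapses i and i⁻ to the identity, since Python’s own functions stand in for the semantic domain:

```python
# The three semantic equations as a tiny interpreter.  Terms are tuples
# ('var', x), ('lam', x, e), ('app', e1, e2); environments ρ are dicts.
def denote(term, rho):
    tag = term[0]
    if tag == 'var':                    # [x]ρ = ρ(x)
        return rho[term[1]]
    if tag == 'lam':                    # [λx. e]ρ = i⁻(d ↦ [e]ρ[x ↦ d])
        _, x, body = term
        return lambda d: denote(body, {**rho, x: d})
    if tag == 'app':                    # [e e']ρ = i([e]ρ)([e']ρ)
        _, fun, arg = term
        return denote(fun, rho)(denote(arg, rho))
    raise ValueError("unknown term: %r" % (term,))

# (λx. x) n in an environment where n ↦ 42:
term = ('app', ('lam', 'x', ('var', 'x')), ('var', 'n'))
print(denote(term, {'n': 42}))          # 42
```

The lam and app cases are exactly where the real model uses i⁻ and i to move between D and D → D.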
That’s the main result I wanted to demonstrate. With this single proof we can actually model a very large class of programming languages into Cpo
. Hopefully I’ll get around to showing how we can pull a similar trick with a relational structure on Cpo
in order to prove full abstraction. This is nicely explained in Andrew Pitts's "Relational Properties of Domains".
If you’re interested in domain theory I learned from Gunter’s “Semantics of Programming Languages” book and recommend it.
I've been trying to write a blog post to this effect for a while now; hopefully this one will stick. I intend for this to be a bit more open-ended than most of my other posts, so if you're interested in seeing the updated version look here. Pull requests/issues are more than welcome on the repository. I hope you learn something from this.
Lots of people seem curious about type theory but it’s not at all clear how to go from no math background to understanding “Homotopical Patch Theory” or whatever the latest cool paper is. In this repository I’ve gathered links to some of the resources I’ve personally found helpful.
I strongly urge you to start by reading one or more of the textbooks immediately below. They give a nice self-contained introduction and a foundation for understanding the papers that follow. Don't get hung up on any particular thing; it's always easier to skim the first time and read closely on a second pass.
Practical Foundations of Programming Languages (PFPL)
I reference this more than any other book. It's a very wide-ranging survey of programming languages that assumes very little background knowledge. A lot of people prefer the next book I mention, but I think PFPL does a better job explaining the foundations it works from and then covers more topics I find interesting.
Types and Programming Languages (TAPL)
Another very widely used introductory book (the one I learned with). It’s good to read in conjunction with PFPL as they emphasize things differently. Notably, this includes descriptions of type inference which PFPL lacks and TAPL lacks most of PFPL’s descriptions of concurrency/interesting imperative languages. Like PFPL this is very accessible and well written.
Advanced Topics in Types and Programming Languages (ATTAPL)
Don’t feel the urge to read this all at once. It’s a bunch of fully independent but excellent chapters on a bunch of different topics. Read what looks interesting, save what doesn’t. It’s good to have in case you ever need to learn more about one of the subjects in a pinch.
One of the fun parts of taking in an interest in type theory is that you get all sorts of fun new programming languages to play with. Some major proof assistants are
Coq
Coq is one of the more widely used proof assistants and has the best introductory material by far in my opinion.
Agda
Agda is in many respects similar to Coq, but is a smaller language overall. It's relatively easy to learn Agda after Coq so I recommend doing that. Agda has some really interesting advanced constructs like induction-recursion.
Idris
It might not be fair to put Idris in a list of “proof assistants” since it really wants to be a proper programming language. It’s one of the first serious attempts at writing a programming language with dependent types for actual programming though.
Twelf
Twelf is by far the simplest system in this list, it’s the absolute minimum a language can have and still be dependently typed. All of this makes it easy to pick up, but there are very few users and not a lot of introductory material which makes it a bit harder to get started with. It does scale up to serious use though.
The Works of Per Martin-Löf
Per Martin-Löf has contributed a ton to the current state of dependent type theory. So much so that it's impossible to escape his influence. His papers on Martin-Löf Type Theory (he called it Intuitionistic Type Theory) are seminal.
If you’re confused by the papers above read the book in the next entry and try again. The book doesn’t give you as good a feel for the various flavors of MLTT (which spun off into different areas of research) but is easier to follow.
Programming In Martin-Löf's Type Theory
It's good to read the original papers and hear things from the horse's mouth, but Martin-Löf is much smarter than us and it's nice to read other people's explanations of his material. A group of people at Chalmers have elaborated it into a book.
The Works of John Reynolds
John Reynolds's works are similarly impressive and always a pleasure to read.
Computational Type Theory
While most dependent type theories (like the ones found in Coq, Agda, Idris…) are based on Martin-Löf's later intensional type theories, computational type theory is different. It's a direct descendant of his extensional type theory that has been heavily developed and forms the basis of NuPRL nowadays. The resources below describe the various parts of how CTT works.
Homotopy Type Theory
A new exciting branch of type theory. This exploits the connection between homotopy theory and type theory by treating types as spaces. It’s the subject of a lot of active research but has some really nice introductory resources even now.
Frank Pfenning’s Lecture Notes
Over the years, Frank Pfenning has accumulated lecture notes that are nothing short of heroic. They’re wonderful to read and almost as good as being in one of his lectures.
Learning category theory is necessary to understand some parts of type theory. If you decide to study categorical semantics, realizability, or domain theory, eventually you'll have to buckle down and learn at least a little. It's actually really cool math so no harm done!
Category Theory for Computer Scientists
This is the absolute smallest introduction to category theory you can find that’s still useful for a computer scientist. It’s very light on what it demands for prior knowledge of pure math but doesn’t go into too much depth.
Category Theory
One of the better introductory books to category theory in my opinion. It’s notable in assuming relatively little mathematical background and for covering quite a lot of ground in a readable way.
Ed Morehouse’s Category Theory Lecture Notes
Another valuable piece of reading is these lecture notes. They cover a lot of the same areas as "Category Theory", so they can help to reinforce what you learned there as well as giving you some of the author's perspective on how to think about these things.
Gunter's "Semantics of Programming Languages"
While I'm not as big a fan of some of the earlier chapters, the math presented in this book is absolutely top-notch and gives a good understanding of how some cool fields (like domain theory) work.
OPLSS
The Oregon Programming Languages Summer School is a two-week-long bootcamp on PLs held annually at the University of Oregon. It's a wonderful event to attend, but if you can't make it they record all their lectures anyway! They're taught by a variety of lecturers, but they're all world-class researchers.
So as a follow-up to my prior tutorial on JonPRL, I wanted to demonstrate a nice example of JonPRL being used to prove something unreasonably difficult in Agda or the like (I think I'm asking to be shown up when I say stuff like this…).
I would like to implement the conatural numbers in JonPRL but without a notion of general coinductive or even inductive types. Just the natural numbers. The fun bit is that we’re basically going to lift the definition of a coinductively defined set straight out of set theory into JonPRL!
First, let's go through some math. How can we formalize the notion of a coinductively defined type as we're used to in programming languages? Recall that something is coinductively defined if it contains all terms that can be eliminated according to the elimination form for our type. For example, Martin-Löf has proposed we view functions (Π-types) as coinductively defined. That is,
x : A ⊢ f(x) : B(x)
————————————————————
f : Π x : A. B(x)
In particular, there’s no assertion that f
needs to be a lambda, just that f(x)
is defined and belongs to the right type. This view of “if we can use it, it’s in the type” applies to more than just functions. Let’s suppose we have a type with the following elimination form
L : List    M : A    x : Nat, y : List ⊢ N : A
——————————————————————————————————————————————
            case(L; M; x.y.N) : A
This is more familiar to Haskellers as
case L of
  [] -> M
  x :: y -> N
Now if we look at the coinductively defined type built from this elimination rule we have not finite lists, but streams! There’s nothing in this elimination rule that specifies that the list be finite in length for it to terminate. All we need to be able to do is evaluate the term to either a ::
of a Nat
and a List
or nil
. This means that
fix x. cons(0; x) : List
Let's now try to formalize this by describing what it means to build a coinductively defined type up as a set of terms. In particular, the types we're interested in here are algebraic ones, constructed from sums and products.
Now unfortunately I'm going to be a little handwavy. I'm going to act as if we've worked out a careful set-theoretic semantics for this programming language (like the one that exists for MLTT). This means that all the equations you see here are across sets, and that these sets contain programs so that ⊢ e : τ
means that e ∈ τ
where τ
on the right is a set.
Well we start with some equation of the form
Φ = 1 + Φ
This particular equation is actually how we would go about defining the natural numbers. If I write it in a more Haskellish notation we'd have
data Φ = Zero | Succ Φ
Next, we transform this into a function. This step is a deliberate move so we can start applying the myriad tools we know of for handling this equation.
Φ(X) = 1 + X
We now want to find some X
so that Φ(X) = X
. If we can do this, then I claim that X
is a solution to the equation given above since
X = Φ(X)
X = 1 + X
precisely mirrors the equation we had above. Such an X
is called a “fixed point” of the function Φ
. However, there’s a catch: there may well be more than one fixed point of a function! Which one do we choose? The key is that we want the coinductively defined version. Coinduction means that we should always be able to examine a term in our type and its outermost form should be 1 + ???
. Okay, let’s optimistically start by saying that X
is ⊤
(the collection of all terms).
Ah okay, this isn't right. This works only so long as we don't make any observations about a term we claim is in this type. The minute we pattern match, we might find that we had claimed a function was in our type! I have not yet managed to pay my rent by saying "OK, here's the check… just don't try to cash it". So perhaps we should try something else. Okay, so let's not say ⊤; let's say
X = ⊤ ⋂ Φ(⊤)
Now, if t ∈ X
, we know that t ∈ 1 + ???
. This means that if we run e ∈ X
, we'll get the correct outermost form. However, things are still potentially broken: nothing constrains what we find if we make a second observation, so a term can start off being well typed but actually become ill typed as we evaluate it. If we claimed that this was a fixed point in our language, our language would be type-unsafe. This is an unappealing quality in a type theory.
Okay, so that didn’t work. What if we fixed this code by doing
X = ⊤ ⋂ Φ(⊤) ⋂ Φ(Φ(⊤))
Now this fixes the above code, but can you imagine a snippet of code where this still gets stuck? Each time we intersect X
with Φ(X)
we get a new type that behaves like the real fixed point for n + 1
observations whenever X
behaved like the fixed point for n
observations. Well, we can only make finitely many observations, so let's just iterate such an intersection
X = ⋂ₙ Φⁿ(⊤)
So if e ∈ X
, then no matter how many times we pattern match and examine the recursive component of e
we know that it’s still in ⋂ₙ Φⁿ(⊤)
and therefore still in X
! In fact, it's easy to prove that this is the case with two lemmas:

1. If X ⊆ Y then Φ(X) ⊆ Φ(Y)
2. For any collection S of sets, ⋂ Φ(S) = Φ(⋂ S), where we define Φ on a collection of sets by applying Φ to each component.

These two properties state the monotonicity and cocontinuity of Φ
. In fact, cocontinuity should imply monotonicity (can you see how?). We then may show that
Φ(⋂ₙ Φⁿ(⊤)) = ⋂ₙ Φ(Φⁿ(⊤))
= ⊤ ⋂ (⋂ₙ Φ(Φⁿ(⊤)))
= ⋂ₙ Φⁿ(⊤)
As desired.
Now that we have some idea of how to formalize coinduction, can we port this to JonPRL? Well, we have natural numbers and we can take the intersection of types… Seems like a start. Looking at that example, we first need to figure out what ⊤
corresponds to. It should include all programs, which sounds like the type base
in JonPRL. However, it also should be the case that x = y ∈ ⊤
for all x
and y
. For that we need an interesting trick:
Operator top : ().
[top] =def= [isect(void; _.void)].
In prettier notation,
top ≙ ⋂ x : void. void
Now x ∈ top
if x ∈ void
for all _ ∈ void
. Hey wait a minute… No such _
exists so the if is always satisfied vacuously. Ok, that’s good. Now x = y ∈ top
if for all _ ∈ void
, x = y ∈ void
. Since no such _
exists again, all things are in fact equal in top
. We can even prove this within JonPRL
Theorem topistop :
[isect(base; x.
isect(base; y.
=(x; y; top)))] {
unfold <top>; auto
}.
This proof really just says "you have an x : void
in your context! Tell me more about that." Now the fact that x ∈ top
is a trivial corollary since our theorem tells us that x = x ∈ top
and the former is just sugar for the latter. With this defined, we can now write down a general operator for coinduction!
Operator corec : (1).
[corec(F)] =def= [isect(nat; n. natrec(n; top; _.x. so_apply(F;x)))].
To unpack this, corec
takes one argument which binds one variable. We then intersect the type natrec(n; top; _.x.so_apply(F;x))
for all n ∈ nat
. That natrec
construct is really saying Fⁿ(⊤)
, it’s just a little obscured. Especially since we have to use so_apply
, a sort of "meta-application" which lets us apply a term binding a variable to another term. This should look familiar; it's just how we defined the fixed point of Φ
!
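To see this, it helps to unfold natrec
using its standard reduction rules (this unfolding is my own illustration, not from the original post):

```
natrec(zero;    top; _.x.so_apply(F;x))  ==>  top
natrec(succ(n); top; _.x.so_apply(F;x))  ==>  so_apply(F; natrec(n; top; _.x.so_apply(F;x)))
```

So corec(F)
is the intersection ⊤ ⋂ F(⊤) ⋂ F(F(⊤)) ⋂ …, which is exactly the ⋂ₙ Φⁿ(⊤) we constructed before.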
For a fun demo, let’s define an F
so that corec(F)
will give us the conatural numbers. I know that the natural numbers come from the least fixed point of X ↦ 1 + X
(because I said so above, so it must be so) so let’s define that.
Operator conatF : (0).
[conatF(X)] =def= [+(unit; X)].
This is just that X ↦ 1 + X
I wrote above in JonPRL land instead of math notation. Next we need to actually define conatural numbers using corec
.
Operator conat : ().
[conat] =def= [corec(R. conatF(R))].
Now I've defined this, but that's no fun unless we can actually build some terms so that member(X; conat)
. Specifically I want to prove two things to start
member(czero; conat)
fun(member(M; conat); _.member(csucc(M); conat))
This states that conat
is closed under some zero and successor operations. Now what should those operations be? Remember what I said before, that we had this correspondence?
X   ↦   1   +   X
Nat    Zero    Suc X
Now remember that conat
is isect(nat; n....)
and when constructing a member of isect
we’re not allowed to mention that n
in it (as opposed to fun
where we do exactly that). So that means czero
has to be a member of top
and sum(unit; ...)
. The top
bit is easy, everything is in top
! That diagram above suggests inl
of something in unit:
Operator czero : ().
[czero] =def= [inl(<>)].
So now we want to prove that this in fact in conat
.
Theorem zerowf : [member(czero; conat)] {
}.
Okay loading this into JonPRL gives
⊢ czero ∈ conat
From there we start by unfolding all the definitions
{
unfold <czero conat conatF corec top>
}
This gives us back the desugared goal
⊢ inl(<>) ∈ ⋂n ∈ nat. natrec(n; top; _.x.+(unit; x))
Next let’s apply all the obvious introductions so that we’re in a position to try to prove things
unfold <czero conat conatF corec top>; auto
This gives us back
1. [n] : nat
⊢ inl(<>) = inl(<>) ∈ natrec(n; top; _.x.+(unit; x))
Now we’re stuck. We want to show inl
is in something, but we’re never going to be able to do that until we can reduce that natrec(n; top; _.x.+(unit; x))
to a canonical form. Since it’s stuck on n
, let’s induct on that n
. After that, let’s immediately reduce.
{
unfold <czero conat conatF corec top>; auto; elim #1; reduce
}
Now we have two cases: the base case and the inductive case.
1. [n] : nat
⊢ inl(<>) = inl(<>) ∈ top
1. [n] : nat
2. n' : nat
3. ih : inl(<>) = inl(<>) ∈ natrec(n'; top; _.x.+(unit; x))
⊢ inl(<>) = inl(<>) ∈ +(unit; natrec(n'; top; _.x.+(unit; x)))
Now that we have canonical terms on the right of the ∈
, let's let auto
handle the rest.
Theorem zerowf : [member(czero; conat)] {
unfold <czero conat conatF corec top>; auto; elim #1; reduce; auto
}.
So now we have proven that czero
is in the correct type. Now let's figure out csucc
. Going by our noses, inl
corresponded to czero
and our diagram says that inr
should correspond to csucc
. This gives us
Operator csucc : (0).
[csucc(M)] =def= [inr(M)].
Now let’s try to prove the corresponding theorem for csucc
Theorem succwf : [isect(conat; x. member(csucc(x); conat))] {
}.
Now we’re going to start off this proof like we did with our last one. Unfold everything, apply the introduction rules, and induct on n
.
{
unfold <csucc conat conatF corec top>; auto; elim #2; reduce
}
Like before, we now have two subgoals:
1. [x] : ⋂n ∈ nat. natrec(n; ⋂_ ∈ void. void; _.x.+(unit; x))
2. [n] : nat
⊢ inr(x) = inr(x) ∈ ⋂_ ∈ void. void
1. [x] : ⋂n ∈ nat. natrec(n; ⋂_ ∈ void. void; _.x.+(unit; x))
2. [n] : nat
3. n' : nat
4. ih : inr(x) = inr(x) ∈ natrec(n'; ⋂_ ∈ void. void; _.x.+(unit; x))
⊢ inr(x) = inr(x) ∈ +(unit; natrec(n'; ⋂_ ∈ void. void; _.x.+(unit; x)))
The first one looks pretty easy, that’s just foo ∈ top
, I think auto
should handle that.
{
unfold <csucc conat conatF corec top>; auto; elim #2; reduce;
auto
}
This just leaves one goal to prove
1. [x] : ⋂n ∈ nat. natrec(n; ⋂_ ∈ void. void; _.x.+(unit; x))
2. [n] : nat
3. n' : nat
4. ih : inr(x) = inr(x) ∈ natrec(n'; ⋂_ ∈ void. void; _.x.+(unit; x))
⊢ x = x ∈ natrec(n'; ⋂_ ∈ void. void; _.x.+(unit; x))
Now, as it turns out, this is nice and easy: look at what our first assumption says! Since x ∈ isect(nat; n.Foo)
and our goal is to show that x ∈ Foo(n')
this should be as easy as another call to elim
.
{
unfold <csucc conat conatF corec top>; auto; elim #2; reduce;
auto; elim #1 [n']; auto
}
Note that the [n']
bit there lets us supply the term we wish to substitute for n
while eliminating. This leaves us here:
1. [x] : ⋂n ∈ nat. natrec(n; ⋂_ ∈ void. void; _.x.+(unit; x))
2. [n] : nat
3. n' : nat
4. ih : inr(x) = inr(x) ∈ natrec(n'; ⋂_ ∈ void. void; _.x.+(unit; x))
5. y : natrec(n'; ⋂_ ∈ void. void; _.x.+(unit; x))
6. z : y = x ∈ natrec(n'; ⋂_ ∈ void. void; _.x.+(unit; x))
⊢ x = x ∈ natrec(n'; ⋂_ ∈ void. void; _.x.+(unit; x))
Now a small hiccup: we know that y = x
holds at the right type, so x = x
holds at that type as well. But how do we prove this? The answer is to substitute all occurrences of x
for y
. This is written
{
unfold <csucc conat conatF corec top>; auto; elim #2; reduce;
auto; elim #1 [n']; auto;
hypsubst ← #6 [h.=(h; h; natrec(n'; isect(void; _.void); _.x.+(unit;x)))];
}
There are three arguments here: a direction to substitute in, an index telling us which hypothesis to use as the equality to substitute with, and finally a term [h. ...]
. Each occurrence of h
in this term marks a spot where we want to substitute. In our case we used h
in both the places where x
occurs, and the ← direction says to replace the right-hand side of the equality with the left-hand side.
Actually running this gives
1. [x] : ⋂n ∈ nat. natrec(n; ⋂_ ∈ void. void; _.x.+(unit; x))
2. [n] : nat
3. n' : nat
4. ih : inr(x) = inr(x) ∈ natrec(n'; ⋂_ ∈ void. void; _.x.+(unit; x))
5. y : natrec(n'; ⋂_ ∈ void. void; _.x.+(unit; x))
6. z : y = x ∈ natrec(n'; ⋂_ ∈ void. void; _.x.+(unit; x))
⊢ y = y ∈ natrec(n'; ⋂_ ∈ void. void; _.x.+(unit; x))
1. [x] : ⋂n ∈ nat. natrec(n; ⋂_ ∈ void. void; _.x.+(unit; x))
2. [n] : nat
3. n' : nat
4. ih : inr(x) = inr(x) ∈ natrec(n'; ⋂_ ∈ void. void; _.x.+(unit; x))
5. y : natrec(n'; ⋂_ ∈ void. void; _.x.+(unit; x))
6. z : y = x ∈ natrec(n'; ⋂_ ∈ void. void; _.x.+(unit; x))
7. h : natrec(n'; ⋂_ ∈ void. void; _.x.+(unit; x))
⊢ h = h ∈ natrec(n'; ⋂_ ∈ void. void; _.x.+(unit; x)) ∈ U{i}
The first goal is the result of our substitution and it’s trivial; auto
will handle this now. The second goal is a little strange. It basically says that the result of our substitution is still a wellformed type. JonPRL’s thought process is something like this
You said you were substituting for things of this type here. However, I know that just because
x : A
doesn't mean we're using it in all those spots as if it has type A
. What if you substituted things equal in top (where everything is equal) into a spot where they're being used as functions! This would let us prove that zero ∈ Π(...)
or something silly. Convince me that when we fill in those holes with something of the type you mentioned, the goal is still a type (in a universe).
However, these well-formedness goals usually go away with auto. In fact, this completes our theorem.
Theorem succwf : [isect(conat; x. member(csucc(x); conat))] {
unfold <csucc conat conatF corec top>; auto; elim #2; reduce;
auto; elim #1 [n']; auto;
hypsubst ← #6 [h.=(h; h; natrec(n'; isect(void; _.void); _.x.+(unit;x)))];
auto
}.
Okay so we now have something kind of numberish, with zero and successor. But in order to demonstrate that this is the conatural numbers there’s one big piece missing.
The thing that distinguishes the conatural numbers from the inductive variety is the fact that we include infinite terms. In particular, I want to show that Ω (infinitely many csucc
s) belongs in our type.
In order to say Ω in JonPRL we need recursion. Specifically, we want to write
[omega] =def= [csucc(omega)].
But this doesn’t work! Instead, we’ll define the Y combinator and say
Operator omega : ().
[omega] =def= [Y(x.csucc(x))].
So what should this Y
be? Well the standard definition of Y is
Y(F) = (λ x. F (x x)) (λ x. F (x x))
Excitingly, we can just say that in JonPRL; remember that we have a full untyped computation system after all!
Operator Y : (1).
[Y(f)] =def= [ap(lam(x.so_apply(f;ap(x;x)));lam(x.so_apply(f;ap(x;x))))].
This is more or less a direct translation, we occasionally use so_apply
for the reasons I explained above. As a fun exercise, try to prove that this is a fixed point, e.g. that
isect(U{i}; A. isect(fun(A; _.A); f . ceq(Y(f); ap(f; Y(f)))))
The complete proof is in the JonPRL repo under example/computationalequality.jonprl
. Anyways, we now want to prove
Theorem omegawf : [member(omega; conat)] {
}.
Let’s start with the same prelude
{
*{unfold <csucc conat conatF corec top omega Y>}; auto; elim #1;
}
Two goals just like before
1. [n] : nat
⊢ (λx. inr(x[x]))[λx. inr(x[x])] = (λx. inr(x[x]))[λx. inr(x[x])] ∈ natrec(zero; ⋂_ ∈ void. void; _.x.+(unit; x))
1. [n] : nat
2. n' : nat
3. ih : (λx. inr(x[x]))[λx. inr(x[x])] = (λx. inr(x[x]))[λx. inr(x[x])] ∈ natrec(n'; ⋂_ ∈ void. void; _.x.+(unit; x))
⊢ (λx. inr(x[x]))[λx. inr(x[x])] = (λx. inr(x[x]))[λx. inr(x[x])] ∈ natrec(succ(n'); ⋂_ ∈ void. void; _.x.+(unit; x))
The goals start to get fun now. I’ve also carefully avoided using reduce
like we did before. The reason is simple: if we reduce in the second goal, our ih
will reduce as well and we’ll end up completely stuck in a few steps (try it and see). So instead we’re going to finesse it a bit.
First let’s take care of that first goal. We can tell JonPRL to apply some tactics to just the first goal with the focus
tactic
{
*{unfold <csucc conat conatF corec top omega Y>}; auto; elim #1;
focus 0 #{reduce 1; auto};
}
Here reduce 1
says “reduce by only one step” since really omega will diverge if we just let it run. This takes care of the first goal leaving just
1. [n] : nat
2. n' : nat
3. ih : (λx. inr(x[x]))[λx. inr(x[x])] = (λx. inr(x[x]))[λx. inr(x[x])] ∈ natrec(n'; ⋂_ ∈ void. void; _.x.+(unit; x))
⊢ (λx. inr(x[x]))[λx. inr(x[x])] = (λx. inr(x[x]))[λx. inr(x[x])] ∈ natrec(succ(n'); ⋂_ ∈ void. void; _.x.+(unit; x))
Here's the proof sketch for what's left:

1. Step the term in the goal until its outermost inr is exposed
2. Use the inductive hypothesis to handle what remains

You can stop here or you can see how we actually do this. It's somewhat tricky. The basic complication is that there's no built-in tactic for 1. Instead we use a new type called ceq
, which is "computational equality". It ranges between two terms; no types are involved here. It's designed to work thusly: ceq(a; b)
holds if either

- a
and b
run to weak-head normal form (canonical verifications) with the same outermost form, and all the inner operands are ceq
- a
and b
both diverge

So if ceq(a; b)
then a
and b
"run the same". What's a really cool upshot of this is that if ceq(a; b)
, then if a = a ∈ A
and b = b ∈ A
then a = b ∈ A
! ceq
is the strictest equality in our system and we can rewrite with it absolutely everywhere without regard to types. Proving this requires showing the above definition forms a congruence (two things are related if their subcomponents are related).
This was a big deal because until Doug Howe came up with this proof NuPRL/CTT was awash with rules trying to specify this idea chunk by chunk and showing those rules were valid wasn’t trivial. Actually, you should read that paper: it’s 6 pages and the proof concept comes up a lot.
So, in order to do 1. we're going to say "the goal and the goal if we step it twice are computationally equal" and then use this fact to substitute for the stepped version. The tactic to use here is called csubst
. It takes two arguments:

1. the ceq
we're asserting
2. a term h. ...
to guide the rewrite

{
*{unfold <csucc conat conatF corec top omega Y>}; auto; elim #1;
focus 0 #{reduce 1; auto};
csubst [ceq(ap(lam(x.inr(ap(x;x))); lam(x.inr(ap(x;x))));
inr(ap(lam(x.inr(ap(x;x))); lam(x.inr(ap(x;x))))))]
[h.=(h;h; natrec(succ(n'); isect(void; _. void); _.x.+(unit; x)))];
}
This leaves us with two goals
1. [n] : nat
2. n' : nat
3. ih : (λx. inr(x[x]))[λx. inr(x[x])] = (λx. inr(x[x]))[λx. inr(x[x])] ∈ natrec(n'; ⋂_ ∈ void. void; _.x.+(unit; x))
⊢ ceq((λx. inr(x[x]))[λx. inr(x[x])]; inr((λx. inr(x[x]))[λx. inr(x[x])]))
1. [n] : nat
2. n' : nat
3. ih : (λx. inr(x[x]))[λx. inr(x[x])] = (λx. inr(x[x]))[λx. inr(x[x])] ∈ natrec(n'; ⋂_ ∈ void. void; _.x.+(unit; x))
⊢ inr((λx. inr(x[x]))[λx. inr(x[x])]) = inr((λx. inr(x[x]))[λx. inr(x[x])]) ∈ natrec(succ(n'); ⋂_ ∈ void. void; _.x.+(unit; x))
Now we have two goals. The first is that ceq
proof obligation. The second is our goal post-substitution. The first one can easily be dispatched by step
. step
lets us prove ceq(a; b)
by stepping the left-hand side: if a
steps to a'
in one step, the goal becomes ceq(a'; b)
.
This will leave us with ceq(X; X)
which auto
can handle. The second goal is… massive, but also simple. We just need to step it once and we suddenly have inr(X) = inr(X) ∈ sum(_; A)
where X = X ∈ A
is our assumption! So that can also be handled by auto
as well. That means we need to run step
on the first goal, reduce 1
on the second, and auto
on both.
Theorem omegawf : [member(omega; conat)] {
unfolds; unfold <omega Y>; auto; elim #1;
focus 0 #{reduce 1; auto};
csubst [ceq(ap(lam(x.inr(ap(x;x))); lam(x.inr(ap(x;x))));
inr(ap(lam(x.inr(ap(x;x))); lam(x.inr(ap(x;x))))))]
[h.=(h;h; natrec(succ(n'); isect(void; _. void); _.x.+(unit; x)))];
[step, reduce 1]; auto
}.
And we’ve just proved that omega ∈ conat
, a term that is certainly the canonical (heh) example of coinduction in my mind.
Whew, I actually meant for this to be a short blog post but that didn’t work out so well. Hopefully this illustrated a cool trick in computer science (intersect your way to coinduction) and in JonPRL.
Funnily enough before this was written no one had actually realized you could do coinduction in JonPRL. I’m still somewhat taken with the fact that a very minimal proof assistant like JonPRL is powerful enough to let you do this by giving you such general purpose tools as family intersection and a full computation system to work with. Okay that’s enough marketing from me.
Cheers.
Huge thanks to Jon Sterling for the idea on how to write this code and then touching up the results.
JonPRL switched to ASCII syntax so I’ve updated this post accordingly
I was just over at OPLSS for the last two weeks. While there I finally met Jon Sterling in person. What was particularly fun is that for the last few months he's been creating a proof assistant called JonPRL in the spirit of Nuprl. As it turns out, it's quite a fun project to work on, so I've implemented a few features in it over the last couple of days and learned more or less how it works.
Since there's basically no documentation on it besides the readme (and of course the compiler), I thought I'd write down some of the stuff I've learned. There's also a completely separate post to be written on the underlying type theory of Nuprl and JonPRL, which is very interesting in its own right, but this isn't it. Hopefully I'll get around to scribbling something about that because it's really quite clever.
Here’s the layout of this tutorial
JonPRL is pretty easy to build and install and having it will make this post more enjoyable. You’ll need smlnj
since JonPRL is currently written in SML. This is available in most package managers (including homebrew); otherwise just grab the binary from the website. After this, the following commands should get you a working executable:
git clone ssh://git@github.com/jonsterling/jonprl
cd jonprl
git submodule init
git submodule update
make (This is excitingly fast to run)
make test (If you're doubtful)

You should now have an executable called jonprl
in the bin
folder. There's no prelude for jonprl, so that's it. You can now just feed it files like any reasonable compiler and watch it spew (currently difficult-to-decipher) output at you.
If you’re interested in actually writing JonPRL code, you should probably install David Christiansen’s Emacs mode. Now that we’re up and running, let’s actually figure out how the language works
JonPRL is composed of really three different sorts of mini-languages:

- the term language
- the tactic language
- the language of top-level commands

In Coq, these roughly correspond to Gallina, Ltac, and Vernacular respectively.
The term language is an untyped language that contains a number of constructs that should be familiar to people who have been exposed to dependent types before. The actual concrete syntax is composed of 3 basic forms:
- operators applied to arguments: op(arg1; arg2; arg3)
- variables: x
- binders: x.e

JonPRL has one construct for binding, x.e
, built into its syntax, which things like lam
or fun
are built off of.

An operator in this context is really anything you can imagine having a node in an AST for a language. So something like lam is an operator, as is if
or pair
(corresponding to (,)
in Haskell). Each operator has a piece of information associated with it, called its arity. This arity tells you how many arguments an operator takes and how many variables x.y.z. ...
each is allowed to bind. For example, lam has the arity (1)
since it takes 1 argument which binds 1 variable. Application (ap
) has the arity (0; 0)
. It takes 2 arguments neither of which bind a variable.
So as mentioned we have functions and application. This means we could write (λx.x) y
in JonPRL as ap(lam(x.x); y)
. The type of functions is written with fun
. Remember that JonPRL’s language has a notion of dependence so the arity is (0; 1)
. The construct fun(A; x.B)
corresponds to (x : A) → B
in Agda or forall (x : A), B
in Coq.
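To make the dependency concrete, here's a small illustration of my own (not from the original post): the JonPRL rendering of Agda's (A : Set) → A → A
would be

```
fun(U{i}; A. fun(A; _.A))
```

and lam(A. lam(x. x))
is the term we'd expect to inhabit it — the outer binder is the type, the inner one the value.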
We also have dependent sums as well (prod
s). In Agda you would write (M , N)
to introduce a pair and Σ A (λ x → B)
to type it. In JonPRL you have pair(M; N)
and prod(A; x.B)
. To inspect a prod
we have spread
which lets us eliminate a pair. It has the arity (0; 2)
: you give it a prod
in the first spot and x.y.e
in the second. It’ll then replace x
with the first component and y
with the second. Can you think of how to write fst
and snd
with this?
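One possible answer (my own sketch; the operator names fst
and snd
are hypothetical, but this uses JonPRL's Operator/=def= definition syntax):

```
Operator fst : (0).
[fst(P)] =def= [spread(P; x.y.x)].

Operator snd : (0).
[snd(P)] =def= [spread(P; x.y.y)].
```

Each definition just spreads the pair apart and returns the component we care about.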
There are sums, so inl(M)
, inr(N)
and +(A; B)
corresponds to Left
, Right
, and Either
in Haskell. For case analysis there’s decide
which has the arity (0; 1; 1)
. You should read decide(M; x.N; y.P)
as something like
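In the same Haskell-ish pseudocode as the case
example earlier, that reads roughly as:

```
case m of
  Left x  -> n
  Right y -> p
```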
In addition we have unit
and <>
(pronounced "axe", for axiom, usually). Neither of these takes any arguments so we write them just as I have above. They correspond to Haskell's type-level and value-level ()
respectively. Finally there’s void
which is sometimes called false
or empty
in theorem prover land.
You'll notice that I presented a bunch of types as if they were normal terms in this section. That's because in this untyped computation system, types are literally just terms. There's no typing relation to distinguish them yet, so they just float around exactly as if they were lam or something! I call them types because I'm thinking ahead to when we have a typing relation built on top of this system, but for now they really are just terms. It was still a little confusing for me to see fun(unit; _.unit)
in a language without types, so I wanted to make this explicit.
Now we can introduce some more exotic terms. Later, we're going to construct some rules around them that will make them behave the way we might expect, but for now they are just suggestively named constants.
- U{i}: the ith level universe, used to classify all types that can be built using types other than U{i} or higher. It's closed under terms like fun and it contains all the types of smaller universes.
- =(0; 0; 0): equality between two terms at a type. It's a proposition that's going to precisely mirror what's going on later in the type theory with the equality judgment.
- member(0; 0): just like = but it internalizes membership in a type into the system. Remember that normally "this has that type" is a judgment, but with this term we're going to have a propositional counterpart to use in theorems.

In particular it's important to distinguish between ∈, the judgment, and member, the term. There's nothing inherent in member above that makes it behave like a typing relation as you might expect. It's on equal footing with flibbertyjibberty(0; 0; 0).
This term language contains the full untyped lambda calculus so we can write all sorts of fun programs like
lam(f. ap(lam(x. ap(f; ap(x; x))); lam(x. ap(f; ap(x; x)))))
which is just the Y combinator. In particular this means that there’s no reason that every term in this language should normalize to a value. There are plenty of terms in here that diverge and in principle, there’s nothing that rules out them doing even stranger things than that. We really only depend on them being deterministic, that e ⇒ v
and e ⇒ v'
implies that v = v'
.
The other big language in JonPRL is the language of tactics. Luckily, this is very familiar territory if you're a Coq user. Unluckily, if you've never heard of Coq's tactic mechanism this will seem completely alien. As a quick high-level idea for what tactics are:
When we’re proving something in a proof assistant we have to deal with a lot of boring mechanical details. For example, when proving A → B → A
I have to describe that I want to introduce the A
and the B
into my context, then I have to suggest using that A
in the context as a solution to the goal. Bleh. All of that is pretty obvious, so let's just get the computer to do it! In fact, we can build up a DSL of composable "proof procedures", or tactics, to modify a particular goal we're trying to prove so that we don't have to think so much about the low-level details of the proof being generated. In the end this DSL will generate a proof term (or derivation in JonPRL) and we'll check that, so we never have to trust the actual tactics to be sound.
In Coq this is used to great effect. In particular, see Adam Chlipala's book for incredibly complex theorems with one-line proofs thanks to tactics.
In JonPRL the tactic system works by modifying a sequent of the form H ⊢ A
(a goal). Each time we run a tactic we get back a list of new goals to prove until eventually we get to trivial goals which produce no new subgoals. This means that when trying to prove a theorem in the tactic language we never actually see the resulting evidence generated by our proof. We just see this list of H ⊢ A
s to prove and we do so with tactics.
The tactic system is quite simple. To start, we have a number of basic tactics which are useful no matter what goal you're attempting to prove:

- id: a tactic which does nothing
- t1; t2: runs the tactic t1 and then runs t2 on any resulting subgoals
- *{t}: runs t as long as t does something to the goal. If t ever fails for whatever reason it merely stops running; it doesn't fail itself
- ?{t}: tries to run t once. If t fails, nothing happens
- !{t}: runs t and fails if t does anything besides complete the proof. This means that !{id}, for example, will always fail
- t1 | t2: runs t1 and, if it fails, runs t2. Only one of the effects of t1 and t2 will be shown
- t; [t1, ..., tn]: first runs t and then runs tactic ti on the ith subgoal generated by t
- trace "some words": prints some words to standard out. This is useful when trying to figure out why things haven't gone your way
- fail: the opposite of id, it just fails. This is actually quite useful for forcing backtracking, and one could probably implement a makeshift !{} as t; fail

It's helpful to see this as a sort of tree: a tactic takes one goal to a list of subgoals to prove, so we can imagine t as this part of a tree
       H
       |
—————————————— (t)
H'   H''   H'''
If we have some tactic t2 then t; t2 will run t and then run t2 on H', H'', and H'''. Instead we could have t; [t1, t2, t3]: then we'll run t and (assuming it succeeds) we'll run t1 on H', t2 on H'', and t3 on H'''. This is actually how things work under the hood: composable fragments of trees :)
Now those give us a sort of bedrock for building up scripts of tactics. We also have a bunch of tactics that actually let us manipulate things we're trying to prove. The 4 big ones to be aware of are

- intro
- elim #NUM
- eqcd
- memcd
The basic idea is that intro
modifies the A
part of the goal. If we’re looking at a function, so something like H ⊢ fun(A; x.B)
, this will move that A
into the context, leaving us with H, x : A ⊢ B
.
If you’re familiar with sequent calculus intro
runs the appropriate right rule for the goal. If you’re not familiar with sequent calculus intro
looks at the outermost operator of the A
and runs a rule that applies when that operator is to the right of the ⊢.
Now one tricky case is what intro should do if you're looking at a prod. Well, now things get a bit dicey. We might expect to get two subgoals if we run intro on H ⊢ prod(A; x.B), one which proves H ⊢ A and one which proves H ⊢ B or something, but what about the fact that x.B depends on the underlying realizer (that's the program extracted from the proof) of H ⊢ A! Further, Nuprl and JonPRL are based around extract-style proof systems. This means that a goal shouldn't depend on the particular piece of evidence proving another goal. So instead we have to tell intro up front what we want the evidence for H ⊢ A to be, so that the H ⊢ B section may use it.
To do this we just give intro an argument. For example say we’re proving that · ⊢ prod(unit; x.unit)
, we run intro [<>]
which gives us two subgoals · ⊢ member(<>; unit)
and · ⊢ unit
. Here the []
let us denote the realizer we’re passing to intro
. In general any term arguments to a tactic will be wrapped in []
s. So the first goal says “OK, you said that this was your realizer for unit
, but is it actually a realizer for unit
?” and the second goal substitutes the given realizer into the second argument of prod
, x.unit
, and asks us to prove that. Notice how here we have to prove member(<>; unit)
? This is where that weird member
type comes in handy. It lets us sort of play type checker and guide JonPRL through the process of type checking. This is actually very crucial since type checking in Nuprl and JonPRL is undecidable.
Now how do we actually go about proving member(<>; unit)
? Well here memcd
has got our back. This tactic transforms member(A; B)
into the equivalent form =(A; A; B)
. In JonPRL and Nuprl, types are given meaning by how we interpret the equality of their members. In other words, if you give me a type you have to say what its canonical members are and when two canonical members are equal.
Long ago, Stuart Allen realized we could combine the two by specifying a partial equivalence relation for a type. In this case rather than having a separate notion of membership we check to see if something is equal to itself under the PER because when it is that PER behaves like a normal equivalence relation! So in JonPRL member
is actually just a very thin layer of sugar around =
which is really the core defining notion of typehood. To handle =
we have eqcd
which does clever things to handle most of the obvious cases of equality.
Finally, we have elim. Just like intro lets us simplify things on the right of the ⊢, elim lets us eliminate something on the left. So we tell elim to "eliminate" the nth item in the context (they're numbered when JonPRL prints them) with elim #n.
Just like with anything, it's hard to learn all the tactics without experimenting (though a complete list can be found with jonprl --list-tactics). Let's go look at the command language so we can actually prove some theorems.
So in JonPRL there are only 4 commands you can write at the top level:

- Operator
- [oper] =def= [term] (a definition)
- Tactic
- Theorem

The first three of these let us customize and extend the basic suite of operators and tactics JonPRL comes with. The last actually lets us state and prove theorems.
The best way to see these things is by example, so we're going to build up a small development in JonPRL. We're going to show that products form a monoid with unit, up to some logical equivalence. There are a lot of proofs involved here:

1. prod(unit; A) entails A
2. prod(A; unit) entails A
3. A entails prod(unit; A)
4. A entails prod(A; unit)
5. prod(A; prod(B; C)) entails prod(prod(A; B); C)
6. prod(prod(A; B); C) entails prod(A; prod(B; C))

I intend to prove 1, 2, and 5. The remaining proofs are either very similar or fun puzzles to work on. We could also prove that all the appropriate entailments are inverses, and then we could say that everything holds up to isomorphism.
First we want a new snazzy operator to signify non-dependent products, since writing prod(A; x.B) is kind of annoying. We do this using Operator:
Operator prod : (0; 0).
This line declares prod as a new operator which takes two arguments, binding zero variables each. Now we really want JonPRL to know that this new prod is sugar for the dependent prod. To do this we use =def=, which gives us a way to desugar a new operator into a mess of existing ones.
[prod(A; B)] =def= [prod(A; _.B)].
Now we can trade any occurrence of prod(A; B) for prod(A; _.B) as we'd like. Okay, so we want to prove that we have a monoid here. What's the first step? Let's verify that unit
is a left identity for prod
. This entails proving that for all types A
, prod(unit; A) ⊃ A
and A ⊃ prod(unit; A)
. Let’s prove these as separate theorems. Translating our first statement into JonPRL we want to prove
fun(U{i}; A.
fun(prod(unit; A); _.
A))
In Agda notation this would be written roughly (A : Set) → ⊤ × A → A.
Let’s prove our first theorem, we start by writing
Theorem leftid1 :
[fun(U{i}; A.
fun(prod(unit; A); _.
A))] {
id
}.
This is the basic form of a theorem in JonPRL, a name, a term to prove, and a tactic script. Here we have id
as a tactic script, which clearly doesn’t prove our goal. When we run JonPRL on this file (Cc Cl if you’re in Emacs) you get back
[XXX.jonprl:8.39.1]: tactic 'COMPLETE' failed with goal:
⊢ funA ∈ U{i}. (prod(unit; A)) => A
Remaining subgoals:
⊢ funA ∈ U{i}. (prod(unit; A)) => A
So focus on that Remaining subgoals bit; that's what we have left to prove, our current goal. Now you may notice that this output goal is a lot prettier than our syntax! That's because currently in JonPRL the input and output terms may not match: the latter is subject to pretty printing. In general this is great because you can read your remaining goals, but it does mean copying and pasting is a bother. There's nothing to the left of that ⊢ yet, so let's run the only applicable tactic we know. Delete that id
and replace it with
{
intro
}.
The goal now becomes
Remaining subgoals:
1. A : U{i}
⊢ (prod(unit; A)) => A
⊢ U{i} ∈ U{i'}
Two ⊢s means two subgoals now. One looks pretty obvious, U{i'}
is just the universe above U{i}
(so that’s like Set₁ in Agda) so it should be the case that U{i} ∈ U{i'}
by definition! So the next tactic should be something like [???, memcd; eqcd]
. Now what should that ??? be? Well we can’t use elim
because there’s one thing in the context now (A : U{i}
), but it doesn’t help us really. Instead let’s run unfold <prod>
. This is a new tactic that’s going to replace that prod
with the definition that we wrote earlier.
{
intro; [unfold <prod>, memcd; eqcd]
}
Notice here that , binds less tightly than ;, which is useful for saying stuff like this. This gives us
Remaining subgoals:
1. A : U{i}
⊢ (unit × A) => A
We run intro again
{
intro; [unfold <prod>, memcd; eqcd]; intro
}
Now we are in a similar position to before with two subgoals.
Remaining subgoals:
1. A : U{i}
2. _ : unit × A
⊢ A
1. A : U{i}
⊢ unit × A ∈ U{i}
The first subgoal is really what we want to be proving so let’s put a pin in that momentarily. Let’s get rid of that second subgoal with a new helpful tactic called auto
. It runs eqcd
, memcd
and intro
repeatedly and is built to take care of boring goals just like this!
{
intro; [unfold <prod>, memcd; eqcd]; intro; [id, auto]
}
Notice that we used what is a pretty common pattern in JonPRL, to work on one subgoal at a time we use []
’s and id
s everywhere except where we want to do work, in this case the second subgoal.
Now we have
Remaining subgoals:
1. A : U{i}
2. _ : unit × A
⊢ A
Cool! Having a pair of unit × A
really ought to mean that we have an A
so we can use elim
to get access to it.
{
intro; [unfold <prod>, memcd; eqcd]; intro; [id, auto];
elim #2
}
This gives us
Remaining subgoals:
1. A : U{i}
2. _ : unit × A
3. s : unit
4. t : A
⊢ A
We've really got the answer now: #4 is precisely our goal. For situations like this there's assumption, which is just a tactic that succeeds if what we're trying to prove is already in our context. This will complete our proof
Theorem leftid1 :
[fun(U{i}; A.
fun(prod(unit; A); _.
A))] {
intro; [unfold <prod>, memcd; eqcd]; intro; [id, auto];
elim #2; assumption
}.
Now, we know that auto will run all of the tactics on the first line except unfold <prod>, so what if we just unfold <prod> first and then run auto? It ought to do all the same stuff. Indeed, we can shorten our whole proof to unfold <prod>; auto; elim #2; assumption. With this more heavily automated proof, proving our next theorem follows easily.
Theorem rightid1 :
[fun(U{i}; A.
fun(prod(A; unit); _.
A))] {
unfold <prod>; auto; elim #2; assumption
}.
Next, we have to prove associativity to complete the development that prod
is a monoid. The statement here is a bit more complex.
Theorem assoc :
[fun(U{i}; A.
fun(U{i}; B.
fun(U{i}; C.
fun(prod(A; prod(B;C)); _.
prod(prod(A;B); C)))))] {
id
}.
In Agda notation what I've written above is roughly (A B C : Set) → A × (B × C) → (A × B) × C.
Let's kick things off with unfold <prod>; auto to deal with all the boring stuff we had last time. In fact, since prod appears in several nested places, we'd have to run unfold quite a few times. Let's just shorten all of those invocations into *{unfold <prod>}
{
*{unfold <prod>}; auto
}
This leaves us with the state
Remaining subgoals:
1. A : U{i}
2. B : U{i}
3. C : U{i}
4. _ : A × B × C
⊢ A
1. A : U{i}
2. B : U{i}
3. C : U{i}
4. _ : A × B × C
⊢ B
1. A : U{i}
2. B : U{i}
3. C : U{i}
4. _ : A × B × C
⊢ C
In each of those goals we need to take apart the 4th hypothesis so let’s do that
{
*{unfold <prod>}; auto; elim #4
}
This leaves us with 3 subgoals still
1. A : U{i}
2. B : U{i}
3. C : U{i}
4. _ : A × B × C
5. s : A
6. t : B × C
⊢ A
1. A : U{i}
2. B : U{i}
3. C : U{i}
4. _ : A × B × C
5. s : A
6. t : B × C
⊢ B
1. A : U{i}
2. B : U{i}
3. C : U{i}
4. _ : A × B × C
5. s : A
6. t : B × C
⊢ C
The first subgoal is pretty easy; assumption should handle that. In the other two we want to eliminate 6, and then we should be able to apply assumption. In order to deal with this we use | to encode that disjunction. In particular we want to run assumption OR elim #6; assumption, leaving us with
{
*{unfold <prod>}; auto; elim #4; (assumption | elim #6; assumption)
}
This completes the proof!
Theorem assoc :
[fun(U{i}; A.
fun(U{i}; B.
fun(U{i}; C.
fun(prod(A; prod(B;C)); _.
prod(prod(A;B); C)))))] {
*{unfold <prod>}; auto; elim #4; (assumption | elim #6; assumption)
}.
As a fun puzzle, what needs to change in this proof to prove we can associate the other way?
So we just proved a theorem... but what really just happened? I mean, how did we go from "here we have an untyped computation system with types just behaving as normal terms" to "now apply auto and we're done!"? In this section I'd like to briefly sketch the path from untyped computation to theorems.
The path looks like this. We start with our untyped language and its notion of computation; we already discussed this in great depth before. On top of it we define a judgment a = b ∈ A.
This is a judgment, not a term in that language. It exists in whatever metalanguage we’re using. This judgment is defined across 3 terms in our untyped language (I’m only capitalizing A
out of convention). This is supposed to represent that a
and b
are equal elements of type A
. This also gives meaning to typehood: something is a type in CTT precisely when we know what the partial equivalence relation defined by - = - ∈ A on canonical values is.
Notice here that I said partial. It isn’t the case that a = b ∈ A
presupposes that we know that a : A
and b : A
because we don’t have a notion of :
yet!
In some sense this is where we depart from a type theory like Coq or Agda’s. We have programs already and on top of them we define this 3 part judgment which interacts which computation in a few ways I’m not specifying. In Coq, we would specify one notion of equality, generic over all types, and separately specify a typing relation.
From here we can define the normal judgments of Martin-Löf's type theory. For example, a : A
is a = a ∈ A
. We recover the judgment A type
with A = A ∈ U
(where U
here is a universe).
This means that inhabiting a universe A = A ∈ U
, isn’t necessarily inductively defined but rather negatively generated. We specify some condition a term must satisfy to occupy a universe.
Hypothetical judgments are introduced in the same way they would be in Martin-Löf's presentations of type theory. The idea being that H ⊢ J
if J
is evident under the assumption that each term in H
has the appropriate type and furthermore that J
is functional (respects equality) with respect to what H
contains. This isn’t really a higher order judgment, but it will be defined in terms of a higher order hypothetical judgment in the metatheory.
With this we have something that walks and quacks like normal type theory. Using the normal tools of our metatheory we can formulate proofs of a : A
and do normal type theory things. This whole development is building up what is called "Computational Type Theory". The way this diverges from Martin-Löf's extensional type theory is subtle, but it does directly descend from Martin-Löf's famous 1979 paper "Constructive Mathematics and Computer Programming" (which you should read instead of my blog post).
Now there’s one final layer we have to consider, the PRL bit of JonPRL. We define a new judgment, H ⊢ A [ext a]
. This judgment is cleverly set up so that two properties hold:

1. H ⊢ A [ext a] should entail that H ⊢ a : A, or H ⊢ a = a ∈ A.
2. In H ⊢ A [ext a], a is an output and H and A are inputs. In particular, this implies that in any inference rule for this judgment, the subgoals may not use a in their H and A.

This means that a is completely determined by H and A, which justifies my use of the term "output". I mean this in the sense of Twelf and logic programming, if that's a more familiar phrasing. It's this judgment that we see in JonPRL! Since that a
is output we simply hide it, leaving us with H ⊢ A
as we saw before. When we prove something with tactics in JonPRL we’re generating a derivation, a tree of inference rules which make H ⊢ A
evident for our particular H
and A
! These rules aren’t really programs though, they don’t correspond one to one with proof terms we may run like they would in Coq. The computational interpretation of our program is bundled up in that a
.
To see what I mean here we need a little bit more machinery. Specifically, let’s look at the rules for the equality around the proposition =(a; b; A)
. Remember that we have a term <>
lying around,
a = b ∈ A
————————————————————
<> = <> ∈ =(a; b; A)
So the only member of =(a; b; A)
is <>
if a = b ∈ A
actually holds. First off, notice that <> : A
and <> : B
doesn’t imply that A = B
! In another example, lam(x. x) ∈ fun(A; _.A)
for all A
! This is a natural consequence of separating our typing judgment from our programming language. Secondly, there’s not really any computation in the e
of H ⊢ =(a; b; A) [ext e]
. After all, in the end the only thing e
could be so that e : =(a; b; A)
is <>
! However, there is potentially quite a large derivation involved in showing =(a; b; A)
! For example, we might have something like this
x : =(A; B; U{i}); y : =(b; a; A) ⊢ =(a; b; B)
———————————————————————————————————————————————— Substitution
x : =(A; B; U{i}); y : =(b; a; A) ⊢ =(a; b; A)
———————————————————————————————————————————————— Symmetry
x : =(A; B; U{i}); y : =(b; a; A) ⊢ =(b; a; A)
———————————————————————————————————————————————— Assumption
Now we write derivations of this sequent upside down, so the thing we want to show starts on top and we write each rule application and subgoal below it (AI people apparently like this?). Now this was quite a derivation, but if we fill in the missing [ext e]
for this derivation from the bottom up we get this
x : =(A; B; U{i}); y : =(b; a; A) ⊢ =(a; b; B)
———————————————————————————————————————————————— Substitution [ext <>]
x : =(A; B; U{i}); y : =(b; a; A) ⊢ =(a; b; A)
———————————————————————————————————————————————— Symmetry [ext <>]
x : =(A; B; U{i}); y : =(b; a; A) ⊢ =(b; a; A)
———————————————————————————————————————————————— Assumption [ext x]
Notice how at the bottom there was some computational content (that x signifies that we're accessing a variable in our context) but then we throw it away right on the next line! That's because we find that no matter what the extract was that lets us derive =(b; a; A), the only realizer it could possibly generate is <>. Remember our conditions: if we can make evident the fact that b = a ∈ A
then <> ∈ =(b; a; A)
. Because we somehow managed to prove that b = a ∈ A
holds, we’re entitled to just use <>
to realize our proof. This means that despite our somewhat tedious derivation and the bookkeeping that we had to do to generate that program, that program reflects none of it.
This is why type checking in JonPRL is woefully undecidable: in part, the realizers that we want to type check contain none of the helpful hints that proof terms in Coq would. This also means that extraction from JonPRL proofs is built right into the system and we can actually generate cool and useful things! In Nuprl-land, folks at Cornell actually write proofs and use these realizers to run real software. From what Bob Constable said at OPLSS they can actually get these programs to run fast (within 5x of naive C code).
So to recap, in JonPRL we write down a goal H ⊢ A, prove it with tactics (generating a derivation behind the scenes), and extract the realizer a from that derivation.
In fact, we can see all of this happen if you call JonPRL from the command line or hit C-c C-c in Emacs! On our earlier proof we see
Operator prod : (0; 0).
⸤prod(A; B)⸥ ≝ ⸤A × B⸥.
Theorem leftid1 : ⸤⊢ funA ∈ U{i}. (prod(unit; A)) => A⸥ {
funintro(A.funintro(_.prodelim(_; _.t.t); prod⁼(unit⁼; _.hyp⁼(A))); U⁼{i})
} ext {
lam_. lam_. spread(_; _.t.t)
}.
Theorem rightid1 : ⸤⊢ funA ∈ U{i}. (prod(A; unit)) => A⸥ {
funintro(A.funintro(_.prodelim(_; s._.s); prod⁼(hyp⁼(A); _.unit⁼)); U⁼{i})
} ext {
lam_. lam_. spread(_; s._.s)
}.
Theorem assoc : ⸤⊢ funA ∈ U{i}. funB ∈ U{i}. funC ∈ U{i}. (prod(A; prod(B; C))) => prod(prod(A; B); C)⸥ {
funintro(A.funintro(B.funintro(C.funintro(_.independentprodintro(independentprodintro(prodelim(_;
s.t.prodelim(t; _._.s)); prodelim(_; _.t.prodelim(t;
s'._.s'))); prodelim(_; _.t.prodelim(t; _.t'.t')));
prod⁼(hyp⁼(A); _.prod⁼(hyp⁼(B); _.hyp⁼(C)))); U⁼{i}); U⁼{i});
U⁼{i})
} ext {
lam_. lam_. lam_. lam_. ⟨⟨spread(_; s.t.spread(t; _._.s)), spread(_; _.t.spread(t; s'._.s'))⟩, spread(_; _.t.spread(t; _.t'.t'))⟩
}.
Now we can see that those Operator
and ≝
bits are really what we typed with =def=
and Operator
in JonPRL; what's interesting here are the theorems. There are two bits: the derivation and the extract, or realizer.
{
derivation of the sequent · ⊢ A
} ext {
the program in the untyped system extracted from our derivation
}
We can move that derivation into a different proof assistant and check it. This gives us all the information we need to check JonPRL's reasoning, and it means we don't have to trust all of JonPRL (I wrote some of it so I'd be a little scared to trust it :). We can also see the computational bit of our proof in the extract. For example, the computation involved in taking A × unit → A
is just lam_. lam_. spread(_; s._.s)
! This is probably closer to what you’ve seen in Coq or Idris, even though I’d say the derivation is probably more similar in spirit (just ugly and beta normal). That’s because the extract need not have any notion of typing or proof, it’s just the computation needed to produce a witness of the appropriate type. This means for a really tricky proof of equality, your extract might just be <>
! Your derivation however will always exactly reflect the complexity of your proof.
OK, so I’ve just dumped about 50 years worth of hard research in type theory into your lap which is best left to ruminate for a bit. However, before I finish up this post I wanted to do a little bit of marketing so that you can see why one might be interested in JonPRL (or Nuprl). Since we’ve embraced this idea of programs first and types as PERs, we can define some really strange types completely seamlessly. For example, in JonPRL there’s a type ⋂(A; x.B)
, it behaves a lot like fun
but with one big difference: the definition of - = - ∈ ⋂(A; x.B)
looks like this
a : A ⊢ e = e' ∈ [a/x]B
————————————————————————
e = e' ∈ ⋂(A; x.B)
Notice here that e
and e'
may not use a
anywhere in their bodies. That is, they have to be in [a/x]B
without knowing anything about a
and without even having access to it.
This is a pretty alien concept that turned out to be new in logic as well (it's called "uniform quantification" I believe). It turns out to be very useful in PRLs because it lets us declare things in our theorems without having them propagate into our witness. For example, we could have said
Theorem rightid1 :
[⋂(U{i}; A.
fun(prod(A; unit); _.
A))] {
unfold <prod>; auto; elim #2; assumption
}.
This works because our realizer doesn't need to depend on A
at all (remember, no types!). Then the extract of this theorem is
lamx. spread(x; s._.s)
There’s no spurious lam _. ...
at the beginning! Even more wackily, we can define subsets of an existing type since realizers need not have unique types
e = e' ∈ A [e/x]P [e'/x]P
————————————————————————————
e = e' ∈ subset(A; x.P)
And in JonPRL we can now say things like “all odd numbers” by just saying subset(nat; n. ap(odd; n))
. In intensional type theories, these types are hard to deal with and still the subject of open research. In CTT they just kinda fall out because of how we thought about types in the first place. Quotients are a similarly natural conception (just define a new type with a stricter PER) but JonPRL currently lacks them (though they shouldn’t be hard to add..).
Finally, if you’re looking for one last reason to dig into **PRL, the fact that we’ve defined all our equalities extensionally means that several very useful facts just fall right out of our theory
Theorem funext :
[⋂(U{i}; A.
⋂(fun(A; _.U{i}); B.
⋂(fun(A; a.ap(B;a)); f.
⋂(fun(A; a.ap(B;a)); g.
⋂(fun(A; a.=(ap(f; a); ap(g; a); ap(B; a))); _.
=(f; g; fun(A; a.ap(B;a))))))))] {
auto; ext; ?{elim #5 [a]}; auto
}.
This means that two functions are equal in JonPRL if and only if they map equal arguments to equal output. This is quite pleasant for formalizing mathematics for example.
Whew, we went through a lot! I didn’t intend for this to be a full tour of JonPRL, just a taste of how things sort of hang together and maybe enough to get you looking through the examples. Speaking of which, JonPRL comes with quite a few examples which are going to make a lot more sense now.
Additionally, you may be interested in the documentation in the README which covers most of the primitive operators in JonPRL. As for an exhaustive list of tactics, well….
Hopefully I’ll be writing about JonPRL again soon. Until then, I hope you’ve learned something cool :)
A huge thanks to David Christiansen and Jon Sterling for tons of helpful feedback on this
Veering wildly onto the theory side compared to my last post, I'd like to look at some more Twelf code today. Specifically, I'd like to prove a fun theorem called cut admissibility (or elimination) for a particular logic: a simple intuitionistic propositional sequent calculus. I chucked the code for this over here.
If those words didn't make any sense, here's an incomplete primer on what we're doing. First of all, we're working with a flavor of logic called "sequent calculus". Sequent calculus describes a class of logics characterized by studying "sequents": a sequent is just an expression Γ ⊢ A saying "A is true under the assumption that the set of propositions Γ are true". A sequent calculus defines a couple of things:
1. What exactly A is: a calculus defines what propositions it talks about. For us, we're only interested in a few basic connectives, so our calculus can talk about true, false, "A and B" (A ∧ B), "A or B" (A ∨ B), and "A implies B" (A ⇒ B).
2. Rules for inferring that Γ ⊢ A holds. We can use these inference rules to build up proofs of things in our system.
In sequent calculus there are two sorts of inference rules: left and right. A left rule takes a fact that we know and lets us reason backwards to other things we must know hold. A right rule lets us take the thing we're trying to prove and instead prove smaller, simpler things.
More rules will follow in the Twelf code but for a nice example consider the left and right rules for ∧
,
Γ, A, B ⊢ C
———————————————
Γ, A ∧ B ⊢ C
Γ ⊢ A        Γ ⊢ B
———————————————
Γ ⊢ A ∧ B
The left rule says if we know that A ∧ B
is true, we can take it apart and try to prove our goal with assumptions that A
and B
are true. The right rule says to prove that A ∧ B
is true we need to prove A
is true and B
is true. A proof in this system is a tree of these rules, just like you'd expect in a type theory or natural deduction.
We also tacitly assume a bunch of boring rules called structural rules about our sequents hold, so that we can freely duplicate, drop and swap assumptions in Γ
. For a less awful introduction to sequent calculus Frank Pfenning has some good notes.
Now we want to prove a particular (meta)theorem about sequent calculus, cut admissibility:

Γ ⊢ A    Γ, A ⊢ B
—————————————————
Γ ⊢ B
This theorem means a couple of different things: for example, that our system is consistent and that our system admits lemmas. As it turns out, proving this theorem is hard. The basic complication is that we don't know what form either of the first two proofs takes.
We now formalize our sequent calculus in Twelf. First we declare a type and some constants to represent propositions.
prop : type.
=> : prop -> prop -> prop. %infix right 4 =>.
true : prop.
false : prop.
/\ : prop -> prop -> prop. %infix none 5 /\.
\/ : prop -> prop -> prop. %infix none 5 \/.
Notice here that we use %infix to let us write A /\ B => C
. Having specified these we now define what a proof is in this system. This is structured a little differently than you’d be led to believe from the above. We have an explicit type proof
which is inhabited by “proof terms” which serve as a nice shortcut to those trees generated by inference rules. Finally, we don’t explicitly represent Γ
, instead we have this thing called hyp
which is used to represent a hypothesis in Γ
. Left rules use these hypotheses and introduce new ones. Pay attention to /\/l
and /\/r
since you’ve seen the handwritten equivalents.
proof : type.
hyp : type.
init : hyp -> proof.
=>/r : (hyp -> proof) -> proof.
=>/l : (hyp -> proof) -> proof -> hyp -> proof.
true/r : proof.
false/l : hyp -> proof.
/\/r : proof -> proof -> proof.
/\/l : (hyp -> hyp -> proof) -> hyp -> proof.
\//r1 : proof -> proof.
\//r2 : proof -> proof.
\//l : (hyp -> proof) -> (hyp -> proof) -> hyp -> proof.
The right rules are at least a little intuitive, the left rules are peculiar. Essentially we have a weird CPSy feel going on here, to decompose a hypothesis you hand the hyp
to the constant along with a continuation which takes the hypotheses you should get out of the decomposition. For example for \/
we have two right rules (think Left
and Right
), then one left rule which takes two continuations and one hyp
(think either
). Finally, that init
thing is the only way to actually take a hypothesis and use it as a proof.
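To get a feel for these proof terms, here's a small example (my own, not from the original Twelf development): a proof of A /\ B => A. We introduce the implication with =>/r, decompose the conjunction hypothesis with /\/l, and use the first hypothesis with init.

```twelf
% A proof term for A /\ B => A: assume the conjunction,
% split it into two hypotheses, and use the first one.
example : proof = =>/r ([h] /\/l ([a] [b] init a) h).
```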
We now want to unite these two pieces of syntax with a typing judgment letting us say that a proof
proves some particular prop
.
of : proof -> prop -> type.
hof : hyp -> prop -> type.

of/init : of (init H) A
           <- hof H A.
of/=>/r : of (=>/r F) (A => B)
           <- ({h} hof h A -> of (F h) B).
of/=>/l : of (=>/l C Arg F) U
           <- hof F (A => B)
           <- of Arg A
           <- ({h} hof h B -> of (C h) U).
of/true/r : of true/r true.
of/false/l : of (false/l H) A
              <- hof H false.
of//\/r : of (/\/r R L) (A /\ B)
           <- of L A
           <- of R B.
of//\/l : of (/\/l C H) U
           <- hof H (A /\ B)
           <- ({h}{h'} hof h A -> hof h' B -> of (C h h') U).
of/\//r1 : of (\//r1 L) (A \/ B)
            <- of L A.
of/\//r2 : of (\//r2 R) (A \/ B)
            <- of R B.
of/\//l : of (\//l R L H) C
           <- hof H (A \/ B)
           <- ({h} hof h A -> of (L h) C)
           <- ({h} hof h B -> of (R h) C).
In order to handle hypotheses we have this hof
judgment which handles typing various hyp
s. We introduce it just like we introduce hyp
s in those continuationy things for left rules. Sorry for dumping so much code on you all at once: it’s just a lot of machinery we need to get working in order to actually start formalizing cut.
I would like to point out a few things about this formulation of sequent calculus though. First off, it’s very Twelfy, we use the LF context to host the context of our logic using HOAS. We also basically just have void
as the type of hypothesis! Look, there’s no way to construct a hypothesis, let alone a typing derivation hof
! The idea is that we'll just wave our hands at Twelf and say "consider our theorem in a context with hyps and hofs" declared with
%block someHofs : some {A : prop} block {h : hyp}{deriv : hof h A}.
In short, Twelf is nifty.
Now we’re almost in a position to state cut admissibility, we want to say something like
cut : of Lemma A
   -> ({h} hof h A -> of (F h) B)
   -> of ??? B
But what should that ??? be? We could just say “screw it it’s something” and leave it as an output of this lemma but experimentally (an hour of teeth gnashing later) it’s absolutely not worth the pain. Instead let’s do something clever.
Let’s first define an untyped version of cut
which works just across proofs without minding typing derivations. We can't declare this total because it's just not going to work for ill-typed things; we can give it a mode though (it's not needed) just as mechanical documentation.
cut : proof -> (hyp -> proof) -> proof -> type.
%mode cut +A +B -C.
The goal here is we’re going to state our main theorem as
of/cut : {A} of P A
      -> ({h} hof h A -> of (F h) B)
      -> cut P F C
      -> of C B
      -> type.
%mode of/cut +A +B +C -D -E.
Leaving that derivation of cut
as an output. This lets us produce not just a random term but instead a proof that that term makes "sense" somehow, along with a proof that it's well typed.
cut
is going to mirror the structure of of/cut
so we now need to figure out how we're going to structure our proof. It turns out a rather nice way to do this is to organize our cuts into 4 categories. The first are "principal" cuts: they're the ones where we have a right rule as our lemma and we immediately decompose that lemma in the other term with the corresponding left rule. This is the case that we drive towards everywhere and it's where the substitution bit happens.
First we have some simple cases
trivial : cut P' ([x] P) P.
p/init1 : cut (init H) ([x] P x) (P H).
p/init2 : cut P ([x] init x) P.
In trivial
we don’t use the hypothesis at all so we’re just “strengthening” here. p/init1
and p/init2
deal with the init
rule on the left or right side of the cut, if it’s on the left we have a hypothesis of the appropriate type so we just apply the function. If it’s on the left we have a proof of the appropriate type so we just return that. In the more interesting cases we have the principle cuts for some specific connectives.
p/=> : cut (=>/r F) ([x] =>/l ([y] C y x) (Arg x) x) Out'
        <- ({y} cut (=>/r F) (C y) (C' y))
        <- cut (=>/r F) Arg Arg'
        <- cut Arg' F Out
        <- cut Out C' Out'.
p//\ : cut (/\/r R L) ([x] /\/l ([y][z] C y z x) x) Out'
        <- ({x}{y} cut (/\/r R L) (C x y) (C' x y))
        <- ({x} cut R (C' x) (Out x))
        <- cut L Out Out'.
p/\//1 : cut (\//r1 L) ([x] \//l ([y] C2 y x) ([y] C1 y x) x) Out
          <- ({x} cut (\//r1 L) (C1 x) (C1' x))
          <- cut L C1' Out.
p/\//2 : cut (\//r2 L) ([x] \//l ([y] C2 y x) ([y] C1 y x) x) Out
          <- ({x} cut (\//r2 L) (C2 x) (C2' x))
          <- cut L C2' Out.
Let’s take a closer look at p/=>
, the principal cut for =>
. First off, our inputs are =>/r F
and ([x] =>/l ([y] C y x) (Arg x) x)
. The first one is just a “function” that we’re supposed to substitute into the second. Now the second is comprised of a continuation and an argument. Notice that both of these depend on x
! In order to handle this the first two lines of the proof
<- ({y} cut (=>/r F) (C y) (C' y))
<- cut (=>/r F) Arg Arg'
are to remove that dependence. We get back a C'
and an Arg'
which don't use the hyp
(x
). In order to do this we just recurse and cut the =>/r F
out of them. Notice that both the type and the thing we’re substituting are the same size, what decreases here is what we’re substituting into. Now we’re ready to actually do some work. First we need to get a term representing the application of F
to Arg'
. This is done with cut since it’s just substitution
<- cut Arg' F Out
But this isn’t enough, we don’t need the result of the application, we need the result of the continuation! So we have to cut the output of the application through the continuation
<- cut Out C' Out'.
This code is kinda complicated. The typed version of this took me an hour since after 2am I am charitably called an idiot. However, this same general pattern holds with all the principal cuts; => is only special in that the argument is just lying about in an input. Try to work through the case for /\
now
p//\ : cut (/\/r R L) ([x] /\/l ([y][z] C y z x) x) Out'
        <- ({x}{y} cut (/\/r R L) (C x y) (C' x y))
        <- ({x} cut R (C' x) (Out x))
        <- cut L Out Out'.
After principal cuts we really just have a number of boring cases whose job it is to recurse. The first of these is called rightist substitution because it comes up if the term on the right (the part using the lemma) has a right rule first. This means we have to hunt in the subterms to go find where we're actually using the lemma.
r/=> : cut P ([x] (=>/r ([y] F y x))) (=>/r ([y] F' y))
        <- ({x} cut P (F x) (F' x)).
r/true : cut P ([x] true/r) true/r.
r//\ : cut P ([x] /\/r (R x) (L x)) (/\/r R' L')
        <- cut P L L'
        <- cut P R R'.
r/\/1 : cut P ([x] \//r1 (L x)) (\//r1 L')
         <- cut P L L'.
r/\/2 : cut P ([x] \//r2 (R x)) (\//r2 R')
         <- cut P R R'.
Nothing here should be surprising keeping in mind that all we’re doing here is recursing. The next set of cuts is called leftist substitution. Here we are actually recursing on the term we’re trying to substitute.
l/=> : cut (=>/l ([y] C y) Arg H) ([x] P x) (=>/l ([x] C' x) Arg H)
        <- ({x} cut (C x) P (C' x)).
l/false : cut (false/l H) P (false/l H).
l//\ : cut (/\/l ([x][y] C x y) H) P (/\/l ([x][y] C' x y) H)
        <- ({x}{y} cut (C x y) P (C' x y)).
l/\/ : cut (\//l ([y] R y) ([y] L y) H) ([x] P x)
           (\//l ([y] R' y) ([y] L' y) H)
        <- ({x} cut (L x) P (L' x))
        <- ({x} cut (R x) P (R' x)).
It’s the same game but just a different target, we’re now recursing on the continuation because that’s where we somehow created a proof of A
. This means that in l/=>
we're substituting into a left term which has three parts: a continuation taking a hyp of B to a proof of C, an argument which is a proof of A, and a hypothesis of type A => B.
Now we're only interested in how we created that proof of C
, that's the only relevant part of this substitution. The output of this case is going to have that left rule, =>/l ??? Arg H
, where ???
is a replacement of C
that we get by cutting C
through P
“pointwise”. This comes through on the recursive call
<- ({x} cut (C x) P (C' x)).
For one more case, consider the left rule for \/
l/\/ : cut (\//l R L H) P
We start by trying to cut a left rule into P
so we need to produce a left rule in the output with different continuations, something like
(\//l R' L' H)
Now what should R'
and L'
be? In order to produce them we’ll throw up a pi so we can get L x
, a proof with the appropriate type to cut again. With that, we can recurse and get back the new continuation we want.
<- ({x} cut (L x) P (L' x))
<- ({x} cut (R x) P (R' x)).
There's one last class of cuts to worry about. Think about the cases we've covered so far: principal cuts, rightist substitution, and leftist substitution.
So what happens if we have a left rule on the right and a right rule on the left, but they don't "match up"? By this I mean that the left rule in that right term works on a different hypothesis than the one that the function it's wrapped in provides. In this case we just have to recurse some more
lr/=> : cut P ([x] =>/l ([y] C y x) (Arg x) H) (=>/l C' Arg' H)
         <- cut P Arg Arg'
         <- ({y} cut P (C y) (C' y)).
lr//\ : cut P ([x] /\/l ([y][z] C y z x) H) (/\/l C' H)
         <- ({y}{z} cut P (C y z) (C' y z)).
lr/\/ : cut P ([x] \//l ([y] R y x) ([y] L y x) H) (\//l R' L' H)
         <- ({y} cut P (L y) (L' y))
         <- ({y} cut P (R y) (R' y)).
When we have such an occurrence we just do like we did with right rules.
Okay, now that we’ve handled all of these cases we’re ready to type the damn thing.
of/cut : {A} of P A
      -> ({h} hof h A -> of (F h) B)
      -> cut P F C
      -> of C B
      -> type.
%mode of/cut +A +B +C -D -E.
Honestly this is less exciting than you’d think. We’ve really done all the creative work in constructing the cut
type family. All that's left to do is check that this is correct. As an example, here's a case that exemplifies how we verify all left-right commutative cuts.
- : of/cut _ P ([x][h] of/=>/l ([y][yh] C y yh x h) (A x h) H)
     (lr/=> CC CA) (of/=>/l C' A' H)
     <- of/cut _ P A CA A'
     <- ({y}{yh} of/cut _ P (C y yh) (CC y) (C' y yh)).
We start by trying to show that
lr/=> : cut P ([x] =>/l ([y] C y x) (Arg x) H) (=>/l C' Arg' H)
         <- cut P Arg Arg'
         <- ({y} cut P (C y) (C' y)).
is type correct. To do this we have a derivation P
that the left term is well-typed. Notice that I've overloaded P
here, in the rule lr/=>
P
was a term and now it’s a typing derivation for that term. Next we have a typing derivation for [x] =>/l ([y] C y x) (Arg x) H
. This is a function which takes two arguments. x
is a hypothesis, the same as in lr/=>
, however now we have h
which is a hof
derivation that h
has a type. There’s only one way to type a usage of the left rule for =>
, with of/=>/l
so we have that next.
Finally, our output is on the next line in two parts. First we have a derivation for cut
showing how to construct the “cut out term” in this case. Next we have a new typing derivation that again uses of/=>/l
. Notice that both of these depend on some terms we get from the recursive calls here.
Since we've gone through all the cases already and done all the thinking, I'm not going to reproduce it all here. The intuition for how cut works is really best given by the untyped version, with the understanding that we check that it's correct with this theorem as we did above.
To recap, here's what we did: we formalized a sequent calculus in Twelf, defined an untyped cut across proof terms, and then verified that it produces well-typed results with the of/cut theorem.
Hope that was a little fun, cheers!
It's been a while since I did one of these "read a package and write about it" posts. Part of this is that it turns out that most software is awful and writing about code I read just makes me grumpy. However I found something nice to write about! In this post I'd like to close a somewhat embarrassing gap in my knowledge: we're going to walk through a streaming library.
I know that both lists and lazy IO are kind of.. let's say fragile, but I have neglected learning one of these fancy libraries that aim to solve those problems. Today we'll be looking at one of those libraries, pipes!
Pipes provides one core type Proxy
and a few operations on it, like await
and yield
. We can pair together a pipeline of operations which can send data to their neighbors and request more data from them as they need it. With these coroutine-like structures we can nicely implement efficient, streaming computations.
As always this starts by getting our hands on the code with the
~ $ cabal get pipes
~ $ cd pipes-4.1.5/
Now from here we can query all the available files to see what we’re up against
~/pipes-4.1.5 $ wc -l **/*.hs | sort -nr
4796 total
1513 src/Pipes/Tutorial.hs
854 src/Pipes/Core.hs
836 src/Pipes/Prelude.hs
517 src/Pipes.hs
380 src/Pipes/Lift.hs
272 tests/Main.hs
269 src/Pipes/Internal.hs
85 benchmarks/PreludeBench.hs
68 benchmarks/LiftBench.hs
2 Setup.hs
So the first thing I notice is that there’s this great honking module called Pipes.Tutorial
which houses a brief introduction to the pipes package. I skimmed this before starting but it doesn’t really seem to explain the implementation details.. If you don’t really know what pipes is, read this tutorial now. After doing so you have exactly my knowledge of pipes!
The next interesting module here is Pipes.Internal
. I’ve found that .Internal
modules seem to house the fundamental bits of the package so we’ll start there.
This module starts with an emphatic warning
{-| This is an internal module, meaning that it is unsafe to
    import unless you understand the risks.
-}
So this seems like a perfect place to start without really understanding this library :D It exports a few different functions and one type:
I recognize one of those types: Proxy
as the central type behind the whole pipes concept; it is the type of components in the pipeline. Let's look at how it's actually implemented
data Proxy a' a b' b m r
    = Request a' (a  -> Proxy a' a b' b m r )
    | Respond b  (b' -> Proxy a' a b' b m r )
    | M          (m    (Proxy a' a b' b m r))
    | Pure    r
So two of those constructors, M
and Pure
, look pretty vanilla. The first one lets us lift an action in the underlying monad m
, into Proxy
. It's a little bit weird that instead of having M (m r)
we instead have M (m (Proxy ...))
however this doesn’t seem like a big deal because we have Pure
to promote an r
to a Proxy .... r
. So we can lift some m r
to Proxy a' a b' b m r
with M . fmap Pure
. It’s still not clear to me why this is a benefit though.
The first two constructors are really cool though, Request
and Respond
. The first thing that pops into my head is that this looks like a sort of free-monad pattern. Look how we've got constructors carrying continuations for each kind of step (Request
and Respond
are definitely actions). This would make a lot of sense, free monad transformers nicely give rise to coroutines which are very much in line with pipes. Because of this free-monad-like shape, I expect that the monad instance will be like free monads and behave "like substitution". We should chase down the leaves of this Proxy
(including under lambdas) and replace each Pure r
with f r
for >>=
and Pure (f a)
for fmap
.
To check if we’re right, we go down one line
instance Monad m => Functor (Proxy a' a b' b m) where
    fmap f p0 = go p0 where
      go p = case p of
          Request a' fa  -> Request a' (\a  -> go (fa a ))
          Respond b  fb' -> Respond b  (\b' -> go (fb' b'))
          M          m   -> M (m >>= \p' -> return (go p'))
          Pure    r      -> Pure (f r)
This looks like what I had in mind, we run down p
and in the first 3 branches we recurse. I’ll admit it looks a little intimidating but after staring at it for a bit I realized that the first 3 lines are all just variations on fmap go
! Indeed, we can rewrite this to
go p = case p of
    Request a' fa  -> Request a' (fmap go fa)
    Respond b  fb' -> Respond b  (fmap go fb')
    M          m   -> M (fmap go m)
    Pure    r      -> Pure (f r)
This makes the idea a bit clearer in my mind. Let’s look at the applicative instance next!
instance Monad m => Applicative (Proxy a' a b' b m) where
    pure = Pure
    pf <*> px = go pf where
      go p = case p of
          Request a' fa  -> Request a' (\a  -> go (fa a ))
          Respond b  fb' -> Respond b  (\b' -> go (fb' b'))
          M          m   -> M (m >>= \p' -> return (go p'))
          Pure    f      -> fmap f px
    (*>) = (>>)
First note that pure = Pure
which isn’t a stunner just from the naming. In <*>
we have the same sort of pattern as in fmap
. We race down the “function” side of <*>
and whenever we reach a Pure
we have a function from a > b
, with that function we call fmap
on the structure on the “argument” side. So we’re kind of gluing that px
onto that pf
by changing each Pure f
to fmap f px
.
Finally we have the monad instance. Of course the return
implementation is the same as for pure
but (>>=) = _bind
so the implementation of _bind
has been chucked out of the instance itself. It turns out there’s a good reason for that: _bind
has a bunch of rewrite rules attached to it.
_bind
    :: Monad m
    => Proxy a' a b' b m r
    -> (r -> Proxy a' a b' b m r')
    -> Proxy a' a b' b m r'
p0 `_bind` f = go p0 where
    go p = case p of
        Request a' fa  -> Request a' (\a  -> go (fa a ))
        Respond b  fb' -> Respond b  (\b' -> go (fb' b'))
        M          m   -> M (m >>= \p' -> return (go p'))
        Pure    r      -> f r
Now excitingly the implementation of bind
is almost exactly what we had before! Now instead of Pure f -> fmap f px
it's Pure r -> f r
so we have something more like substitution than gluing.
Now that Proxy
is a monad, we can make it a monad transformer!
So we need to take an m a
and return Proxy a' a b' b m a
, we want to use M :: m (Proxy a' a b' b m a)
but we have an m a
, by fmap
ing Pure
we’re good to go.
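The instance itself was elided above, but given that reasoning it's presumably just (a sketch, not necessarily pipes' exact code):

```haskell
instance MonadTrans (Proxy a' a b' b) where
    -- Wrap the underlying action in M, promoting its result with Pure
    lift m = M (fmap Pure m)
```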
From here on out it’s just a series of not so exciting MTL instances so we’ll skip those.. There’s a couple interesting things left though! Before we get to them recall the monad transformer laws
lift . return = return
lift (m >>= f) = lift m >>= (lift . f)
In other words, lift
should “commute” with the two operations of the monad type class. This isn’t actually true by default with Proxy
, for example
return a = Pure a
lift (return a) = M (fmap Pure (return a)) = M (return (Pure a))
To solve this we have observe
. This function is supposed to normalize a Proxy
so that these laws hold.
observe :: Monad m => Proxy a' a b' b m r -> Proxy a' a b' b m r
observe p0 = M (go p0) where
    go p = case p of
        Request a' fa  -> return (Request a' (\a  -> observe (fa a )))
        Respond b  fb' -> return (Respond b  (\b' -> observe (fb' b')))
        M          m'  -> m' >>= go
        Pure    r      -> return (Pure r)
Note that go
takes a Proxy a' a b' b m r
and returns m (Proxy a' a b' b m r)
. By doing this, we can stick everything in m
with return
except for M m'
which we just unwrap and keep going. This means return (Pure a) = go (Pure a)
which is what is required for the monad transformer laws to hold.
Finally, the last thing in this file is X
which is used to represent the type for communication that cannot happen. So if we have a pipe at the beginning of the pipeline, it shouldn’t be able to ask for input from another pipe.
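The definition itself didn't survive into the excerpt above; it's (something like) a newtype that can only ever be built from itself, together with an eliminator:

```haskell
-- A type with no non-bottom inhabitants: to build an X you
-- would already need an X in hand.
newtype X = X X

-- Eliminate the impossible: given an X we can "produce" anything
-- by recursing forever (this can never actually be reached).
closed :: X -> a
closed (X x) = closed x
```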
And there are no non-bottom expressions which occupy this type so we're good. Now that we've seen the internal implementation of most of Proxy
we can go look at the infrastructure pipes provides around this. Again going by the names, now that we’ve covered the internals it makes sense to move onto Pipes.Core
.
Pipes.Core
seems much closer to the actual user interface of the library, we can see that it exports a bunch of familiar sounding names:
module Pipes.Core (
Proxy
, runEffect
, respond
, (/>/)
, (//>)
, request
, (\>\)
, (>\\)
, push
, (>~>)
, (>>~)
, pull
, (>+>)
, (+>>)
, reflect
, X
, Effect
, Producer
, Pipe
, Consumer
, Client
, Server
, Effect'
, Producer'
, Consumer'
, Client'
, Server'
, (\<\)
, (/</)
, (<~<)
, (~<<)
, (<+<)
, (<\\)
, (//<)
, (<<+)
, closed
) where
Now a few of those we’ve seen before, namely Proxy
, X
, and closed
. Notice that Proxy
is exported abstractly here so we can’t write code which violates the monad transformer laws using this module.
The first new function is called runEffect
, but it has the type
runEffect :: Monad m => Effect m r -> m r
Which sounds great! I however have no clue what an effect is so let's dig around the type exports first. There are a few type synonyms here
type Effect = Proxy X () () X
type Producer b = Proxy X () () b
type Pipe a b = Proxy () a () b
type Consumer a = Proxy () a () X
type Client a' a = Proxy a' a () X
type Server b' b = Proxy X () b' b
type Effect' m r = forall x' x y' y . Proxy x' x y' y m r
type Producer' b m r = forall x' x . Proxy x' x () b m r
type Consumer' a m r = forall y' y . Proxy () a y' y m r
type Server' b' b m r = forall x' x . Proxy x' x b' b m r
type Client' a' a m r = forall y' y . Proxy a' a y' y m r
Even though this looks like a lot, about half of these are actually duplicates which just use -XRankNTypes
instead of explicitly using X
. An Effect
as seen above is Proxy X () () X
.. I had to double check this but proxy takes 6 type arguments, here they are in order
a' is the type of things that we can send up a Request
a is the type of things which a request will return
b' is what we may be sent to respond to
b is what we may respond with
m is the underlying monad we may use for effects
r is the return value

So an Effect
can only request things if it can produce an X
, and it will get back a ()
from its requests, and it can only respond with an X
and will get back a ()
after responding. Since we can never produce an X
an Effect
can never request or respond.
Similarly, a Producer
can respond
to things with b
s, but it will only ever get back a ()
after a response and it can never request
something. A Consumer
is the dual, never responding but can request
, it can only hand the code responding a ()
though.
Also in there are Client
s and Server
s which seem to be like a Consumer
and a Producer
but that can actually send meaningful messages with a request
and receive something interesting with a respond
instead of just ()
.
Okay, with these type synonyms in mind let’s go look at some code! Since an Effect
can’t request or respond, it’s really equivalent to just some monadic action.
runEffect :: Monad m => Effect m r -> m r
runEffect = go
  where
    go p = case p of
        Request v _ -> closed v
        Respond v _ -> closed v
        M       m   -> m >>= go
        Pure    r   -> return r
This lets us write runEffect
which just uses the absurdity of producing a v :: X
in order to turn a Proxy
into an m
.
runEffect
is also the first function we’ve seen to actually escape the Proxy
monad as well! It lets us convert a self-contained pipeline into just an effect, which should mean it comes up basically everywhere, just like runStateT
.
Since the Proxy
monad is abstract, we need some functions to actually be able to request things. Thus we have respond
This is actually pretty trivial, we have a constructor after all whose job it is to Respond
to things so we just use that with the a
we have as a response. Since we have no interesting continuation yet, but we need something of type a' > Proxy x' x a' a m a'
we just use Pure
. This should be very familiar to users of free monads (remember that Pure
= return
)!
Next is something interesting, we’ve seen a lot of ways of manipulating a pipe, but never actually a way of combining two pipes so that they interact, our next function does that.
(/>/)
    :: Monad m
    => (a -> Proxy x' x b' b m a')
    -> (b -> Proxy x' x c' c m b')
    -> (a -> Proxy x' x c' c m a')
(fa />/ fb) a = fa a //> fb
Here we have two arguments, both functions to pipelines and we return a pipeline as output. Notice here that the first Proxy
is something which is going to respond
with things of type b
and expect something of type b'
in return and our second function is going to map b
s to a Proxy
which returns a b'
. This means we can replace each Respond
in the first with a call to the second function and pipe the output into our continuation for that Respond
. Indeed this matches up with the return type so I anticipate that is what shall happen. However, this function shells out to another right below it so we'll have to look at it to confirm.
(//>)
    :: Monad m
    => Proxy x' x b' b m a'
    -> (b -> Proxy x' x c' c m b')
    -> Proxy x' x c' c m a'
p0 //> fb = go p0
  where
    go p = case p of
        Request x' fx  -> Request x' (\x -> go (fx x))
        Respond b  fb' -> fb b >>= \b' -> go (fb' b')
        M          m   -> M (m >>= \p' -> return (go p'))
        Pure    a      -> Pure a
The interesting line here is Respond b fb' -> ...
which does exactly what I thought it ought to (I feel clever). In that line we run the function we have in the second argument with the data the first argument was Respond
ing with. We sort of “intercept” a message intended for downstream and just handle it right there. Since we do this for all things Respond
ing with b
s we now only respond with c
s hence the change in type. It doesn't affect the upstream type, but we can now take something producing values and transform them to instead run some other computation (perhaps producing something else).
In a limited case we can do something like
intercept :: Monad m
          => (b -> c)
          -> Proxy a' a b' b m r
          -> Proxy a' a b' c m r
intercept f p = p //> respond . f
Cool! Now up next seems to be the dual of what we’ve just looked at.
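The definition didn't make it into the excerpt; by symmetry with respond it should be just:

```haskell
request :: Monad m => a' -> Proxy a' a y' y m a
request a' = Request a' Pure
```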
This is just what we had with respond
but using Request
instead. Similarly we have a counterpart for />/
. It again shells out to a similar, pointful, function >\\
(\>\)
    :: Monad m
    => (b' -> Proxy a' a y' y m b)
    -> (c' -> Proxy b' b y' y m c)
    -> (c' -> Proxy a' a y' y m c)
(fb' \>\ fc') c' = fb' >\\ fc' c'

(>\\)
    :: Monad m
    => (b' -> Proxy a' a y' y m b)
    -> Proxy b' b y' y m c
    -> Proxy a' a y' y m c
fb' >\\ p0 = go p0
  where
    go p = case p of
        Request b' fb  -> fb' b' >>= \b -> go (fb b)
        Respond x  fx' -> Respond x (\x' -> go (fx' x'))
        M          m   -> M (m >>= \p' -> return (go p'))
        Pure    a      -> Pure a
I’d expect that this function does sort of what the other did before. It’ll take Request
s and "answer" them inline by replacing each with a call to the other function. In fact, when you think about it, what the hell is the difference between a request and a response? They're completely symmetric! They both transmit information, sending one type in one direction and one type in the other. So we should have exactly the same code that just happens to use Request
instead of Respond
, which is indeed what we have.
The only real difference here is in the argument order which hints at the fact that we’re going to break symmetry sooner or later, it just hasn’t happened yet.
Next up is
push
takes a seed a
and chucks it down the pipeline. Once it gets a response, it throws it up the pipeline with Request
and when it gets a response (something of type a
) it starts the whole process over again. Now the process starts by sending values down, there’s no reason why we can’t do the reverse and start by asking for a value
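Both definitions were elided above; modulo details they presumably look like:

```haskell
push :: Monad m => a -> Proxy a' a a' a m r
push = go
  where
    -- send the value down, wait for a reply, send the reply up,
    -- and loop on whatever comes back
    go a = Respond a (\a' -> Request a' go)

pull :: Monad m => a' -> Proxy a' a a' a m r
pull = go
  where
    -- ask upstream first, then pass the answer down and loop
    go a' = Request a' (\a -> Respond a go)
```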
which conveniently is right near by. Now push
and pull
each give rise to a form of composition which takes two Proxy
s and glues them together. The first is
(>~>)
    :: Monad m
    => (_a -> Proxy a' a b' b m r)
    -> ( b -> Proxy b' b c' c m r)
    -> (_a -> Proxy a' a c' c m r)
This takes two Proxy
s which can communicate with each other and gives back a Proxy
which has internalized this dialogue. This shells out to the pointful version, >>~
(>>~)
    :: Monad m
    => Proxy a' a b' b m r
    -> (b -> Proxy b' b c' c m r)
    -> Proxy a' a c' c m r
p >>~ fb = case p of
    Request a' fa  -> Request a' (\a -> fa a >>~ fb)
    Respond b  fb' -> fb' +>> fb b
    M          m   -> M (m >>= \p' -> return (p' >>~ fb))
    Pure    r      -> Pure r
For this code we walk down the tree and recurse in all cases except where we have a Respond
. This should send some information to that function we got as an argument and then use that response to continue, so we want some way of taking a Proxy b' b c' c m r
and a b' > Proxy a' a b' b m r
and giving back a Proxy a' a c' c m r
. This looks like the exact dual to >>~
and indeed is the equivalent in the pull
version.
(+>>)
    :: Monad m
    => (b' -> Proxy a' a b' b m r)
    -> Proxy b' b c' c m r
    -> Proxy a' a c' c m r
fb' +>> p = case p of
    Request b' fb  -> fb' b' >>~ fb
    Respond c  fc' -> Respond c (\c' -> fb' +>> fc' c')
    M          m   -> M (m >>= \p' -> return (fb' +>> p'))
    Pure    r      -> Pure r
This does the exact opposite of >>~
. It walks around recursing until we get to a Request
, this should transfer control up to that function b' > Proxy ...
and it does by calling >>~
. So these two operators >>+
and >>~
work together to join up to Proxy
functions by using one to answer the other’s Request
and Respond
s. The symmetry breaking here is who should we inspect “first” so to speak. If we start with the upstream one than the second one is only run when a value is push
ed down to it and if we start with the former we only run the upstream version when we pull
something from it. Nifty.
One thing to note, what happens when one of these Proxy
s give up and return
? This potential situation is reflected in the fact that both of these Proxy
s must return an r
. Therefore, whenever one of these returns and we’re currently running it (the upstream for >>~
, downstream for >>+
) we can just return the value and be done with the whole thing. In this sense composing a Proxy
has this short circuiting property, at any point in the pipeline you can just give up and return
something!
Remember before how I was ranting about how Request
and Respond
were really the same damn thing, it turns out I’m not the only one who thought that
reflect :: Monad m => Proxy a' a b' b m r -> Proxy b b' a a' m r
reflect = go
  where
    go p = case p of
        Request a' fa  -> Respond a' (\a  -> go (fa a ))
        Respond b  fb' -> Request b  (\b' -> go (fb' b'))
        M          m   -> M (m >>= \p' -> return (go p'))
        Pure    r      -> Pure r
Looking at the type here is really telling, all we do to switch the upstream and downstream ends is swap the constructors Request
and Respond
! That actually wraps up the core of pipes, the rest is just a bunch of synonyms with the arguments flipped!
Now that we’ve finished up Pipes.Core
it’s not clear where to go so I decided to go look at the top level Pipes
module since between the .Internal
and .Core
modules we should have covered a lot of it. It turns out the top level only imports those two modules so we can now go through that!
Really the top level package Pipes
just re-exports some stuff and defines some thin layers over the rest
module Pipes (
Proxy
, X
, Effect
, Effect'
, runEffect
, Producer
, Producer'
, yield
, for
, (~>)
, (<~)
, Consumer
, Consumer'
, await
, (>~)
, (~<)
, Pipe
, cat
, (>->)
, (<-<)
, ListT(..)
, runListT
, Enumerable(..)
, next
, each
, every
, discard
, module Control.Monad
, module Control.Monad.IO.Class
, module Control.Monad.Trans.Class
, module Control.Monad.Morph
, module Data.Foldable
)
Now what haven’t we seen, the first thing is this yield
construct which turns out to be a snazzier name for respond
with a nicer type.
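The definition was elided above; it's presumably just:

```haskell
yield :: Monad m => a -> Producer' a m ()
yield = respond
```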
Similarly, for
is just a synonym for (//>)
(first joiner we went through) and ~>
is the point free version. On the other end we have stuff overlaying request
and friends but they’re not quite symmetric
await :: Monad m => Consumer' a m a
await = request ()
(>~)
    :: Monad m
    => Proxy a' a y' y m b
    -> Proxy () b y' y m c
    -> Proxy a' a y' y m c
p1 >~ p2 = (\() -> p1) >\\ p2
So we need to cope with the fact request
can actually transfer interesting data down as well as up, in the basic case though we just assume that we’re dealing with ()
s. Also note that >~
is biased to the downstream Proxy
, it starts by running it and whenever we actually request something (by sending up a ()
) we run p1
. This function lets us compose Proxy
s, not functions to Proxy
s so that’s one nice effect.
Finally, we see our first example of a pipe
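The definition was elided; it's just an await/yield loop, presumably along these lines:

```haskell
cat :: Monad m => Pipe a a m r
cat = forever $ do
    x <- await    -- ask upstream for a value
    yield x       -- pass it straight downstream
```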
cat
works by requesting something upstream immediately and passing it downstream. Nothing interesting except that it combines great with other Proxy
s. Say for example we have a random number generator, we can easily create a Proxy
producing random numbers with
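The snippet is elided from this copy; a minimal sketch of the idea, assuming the pipes library is available and standing in System.Random's randomIO for a hypothetical getRandomNumber:

```haskell
import Pipes
import qualified Pipes.Prelude as P
import System.Random (randomIO)

-- A stand-in for the hypothetical getRandomNumber
getRandomNumber :: IO Int
getRandomNumber = randomIO

-- Replace every request in cat with a fresh random number
randoms :: Producer Int IO r
randoms = lift getRandomNumber >~ cat

-- Grab a few of them (P.take ends the otherwise endless stream)
firstThree :: IO [Int]
firstThree = P.toListM (randoms >-> P.take 3)
```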
we use >~
to replace each request
in cat
with a call to getRandomNumber
which will be immediately pushed downstream. Similarly, we can use cat
to push everything into some computation. If we want to debug a pipe by just printing everything we could say
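The original snippet isn't shown in this copy; one plausible version, again assuming the pipes library:

```haskell
import Pipes
import qualified Pipes.Prelude as P

-- Print each value as it flows by, then pass it along unchanged
debug :: Show a => Pipe a a IO r
debug = for cat $ \x -> do
  lift (print x)
  yield x

-- For example, run a small producer through it
demo :: IO [Int]
demo = P.toListM (each [1, 2, 3] >-> debug)
```

Running demo prints each element and collects [1, 2, 3] unchanged on the other side.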
So cat
is a nice way of lifting something to work across Proxy
s of values if nothing else.
Next is the common case of composing two Proxy
s,
(>->) :: Monad m
      => Proxy a' a () b m r
      -> Proxy () b c' c m r
      -> Proxy a' a c' c m r
p1 >-> p2 = (\() -> p1) +>> p2
>->
makes it easy to join up two Proxy
s that don't send any interesting data "up" with requests. >->
starts by running p2
using +>>
and whenever p2
requests something it goes and runs p1
for a while. This function lets us connect a Pipe
to Pipe
or Producer
to Consumer
for example.
Finally, we wrap up this module with the definition of ListT
inside it. Using Producer
we can define a non-broken version of ListT
newtype ListT m a = Select { enumerate :: Producer a m () }

instance (Monad m) => Functor (ListT m) where
  fmap f p = Select (for (enumerate p) (\a -> yield (f a)))

instance (Monad m) => Applicative (ListT m) where
  pure a = Select (yield a)
  mf <*> mx = Select (
    for (enumerate mf) (\f ->
      for (enumerate mx) (\x ->
        yield (f x))))

instance (Monad m) => Monad (ListT m) where
  return a = Select (yield a)
  m >>= f = Select (for (enumerate m) (\a -> enumerate (f a)))
  fail _ = mzero
What’s kinda nifty here is we just use a Producer
returning a ()
to represent our list. Here we can use for
to access every yield x
which corresponds to our “list” having an entry x
! From there this is really just the standard set of instances for a list! In particular >>=
is concatMap
for producers. That about wraps up this module and I’ll end the blog post with it.
I didn’t actually go through all of pipes
here, just the “core operations” which everything else is built on top of. In particular, I urge you to go read how Pipes.Prelude
is implemented. Just like implementing the Haskell prelude is a good exercise the same is true of pipes.
It turned out that pipes
isn't all that awful on the inside: it's a library built around a specific free-monad-like structure with a couple of different methods of joining two computations together. In particular there were a few different notions of composition which really define pipes

- >>= , which sequences one Proxy after another
- substituting Respond s with another function, using //> ( for in non-infix speak)
- substituting Request s with another function, using >\\
- and the two point-ful joiners +>> and >>~
Hope you learned as much as I did, cheers.
I’m a fan of articles like this one which set out to explain a really complicated subject in 600 words or less. I wanted to write one with a similar goal for compiling a language like Haskell. To help with this I’ve broken down what most compilers for a lazy language do into 5 different phases and spent 200 words explaining how they work. This isn’t really intended to be a tutorial on how to implement a compiler, I just want to make it less magical.
I assume that you know how a lazy functional language looks (this isn’t a tutorial on Haskell) and a little about how your machine works since I make a few references to how some lower level details are compiled. These will make more sense if you know such things, but they’re not necessary.
And the word-count clock starts… now.
Our interactions with compilers usually involve treating them as a huge function from string to string. We give them a string (our program) and it gives us back a string (the compiled code). However, on the inside the compiler does all sorts of stuff to that string we gave it and most of those operations are inconvenient to do as string operations. In the first part of the compiler, we convert the string into an abstract syntax tree. This is a data structure in the compiler which represents the string, but in a form that's far more convenient to analyze and manipulate.
The process of going String -> AST is called “parsing”. It has a lot of (kinda stuffy IMO) theory behind it. This is the only part of the compiler where the syntax actually matters and is usually the smallest part of the compiler.
Examples:
Now that we've constructed an abstract syntax tree we want to make sure that the program “makes sense”. Here “make sense” just means that the program's types are correct. The process for checking that a program type checks involves following a bunch of rules of the form “A has type T if B has type T1 and C has type…”. All of these rules together constitute the type system for our language. As an example, in Haskell f a has the type T2 if f has the type T1 -> T2 and a has the type T1.
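To make that application rule concrete, here's a toy checker for it (a hypothetical mini-language, not any real compiler's):

```haskell
-- Types and terms for a tiny language
data Ty = TInt | Arr Ty Ty deriving (Eq, Show)
data Tm = Lit Int | Var String | App Tm Tm

-- Check a term against an environment of variable types.
-- The App case is exactly the rule above: f a has type t2
-- when f has type t1 -> t2 and a has type t1.
typeOf :: [(String, Ty)] -> Tm -> Maybe Ty
typeOf _   (Lit _) = Just TInt
typeOf env (Var x) = lookup x env
typeOf env (App f a) = case (typeOf env f, typeOf env a) of
  (Just (Arr t1 t2), Just t1') | t1 == t1' -> Just t2
  _                                        -> Nothing
```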
There's a small wrinkle in this story though: most languages require some type inference. This makes things 10x harder because we have to figure out the types of everything as we go! Type inference isn't even possible in a lot of languages and some clever contortions are often needed for a language to remain inferrable.
However, once we’ve done all of this the program is correct enough to compile. Past type checking, if the compiler raises an error it’s a compiler bug.
Examples:
Now that we're free of the constraints of having to report errors to the user, things really get fun in the compiler. Now we start simplifying the language by converting a language feature into a mess of other, simpler language features. Sometimes we convert several features into specific instances of one more general feature. For example, we might convert our big fancy pattern language into a simpler one by elaborating each case into a bunch of nested case s.
Each time we remove a feature we end up with a slightly different language. This progression of languages in the compiler are called the “intermediate languages” (ILs). Each of these ILs have their own AST as well! In a good compiler we’ll have a lot of ILs as it makes the compiler much more maintainable.
An important part of choosing an IL is making it amenable to various optimizations. When the compiler is working with each IL it applies a set of optimizations to the program. For example, rewriting 1 + 1 to 2 during compile time (constant folding).

Examples:
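A sketch of what such an optimization pass can look like, over a toy IL (a hypothetical AST, not from any real compiler):

```haskell
-- A toy expression IL
data E = Lit Int | Add E E | Mul E E deriving (Eq, Show)

-- Bottom-up constant folding: evaluate operations whose
-- operands have already folded down to literals.
fold :: E -> E
fold (Add l r) = case (fold l, fold r) of
  (Lit a, Lit b) -> Lit (a + b)
  (l', r')       -> Add l' r'
fold (Mul l r) = case (fold l, fold r) of
  (Lit a, Lit b) -> Lit (a * b)
  (l', r')       -> Mul l' r'
fold e = e
```

So fold (Add (Lit 1) (Lit 1)) produces Lit 2 with no work left for run time.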
At some point in the compiler, we have to deal with the fact we're compiling a lazy language. One nice way is to use a spineless tagless G-machine (STG machine).
How an STG machine works is a little complicated but here’s the gist
During this portion of the compiler, we'd transform our last IL into a C-like language which actually works in terms of pushing, popping, and entering closures.
The key idea here that makes laziness work is that a closure defers work! It’s not a value, it’s a recipe for how to compute a value when we need it. Also note, all calls are tail calls since function calls are just a special case of entering a closure.
Another really beautiful idea in the STG machine is that closures evaluate themselves. This means closures present a uniform interface no matter what, all the details are hidden in that bundled up code. (I’m totally out of words to say this, but screw it it’s really cool).
Examples:
Finally, after compiling to the STG machine we're ready to output the target code. This bit is very dependent on what exactly we're targeting.
If we're targeting assembly, we have a few things to do. First, we have to switch from using variables to registers. This process is called register allocation and we basically slot each variable into an available register. If we run out, we store variables in memory and load them in as we need them.
In addition to register allocation, we have to compile those C-like language constructs to assembly. This means converting procedures into a label and some instructions, pattern matches into something like a jump table and so on. This is also where we'd apply low-level, bit-twiddling optimizations.
Examples:
Okay, clock off.
Hopefully that was helpful even if you don’t care that much about lazy languages (most of these ideas apply in any compiler). In particular, I hope that you now believe me when I say that lazy languages aren’t magical. In fact, the worry of how to implement laziness only really came up in one section of the compiler!
Now I have a question for you dear reader, what should I elaborate on? With summer ahead, I’ll have some free time soon. Is there anything else that you would like to see written about? (Just not parsing please)
An important property in any term rewriting system, a system of rules for saying one term can be rewritten into another, is called confluence. In a term rewriting system more than one rule may apply at a time, confluence states that it doesn’t matter in what order we apply these rules. In other words, there’s some sort of diamond property in our system
        Starting Term
           /    \
  Rule 1  /      \  Rule 2
         /        \
        B          C
         \        /
          \      /
     A bunch of rules later
          \    /
           \  /
       Same end point
In words (and not a crappy ascii picture): suppose a term A can be rewritten to B, and A can also be rewritten to C. Then two things hold

- B rewrites to some common term D in some number of rewrites
- C rewrites to that same D with a different series of rewrites

In the specific case of lambda calculus, confluence is referred to as the “Church-Rosser Theorem”. This theorem has several important corollaries, including that the normal form of any lambda term is unique. To see this, remember that a normal form is always “at the bottom” of diamonds like the one we drew above. This means that if some term had multiple steps to take, they all must converge before one of them reaches a normal form. If any of them did hit a normal form first, they couldn’t complete the diamond.
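To see the diamond on a concrete term, here's a tiny throwaway evaluator (plain Haskell with de Bruijn indices, independent of the Twelf development) checking that two different reducts of one term meet again:

```haskell
-- Untyped lambda terms with de Bruijn indices
data T = V Int | A T T | L T deriving (Eq, Show)

-- Shift free variables at or above cutoff c by d
shift :: Int -> Int -> T -> T
shift d c (V k)   = V (if k >= c then k + d else k)
shift d c (A l r) = A (shift d c l) (shift d c r)
shift d c (L b)   = L (shift d (c + 1) b)

-- Substitute s for variable j, removing the binder
subst :: Int -> T -> T -> T
subst j s (V k)
  | k == j    = s
  | k > j     = V (k - 1)
  | otherwise = V k
subst j s (A l r) = A (subst j s l) (subst j s r)
subst j s (L b)   = L (subst (j + 1) (shift 1 0 s) b)

-- One leftmost reduction step, if any redex exists
step :: T -> Maybe T
step (A (L b) a) = Just (subst 0 a b)
step (A l r) = case step l of
  Just l' -> Just (A l' r)
  Nothing -> A l <$> step r
step (L b) = L <$> step b
step (V _) = Nothing

normalize :: T -> T
normalize t = maybe t normalize (step t)
```

Starting from (i i) (i i) with i = L (V 0), reducing the left redex first gives i (i i) while reducing the right one first gives (i i) i; both normalize to the same i.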
In this post I’d like to go over a proof of the Church Rosser theorem in Twelf, everyone’s favorite mechanized metalogic. To follow along if you don’t know Twelf, perhaps some shameless self linking will help.
We need to start by actually defining lambda calculus. In keeping with Twelf style, we laugh at those restricted by the bounds of inductive types and use higher order abstract syntax to get binding for free.
term : type.
ap : term -> term -> term.
lam : (term -> term) -> term.
We have two constructors: ap, which applies one term to another, and lam. The interesting one here is lam, which embeds the LF function space term -> term into term. This actually makes sense because term isn't an inductive type, just a type family with a few members. There's no underlying induction principle with which we can derive contradictions. To be perfectly honest I'm not sure how the proof of soundness of something like Twelf's %total mechanism proceeds. If a reader is feeling curious, I believe this is the appropriate paper to read.

With this, we can write something like λx. x x as lam [x] ap x x.
Now on to evaluation. We want to talk about things as a term rewriting system, so we opt for a small step evaluation approach.
step : term -> term -> type.

step/b : step (ap (lam F) A) (F A).
step/ap1 : step (ap F A) (ap F' A)
            <- step F F'.
step/ap2 : step (ap F A) (ap F A')
            <- step A A'.
step/lam : step (lam [x] M x) (lam [x] M' x)
            <- ({x} step (M x) (M' x)).

step* : term -> term -> type.

step*/z : step* A A.
step*/s : step* A C
           <- step A B
           <- step* B C.
We start with the 4 sorts of steps you can make in this system. 3 of them are merely “if you can step somewhere else, you can pull the rewrite out”, I’ve heard these referred to as compatibility rules. This is what ap1
, ap2
and lam
do, lam
being the most interesting since it deals with going under a binder. Finally, the main rule is step/b
which defines beta reduction. Note that HOAS gives us this for free as application.
Finally, step*
is for a series of steps. We either have no steps, or a step followed by another series of steps. Now we want to prove a couple theorems about our system. These are mostly the lifting of the “compatibility rules” up to working on step*
s. The first is the lifting of ap1
.
step*/left : step* F F' -> step* (ap F A) (ap F' A) -> type.
%mode +{F : term} +{F' : term} +{A : term} +{In : step* F F'}
      -{Out : step* (ap F A) (ap F' A)} (step*/left In Out).

- : step*/left step*/z step*/z.
- : step*/left (step*/s S* S) (step*/s S'* (step/ap1 S))
     <- step*/left S* S'*.

%worlds (lamblock) (step*/left _ _).
%total (T) (step*/left T _).
Note, the mode specification I'm using is a little peculiar. It needs to be this verbose because otherwise A mode-errors. Type inference is peculiar.
The theorem says that if F
steps to F'
in several steps, for all A
, ap F A
steps to ap F' A
in many steps. The actual proof is quite boring, we just recurse and apply step/ap1
until everything type checks.
Note that the world specification for step*/left
is a little strange. We use the block lamblock
because later one of our theorem needs this. The block is just
%block lamblock : block {x : term}.
We need to annotate this on all our theorems because Twelf’s world subsumption checker isn’t convinced that lamblock
can subsume the empty worlds we check some of our theorems in. Ah well.
Similarly to step*/left
there is step*/right
. The proof is 1 character off so I won’t duplicate it.
step*/right : step* A A' -> step* (ap F A) (ap F A') -> type.
Finally, we have step/lam
, the lifting of the compatibility rule for lambdas. This one is a little more fun since it actually works by pattern matching on functions.
step*/lam : ({x} step* (F x) (F' x))
             -> step* (lam F) (lam F')
             -> type.
%mode step*/lam +A -B.

- : step*/lam ([x] step*/z) step*/z.
- : step*/lam ([x] step*/s (S* x) (S x))
     (step*/s S'* (step/lam S))
     <- step*/lam S* S'*.

%worlds (lamblock) (step*/lam _ _).
%total (T) (step*/lam T _).
What’s fun here is that we’re inducting on a dependent function. So the first case matches [x] step*/z
and the second [x] step*/s (S* x) (S x)
. Other than that we just use step/lam
to lift up S
and recurse to lift up S*
in the second case.
We need one final (more complicated) lemma about substitution. It states that if A
steps to A'
, then F A
steps to F A'
in many steps for all F
. This proceeds by induction on the derivation that A
steps to A'
. First off, here’s the formal statement in Twelf
This is the lemma that actually needs the world with lamblock
s
subst : {F} step A A' -> step* (F A) (F A') -> type.
%mode subst +A +B -C.
Now the actual proof. The first two cases are for constant functions and the identity function
- : subst ([x] A) S step*/z.
- : subst ([x] x) S (step*/s step*/z S).
In the case of the constant functions the results of F A
and F A'
are the same so we don’t need to step at all. In the case of the identity function we just step with the step from A
to A'
.
In the next case, we deal with nested lambdas.
- : subst ([x] lam ([y] F y x)) S S'*
     <- ({y} subst (F y) S (S* y))
     <- step*/lam S* S'*.
Here we recurse, but we carefully do this under a pi type. The reason for doing this is because we’re recursing on the open body of the inner lambda. This has a free variable and we need a pi type in order to actually apply F
to something to get at the body. Otherwise this just uses step*/lam
to lift the step across the body to the step across lambdas.
Finally, application.
- : subst ([x] ap (F x) (A x)) S S*
     <- subst F S F*
     <- subst A S A*
     <- step*/left F* S1*
     <- step*/right A* S2*
     <- join S1* S2* S*.
This looks complicated, but isn’t so bad. We first recurse, and then use various compatibility lemmas to actually plumb the results of the recursive calls to the right parts of the final term. Since there are two individual pieces of stepping, one for the argument and one for the function, we use join
to slap them together.
With this, we’ve got all our lemmas
%worlds (lamblock) (subst _ _ _).
%total (T) (subst T _ _).
Now that we have all the pieces in place, we’re ready to state and prove confluence. Here’s our statement in Twelf
confluent : step A B -> step A C -> step* B D -> step* C D -> type.
%mode confluent +A +B -C -D.
Unfortunately, there’s a bit of a combinatorial explosion with this. There are approximately 3 * 3 + 1 = 10 cases for this theorem. And thanks to the lemmas we’ve proven, they’re all boring.
First we have the cases where step A B
is a step/ap1
.
- : confluent (step/ap1 S1) (step/ap1 S2) S1'* S2'*
     <- confluent S1 S2 S1* S2*
     <- step*/left S1* S1'*
     <- step*/left S2* S2'*.
- : confluent (step/ap1 S1) (step/ap2 S2)
     (step*/s step*/z (step/ap2 S2))
     (step*/s step*/z (step/ap1 S1)).
- : confluent (step/ap1 (step/lam F) : step (ap _ A) _) step/b
     (step*/s step*/z step/b) (step*/s step*/z (F A)).
In the first case, we have two ap1
s. We recurse on the smaller S1
and S2
and then immediately use one of our lemmas to lift the results of the recursive call, which step the function part of the ap
we’re looking at, to work across the whole ap
term. In the second case there, we’re stepping the function in one, and the argument in the other. In order to bring these to a common term we just apply the first step to the resulting term of the second step and vice versa. This means that we’re doing something like this
F A
/ \
S1 / \ S2
/ \
F' A F A'
\ /
S2 \ / S1
\ /
F' A'
This clearly commutes so this case goes through. For the final case, we’re applying a lambda to some term so we can beta reduce. On one side we step the body of the lambda some how, and on the other we immediately substitute. Now we do something clever. What is a proof that lam A
steps to lam B
? It’s a proof that for any x
, A x
steps to B x
. In fact, it’s just a function from x
to such a step A x
to B x
. So we have that lying around in F
. So to step from the betareduced term G A
to G' A
all we do is apply F
to A
! The other direction is just beta-reducing ap (lam G') A
to the desired G' A
.
In the next set of cases we deal with ap2
!
- : confluent (step/ap2 S1) (step/ap2 S2) S1'* S2'*
     <- confluent S1 S2 S1* S2*
     <- step*/right S1* S1'*
     <- step*/right S2* S2'*.
- : confluent (step/ap2 S1) (step/ap1 S2)
     (step*/s step*/z (step/ap1 S2))
     (step*/s step*/z (step/ap2 S1)).
- : confluent (step/ap2 S) (step/b : step (ap (lam F) _) _)
     (step*/s step*/z step/b) S1*
     <- subst F S S1*.
The first two cases are almost identical to what we've seen before. The key difference here is in the third case. This is again where we're stepping something on one side and beta-reducing on the other. We can't use the nice free stepping provided by F here since we're stepping the argument, not the function. For this we appeal to subst, which lets us step F A
to F A'
using S1*
exactly as required. The other direction is trivial just like it was in the ap1
case, we just have to step ap (lam F) A'
to F A'
which is done with beta reduction.
I’m not going to detail the cases to do with step/b
as the first argument because they’re just mirrors of the cases we’ve looked at before. That only leaves us with one more case, the case for step/lam
.
- : confluent (step/lam F1) (step/lam F2) F1'* F2'*
     <- ({x} confluent (F1 x) (F2 x) (F1* x) (F2* x))
     <- step*/lam F1* F1'*
     <- step*/lam F2* F2'*.
This is just like all the other “diagonal” cases, like confluent (ap1 S1) (ap1 S2) ...
. We first recurse (this time using a pi to unbind the body of the lambda) and then use compatibility rules in order to get something we can give back from confluent
. And with this, we can actually prove that lambda calculus is confluent.
%worlds (lamblock) (confluent _ _ _ _).
%total (T) (confluent T _ _ _).
We went through a fairly significant proof here, but the end results were interesting at least. One nice thing this proof illustrates is how well HOAS lets us encode these proofs. It’s a very Twelfy approach to use lambdas to represent bindings. All in all, it’s a fun proof.
It's well known that lambda calculus is an extremely small, Turing complete language. In fact, most programming languages over the last 5 years have grown some (typed and/or broken) embedding of lambda calculus with aptly named lambdas.
This is wonderful and everything but lambda calculus is actually a little complicated. It's centred around binding and substituting for variables; while this is elegant, it's a little difficult to formalize mathematically. It's natural to wonder whether we can avoid dealing with variables by building up all our lambda terms from a special privileged few.
These systems (sometimes called combinator calculi) are quite pleasant to model formally, but how do we know that our system is complete? In this post I’d like to go over translating any lambda calculus program into a particular combinator calculus, SK calculus.
SK combinator calculus is a language with exactly 3 types of expressions:

- applications, e e
- the constant s
- the constant k

Besides the obvious ones, there are two main rules for this system:

- s a b c = (a c) (b c)
- k a b = a
And that’s it. What makes SK calculus so remarkable is how minimal it is. We now show that it’s Turing complete by translating lambda calculus into it.
First things first, let’s just define how to represent both SK calculus and lambda calculus in our Haskell program.
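The definitions are elided from this copy; reconstructing them from how the constructors are used below (de Bruijn indices for variables), they look something like:

```haskell
-- Lambda terms, using de Bruijn indices for variables
data Lam = Var Int
         | Ap Lam Lam
         | Lam Lam
         deriving (Eq, Show)

-- Closed SK terms
data SK = S | K | SKAp SK SK
  deriving (Eq, Show)

-- SK terms extended with (de Bruijn) variables,
-- used only during the translation
data SKH = Var' Int
         | S'
         | K'
         | SKAp' SKH SKH
         deriving (Eq, Show)
```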
Now we begin by defining a translation from a simplified lambda calculus to SK calculus. This simplified calculus is just SK supplemented with variables. By defining this step, the actual transformation becomes remarkably crisp.
Note that SKH has variables, but no way to bind them. In order to remove a variable, we have bracket
. bracket
has the property that replacing Var 0
in a term, e
, with a term, e'
, is the same as SKAp (bracket e) e'
.
-- Remove one variable
bracket :: SKH -> SKH
bracket (Var' 0) = SKAp' (SKAp' S' K') K'
bracket (Var' i) = SKAp' K' (Var' (i - 1))
bracket (SKAp' l r) = SKAp' (SKAp' S' (bracket l)) (bracket r)
bracket x = SKAp' K' x
If we're at Var 0 we replace the variable with the term s k k. This has the property that (s k k) A = A. It's traditional to abbreviate s k k as i (leading to the name SKI calculus) but i is strictly unnecessary as we can see. In any case that doesn't mention Var 0 (a variable from an outer binder, or a bare s or k) we guard the term with k, which discards the argument and leaves the term unchanged, exactly as the specification of bracket demands.
If we’re at an application, we do something really clever. We have two terms which both have a free variable, so we bracket them and use S
to supply the free variable to both of them! Remember that
s (bracket A) (bracket B) C = ((bracket A) C) ((bracket B) C)
which is exactly what we require by the specification of bracket
.
Now that we have a way to remove free variables from an SKH
term, we can close off a term with no free variables to give back a normal SK
term.
close :: SKH -> SK
close (Var' _) = error "Not closed"
close S' = S
close K' = K
close (SKAp' l r) = SKAp (close l) (close r)
Now our translator can be written nicely.
l2h :: Lam -> SKH
l2h (Var i) = Var' i
l2h (Ap l r) = SKAp' (l2h l) (l2h r)
l2h (Lam h) = bracket (l2h h)

translate :: Lam -> SK
translate = close . l2h
l2h
is the main worker in this function. It works across SKH
's because it needs to deal with open terms during the translation. However, every time we go under a binder we call bracket afterwards, removing the free variable we just introduced.
This means that if we call l2h
on a closed lambda term we get back a closed SKH
term. This justifies using close
after the toplevel call to l2h
in translate
which wraps up our conversion.
For funsies I decided to translate the Y combinator and got back this mess
(s ((s ((s s) ((s k) k))) ((s ((s s) ((s ((s s) k)) k))) ((s ((s s) k)) k))))
((s ((s s) ((s k) k))) ((s ((s s) ((s ((s s) k)) k))) ((s ((s s) k)) k)))
Completely useless, but kinda fun to look at. More interestingly, the canonical nonterminating lambda term is λx. x x
which gives back s i i
, much more readable.
Now that we've performed this translation we have a very nice proof of the Turing completeness of SK calculus. This has some nice upshots: folks who study things like realizability models of constructive logics use partial combinatory algebras as a model of computation. This is essentially an algebraic model of SK calculus.
If nothing else, it's really quite crazy that such a small language is capable of simulating any computable function across numbers.
Hello folks. It’s been a busy month so I haven’t had much of a chance to write but I think now’s a good time to talk about another compiler related subject: continuation passing style conversion.
When you're compiling a functional language (in a sane way) your compiler mostly consists of phases which run over the AST and simplify it. For example, in a language with pattern matching, it's almost certainly the case that we can write something like
Wonderfully concise code. However, it’s hard to compile nested patterns like that. In the compiler, we might simplify this to
note to future me, write a pattern matching compiler
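The code snippets here are elided from this copy; a sketch of the kind of rewrite being described (a hypothetical function, not the post's original example):

```haskell
-- A nested pattern, nice and concise
f :: Maybe (Maybe Int) -> Int
f (Just (Just x)) = x
f _               = 0

-- The same function after pattern-match compilation: only simple,
-- single-level matches remain, which map directly onto
-- conditionals or jumps.
f' :: Maybe (Maybe Int) -> Int
f' m = case m of
  Nothing -> 0
  Just m' -> case m' of
    Nothing -> 0
    Just x  -> x
```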
We've transformed our large nested pattern into a series of simpler, unnested patterns. The benefit here is that this maps more directly to a series of conditionals (or jumps).
Now one of the biggest decisions in any compiler is what to do with expressions. We want to get rid of complicated nested expressions because chances are our compilation target doesn’t support them. In my second to last post we transformed a functional language into something like SSA. In this post, we’re going to walk through a different intermediate representation: CPS.
CPS is a restriction of how a functional language works. In CPS we don’t have nested expressions anymore. We instead have a series of lets which telescope out and each binds a “flat” expressions. This is the process of “removing expressions” from our language. A compiler probably is targeting something with a much weaker notion of expressions (like assembly) and so we change our tree like structure into something more linear.
Additionally, no functions return. Instead, they take a continuation and when they're about to return they instead pass their value to it. This means that conceptually, all functions are transformed from a -> b
to (a, b -> void) -> void
. Logically, this is actually a reasonable thing to do. This corresponds to mapping a proposition b
to ¬ ¬ b
. What's cool here is that since each function call calls a continuation instead of returning its result, we can imagine each function just transferring control over to some other part of the program instead of returning. This leads to a very slick and efficient way of implementing CPSed function calls as we'll see.
This means we’d change something like
into
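Both code blocks are elided from this copy; a plausible reconstruction of the factorial example (names like factCPS are guesses):

```haskell
-- Direct style
fact :: Int -> Int
fact n = if n == 0 then 1 else n * fact (n - 1)

-- CPS: factCPS never returns, it hands its result to a continuation k
factCPS :: Int -> (Int -> r) -> r
factCPS n k =
  if n == 0
    then k 1
    else
      let m = n - 1                        -- flat let binding
      in factCPS m (\res -> k (n * res))   -- closes over n, then calls k
```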
To see what's going on here we

- call fact, passing it its return continuation
- bind n - 1 into a flat let binding
- multiply the result by n (note here that we did close over n, this lambda is a real lambda)
- and finally hand the result off to k.

The only tree-style nesting here comes from the top if expression, everything else is completely linear.
Let’s formalize this process by converting Simply Typed Lambda Calculus (STLC) to CPS form.
First things first, we specify an AST for normal STLC.
data Tp = Arr Tp Tp | Int deriving Show
data Op = Plus | Minus | Times | Divide

-- The Tp in App refers to the return type, it's helpful later
data Exp a = App (Exp a) (Exp a) Tp
           | Lam Tp (Scope () Exp a)
           | Num Int
           -- No need for binding here since we have Minus
           | IfZ (Exp a) (Exp a) (Exp a)
           | Binop Op (Exp a) (Exp a)
           | Var a
We’ve supplemented our lambda calculus with natural numbers and some binary operations because it makes things a bit more fun. Additionally, we’re using bound to deal with bindings for lambdas. This means there’s a terribly boring monad instance lying around that I won’t bore you with.
To convert to CPS, we first need to figure out how to convert our types. Since CPS functions never return we want them to go to Void
, the unoccupied type. However, since our language doesn’t allow Void
outside of continuations, and doesn’t allow functions that don’t go to Void
, let’s bundle them up into one new type Cont a
which is just a function from a -> Void
. However, this presents us with a problem, how do we turn an Arr a b
into this style of function? It seems like our function should take two arguments, a
and b -> Void
so that it can produce a Void
of its own. However, this requires products since currying isn’t possible with the restriction that all functions return Void
! Therefore, we supplement our CPS language with pairs and projections for them.
Now we can write the AST for CPS types and a conversion between Tp
and it.
data CTp = Cont CTp | CInt | CPair CTp CTp

cpsTp :: Tp -> CTp
cpsTp (Arr l r) = Cont $ CPair (cpsTp l) (Cont (cpsTp r))
cpsTp Int = CInt
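A quick sanity check of the translation (the definitions are repeated here so the snippet stands alone):

```haskell
data Tp = Arr Tp Tp | Int deriving Show
data CTp = Cont CTp | CInt | CPair CTp CTp deriving (Eq, Show)

cpsTp :: Tp -> CTp
cpsTp (Arr l r) = Cont $ CPair (cpsTp l) (Cont (cpsTp r))
cpsTp Int = CInt

-- Int -> Int becomes a continuation consuming an Int
-- paired with an Int-consuming continuation
example :: CTp
example = cpsTp (Arr Int Int)
-- example == Cont (CPair CInt (Cont CInt))
```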
The only interesting thing here is how we translate function types, but we talked about that above. Now for expressions.
We want to define a new data type that encapsulates the restrictions of CPS. In order to do this we factor out our data types into “flat expressions” and “CPS expressions”. Flat expressions are things like values and variables while CPS expressions contain things like “Jump to this continuation” or “Branch on this flat expression”. Finally, there’s let expressions to perform various operations on expressions.
data LetBinder a = OpBind Op (FlatExp a) (FlatExp a)
                 | ProjL a
                 | ProjR a
                 | Pair (FlatExp a) (FlatExp a)

data FlatExp a = CNum Int | CVar a | CLam CTp a (CExp a)

data CExp a = Let a (LetBinder a) (CExp a)
            | CIf (FlatExp a) (CExp a) (CExp a)
            | Jump (FlatExp a) (FlatExp a)
            | Halt (FlatExp a)
Let
s let us bind the results of a few “primitive operations” across values and variables to a fresh variable. This is where things like “incrementing a number” happen. Additionally, in order to create a pair or access its elements we need to use a Let
.
Notice that here application is spelled Jump
hinting that it really is just a jmp
and not dealing with the stack in any way. They’re all jumps we can not overflow the stack as would be an issue with a normal calling convention. To seal of the chain of function calls we have Halt
, it takes a FlatExp
and returns it as the result of the program.
Expressions here are also parameterized over variables but we can’t use bound with them (for reasons that deserve a blogposty rant :). Because of this we settle for just ensuring that each a
is globally unique.
So now instead of having a bunch of nested Exp
s, we have flat expressions which compute exactly one thing and linearize the tree of expressions into a series of flat ones with let binders. It’s still not quite “linear” since both lambdas and if branches let us have something treelike.
We can now define conversion to CPS with one major helper function
This takes an expression, a “continuation” and produces a CExp
. We have some monadgen stuff going on here because we need unique variables. The “continuation” is an actual Haskell function. So our function breaks an expression down to a FlatExp
and then feeds it to the continuation.
The first two cases are easy since variables and numbers are already flat expressions, they go straight into the continuation.
For IfZ
we first recurse on the i
. Then once we have a flattened computation representing i
, we use CIf
and recurse.
cps (Binop op l r) c =
  cps l $ \fl ->
  cps r $ \fr ->
  gen >>= \out ->
  Let out (OpBind op fl fr) <$> c (CVar out)
Like before, we use cps
to recurse on the left and right sides of the expression. This gives us two flat expressions which we use with OpBind
to compute the result and bind it to out
. Now that we have a variable for the result we just toss it to the continuation.
cps (Lam tp body) c = do
  [pairArg, newCont, newArg] <- replicateM 3 gen
  let body' = instantiate1 (Var newArg) body
  cbody <- cps body' (return . Jump (CVar newCont))
  c (CLam (cpsTp tp) pairArg
     $ Let newArg (ProjL pairArg)
     $ Let newCont (ProjR pairArg)
     $ cbody)
Converting a lambda is a little bit more work. It needs to take a pair so a lot of the work is projecting out the left component (the argument) and the right component (the continuation). With those two things in hand we recurse in the body using the continuation supplied as an argument. The actual code makes this process look a little out of order. Remember that we only use cbody
once we’ve bound the projections to newArg
and newCont
respectively.
cps (App l r tp) c = do
  arg <- gen
  cont <- CLam (cpsTp tp) arg <$> c (CVar arg)
  cps l $ \fl ->
    cps r $ \fr ->
    gen >>= \pair ->
    return $ Let pair (Pair fr cont) (Jump fl (CVar pair))
For application we just create a lambda for the current continuation. We then evaluate the left and right sides of the application using recursive calls. Now that we have a function to jump to, we create a pair of the argument and the continuation and bind it to a name. From there, we just jump to fl, the function. Turning the continuation into a lambda is a little strange; it’s also why we needed an annotation for App. The lambda uses the return type of the application and constructs a continuation that maps a to c a. Note that c a is a Haskell expression with the type CExp a.
With this, we’ve written a nice little compiler pass to convert expressions into their CPS forms. By doing this we’ve “eliminated expressions”. Everything is now flat and evaluation basically proceeds by evaluating one small computation and using the result to compute another and another.
There are still some things left to compile out before this is machine code though.
Once we’ve done those steps we’ve basically written a compiler. Each of them is made much, much simpler by the fact that we’ve already compiled away expressions and (really) function calls with our conversion to CPS.
CPS conversion is a nice alternative to something like STG machines for lazy languages or SSA for imperative ones. As far as I’m aware the main SML compiler (SML/NJ) compiles code in this way. As does Ur/Web if I’m not crazy. Additionally, the course entitled “Higher Order, Typed Compilation” which is taught here at CMU uses CPS conversion to make compiling SML really quite pleasant.
In fact, someone (Andrew Appel?) once wrote a paper that noted that SSA and CPS are actually the same. The key difference is that in SSA we merge multiple blocks together using the phi function. In CPS, we just let multiple source blocks jump to the same destination block (continuation). You can see this in our conversion of IfZ
to CPS, instead of using phi
to merge in the two branches, they both just use the continuation to jump to the remainder of the program. It makes things a little simpler because no single block needs to worry about explicitly merging the values flowing in from its predecessors.
Finally, if you’re compiling a language like Scheme with call/cc, using CPS conversion makes the whole thing completely trivial. All you do is define call/cc
at the CPS level as
call/cc (f, c) = f ((λ (x, c') → c x), c)
So instead of using the continuation supplied to us in the expression we give to f
, we use the one for the whole call/cc
invocation! This causes us to not return into the body of f
but instead to carry on the rest of the program as if f
had returned whatever value x
is. This is how my old Scheme compiler did things; I put off figuring out how to implement call/cc for a week before I realized it was a 10 minute job!
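To make that concrete, here's a tiny standalone sketch in Haskell (the `CPS` newtype and the names are mine, not part of the compiler above): call/cc hands `f` a function that throws away its own continuation and jumps to the outer one instead.

```haskell
-- In CPS every computation takes its continuation explicitly.
newtype CPS r a = CPS { runCPS :: (a -> r) -> r }

-- call/cc at the CPS level: the function handed to `f` ignores its
-- own continuation _c' and jumps straight to the outer continuation c.
callCC :: ((a -> CPS r b) -> CPS r a) -> CPS r a
callCC f = CPS $ \c -> runCPS (f (\x -> CPS (\_c' -> c x))) c
```

Invoking the captured continuation aborts whatever would have happened next: `runCPS (callCC (\k -> k 1)) (+1)` runs the outer continuation `(+1)` directly on `1`, never "returning into" the rest of `f`'s body.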
Hope this was helpful!
Inspired by ezyang’s OCaml for Haskellers I decided to write something similar for SML. If you already know OCaml I also recommend Adam Chlipala’s guide.
I’ll follow mostly the same structure as Edward’s article so we’ll have
{- Haskell -}
(* SML *)
SML and Haskell have quite a lot in common
Common types:
()  Int  Integer  Char  Bool  String  Double  (A, B, C)
unit  int  IntInf.int  char  bool  string  real  (A * B * C)
Literals:
()  1  'a'  True  "hello"  3.14  (1, 2, 3)
()  1  #'a'  true  "hello"  3.14  (1, 2, 3)
Common operators
==  /=  not  &&  ||  ++  !!
=  <>  not  andalso  orelse  ^  String.sub
Type variables:
a b
'a 'AnythingGoes
Function application:
f x y z
f x y z
Lambdas:
\x -> ...
fn x => ...
If:
if True then 1 else 0
if true then 1 else 0
Pattern matching
case x of
  Nothing -> ...
  Just a -> ...
case x of
   NONE => ...
 | SOME a => ...
Top level functions support pattern matching in both:
factorial 0 = 1
factorial n = n * factorial (n - 1)
fun factorial 0 = 1
  | factorial n = n * factorial (n - 1)
Top level bindings can be declared without the sugar for currying as well.
f = \x -> \y -> x
val f = fn x => fn y => x
We can have top level patterns in both as well:
(a, b) = (1, 2)
val (a, b) = (1, 2)
Type synonyms:
type Three a b = (a, a, b)
type ('a, 'b) three = 'a * 'a * 'b
Data types:
data List a = Cons a (List a)  Nil
datatype 'a list = Cons of 'a * 'a list  Nil
Notice that in ML data type constructors can only take one argument. This means they often end up taking a tuple (or record). They are however normal functions unlike in OCaml.
Type annotations:
f :: a -> a
f x = x
fun f (x : 'a) : 'a = x
Type annotations for expressions:
(1 + 1 :: Int)
(1 + 1 : int)
Let bindings:
let x = 1 in x + x
let val x = 1 in x + x end
Declare a new mutable reference:
newIORef True
ref true
Modify a mutable reference:
writeIORef r False
r := false
Read a mutable reference:
readIORef r
! r
Making exceptions:
data MyExn = Exn String deriving Show; instance Exception MyExn
exception Exn of string
Raising an exception:
throw (Exn "uh oh")
raise Exn "uh oh"
Catching an exception:
catch e $ \(Exn s) -> s
e handle Exn s => s
Since SML isn’t a purely functional language, none of the last couple of constructs listed live in anything monadic like their Haskell siblings. The type of r := false
is just unit
, not IO ()
or something.
Aside from the obvious things, like SML being strict so it’s missing pervasive lazy evaluation, SML is missing some things from Haskell.
The biggest gap I stumble across in SML is the lack of higher-kinded polymorphism:
data Fix f = Fix (f (Fix f))
datatype 'f fix = Fix of ('f fix) 'f (* Urk, syntax error *)
Even applying a type variable is a syntax error! As this might suggest to you, SML’s type system is much simpler than what we have in Haskell. It doesn’t have a notion of type families, GADTs, fancy kinds, data type promotion, etc, etc. SML is really limited to the areas of the Haskell type system you’d be accustomed to after reading Learn You A Haskell! Just algebraic data types, functions, and polymorphism.
Aside from this, SML doesn’t have guards, nor a lot of the syntactic sugar that Haskell has. A nice exception to this is lambda case, which is written
fn 0 => 1
 | 1 => 2
 | n => 0
Additionally, SML doesn’t have significant indentation, which means that occasionally awkward parentheses are necessary. For example
case x of
   true => (case !r of
               x => x + 1)
 | false => (r := 1; 2)
The parentheses are mandatory.
On the stranger side, SML has records (discussed later) but they don’t have a functional updating operation. This is a pain to be honest. Also related, SML has a somewhat nasty habit of allowing for ad hoc overloading in the way most languages do: certain expressions are “blessed” with unutterable types that must be inferred from context. There are only a few of these: +, *, and record accessors being among them. I’m personally not a huge fan, but in practice this is almost never an issue.
Finally, ML doesn’t have Haskell-style type classes. I don’t miss them; some people would.
Aside from the obvious things, like Haskell being lazy so it’s missing pervasive eager evaluation, SML does have a couple of interesting things.
Of course SML has actual modules. I’ve explained a bit about them earlier. This alone is reason enough to write some ML. Additionally, SML has a saner notion of records. Records are a type in and of themselves. This means we can have something like
type coord = {x : int, y : int}
However, since this is just a type synonym we don’t actually need to declare it. Accessors are written #x
to access the field x
from a record. SML doesn’t have a very advanced record system so #x
isn’t typeable. It’s overloaded to access a field from some record and the concrete record must be inferrable from context. This often means that while we can have free floating records, the inference woes make us want to wrap them in constructors like so
datatype coord = Coord of {x : int, y : int}
This has the nice upshot that record accessors aren’t horribly broken with multiple constructors. Let’s say we had
datatype user = Person of {firstName : string, lastName : string}
              | Robot of {owner : string, guid : int}
We can’t apply #firstName
to an expression of type user
. It’s ill-typed since user
isn’t a record, it has a constructor which contains a record. In order to apply #firstName
we have to pattern match first.
Finally, SML has a real, honest to goodness specification. In fact, SML is so well specified it’s been completely mechanized: there is an actual mechanized proof that SML is type safe. The practical upshot of this is that SML is rock solid. There’s a definitive right answer to what a program should do, and that answer isn’t just “whatever the one true implementation does”. In fact, there are actually a lot of SML compilers and they’re all reasonably compliant. Two noteworthy ones are SML/NJ and MLton.
Since SML is fully standardized, I generally develop with SML/NJ and eventually feed the program into MLton if I intend the thing to run fast.
Also, modules are amazing, have I mentioned modules yet?
So now that we’ve gone through most of the basic syntactic constructs of SML, most ML code should be readable. This is great because there’s some interesting pieces of ML code to read. In particular, these wonderful books are written with ML
I recommend all three of these books heartily. If you’re looking to learn about compilers, the last one in particular is the best introduction I’m aware of. The second one is an in-depth look at a trick for compiling strict functional languages.
Other general books on ML if you decide you want to give SML a more serious look
The course I’m TAing currently is based around the last one and it’s freely available online which is nice.
Cheers,
I’m taking the undergraduate course on programming languages at CMU. For the record, I still get really excited about the novelty of taking a class (at school!) on programming languages. I’m easily made happy.
We started talking about System F and before long we touched on the value restriction. Specifically, how most people think of the value restriction incorrectly. To understand why this is, let’s first define the value restriction since it’s probably new to you if you don’t use SML.
In SML there are value level declarations just like in Haskell. We can write things like
val x = 1
val y = 2
and we end up with x bound to 1 and y bound to 2. Note that SML is strict so these bindings are evaluated right as we reach them. Also like in Haskell, SML has polymorphism, so we can write map
And it gets the type ('a -> 'b) -> ('a list -> 'b list). Aside from minor syntactic differences, this is pretty much identical to what we’d write in Haskell. The value restriction concerns the intersection of these two things. In SML, the following should not compile under the standard
val r = ref NONE
This is because SML requires that all polymorphic val bindings be values! In practice all implementations will do something besides this but we’ll just focus on what the standard says. Now the reason for this value restriction is widely misunderstood. Most people believe that the value restriction exists to prevent code like this from typechecking
val r = ref NONE
val () = r := SOME 1
val _ = case !r of
    SOME s => s
  | NONE => ""
This seems to illustrate a pretty big issue for SML! We’re filling in a polymorphic reference with one type and unboxing it with a different one! Clearly this would segfault without the value restriction. However, there’s a catch.
SML is based on System F (did you really think I could get through a blog post without some theory?) which is sometimes called the “polymorphic lambda calculus”. It’s the minimal language with polymorphism and functions. In this language there’s a construct for making polymorphic things: Λ.
In this language we write polymorphism explicitly by saying Λ τ. e, which has the type ∀ τ. T. So for example we write the identity function as
id = Λ τ. λ (x : τ). x
Now SML (and vanilla Haskell) have a limited subset of the power of Λ. Specifically all these lambdas have to appear at the start of a toplevel term. Meaning that they have to be of the form
val f = Λ α. Λ β. ... e
where e contains no further Λs. This is called “prenex” form and is what makes type inference for SML possible. Now since we don’t show anyone the hidden Λs it doesn’t make sense to show them the type application that comes with them, so SML infers and adds those for us too. What’s particularly interesting is that SML is often formalized as having this property: values start with Λ and are implicitly applied to the appropriate types where used. Even more interesting, how do you suppose we should evaluate a Λ? What, for example, should this code do
val x = Λ τ. raise[τ] Fail (* Meaning raise an exception and say
we have type τ *)
val () = print "I made it here"
val () = x[unit]
It seems clear that Λ should be evaluated just like how we evaluate λ: when we apply it. So I (and the formalization of SML) would expect this to print "I made it here" before throwing that exception. This might not surprise you just by parallel with code like this
val x = fn () => raise Fail
val () = print "I made it here"
val () = x ()
However, what about when those lambdas are implicit? In the actual source language of ML our code snippet would be
val x = raise Fail
val () = print "I made it here"
val () = x
Uh oh, this really looks like it ought to throw an exception but it apparently doesn’t! More worryingly, what about when we have something like
val x = raise Fail
Since x is never specialized, this doesn’t even throw an error! Yikes! Clearly this is a little confusing. It is, however, type safe. Consider our original motivation for the value restriction. With explicit type application
val r = Λ τ. ref[τ] NONE
val () = r[int] := SOME 1
val _ = case !(r[string]) of
SOME s => s
 NONE => ""
Since the body of this function is run every time we do something with r
, we’re just creating a whole bunch of new references in this code! There’s no type safety failure since !(r[string])
returns a fresh ref cell, completely different from the one we modified on the line above! This code always runs the NONE
case. In fact, if this ever did the wrong thing it would just be a canary in the coal mine, a symptom of the fact that our system evaluates under (big) lambda binders.
So the value restriction is really not at all about type safety; it’s about comprehensibility, mostly because a polymorphic expression being evaluated at each use site rather than at its definition site is really strange. Most documentation seems to be wrong about this; everyone here seems to agree that this is unfortunate, but such is life.
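Incidentally, GHC can make the hidden Λs and type applications visible, which helps build intuition for what SML elaboration is doing. A small sketch in GHC Haskell (the extensions and names here are mine; this is an analogy, not SML):

```haskell
{-# LANGUAGE ScopedTypeVariables, TypeApplications #-}

-- The forall plays the role of the leading Λ; prenex form means it
-- must sit at the very front of the type.
identity :: forall a. a -> a
identity x = x

-- `identity @Int` is the explicit type application that SML (and
-- plain Haskell) normally infer and insert for us.
three :: Int
three = identity @Int 3
```

Plain SML and Haskell infer both the forall and the @Int for us; the value restriction is about what should happen when the hidden Λ wraps an effectful body.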
Now let’s talk about the monomorphism restriction. This is better understood but still worth recapping. In Haskell we have type classes. They let us overload functions to behave differently on different types. Everyone’s favorite example is the type class for numbers, which lets us write code that works for all numbers, not just int or something. Under the hood, this works by passing a record of functions like *, fromInteger, and - to make the code work. That => is really just a sort of function arrow that happens to only take particular “implicit” records as an argument.
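We can act out that desugaring by hand. Below is a sketch (the record and names are mine, not GHC's real dictionary machinery) of how a Num constraint becomes an ordinary record-of-functions argument:

```haskell
-- A hand-rolled "Num dictionary": a record of the operations the
-- constraint provides.
data NumDict a = NumDict
  { dTimes   :: a -> a -> a
  , dMinus   :: a -> a -> a
  , dFromInt :: Integer -> a
  }

intDict :: NumDict Int
intDict = NumDict (*) (-) fromInteger

-- What `square :: Num a => a -> a` elaborates to: the `=>` has
-- become an ordinary `->` taking the dictionary.
square :: NumDict a -> a -> a
square d x = dTimes d x x
```

Every use of the constrained function threads a dictionary through, exactly like an extra argument; `square intDict 5` evaluates to 25.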
Now what do you suppose the most polymorphic type this code has? It could potentially work on all numbers so it gets the type
fact :: Num a => a
However this is really like a function! This means that fact :: Integer
and fact :: Int
evaluate that computation twice. In fact each time we call fact
we supply a new record and end up evaluating again. This is very costly and also very surprising to most folks. After all, why should something that looks like a normal number evaluate every time we use it! The monomorphism restriction essentially says that a binding which doesn’t syntactically look like a function will not be given a constrained type (C1, C2 ...) => t; the constraints are instead resolved (or defaulted) to a monomorphic type so the binding is evaluated at most once.
This is intended to keep us from the surprise of evaluating a seemingly fully reduced term twice.
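As a sketch of what's at stake, consider a binding with an explicit constrained type (the example is my own; writing the signature out shows exactly the type the restriction refuses to infer for a bare binding):

```haskell
-- A value constrained by Num is really a function from a Num
-- dictionary to a value, so `big :: Int` and `big :: Integer`
-- each redo the whole sum.
big :: Num a => a
big = sum (map fromInteger [1 .. 10000])
```

Each use of `big` at a concrete type passes a fresh dictionary and recomputes the sum; with the restriction on and no signature, GHC would instead pick a single monomorphic type and evaluate the binding once.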
Sound familiar? Just like with the value restriction, the whole point of the monomorphism restriction is to prevent a hidden function, either a type abstraction or a type constraint, from causing us to silently and dangerously duplicate work. Neither is essential to type safety, but without them some really simple-looking pieces of code become exponentially slower.
That about covers things. It turns out that both of these restrictions are just patches to cover some surprising areas of the semantics but both are easily understood when you look at the elaborated version. I deliberately went a bit faster through the monomorphism restriction since quite a lot of ink has already been spilled on the subject and unlike the value restriction, most of it is correct :)
As one final note, the way that Haskell handles the monomorphism restriction is precisely how OCaml handles the value restriction: weak polymorphism. Both of them mark the type variables they refuse to generalize as weak type variables. Whenever we first instantiate them to something we go back and retroactively modify the definition to pretend we had used this type all along. In this way, we only evaluate things once but can handle a lot of simple cases of binding a value and using it once.
The more you know.
Hi folks, the last week or so I was a little tired of schoolwork so I decided to scratch out some fun code. The end result is an extremely small compiler for a typed, higher-order functional language called PCF to C. In this post I’ll attempt to explain the whole thing, from front to back :)
First things first, it’s important to define the language we’re compiling. The language, PCF, short for “partial computable functions”, is the sort of extremely small language you generally find in a book on programming languages; it originates with Plotkin if I’m not mistaken.
PCF is based around 3 core elements: natural numbers, functions (closures), and general recursion. There are two constants for creating numbers, Zero
and Suc
. Zero
is self explanatory and Suc e
is the successor of the natural number e
evaluates to. In most programming languages this just means Suc e = 1 + e
but +
isn’t a primitive in PCF (we can define it as a normal function).
For functions, we have lambdas like you’d find in any functional language. Since PCF includes no polymorphism it’s necessary to annotate the function’s argument with its type.
Finally, the weird bit: recursion. In PCF we write recursive things with fix x : τ in e
. Here we get to use x
in e
and we should understand that x
“stands for” the whole expression, fix ...
. As an example, here’s how we define +
.
plus =
  fix rec : nat -> nat -> nat in
  λ m : nat.
  λ n : nat.
  ifz m {
    Zero => n
  | Suc x => Suc (rec x n)
  }
Now compilation is broken up into a bunch of phases and intermediate languages. Even in this small of a compiler there are 3 (count ’em) intermediate languages, so along with the source and target language there are 5 different languages running around inside of this compiler. Each phase with the exception of typechecking is just translating one intermediate language (IL) into another and in the process making one small modification to the program as a whole.
This compiler starts with an AST, I have no desire to write a parser for this because parsers make me itchy. Here’s the AST
data Ty = Arr Ty Ty
        | Nat
        deriving Eq

data Exp a = V a
           | App (Exp a) (Exp a)
           | Ifz (Exp a) (Exp a) (Scope () Exp a)
           | Lam Ty (Scope () Exp a)
           | Fix Ty (Scope () Exp a)
           | Suc (Exp a)
           | Zero
           deriving (Eq, Functor, Foldable, Traversable)
What’s interesting here is that our AST uses bound
to manage variables. Unfortunately there really isn’t time to write both a bound tutorial and a PCF compiler one. I’ve written about using bound before here otherwise you can just check out the official docs. The important bits here are that Scope () ...
binds one variable and that a
stands for the free variables in an expression. 3 constructs bind variables here, Ifz
for pattern matching, Fix
for recursive bindings, and Lam
for the argument. Note also that Fix
and Lam
both must be annotated with a type otherwise stuff like fix x in x
and fn x => x
are ambiguous.
First up is type checking. This should be familiar to anyone who’s written a type checker before, since PCF is simply typed. We simply have a Map of variables to types. Since we want to go under binders defined using Scope we’ll have to use instantiate. However this demands we be able to create fresh free variables so we don’t accidentally cause clashes. To prevent this we use monad-gen to generate fresh free variables.
To warm up, here’s a helper function to check that an expression has a particular type. This uses the more general typeCheck
function which actually produces the type of an expression.
type TyM a = MaybeT (Gen a)
assertTy :: (Enum a, Ord a) => M.Map a Ty -> Exp a -> Ty -> TyM a ()
assertTy env e t = (== t) <$> typeCheck env e >>= guard
This type checks the expression in an environment (something that stores the types of all of the free variables). Once it has the inferred type it compares it to the type we expected and chucks the resulting boolean into guard. This code is used in places like Ifz
where we happen to know that the first expression has the type Nat
.
Now on to the main code, typeCheck
typeCheck :: (Enum a, Ord a) => M.Map a Ty -> Exp a -> TyM a Ty
typeCheck _ Zero = return Nat
typeCheck env (Suc e) = assertTy env e Nat >> return Nat
The first two cases for typeCheck are nice and straightforward. If we get a Zero then it has type Nat. If we get a Suc e we assert that e has the type Nat and then the whole thing has the type Nat.
typeCheck env (V a) = MaybeT . return $ M.lookup a env
For variables we just look things up in the environment. Since this returns a Maybe it’s nice and easy to just jam it into our MaybeT.
typeCheck env (App f a) = typeCheck env f >>= \case
  Arr fTy tTy -> assertTy env a fTy >> return tTy
  _ -> mzero
Application is a little more interesting. We recurse over the function and make sure it has an actual function type. If it does, we assert the argument has the domain type and return the codomain. If it doesn’t have a function type, we just fail.
typeCheck env (Lam t bind) = do
  v <- gen
  Arr t <$> typeCheck (M.insert v t env) (instantiate1 (V v) bind)
typeCheck env (Fix t bind) = do
  v <- gen
  assertTy (M.insert v t env) (instantiate1 (V v) bind) t
  return t
Type checking lambdas and fixpoints is quite similar. In both cases we generate a fresh variable to unravel the binder with. We know what type this variable is supposed to have because we required explicit annotations so we add that to the map constituting our environment. Here’s where they diverge.
For a fixpoint we want to make sure that the body has the type as we said it would so we use assertTy
. For a lambda we infer the body type and return a function from the given argument type to the body type.
typeCheck env (Ifz i t e) = do
  assertTy env i Nat
  ty <- typeCheck env t
  v <- gen
  assertTy (M.insert v Nat env) (instantiate1 (V v) e) ty
  return ty
For Ifz we want to ensure that we actually are casing on a Nat, so we use assertTy. Next we figure out what type the zero branch returns and make sure that the successor branch (whose bound variable has type Nat) returns the same type.
All in all this type checker is not particularly fascinating since all we have are simple types. Things get a bit more interesting with polymorphism. I’d suggest looking at that if you want to see a more interesting type checker.
Now for our first interesting compilation phase, closure conversion. In this phase we make closures explicit by annotating lambdas and fixpoints with the variables that they close over. Those variables are then explicitly bound in the scope of the lambda. With these changes, our new syntax tree looks like this
-- Invariant: Clos only contains VCs; this can't be enforced
-- statically due to the annoying monad instance
type Clos a = [ExpC a]

data ExpC a = VC a
            | AppC (ExpC a) (ExpC a)
            | LamC Ty (Clos a) (Scope Int ExpC a)
            | FixC Ty (Clos a) (Scope Int ExpC a)
            | IfzC (ExpC a) (ExpC a) (Scope () ExpC a)
            | SucC (ExpC a)
            | ZeroC
            deriving (Eq, Functor, Foldable, Traversable)
The interesting parts are the additions of Clos
and the fact that the Scope
for a lambda and a fixpoint now binds an arbitrary number of variables instead of just one. Here if a lambda or fixpoint binds n
variables, the first n - 1
are stored in the Clos
and the last one is the “argument”. Closure conversion is thus just the process of converting an Exp
to an ExpC
.
closConv :: Ord a => Exp a -> Gen a (ExpC a)
closConv (V a) = return (VC a)
closConv Zero = return ZeroC
closConv (Suc e) = SucC <$> closConv e
closConv (App f a) = AppC <$> closConv f <*> closConv a
closConv (Ifz i t e) = do
  v <- gen
  e' <- abstract1 v <$> closConv (instantiate1 (V v) e)
  IfzC <$> closConv i <*> closConv t <*> return e'
Most of the cases here are just recursing and building things back up applicatively. There’s the moderately interesting case where we instantiate the else branch of an Ifz
with a fresh variable and then recurse, but the interesting cases are for fixpoints and lambdas. Since they’re completely identical we only present the case for Fix
.
closConv (Fix t bind) = do
  v <- gen
  body <- closConv (instantiate1 (V v) bind)
  let freeVars = S.toList . S.delete v $ foldMap S.singleton body
      rebind v' = elemIndex v' freeVars <|>
                  (guard (v' == v) *> (Just $ length freeVars))
  return $ FixC t (map VC freeVars) (abstract rebind body)
There’s a lot going on here but it boils down into three parts.

1. Recurse over the body.
2. Gather the free variables of the body.
3. Rebind the free variables so that each maps to its position in the closure, with the argument mapped to n, where n is the number of free variables.

The first is accomplished in much the same way as in the above cases. To gather the free variables, all we need to do is use the readily available notion of a monoid on sets. The whole process is just foldMap S.singleton
! There’s one small catch: we don’t want to put the argument into the list of variables we close over so we carefully delete it from the closure. We then convert it to a list which gives us an actual Clos a
. Now for the third step we have rebind
.
rebind maps a free variable to Maybe Int. It maps a free variable to its binding occurrence, if it has one. This boils down to using elemIndex to look up something’s position in the Clos we just built up. We also have a special case for when the variable we’re looking at is the “argument” of the function we’re fixing. In this case we want to map it to the last thing we’re binding, which is at index length freeVars. To capture the “try this and then that” semantics we use the Alternative instance for Maybe, which works wonderfully.
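The "try this and then that" idiom is worth seeing in isolation. Here's a self-contained sketch (standalone names, not the compiler's own code) of mapping a variable either to its slot in a closure or, if it's the argument, to the slot one past the end:

```haskell
import Control.Applicative ((<|>))
import Control.Monad (guard)
import Data.List (elemIndex)

-- Mirror of rebind: closure slots come first, the argument gets the
-- last slot. Variables that are neither stay free (Nothing).
rebindIdx :: Eq a => [a] -> a -> a -> Maybe Int
rebindIdx closure arg v =
  elemIndex v closure <|> (guard (v == arg) *> Just (length closure))
```

`elemIndex` handles the closed-over variables; when it comes up Nothing, the `guard` side checks whether we're looking at the argument, and `<|>` on Maybe glues the two attempts together.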
With this, we’ve removed implicit closures from our language: one of the passes on our way to C.
Next up we remove both fixpoints and lambdas from being expressions. We want them to have an explicit binding occurrence because we plan to completely remove them from expressions soon. In order to do this, we define a language with lambdas and fixpoints explicitly declared in let expressions. The process of converting from ExpC
to this new language is called “lambda lifting” because we’re lifting things into let bindings.
Here’s our new language.
data BindL a = RecL Ty [ExpL a] (Scope Int ExpL a)
             | NRecL Ty [ExpL a] (Scope Int ExpL a)
             deriving (Eq, Functor, Foldable, Traversable)

data ExpL a = VL a
            | AppL (ExpL a) (ExpL a)
            | LetL [BindL a] (Scope Int ExpL a)
            | IfzL (ExpL a) (ExpL a) (Scope () ExpL a)
            | SucL (ExpL a)
            | ZeroL
            deriving (Eq, Functor, Foldable, Traversable)
Much here is the same except we’ve removed both lambdas and fixpoints and replaced them with LetL
. LetL
works over bindings which are either recursive (Fix
) or nonrecursive (Lam
). Lambda lifting in this compiler is rather simplistic in how it lifts lambdas: we just boost everything one level up and turn
λ x. e
into
let f = λ x. e in f
Just like before, this procedure is captured by transforming an ExpC
into an ExpL
.
llift :: Eq a => ExpC a -> Gen a (ExpL a)
llift (VC a) = return (VL a)
llift ZeroC = return ZeroL
llift (SucC e) = SucL <$> llift e
llift (AppC f a) = AppL <$> llift f <*> llift a
llift (IfzC i t e) = do
  v <- gen
  e' <- abstract1 v <$> llift (instantiate1 (VC v) e)
  IfzL <$> llift i <*> llift t <*> return e'
Just like in closConv
we start with a lot of very boring and trivial “recurse and build back up” cases. These handle everything but the cases where we actually convert constructs into a LetL
.
Once again, the interesting cases are pretty much identical. Let’s look at the case for LamC
for variety.
llift (LamC t clos bind) = do
  vs <- replicateM (length clos + 1) gen
  body <- llift $ instantiate (VC . (!!) vs) bind
  clos' <- mapM llift clos
  let bind' = abstract (flip elemIndex vs) body
  return (LetL [NRecL t clos' bind'] trivLetBody)
Here we first generate a bunch of fresh variables and unbind the body of our lambda. We then recurse on it. We also have to recurse across all of our closed-over arguments, but since those are variables we know that should be pretty trivial (why do we know this?). Once we’ve straightened out the body and the closure, all we do is transform the lambda into a trivial let expression as shown above. Here trivLetBody is just a body that returns the first thing bound in the let. With this done, we’ve pretty much transformed our expression language to C. In order to get rid of the nesting, we want to make one more simplification before we actually generate C.
C-with-expressions is our next intermediate language. It has no notion of nested functions or of fixpoints. I suppose now I should finally fess up to why I keep talking about fixpoints and functions as if they’re the same and why this compiler is handling them identically. The long and short of it is that fixpoints are really a combination of a “fixed point combinator” and a function. Really when we say
fix x : τ in e
it’s as if we had said
F (λ x : τ. e)
where F is a magical constant with the type
F : (τ -> τ) -> τ
F
calculates the fixpoint of a function. This means that f (F f) = F f
. This formula underlies all recursive bindings (in Haskell too!). In the compiler we basically compile a Fix
to a closure (the runtime representation of a function) and pass it to a C function fixedPoint
which actually calculates the fixed point. Now it might seem dubious that a function has a fixed point. After all, it would seem that there’s no x
so that (λ (x : nat). suc x) x = x
right? Well the key is to think of these functions as not ranging over just values in our language, but a domain where infinite loops (bottom values) are also represented. In the above equation, the solution is that x
should be bottom, an infinite loop. That’s why
should loop! There’s actual some wonderful math going on here about how computable functions are continuous functions over a domain and that we can always calculate the least fixed point of them in this manner. The curious reader is encouraged to check out domain theory.
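This is exactly Haskell's fix. As a sketch, here's the PCF plus from the start of the post rebuilt with an explicit fixed-point combinator (the Haskell spelling is mine):

```haskell
-- F from the text: fixpoint f = f (fixpoint f), so f (F f) = F f.
fixpoint :: (a -> a) -> a
fixpoint f = f (fixpoint f)

-- plus = fix rec in λ m. λ n. ifz m { Zero => n | Suc x => Suc (rec x n) }
plus :: Integer -> Integer -> Integer
plus = fixpoint $ \rec m n ->
  if m == 0 then n else 1 + rec (m - 1) n
```

Laziness is what lets fixpoint terminate whenever f doesn't immediately force its argument; the generated C has to get the same effect through the closure it passes to fixedPoint.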
Anyways, that’s why I keep handling fixpoints and lambdas in the same way: to me a fixpoint is a lambda + some magic. This is going to become very clear in C-with-expressions (FauxC from now on) because we’re going to promote both sorts of let bindings to the same thing, a FauxC toplevel function. Without further ado, here’s the next IL.
-- Invariant: the Integer part of a FauxCTop is a globally unique
-- identifier that will be used as a name for that binding.
type NumArgs = Int
data BindTy = Int | Clos deriving Eq

data FauxCTop a = FauxCTop Integer NumArgs (Scope Int FauxC a)
                deriving (Eq, Functor, Foldable, Traversable)

data BindFC a = NRecFC Integer [FauxC a]
              | RecFC BindTy Integer [FauxC a]
              deriving (Eq, Functor, Foldable, Traversable)

data FauxC a = VFC a
             | AppFC (FauxC a) (FauxC a)
             | IfzFC (FauxC a) (FauxC a) (Scope () FauxC a)
             | LetFC [BindFC a] (Scope Int FauxC a)
             | SucFC (FauxC a)
             | ZeroFC
             deriving (Eq, Functor, Foldable, Traversable)
The big difference is that we’ve lifted things out of let bindings. They now contain references to some global function instead of actually having the value right there. We also tag fixpoints as either fixing an Int
or a Clos
. The reasons for this will be apparent in a bit.
Now for the conversion. We don’t just have a function from ExpL
to FauxC
because we also want to make note of all the nested lets we’re lifting out of the program. Thus we use WriterT
to gather a list of toplevel functions as we traverse the program. Other than that this is much like what we’ve seen before.
type FauxCM a = WriterT [FauxCTop a] (Gen a)

fauxc :: ExpL Integer -> FauxCM Integer (FauxC Integer)
fauxc (VL a) = return (VFC a)
fauxc (AppL f a) = AppFC <$> fauxc f <*> fauxc a
fauxc ZeroL = return ZeroFC
fauxc (SucL e) = SucFC <$> fauxc e
fauxc (IfzL i t e) = do
  v <- gen
  e' <- abstract1 v <$> fauxc (instantiate1 (VL v) e)
  IfzFC <$> fauxc i <*> fauxc t <*> return e'
In the first couple cases we just recurse, as we’ve seen before. Things only get interesting once we get to LetL
fauxc (LetL binds e) = do
  binds' <- mapM liftBinds binds
  vs <- replicateM (length binds) gen
  body <- fauxc $ instantiate (VL . (!!) vs) e
  let e' = abstract (flip elemIndex vs) body
  return (LetFC binds' e')
In this case we recurse with the function liftBinds
across all the bindings, then do what we’ve done before and unwrap the body of the let and recurse in it. So the meat of this transformation is in liftBinds
.
  where liftBinds (NRecL t clos bind) = lifter NRecFC clos bind
        liftBinds (RecL t clos bind) = lifter (RecFC $ bindTy t) clos bind
        lifter bindingConstr clos bind = do
          guid <- gen
          vs <- replicateM (length clos + 1) gen
          body <- fauxc $ instantiate (VL . (!!) vs) bind
          let bind' = abstract (flip elemIndex vs) body
          tell [FauxCTop guid (length clos + 1) bind']
          bindingConstr guid <$> mapM fauxc clos
        bindTy (Arr _ _) = Clos
        bindTy Nat = Int
To lift a binding all we do is generate a globally unique identifier for the toplevel. Once we have that we can unwrap the particular binding we’re looking at. This is going to comprise the body of the FauxCTop function we’re building. Since we need it to be FauxC code as well we recurse on it. Now we have a bunch of fauxC code for the body of the toplevel function. We then just repackage the body up into a binding (a FauxCTop needs one) and use tell to make a note of it. Once we’ve done that we return the stripped down let binding that just remembers the guid that we created for the toplevel function.
As an example, each let-bound expression transforms into its own FauxCTop toplevel function, while the let itself becomes a binding that just references the generated guid.
With this done our language is now 80% of the way to C!
Converting our fauxC language to actual C has one complication: C doesn’t have let
expressions. Given this, we have to flatten out a fauxC expression so we can turn a let expression into a normal C declaration. This conversion is almost a conversion to single static assignment form, SSA. I say almost because there’s precisely one place where we break the single assignment discipline. This is just because it seemed rather pointless to me to introduce an SSA IL with φ just so I could compile it to C. YMMV.
This is what LLVM uses for its intermediate language and because of this I strongly suspect regearing this compiler to target LLVM should be pretty trivial.
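The flattening idea is easy to see in isolation. Below is a minimal, self-contained toy (hypothetical types and names, not the post’s IL) that turns let expressions into a flat list of C-style declarations with a Writer monad, which is essentially the shape of the conversion that follows:

```haskell
import Control.Monad.Writer

-- A toy expression language with lets.
data E = Lit Int | Add E E | Let String E E | Var String

-- Each let becomes a flat (name, rhs) declaration; the residual
-- expression is returned, so the output reads almost like SSA.
flatten :: E -> Writer [(String, String)] String
flatten (Lit n)     = pure (show n)
flatten (Var x)     = pure x
flatten (Add l r)   = do
  l' <- flatten l
  r' <- flatten r
  pure (l' ++ " + " ++ r')
flatten (Let x e b) = do
  e' <- flatten e
  tell [(x, e')]   -- would become `int x = e';` in generated C
  flatten b
```

Running flatten on let a = 1 in a + 2 yields the residual expression "a + 2" plus one declaration for a.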
Now we’re using a library called cdsl to make generating the C less painful, but there’s still a couple of things we’d like to add. First of all, all our names are integers, so we have i2d and i2e for converting an integer into a C declaration or an expression.

i2d :: Integer -> CDeclr
i2d = fromString . ('_':) . show

i2e :: Integer -> CExpr
i2e = var . fromString . ('_':) . show
We also have a shorthand for the type of all expressions in our generated C code.
Finally, we have our writer monad and a helper function for implementing the SSA conversion. We emit C99 block items and use tellDecl to bind an expression to a fresh variable, returning that variable.
type RealCM = WriterT [CBlockItem] (Gen Integer)

tellDecl :: CExpr -> RealCM CExpr
tellDecl e = do
  i <- gen
  tell [CBlockDecl $ decl taggedTy (i2d i) $ Just e]
  return (i2e i)
Next we have the conversion procedure. Most of this is pretty straightforward because we shell out to calls in the runtime system for all the hard work. We have the following RTS functions

mkZero, create a zero value
inc, increment an integer value
dec, decrement an integer value
apply, apply a closure to an argument
mkClos, make a closure closing over some values
EMPTY, an empty pointer, useful for default values
isZero, check if something is zero
fixedPoint, find the fixed point of a function
INT_SIZE, the size of the runtime representation of an integer
CLOS_SIZE, the size of the runtime representation of a closure

Most of this code is therefore just converting the expression to SSA form and using the RTS functions to do the appropriate computation at each step. Note that cdsl provides a few overloaded string instances and so to generate the C code to apply a function we just use "foo"#[1, "these", "are", "arguments"].
The first few cases for conversion are nice and straightforward.
realc :: FauxC CExpr -> RealCM CExpr
realc (VFC e) = return e
realc (AppFC f a) = ("apply" #) <$> mapM realc [f, a] >>= tellDecl
realc ZeroFC = tellDecl $ "mkZero" # []
realc (SucFC e) = realc e >>= tellDecl . ("inc"#) . (:[])
We take advantage of the fact that realc returns its result and we can almost make this look like the applicative cases we had before. One particularly slick case is how Suc works. We compute the value of e and apply inc to the result. We then feed this expression into tellDecl which binds it to a fresh variable and returns the variable. Haskell is pretty slick.
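This bind-and-return-a-name pattern is easy to demonstrate standalone. Here’s a self-contained sketch (hypothetical names; a State counter standing in for Gen and strings standing in for cdsl’s C AST) of what tellDecl does in the Suc case:

```haskell
import Control.Monad.State
import Control.Monad.Writer

-- Collect declarations in a Writer; thread a fresh-name counter in State.
type M = WriterT [String] (State Int)

-- Bind an expression to a fresh C variable and return that variable.
tellDecl' :: String -> M String
tellDecl' e = do
  i <- get
  put (i + 1)
  let v = '_' : show i
  tell ["tagged_ptr " ++ v ++ " = " ++ e ++ ";"]
  pure v
```

Chaining two calls emits the two declarations for mkZero() and inc(_0) and hands back the final name _1.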
realc (IfzFC i t e) = do
  outi <- realc i
  deci <- tellDecl ("dec" # [outi])
  let e' = instantiate1 (VFC deci) e
  (outt, blockt) <- lift . runWriterT $ realc t
  (oute, blocke) <- lift . runWriterT $ realc e'
  out <- tellDecl "EMPTY"
  let branch b tempOut =
        CCompound [] (b ++ [CBlockStmt . liftE $ out <-- tempOut]) undefNode
      ifStat =
        cifElse ("isZero"#[outi]) (branch blockt outt) (branch blocke oute)
  tell [CBlockStmt ifStat]
  return out
In this next case we’re translating Ifz
. For this we obviously need to compute the value of i
. We do that by recursing and storing the result in outi
. Now we want to be able to use 1 less than the value of i
in case we go into the successor branch. This is done by calling dec
on outi
and storing it for later.
Next we do something a little odd. We recurse on the branches of Ifz
but we definitely don’t want to compute both of them! So we can’t just use a normal recursive call. If we did they’d be added to the block we’re building up in the writer monad. So we use lift . runWriterT
to give us back the blocks without adding them to the current one we’re building. Now it’s just a matter of generating the appropriate if
statement.
To do this we add one instruction to the end of both branches, assigning to some output variable. This ensures that no matter which branch we go down we’ll end up with the result in one place. This is also the one place where we are no longer doing SSA. Properly speaking we should write this with a φ but who has time for that? :)
Finally we add the if statement and the handful of declarations that precede it to our block. Now for the last case.
realc (LetFC binds bind) = do
  bindings <- mapM goBind binds
  realc $ instantiate (VFC . (bindings !!)) bind
  where sizeOf Int = "INT_SIZE"
        sizeOf Clos = "CLOS_SIZE"
        goBind (NRecFC i cs) =
          ("mkClos" #) <$> (i2e i :) . (fromIntegral (length cs) :)
                       <$> mapM realc cs
          >>= tellDecl
        goBind (RecFC t i cs) = do
          f <- ("mkClos" #) <$> (i2e i :) . (fromIntegral (length cs) :)
                            <$> mapM realc cs
               >>= tellDecl
          tellDecl ("fixedPoint"#[f, sizeOf t])
For our last case we have to deal with lets. For this we simply traverse all the bindings which are now flat and then flatten the expression under the binder. When we mapM
over the bindings we actually get back a list of all the expressions each binding evaluated to. This is perfect for use with instantiate
making the actual toplevel function quite pleasant. goBind
is slightly less so.
In the nonrecursive case all we have to do is create a closure. So goBind
of a nonrecursive binding shells out to mkClos
. This mkClos
is applied to the number of closed over expressions as well as all the closed over expressions. This is because mkClos
is variadic. Finally we shove the result into tellDecl
as usual. For a recursive call there’s a slight difference, namely after doing all of that we apply fixedPoint
to the output and to the size of the type of the thing we’re fixing. This is why we kept types around for these bindings! With them we can avoid dragging the size with every value since we know it statically.
Next, we have a function for converting a faux C function into an actual function definition. This is the function that we use realc
in.
topc :: FauxCTop CExpr -> Gen Integer CFunDef
topc (FauxCTop i numArgs body) = do
  binds <- gen
  let getArg = (!!) (args (i2e binds) numArgs)
  (out, block) <- runWriterT . realc $ instantiate getArg body
  return $
    fun [taggedTy] ('_' : show i) [decl taggedTy $ ptr (i2d binds)] $
      CCompound [] (block ++ [CBlockStmt . creturn $ out]) undefNode
  where indexArg binds i = binds ! fromIntegral i
        args binds na = map (VFC . indexArg binds) [0 .. na - 1]
This isn’t the most interesting function. We have one array of arguments to our C function, and then we unbind the body of the FauxC function by indexing into this array. It’s not explicitly stated in the code but the array contains the closed over expressions for the first n - 1 entries and the nth is the actual argument to the function. This is in line with how the variables are actually bound in the body of the function, which makes unwrapping the body to index into the argument array very simple. We then call realc
which transforms our fauxc expression into a block of actual C code. We add one last statement to the end of the block that returns the final outputted variable. All that’s left to do is bind it up into a C function and call it a day.
Finally, at the end of it all we have a function from expression to Maybe CTranslUnit
, a C program.
compile :: Exp Integer -> Maybe CTranslUnit
compile e = runGen . runMaybeT $ do
  assertTy M.empty e Nat
  funs <- lift $ pipe e
  return . transUnit . map export $ funs
  where pipe e = do
          simplified <- closConv e >>= llift
          (main, funs) <- runWriterT $ fauxc simplified
          i <- gen
          let topMain = FauxCTop i 1 (abstract (const Nothing) main)
              funs' = map (i2e <$>) (funs ++ [topMain])
          (++ [makeCMain i]) <$> mapM topc funs'
        makeCMain entry =
          fun [intTy] "main" [] $ hBlock ["call"#[i2e entry]]
This combines all the previous compilation passes together. First we typecheck and ensure that the program is a Nat. Then we closure convert it and immediately lambda lift. This simplified program is then fed into fauxc, giving a fauxc expression for main and a bunch of functions called by main. We wrap up the main expression in a function that ignores all its arguments. We then map topc (which uses realc) over all of these fauxc functions. This gives us actual C code. Finally, we tack on a trivial C main to call the generated code and return the whole thing.
And that’s our PCF compiler.
Well if you’ve made it this far, congratulations. We just went through a full compiler from a typed higher order language to C. Along the way we ended up implementing quite a few passes and intermediate languages.
If you’d like to fiddle a bit more, there are some fun projects to try.
Cheers,
In this post I wanted to focus on one particular thing in Twelf: %worlds
declarations. They seem to be the most mysterious. I’ve had a couple of people tell me that they just blindly stick %worlds () (x _ _ _)
before every %total and pray, which is a little concerning.
In this post hopefully we’ll remove some of the “compile-n-pray” from using Twelf code.
In Twelf we’re interested in proving theorems. These theorems are basically proven by some chunk of code that looks like this.
mycooltyfam : with -> some -> cool -> args -> type.
%mode mycooltyfam +A +B +C -D.

some : ... -> mycooltyfam A B C D.
constructors : ... -> mycooltyfam A B C D.

%worlds (...) (mycooltyfam _ _ _ _).
%total (T) (mycooltyfam T _ _ _).
What’s interesting here is the 3 directives we needed

%mode, to specify which arguments of the type family are universally quantified and which are existentially quantified in our theorem. This specifies the “direction” of the type family: + arguments are inputs and - arguments are outputs.
%total, which actually goes and proves the theorem by induction on the canonical forms of the term in the parens.
%worlds, which specifies the set of contexts to check the totality in. Note that a world is simply a set of contexts.

The one we’re interested in talking about here is %worlds. Everything we want to call %total on has to have one of these and, as mentioned above, it specifies the contexts to check the theorem in. Remember that totality is proven by induction over the canonical forms. One of the canonical forms for every type is of the form
For some x : ty ∈ Γ, x is a canonical form of ty.
This is a little different than in other languages. We could usually just invert upon something in the context. That’s not the case in Twelf; we have to handle variables parametrically (this is critical to admitting HOAS and similar). This means we have to be extremely careful about what’s in Γ lest we accidentally introduce some canonical form of ty without any additional information about it. The worlds specification tells us about the forms Γ can take. Twelf allows us to specify sets of contexts that are “regular”.
So for example remember how plus
might be defined.
plus : nat -> nat -> nat -> type.
%mode plus +N +M -P.

plus/z : plus z N N.
plus/s : plus N M P -> plus (s N) M (s P).
This is total in the empty context. If we added some b : nat
to our context then we have no way of showing it is either a s
or a z
! This means that there’s a missing case for variables of type nat
in our code. In order to exclude this impossible case we just assert that we only care about plus
’s totality in the empty context. This is what the %worlds
specification for plus
stipulates
%worlds () (plus _ _ _).
This should be read as “plus should only be considered in the empty context”, so the only canonical forms of plus are those specified as constants in our signature. This sort of specification is what we want for most vanilla uses of Twelf.
For most cases we want to be proving theorems in the empty context because we do nothing to extend the context in our constructors. That’s not to say that we can’t specify some nonempty world. We can specify a world where there is a b : nat, but stipulate that wherever such a b appears it must be accompanied by a derivation {a} plus b a z. This way when Twelf goes to check the canonical forms case for something in our context, b : nat, it knows that there’s a derivation that precisely matches what we need. I’ll circle back to this in a second, but first we have to talk about how to specify fancier worlds.
In Twelf there’s some special syntax for specifying worlds. Basically we can specify a template for some part of the world, called a block. A world declaration is just a conglomeration of blocks and Twelf will interpret this as a world of contexts in which each block may appear zero or more times.
In Twelf code we specify a block with the following syntax
%block block_name : block {a : ty} ... {b : ty'}.
This specifies that if there is an a : ty
in the context, it’s going to be accompanied by a bunch of other stuff including a b : ty'
. Some blocks are pretty trivial. For example, if we wanted to allow plus
to be defined in a context with some a : nat
in the context we might say
%block random_nat : block {b : nat}.
%worlds (random_nat) (plus _ _ _).
This doesn’t work though. If we ask Twelf to check totality it’ll get angry and say
Coverage error  missing cases:
{#random_nat:{b:nat}} {X1:nat} {X2:nat} |- plus #random_nat_b X1 X2.
In human,

You awful person Danny! You’re missing the case where you have two random natural numbers and the random natural number b from the random_nat block, and we want to compute plus b X X'.
Now there are a few things to do here. The saner person would probably just say “Oh, I clearly don’t want to try to prove this theorem in a nonempty context”. Or we can wildly add things to our context in order to patch this hole. In this case, we need some proof about adding b to other stuff. Let’s supplement our block
%block random_nat : block {b : nat}{_ : {a} plus b a z}.
Such a context is pretty idiotic though since there isn’t a natural number that can satisfy it. It is however enough to sate the totality checker.
%total (T) (plus T _ _).
For a less contrived example, let’s discuss where interesting worlds come into play: with higher order abstract syntax. When we use HOAS we end up embedding the LF function space in our terms. This is important because it means as we go to prove theorems about it we end up recursing on a term under an honest to goodness LF lambda. This means we extend the context at some points in our proof and we can’t just prove theorems in the empty context!
To see this in action here’s an embedding of the untyped lambda calculus in LF
term : type.

lam : (term -> term) -> term.
app : term -> term -> term.
Now let’s say we want to determine how many binders are in a lambda term. We start by defining our relation
nbinds : term -> nat -> type.
%mode nbinds +T -N.
We set this type family up so that it has one input (the term) and one output (a nat representing the number of binders). We have two cases to deal with here
nbinds/lam : nbinds (lam F) (s N)
              <- ({x : term} nbinds (F x) N).

nbinds/app : nbinds (app F A) O
              <- nbinds F N1
              <- nbinds A N2
              <- plus N1 N2 O.
In the lam case we recurse under the binder. This is the interesting thing here: we stick the recursive call under a pi binder. This gives us access to some term x which we apply the LF function to. This code in effect says “if for all terms x, F x has N binders, then lam F has N + 1 binders”. The app case just sums the two binder counts.
We can try to world check this in only the empty context but this fails with
Error:
While checking constant nbinds/lam:
World violation for family nbinds: {x:term} </: 1
This says that even though we promised never to extend the LF context we did just that! To fix this we must have a fancier world. We create a block which just talks about adding a term to the context.
%block nbinds_block : block {x : term}.
%worlds (nbinds_block) (nbinds _ _).
This world checks but there’s another issue lurking about. Let’s try to ask Twelf to prove totality.
%total (T) (nbinds T _).
This spits out the error message
Coverage error  missing cases:
{#nbinds_block:{x:term}} {X1:nat} |- nbinds #nbinds_block_x X1.
This is the same error as before! Now that we’ve extended our context with a term we need to somehow be able to tell Twelf the height of that term. This smacks of the slightly fishy type of nbinds/lam: its meaning is that F x has the height N for any term x. This seems a little odd; why doesn’t the height of a function’s body depend on its argument? We really ought to be specifying that whatever this x is, we know its height is z. This makes our new code
nbinds/lam : nbinds (lam F) (s N)
              <- ({x : term}{_ : nbinds x z} nbinds (F x) N).
Now we specify that the height of x
is zero. This means we have to change our block to
%block nbinds_block : block {x : term}{_ : nbinds x z}.
With this modification everything else goes through unchanged. For fun, we can ask Twelf to actually compute some toy examples.
%solve deriv : nbinds (lam ([x] (lam [y] x))) N.
This gives back that deriv : nbinds (lam ([x] lam ([y] x))) (s (s z))
as we’d hope. It’s always fun to run our proofs.
Hopefully that clears up some of the mystery of worlds in Twelf. Happily this doesn’t come up for a lot of simple uses of Twelf. As far as I know the entire constructive logic course at CMU sidesteps the issue with a quick “Stick %worlds () (...)
before each totality check”.
It is completely invaluable if you’re doing anything under binders, which turns out to be necessary for most interesting proofs about languages with binders. If nothing else, the more you know…
Those who enjoyed this post might profit from Dan Licata and Bob Harper’s paper on mechanizing metatheory.
Cheers,
A couple of days ago I wrote a small implementation of a type inferencer for a mini ML language. It turns out there are very few explanations of how to do this properly and the ones that exist tend to be the really naive, super exponential algorithm. I wrote the algorithm in SML but nothing should be unfamiliar to the average Haskeller.
Type inference breaks down into essentially 2 components: constraint generation and unification.
We inspect the program we’re trying to infer a type for and generate a bunch of statements (constraints) which are of the form
This type is equal to this type
These types have “unification variables” in them. These aren’t normal ML/Haskell type variables. They’re generated by the compiler, for the compiler, and will eventually be filled in with either a concrete type or another unification variable.
They should be thought of as holes in an otherwise normal type. For example, if we’re looking at the expression
f a
We first just say that f : 'f
where 'f
is one of those unification variables I mentioned. Next we say that a : 'a
. Since we’re apply f
to a
we can generate the constraints that
'f ~ 'x -> 'y
'a ~ 'x
Since we can only apply things of the form _ -> _. We then unify these constraints to produce f : 'a -> 'x and a : 'a. We’d then use the surrounding constraints to produce more information about what exactly 'a and 'x might be. If these were all the constraints we had, we’d then “generalize” 'a and 'x to be normal type variables, making our expression have the type x where f : a -> x and a : a.
Now onto some specifics
In order to actually talk about type inference we first have to define our language. We have the abstract syntax tree:
type tvar = int

local val freshSource = ref 0 in
fun fresh () : tvar =
    !freshSource before freshSource := !freshSource + 1
end

datatype monotype = TBool
                  | TArr of monotype * monotype
                  | TVar of tvar

datatype polytype = PolyType of int list * monotype

datatype exp = True
             | False
             | Var of int
             | App of exp * exp
             | Let of exp * exp
             | Fn of exp
             | If of exp * exp * exp
First we have type variables which are globally unique integers. To give us a method for actually producing them we have fresh
which uses a refcell to never return the same result twice. This is probably surprising to Haskellers: SML isn’t purely functional and frankly this is less noisy than using something like monad-gen.
From there we have monotypes. These are normal ML types without any polymorphism. There are type/unification variables, booleans, and functions. Polytypes are just monotypes with an extra forall
at the front. This is where we get polymorphism from. A polytype binds a number of type variables, stored in this representation as an int list. There is one ambiguity here: when looking at a variable it’s not clear whether it’s supposed to be a type variable (bound in a forall) or a unification variable. The idea is that we never ever inspect a type bound under a forall except when we’re converting it to a monotype with fresh unification variables in place of all of the bound variables. Thus, when inferring a type, every variable we come across is a unification variable.
Finally, we have expressions. Aside from the normal constants, we have variables, lambdas, applications, and if. The way we represent variables here is with DeBruijn indices. A variable is a number that tells you how many binders are between it and where it was bound. For example, const would be written Fn (Fn (Var 1)) in this representation.
With this in mind we define some helpful utility functions. When type checking, we have a context full of information. The two facts we know are
datatype info = PolyTypeVar of polytype
              | MonoTypeVar of monotype

type context = info list
Where the ith element of a context indicates the piece of information we know about the ith DeBruijn variable. We’ll also need to substitute a type variable for a type. We also want to be able to find out all the free variables in a type.
fun subst ty' var ty =
    case ty of
        TVar var' => if var = var' then ty' else TVar var'
      | TArr (l, r) => TArr (subst ty' var l, subst ty' var r)
      | TBool => TBool

fun freeVars t =
    case t of
        TVar v => [v]
      | TArr (l, r) => freeVars l @ freeVars r
      | TBool => []
Both of these functions just recurse over types and do some work at the variable case. Note that freeVars can contain duplicates; this turns out to matter in only one place: generalizeMonoType. The basic idea is that given a monotype with a bunch of unification variables and a surrounding context, we figure out which variables can be bound up in a polymorphic type. If they don’t appear in the surrounding context, we generalize them by binding them in a new polytype’s forall spot.
fun dedup [] = []
  | dedup (x :: xs) =
    if List.exists (fn y => x = y) xs
    then dedup xs
    else x :: dedup xs

fun generalizeMonoType ctx ty =
    let fun notMem xs x = List.all (fn y => x <> y) xs
        fun free (MonoTypeVar m) = freeVars m
          | free (PolyTypeVar (PolyType (bs, m))) =
            List.filter (notMem bs) (freeVars m)

        val ctxVars = List.concat (List.map free ctx)
        val polyVars = List.filter (notMem ctxVars) (freeVars ty)
    in PolyType (dedup polyVars, ty) end
Here the bulk of the code is deciding whether or not a variable is free in the surrounding context using free. It looks at a piece of info to determine what variables occur in it. We then accumulate all of these variables into ctxVars and use this list to decide what to generalize.
Next we need to take a polytype to a monotype. This is the specialization of a polymorphic type that we love and use when we use map on a function from int -> double. This works by taking each bound variable and replacing it with a fresh unification variable. This is nicely handled by folds!
fun mintNewMonoType (PolyType (ls, ty)) =
foldl (fn (v, t) => subst (TVar (fresh ())) v t) ty ls
Last but not least, we have a function to take a context and a variable and give us a monotype which corresponds to it. This may produce a new monotype if we think the variable has a polytype.
exception UnboundVar of int
fun lookupVar var ctx =
    case (List.nth (ctx, var) handle Subscript => raise UnboundVar var) of
        PolyTypeVar pty => mintNewMonoType pty
      | MonoTypeVar mty => mty
For the sake of nice error messages, we also throw UnboundVar
instead of just Subscript in the error case. Now that we’ve gone through all of the utility functions, on to unification!
A large part of this program is basically “I’ll give you a list of constraints and you give me the solution”. The program to solve these proceeds by pattern matching on the constraints.
In the empty case, we have no constraints so we give back the empty solution.
fun unify [] = []
In the next case we actually have to look at what constraint we’re trying to solve.
  | unify (c :: constrs) =
    case c of
If we’re lucky, we’re just trying to unify TBool with TBool; this does nothing since these types have no variables and are equal. In this case we just recurse.
(TBool, TBool) => unify constrs
If we’ve got two function types, we just constrain their domains and ranges to be the same and continue on unifying things.
  | (TArr (l, r), TArr (l', r')) => unify ((l, l') :: (r, r') :: constrs)
Now we have to deal with finding a variable. We definitely want to avoid adding (TVar v, TVar v)
to our solution, so we’ll have a special case for trying to unify two variables.
  | (TVar i, TVar j) =>
    if i = j
    then unify constrs
    else addSol i (TVar j) (unify (substConstrs (TVar j) i constrs))
This is our first time actually adding something to our solution so there’s several new elements here. The first is this function addSol
. It’s defined as
fun addSol v ty sol = (v, applySol sol ty) :: sol
So in order to make sure our solution is internally consistent it’s important that whenever we add a type to our solution we first apply the solution to it. This ensures that we can substitute a variable in our solution for its corresponding type and not worry about whether we need to do something further. Additionally, whenever we add a new binding we substitute for it in the constraints we have left to ensure we never have a solution which is just inconsistent. This prevents us from unifying v ~ TBool
and v ~ TArr(TBool, TBool)
in the same solution! The actual code for doing this is that substConstrs (TVar j) i constrs bit.
The next case is the general case for unifying a variable with some type. It looks very similar to this one.
  | ((TVar i, ty) | (ty, TVar i)) =>
    if occursIn i ty
    then raise UnificationError c
    else addSol i ty (unify (substConstrs ty i constrs))
Here we have the critical occursIn
check. This checks to see if a variable appears in a type and prevents us from making erroneous unifications like TVar a ~ TArr (TVar a, TVar a)
. This occurs check is actually very easy to implement
fun occursIn v ty = List.exists (fn v' => v = v') (freeVars ty)
Finally we have one last case: the failure case. This is the catchall case for if we try to unify two things that are obviously incompatible.
  | _ => raise UnificationError c
All together, that code was
fun applySol sol ty =
    foldl (fn ((v, ty), ty') => subst ty v ty') ty sol

fun applySolCxt sol cxt =
    let fun applyInfo i =
            case i of
                PolyTypeVar (PolyType (bs, m)) =>
                PolyTypeVar (PolyType (bs, (applySol sol m)))
              | MonoTypeVar m => MonoTypeVar (applySol sol m)
    in map applyInfo cxt end

fun addSol v ty sol = (v, applySol sol ty) :: sol

fun occursIn v ty = List.exists (fn v' => v = v') (freeVars ty)

fun unify ([] : constr list) : sol = []
  | unify (c :: constrs) =
    case c of
        (TBool, TBool) => unify constrs
      | (TVar i, TVar j) =>
        if i = j
        then unify constrs
        else addSol i (TVar j) (unify (substConstrs (TVar j) i constrs))
      | ((TVar i, ty) | (ty, TVar i)) =>
        if occursIn i ty
        then raise UnificationError c
        else addSol i ty (unify (substConstrs ty i constrs))
      | (TArr (l, r), TArr (l', r')) =>
        unify ((l, l') :: (r, r') :: constrs)
      | _ => raise UnificationError c
The other half of this algorithm is the constraint generation part. We generate constraints and use unify to turn them into solutions. This boils down to two functions. The first is to glue together solutions.
infixr 3 <+>
fun sol1 <+> sol2 =
    let fun notInSol2 v = List.all (fn (v', _) => v <> v') sol2
        val sol1' = List.filter (fn (v, _) => notInSol2 v) sol1
    in
        map (fn (v, ty) => (v, applySol sol1 ty)) sol2 @ sol1'
    end
Given two solutions, we figure out which things don’t occur in the second solution. Next, we apply solution 1 everywhere in the second solution, giving a consistent solution which contains everything in sol2; finally we add in all the stuff in sol1 but not in sol2. This doesn’t check to make sure that the solutions are actually consistent; this is done elsewhere.
Next is the main function here, constrain. This actually generates a solution and type given a context and an expression. The first few cases are nice and simple.
fun constrain ctx True = (TBool, [])
  | constrain ctx False = (TBool, [])
  | constrain ctx (Var i) = (lookupVar i ctx, [])
In these cases we don’t infer any constraints; we just figure out types based on information we knew previously. Next, for Fn we generate a fresh variable to represent the argument’s type and just constrain the body.
  | constrain ctx (Fn body) =
    let val argTy = TVar (fresh ())
        val (rTy, sol) = constrain (MonoTypeVar argTy :: ctx) body
    in (TArr (applySol sol argTy, rTy), sol) end
Once we have the solution for the body, we apply it to the argument type which might replace it with a concrete type if the constraints we inferred for the body demand it. For If
we do something similar except we add a few constraints of our own to solve.
  | constrain ctx (If (i, t, e)) =
    let val (iTy, sol1) = constrain ctx i
        val (tTy, sol2) = constrain (applySolCxt sol1 ctx) t
        val (eTy, sol3) = constrain (applySolCxt (sol1 <+> sol2) ctx) e
        val sol = sol1 <+> sol2 <+> sol3
        val sol = sol <+> unify [ (applySol sol iTy, TBool)
                                , (applySol sol tTy, applySol sol eTy)]
    in
        (tTy, sol)
    end
Notice how we apply each solution to the context for the next thing we’re constraining. This is how we ensure that each solution will be consistent. Once we’ve generated solutions to the constraints in each of the subterms, we smash them together to produce the first solution. Next, we ensure that the subcomponents have the right type by generating a few constraints to ensure that iTy
is a bool and that tTy
and eTy
(the types of the branches) are both the same. We have to carefully apply the sol
to each of these prior to unifying them to make sure our solution stays consistent.
The App case is practically the same
  | constrain ctx (App (l, r)) =
    let val (domTy, ranTy) = (TVar (fresh ()), TVar (fresh ()))
        val (funTy, sol1) = constrain ctx l
        val (argTy, sol2) = constrain (applySolCxt sol1 ctx) r
        val sol = sol1 <+> sol2
        val sol = sol <+> unify [ (applySol sol funTy,
                                   applySol sol (TArr (domTy, ranTy)))
                                , (applySol sol argTy, applySol sol domTy)]
    in (ranTy, sol) end
The only real difference here is that we generate different constraints: we make sure we’re applying a function whose domain is the same as the argument type.
The most interesting case here is Let
. This implements let generalization, which is how we actually get polymorphism. After inferring the type of the thing we’re binding we generalize it, giving us a poly type to use in the body of the let. The key to generalizing it is the generalizeMonoType
function we had before.
 constrain ctx (Let (e, body)) =
let val (eTy, sol1) = constrain ctx e
val ctx' = applySolCxt sol1 ctx
val eTy' = generalizeMonoType ctx' (applySol sol1 eTy)
val (rTy, sol2) = constrain (PolyTypeVar eTy' :: ctx') body
in (rTy, sol1 <+> sol2) end
We do pretty much everything we had before, except now we carefully apply the solution we get for the bound expression to the context and then generalize the type with respect to that new context. This is how we actually get polymorphism: it will assign a proper polymorphic type to the thing we’re binding.
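To see what “generalize with respect to the context” amounts to, here’s a tiny sketch in Python rather than SML (the names and type encoding are invented for illustration): quantify exactly the variables free in the type but not free in the context.

```python
# Sketch of let-generalization: a poly type is a pair of the quantified
# variable names and the body. Types are tuples like ('var', 'a') and
# ('arr', dom, rng).
def free_vars(ty):
    if ty[0] == 'var':
        return {ty[1]}
    if ty[0] == 'arr':
        return free_vars(ty[1]) | free_vars(ty[2])
    return set()

def generalize(ctx_types, ty):
    """Quantify variables free in ty but not free in the context."""
    ctx_free = set()
    for t in ctx_types:
        ctx_free |= free_vars(t)
    return (sorted(free_vars(ty) - ctx_free), ty)
```

With an empty context `a -> a` generalizes to `∀a. a -> a`, but if `a` is also mentioned by the context it has to stay monomorphic.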
That wraps up constraint generation. Now all that’s left to see is the overall driver for type inference.
fun infer e =
let val (ty, sol) = constrain [] e
in generalizeMonoType [] (applySol sol ty) end
end
So all we do is infer and generalize a type! And there you have it, that’s how ML and Haskell do type inference.
Hopefully that clears up a little of the magic of how type inference works. The next challenge is to figure out how to do type inference on a language with patterns and ADTs! This is actually quite fun, pattern checking involves synthesizing a type from a pattern which needs something like linear logic to handle pattern variables correctly.
With this we’re actually a solid 70% of the way to building a type checker for SML. Until I have more free time though, I leave this as an exercise for the curious reader.
Cheers,
For the last 3 or so weeks I’ve been writing a bunch of Twelf code for my research (hence my flatlined GitHub punch card). Since it’s actually a lot of fun I thought I’d share a bit about Twelf.
Since Twelf isn’t a terribly well known language it’s worth stating what exactly it is we’re talking about. Twelf is a proof assistant. It’s based on a logic called LF (similarly to how Coq is based on CiC).
Twelf is less powerful than some other proof assistants but by limiting some of its power it’s wonderfully suited to proving certain types of theorems. In particular, Twelf admits true “higher order abstract syntax” (don’t worry if you don’t know what this means) this makes it great for formalizing programming languages with variable bindings.
In short, Twelf is a proof assistant which is very well suited for defining and proving things about programming languages.
It’s much more fun to follow along with a tutorial if you actually have a Twelf installation to try out the code. You can download and compile the sources to Twelf with SML/NJ or MLton. You could also use smackage to get the compiler.
Once you’ve compiled the thing you should be left with a binary twelf-server
. This is your primary way of interacting with the Twelf system. There’s quite a slick Emacs interface to smooth over this process. If you’ve installed Twelf into a directory ~/twelf/
all you need is the incantation
(setq twelf-root "~/twelf/")
(load (concat twelf-root "emacs/twelf-init.el"))
Without further ado, let’s look at some Twelf code.
When writing Twelf code we encode the thing that we’re studying, the object language, as a bunch of type families and constructors in Twelf. This means that when we edit a Twelf file we’re just writing signatures.
For example, if we want to encode natural numbers we’d write something like
nat : type.
z : nat.
s : nat -> nat.
This is an LF signature, we declare a series of constants with NAME : TYPE.
. Note the period at the end of each declaration. First we start by declaring a type for natural numbers called nat
with nat : type.
Here type
is the base kind of all types in Twelf. Next we go on to declare what the values of type nat
are.
In this case there are two constructors for nat
. We either have zero, z
, or the successor of another value of type nat
, s
. This gives us a canonical forms lemma for natural numbers: All values of type nat
are either

- z
- s N for some value N : nat
Later on, we’ll justify the proofs we write with this lemma.
Anyways, now that we’ve encoded the natural numbers I wanted to point out a common point of confusion about Twelf. We’re not writing programs to be run. We’re writing programs exclusively for the purpose of typechecking. Heck, we’re not even writing programs at the term level! We’re just writing a bunch of constants out with their types! More than this even, Twelf is defined so that you can only write canonical forms. This means that if you write something in your program, it has to be in normal form, fully applied! In PL speak it has to be β-normal and η-long. This precludes actually writing programs for the sake of reducing them. You’re never going to write a web server in Twelf; you won’t even be writing “Hello World”. You might use it to verify the language you’re writing them in though.
Now that we’ve gotten the awkward bit out of the way, let’s define a Twelf encoding of a judgment. We want to encode the addition judgment m + n = p
which is given by the following rules
—————————
z + n = n
m + n = p
———————————————
s(m) + n = s(p)
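Read bottom-up, these two rules are a recipe for building a derivation of m + n = p by recursion on m. Here’s a quick sketch of that reading in Python rather than Twelf (the encoding and rule names are invented for illustration):

```python
# Naturals are 'z' or ('s', n); a derivation records which rule closed
# each step, mirroring the two inference rules above.
def s(n):
    return ('s', n)

def plus(m, n):
    """Return (p, derivation) witnessing m + n = p."""
    if m == 'z':
        return n, ('plus/z',)        # z + n = n  (no premises)
    p, d = plus(m[1], n)             # premise: m' + n = p
    return s(p), ('plus/s', d)       # conclude s(m') + n = s(p)
```

For example, `plus(s(s('z')), s('z'))` computes 2 + 1, returning 3 together with a derivation that applies the successor rule twice over the base rule.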
In the rest of the world we have this idea that propositions are types. In Twelf, we’re worried about defining logics and systems, so we have the metatheoretic equivalent: judgments are types.
So we define a type family plus
.
plus : nat -> nat -> nat -> type.
So plus
is a type indexed over 3 natural numbers. This is our first example of dependent types: plus
is a type which depends on 3 terms. Now we can list out how to construct a derivation of plus
. This means that inference rules in a meta theory correspond to constants in Twelf as well.
plus/z : {n : nat} plus z n n
This is some new syntax, in Twelf {NAME : TYPE} TYPE
is a dependent function type, a pi type. This notation is awfully similar to Agda and Idris if you’re familiar with them. This means that this constructor takes a natural number, n
and returns a derivation that plus z n n
. The fact that the return type depends on what nat
we supply is why this needs a dependent type.
In fact, this is such a common pattern that Twelf has sugar for it. If we write an unbound capital variable name Twelf will automagically introduce a binder {N : ...}
at the front of our type. We can thus write our inference rules as
plus/z : plus z N N
plus/s : plus N M P -> plus (s N) M (s P)
These rules together with our declaration of plus
complete the definition of addition. In fact, there’s something kinda special about these two rules. We know that for any term n : nat
which is in canonical form, there should be an applicable rule. In Twelf speak, we say that this type family is total.
We can ask Twelf to check this fact for us by saying
plus : nat -> nat -> nat -> type.
%mode plus +N +M -P.
plus/z : plus z N N.
plus/s : plus N M P -> plus (s N) M (s P).
%worlds () (plus _ _ _).
%total (N) (plus N _ _).
We want to show that for all terms n, m : nat
in canonical form, there is a term p
in canonical form so that plus n m p
. This sort of theorem is what we’d call a ∀∃-theorem. This is literally because it’s a theorem of the form “∀ something. ∃ something. so that something”. These are the sort of thing that Twelf can help us prove.
Here’s the workflow for writing one of these proofs in Twelf

1. Write a %mode specification to say what is bound in the ∀ and what is bound in the ∃.
2. Specify the %worlds, usually we want the empty context, ().
3. Ask Twelf to check %total, where the N specifies what to induct on.

In our case we have a case for each canonical form of nat
so our type family is total. This means that our theorem passes. Hurray!
Believe it or not this is what life is like in Twelf land. All the code I’ve written these last couple of weeks is literally type signatures and 5 occurrences of %total
. What’s kind of fun is how unreasonably effective a system this is for proving things.
Let’s wrap things up by proving one last theorem, if plus A B N
and plus A B M
both have derivations, then we should be able to show that M
and N
are the same. Let’s start by defining what it means for two natural numbers to be the same.
nateq : nat -> nat -> type.
nateq/r : nateq N N.
nateq/s : nateq N M -> nateq (s N) (s M).
I’ve purposefully defined this so it’s amenable to our proof, but it’s still a believable formulation of equality. It’s reflexive and if N
is equal to M
, then s N
is equal to s M
. Now we can actually state our proof.
plusfun : plus N M P -> plus N M P' -> nateq P P' -> type.
%mode plusfun +A +B -C.
Our theorem says if you give us two derivations of plus
with the same arguments, we can prove that the outputs are equal. There are two cases we have to cover for our induction so there are two constructors for this type family.
plusfun/z : plusfun plus/z plus/z nateq/r.
plusfun/s : plusfun (plus/s L) (plus/s R) (nateq/s E)
<- plusfun L R E.
A bit of syntactic sugar here, I used the backwards arrow <- which is identical to the normal ->
except its arguments are flipped. Finally, we ask Twelf to check that we’ve actually proven something here.
%worlds () (plusfun _ _ _).
%total (P) (plusfun P _ _).
And there you have it, some actual theorem we’ve mechanically checked using Twelf.
I wanted to keep this short, so now that we’ve covered Twelf basics I’ll just refer you to one of the more extensive tutorials. You may be interested in
If you’re interested in learning a bit more about the nice mathematical foundations for LF you should check out “The LF Paper”.
I write a lot about types. Up until now however, I’ve only made passing references to the thing I’ve actually been studying in most of my free time lately: proof theory. Now I have a good reason for this: the proof theory I’m interested in is undeniably intertwined with type theory and computer science as a whole. In fact, you occasionally see someone draw the triangle
Type Theory
/ \
/ \
Proof Theory  Category Theory
Which nicely summarizes the lay of the land in the world I’m interested in. People will often pick up something well understood in one corner of the triangle and drag it off to another, producing a flurry of new ideas and papers. It’s all very exciting and leads to really cool stuff. I think the most talked about example lately is homotopy type theory, which drags a mathematical structure (weak infinite groupoids) and hoists it off to type theory!
If you read the [unprofessional, mostly incorrect, and entirely more fun to read] blog posts on these subjects you’ll find most of the lip service is paid to category theory and type theory with poor proof theory shunted off to the side.
In this post, I’d like to jot down my notes on Frank Pfenning’s introduction to proof theory materials to change that in some small way.
The obvious question is just “What is proof theory?”. The answer is that proof theory is the study of proofs. In this world we study proofs as first class mathematical objects which we prove interesting things about. This is the branch of math that formalizes our handwavy notion of a proof into a precise object governed by rules.
We can then prove things like "Given a proof that Γ ⊢ A
and another derivation of Γ, A ⊢ B
, then we can produce a derivation of Γ ⊢ B
”. Such a theorem is utterly crazy unless we can formalize what it means to derive something.
From this we grow beautiful little sets of rules and construct derivations with them. Later, we can drag these derivations off to type theory and use them to model all sorts of wonderful phenomena. My most recent personal example was when folks noticed that the rules for modal logic perfectly capture what the semantics of static pointers ought to be.
So in short, proof theory is devoted to answering that question that every single one of your math classes dodged
Professor, what exactly is a proof?
In every logic that we’ll study we’ll keep circling back to two core objects: judgments and propositions. The best explanation of judgments I’ve read comes from Frank Pfenning
A judgment is something we may know, that is, an object of knowledge. A judgment is evident if we in fact know it.
So judgments are the things we’ll structure our logic around. You’ve definitely heard of one judgment: A true
. This judgment signifies whether or not some proposition A
is true. Judgments can be much fancier though: we might have a whole bunch of judgments like n even
, A possible
or A resource
.
These judgments act across various syntactic objects. In particular, from our point of view we’ll understand the meaning of a proposition by the ways we can prove it, that is the proofs that A true
is evident.
We prove a judgment J
through inference rules. An inference rule takes the form
J₁ J₂ .... Jₓ
—————————————
J
Which should be read as “When J₁
, J₂
… and Jₓ
hold, then so does J
”. Here the things above the line are premises and the ones below are conclusions. What we’ll do is define a bunch of these inference rules and use them to construct proofs of judgments. For example, we might have the inference rules
n even
—————— ————————————
0 even S(S(n)) even
for the judgment n even
. We can then form proofs to show that n even
holds for some particular n
.
——————
0 even
————————————
S(S(0)) even
——————————————————
S(S(S(S(0)))) even
This tree for example is evidence that 4 even
holds. We apply the second inference rule to S(S(S(S(0))))
first. This leaves us with one premise to show, S(S(0)) even
. For this we repeat the process and end up with the new premise that 0 even
. For this we can apply the first inference rule which has no premises completing our proof.
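This style of derivation-building is easy to mimic mechanically. As a rough sketch in Python (the encoding and rule names are invented for illustration), a checker just keeps peeling off two successors until it reaches zero:

```python
# 'z' is zero and ('s', n) is successor; a derivation of "n even" is a
# nested application of the two rules for the judgment.
def even_derivation(n):
    """Build the derivation that n is even, or fail if no rule applies."""
    if n == 'z':
        return ('zero-even',)                         # rule with no premises
    if n[0] == 's' and n[1][0] == 's':
        return ('ss-even', even_derivation(n[1][1]))  # premise: n even
    raise ValueError('no rule applies: n is odd')
```

Running it on 4 applies the successor rule twice and the zero rule once, exactly the tree drawn above; running it on an odd number fails because no rule matches.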
One judgment we’ll often see is A prop
. It simply says that A
is a well formed proposition, not necessarily true but syntactically well formed. This judgment is defined inductively over the structure of A
. An example judgment would be
A prop B prop
——————————————
A ∧ B prop
Which says that A ∧ B
(A and B) is a well formed proposition if and only if A
and B
are! We can imagine a whole bunch of these rules
A prop B prop
—————— —————— ————————————— ...
⊤ prop ⊥ prop A ∨ B prop
that lay out the propositions of our logic. This doesn’t yet tell us how prove any of these propositions to be true, but it’s a start. After we formally specify what sentences are propositions in our logic we need to discuss how to prove that one is true. We do this with a different judgment A true
which is once again defined inductively.
For example, we might want to give meaning to the proposition A ∧ B
. To do this we define its meaning through the inference rules for proving that A ∧ B true
. In this case, we have the rule
A true B true
—————————————— (∧ I)
A ∧ B true
I claim that this defines the meaning of ∧
: to prove a conjunction to be true we must prove its left and right halves. The rather proof-theoretic thing we’ve done here is say that the meaning of something is what we use to prove it. This is sometimes called the “verificationist perspective”. Finally, note that I annotated this rule with the name ∧ I
simply for convenience so we can refer to it.
Now that we know what A ∧ B
means, what does having a proof of it imply? Well we should be able to “get out what we put in”, which means we’d have two inference rules
A ∧ B true A ∧ B true
—————————— ——————————
A true B true
We’ll refer to these rules as ∧ E1
and ∧ E2
respectively.
Now for a bit of terminology, rules that let us “introduce” new proofs of propositions are introduction rules. Once we have a proof, we can use it to construct other proofs. The rules for how we do that are called elimination rules. That’s why I’ve been adding I’s and E’s to the ends of our rule names.
How do we convince ourselves that these rules are correct with respect to our understanding of ∧
? This question leads us to our first sort of proofs-about-proofs we’ll make.
What we want to say is that the introduction and elimination rules match up. This should mean that any time we prove something by an introduction rule followed by an elimination rule, we should be able to rewrite the proof to avoid this duplication. This also hints that the rules aren’t too powerful: we can’t prove anything with the elimination rules that we didn’t have a proof for at some point already.
For ∧
this proof looks like this
D E
– –
A B D
—————— ∧I ⇒ ————
A ∧ B A
—————— ∧E 1
A
So whenever we introduce a ∧ and then eliminate it with ∧ E1
we can always rewrite our proof to not use the elimination rules. Here notice that D and E range over derivations in this proof. They represent a chain of rule applications that let us produce an A
or B
in the end. Note I got a bit lazy and started omitting the true
judgments, this is something I’ll do a lot since it’s mostly unambiguous.
The proof for ∧E2
is similar.
D E
– –
A B E
————— ∧I ⇒ ————
A ∧ B B
————— ∧E 2
B
Given this we say that the elimination rules for ∧ are “locally sound”. That is, when used immediately after an introduction rule they don’t let us produce anything truly new.
Next we want to show that if we have a proof of A ∧ B
, the elimination rules give us enough information that we can pick the proof apart and produce a reassembled A ∧ B
.
D D
————– ————–
D A ∧ B A ∧ B
————— ⇒ —————∧E1 ——————∧E2
A ∧ B A B
———————————————— ∧I
A ∧ B
This somewhat confusing derivation takes our original proof of A ∧ B
and pulls it apart into proofs of A
and B
and uses these to assemble a new proof of A ∧ B
. This means that our elimination rules give us all the information we put in, so we say they’re locally complete.
The two of these properties combined, local soundness and completeness are how we show that an elimination rule is balanced with its introduction rule.
If you’re more comfortable with programming languages (I am) our local soundness property is equivalent to stating that
fst (a, b) ≡ a
snd (a, b) ≡ b
And local completeness is that
a ≡ (fst a, snd a)
The first equations are reductions and the second is an expansion. These actually correspond to the beta and eta rules we expect a programming language to have! This is a nice little example of why proof theory is useful: it gives a systematic way to define some parts of the behavior of a program. Given the logic a programming language gives rise to, we can double check that all rules are locally sound and complete, which gives us confidence our language isn’t horribly broken.
Before I wrap up this post I wanted to talk about one last important concept in proof theory: judgments with hypotheses. This is best illustrated by trying to write the introduction and elimination rules for “implies” or “entailment”, written A ⊃ B
.
Clearly A ⊃ B
is supposed to mean we can prove B true
assuming A true
to be provable. In other words, we can construct a derivation of the form
A true
——————
.
.
.
——————
B true
We can notate our rules then as
—————— u
A true
——————
.
.
.
——————
B true A ⊃ B A
—————————— u ——————————
A ⊃ B true B true
This notation is a bit clunky, so we’ll opt for a new one: Γ ⊢ J
. In this notation Γ
is some list of judgments we assume to hold and J
is the thing we want to show holds. Generally we’ll end up with the rule
J ∈ Γ
—————
Γ ⊢ J
Which captures the fact that Γ contains assumptions we may or may not use to prove our goal. This specific rule may vary depending on how we want express how assumptions work in our logic (substructural logics spring to mind here). For our purposes, this is the most straightforward characterization of how this ought to work.
Our hypothetical judgments come with a few rules which we call “structural rules”. They modify the structure of the judgment, rather than any particular proposition we’re trying to prove.
Weakening
Γ ⊢ J
—————————
Γ, Γ' ⊢ J
Contraction
Γ, A, A, Γ' ⊢ J
———————————————
Γ, A, Γ' ⊢ J
Exchange
Γ' = permute(Γ) Γ' ⊢ A
————————————————————————
Γ ⊢ A
Finally, we get a substitution principle. This allows us to eliminate some of the assumptions we made to prove a theorem.
Γ ⊢ A Γ, A ⊢ B
————————————————
Γ ⊢ B
These 5 rules define meaning to our hypothetical judgments. We can restate our formulation of entailment with less clunky notation then as
A prop B prop
——————————————
A ⊃ B prop
Γ, A ⊢ B Γ ⊢ A ⊃ B Γ ⊢ A
————————— ——————————————————
Γ ⊢ A ⊃ B Γ ⊢ B
One thing in particular to note here is that entailment actually internalizes the notion of hypothetical judgments into our logic. This is the aspect of it that makes it behave so differently than the other connectives we looked at.
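As a small illustration of working with the Γ ⊢ J style (a Python sketch with invented encodings, not anything from the post), here’s a checker for just the assumption rule and the two rules for ⊃:

```python
# Propositions are atoms (strings) or ('imp', A, B) for A ⊃ B.
# A proof is a tuple naming the rule it ends with; contexts are lists.
def check(ctx, proof, goal):
    """Does `proof` derive ctx ⊢ goal?"""
    rule = proof[0]
    if rule == 'hyp':                        # J ∈ Γ  ⟹  Γ ⊢ J
        return goal in ctx
    if rule == 'imp-intro':                  # Γ, A ⊢ B  ⟹  Γ ⊢ A ⊃ B
        return (isinstance(goal, tuple) and goal[0] == 'imp'
                and check(ctx + [goal[1]], proof[1], goal[2]))
    if rule == 'imp-elim':                   # Γ ⊢ A ⊃ B and Γ ⊢ A  ⟹  Γ ⊢ B
        fun_proof, arg_proof, arg = proof[1], proof[2], proof[3]
        return (check(ctx, fun_proof, ('imp', arg, goal))
                and check(ctx, arg_proof, arg))
    return False
```

For instance, the proof `('imp-intro', ('hyp',))` establishes ⊢ A ⊃ A: the introduction rule moves A into the context and the assumption rule closes the derivation.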
As an exercise to the reader: prove the local soundness and completeness of these rules.
In this post we’ve laid out a bunch of rules and I’ve hinted that a bunch more are possible. When put together these rules define a logic using “natural deduction”, a particular way of specifying proofs that uses inference rules rather than axioms or something entirely different.
Hopefully I’ve inspired you to poke a bit further into proof theory, in that case I heartily recommend Frank Pfenning’s lectures at the Oregon Summer School for Programming Languages.
Cheers,
For those who haven’t heard, GHC 7.10 is making a brave foray into the exciting world of distributed computing. To this end, it’s made a new language extension called -XStaticPointers
to support Cloud Haskell in a pleasant, first class manner.
If you haven’t heard of static pointers before now, it’s worth glancing through the nice tutorial from ocharles’ 24 days of $(Haskell Related Thing).
The long and short of it is that -XStaticPointers
gives us this new keyword static
. We apply static to an expression and if there are no closured variables (to be formalized momentarily) then we get back a StaticPtr a
. This gives us a piece of data that we can serialize and ship over the wire because it has no dependencies.
Now to expand upon this “no closured variables”. A thing can only be fed to static
if the free variables in the expression are all top level variables. This forbids us from writing something like
Now in all honesty, I’m not super interested in Cloud Haskell. It’s not my area of expertise and I’m already terrified of trying to do things on one machine. What does interest me a lot though is this notion of having “I have no free variables” in the type of an expression. It’s an invariant we didn’t really have before in Haskell.
In fact, as I looked more closely it reminded me of something called box from modal logic.
I’m not trying to give you a full understanding of modal logic, just a brief taste.
Modal logic extends our vanilla logic (in Haskell land this is constructive logic) with modalities. Modalities are little prefixes we tack on the front of a proposition to qualify its meaning slightly.
For example we might say something like
If it is possible that it is raining, then I will need an umbrella.
Here we used the modality possible
to indicate we’re not assuming that it is raining, only that it’s conceivable that it is. Because I’m a witch and will melt in the rain, even the possibility of death raining from the sky will force me to pack my umbrella.
To formalize this a bit, we have our inductive definition of propositions
P = ⊥
  | ⊤
  | P ∧ P
  | P ∨ P
  | P ⇒ P
  | □ P
This is the syntax of a particular modal logic with one modality. Everything looks quite normal up until the last proposition form, which is the “box” modality applied to some proposition.
The box modality (the one we really care about for later) means “necessarily”. I almost think of it is a truthier truth if you can buy that. □ forbids us from using any hypotheses saying something like A is true
inside of it. Since it represents a higher standard of proof we can’t use the weaker notion that A is true
! The rule for creating a box looks like this to the first approximation
• ⊢ A
———————
Γ ⊢ □ A
So in order to prove a box something under a set of assumptions Γ, we have to prove it assuming none of those assumptions. In fact, we find that this is a slightly overly restrictive form for this judgment, we know that if we have a □ A
we proved it without assumptions so if we have to introduce a □ B
we should be able to use the assumption that A is true
for this proof because we know we can construct one without any assumptions and could just copy paste that in.
This causes us to create a second context, one of the hypotheses that A is valid
, usually notated with a Δ. We then get the rules
Δ; • ⊢ A Δ; Γ ⊢ A valid A valid ∈ Δ
——————————————— —————————————— ———————————————
Δ; Γ ⊢ □ A true Δ; Γ ⊢ A true Δ; Γ ⊢ A valid
Δ; Γ ⊢ □ A Δ, A valid; Γ ⊢ B
——————————————————————————————
Δ; Γ ⊢ B
What you should take away from these scary looking symbols is

- A valid is much stronger than A true
- □ A true is the same as A valid

This presentation glosses over a fair amount. If you’re so inclined I’d suggest looking at Frank Pfenning’s lecture notes from his class entitled “Modal Logic”. These actually go at a reasonable pace and introduce the groundwork for someone who isn’t super familiar with logic.
Now that we’ve established that there is an interesting theoretical backing for modal logic, I’m going to drop it on the floor and look at what Haskell actually gives us.
Okay, so how does this pertain to StaticPtr
? Well I noticed that just like how box drops hypotheses that are “merely true”, static
drops variables that are introduced by our local context!
This made me think that perhaps StaticPtr
s are a useful equivalent to the □ modality! This shouldn’t be terribly surprising for PL people; indeed, the course I linked to above expressly mentions □ to notate “movable code”. What’s really exciting about this is that there are a lot more applications of □ than just movable code! We can use it to notate staged computation for example.
Alas however, it was not to be. Static pointers are missing one essential component that makes them unsuitable for being □, we can’t eliminate them properly. In modal logic, we have a rule that lets other boxes depend on the contents of some box. The elimination rule is much stronger than just “If you give me a □ A
, I’ll give you an A
” because it’s much harder to construct a □ A
in the first place! It’s this asymmetry that makes static pointers not quite kosher. With static pointers there isn’t a notion that we can take one static pointer and use it in another.
For example, we can’t write
applyS :: StaticPtr a -> StaticPtr (a -> b) -> StaticPtr b
applyS sa sf = static (deRefStaticPtr sf (deRefStaticPtr sa))
My initial reaction was that -XStaticPointers
is missing something, perhaps a notion of a “static pattern”. This would let us say something like
applyS :: StaticPtr a -> StaticPtr (a -> b) -> StaticPtr b
applyS sa sf =
  let static f = sf
      static a = sa
  in static (f a)
So this static
pattern keyword would allow us to hoist a variable into the realm of things we’re allowed to leave free in a static pointer.
This makes sense from my point of view, but less so from that of Cloud Haskell. The whole point of static pointers is to show a computation is dependency free after all; static patterns introduce a (limited) set of dependencies on the thunk that make our lives complicated. It’s not obvious to me how to desugar things so that static patterns can be compiled how we want them to be, it looks like it would require some runtime code generation which is a no-no for Haskell.
My next thought was that maybe Closure
was the answer, but that doesn’t actually work either! We can introduce a closure from an arbitrary serializable term which is exactly what we don’t want from a model of □! Remember, we want to model closed terms so allowing us to accept an arbitrary term defeats the point.
It’s painfully clear that StaticPtr
s are very nearly □s, but not quite! Whatever Box
ends up being, we’d want the following interface
data Box a
intoBox :: StaticPtr a -> Box a
closeBox :: Box (a -> b) -> Box a -> Box b
outBox :: Box a -> a
The key difference from StaticPtr
’s being closeBox
. Basically this gives us a way to say “I have something that’s closed except for one dependency” and we can fill that dependency with some other closed term.
This turns something like
into
If you read the tutorial, you’ll notice that this is most of the implementation of Closure
Following our noses we define

data Box a where
  Pure  :: StaticPtr a -> Box a
  Close :: Box (a -> b) -> Box a -> Box b
This is literally the dumbest implementation of Box
I think is possible, but it actually works just fine.
intoBox = Pure
closeBox = Close
outBox :: Box a -> a
outBox (Pure a) = deRefStaticPtr a
outBox (Close f a) = outBox f (outBox a)
which would seem to be modal logic in Haskell.
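One way to build intuition for this interface, sketched in Python purely for illustration (this is an invented model, not how GHC implements StaticPtr): pretend a static pointer is a key into a table of top-level definitions, so a Box is a tree built only from closed pieces and is serializable by construction.

```python
# A "static pointer" is a key naming a top-level, closed value; a Box
# is either such a key or a closed application of two boxes. The table
# contents are made up for the example.
STATIC_TABLE = {
    'succ': lambda x: x + 1,
    'three': 3,
}

def into_box(key):                  # intoBox : StaticPtr a -> Box a
    assert key in STATIC_TABLE
    return ('pure', key)

def close_box(bf, ba):              # closeBox : Box (a -> b) -> Box a -> Box b
    return ('close', bf, ba)

def out_box(box):                   # outBox : Box a -> a
    if box[0] == 'pure':
        return STATIC_TABLE[box[1]]
    return out_box(box[1])(out_box(box[2]))
```

`close_box` pairs up closed terms without ever capturing a local variable, which is exactly the discipline the □ rules enforce.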
To be honest, I’m not sure yet how this is useful. I’m kinda swamped with coursework at the moment (new semester at CMU) but it seems like a new and fun thing to play with.
I’ve stuck the code at jozefg/modal if you want to play with it. Fair warning that it only compiles with GHC >= 7.10 because we need static pointers.
Finally, since the idea of modalities for sendable code is not a new one, I should leave these links
Cheers.
Continuing on my quest of writing about my poorly thought out comments, let’s talk about constructive logic. A lot of people in and around the Haskell/FP community will make statements like
The Curry-Howard isomorphism means that you’re proving things in constructive logic.
Usually absent from these remarks is a nice explanation of why constructive logic matches up with the programming we know and love.
In this post I’d like to highlight what constructive logic is intended to capture and why this corresponds so nicely with programming.
First things first, let’s discuss the actual origin of constructive logic. It starts with a mathematician and philosopher named Brouwer. He was concerned trying to give an answer to the question “What does it mean to know something to be true” where something is defined as a mathematical proposition.
He settled on the idea of proof being a sort of subjective and personal thing. I know something is true if and only if I can formulate some intuitive proof of it. When viewed this way, the proof I scribble down on paper doesn’t actually validate something’s truthfulness. It’s merely a serialization of my thought process for validating its truthfulness.
Notice that this line of reasoning doesn’t actually specify a precise definition of what verifying something intuitively means. I interpret this idea as something slightly more meta than any single formal system. Rather, when looking at a formal system, you ought to verify that its axioms are admissible by your own intuition and then you may go on to accept proofs built off of these axioms.
Now after Brouwer started talking about these ideas Arend Heyting decided to try to write down a logic that captured this notion of “proof is intuition”. The result was this thing called intuitionistic logic. This logic is part of a broader family of logics called “constructive logics”.
The core idea of constructive logic is replacing the notion of truth found in classical logic with an intuitionist version. In a classical logic each proposition is either true or false, regardless of what we know about it.
In our new constructive system, a formula cannot be assigned either truth value until we have direct evidence of it. It’s not that there’s a magical new boolean value, {true, false, i-don’t-know}, it’s just not a meaningful question to ask. It doesn’t make sense in these logics to say “A
is true” without having a proof of A
. There isn’t necessarily this Platonic notion of truthfulness, just things we as logicians can prove. This is sometimes why constructive logic is called “logic for humans”.
The consequences of dealing with things in this way boil down to a few things. For example, we now know that

- if ∃x. A(x) can be proven, then there is some term t which we can readily produce so that A(t) is provable
- if A ∨ B can be proven then either A or B is provable and we know which (note that ∨ is the symbol for OR)
can only be proven if we have a direct example of it. We can’t indirectly reason that it really ought to exist or merely claim that it must be true in one of a set of cases. We actually need to introduce it by proving an example of it. When our logic enforces this of course we can produce that example!
The same goes for A ∨ B: in our logic the only way to prove A ∨ B is to either provide a proof of A or provide a proof of B. If this is the only way to build a ∨, we can always just point to how it was introduced!
If we extend this to and, ∧: the only way to prove A ∧ B is to prove both A and B. If this is the only way to get to a proof of A ∧ B, then of course we can get a proof of A from A ∧ B. ∧ is just behaving like a pair of proofs.
All of this points at one thing: our logic is structured so that we can only prove something when we directly prove it, that’s the spirit of Brouwer’s intuitionism that we’re trying to capture.
There are a lot of different incarnations of constructive logic; in fact pretty much every logic has a constructive cousin. They all share this notion of “we need a direct proof to be true”, however. One thing to note is that some constructive logics conflict a bit with intuitionism. While intuitionism might have provided some of the basis for constructive logics, gradually people have poked and pushed the boundaries away from just Brouwer’s intuitionism. For example, both Markov’s principle and Church’s thesis state something about all computable functions. While they may be reasonable statements, we can’t give a satisfactory proof for them. This is a little confusing, I know, and I’m only going to talk about constructive logics that Brouwer would approve of.
I encourage the curious reader to poke further at this, it’s rather cool math.
Now while constructive logic probably sounds reasonable, if weird, it doesn’t immediately strike me as particularly useful! Indeed, the main reason why computer science cares about constructivism is because we all use it already.
To better understand this, let’s talk about the Curry-Howard isomorphism. It’s that thing that wasn’t really invented by either Curry or Howard and some claim isn’t best seen as an isomorphism; naming is hard. The Curry-Howard isomorphism states that there’s a mapping from a type to a logical proposition and from a program to a proof.
To show some of the mappings for types:

CH(Either a b) = CH(a) ∨ CH(b)
CH((a, b))     = CH(a) ∧ CH(b)
CH( () )       = ⊤ -- True
CH(Void)       = ⊥ -- False
CH(a -> b)     = CH(a) → CH(b)
So a program with the type (a, b)
is really a proof that a ∧ b
is true. Here the truthfulness of a proposition really means that the corresponding type can be occupied by a program.
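To make this concrete, here are a few tiny programs read as proofs (a sketch; the names are mine, not from any library):

```haskell
-- Tiny programs read as proofs under Curry-Howard.

-- ∧-elimination: a proof of A ∧ B yields a proof of A.
andElim1 :: (a, b) -> a
andElim1 = fst

-- ∨-introduction: a proof of A yields a proof of A ∨ B.
orIntro1 :: a -> Either a b
orIntro1 = Left

-- Implication is the function space; modus ponens is application.
modusPonens :: (a -> b) -> a -> b
modusPonens f x = f x
```

Each definition type-checks, which is exactly what it means for the corresponding proposition to be provable.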
Now, onto why this logic we get is constructive. Recall our two conditions for a logic being constructive, first is that if ∃x. A(x)
is provable then there’s a specific t
where A(t)
is provable.
Under the Curry-Howard isomorphism, ∃ is mapped to existential types (I wonder how that got its name :). That means that a proof of ∃x. A(x)
is something like
-- Haskell existential syntax is a bit gnarly :/
data Exists f = forall x. Exists (f x)

ourProof :: Exists F
ourProof = ...
Now we know the only way to construct an Exists F
is to use the constructor Exists
. This constructor means that there is at least one specific type for which we could prove f x
. We can also easily produce this term as well!
We can always access the specific “witness” we used to construct this Exists
type with pattern matching.
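As a small self-contained sketch (the wrapper and names are mine), pattern matching lets us compute with the hidden witness even though we can never name its type:

```haskell
{-# LANGUAGE ExistentialQuantification #-}

-- A proof of "there exists an x such that we have a list of x" --
-- the "predicate" here is just the list type constructor.
data Exists f = forall x. Exists (f x)

proof :: Exists []
proof = Exists [True, False]

-- The witness type is hidden, but polymorphic functions like
-- length still apply to the wrapped value.
witnessLength :: Exists [] -> Int
witnessLength (Exists xs) = length xs
```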
The next law is similar. If we have a proof of a ∨ b
we’re supposed to immediately be able to produce a proof of a
or a proof of b
.
In programming terms, if we have a program Either a b
we’re supposed to be able to immediately tell whether this returns Right
or Left
! We can’t just argue that one of these must be constructible without being sure which; we have to be able to actually run this program! If we evaluate a program with the type Either a b
we’re guaranteed to get either Left a
or Right b
.
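In code, the disjunction property is just the fact that pattern matching on an evaluated Either always reveals which constructor we got:

```haskell
-- Evaluating an Either always tells us which side was proven.
whichSide :: Either a b -> String
whichSide (Left _)  = "Left"
whichSide (Right _) = "Right"
```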
There are a few explanations of constructive logic that basically describe it as “classical logic minus the law of excluded middle”. More verbosely, a constructive logic is just one that forbids

- ∀ A. A ∨ ¬ A being provable (the law of excluded middle, LEM)
- ∀ A. ¬ (¬ A) → A being provable (the law of double negation)

I carefully chose the words “being provable” because we can easily introduce these as hypotheses to a proof and still have a sound system. Indeed this is not uncommon when working in Coq or Agda. They’re just not a readily available tool. Looking at them, this should be apparent as they both let us prove something without directly proving it.
This isn’t really a defining aspect of constructivism, just a natural consequence. If we need a proof of A to show A to be true, admitting A ∨ ¬ A by default defeats the point. We could introduce A merely by showing ¬ (¬ A), which isn’t a proof of A! Just a proof that it really ought to be true.
In programming terms this is saying we can’t write these two functions.
data Void
doubleNeg :: ((a -> Void) -> Void) -> a
doubleNeg = ...

lem :: Either a (a -> Void)
lem = ...
For the first one we have two choices: either we use the (a -> Void) -> Void term we’re given, or we construct an a without it. Constructing an arbitrary a without the function is just equivalent to forall a. a, which we know to be unoccupied. That means we have to use (a -> Void) -> Void, which means we have to build an a -> Void. We have no way of doing anything interesting with that supplied a, however, so we’re completely stuck! The story is similar with lem.
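For contrast, the direction of double negation that is constructively fine is easy to write (a sketch; Void here is the standard empty type from Data.Void):

```haskell
import Data.Void (Void)

-- The constructive direction: from a proof of a we can refute any
-- refutation of a. Unlike doubleNeg, this one we *can* write.
doubleNegIntro :: a -> ((a -> Void) -> Void)
doubleNegIntro x notA = notA x
```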
In a lot of ways this definition strikes me in the same way that describing functional programming as
Oh it’s just programming where you don’t have variables or objects.
Or static typing as
It’s just dynamically typed programming where you can’t write certain correct programs
I have a strong urge to say “Well.. yes but no!”.
Hopefully this helps clarify what exactly people mean when they say Haskell corresponds to a constructive logic or programs are proofs. Indeed this constructivism gives rise to a really cool thing called “proof relevant mathematics”. This is mathematics done purely with constructive proofs. One of the latest ideas to trickle from mathematics to computers is homotopy type theory where we take a proof relevant look at identity types.
Before I wrap up I wanted to share one funny little thought I heard. Constructive mathematics has found a home in automated proof systems. Imagine Brouwer’s horror at hearing we do “intuitionist” proofs that no one will ever look at or try to understand beyond some random mechanical proof assistant!
Thanks to Jon Sterling and Darryl McAdams for the advice and insight
I was having lunch with a couple of Haskell programmers the other day and the subject of the ML family came up. I’ve been writing a lot of ML lately and mentioned that I thought *ML was well worth learning for the average Haskeller. When pressed why, the best answer I could come up with was “Well.. clean language, Oh! And an awesome module system”, which wasn’t exactly my most compelling response.
I’d like to outline a bit of SML module system here to help substantiate why looking at an ML is A Good Thing. All the code here should be translatable to OCaml if that’s more your taste.
In ML languages modules are a well thought out portion of the language. They aren’t just “Oh we need to separate these names… modules should work”. Like any good language they have methods for abstraction and composition. Additionally, like any good part of an ML language, modules have an expressive type language for mediating how composition and abstraction works.
So to explain how this module system functions as a whole, we’ll cover 3 subjects

- structures
- signatures
- functors

giving a cursory overview of what each thing is and how it might be used.
Structures are the values in the module language. They are how we actually create a module. The syntax for them is
struct
fun flip f x y = f y x
datatype 'a list = Cons of ('a * 'a list) | Nil
...
end
A quick note to Haskellers, in ML types are lower case and type variables are written with ’s. Type constructors are applied “backwards” so List a
is 'a list
.
So they’re just a bunch of a declarations stuffed in between a struct
and end
. This is a bit useless if we can’t bind it to a name. For that there’s
structure M = struct val x = 1 end
And now we have a new module M
with a single member, x : int
. This is just like binding a variable in the term language except a “level up” if you like. We can use this just like you would use modules in any other language.
val x' = M.x + 1
Since struct ... end
can contain any list of declarations we can nest module bindings.
structure M' =
struct
structure NestedM = M
end
And access this using the . syntax.
val sum = M'.NestedM.x + M.x
As you can imagine, it would get a bit tedious if we needed to .
our way to every single module access. For that we have open
which just dumps a module’s exposed contents into our namespace. What’s particularly interesting about open
is that it is a “normal” declaration and can be nested with let
.
fun f y =
let open M in
x + y
end
OCaml has gone a step further and added special syntax for small opens: M.(e) opens M locally within the expression e. With this “local open” our code would turn into

let f y = M.(x + y)
This already gives us a lot more power than your average module system. Structures basically encapsulate what we’d expect in a module system.
Up next is a look at what sort of type system we can impose on our language of structures.
Now for the same reason we love types in the term language (safety, readability, insertsemireligiousrant) we’d like them in the module language. Happily ML comes equipped with a feature called signatures. Signature values look a lot like structures
sig
val x : int
datatype 'a list = Cons of ('a * 'a list) | Nil
end
So a signature is a list of declarations without any implementations. We can list algebraic data types, other modules, and even functions and values but we won’t provide any actual code to run them. I like to think of signatures as what most documentation rendering tools show for a module.
As we had with structures, signatures can be given names.
signature MSIG = sig val x : int end
On their own signatures are quite useless, the whole point is that we can apply them to modules after all! To do this we use :
just like in the term language.
structure M : MSIG = struct val x = 1 end
When compiled, this will check that M
has at least the field x : int
inside its structure. We can apply signatures retroactively to both module variables and structure values themselves.
structure M : MSIG = struct val x = 1 end : MSIG
One interesting feature of signatures is the ability to leave certain types abstract. For example, when implementing a map the actual implementation of the core data type doesn’t belong in the signature.
signature MAP =
sig
type key
type 'a table
val empty : 'a table
val insert : key -> 'a -> 'a table -> 'a table
val lookup : key -> 'a table -> 'a option
end
Notice that the type of keys and tables are left abstract. When someone applies a signature they can do so in two ways, weak or strong ascription. Weak ascription (:
) means that the constructors of abstract types are still accessible, but the signature does hide all unrelated declarations in the module. Strong ascription (:>
) makes the abstract types actually abstract.
Every once in a while we need to modify a signature. We can do this with the keywords where type
. For example, we might implement a specialization of MAP
for integer keys and want our signature to express this
structure IntMap :> MAP where type key = int =
struct ... end
This incantation leaves the type of the table abstract but specializes the keys to an int.
Last but not least, let’s talk about abstraction in module land.
Last but not least let’s talk about the “killer feature” of ML module systems: functors. Functors are the “lifting” of functions into the module language. A functor is a function that maps modules implementing a certain signature to modules implementing a different signature.
Jumping back to our earlier example of maps, the equivalent in Haskell land is Data.Map
. The big difference is that Haskell gives us maps for all keys that implement Ord
. Our signature doesn’t give us a clear way to associate all these different modules, one for each Ord
erable key, that are really the same thing. We can represent this relationship in SML with
signature ORD =
sig
type t
val compare : t * t > order
end
functor RBTree (O : ORD) : MAP where type key = O.t =
struct
open O
....
end
Which reads as “for any module implementing ORD, I can give you a module implementing MAP with keys of type O.t”. We can then instantiate these
structure IntOrd =
struct
type t = int
val compare = Int.compare
end
structure IntMap = RBTree(IntOrd)
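For comparison, here is a rough Haskell rendering of the same idea with modules as records of operations (all names are mine, and the “table” is just an association list for illustration):

```haskell
-- An ORD "signature" as a record of operations.
data OrdSig t = OrdSig { cmp :: t -> t -> Ordering }

-- A cut-down MAP "signature" keyed by t.
data MapSig t a = MapSig
  { insertK :: t -> a -> [(t, a)] -> [(t, a)]
  , lookupK :: t -> [(t, a)] -> Maybe a }

-- The "functor": a plain function from an ORD module to a MAP module.
mkMap :: OrdSig t -> MapSig t a
mkMap o = MapSig
  { insertK = \k v kvs -> (k, v) : kvs
  , lookupK = \k kvs ->
      case [v | (k', v) <- kvs, cmp o k k' == EQ] of
        v : _ -> Just v
        []    -> Nothing }

-- "structure IntMap = RBTree(IntOrd)" becomes function application.
intMap :: MapSig Int String
intMap = mkMap (OrdSig compare)
```

The contrast with Data.Map is that here the comparison is passed explicitly as a value, just as a functor argument is, rather than resolved implicitly through an Ord constraint.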
Sadly SML’s module language isn’t higher order. This means we can’t assign functors a type (there isn’t an equivalent of ->
) and we can’t pass functors to functors. Even with this restriction functors are tremendously useful.
One interesting difference between SML and OCaml is how functors handle abstract types. Specifically, is it the case that
F(M).t = F(M).t
In SML the answer is (surprisingly) no! Applying a functor generates brand new abstract types. This is actually beneficial when you remember SML and OCaml aren’t pure. For example you might write a functor for handling symbol tables and internally use a mutable symbol table. One nifty trick would be to keep the type of symbols abstract. If you only give back a symbol upon registering something in the table, this would mean that all symbols a user can supply are guaranteed to correspond to some entry.
This falls apart however if functors are extensional. Consider the following REPL session
> structure S1 = SymbolTable(WhateverParameters)
> structure S2 = SymbolTable(WhateverParameters)
> val key = S1.register "I'm an entry"
> S2.lookup key
Error: no such key!
This will not work if S1
and S2
have separate key types.
To my knowledge, the general conclusion is that generative functors (ala SML) are good for impure code, but applicative functors (ala OCaml and BackPack) really shine with pure code.
We’ve covered a lot of ground in this post. This wasn’t an exhaustive tour of every feature of ML module systems, but hopefully I got the gist across.
If there’s one point to take home: in a lot of languages modules are clearly a bolted-on construction. They’re something added on later to fix “that library problem” and generally consist of the same “module <-> file” correspondence and “a module imports others to bring them into scope”. In ML that’s simply not the case. The module language is a rich, well thought out thing with its own methods of abstraction, composition, and even a notion of types!
I wholeheartedly recommend messing around a bit with OCaml or SML to see how having these things impacts your thought process. I think you’ll be pleasantly surprised.
In keeping with the rest of the “Examining Hackage” series I’d like to go through the source of the folds
package today. We’ll try to go through most of the code in an attempt to understand what exactly folds
does and how it does it. To be honest, I hadn’t actually heard of this one until someone mentioned it to me on /r/haskell but it looks pretty cool. It also has the word “comonadic” in the description, how can I resist?
It’s similar to Gabriel’s foldl
library, but it also seems to provide a wider suite of types folds. In retrospect, folds has a general framework for talking about types of folds and composing them where as foldl
defines only 2 types of folds, but defines a whole heap of prebuilt (left) folds.
After grabbing the source and looking at the files we see that folds
is actually reasonably large
~$ cabal get folds && cd folds-0.6.2 && ag -g "hs$"
src/Data/Fold.hs
src/Data/Fold/L.hs
src/Data/Fold/L'.hs
src/Data/Fold/Class.hs
src/Data/Fold/M1.hs
src/Data/Fold/L1.hs
src/Data/Fold/R.hs
src/Data/Fold/Internal.hs
src/Data/Fold/L1'.hs
src/Data/Fold/R1.hs
src/Data/Fold/M.hs
Setup.lhs
tests/hlint.hs
One that jumps out at me is Internal
since it likely doesn’t depend on anything. We’ll start there.
Looking at the top gives a hint for what we’re in for
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE UndecidableInstances #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE DeriveDataTypeable #-}
module Data.Fold.Internal
( SnocList(..)
, SnocList1(..)
, List1(..)
, Maybe'(..), maybe'
, Pair'(..)
, N(..)
, Tree(..)
, Tree1(..)
, An(..)
, Box(..)
) where
This module seems to be mostly a bunch of (presumably useful) data types + their instances for Foldable
, Functor
, and Traversable
. Since all 3 of these are simple enough you can actually just derive them I’ll elide them in most cases.
First up is SnocList
, if the name didn’t give it away it is a backwards list (snoc is cons backwards)
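As a sketch, the type presumably looks something like this (my reconstruction; the real folds definition may differ), with a foldMap that visits elements left to right:

```haskell
-- A list that grows at the right end: "snoc" is "cons" reversed.
data SnocList a = Snoc (SnocList a) a | Nil

instance Foldable SnocList where
  -- The deepest element of the chain of Snocs comes first.
  foldMap _ Nil         = mempty
  foldMap f (Snoc xs a) = foldMap f xs <> f a
```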
Then we have the boilerplatey instances for Functor
and Foldable
. What’s a bit odd is that both foldl
and foldMap
are implemented where we only need foldMap
. Presumably this is because just foldMap
gives worse performance but that’s a little disappointing.
Next is SnocList1
and List1
which are quite similar.
data SnocList1 a = Snoc1 (SnocList1 a) a | First a
deriving (Eq,Ord,Show,Read,Typeable,Data)

data List1 a = Cons1 a (List1 a) | Last a
If you’ve never seen this before, notice how instead of Nil
we have a constructor which requires an element. This means that no matter how we construct a list we need to supply at least one element. Among other things this means that head
would be safe.
We also have a couple strict structures. Notice that these cannot be functors since they break fmap f . fmap g = fmap (f . g)
(why?). We have
And we have the obvious instance for Foldable Maybe'
and Monoid (a, b)
. Now it may seem a little silly to define these types, but from experience I can say anything that makes strictness a bit more explicit is wonderfully helpful. Now we can just use seq
on a Pair'
and know that both components will be forced.
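A sketch of these strict types, and of why the strict fields break the functor composition law (my reconstruction; folds’ actual definitions may differ):

```haskell
-- Strict analogues of Maybe and (,): the ! annotations force the
-- fields whenever a constructor is built.
data Maybe' a  = Nothing' | Just' !a
data Pair' a b = Pair' !a !b

-- A strict fmap breaks fmap f . fmap g = fmap (f . g): with
-- g = undefined and f = const (), the composed-fmap side diverges
-- (the strict field forces g's result) while fmap (f . g) happily
-- returns Just' ().
fmapM' :: (a -> b) -> Maybe' a -> Maybe' b
fmapM' _ Nothing'  = Nothing'
fmapM' f (Just' a) = Just' (f a)
```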
Next we define a type for trees. One thing I noticed was the docs mentioned that this type reflects the structure of a foldMap
When we foldMap
each One
should be an element of the original collection. From there we can fmap
with the map
part of foldMap
, and we can imagine traversing the tree and replacing Two l r
with l <> r
, each Zero
with mempty
, and each One a
with a.
So that’s rather nifty. On top of this we have Foldable
, Traversable
, and Functor
instances.
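A sketch of the Tree type and the foldMap-shaped interpretation described above (reconstructed from the Zero/One/Two constructors mentioned in the text):

```haskell
-- Zero is mempty, One holds an element, Two is the monoidal combine.
data Tree a = Zero | One a | Two (Tree a) (Tree a)

-- Interpreting a Tree into any monoid: Two becomes <>, Zero becomes
-- mempty, and One a becomes f a -- exactly the shape of a foldMap.
interp :: Monoid m => (a -> m) -> Tree a -> m
interp _ Zero      = mempty
interp f (One a)   = f a
interp f (Two l r) = interp f l <> interp f r
```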
We also have Tree1
which is similar but elides the Zero
As you’d expect, this implements the same type classes as Tree
.
Now is where things get a bit weird. First up is a type for reifying monoids using reflection
. I actually was thinking about doing a post on it and then I discovered Austin Seipp has done an outstanding one. So we have this N
type with the definition
Now with reflection there are two key components, there’s the type class instance floating around and a fresh type s
that keys it. If we have s
then we can easily demand a specific instance with reflect (Proxy :: Proxy s)
. That’s exactly what we do here. We can create a monoid instance using this trick with
instance Reifies s (a -> a -> a, a) => Monoid (N a s) where
  mempty = N $ snd $ reflect (Proxy :: Proxy s)
  mappend (N a) (N b) = N $ fst (reflect (Proxy :: Proxy s)) a b
So at each point we use our s
to grab the tuple of monoid operations we expect to be around and use them in the obvious manner. The only reason I could imagine doing this is if we had a structure which we want to use as a monoid in a number of different ways. I suppose we also could have just passed the dictionary around but maybe this was extremely ugly. We shall see later I suppose.
Last comes two data types I do not understand at all. There’s An
and Box
. They look extremely boring.
Their instances are the same everywhere as well.. I have no clue what these are for. Grepping shows they are used though so hopefully this mystery will become clearer as we go.
Going in order of the module DAG gives us Data.Fold.Class.hs
. This exports two type classes and one function
One thing that worries me a little is that this imports Control.Lens
which I don’t understand nearly as well as I’d like to.. We’ll see how this turns out.
Our first class is
class Choice p => Scan p where
  prefix1 :: a -> p a b -> p a b
  postfix1 :: p a b -> a -> p a b
  run1 :: a -> p a b -> b
  interspersing :: a -> p a b -> p a b
So right away we notice this is a subclass of Choice
which is in turn a subclass of Profunctor
. Choice
captures the ability to pull an Either
through our profunctor.
Note that we can’t do this with ordinary profunctors since we’d need a function from Either a c -> a, which isn’t total.
Back to Scan p
. Scan p
takes a profunctor which apparently represents our folds. We then can prefix the input we supply, postfix the input we supply, and run our fold on a single element of input. This is a bit weird to me, I’m not sure if the intention is to write something like
foldList :: Scan p => [a] -> p a b -> b
foldList [x] = run1 x
foldList (x : xs) = foldList xs . prefix1 x
or something else entirely. Additionally this doesn’t really conform to my intuition of what a scan is. I’d expect a scan to produce all of the intermediate output involved in folding. At this point, with no instances in scope, it’s a little tricky to see what’s supposed to be happening here.
There are a bunch of defaultsignature based implementations of these methods if your type implements Foldable
. Since this is the next type class in the module let’s look at that and then skip back to the defaults.
class Scan p => Folding p where
  prefix :: Foldable t => t a -> p a b -> p a b
  prefixOf :: Fold s a -> s -> p a b -> p a b
  postfix :: Foldable t => p a b -> t a -> p a b
  postfixOf :: Fold s a -> p a b -> s -> p a b
  run :: Foldable t => t a -> p a b -> b
  runOf :: Fold s a -> s -> p a b -> b
  filtering :: (a -> Bool) -> p a b -> p a b
At this point I looked at a few of the types and my first thought was “Oh dammit lens..” but it’s actually not so bad! The first thing to do is ignore the *Of
functions which work across lens’s Fold
type. There seems to be a nice pair for each “running” function where it can work across a Foldable
container or lens’s notion of a fold.
prefix :: Foldable t => t a -> p a b -> p a b
postfix :: Foldable t => p a b -> t a -> p a b
run :: Foldable t => t a -> p a b -> b
The first two functions let us create a new fold that will accept some input and supplement it with a bunch of other inputs. prefix
gives the supplemental input followed by the new input and postfix
does the reverse. We can actually supply input and run the whole thing with run
.
All of these are defined with folded
from lens which reifies a foldable container into a Fold
. So foo = fooOf folded
is the default implementation for all of these. Now for the corresponding fold functions I’m reading them as “If you give me a lens to treat s
as a container that I can get elements from and a fold, I’ll feed the elements of s
into the fold.”
The types are tricky, but this type class seems to capture what it means to run a fold across some type of structure.
This is also where An
comes in handy. It’s used as a single object Foldable
container. Since it’s newtyped, this should basically run the same as just passing a single element in.
So a Scan
here apparently means a fold over a single element at a time. Still not sure why this is deserving of the name Scan
but there you are.
Last but not least we have a notion of dragging a fold through an optic with beneath
.
beneath :: Profunctor p => Optic p Identity s t a b -> p a b -> p s t
beneath l f = runIdentity #. l (Identity #. f)
Those #.
’s are like lmap
s but only work when the function we apply is a “runtime identity”. Basically this means we should be able to tell whether or not we applied the function or just used unsafeCoerce
when running the code. Otherwise all we do is set up our fold f
to work across Identity
and feed it into the optic.
Now a lot of the rest of the code is implementing those two type classes we went over. To figure out where all these implementations are I just ran
~$ cabal repl
> :info Scan
....
instance Scan R1 -- Defined at src/Data/Fold/R1.hs:25:10
instance Scan R -- Defined at src/Data/Fold/R.hs:27:10
instance Scan M1 -- Defined at src/Data/Fold/M1.hs:25:10
instance Scan M -- Defined at src/Data/Fold/M.hs:33:10
instance Scan L1' -- Defined at src/Data/Fold/L1'.hs:24:10
instance Scan L1 -- Defined at src/Data/Fold/L1.hs:25:10
instance Scan L' -- Defined at src/Data/Fold/L'.hs:33:10
instance Scan L -- Defined at src/Data/Fold/L.hs:33:10
Looking at the names, I really don’t want to go through each of these with this much detail. Instead I’ll skip all the *1
’s and go over R
, L'
, and M
to get a nice sampling of the sort of folds we get.
Up first is R.hs
. This defines the first type for a fold we’ve seen.
Reading this as “a right fold from a
to b
” we notice a few parts here. It looks like that existential r
encodes our fold’s inner state and r -> b
maps the current state into the result of the fold. That leaves a -> r -> r
as the stepping function. All in all this doesn’t look too different from
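For reference, here is a sketch of R reconstructed from how the instances below use it (the real definition in folds may differ cosmetically):

```haskell
{-# LANGUAGE ExistentialQuantification #-}

-- A presentation function, a stepping function, and a start state,
-- with the state type r hidden by the existential.
data R a b = forall r. R (r -> b) (a -> r -> r) r

-- Compare with foldr :: (a -> r -> r) -> r -> [a] -> r. Running the
-- machine over a list is a foldr followed by presentation.
runList :: [a] -> R a b -> b
runList xs (R k h z) = k (foldr h z xs)

sumR :: R Int Int
sumR = R id (+) 0
```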
The rest of this module is devoted to making a lot of instances for R
. Some of these are really uninteresting like Bind
, but quite a few are enlightening. To start with, Profunctor
.
instance Profunctor R where
dimap f g (R k h z) = R (g . k) (h . f) z
rmap g (R k h z) = R (g . k) h z
lmap f (R k h z) = R k (h . f) z
This should more or less be what you expect since it’s really the only way to get the types to fit together. We fit the map from b -> d
onto the presentation piece of the fold and stick the map from a -> c
onto the stepper so it can take the new pieces of input.
Next we have the instance for Choice
.
instance Choice R where
left' (R k h z) = R (_Left %~ k) step (Left z) where
step (Left x) (Left y) = Left (h x y)
step (Right c) _ = Right c
step _ (Right c) = Right c
right' (R k h z) = R (_Right %~ k) step (Right z) where
step (Right x) (Right y) = Right (h x y)
step (Left c) _ = Left c
step _ (Left c) = Left c
This was slightly harder for me to read, but it helps to remember that here _Left %~
and _Right %~
are just mapping over the left and right sides of an Either
. That clears up the presentation bit. For the initial state, when we’re pulling our computation through the left side we wrap it in a Left
, when we’re pulling it through the right, we wrap it in Right
.
The interesting bit is the new step
function. It short circuits if either our state or our new value is the wrong side of an Either
otherwise it just applies our stepping function and wraps it back up as an Either
.
In addition to being a profunctor, R
is also a monad and comonad as well as a whole bunch of more finely grained classes built around those two. I’ll just show the Monad
, Applicative, and Comonad instances here.
instance Applicative (R a) where
  pure b = R (\() -> b) (\_ () -> ()) ()
  R xf bxx xz <*> R ya byy yz = R
    (\(Pair' x y) -> xf x $ ya y)
    (\b ~(Pair' x y) -> Pair' (bxx b x) (byy b y))
    (Pair' xz yz)

instance Comonad (R a) where
  extract (R k _ z) = k z
  duplicate (R k h z) = R (R k h) h z

instance Monad (R a) where
  return b = R (\() -> b) (\_ () -> ()) ()
  m >>= f = R (\xs a -> run xs (f a)) (:) [] <*> m
Looking at the Comonad
instance, nesting a fold within a fold doesn’t change the accumulator, only the presentation. A nested fold is one that runs and returns a new fold which is identical except that the starting state is the result of the old fold.
The <*>
operator here is kind of nifty. First off it zips both folds together using the strict Pair'
. Finally when we get to the presentation stage we map the final state for the left which gives us a function, and the final state for the right maps to its argument. Applying these two gives us our final result.
Notice that there’s some craziness happening with irrefutable patterns. When we call this function we won’t attempt to force the second argument until bxx
forces x
or byy
forces y
. This is important because it makes sure that <*>
preserves short circuiting.
The monad instance has a suitably boring return
and >>=
is a bit odd. We have one machine which accumulates all the elements it’s given in a list, this is an “identity fold” of sorts. From there our presentation function returns a lambda which expects an a
and runs f a
with all the input we’ve saved. We combine this with m
by running it in parallel with <*>
and feeding the result of m
back into the lambda generated by the right.
Now we’re finally in a position to define our Scan
and Folding
instances. Since the Scan
instance can be determined from the Folding
one I’ll show Folding
.
instance Folding R where
  run t (R k h z) = k (foldr h z t)
  prefix s = extend (run s)
  postfix t s = run s (duplicate t)
  runOf l s (R k h z) = k (foldrOf l h z s)
  prefixOf l s = extend (runOf l s)
  postfixOf l t s = runOf l s (duplicate t)
  filtering p (R k h z) = R k (\a r -> if p a then h a r else r) z
It took some time, but I understand how this works! The first thing to notice is that actually running a fold just relies on the foldr
we have from Foldable
. Postfixing a fold is particularly slick with right folds. Remember that z
represents the accumulated state for the remainder of the items in our sequence.
Therefore, to postfix a number of elements all we need do is run the fold on the container we’re given and store the results as the new initial state. This is precisely what happens with run s (duplicate t)
.
Now prefix
is the inefficient one here. To prefix an element we want to change how presentation works. Instead of just using the default presentation function, we actually want to take the final state we get, run the fold again using this prefixing sequence, and then present the result. For this we have another helpful comonadic function, extend
. This leaks because it holds on to the sequence a lot longer than it needs to.
The rest of these functions are basically the same thing except maybe postfixing (ha) a function with Of
here and there.
Next up is (strict) left folds. As with right folds this module is just a data type and a bunch of instances for it.
One thing that surprised me here was that our state r
isn’t stored strictly! That’s a bit odd but presumably there’s a good reason for this. Now all the instances for L'
are the same as those for R
up to isomorphism because the types are well.. isomorphic.
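Here is a sketch of L' reconstructed from the Folding instance below (again, folds’ real definition may differ); note the stepping function’s argument order is flipped relative to R:

```haskell
{-# LANGUAGE ExistentialQuantification #-}
import Data.List (foldl')

-- Same shape as R, but the stepping function takes the state first,
-- matching foldl'.
data L' a b = forall r. L' (r -> b) (r -> a -> r) r

runListL' :: [a] -> L' a b -> b
runListL' xs (L' k h z) = k $! foldl' h z xs

sumL' :: L' Int Int
sumL' = L' id (+) 0
```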
The real difference comes in the instances for Scan
and Folding
. Remember how Folding R
used foldr
, well here we just use foldl'
. This has the upshot that all the strictness and whatnot is handled entirely by the foldable instance!
instance Folding L' where
  run t (L' k h z) = k $! foldl' h z t
  prefix s = run s . duplicate
  postfix t s = extend (run s) t
  runOf l s (L' k h z) = k $! foldlOf' l h z s
  prefixOf l s = runOf l s . duplicate
  postfixOf l t s = extend (runOf l s) t
  filtering p (L' k h z) = L' k (\r a -> if p a then h r a else r) z
So everywhere we had foldr
we have foldl'
. The other interesting switch is that our definitions of prefix
and postfix
are almost perfectly swapped! This actually makes perfect sense when you think about it. In a left fold the state is propagating from the beginning to the end versus a right fold where it propagates from the end to the beginning! So to prefix something when folding to the left we add it to the initial state and when postfixing we use the presentation function to take our final state and continue to fold with it.
If you check above, you’ll find this to be precisely the opposite of what we had for right folds and since they both have the same comonad instance, we can swap the two implementations.
In fact, having read the implementation for right folds I’m noticing that almost everything in this file is so close to what we had before. It really seems like there is a clever abstraction just waiting to break out.
Now that we’ve seen how left and right folds are more or less the same, let’s try something completely different! M.hs
captures the notion of a foldMap
and looks pretty different than what we’ve seen before.
First things first, here’s the type in question.
We still have a presentation function m -> b
, and we still have an internal state m
. However, we also have a conversion function to map our inputted values onto the values we know how to fold together and we have a tensor operation m -> m -> m
.
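The definition itself was elided above; as best I recall from the folds package it looks like this, and I've added a small runner of my own (not part of the package's API in this form) just to show the pieces in action:

```haskell
{-# LANGUAGE ExistentialQuantification #-}

-- presentation (m -> b), conversion (a -> m), tensor (m -> m -> m),
-- and a starting value m, with the middle type m existentially hidden
data M a b = forall m. M (m -> b) (a -> m) (m -> m -> m) m

-- a hypothetical runner: push a whole list through an M
runM :: M a b -> [a] -> b
runM (M k h m z) xs = k (foldr (m . h) z xs)

-- summing as a monoidal reduction: trivial presentation and conversion
sumM :: M Int Int
sumM = M id id (+) 0
```

Because the tensor is expected to be associative, `runM` is free to pick whatever association it likes.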
Now as before we have a profunctor instance
instance Profunctor M where
dimap f g (M k h m e) = M (g.k) (h.f) m e
rmap g (M k h m e) = M (g.k) h m e
lmap f (M k h m e) = M k (h.f) m e
Which might start to look familiar from what we’ve seen so far. Next we have a Choice
instance which is still a little intimidating.
instance Choice M where
left' (M k h m z) = M (_Left %~ k) (_Left %~ h) step (Left z) where
step (Left x) (Left y) = Left (m x y)
step (Right c) _ = Right c
step _ (Right c) = Right c
right' (M k h m z) = M (_Right %~ k) (_Right %~ h) step (Right z) where
step (Right x) (Right y) = Right (m x y)
step (Left c) _ = Left c
step _ (Left c) = Left c
As before we use prisms and %~
to drag our presentation and conversion functions into Either
, similarly our starting state is wrapped in the appropriate constructor and we define a new stepping function with similar characteristics to what we’ve seen before.
As before, we’ve got a wonderful world of monads and comonads to dive into now. We’ll start with monads here to mix it up.
instance Applicative (M a) where
pure b = M (\() -> b) (\_ -> ()) (\() () -> ()) ()
M xf bx xx xz <*> M ya by yy yz = M
  (\(Pair' x y) -> xf x $ ya y)
  (\b -> Pair' (bx b) (by b))
  (\(Pair' x1 y1) (Pair' x2 y2) -> Pair' (xx x1 x2) (yy y1 y2))
  (Pair' xz yz)
instance Monad (M a) where
return = pure
m >>= f = M (\xs a -> run xs (f a)) One Two Zero <*> m
Our return
/pure
just instantiates a trivial fold that consumes ()
s and outputs the value we gave it. For <*>
we run both machines strictly next to each other and apply the final result of one to the final result of the other.
Bind creates a new fold that builds a tree. This tree contains every input fed to it as it’s folding and stores each merge as a node in the tree. While we run this, we also run the original m
we were given. Finally, when we reach the end, we apply f
to the result of m
and run this over the tree we’ve created which is foldable. If you remember back to the comment of Tree a
capturing foldMap
this is what was meant by it: we’re using a tree to suspend a foldMap
until we’re in a position to run it.
Now for comonad.
instance Comonad (M a) where
extract (M k _ _ z) = k z
duplicate (M k h m z) = M (\n -> M (k . m n) h m z) h m z
We can be pleasantly surprised that most of this code is the same. Extraction grabs our current state and presents it. Duplication creates a fold which will run and return a new fold. This new fold has the same initial state as the original fold, but when it goes to present its results it will merge it with the final state of the outer fold. This is very different from before and I suspect it will significantly impact our Folding
instance.
instance Folding M where
run s (M k h m (z :: m)) = reify (m, z) $
  \ (_ :: Proxy s) -> k $ runN (foldMap (N #. h) s :: N m s)
prefix s (M k h m (z :: m)) = reify (m, z) $
  \ (_ :: Proxy s) -> case runN (foldMap (N #. h) s :: N m s) of
    x -> M (\y -> k (m x y)) h m z
postfix (M k h m (z :: m)) s = reify (m, z) $
  \ (_ :: Proxy s) -> case runN (foldMap (N #. h) s :: N m s) of
    y -> M (\x -> k (m x y)) h m z
filtering p (M k h m z) = M k (\a -> if p a then h a else z) m z
This was a little intimidating so I took the liberty of ignoring *Of
functions which are pretty much the same as what we have here.
To run a fold we use foldMap
, but foldMap
wants to work over monoids and we only have z
and m
. To promote this to a type class we use reify
and N
. Remember N
from way back when? It’s the data type that uses reflection to yank a tuple out of our context and treat it as a monoid instance. In all of this code we use reify
to introduce a tuple to our environment and N
as a pseudomonoid that uses m
and z
.
With this in mind, this code uses N #. h
which uses the normal conversion function to introduce something into the N
monoid. Then foldMap
takes care of the rest and all we need do is call runN
to extract the results.
prefix
and postfix
are actually markedly similar. They both start by running the fold over the supplied structure which reduces it to an m
. From there, we create a new fold which is identical in all respects except the presentation function. The new presentation function uses m
to combine the pre/postfixed result with the new result. If we’re postfixing, the postfixed result is on the right, if we’re prefixing, the left.
What’s particularly stunning is that neither of these leak! We don’t need to hold onto the structure in our new fold so we can prefix and postfix in constant memory.
Now that we’ve gone through a bunch of instances of Folding
and Scanning
, we’re in a position to actually look at what Data.Fold
exports.
module Data.Fold
( Scan(..)
, Folding(..)
, beneath
, L1(..)   -- lazy Mealy machine
, L1'(..)  -- strict Mealy machine
, M1(..)   -- semigroup reducer
, R1(..)   -- reversed lazy Mealy machine
, L(..)    -- lazy Moore machine
, L'(..)   -- strict Moore machine
, M(..)    -- monoidal reducer
, R(..)    -- reversed lazy Moore machine
, AsRM1(..)
, AsL1'(..)
, AsRM(..)
, AsL'(..)
) where
So aside from the folds we’ve examined before, there are 4 new classes, AsRM[1]
, and AsL[1]'
. We’ll look at the versions without the 1 suffix.
So this class covers the p
’s that know how to convert themselves to monoidal and right folds. Most of these instances are what you’d expect if you’ve ever done the “write foldl
as foldr
” trick or similar shenanigans.
For M
asM
is trivially identity and since m
is expected to be associative we don’t really care that R
is going to associate it strictly to the right. We just glue h
onto the front to map the next piece of input into something we know how to merge.
Next is R
For right folds we do something a bit different. We transform each value into a function of type m -> m
which is the back half of a folding function. We can compose these associatively with .
since they are just functions. Finally, when we need to present this, we apply this giant pipeline to the initial state and present the result. Notice here how we took a nonassociative function and bludgeoned it into associativity by partially applying it.
For L'
we do something similar
We once again build up a pipeline of functions to make everything associative and apply it at the end. We can’t just use .
though for composition because we need to force intermediate results. That’s why you see \b g x -> g $! h x b
, it’s just strict composition.
It makes sense that we’d bundle right and monoidal folds together because every right fold can be converted to a monoidal and every monoidal fold to a right. That means that every time we can satisfy one of these functions we can build the second.
This isn’t the case for left folds because we can’t convert a monoidal or right fold to a left one. For the people who are dubious of this, foldl
doesn’t let us capture the same amount of laziness we need. I forgot about this too and subsequently hung my machine trying to prove Edward Kmett wrong.
This means that the AsL'
is a fairly boring class,
class (AsRM p, AsL1' p) => AsL' p where
  asL' :: p a b -> L' a b
instance AsL' L where
  asL' (L k h z) = L' (\(Box r) -> k r) (\(Box r) a -> Box (h r a)) (Box z)
Now we finally see the point of Box
, it’s designed to stubbornly block attempts at making its contents strict. You can see this because all the instance for L
does is wrap everything in Box
es! Since L'
is the same as L
with some extra seq
s, we can use Box
to nullify those attempts at strictness and give us a normal left fold.
That’s it! We’re done!
Now that we’ve gone through a few concrete implementations and the overall structures in this package hopefully this has come together for you. I must say, I’m really quite surprised at how effectively comonadic operations can capture compositional folds. I’m certainly going to make an effort to use this package or Gabriel’s foldl a bit more in my random “tiny Haskell utility programs”.
If you’re as entranced by these nice little folding libraries as I am, I’d recommend
Trivia fact: this is the longest article out of all 52 posts on Code & Co.
Update: I decided it might be helpful to write some utility folds for folds. I figured this might be interesting to some.
In this installment of “jozefg is confused by other people’s code” we turn to operational
. This is a package that’s a little less known than I’d like. It provides a monad built from an ADT of instructions: a monad that can be used with do
notation while keeping interpretation separate.
Most people familiar with free monads are wondering what the difference is between operational’s approach and using free monads. Going into this, I have no clue. Hopefully this will become clear later on.
Let’s get started shall we
~$ cabal get operational
Happily enough, there’s just one (small) file so we’ll go through that.
To start with Control.Monad.Operational
exports
module Control.Monad.Operational (
Program, singleton, ProgramView, view,
interpretWithMonad,
ProgramT, ProgramViewT(..), viewT,
liftProgram,
) where
Like with most “provides a single monad” packages, I’m most interested in how Program
works. Looking at this, we see that it’s just a synonym
Just like the mtl, this is defined in terms of a transformer. So what’s this transformer?
data ProgramT instr m a where
  Lift  :: m a -> ProgramT instr m a
  Bind  :: ProgramT instr m b -> (b -> ProgramT instr m a)
        -> ProgramT instr m a
  Instr :: instr a -> ProgramT instr m a
So ProgramT
is a GADT, this is actually important because Bind
has an existential type variable: b
. Otherwise this is really just a plain tree, I assume (>>=) = Bind
and return = Lift . return
in the monad instance for this. And finally we can see that instructions are also explicitly supported with Instr
.
We can confirm that the Monad
instance is as boring as we’d expect with
instance Monad m => Monad (ProgramT instr m) where
return = Lift . return
(>>=) = Bind
instance MonadTrans (ProgramT instr) where
lift = Lift
instance Monad m => Functor (ProgramT instr m) where
fmap = liftM
instance Monad m => Applicative (ProgramT instr m) where
pure = return
(<*>) = ap
So clearly there’s no interesting computation happening here. Looking at the export list again, we see that there’s a helpful combinator singleton
for building up these Program[T]
s since they’re kept abstract.
Which once again is very boring.
So this is a lot like free monads it seems since neither one of these actually does much in its monad instance. Indeed the equivalent with free monads would be
data Free f a = Pure a | Free (f (Free f a))

instance Functor f => Monad (Free f) where
  return = Pure
  Pure a >>= f = f a
  (Free a) >>= f = Free (fmap (>>= f) a)

singleton :: Functor f => f a -> Free f a
singleton = Free . fmap Pure
The obvious differences are that

1. Free requires a functor while Program doesn’t
2. Free’s monad instance automatically guarantees the monad laws

2 is the bigger one for me. Free
has a tighter set of constraints on its f
so it can guarantee the monad laws. This is clearly false with Program
since return a >>= f
introduces an extra Bind
instead of just giving f a
.
This would explain why ProgramT
is kept abstract: it would be hopelessly broken to expose it in its raw form. Instead what we have to do is somehow partially normalize it before we present it to the user.
Indeed that’s exactly what ProgramViewT
is representing. It’s a simpler data type
data ProgramViewT instr m a where
  Return :: a -> ProgramViewT instr m a
  (:>>=) :: instr b
         -> (b -> ProgramT instr m a)
         -> ProgramViewT instr m a
This apparently “compiles” a Program
so that everything is either binding an instruction or a pure value. What’s interesting is that this seems to get rid of all Lift
’s as well.
How do we produce one of these? Well that seems to be viewT
’s job.
viewT :: Monad m => ProgramT instr m a -> m (ProgramViewT instr m a)
viewT (Lift m) = m >>= return . Return
viewT ((Lift m) `Bind` g) = m >>= viewT . g
viewT ((m `Bind` g) `Bind` h) = viewT (m `Bind` (\x -> g x `Bind` h))
viewT ((Instr i) `Bind` g) = return (i :>>= g)
viewT (Instr i) = return (i :>>= return)
Note that this function returns an m (ProgramViewT instr m a)
, not just a plain ProgramViewT
. This makes sense because we have to get rid of the lifts. What I think is particularly interesting here is that the 2nd and 3rd cases are just the monad laws!
The second one says binding to a computation is just applying the function to it in the obvious manner. The third reassociates bind in a way guaranteed by the monad laws.
This means that while ProgramT
isn’t going to satisfy the monad laws, we can’t tell because all the things said to be equal by the monad laws will compile to the same view. Terribly clever stuff.
The rest of the module is mostly boring stuff like Monad*
instances. The last interesting functions is interpretWithMonad
interpretWithMonad :: forall instr m b.
    Monad m => (forall a. instr a -> m a) -> (Program instr b -> m b)
interpretWithMonad f = eval . view
    where
        eval :: forall a. ProgramView instr a -> m a
        eval (Return a) = return a
        eval (m :>>= k) = f m >>= interpretWithMonad f . k
This nicely highlights how you’re supposed to write an interpreter for a Program
. eval
handles the two cases of the view
using the mapping to a monad we provided and view
handles actually compiling the program into these two cases. All in all, not too shabby.
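To see the whole pipeline at once, here's a self-contained sketch: my own cut-down reconstruction of the non-transformer Program, view, and interpretWithMonad from above (names mirror the package but this is not the real module), plus a toy counter instruction set I made up to interpret:

```haskell
{-# LANGUAGE GADTs, RankNTypes, ScopedTypeVariables #-}
import Control.Monad (ap, liftM)
import Data.IORef (IORef, modifyIORef, newIORef, readIORef)

-- a simplified, non-transformer Program: no Lift constructor
data Program instr a where
  Return' :: a -> Program instr a
  Bind    :: Program instr b -> (b -> Program instr a) -> Program instr a
  Instr   :: instr a -> Program instr a

instance Functor (Program instr) where fmap = liftM
instance Applicative (Program instr) where
  pure  = Return'
  (<*>) = ap
instance Monad (Program instr) where
  (>>=) = Bind  -- O(1) bind; the law-breaking is hidden by view below

singleton :: instr a -> Program instr a
singleton = Instr

data ProgramView instr a where
  Return :: a -> ProgramView instr a
  (:>>=) :: instr b -> (b -> Program instr a) -> ProgramView instr a

-- normalize using the monad laws, just like viewT
view :: Program instr a -> ProgramView instr a
view (Return' a)             = Return a
view (Return' a `Bind` g)    = view (g a)
view ((m `Bind` g) `Bind` h) = view (m `Bind` (\x -> g x `Bind` h))
view (Instr i `Bind` g)      = i :>>= g
view (Instr i)               = i :>>= Return'

interpretWithMonad :: forall instr m b. Monad m
                   => (forall a. instr a -> m a) -> Program instr b -> m b
interpretWithMonad f = eval . view
  where
    eval :: forall a. ProgramView instr a -> m a
    eval (Return a) = return a
    eval (m :>>= k) = f m >>= interpretWithMonad f . k

-- a toy instruction set: bump a counter, read it back
data CounterI a where
  Incr :: CounterI ()
  Get  :: CounterI Int

prog :: Program CounterI Int
prog = do
  singleton Incr
  singleton Incr
  singleton Get

-- one possible interpretation: into IO with an IORef
interp :: IORef Int -> CounterI a -> IO a
interp ref Incr = modifyIORef ref (+ 1)
interp ref Get  = readIORef ref

main :: IO ()
main = do
  ref <- newIORef 0
  print =<< interpretWithMonad (interp ref) prog  -- prints 2
```

Swapping in a different `interp` reinterprets the same `prog` without touching it, which is the whole point of separating construction from interpretation.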
Now I assume that most people didn’t actually download the source to operational, but you really should! Inside you’ll find a whole directory, doc
. It contains a few markdown files with explanations and references to the appropriate papers as well as a couple examples of actually building things with operational
.
Now that you understand how the current implementation works, you should be able to understand most of what is being said there.
So operational
illustrates a neat trick I rather like: using modularity to provide an O(1)
implementation of >>=
and hide its rule breaking with a view.
This package also drops the positivity requirement that Free
implies with its functor constraint. Which I suppose means you could have
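The example was elided above; my guess at what was meant is an instruction type whose result variable occurs in negative position, something like this hypothetical one of my own:

```haskell
{-# LANGUAGE GADTs #-}

-- Hypothetical: `a` occurs to the left of an arrow, so no lawful Functor
-- instance exists, yet this is a perfectly fine instruction set for Program.
data NotAFunctor a where
  Callback :: ((a -> Int) -> Int) -> NotAFunctor a

-- we can still build and run such instructions directly
runCallback :: NotAFunctor a -> (a -> Int) -> Int
runCallback (Callback f) k = f k
```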
Which is potentially useful.
Last but not least, operational
really exemplifies having a decent amount of documentation even though there’s only ~100 lines of code. I think the ratio of documentation : code is something like 3 : 1 which I really appreciate.
So the results from Stephen’s poll are in! Surprisingly, impredicative types topped out the list of type system extensions people want to talk about so I figured I can get the ball rolling.
First things first, all the Haskell code will need the magical incantation
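The incantation itself was elided above; given the rest of the post it is presumably the pragma for the extension under discussion:

```haskell
{-# LANGUAGE ImpredicativeTypes #-}
```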
We have a lot of extensions that make polymorphism more flexible in Haskell, RankNTypes
and Rank2Types
spring to mind. However, one important feature lacking is “first class polymorphism”.
With impredicative polymorphism forall
’s become a normal type like any other. We can embed them in structures, toss them into polymorphic functions, and generally treat them like any other type.
Readers with a mathematical background will wonder why these are called “impredicative” types then. The idea is that since we can have polymorphic types embedded in other structures, we could have something like
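The definition was elided; a sketch of the sort of thing meant (my reconstruction) is a type with a polymorphic field:

```haskell
{-# LANGUAGE RankNTypes #-}

-- the quantified `a` ranges over every type, including T itself
data T = T (forall a. a -> a)

-- using the polymorphic field at a concrete type
applyT :: T -> Int
applyT (T f) = f 3
```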
That a
could assume any type including T
. So each type definition can quantify over itself which nicely corresponds to the mathematical notion of impredicativity.
One simple example where this might come up is when dealing with lenses. Remember lenses have the type
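The type was elided; it is presumably the standard van Laarhoven lens type, shown here with a small sanity check of my own (`_1` and `set` are the usual definitions, written out so the snippet stands alone):

```haskell
{-# LANGUAGE RankNTypes #-}
import Data.Functor.Identity (Identity (..))

-- the standard van Laarhoven lens type
type Lens s t a b = forall f. Functor f => (a -> f b) -> s -> f t

-- the usual lens onto the first component of a pair
_1 :: Lens (a, c) (b, c) a b
_1 f (a, c) = fmap (\b -> (b, c)) (f a)

-- setting through a lens by instantiating f to Identity
set :: Lens s t a b -> b -> s -> t
set l b = runIdentity . l (const (Identity b))
```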
If we were to embed lenses in let’s say a tuple,
type TLens a b = (Lens a a (a, b) (a, b), Lens b b (a, b) (a, b))
foo :: TLens Int Bool
foo = (_1, _2)
We’d need impredicative types because suddenly a polymorphic type has appeared within a structure.
Now that we’ve seen how amazing impredicative polymorphism is, let’s talk about how no one uses it. There are two main reasons
Reason 1 isn’t exactly a secret. In fact, SPJ has stated a number of times that he’d like to deprecate the extension since it’s very hard to maintain with everything else going on.
As it stands right now, our only choice is more or less to type check a program and add type signatures when GHC decides to instantiate our beautiful polymorphic type with fresh monomorphic type variables.
For this reason alone, impredicative types aren’t really the most useful thing. The final nail in the coffin is that we can easily make things more reliable by using newtypes. In lens for example we avoid impredicativity with
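The code was elided; the trick in lens is the ReifiedLens newtype from Control.Lens.Reified, sketched here standalone (with the Lens synonym written out, and the tuple example from earlier redone my way):

```haskell
{-# LANGUAGE RankNTypes #-}
import Data.Functor.Identity (Identity (..))

type Lens s t a b = forall f. Functor f => (a -> f b) -> s -> f t

-- hiding the forall behind a constructor means embedding a lens in a
-- structure only needs RankNTypes, not ImpredicativeTypes
newtype ReifiedLens s t a b = Lens { runLens :: Lens s t a b }

-- a tuple of lenses, no impredicativity required
pairOfLenses :: (ReifiedLens (a, b) (a, b) a a, ReifiedLens (a, b) (a, b) b b)
pairOfLenses =
  ( Lens (\f (a, b) -> fmap (\a' -> (a', b)) (f a))
  , Lens (\f (a, b) -> fmap (\b' -> (a, b')) (f b)) )
```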
This means that instead of impredicative types we just need rank N types, which are much more polished.
Well, I’m sorry to be the bearer of bad news for those who filled out XImpredicativeTypes
on the poll, but there you are.
To end on a positive note however, I do know of two examples of where impredicative types did save the day. I’ve used impredicative types exactly once to handle church lists properly. Lennart Augustsson’s Python DSL makes heavy use of them to present a unified face for variables.
I like types. If you haven’t figured this out from my blog I really don’t know where you’ve been looking :) If you’ve ever talked to me in real life about why I like types, chances are I mentioned ease of reasoning and correctness.
Instead of showing how to prove parametricity I’d like to show how to rigorously apply parametricity. So we’ll be a step above handwaving and a step below actually proving everything correct.
At a high level parametricity is about the behavior of well typed terms. It basically says that when we have more polymorphic types, there are fewer programs that type check. For example, the type
a -> b -> a
tells us everything we need to know about const
. It returns its first argument. In fact, if it returns anything (nonbottom) at all, it simply must be its first argument!
Parametricity isn’t limited to simple cases like this however, it can be used to prove that the type
forall r. (a -> r -> r) -> r -> r
is completely isomorphic to [a]
!
We can use parametricity to prove free theorems, like if map id = id
then map f . map g = map (f . g)
.
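We can at least spot-check that second law on a concrete f, g, and input:

```haskell
-- an empirical check of map f . map g = map (f . g) on one sample
lhs, rhs :: [Int]
lhs = (map (+ 1) . map (* 2)) [1, 2, 3]
rhs = map ((+ 1) . (* 2)) [1, 2, 3]
-- both evaluate to [3, 5, 7]
```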
These are nonobvious properties and yet parametricity gives us the power to prove all of them without even looking at the implementation of these functions. That’s pretty cool!
In order to get an idea of how to use parametricity, let’s do some handwavy proofs to get some intuition for how parametricity works.
Start with id
.
We know right away that id
takes some value of type a
and returns another value a
. Most people would safely guess that the returned value is the one we fed it.
In fact, we can kinda see that this is the only thing it could do. If it didn’t, then somehow it’d have to create a value of type a
, but we know that that’s impossible! (Yeah, yeah, I know, bottom. Look the other way for now)
Similarly, if map id
is just id
, then we know that map
isn’t randomly dropping some elements of our list. Since map
isn’t removing elements, in order to take an a
to a b
, map
has to be applying f
to each element! Since that’s true, we can clearly see that map f . map g = map (f . g)
because we know that applying f
and then applying g
is the same as applying f
and g
at the same time!
Now these handwavy statements are all based on one critical point. No matter how we instantiate a type variable, the behaviour we get is related. Instantiating something to Bool
or Int
doesn’t change the fundamental behaviour of the thing we’ve instantiated.
Before we can formally define parametricity we need to flesh out a few things. First things first, we need to actually specify the language we’re working in. For our purposes, we’ll just deal with pure System F.
ty ::= v             [Type Variables]
     | ty -> ty      [Function Types]
     | forall v. ty  [Universal Quantification]
     | Bool          [Booleans]

exp ::= v                [Variables]
      | exp exp          [Application]
      | λv : ty -> exp   [Abstraction]
      | Λv -> exp        [Type Abstraction]
      | exp[ty]          [Type Application]
      | true             [Boolean]
      | false            [Boolean]
The only real notable feature of our language is that all polymorphism is explicit. In order to have a full polymorphic type we have to use a “big lambda” Λ. This acts just like a normal lambda except instead of abstracting over a term this abstracts over a type.
For example the full term for the identity function is
id = Λ A -> \x : A -> x
From here we can explicitly specialize a polymorphic type with type application.
id[Bool] true
Aside from this, the typing rules for this language are pretty much identical to Haskell’s. In the interest of brevity I’ll elide them.
Now that we have our language, let’s talk about what we’re interested in proving. Our basic goal is to show that two expressions e1
and e2
are equal. However, we don’t want to use a ==
sort of equality. We really mean that they can’t be distinguished by our programs. That for all programs with a “hole”, filling that hole with e1
or e2
will produce identical results. This is called “observational equivalence” usually and notated with ≅
.
This is a bit more general than just ==
, for example it let’s us say that flip const () ≅ id
. Now let’s define another notion of equality, logical equivalence.
This logical equivalence is an attempt to define equality without just saying “running everything produces the same result”. It turns out it’s really really hard to prove things that aren’t syntactically equivalent will always produce the same result!
Our logical equivalence ~
is defined in a context η : δ ↔ δ'
. The reason for this is that our terms may have free type variables and we need to know how to deal with them. Each δ maps the free type variables of our terms to concrete types and η is a relationship for comparing δ(v)
with δ'(v)
.
Put less scarily, η
is a set of rules that say how to compare two terms when the have both are of type v
. This is an important part of our logical relation: it deals with open terms, terms with free variables.
Now η isn’t composed of just any relationship between terms, it has to be “admissible”. Admissibility means that for some relation R, two conditions hold
1. If e R e' and d ⇒ e and d' ⇒ e', then d R d'
2. If e R e' and d ≅ e and d' ≅ e', then d R d'
The first rule means that R
is closed under evaluation and the second says that R
respects observational equivalence.
Now we define our logical equivalence in some context δ to be
1. For e, e' : v (a free type variable), e ~ e' [η] if e η(v) e'
2. For e, e' : Bool, e ~ e' [η] if e ⇓ v and e' ⇓ v for the same value
3. For f, g : a → b, f ~ g [η] if whenever a ~ b [η], then f a ~ g b [η]
4. For e, e' : ∀ v. t, e ~ e' [η] if for every admissible R : p ↔ p', e[p] ~ e'[p'] [η[v ↦ R]]
Now this rule has 4 cases, one for each type. That’s the first critical bit of this relation, we’re talking about things by the structure of the type, not the value itself.
Now with this in mind we can state the full parametricity theorem.
For all expressions e and mappings η,
e ~ e [η]
That’s it! Now this is only really useful when we’re talking about polymorphic type, then parametricity states that for any admissible relation R
, two different instantiations are related.
While I won’t go into how to prove it, another important results we’ll use for proofs with parametricity is that (∀η. e ~ e' [η]) ⇔ e ≅ e'
.
Now that I’ve said exactly what parametricity is, I’d like to step through a few proofs. The goal here is to illustrate how we can use this to prove some interesting properties.
First we just have to prove the classic result that any f : forall a. a -> a
is equivalent to id = Λa. λx : a. x
.
To prove this we need to show f ~ id [η]
. For this we need to show that for any admissible relation R
between τ
and τ'
, then f[τ] ~ λx : τ'. x [η[a ↦ R]
. Stepping this one more time we end up with the goal that e R e'
then f[τ] e ~ e' ⇔ f[τ] e R e'
Now this is where things get tricky and where we can apply parametricity. We know by definition that f ~ f [η]
. We then choose a new relation S : τ' ↔ τ'
where d S d'
if and only d ≅ e'
and d' ≅ e'
. Exercise to the reader: show admissibility.
From here we know that f[τ] ~ f[τ] [η[a ↦ R]]
and since e S e
then f[τ] e ~ f[τ] e
which implies f[τ] e S f[τ] e
. This means that f[τ] e ≅ e
. From our note above, f[τ] e ~ e
and by transitivity we have f[τ] e R e'
.
Now we can prove something similar, that (f : a → b → a) ≅ const
. The proof is very similar,
f ~ const [η]
f[τ][ν] ~ const[τ'][ν'] [η[a ↦ R][b ↦ S]]
f[τ][ν] a b ~ a' [η[a ↦ R][b ↦ S]] where a R a'
Now we need to show that f a b ≅ a
. For this we define T
to be an admissible relationship where d T d'
if and only if d ≅ a ≅ d'
. From here we also define U
to be an admissible relation where a U b
if and only if a ~ b
.
Now we know that f ~ f [η]
and so
f[τ][ν] ~ f[τ'][ν'] [η[a ↦ T][b ↦ U]]
And since a T a
and b U b
, we know that
f[τ][ν] a b ~ f[τ'][ν'] a b [η[a ↦ T][b ↦ U]]
this means that f a b ≅ a
and completes our proofs. Hopefully this reinforces the idea of using parametricity and admissible relationships to produces our properties.
Now for something a bit trickier. Church numerals are a classic idea from lambda calculus where
0 ≡ λs. λz. z
1 ≡ λs. λz. s z
2 ≡ λs. λz. s (s z)
And so on. In terms of types,
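The type was elided; it is presumably the Church numeral type:

```haskell
{-# LANGUAGE RankNTypes #-}

-- a Church numeral: given a successor and a zero, iterate the successor
type Nat = forall a. (a -> a) -> a -> a

two :: Nat
two s z = s (s z)
```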
Now intuitively from this type it seems obvious that this only allows us to apply the first argument n
times to the second, like a church numeral. Because of this we want to claim that we can compose the first argument with itself n
times before applying it to the second or for all c : Nat
, there exists an n
so that compose n ≡ c
.
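The compose mentioned here can be sketched as plain iteration (my own definitions; the claim is that every c : Nat is observationally one of these):

```haskell
{-# LANGUAGE RankNTypes #-}

-- repeating the Church numeral type so this snippet stands alone
type Nat = forall a. (a -> a) -> a -> a

-- compose n is the numeral applying its first argument n times
compose :: Int -> Nat
compose 0 _ z = z
compose n s z = s (compose (n - 1) s z)
```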
To prove this we proceed as before and we end up with
compose[τ] s z ~ c[τ'] s' z' [η[c ↦ R]]
Now we define a new relation S
where
1. a S b if a ≅ z' ≅ b
2. a S b if n S n' and a ≅ s' n and b ≅ s' n'
Now we know that c[τ'] s' z' S c[τ'] s' z'
so by inversion on this we can determine that c[τ'] s' z' is some number n of applications of s' followed by z'.
Set the n
for compose to this new n
. From here our result follows by induction on n
.
This proof means there’s a mapping from c
to n
. The curious reader is encouraged to show this is an invertible mapping and complete the proof of isomorphism.
Now most people in the Haskell community have heard the term “free theorem”, what they may not realize is that free theorems are a direct result of parametricity.
In fact, if you read Wadler’s original paper sections 5 and onwards establish parametricity. What’s interesting here is that Wadler opts to establish it in a similar way to how Reynolds did. He first defines a mathematical structure called a “type frame”.
This structure lets us map a program in something like System F or Haskell into pure math functions. From there it defines relationships in a similar way to our logical relation and shows it’s reflexive.
I didn’t opt for this route because
It’s still definitely worth reading for the curious though.
Now that we’ve defined parametricity and established a few theorems for it, I hope you can start to see the advantage of types to guide our programs. General enough types can give us assurances without ever even looking at the code in question.
Aren’t types cool?
Proving things about programs is quite hard. In order to make it simpler, we often lie a bit. We do this quite a lot in Haskell when we say things like “assuming everything terminates” or “for all sane values”. Most of the time, this is alright. We sometimes need to leave the universe of terminating things, though, whenever we want to prove things about streams or other infinite structures.
In fact, once we step into the wondrous world of the infinite we can’t rely on structural induction anymore. Remember that induction relies on the “well foundedness” of the thing we’re inducting upon, meaning that there can only be finitely many things smaller than any object. However there is an infinite descending chain of things smaller than an infinite structure! For some intuition here, something like foldr
(which behaves just like structural induction) may not terminate on an infinite list.
This is quite a serious issue since induction was one of our few solid tools for proof. We can replace it though with a nifty trick called coinduction which gives rise to a useful notion of equality with bisimulation.
Before we get to proving programs correct, let’s start with proving something simpler. The equivalence of two simple machines. These machines (A and B) have 3 buttons. Each time we push a button the machine reconfigures itself. A nice real world example of such machines would be vending machines. We push a button for coke and out pops a (very dusty) can of coke and the machine is now slightly different.
Intuitively, we might say that two vending machines are equivalent if and only if our interactions with them can’t distinguish one from the other. That is to say, pushing the same buttons on one gives the same output and leaves both machines in equivalent states.
To formalize this, we first need to formalize our notion of a vending machine. A vending machine is comprised of a set of states. These states are connected by arrows labeled with a transition. We’ll refer to the start of a transition as its domain and its target as the codomain. This group of states and transitions is properly called a labeled transition system (LTS).
To recap how this all relates back to vending machines: the states are the vending machines themselves, and a transition between A
and B
would mean we could push a button on A
and wind up with B
Notice that this view pays no attention to all the mechanics going on behind the scenes of pushing a button, only the end result of the button push. We refer to the irrelevant stuff as the “internal state” of the vending machine.
Let’s consider a relation R
with A R B
if and only if

1. There is a mapping f from transitions from A to transitions from B so that x and f(x) have the same label.
2. If A R B and A has a transition x, then the codomain of x is related to the codomain of f(x).
3. There is a mapping g satisfying 1. and 2., but from transitions from B to transitions from A.

This definition sets out to capture the notion that two states are related if we can’t distinguish between them. The fancy term for such a relation is a bisimulation. Now our notion of equivalence is called bisimilarity and denoted ~
, it is the union of all bisimulations.
Now how could we prove that A ~ B
? Since ~
is the union of all bisimulations, all we need to is construct a bisimulation so that A R B
and hey presto, they’re bisimilar.
To circle back to vending machine terms, if for every button on machine A
there’s a button on B
that produces the same drink and leaves us with related machines then A
and B
are the same.
It’s all very well and good that we can talk about the equality of labeled transition systems, but we really want to talk about programs and pieces of data. How can we map our ideas about LTSs into programs?
Let’s start with everyone’s favorite example, finite and infinite lists. We define our domain of states to be
L(A) = {nil} ∪ {cons(n, xs) | n ∈ ℕ ∧ xs ∈ A}
We have to define this as a function over A
which represents the tail of the list which means this definition isn’t recursive! It’s equivalent to
What we want here is a fixed point of L
, an element X
so that L(X) = X
. This is important because it means
Which is just the type we’d expect cons to have. There’s still a snag here, what fixed point do we want? How do we know one even exists? I’d prefer to not delve into the math behind this (see TAPL’s chapter on infinite types) but the gist of it is, if for any function F
F
is monotone so that x ⊆ y ⇒ F(x) ⊆ F(y)
F
is cocontinuous so that ∩ₓF(x) = F(∩ₓ x)
Then there exists an X = F(X)
which is greater or equal to all other fixpoints. The proof of this isn’t too hard, I encourage the curious reader to go and have a look. Furthermore, poking around why we need cocontinuity is enlightening, it captures the notion of “nice” lazy functions. If you’ve looked at any domain theory, it’s similar to why we need continuity for least fixed pointed (inductive) functions.
This greatest fixed point is what we get with Haskell’s recursive types, and that’s what we want to model. What’s particularly interesting is that the greatest fixed point includes infinite data, which is very different from the least fixed point, the one we usually prefer to think about when dealing with things like F-algebras and proofs by induction.
Now anyways, to show L
has a fixed point we have to show it’s monotone. If X ⊆ Y
then L(X) ⊆ L(Y)
because x ∈ L(X)
means x = nil ∈ L(Y)
or x = cons(h, t)
, but since t ∈ X ⊆ Y
then cons(h, t) ∈ L(Y)
. Cocontinuity is left as an exercise to the reader.
So L
has a greatest fixed point: X
. Let’s define an LTS with states being L(X)
and with the transitions cons(a, x) → x
labeled by a
. What does bisimilarity mean in this context? Well nil ~ nil
since neither have any transitions. cons(h, t) ~ cons(h', t')
if and only if h = h'
and t ~ t'
. That sounds a lot like how equality works!
To demonstrate this, let’s define two lists: foo = cons(1, foo) and bar = cons(1, cons(1, bar)).
Let’s prove that foo ~ bar
. Start by defining a relation R
with foo R bar
. Now we must show that each transition from foo
can be matched with one from bar
, since there’s only one from each this is easy. There’s a transition from foo → foo
labeled by 1
and a transition from bar → cons(1, bar)
also labeled by one. Here lies some trouble though, since we don’t know that foo R cons(1, bar)
, only that foo R bar
. We can easily extend R
with foo R cons(1, bar)
though and now things are smooth sailing. The mapping of transitions for this new pair is identical to what we had before and since we know that foo R bar
, our proof is finished.
To see the portion of the LTS our proof was about:

    foo          bar
     |1           |1
     v            v
    foo      cons(1, bar)
     |1           |1
     v            v
    foo          bar
and our bisimulation R
is just given by {(foo, bar), (foo, cons(1, bar))}
.
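If we squint, foo and bar are just lazy Haskell lists. Full bisimilarity is a proof rather than a program, but as a toy of my own (not part of the proof above) we can sketch a depth-bounded check to build intuition:

```haskell
-- foo and bar as lazy Haskell lists: foo = cons(1, foo),
-- bar = cons(1, cons(1, bar)).
foo, bar :: [Int]
foo = 1 : foo
bar = 1 : 1 : bar

-- We can't decide bisimilarity of arbitrary infinite lists, but we can
-- check it out to a finite depth: match the first n transitions.
bisimTo :: Eq a => Int -> [a] -> [a] -> Bool
bisimTo 0 _ _ = True
bisimTo _ [] [] = True
bisimTo n (x : xs) (y : ys) = x == y && bisimTo (n - 1) xs ys
bisimTo _ _ _ = False
```

bisimTo 1000 foo bar checks out; the coinductive proof is what licenses the jump from every finite depth to full bisimilarity.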
Now that we’ve seen that we can map our programs into LTSs and apply our usual tricks there, let’s formalize this a bit.
First, what exactly is [co]induction? Coinduction is a proof principle for proving something about elements of the greatest fixed point of a function, F
. We can prove that the greatest fixed point, X
, is the union of all the sets Y
so that Y ⊆ F(Y)
.
If we can prove that there exists a Y ⊆ F(Y) that captures our desired proposition then we know that Y ⊆ gfp(F). That is the principle of coinduction. Unlike the principle of induction we don’t get proofs about all members of a set; rather we get proofs that there exist members which satisfy this property. It also should look very similar to how we proved things about ~
.
So now that we’ve defined coinduction across a function, what functions do we want to actually plop into this? We already know what we want for lists,
List(A) = {nil} ∪ {cons(h, t) | h ∈ H ∧ t ∈ A}
But what about everything else? Well, we do know that each value is introduced by a rule. These rules are always of the form
Some Premises Here
——————————————————
conclusion here
So for example, for lists we have
——————————————
nil ∈ List(A)
h ∈ H t ∈ A
————————————–————————
cons(h, t) ∈ List(A)
Now our rules can be converted into a function with a form like
F(A) = ∪ᵣ {conclusion | premises}
So for lists this gives
F(A) = {nil} ∪ {cons(h, t) | h ∈ H ∧ t ∈ A}
as expected. We can imagine generalizing this to other things, like trees for example
————————–—————
leaf ∈ Tree(A)
h ∈ H  l ∈ A  r ∈ A
————————–—————————————–
node(h, l, r) ∈ Tree(A)
Tree(A) = {leaf} ∪ {node(h, l, r) | h ∈ H ∧ l ∈ A ∧ r ∈ A}
Now the most common thing we want to prove is some notion of equality. This is harder than it seems because the usual notions of equality don’t work.
Instead we can apply bisimulation. Our approach is the same, we define a criteria for what it means to be a bisimulation across a certain type and define ~
as the union of all bisimulations. On lists we wanted the heads to be equal and the tails to be bisimilar, but what about on trees? We can take the same systematic approach we did before by considering what an LTS on trees would look like. leaf
has no information contained in it and therefore no transitions. node(a, l, r)
should have two transitions, left or right. Both of these give you a subtree contained by this node. What should they be labeled with? We can follow our intuitions from lists and label them both with a
.
This leaves us with the following definition, R
is a bisimulation on trees if and only if
leaf R leaf
node(a, l, r) R node(a', l', r') if and only if a = a' ∧ l R l' ∧ r R r'
So to prove an equality between trees, all we must do is provide such an R
and then we know R ⊆ ~
!
This describes how we deal with coinductive things in general really. We define what it means for a relation to partially capture what we’re talking about (like a bisimulation) and then the full thing is the union of all of these different views! Sortedness could be expressed as a unary relation S
where
nil ∈ S
cons(a, xs) ∈ S → xs ∈ S ∧ Just a ≤ head xs
the sorted
predicate is the union of all such relations!
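As a quick sketch (a hypothetical helper of mine, not a real coinductive proof), the sortedness relation above can be checked out to any finite depth in Haskell:

```haskell
-- Check the sortedness relation out to n unfoldings; an infinite list
-- like [1..] passes any finite depth, which is the coinductive content.
sortedTo :: Ord a => Int -> [a] -> Bool
sortedTo 0 _ = True
sortedTo _ [] = True
sortedTo _ [_] = True
sortedTo n (x : y : rest) = x <= y && sortedTo (n - 1) (y : rest)
```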
I’d like to reel off some pithy dualities between induction and coinduction
So that about wraps up this post.
We’ve seen how infinite structures demand a fundamentally different approach to proofs than finite ones. It’s not all puppies and rainbows though; considering how we managed to spend nearly 300 lines talking about it, coinduction is a lot less intuitive. It is, however, our only choice if we want to have “real proofs” in Haskell (terms and conditions may apply).
After my last post, I didn’t quite feel like ending there. I was a little dissatisfied with how binding was handled in the type checker, the odd blend of HOAS, GUIDs, and DeBruijn variables was… unique.
In this post I explore 3 versions of the same code: the original hybrid approach, a version using bound to handle all binding, and a version using HOAS. There’s a lot of code in this post, enough that I think it’s worth hosting the code on its own. You can find it on github and bitbucket.
I’ve already described most of the original method here. To recap
The issue I had with this is we almost got the worst of all 3 worlds! We were constantly bumping a counter to keep up with the free constants we needed to generate. We had to muddy up the types of values with another notion of free constants so we could actually inspect variables under HOAS binders! And finally, we had to do the painful and tedious substitutions on DeBruijn terms.
On the other hand, if you’d never used any of those binding schemes together, you too can go triple or nothing and try to understand that code :)
What I really wanted was to unify how I represented values and terms. I still wanted a clearly correct notion of equality, but in this way I could probably dodge at least two of the above.
The obvious thing to do would be to stick with DeBruijn variables and just instantiate free variables with constants. This is ugly, but it’s moderately less horrible if we use a library to help us with the process.
bound
So my first stab at this approach was with Edward Kmett’s bound. For those who aren’t familiar with this library, it centers around the data type Scope
. Scope b f a
binds variables of type b
in the structure f
with free variables of type a
. The assumption is that f
will be a monad which represents our AST.
Further, f
is parameterized over variables, it doesn’t attempt to distinguish between bound and free ones however. This means that >>=
corresponds to substitution. Then what Scope
does is instantiate these variables to Var b a,
which is precisely equivalent to Either b a.
What this results in is that each free variable is a different type from bound ones. Scope
provides various functions for instantiating bound variables and abstracting over free ones. That’s bound
in a nutshell.
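To get a feel for the trick without the real library, here’s a hand-rolled miniature of the same idea — this is my own sketch, not bound’s actual API, with Either () a playing the role of bound’s Var b a:

```haskell
{-# LANGUAGE DeriveFunctor #-}

-- Miniature of bound's idea: the body of a binder has one extra
-- variable, distinguished *by type* from the free ones.
data Exp a = V a
           | Ap (Exp a) (Exp a)
           | L (Exp (Either () a)) -- a one-variable "scope"
           deriving (Functor, Eq, Show)

-- Substitution on free variables, weakening under binders as we go.
subst :: Exp a -> (a -> Exp b) -> Exp b
subst (V a)    f = f a
subst (Ap l r) f = Ap (subst l f) (subst r f)
subst (L b)    f = L (subst b g)
  where g (Left ())  = V (Left ())      -- the bound variable stays put
        g (Right a)  = fmap Right (f a) -- free variables are substituted

-- Instantiate the bound variable of a scope with an argument.
instantiate1' :: Exp a -> Exp (Either () a) -> Exp a
instantiate1' arg body = subst body $ \v -> case v of
  Left ()  -> arg
  Right a  -> V a
```

Alpha equivalence falls out of the derived (==) — L (V (Left ())) is the identity no matter what we “called” its variable — and beta reduction is just instantiate1'.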
It’s a bit easier to grok this by example, here’s our calculus ported to use Scope
data Expr a = Var a
            | App (Expr a) (Expr a)
            | Annot (Expr a) (Expr a)
            | ETrue
            | EFalse
            | Bool
            | Star
            | Pi (Expr a) (Scope () Expr a)
            | Lam (Scope () Expr a)
            | C String
            deriving (Functor, Eq)
So the first major difference is that our polarization between inferrable and checkable terms is gone! This wasn’t something I was happy about, but in order to use Scope
we need a monad instance and we can’t define two mutually dependent monad instances without a function from CExpr -> IExpr, something that clearly doesn’t exist.
Since each binder can only bind one variable at a time, we represent the newly bound variable as just ()
. This would be more complicated if we supported patterns or something similar.
Now in addition to just this, we also need a bunch of boilerplate to define some type class instances for Scope
’s benefit.
instance Eq1 Expr where (==#) = (==)
instance Applicative Expr where
pure = return
(<*>) = ap
instance Monad Expr where
return = Var
Var a >>= f = f a
(App l r) >>= f = App (l >>= f) (r >>= f)
ETrue >>= _ = ETrue
EFalse >>= _ = EFalse
Bool >>= _ = Bool
Star >>= _ = Star
C s >>= _ = C s
Annot l r >>= f = Annot (l >>= f) (r >>= f)
Pi l s >>= f = Pi (l >>= f) (s >>>= f)
Lam e >>= f = Lam (e >>>= f)
That weird >>>=
is just >>=
that works through Scope
s. It’s a little bit frustrating that we need this somewhat boilerplatey monad instance, but I think the results might be worth it.
From here we completely forgo an explicit Val
type. We’re completely scrapping that whole HOAS and VConst
ordeal. Instead we’ll just trust Scope
’s clever Eq
instance to handle alpha conversion. We do need to implement normalization though
type Val = Expr

nf :: Expr a -> Val a
nf = \case
  (Annot e t) -> nf e -- Important: nf'd data throws away annotations
  (Lam e) -> Lam (toScope . nf . fromScope $ e)
  (Pi l r) -> Pi (nf l) (toScope . nf . fromScope $ r)
  (App l r) ->
    case nf l of
      Lam f -> nf (instantiate1 r f)
      l' -> App l' (nf r)
  e -> e
What’s interestingly different is that the actual work is shifted from within the higher-order binders we had before into the case expression in App.
It’s also worth mentioning the few bound specifics here. toScope and fromScope expose the underlying f (Var b a) that a Scope is hiding. We can then polymorphically recur (eat your heart out, SML) over the now-unbound variables and continue on our way.
Again, notice that I’ve defined nothing to do with substitution or scoping, this is all being handled by bound.
Now our actual type checker is still essentially identical. We’re still using monad-gen to generate unique variable names, it’s just that now bound handles the messy substitution. The lack of distinction between inferrable, checkable, and normalized terms did trip me up once or twice though.
data Env = Env { localVars :: M.Map Int (Val Int)
               , constants :: M.Map String (Val Int) }

type TyM = ReaderT Env (GenT Int Maybe)

unbind :: (MonadGen a m, Functor m, Monad f) => Scope () f a -> m (a, f a)
unbind scope = ((,) <*> flip instantiate1 scope . return) <$> gen

unbindWith :: Monad f => a -> Scope () f a -> f a
unbindWith = instantiate1 . return

inferType :: Expr Int -> TyM (Val Int)
inferType (Var i) = asks (M.lookup i . localVars) >>= maybe mzero return
inferType (C s) = asks (M.lookup s . constants) >>= maybe mzero return
inferType ETrue = return Bool
inferType EFalse = return Bool
inferType Bool = return Star
inferType Star = return Star
inferType (Lam _) = mzero -- We can only check lambdas
inferType (Annot e ty) = do
  checkType ty Star
  let v = nf ty
  v <$ checkType e v
inferType (App f a) = do
  ty <- inferType f
  case ty of
    Pi aTy body -> nf (App (Lam body) a) <$ checkType a aTy
    _ -> mzero
inferType (Pi t s) = do
  checkType t Star
  (newVar, s') <- unbind s
  local (\e -> e{localVars = M.insert newVar (nf t) $ localVars e}) $
    Star <$ checkType s' Star

checkType :: Expr Int -> Val Int -> TyM ()
checkType (Lam s) (Pi t ts) = do
  (newVar, s') <- unbind s
  local (\e -> e{localVars = M.insert newVar (nf t) $ localVars e}) $
    checkType s' (nf $ unbindWith newVar ts)
checkType e t = inferType e >>= guard . (== t)
I defined two helper functions unbind
and unbindWith
which both ease the process of opening a scope and introducing a new free variable. I actually split these off into a tiny library, but I haven’t uploaded it to hackage yet.
I suppose that 4. would be a non-issue for a lot of people who don’t care about bidirectional type checkers.
Higher order abstract syntax is a really nifty trick. The idea is that Haskell already has a perfectly good notion of variables and substitution lying around! Let’s just use that. We represent our functions with actual ->s and we don’t have a constructor for variables anymore.
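As a tiny standalone illustration (a toy of mine, much smaller than the post’s AST), here’s a HOAS evaluator where beta reduction is literally Haskell application:

```haskell
-- Binders are genuine Haskell functions; there is no Var constructor.
data E = T | F | EIf E E E | ELam (E -> E) | EAp E E

eval :: E -> E
eval (EIf c t e) = case eval c of
  T -> eval t
  _ -> eval e
eval (EAp f a) = case eval f of
  ELam g -> eval (g a) -- substitution is just function application
  f'     -> EAp f' (eval a)
eval e = e -- T, F, and ELam are already values
```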
The only issue is that Haskell doesn’t let us inspect the bodies of functions. We need to do this, however, for a type checker! To deal with this we dirty our AST a bit and add in IGen
’s, placeholders for where normal Haskell variables would normally go. Our new AST looks like this
data Expr = App Expr Expr
          | Annot Expr Expr
          | ETrue
          | EFalse
          | Bool
          | Star
          | Pi Expr (Expr -> Expr)
          | Lam (Expr -> Expr)
          | C String
          | IGen Int

type NF = Expr
Notice how both Pi
and Lam
have functions embedded in them. Now normalization is actually quite slick because functions are easy to work with in Haskell
nf :: Expr -> NF
nf ETrue = ETrue
nf EFalse = EFalse
nf Bool = Bool
nf Star = Star
nf (C s) = C s
nf (IGen i) = IGen i
nf (Annot l _) = nf l
nf (Pi t f) = Pi (nf t) (nf . f)
nf (Lam f) = Lam (nf . f)
nf (App l r) = case nf l of
  Lam f -> nf . f $ r
  l' -> App l' (nf r)
This is actually quite similar to the Val
type we started with. That also used HOAS and we end up with a similarly structured normalization.
For the same reason, the equivalence checking procedure is pretty much the same thing
eqTerm :: NF -> NF -> Bool
eqTerm l r = runGenWith (successor s) (IGen 0) $ go l r
  where s (IGen i) = IGen (i + 1)
        s _ = error "Impossible!"
        go Star Star = return True
        go Bool Bool = return True
        go ETrue ETrue = return True
        go EFalse EFalse = return True
        go (C s') (C s'') = return (s' == s'')
        go (Annot l r) (Annot l' r') = (&&) <$> go l l' <*> go r r'
        go (App l r) (App l' r') = (&&) <$> go l l' <*> go r r'
        go (Lam f) (Lam g) = gen >>= \v -> go (f v) (g v)
        go (Pi t f) (Pi t' g) =
          (&&) <$> go t t' <*> (gen >>= \v -> go (f v) (g v))
        go (IGen i) (IGen j) = return (i == j)
        go _ _ = return False
In fact, the only differences are that we use IGen in place of VGen and hand monad-gen a successor function rather than an Enum instance. The only reason for the latter is that the amazing maintainer of monad-gen (hi!) rejiggered some of the library to not be so Enum dependent.
Now from here our type checker is basically what we had before. In the interest of saving time, I’ll highlight the interesting bits: the constructors that bind variables.
data Env = Env { localVars :: M.Map Int NF
               , constants :: M.Map String NF }

type TyM = GenT Int (ReaderT Env Maybe)

inferType :: Expr -> TyM NF
inferType (Pi t f) = do
  checkType t Star
  let t' = nf t
  i <- gen
  local (\e -> e{localVars = M.insert i t' $ localVars e}) $
    Star <$ checkType (f $ IGen i) Star

checkType :: Expr -> NF -> TyM ()
checkType (Lam f) (Pi t g) = do
  i <- gen
  let t' = nf t
      rTy = nf (g $ IGen i)
  local (\e -> e{localVars = M.insert i t' $ localVars e}) $
    checkType (f $ IGen i) rTy
At this point you may have started to notice the pattern, the only real difference here is that substitution is completely free. Otherwise, I don’t really have much to say about HOAS.
In conclusion, I think we can all agree that the original version of this type checker was unpleasant to say the least. It did considerably improve with bound, mostly because the normalize-and-compare equivalence checking is really easy since bound handles alpha conversion. On the other hand, actually doing work beneath a binder is a bit of a pain since we have to take care to never unwrap a binder with a previously bound variable. We handled this with a hacky little trick with monad-gen, but a permanent and clean solution still seems hard.
We can avoid this fully by hitching a ride on Haskell’s variables and substitution using HOAS, this is wonderful until it’s not. The issue is that comparing functions for equality is still a pain so we ended up with an equivalence check much like what we had in the original version.
In the future it’d be interesting to try this with unbound
, a library in the same domain as bound
with a very different approach.
It’s been a while since I posted about some code I’ve been reading, but today I found a little gem: concurrent-supply. This package sets out to provide a fast way to generate unique identifiers in a way that’s splittable and supports concurrency.
What’s particularly cool about this package is that the code is only about ~100 lines, and a goodly chunk of that is pragmas to tell GHC to actually inline trivial functions.
The API is just 5 functions
type Supply

newSupply :: IO Supply
freshId :: Supply -> (Int, Supply)
splitSupply :: Supply -> (Supply, Supply)
freshId# :: Supply -> (# Int, Supply #)
splitSupply# :: Supply -> (# Supply, Supply #)
Supply is the type for, well… supplies of fresh integers. We can grab an Int out of a supply, producing a new supply as well. We can also split a supply so that we have two new supplies that will produce disjoint identifiers.
The idea here is that we can have supplies that are used from multiple concurrent threads and they won’t ever hand out the same identifier twice.
It does go without saying that eventually we run out of ints, so I suppose if you sit and prod a supply for a very long time, something bad will happen.
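Before looking at the real implementation, the contract of freshId and splitSupply can be modeled with a deliberately naive pure toy (my own sketch, nothing like the package’s fast internals): a supply is an infinite list of Ints, and splitting deals the list into alternating halves so they stay disjoint.

```haskell
-- A toy, pure model of the Supply contract: fresh pops the head,
-- split deals the stream into evens-position and odds-position halves.
newtype Toy = Toy [Int]

newToy :: Toy
newToy = Toy [0 ..]

toyFresh :: Toy -> (Int, Toy)
toyFresh (Toy (x : xs)) = (x, Toy xs)
toyFresh (Toy [])       = error "impossible: the stream is infinite"

toySplit :: Toy -> (Toy, Toy)
toySplit (Toy xs) = (Toy (deal xs), Toy (deal (drop 1 xs)))
  where deal (y : ys) = y : deal (drop 1 ys) -- every other element
        deal []       = []
```

The two halves can never collide, at the cost of each half enumerating its ids more sparsely — the real library gets the same disjointness much more cheaply with blocks.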
With that in mind, let’s take a look at the imports for Control.Concurrent.Supply
.
import Data.Hashable
import Data.IORef
import Data.Functor ((<$>))
import Data.Monoid
import GHC.IO (unsafeDupablePerformIO, unsafePerformIO)
import GHC.Types (Int(..))
import GHC.Prim (Int#)
So you can see that some interesting stuff is going to happen: we have both unboxed ints and unsafe*PerformIOs. As a quick review, unsafeDupablePerformIO is for IO actions which are okay with being forced at the same time by different threads, while unsafePerformIO is a little bit more modest and ensures we only force things from one thread at a time.
With this in mind, the code starts with the classic definition of streams in Haskell,

infixr 5 :-
data Stream a = a :- Stream a

This is followed with a few definitions,
instance Functor Stream where
  fmap f (a :- as) = f a :- fmap f as

extract :: Stream a -> a
extract (a :- _) = a

units :: Stream ()
units = () :- units
{-# NOINLINE units #-}
Do note that units won’t be inlined; this is unfortunately important when we’re working with unsafe functions.
Now on top of streams we can define a rather important type, blocks.
data Block = Block Int !(Stream Block)

instance Eq Block where
  Block a (Block b _ :- _) == Block c (Block d _ :- _) = a == c && b == d

instance Ord Block where
  Block a (Block b _ :- _) `compare` Block c (Block d _ :- _) =
    compare a c `mappend` compare b d

instance Show Block where
  showsPrec d (Block a (Block b _ :- _)) = showParen (d >= 10) $
    showString "Block " . showsPrec 10 a . showString " (Block "
    . showsPrec 10 b . showString " ... :- ...)"

instance Hashable Block where
  hashWithSalt s (Block a (Block b _ :- _)) = s `hashWithSalt` a `hashWithSalt` b
So a block is an integer and an infinite number of other blocks. Notice that block identity is purely determined by the first two ints. This is contingent on the fact that all blocks are made with
blockSize :: Int
blockSize = 1024
{-# INLINE blockSize #-}

-- Minimum size to be worth splitting a supply rather than
-- just CAS'ing twice to avoid multiple subsequent biased splits
blockCounter :: IORef Int
blockCounter = unsafePerformIO (newIORef 0)
{-# NOINLINE blockCounter #-}

modifyBlock :: a -> IO Int
modifyBlock _ =
  atomicModifyIORef blockCounter $ \i ->
    let i' = i + blockSize in i' `seq` (i', i)
{-# NOINLINE modifyBlock #-}

gen :: a -> Block
gen x = Block (unsafeDupablePerformIO (modifyBlock x)) (gen <$> units)
{-# NOINLINE gen #-}

newBlock :: IO Block
newBlock = return $! gen ()
{-# NOINLINE newBlock #-}
This is the first bit of unsafe code, so let’s look at what’s going on. We have a normal constant blockSize which represents something, it’s not immediately clear what yet. There’s a global mutable variable blockCounter starting from zero. From there, we have gen, which creates a block by making a thunk which unsafely bumps the block counter by 1024, returning its previous value. To get the stream of child blocks we fmap gen over units.
It’s worth wondering why we need this polymorphic argument. I’m reasonably certain it’s to prevent GHC from being clever and sharing that (unsafeDupablePerformIO ...) between blocks. That would be very bad. It might not do that if we were to use () instead of a, but there’s no reason a future optimization (if it doesn’t exist already) wouldn’t figure out that there’s only one possible result type and reduce the whole thing to a CAF.
Now a newBlock
wraps all this unsafe updating in IO
and returns the application of gen ()
.
So what does all of this mean? Well, each block thunk is going to have its own unique ID, separated by 1024, and only claimed whenever we actually force its first component. We have this gnarly chunk of mutable shared memory that we only ever modify with atomicModifyIORef, and we only actually touch it whenever we inspect the first thunk in a Block. What’s particularly interesting is that this can happen in pure code! By putting off this costly operation as long as possible we amortize the cost of all that contention.
Now we also have to support split, luckily it’s easy to split blocks since we have an infinite number of them nested!
It becomes a bit clearer now why we can completely determine blocks by their “first two” elements. The head is completely unique to each sequence so we know at minimum that if i == j
in Block i xs
and Block j ys
then either xs
or ys
is the tail of the other. This is an invariant we maintain throughout the code by not exposing Block and by ensuring we never :- any new ones onto its internal stream. If these streams have the same head (also unique) then they must be the same sequence, so the original blocks are equivalent. Nifty.
Now this still isn’t quite enough, we need one final data type: Supply
data Supply = Supply {-# UNPACK #-} !Int {-# UNPACK #-} !Int Block
  deriving (Eq, Ord, Show)

blockSupply :: Block -> Supply
blockSupply (Block i bs) = Supply i (i + blockSize - 1) (extract bs)
{-# INLINE blockSupply #-}
A supply should be seen almost as an iterator over a chunk of the number line. We know that each block starts 1024 away from the next, and a supply is an iterator from the block’s starting value over the next 1023 elements. We know that Supplys can’t intersect because the blocks are spaced this far apart.
Once we run out of those elements though, we need to get more. For this we have another block hidden in the back of the supply. It’s kept lazily so that it won’t fire off its first thunk to go bump our global store. When we run out of things to enumerate we call blockSupply, which will force i, which will go bother the global counter for another chunk of 1024 unique values.
With this understanding, splitSupply
and freshId
are quite easy.
-- | An unboxed version of freshId
freshId# :: Supply -> (# Int#, Supply #)
freshId# (Supply i@(I# i#) j b)
  | i /= j = (# i#, Supply (i + 1) j b #)
  | otherwise = (# i#, blockSupply b #)
{-# INLINE freshId# #-}

-- | An unboxed version of splitSupply
splitSupply# :: Supply -> (# Supply, Supply #)
splitSupply# (Supply i k b) = case splitBlock# b of
  (# bl, br #)
    | k - i >= minSplitSupplySize
    , j <- i + div (k - i) 2 ->
      (# Supply i j bl, Supply (j + 1) k br #)
    | Block x (l :- r :- _) <- bl
    , y <- x + div blockSize 2
    , z <- x + blockSize - 1 ->
      (# Supply x (y - 1) l, Supply y z r #)
{-# INLINE splitSupply# #-}
freshId# is more or less what we’d expect for an iterator. It returns the lower bound along with a new supply whose lower bound is bumped by one. Notice how cheap this is. In particular, since we haven’t forced b anywhere, we’ve just copied a couple of words. The expensive bit is when we actually run out of values in our range; in this case we return our final value and force the backing block to produce a new supply. This goes off and hammers on blockCounter. Happily we only end up doing this 1/1024th of the time.
splitSupply#
is a bit more complicated. When we go to split a supply we’re going to partition its range of values into two separate ranges. However, we want to watch out for splitting extremely small ranges. In this case, it’s slightly more efficient to just bite the bullet and incur the cost of hitting the blockCounter
.
The way we determine this is to split the block b, giving us two new blocks. If we have more ids in the current range than minSplitSupplySize, we give the two blocks to two new supplies, each with one half of the original range.
If the range is indeed too small, we poke the first block in the pair. This causes it to go hammer blockCounter, and from there we divide the range we got back into two and return these smaller supplies over the new range. Notice that we’ve completely tossed the remaining elements in the old supply on the floor since there weren’t that many. More interestingly, we completely ignored the second result of our split! The idea is that the most expensive operation we can do here is forcing that first thunk in a block. However, as long as we don’t force their first components, blocks are dirt cheap! Hence it’s cheaper to accept that we only get half of blockSize on each Supply, but we only had to perform one CAS to get them.
So now that we’ve done all of that, all that’s left in the module is the paperthin wrappers over these functions so we don’t always have to use unboxed tuples
-- | Obtain a fresh Id from a Supply.
freshId :: Supply -> (Int, Supply)
freshId s = case freshId# s of
  (# i, s' #) -> (I# i, s')
{-# INLINE freshId #-}

-- | Split a supply into two supplies that will return disjoint identifiers
splitSupply :: Supply -> (Supply, Supply)
splitSupply s = case splitSupply# s of
  (# l, r #) -> (l, r)
{-# INLINE splitSupply #-}
And that’s all. I hope this illustrated a fairly unique mix of laziness and side effects used to reduce contention on a difficult concurrent problem.
Cheers
This week I learned that my clever trick for writing a type checker actually has a proper name: bidirectional type checking. In this post I’ll explain what exactly that is and we’ll use it to write a few fun type checkers.
First of all, let’s talk about one of the fundamental conflicts when designing a statically typed language: how much information need we demand from the user? Clearly we can go too far in either direction. Even people who are supposedly against type inference support at least some inference. I’m not aware of a language that requires you to write something like
my_function((my_var : int) + (1 : int) : int) : string
Clearly inferring the types of some expressions is necessary. On the other hand, if we leave out all type annotations then it becomes a lot harder for a human reader to figure out what’s going on! I, at least, need to see signatures for top level functions or I become grumpy.
So inside a type checker we always have two sorts of processes: inferring a type from an expression, and checking an expression against a type we already know.
In a bidirectional type checker, we acknowledge these two phases by explicitly separating the type checker into two functions
Our type checker thus has two directions, one where we use the type to validate the expression (the type flows in) or we synthesize the type form the expression (the type flows out). That’s all that this is!
It turns out that a technique like this is surprisingly robust. It handles everything from subtyping to simple dependent types! To see how this actually plays out I think it’d be best to just dive in and do something with it.
Now when we’re building a bidirectional type checker we really want our AST to explicitly indicate inferrable vs checkable types. Clearly the parser might not care so much about this distinction, but prior to type checking it’s helpful to create this polarized tree.
For a simple language you can imagine
data Ty = Bool
        | Arr Ty Ty
        deriving (Eq, Show)

data IExpr = Var Int
           | App IExpr CExpr
           | Annot CExpr Ty
           | If CExpr IExpr IExpr
           | ETrue
           | EFalse

data CExpr = Lam CExpr
           | CI IExpr
This is just simply typed lambda calculus with booleans. We’re using DeBruijn indices so we need not specify a variable for Lam
. The IExpr
type is for expressions we can infer types for, while CExpr
is for types we can check.
Much of this split isn’t surprising: we can always infer the types of variables, inferring the types of lambdas is hard, etc. Something worth noting is CI. For any inferrable term, we can make it checkable by inferring a type and checking that it’s equal to what we expected. This is actually how Haskell works: GHC just infers a type without bothering with your signature and then checks you were right in the first place!
Now that we’ve separated out our expressions, we can easily define our type checker.
type Env = [Ty]

(?!) :: [a] -> Int -> Maybe a
xs ?! i = if i < length xs then Just (xs !! i) else Nothing

inferType :: Env -> IExpr -> Maybe Ty
inferType env (Var i) = env ?! i
inferType env (App l r) =
  case inferType env l of
    Just (Arr lTy rTy) -> checkType env r lTy >> return rTy
    _ -> Nothing
inferType env (Annot e an) = checkType env e an >> return an
inferType _ ETrue = return Bool
inferType _ EFalse = return Bool
inferType env (If i t e) = do
  checkType env i Bool
  lTy <- inferType env t
  rTy <- inferType env e
  guard (lTy == rTy)
  return lTy

checkType :: Env -> CExpr -> Ty -> Maybe ()
checkType env (Lam ce) (Arr l r) = checkType (l : env) ce r
checkType env (CI e) t = inferType env e >>= guard . (t ==)
checkType _ _ _ = Nothing
So our type checker doesn’t have many surprises in it. The environment is easy to maintain since DeBruijn indices are easily stored in a list.
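To see both directions in action, here’s the checker again, condensed into a standalone snippet with a small example term of my own (the identity function annotated at Bool -> Bool):

```haskell
import Control.Monad (guard)

data Ty = Bool | Arr Ty Ty deriving (Eq, Show)

data IExpr = Var Int | App IExpr CExpr | Annot CExpr Ty
           | If CExpr IExpr IExpr | ETrue | EFalse
data CExpr = Lam CExpr | CI IExpr

type Env = [Ty]

(?!) :: [a] -> Int -> Maybe a
xs ?! i = if i < length xs then Just (xs !! i) else Nothing

inferType :: Env -> IExpr -> Maybe Ty
inferType env (Var i) = env ?! i
inferType env (App l r) =
  case inferType env l of
    Just (Arr lTy rTy) -> checkType env r lTy >> return rTy
    _ -> Nothing
inferType env (Annot e an) = checkType env e an >> return an
inferType _ ETrue = return Bool
inferType _ EFalse = return Bool
inferType env (If i t e) = do
  checkType env i Bool
  lTy <- inferType env t
  rTy <- inferType env e
  guard (lTy == rTy)
  return lTy

checkType :: Env -> CExpr -> Ty -> Maybe ()
checkType env (Lam ce) (Arr l r) = checkType (l : env) ce r
checkType env (CI e) t = inferType env e >>= guard . (t ==)
checkType _ _ _ = Nothing

-- The annotation lets us *check* the lambda, then *infer* the whole term.
identityTy :: Maybe Ty
identityTy = inferType [] (Annot (Lam (CI (Var 0))) (Arr Bool Bool))
```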
Now that we’ve seen how a bidirectional type checker more or less works, let’s kick it up a notch.
Type checking a simple dependently typed language is actually not nearly as bad as you’d expect. The first thing to realize is that dependent types collapse terms and types into only one syntactic category. We maintain the distinction between inferrable and checkable values, resulting in
data IExpr = Var Int
           | App IExpr CExpr
           | Annot CExpr CExpr
           | ETrue
           | EFalse
           | Bool
           | Star -- New stuff starts here
           | Pi CExpr CExpr
           | Const String
           | Free Int
           deriving (Eq, Show, Ord)

data CExpr = Lam CExpr
           | CI IExpr
           deriving (Eq, Show, Ord)
So you can see we’ve added 4 new expressions, all inferrable. Star is just the kind of types, as it is in Haskell. Pi is the dependent function type; it’s like Arr, except the return type can depend on the supplied value.
For example, you can imagine a type which says something like “give me an integer n and a value, and I’ll give you back a list of length n”.
Interestingly, we’ve introduced constants. These are necessary simply because without them this language is unbelievably boring. Constants would be defined in the environment and they represent constant, irreducible terms. You should think of them almost like constructors in Haskell. For example, one can imagine 3 constants which serve to define the natural numbers.
Last but not least, we’ve added “free variables” as an explicit constructor.
Now an important piece of a type checker is comparing types for equality. In STLC, equivalent types are syntactically equal, so that was solved with deriving Eq. Here we need a bit more subtlety. Indeed, now we need to check arbitrary expressions for equality! This is hard. We’ll reduce things as much as possible and then just check syntactic equality. This means that if True then a else b would equal a as we’d hope, but \x -> if x then a else a wouldn’t.
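A toy version of this normalize-then-compare idea (my own micro example, far simpler than the Val machinery below) makes both behaviors concrete:

```haskell
-- A micro expression type: reduce `if` on known booleans, then compare
-- the normal forms syntactically.
data T = TTrue | TFalse | TVar String | TIf T T T
  deriving (Eq, Show)

norm :: T -> T
norm (TIf c t e) = case norm c of
  TTrue  -> norm t
  TFalse -> norm e
  c'     -> TIf c' (norm t) (norm e) -- stuck on an unknown condition
norm t = t

eqTerm' :: T -> T -> Bool
eqTerm' a b = norm a == norm b
```

`if True then a else b` reduces to a and compares equal; `if x then a else a` stays stuck on x, so the purely syntactic comparison misses that it is morally a.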
Now in order to facilitate this check we’ll define a type for fully reduced expressions. Since we’re only interested in checking equality on these terms we can toss the inferrable vs checkable division out the window.
data VConst = CAp VConst Val
            | CVar String
            | CFree Int

data Val = VStar
         | VBool
         | VTrue
         | VFalse
         | VConst VConst
         | VArr Val Val
         | VPi Val (Val -> Val)
         | VLam (Val -> Val)
         | VGen Int
Now since we have constants we can have chains of application that we can’t reduce, that’s what VConst
is. Notice that this handles the case of just having a constant nicely.
The value dichotomy uses a nice trick from the “Simply Easy!” paper: we use HOAS to have functions that reduce themselves when applied. The downside of this is that we need VGen
to peek inside the now opaque VLam
and VPi
. The idea is we’ll generate a unique Int
and apply the functions to VGen i
.
Now in order to conveniently generate these fresh integers I used monad-gen
(it’s not self promotion if it’s useful :). Equality checking comes to
 *Whistle and fidget with hands*
instance Enum Val where
  toEnum = VGen
  fromEnum _ = error "You're a bad person."

eqTerm :: Val -> Val -> Bool
eqTerm l r = runGen $ go l r
  where go VStar VStar = return True
        go VBool VBool = return True
        go VTrue VTrue = return True
        go VFalse VFalse = return True
        go (VArr f a) (VArr f' a') = (&&) <$> go f f' <*> go a a'
        go (VLam f) (VLam g) = gen >>= \v -> go (f v) (g v)
        go (VPi t f) (VPi t' g) =
          (&&) <$> go t t' <*> (gen >>= \v -> go (f v) (g v))
        go (VGen i) (VGen j) = return (i == j)
        go (VConst c) (VConst c') = case (c, c') of
          (CVar v, CVar v') -> return (v == v')
          (CFree i, CFree j) -> return (i == j)
          (CAp f a, CAp f' a') ->
            (&&) <$> go (VConst f) (VConst f') <*> go a a'
          _ -> return False
        go _ _ = return False
Basically we just recurse and return true or false at the leaves.
Now that we know how to check equality of values, we actually need to map terms into those values. This involves basically writing a little interpreter.
inf :: [Val] -> IExpr -> Val
inf _ ETrue = VTrue
inf _ EFalse = VFalse
inf _ Bool = VBool
inf _ Star = VStar
inf _ (Free i) = VConst (CFree i)
inf _ (Const s) = VConst (CVar s)
inf env (Annot e _) = cnf env e
inf env (Var i) = env !! i
inf env (Pi l r) = VPi (cnf env l) (\v -> cnf (v : env) r)
inf env (App l r) =
  case inf env l of
    VLam f -> f (cnf env r)
    VConst c -> VConst . CAp c $ cnf env r
    _ -> error "Impossible: evaluated ill-typed expression"

cnf :: [Val] -> CExpr -> Val
cnf env (CI e) = inf env e
cnf env (Lam c) = VLam $ \v -> cnf (v : env) c
The interesting cases are for Lam
, Pi
, and App
. For App
we actually do reductions wherever we can, otherwise we know that we’ve just got a constant so we slap that on the front. Lam
and Pi
are basically the same, they wrap the evaluation of the body in a function and evaluate it based on whatever is fed in. This is critical, otherwise App
’s reductions get much more complicated.
We need one final thing. You may have noticed that all Val
’s are closed; there are no free de Bruijn variables. This means that when we go under a binder we can’t type check open terms, since we’re representing types as values and the term we’re checking shares variables with its type.
This means that when our type checker goes under a binder, it’s going to substitute the now-free variable for a fresh Free i
. Frankly, this kinda sucks. I poked about for a better solution but this is what “Simply Easy!” does too.
To do these substitutions we have
ibind :: Int -> IExpr -> IExpr -> IExpr
ibind i e (Var j) | i == j = e
ibind i e (App l r) = App (ibind i e l) (cbind i e r)
ibind i e (Annot l r) = Annot (cbind i e l) (cbind i e r)
ibind i e (Pi l r) = Pi (cbind i e l) (cbind i e r)
ibind _ _ e = e -- Non-recursive cases

cbind :: Int -> IExpr -> CExpr -> CExpr
cbind i e (Lam b) = Lam (cbind (i + 1) e b)
cbind i e (CI c) = CI (ibind i e c)
This was a bit more work than I anticipated, but now we’re ready to actually write the type checker!
Since we’re doing bidirectional type checking, we’re once again going to have two functions, inferType
and checkType
. Our environment is now a record.
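The record itself was elided here; reconstructed from how it’s used below (the first field is consulted for Free variables and updated through the locals accessor, the second for Const names), it’s presumably something like the following sketch. The field name constants is my guess.

```haskell
import qualified Data.Map as M

data Val -- stands in for the Val type defined earlier in the post

-- Presumed shape of the environment record: `locals` maps the Ints behind
-- Free variables to their types, the second field maps constant names to
-- theirs. The name `constants` is an assumption; `locals` appears in the code.
data Env = Env { locals    :: M.Map Int Val
               , constants :: M.Map String Val
               }
```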
The inferring stage is mostly the same
inferType :: Env -> IExpr -> GenT Int Maybe Val
inferType _ (Var _) = lift Nothing -- The term is open
inferType (Env _ m) (Const s) = lift $ M.lookup s m
inferType (Env m _) (Free i) = lift $ M.lookup i m
inferType _ ETrue = return VBool
inferType _ EFalse = return VBool
inferType _ Bool = return VStar
inferType _ Star = return VStar
inferType env (Annot e ty) = do
  checkType env ty VStar
  let v = cnf [] ty
  checkType env e v >> return v
inferType env (App f a) = do
  ty <- inferType env f
  case ty of
    VPi aTy body -> do
      checkType env a aTy
      return (body $ cnf [] a)
    _ -> lift Nothing
inferType env (Pi ty body) = do
  checkType env ty VStar
  i <- gen
  let v = cnf [] ty
      env' = env{locals = M.insert i v (locals env)}
  checkType env' (cbind 0 (Free i) body) VStar
  return VStar
The biggest difference is that now we have to compute some types on the fly. For example in Annot
we check that we are in fact annotating with a type, then we reduce it to a value. This order is critical! Remember that cnf
requires well typed terms.
Beyond this there are two interesting cases, there’s App
which nicely illustrates what a pi type means and Pi
which demonstrates how to deal with a binder.
For App
we start in the same way, we grab the (function) type of the function. We can then check that the argument has the right type. To produce the output type however, we have to normalize the argument as far as we can and then feed it to body
which computes the return type. Remember that if there’s some free variable in a
then it’ll just be represented as VConst (CFree ...)
.
Pi
checks that we’re quantifying over a type first off. From there it generates a fresh free variable and updates the environment before recursing. We use cbind
to replace all occurrences of the now unbound variable for an explicit Free
.
checkType
is pretty trivial after this. Lam
is almost identical to Pi
and CI
is just eqTerm
.
checkType :: Env -> CExpr -> Val -> GenT Int Maybe ()
checkType env (CI e) v = inferType env e >>= guard . eqTerm v
checkType env (Lam ce) (VPi argTy body) = do
  i <- gen
  let ce' = cbind 0 (Free i) ce
      env' = env{locals = M.insert i argTy (locals env)}
  checkType env' ce' (body $ VConst (CFree i))
checkType _ _ _ = lift Nothing
And that’s it!
So let’s circle back to where we started: bidirectional type checking! Hopefully we’ve seen how structuring a type checker around these two core functions yields something quite pleasant.
What makes this really interesting though is how well it scales. You can use this style of type checker to handle subtyping, [dependent] pattern matching, and heaps of other interesting features.
At 400 lines though, I think I’ll stop here :)
One of the common pieces of folklore in the functional programming community is how one can cleanly formulate recursive types with category theory. Indeed, using a few simple notions we can build a coherent enough explanation to derive some concrete benefits.
In this post I’ll outline how one thinks of recursive types and then we’ll discuss some of the practical ramifications of such thoughts.
I’m assuming the reader is familiar with some basic notions from category theory. Specifically familiarity with the definitions of categories and functors.
Let’s talk about endofunctors, which are functors whose domain and codomain are the same. spoiler: These are the ones we care about in Haskell. An interesting notion that comes from endofunctors is that of algebras. An algebra in this sense is a pair of an object C
, and a map F C → C
. Here F
is called the “signature” and C
is called the carrier.
If you’re curious about these funny terms: in abstract algebra we deal with algebras which are comprised of a set of distinguished elements, functions, and axioms called the signature. From there we look at sets (called carriers) which satisfy the specification. We can actually cleverly rearrange the specification for something like a group into an endofunctor! It’s out of scope for this post, but interesting if algebras are your thing.
Now we can in fact define a category of F-algebras. In such a category an object is α : F A → A
and each arrow is a triplet.
f : A → B
α : F A → A
β : F B → B
So that f ∘ α = β ∘ F f
. In picture form
           F f
  F A ——————————→ F B
   |               |
 α |               | β
   ↓               ↓
   A ——————————→   B
           f
commutes. I generally elide the fact that we’re dealing with triplets and instead focus on the arrow, since that’s the interesting bit.
Now that we’ve established F-algebras, there’s one more concept we need: the notion of initial objects. An initial object is an object I in a category such that for any object C there is a unique arrow f : I → C.
Now what we’re interested in investigating is the initial object in the category of F-algebras. That’d mean that for any other algebra β : F C → C, the diagram

            α
   F I ——————————→  I
    |               |
 F λ|               | λ
    ↓               ↓
   F C ——————————→  C
            β

commutes for a unique λ.
What’s the problem?
Now, remembering that we’re actually trying to understand recursive types, how can we fit the two together? We can think of recursive types as solutions to certain equations. In fact, our types are what are called the least fixed point solutions. Let’s say we’re looking at IntList
. We can imagine it defined as
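The definition itself was elided; presumably it’s the usual recursive list of Ints:

```haskell
-- The usual recursive definition: Cons mentions IntList itself
data IntList = Nil | Cons Int IntList
  deriving Show
```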
We can in fact factor out the recursive call in Cons
and get
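The factored version (the elided code was presumably along these lines) replaces the recursive position with a type parameter:

```haskell
{-# LANGUAGE DeriveFunctor #-}

-- The recursive call in Cons is replaced by the parameter a,
-- making IntList a (non-recursive) functor
data IntList a = Nil | Cons Int a
  deriving (Show, Eq, Functor)
```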
Now we can represent a list of length 3 as something like
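With the factored IntList the nesting shows up in the type; a length-3 list is presumably something like:

```haskell
data IntList a = Nil | Cons Int a

-- Each layer of the list adds a layer to the type
threeList :: IntList (IntList (IntList (IntList a)))
threeList = Cons 1 (Cons 2 (Cons 3 Nil))
```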
Which is all well and good, but we really want arbitrary-length lists. We want a solution to the equation

X = IntList X
We can view such a type as a set {EmptyList, OneList, TwoList, ThreeList ... }
. Now how can we actually go about saying this? Well we need to take a fixed point of the equation! This is easy enough in Haskell since Haskell’s type system is unsound.
 Somewhere, somehow, a domain theorist is crying.
data FixedPoint f = Fix {unfix :: f (FixedPoint f)}
Now we can regain our normal representation of lists with
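The elided code is presumably just an alias that ties the knot:

```haskell
data IntList a = Nil | Cons Int a
data FixedPoint f = Fix { unfix :: f (FixedPoint f) }

-- Arbitrary-length Int lists as the fixed point of the functor
type IntList' = FixedPoint IntList
```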
To see how this works
out :: FixedPoint IntList -> [Int]
out (Fix f) = case fmap out f of
  Nil -> []
  Cons a b -> a : b

in' :: [Int] -> FixedPoint IntList -- "in" is a keyword, hence in'
in' [] = Fix Nil
in' (x : xs) = Fix (Cons x (in' xs))
Now this transformation is interesting for one reason in particular: IntList
is a functor. Because of this, we can formulate an F-algebra for IntList
.
Now we consider what the initial object in this category would be. It’d be something I
so that we have a function

cata :: ListAlg a -> (I -> a) -- Remember that I -> a is an arrow in F-Alg
cata :: (List a -> a) -> I -> a
cata :: (Either () (Int, a) -> a) -> I -> a
cata :: (() -> a) -> ((Int, a) -> a) -> I -> a
cata :: a -> (Int -> a -> a) -> I -> a
cata :: (Int -> a -> a) -> a -> I -> a
Now that looks sort of familiar, what’s the type of foldr
again?
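For reference, here’s foldr specialized to Int lists so the correspondence with the last line of the chain above is plain:

```haskell
-- foldr at type [Int]; compare with the final cata type above
foldrInt :: (Int -> a -> a) -> a -> [Int] -> a
foldrInt = foldr
```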
So the arrow we get from the initiality of I
is precisely the same as foldr
! This leads us to believe that maybe the initial object for F-algebras in Haskell is just the least fixed point, just as [Int]
is the least fixed point for IntList
.
To confirm this, let’s generalize a few of our definitions from before
type Alg f a = f a -> a

data Fix f = Fix {unfix :: f (Fix f)}

type Init f = Alg f (Fix f)

cata :: Functor f => Alg f a -> Fix f -> a
cata f = f . fmap (cata f) . unfix
Exercise, draw out the reduction tree for cata
on lists
Our suspicion is confirmed: the fixed point of a functor is indeed the initial object. Furthermore, we can easily show that initial objects are unique up to isomorphism (exercise!), so anything that can implement cata
is isomorphic to the original, recursive definition we were interested in.
Now that we’ve gone and determined a potentially interesting fact about recursive types, how can we use this knowledge? Well let’s start with a few things, first is that we can define a truly generic fold function now:
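The elided definition is presumably just cata from above. As a quick illustration (my example, not the post’s), here’s a sum over the functor-fied IntList, with the definitions repeated so the snippet stands alone:

```haskell
{-# LANGUAGE DeriveFunctor #-}

data IntList a = Nil | Cons Int a deriving Functor
data Fix f = Fix { unfix :: f (Fix f) }
type Alg f a = f a -> a

cata :: Functor f => Alg f a -> Fix f -> a
cata f = f . fmap (cata f) . unfix

-- The collapsing function: one case per "shape" of the container
sumAlg :: Alg IntList Int
sumAlg Nil = 0
sumAlg (Cons x acc) = x + acc

oneTwoThree :: Fix IntList
oneTwoThree = Fix (Cons 1 (Fix (Cons 2 (Fix (Cons 3 (Fix Nil))))))
```

cata sumAlg oneTwoThree collapses the list to 1 + (2 + (3 + 0)).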
This delegates all the messy details of how one actually handles the “shape” of the container we’re folding across to the collapsing function f a -> a
.
While this may seem like a small accomplishment, it does mean that we can build off it to create data type generic programs that can be fitted into our existing world.
For example, what about mutual recursion? Fold captures the notion of recurring across one list in a rather slick way; however, recurring over two in lockstep involves a call to zip and other fun and games. How can we capture this with cata
?
We’d imagine that the folding functions for such a scenario would have the type
From here we can build
Similarly we can build up oodles of combinators for dealing with folding all built on top of cata
!
That unfortunately sounds like a lot of work! We can shamelessly freeload off the hard work of others thanks to Hackage though. In particular, the package recursion-schemes
has built up a nice little library for dealing with initial algebras. There’s only one big twist between what we’ve laid out and what it does.
One of the bigger stumbling blocks for our library was changing the nice recursive definition of a type into the functor-fied version. Really it’s not realistic to write all your types this way. To help simplify the process, recursion-schemes
provides a type family called Base
which takes a type and returns its functor-fied version. We can imagine something like
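A simplified sketch of what that looks like (the real library pairs Base with a Recursive class whose project method does the conversion; the IntList instance here is my example):

```haskell
{-# LANGUAGE TypeFamilies, DeriveFunctor #-}

-- Base maps a type to its functor-fied version
type family Base t :: * -> *

data IntList a = Nil | Cons Int a deriving Functor

-- The functor-fied version of [Int] is IntList
type instance Base [Int] = IntList

-- recursion-schemes' Recursive class, simplified
class Functor (Base t) => Recursive t where
  project :: t -> Base t t

instance Recursive [Int] where
  project [] = Nil
  project (x : xs) = Cons x xs
```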
This simplifies the process of actually using all these combinators we’re building. To use recursion-schemes, all you need to do is define such an instance and write project :: t -> Base t t
. After that it’s all kittens and recursion.
So dear reader, where are we left? We’ve got a new interesting formulation of recursive types that yields some interesting results and power. There’s one interesting chunk we’ve neglected though: what does unfolding look like?
It turns out there’s a good story for this as well: unfolding is the operation (anamorphism) defined by the terminal object in the category of F-coalgebras. A terminal object is the precise dual of an initial one. You can see this all in recursion-schemes, which features ana
as well as cata
.
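For a taste, ana is just cata with the arrows flipped; a minimal sketch, repeating Fix from earlier so it stands alone:

```haskell
data Fix f = Fix { unfix :: f (Fix f) }

cata :: Functor f => (f a -> a) -> Fix f -> a
cata f = f . fmap (cata f) . unfix

-- ana builds a structure up from a seed instead of tearing one down
ana :: Functor f => (a -> f a) -> a -> Fix f
ana g = Fix . fmap (ana g) . g
```

For example, with f = Maybe, Fix Maybe is the natural numbers: ana unfolds an Int into one and cata folds it back.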
It’s fairly well known that Haskell is a bit um.. different from how stock hardware sees the world. I’m not aware of too many processors that have decided that immutability and higher order functions are the right way to go.
Compiling Haskell and its ilk, however, does have one interesting wrinkle on top of the normal problem: laziness. Laziness stands completely at odds with how most everything else works. Moreover, whether or not you think it’s the right default, it’s an interesting question of how to efficiently compile some evaluation strategy other than call by value or name.
To this end, people have built a lot of abstract machines that lazy languages could target. These machines can be mapped easily to what the hardware wants and, transitively, we can get our compiler. Most of these work by “graph reduction” (that’s the G in STG) and the latest incarnation of these graph machines is the spineless tagless G-machine which lies at the heart of GHC and a few other compilers.
In this post, I’d like to go over how exactly the STG machine actually works. Turns out it’s pretty cool!
The basic idea behind a compiler intent on going the STG route is something like
In GHC’s case I understand the pipeline is something like
We’re really concerned with parts 6 and 7 here. First things first, let’s lay out what’s exactly in the STG language. It’s a tiny functional language that looks a bit like Haskell or Core, with a few restrictions. A program is simply a series of bindings, much like Haskell. The top levels look something like
f = {x y z} flag {a b c} -> ...

You should read this for now as f = \a b c -> ...
. The first set of variables and the flag correspond to some stuff we’ll discuss later.
Inside the ...
we can write most of what you would expect from Haskell. We have let[rec] bindings, case expressions, application, constructors, literals, and primitives. There is a caveat though: first of all, constructor applications must be fully saturated. This isn’t unlike OCaml or something where you can’t just treat a constructor as a function with an arbitrary name. We would write

\a -> Just a

instead of just Just
. Another bit of trickiness: our language has no lambdas! So we can’t even write the above. Instead if we had something like
map Just [1, 2, 3]
We’d have to write
let f = \a -> Just a
l'' = 3 : nil
l' = 2 : l''
l = 1 : l'
in map f l
The reason for the awkward l''
series is that we’re only allowed to apply constructors and functions to atoms (literals and variables).
One other noteworthy feature of STG is that we have primitive operations. They need to be fully saturated, just like constructors, but they work across unboxed things. For example there would probably be something like +#
which adds two unboxed integers. To work with these we also have unboxed literals, 1#
, 2#
, so on and so on.
Now, despite all these limitations placed on STG, it’s still a pretty stinking high level language. There’s letrec, higher order functions, a lot of the normal stuff we’d expect in a functional language. This means it’s not actually too hard to compile something like Haskell or Core to STG (I didn’t say “compile efficiently”).
As an example, let’s look at translating factorial into STG language. We start with
f :: Int -> Int
f i = case i of
  0 -> 1
  i -> i * f (i - 1)
Now the first step is we change the binding form
f = {} n {i} -> ...
The case expressions clause can remain the same, we’re already casing on an atom
case i of
  MkInt i# -> ...
Now comes the first big change, our boxed integers are going to get in the way here, so the case expression strips away the constructor leaving us with an unboxed integer. We can similarly refactor the body to make evaluation order explicit
case i of
  MkInt i# ->
    case i# -# 1# of
      dec# ->
        let dec = \{dec#} u {} -> MkInt dec#
        in case fact dec of
             MkInt rest# ->
               case i# *# rest# of
                 result# -> MkInt result#
Notice how the case
expressions here are used to make the evaluation of various expressions explicit and let
was used to create a new thing to evaluate.
Now we can see what those extra {}’s were for. They notate the free variables for a thunk. Remember how we can have all sorts of closures and it can make for some really nice code? Well the machine doesn’t exactly support those natively. What we need to do is note the variables that we close over explicitly and then generate code that will store these free variables with the value that closes over them. This pair is more or less what is called a “closure” for the rest of this post.
Actually, I’ll sometimes use “thunk” as another descriptor for this pair. This is because closures in STG land do quite a lot! In particular, they are used to represent the fundamental unit of lazy code, not just closing over variables. We’ll have closures that actually don’t close over anything! This would be a bit strange, but each “thunk” in Haskell land is going to become a closure in STGville. The notion of forcing a thunk in Haskell is analogous to evaluating an STG closure and creating a thunk is creating a new closure. This is helpful to keep in mind as we examine the rest of the machine.
dec
for example has a free variable dec#
and it exists to box that result for the recursive call to factorial. We use case
expressions to get evaluation. Most programs thus become chains of case
’s and let
alternating between creating thunks and actually doing work.
That u
in between the {}’s in dec
was also important. It’s the update flag. Remember how in Haskell we don’t want to force the same thunk twice. If I say
let i = 1 + 1 in i + i
We should only evaluate 1 + 1
once. That means that the thunk i
will have to be mutated to not evaluate 1 + 1
twice. The update flag signifies the difference between thunks that we want to update and thunks that we don’t. For example, if we replaced the thunk for +
with the first result it returned, we’d be mighty surprised. Suddenly 1 + 1 + 1
is just 2!
The u
flag says “yes, I’m just a normal expression that should be updated” and the n flag says the opposite.
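You can watch this sharing happen in GHC itself (my illustration; Debug.Trace ships with GHC):

```haskell
import Debug.Trace (trace)

-- The let-bound thunk is shared: forcing `shared` should print the
-- "forcing i" message only once, even though i is demanded twice.
shared :: Int
shared = let i = trace "forcing i" (1 + 1) in i + i
```

Evaluating shared yields 4, and (absent optimizations rearranging things) the trace fires a single time.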
That about wraps up our discussion of the STG language, let’s talk about how to implement it now.
This language wouldn’t be much good if it didn’t lend itself to an easy implementation, indeed we find that the restrictions we placed upon the language prove to be invaluable for its compilation (almost like they were designed that way!).
In order to decide how best to implement it, we first define the formal semantics for our language, which operates on a tuple of 6 things: the code, the argument stack (as), the return stack (rs), the update stack (us), the heap (h), and the global environment (o).
A code is more or less the current thing we’re attempting to do. It’s either
Eval e p - evaluate an expression e in an environment p
Enter a - enter a closure at address a
ReturnCon c ws - return a constructor applied to some arguments
ReturnInt i - return an integer

Now the idea is we’re going to “unroll” our computations into pushing things onto the continuation stack and entering closures. We start with the code Eval main {}
. That is to say, we start by running main
. Then if we’re looking at a case
we do something really clever
EVAL (case expr of {pat1 -> expr1; ...}, p) as rs us h o
becomes
EVAL (expr, p) as ({pat1 -> expr1; ...} : rs) us h o
That is to say, we just push the pattern matching on to the continuation stack and evaluate the expression.
At some point we’ll get to a “leaf” in our expression, that is, a random literal (a number) or constructor. At this point we make use of our continuation stack

EVAL (C ws, p) as ((...; C vs -> expr; ...) : rs) us h o
becomes
ReturnCon (C ws) as ((...; C vs -> expr; ...) : rs) us h o
becomes
EVAL (expr, p[vs -> ws]) as rs us h o
So our pattern matching is rolled into ReturnCon
. ReturnCon
will just look on top of the return stack looking for a continuation which wants its constructor and evaluate its expression, mapping the constructor’s variables to the pattern’s variables.
The story is similar for literals
EVAL (Int i, p) as ((...; i -> expr; ...) : rs) us h o
becomes
ReturnInt i as ((...; i -> expr; ...) : rs) us h o
becomes
EVAL (expr, p) as rs us h o
Another phase is how we handle let’s and letrec’s. In this phase instead of dealing with continuations, we allocate more thunks onto the heap.
EVAL ((let x = {fs} f {xs} -> e; ... in expr), p) as rs us h o
becomes
EVAL (expr, p') as rs us h' o
So as we’d expect, evaluating a let expression does indeed go and evaluate the body of the let expression, but changes up the environment in which we evaluate them. We have
p' = p[x -> Addr a, ...]
h' = h[a -> ({fs} f {xs} -> e) (p fs), ...]
In words “the new environment contains a binding for x
to some address a
. The heap is extended with an address a
with a closure {fs} f {xs} -> ...
where the free variables come from p
”. The definition for letrec is identical except the free variables come from p'
allowing for recursion.
So the STG machine allocates things in lets, adds continuations with case, and jumps to continuation on values.
Now we also have to figure out applications.
EVAL (f xs, p) as rs us h o
becomes
ENTER a (values of xs ++ as) rs us h o
where the value of f
is Addr a
. So we push all the arguments (remember they’re atoms and therefore trivial to evaluate) on to the argument stack and enter the closure of the function.
How do we actually enter a closure? Well we know that our closures are of the form
({fs} f {vs} -> expr) frees
If we have enough arguments to run the closure (the length of the argument stack is at least length vs), then we can just EVAL expr [vs -> take (length vs) as, fs -> frees]
. This might not be the case in something like Haskell though; we have partial application. So what do we do in this case?
What we want is to somehow get something that’s our closure but also knows about however many arguments we actually supplied it. Something like
({fs ++ supplied} f {notSupplied} -> expr) frees ++ as
where supplied ++ notSupplied = vs
. This updating of a closure is half of what our update stack us
is for. The other case is when we do actually enter the closure, but its update flag is u
so we’re going to want to update it. If this is the case we add an update frame to the stack (as, rs, a)
where as
is the argument stack, rs
is the return stack, and a
is the closure which should be updated. Once we’ve pushed this frame, we promptly empty the argument stack and return stack.
We then add the following rules to the definition of ReturnCon
ReturnCon c ws {} {} ((as, rs, a) : us) h o
becomes
ReturnCon c ws as rs us h' o
where h'
is the new heap that’s replaced our old closure at a
with our new, spiffy, updated closure
h' = h[a -> ({vs} n {} -> c vs) ws]
So that’s what happens when we go to update a closure. But what about partial application?
Enter a as {} ((asU, rs, aU) : us) h o
becomes
Enter a (as ++ asU) rs us h' o

where

h a = ({vs} n {xs} -> expr) frees
h' = h[aU -> ({vs ++ bound} n {xs} -> expr) (frees ++ as)]
This is a simplified rule from what’s actually used, but gives some intuition to what’s happening: we’re minting a new closure in which we use the arguments we’ve just bound and that’s what the result of our update is.
Now that we have some idea of how this is going to work, what does this actually become on the machine?
The original paper by SPJ suggests an “interpreter” approach to compilation. In other words, we actually almost directly map the semantics to C and call it compiled. There’s a catch though: we’d like to represent the bodies of closures as C functions since they’re, well.. functions. However, since all we do is enter closures and jump around to things till the cows come home, it had damn well better be fast. C function calls aren’t built to be that fast. Instead the paper advocates a tiny trampolining-esque approach.
When something wants to enter a closure, it merely returns it and our main loop becomes
while(1){cont = (*cont)();}
Which won’t stack overflow. In reality, more underhanded tricks are applied to make the performance suck less, but for now we’ll ignore such things.
In our compiled results there will be 2 stacks, not the 3 found in our abstract machine. The first stack (the A stack) holds pointers and the B stack holds non-pointers. These are monitored by two variables/registers SpA
and SpB
which keep track of the heights of the two stacks. Then compilation becomes reasonably straightforward.
An application pushes the arguments onto appropriate stacks, adjusts Sp*, and enters the function. A let block allocates each of the bound variables, then the body. Entering a closure simply jumps to the closure’s code pointer. This is actually quite nifty. All the work of figuring out exactly what Enter
will do (updates, continuation jiggering) is left to the closure itself.
A case expression is a bit more complicated since a continuation’s representation involves boxing up the local environment for each branch. Once that’s bundled away, we represent a continuation as a simple code pointer. It is in charge of scrutinizing the argument stack and selecting an alternative and then running the appropriate code. This is a lot of work, and, unless I’m crazy, we’ll need two types of bound variables for each branch (really just ptr/nonptr). The selection of an alternative would be represented as a C switch, letting all sorts of trickery with jump tables be done by the C compiler.
In order to return a value, we do something clever. We take a constructor and point a global variable at its constructor closure, containing its values and jump to the continuation. The continuation can then peek and poke at this global variable to bind things as needed for the alternatives. There is potentially a massive speedup by returning through registers, but this is dangerously close to work.
From here, primitive operations can be compiled to statements/instructions in whatever environment we’re targeting. In C for example we’d just use the normal +
to add our unboxed integers.
The last beast to slay is updates. We represent update frames by pointers to argument stacks and a pointer to a closure. That means that the act of updating is merely saving Sp*
in an update frame, clobbering them, and then jumping into the appropriate closure. We push the update frame onto stack B and keep on going.
I realize that this is a glancing overview and I’m eliding a lot of the tricky details, but hopefully this is sufficient to understand a bit about what’s going on at an intuitive level.
So now that you’ve put all the effort to get through this post, I get to tell you it’s all lies! In reality GHC has applied all manner of tricks and hacks to get fast performance out of the STG model. To be honest I’m not sure where I should point to that explains these tricks because well… I have no idea what they are.
I can point to
If you have any suggestions for other links I’d love to add them!
Thanks to Chris Ganas for proofreading
I’ve been spending a lot of time whacking my head on focusing literature. I’d like to jot down some intuition around what a focused system is and how it relates to the rest of the world. I’m going to steer clear of actually proving things but I will point out where a proof would be needed.
In a nutshell, focusing is a strategy to create proofs that minimizes the amount of choices available at each step. Focusing is thus amenable to mechanization since a computer is very good at applying a lot of deterministic procedures but incredibly bad at nondeterministic choice.
Now when we set out to define a focused system we usually do something like
At each of these steps there’s a proof that says something like “System 2 is sound and complete with respect to System 1”. We can then chain these proofs together to get that we can transform any nonfocused proof into a focused one (focalization) and the reverse (defocalization).
In order to actually carry out these proofs there’s a fair amount of work and pain. Usually we’ll need something like cut elimination and/or identity expansion.
Now before we go on to define an example logic, let’s notice a few things. First off, in sequent calculus there are left and right rules. Left rules decompose known facts into other known facts while right rules transform our goal. There’s also an identity sequent which more or less just states
A is an atom
—————————————
Γ, A → A
This is a bit boring though.
Now certain rules are invertible: their conclusion implies their premise in addition to the reverse. For example if I said you must prove A ∧ B
clearly we’ll have to prove both A
and B
in order to prove A ∧ B
; there’s no alternative set of rule applications that let us circumvent proving A
and B
.
This means that if we were mechanically trying to prove something of the form A ∧ B
we can immediately apply the right rule that decomposes ∧
into 2 goals.
We call these sorts of rules invertible or asynchronous. Dually, there are rules that, when applied, can transform our goal into something impossible to prove. Consider ⊥ ∨ ⊤
; clearly applying the rule that transforms this into ⊥
would be a bad idea!
Now if we begin classifying all the left and right rules we’ll notice that they tend to fall into 2 categories
We dub the first group “positive” things and the second “negative” things. This is called polarization and isn’t strictly necessary but greatly simplifies a lot of our system.
Now there are a few things that could be considered both positive and negative. For example we can consider ∧
as positive with
Γ → A⁺ Γ → B⁺
———————————————
Γ → A⁺ ∧ B⁺
Γ, A⁺, B⁺ → C
—————————————————
Γ, A⁺ ∧ B⁺ → C
In this case, the key determiner for the polarity of ∧ comes from its subcomponents. We can just treat ∧ as positive along with its subcomponents and with an appropriate dual ∧⁻, our proof system will still be complete.
As a quick example, implication ⊃
is negative. Its right rule is invertible
Γ, A → B
——————————
Γ → A ⊃ B
While its left rule isn’t
Γ, A ⊃ B → A Γ, B, A ⊃ B → C
——————————————————————————————
Γ, A ⊃ B → C
Since we could easily have something like ⊥ ⊃ ⊤
, but using this rule would entail (heh) proving ⊥
! Urk. If our system applied this rule remorselessly, we’d quickly end up in a divergent proof search.
Do note that these typing rules are straight out of Rob Simmons’ paper, linked below
Now that we’ve actually seen some examples of invertible rules and polarized connectives, let’s see how this all fits into a coherent system. There is one critical change we must make to the structure of our judgments: an addition to the form _ → _
. Instead of just an unordered multiset on the left, in order to properly do inversion we change this to Γ; Ω ⊢ A
where Ω is an ordered list of propositions we intend to focus on.
Furthermore, since we’re dealing with a polarized calculus, we occasionally want to view positive things as negative and vice versa. For this we have shifts, ↓ and ↑. When we’re focusing on some proposition and we reach a shift, we pop out of the focused portion of our judgment.
Our system is broken up into 3 essentially separate judgments. In the first judgment we basically apply as many invertible rules in as many places as we can.
Γ, A⁻; Q ⊢ U
——————————————
Γ; ↓A⁻, Q ⊢ U
Γ; A⁺, Ω ⊢ U    Γ; B⁺, Ω ⊢ U
———————————————————————————
Γ; A⁺ ∨ B⁺, Ω ⊢ U
Γ; A⁺, B⁺, Ω ⊢ U
————————————————————
Γ; A⁺ ∧ B⁺, Ω ⊢ U
——————————————
Γ; ⊥, Ω ⊢ U
We first look at how to break down Ω into simpler forms. The idea is that we’re going to keep going till there’s nothing left in Ω. Ω can only contain positive propositions so eventually we’ll decompose everything to shifts (which we move into Γ), ⊤⁺ (which we just drop on the floor), or ⊥ (which means we’re done). These are all invertible rules so we can safely apply them eagerly without changing the provability of our goal.
Once we’ve moved everything out of Ω we can make a choice. If U
is “stable”, meaning that we can’t break it down further easily, we can pick something negative out of our context and focus on it
Γ; [A⁻] ⊢ U
————————————–
Γ, A⁻; • ⊢ U
This pops us into the next judgment in our calculus. However, if U is not stable, then we have to decompose it further as well.
Γ; • ⊢ A⁺
——————————————
Γ; • ⊢ ↑ A⁺
———————————
Γ; • ⊢ ⊤⁻
Γ; A⁺ ⊢ B⁻
—————————————
Γ; • ⊢ A⁺ ⊃ B⁻
Γ; • ⊢ A⁻    Γ; • ⊢ B⁻
—————————————————————
Γ; • ⊢ A⁻ ∧ B⁻
If we have a negative connective at the top level we can decompose that further, leaving us with a strictly smaller goal. Finally, we may reach a positive proposition with nothing in Ω. In this case we focus on the right.
Γ ⊢ [A⁺]
———————————
Γ; • ⊢ A⁺
Now we’re in a position to discuss these two focused judgments. If we focus on the right we decompose positive connectives
——————————
Γ ⊢ [⊤⁺]
Γ; • ⊢ A⁻
—————————
Γ ⊢ [↓A⁻]
Γ ⊢ [A⁺]
—————————————
Γ ⊢ [A⁺ ∨ B⁺]
Γ ⊢ [B⁺]
—————————————
Γ ⊢ [A⁺ ∨ B⁺]
Γ ⊢ [A⁺]    Γ ⊢ [B⁺]
———————————————————
Γ ⊢ [A⁺ ∧ B⁺]
These judgments follow the ones we’ve already seen. If we encounter a shift, we stop focusing. Otherwise we decompose the topmost positive connective. Now looking at these, you should see that sometimes these rules will lead us to a “mistake”. Imagine if we applied the 4th rule to ⊤ ∨ ⊥
! This is why these rules are segregated into a separate judgment.
In this judgment’s dual we essentially apply the exact same rules to the left of the turnstile and on negative connectives.
Γ; A⁺ ⊢ U
————————————
Γ; [↑A⁺] ⊢ U
Γ ⊢ [A⁺]    Γ; [B⁻] ⊢ U
——————————————————————
Γ; [A⁺ ⊃ B⁻] ⊢ U
Γ; [A⁻] ⊢ U
—————————————————
Γ; [A⁻ ∧ B⁻] ⊢ U
Γ; [B⁻] ⊢ U
—————————————————
Γ; [A⁻ ∧ B⁻] ⊢ U
That wraps up our focused system. The idea is that we now have this much more limited system which can express the same things our original, unfocused system could. A computer can be easily programmed to do a focused search since there’s much less backtracking, leading to fewer rules being applicable at each step. I think Pfenning has referred to this as removing most of the “don’t care” nondeterminism from our rules.
I’m going to wrap up the post here. Proving focalization or even something like cut elimination is quite fiddly and I have no desire at all to try to transcribe it (painfully) into markdown and get it wrong in the process.
Instead, now that you have some idea of what focusing is about, go read Rob Simmons’ paper. It provides a clear account of proving everything necessary to show a focused system is complete and sound with respect to its unfocused counterpart.
Cheers
Although most people I talk to know me for my blog, I do occasionally actually write software instead of just talking about it :)
Sadly, as a Mercurial user most of my stuff has languished on Bitbucket. I’ve had a few people tell me that this is annoying for various reasons. Yesterday, I finally got around to fixing that!
As of yesterday, all of my interesting projects are mirrored on GitHub. I’m still using Mercurial but thanks to the lovely git-hg tool this is not an issue. You can fork, pull-request, or generally peek and poke as you please. From my end all of these actions look like nice Mercurial changesets so I can continue to live under my rock where I don’t need to understand Git.
As a quick list of what Haskell code is up there now
Which I think includes every project I’ve blogged about here as well as a few others. Sorry it took so long!
Lately I’ve been reading a lot of type theory literature. In an effort to help my future self, I’m going to jot down a few thoughts on quotient types, the subject of some recent google-fu.
The problem quotient types aim to solve is actually a very common one. I’m sure at some point or another you’ve used a piece of data you wanted to compare for equality, where deciding whether one piece was equal to another took some real work.
A simple example would be representing rational numbers. A rational number is a fraction of two integers, so let’s just say
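The definition itself isn’t shown in the text; presumably it was a bare pair, something like this sketch (the name `Rat` is mine):

```haskell
-- A rational represented as an unreduced (numerator, denominator) pair.
type Rat = (Integer, Integer)

half, twoFourths :: Rat
half       = (1, 2)
twoFourths = (2, 4)
```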
Now all is well, we can define a Num
instance and what not. But what about equality? Clearly we want equivalent fractions to be equal. That should mean that (2, 4) = (1, 2)
since they both represent the same number.
Now our implementation has a sticky point: clearly this isn’t the case on its own! What we really want to say is “(2, 4) = (1, 2)
up to trivial rejiggering”.
Haskell’s own Rational
type solves this by not exposing a raw tuple. It still exists under the hood, but we only expose smart constructors that will reduce our fractions as far as possible.
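As a sketch, such a smart constructor might look like the following (`mkRat` is a hypothetical name; Haskell’s real `Rational` is built with `%` from `Data.Ratio` and normalizes similarly):

```haskell
-- Normalize on construction: divide out the gcd and push any sign
-- onto the numerator. Assumes the denominator is nonzero.
mkRat :: Integer -> Integer -> (Integer, Integer)
mkRat a b = (signum b * a `div` g, abs b `div` g)
  where g = gcd a b
```

With this, `mkRat 2 4` and `mkRat 1 2` produce the same pair, so plain structural equality suffices for clients of the smart constructor.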
This is displeasing from a dependently typed setting, however; we want to be able to formally prove the equality of some things. This “equality modulo normalization” leaves us with a choice. Either we can really provide a function which is essentially
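That dropped function is presumably an equality test along these lines (a sketch; `eqRat` is my name for it):

```haskell
-- Cross-multiply: (a, b) and (c, d) denote the same rational
-- exactly when a*d == b*c.
eqRat :: (Integer, Integer) -> (Integer, Integer) -> Bool
eqRat (a, b) (c, d) = a * d == b * c
```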
This doesn’t really help us though, there’s no way to express that a
should be observationally equivalent to b
. This is a problem seemingly as old as dependent types: How can we have a simple representation of equality that captures all the structure we want and none that we don’t.
Hiding away the representation of rationals certainly buys us something: we can use a smart constructor to ensure things are normalized. From there we could potentially prove a (difficult) theorem which essentially states that
This still leaves us with some woes however: now a lot of computations become difficult to talk about since we’ve lost the helpful notion that denominator ∘ mkRat a = id
and similar. The lack of transparency shifts a lot of the burden of proof onto the code privy to the internal representation of the type, the only place where we know enough to prove such things.
Really what we want to say is “Hey, just forget about a bit of the structure of this type and just consider things to be identical up to R
”. Where R
is some equivalence relation, eg
a R a
a R b implies b R a
a R b and b R c implies a R c
For our example above, we really mean that our rational is a type quotiented by the relation (a, b) R (c, d)
iff a * d = b * c
.
Some other things that could potentially use quotienting
Basically anything where we want to hide some of the implementation details that are irrelevant for their behavior.
Now that I’ve spent some time essentially waving my hands about quotient types, what are they? Clearly we need a rule that goes something like
Γ ⊢ A type, E is an equivalence relation on A
——————————————————————————————————————————————
Γ ⊢ A // E type
Along with the typing rule
Γ ⊢ a : A
——————————————————
Γ ⊢ a : A // E
So all members of the original type belong to the quotiented type, and finally
Γ ⊢ a : A, Γ ⊢ b : A, Γ ⊢ a E b
——————————————————————————————————
Γ ⊢ a ≡ b : A // E
Notice something important here: that ≡
is the fancy shmancy judgmental equality baked right into the language. This calls decidability into question. It seems that a E b
could involve some nontrivial proof terms.
More than that, in a constructive, proof relevant setting things can be a bit trickier than they seem. We can’t just define a quotient to be the same type with a different equivalence relation, since that would imply some icky things.
To illustrate this problem, imagine we have a predicate P
on a type A
where a E b
implies P a ⇔ P b
. If we just redefine the equivalence relation on quotients, P
would not be a well-formed predicate on A // E
, since a ≡ b : A // E
doesn’t mean that P a ≡ P b
. This would be unfortunate.
Clearly some subtler treatment of this is needed. To that end I found this paper discussing some of the handling of NuPRL’s quotients enlightening.
The paper I linked to is a discussion on how to think about quotients in terms of other type theory constructs. In order to do this we need a few things first.
The first thing to realize is that NuPRL’s type theory is different than what you are probably used to. We don’t have this single magical global equality. Instead, we define equality inductively across the type. This notion means that our equality judgment doesn’t have to be natural in the type it works across. It can do specific things at each case. Perhaps the most frequent is that we can have functional extensionality.
f = g ⇔ ∀ a. f a = g a
Okay, so now that we’ve tossed aside the notion of a single global equality, what else is new? Well, something new is the lens through which many people look at NuPRL’s type theory: PER semantics. Remember that a PER is a relation satisfying
a R b → b R a
a R b ∧ b R c → a R c
In other words, a PER is an equivalence relation that isn’t necessarily reflexive at all points.
The idea is to view types not as some opaque “thingy” but instead to be partial equivalence relations across the set of untyped lambda calculus terms. Inductively defined equality falls right out of this idea since we can just define a ≡ b : A
to be equivalent to (a, b) ∈ A
.
Now another problem rears its head: what does a : A
mean? Well, even though we’re dealing with PERs, it’s quite reasonable to say something is a member of a type if it’s reflexive. That is to say, each relation is a full equivalence relation for the things we call members of that type. So we can therefore define a : A
to be (a, a) ∈ A
.
Another important constraint, in order for a type family to be well formed, it needs to respect the equality of the type it maps across. In other words, for all B : A → Type
, we have (a, a') ∈ A ⇒ (B a = B a') ∈ U
. This should seem on par with how we defined function equality and we call this “type functionality”.
Let’s also touch on another concept: squashed types. The idea is to take a type and throw away all information other than whether or not it’s occupied. There are two basic kinds of squashing, extensional and intensional. In the intensional view we consider two squashed things equal if and only if the types they’re squashing are equal
A = B
————————————
[A] = [B]
Now we can also consider only the behavior of the squashed type, the extensional view. Since the only behavior of a squashed type is simply existing, our extensional squash type has the equivalence
∥A∥ ⇔ ∥B∥
————————–
∥A∥ = ∥B∥
Now aside from this, the introduction of these types are basically the same: if we can prove that a type is occupied, we can grab a squashed type. Similarly, when we eliminate a type all we get is the trivial occupant of the squashed type, called •.
Γ ⊢ A
———————
Γ ⊢ [A]
Γ, x : A, Δ[•] ⊢ C[•]
——————————————————————————
Γ, x : A, Δ[x] ⊢ C[x]
What’s interesting is that when proving an equality judgment, we can unsquash both of these types. This is only because NuPRL’s equality proofs are computationally trivial.
Now with all of that out of the way, I’d like to present two typing rules. First
Γ ⊢ A ≡ A'    Γ, x : A, y : A ⊢ E[x; y] = E'[x; y]    E and E' are PERs
————————————————————————————————————————————————————————————————————
Γ ⊢ A // E ≡ A' // E'
In English, two quotients are equal when the types and their quotienting relations are equal.
Γ, u : x ≡ y ∈ (A // E), v : ∥x E y∥, Δ[u] ⊢ C[u]
——————————————————————————————————————————————————
Γ, u : x ≡ y ∈ (A // E), Δ[u] ⊢ C[u]
There are a few new things here. The first is that we have a new Δ[u] thing. This is a result of dependent types: we can have things in our context that depend on u, so we “split” the context into Γ, u, Δ and apply the dependent part of the context, Δ, to the variable it depends on, u.
The long and short of this is that when we’re trying to use an equivalence between two terms in a quotient, we only get the squashed term. This does mean that we only need to provide a squash to get equality in the first place though
Γ ⊢ ∥x E y∥    Γ ⊢ x : A    Γ ⊢ y : A
——————————————————————————————————————
Γ ⊢ x ≡ y : A // E
Remember that we can trivially form an ∥A∥
from A
.
Now there’s just one thing left to talk about, using our quotiented types. To do this the paper outlines one primitive elimination rule and defines several others.
Γ, x : A, y : A, e : x E y, a : ND, Δ[ndₐ{x;y}] ⊢ C[ndₐ{x;y}]
——————————————————————————————————————————————————————————————–
Γ, x : A // E, Δ[x] ⊢ C[x]
ND
is an admittedly odd type that’s supposed to represent nondeterministic choice. It has two terms, tt
and ff
and they’re considered “equal” under ND
. However, nd
returns its first argument if it’s fed tt
and the second if it is fed ff
. Hence, nondeterminism.
Now in our rule we use this to indicate that if we’re eliminating some quotiented type we can get any value that’s considered equal under E
. We can only be assured that when we eliminate a quotiented type, it will be related by the equivalence relation to x
. This rule captures this notion by allowing us to randomly choose some y : A
so that x E y
.
Overall, this rule simply states that if C
is occupied for any term related to x
, then it is occupied for C[x]
.
As with my last post, here are some questions for the curious reader to pursue
The last one is particularly interesting.
Thanks to Jon Sterling for proofreading
I’m part of a paper reading club at CMU. Last week we talked about a classic paper, Abstract Types Have Existential Type. The concept described in this paper is interesting and straightforward. Sadly some of the notions and comparisons made in the paper are starting to show their age. I thought it might be fun to give a tl;dr using Haskell.
The basic idea is that a type with an abstract implementation and some functions upon it is really an existential type.
To exemplify this let’s define an abstract type (in Haskell)
module Stack (Stack, empty, push, pop) where
newtype Stack a = Stack [a]
empty :: Stack a
empty = Stack []
push :: a -> Stack a -> Stack a
push a (Stack xs) = Stack (a : xs)
pop :: Stack a -> Maybe a
pop (Stack []) = Nothing
pop (Stack (x : xs)) = Just x
shift :: Stack a -> Maybe (Stack a)
shift (Stack []) = Nothing
shift (Stack (x : xs)) = Just (Stack xs)
Now we could import this module and use its operations:
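That usage snippet isn’t shown; a client might look like this sketch (the `Stack` internals are inlined so the snippet stands alone; in the post they’d come from the `Stack` module):

```haskell
newtype Stack a = Stack [a]

empty :: Stack a
empty = Stack []

push :: a -> Stack a -> Stack a
push a (Stack xs) = Stack (a : xs)

pop :: Stack a -> Maybe a
pop (Stack [])      = Nothing
pop (Stack (x : _)) = Just x

-- Client code: builds stacks only through the exposed API.
one :: Maybe Int
one = pop (push 1 empty)   -- Just 1
```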
What we couldn’t do however, is pattern match on stacks to take advantage of its internal structure. We can only build new operations out of combinations of the exposed API. The classy terminology would be to say that Stack
is abstract.
This is all well and good, but what does it mean type-theoretically? If we want to represent Haskell as a typed calculus it’d be a shame to have to include Haskell’s (underpowered) module system to talk about abstract types.
After all, we’re not really thinking about modules so much as hiding some details. That sounds like something our type system should be able to handle without having to rope in modules. By isolating the concept of abstraction in our type system, we might be able to more deeply understand and reason about code that uses abstract types.
This is in fact quite possible, let’s rephrase our definition of Stack
module Stack (Stack, StackOps(..), ops) where
newtype Stack a = Stack [a]
data StackOps a = StackOps { empty :: Stack a
                           , push  :: a -> Stack a -> Stack a
                           , pop   :: Stack a -> Maybe a
                           , shift :: Stack a -> Maybe (Stack a) }
ops :: StackOps a
ops = ...
Now that we’ve lumped all of our operations into one record, our module really only exports a type name and a record of data. We could take this a step further still,
module Stack (Stack, StackOps(..), ops) where
newtype Stack a = Stack [a]
data StackOps s a = StackOps { empty :: s a
                             , push  :: a -> s a -> s a
                             , pop   :: s a -> Maybe a
                             , shift :: s a -> Maybe (s a) }
ops :: StackOps Stack a
ops = ...
Now the only thing that needs to know the internals of Stack
is ops
. It seems like we could really just smush the definition into ops; why should the rest of the file see our private definition?
module Stack (StackOps(..), ops) where
data StackOps s a = StackOps { empty :: s a
                             , push  :: a -> s a -> s a
                             , pop   :: s a -> Maybe a
                             , shift :: s a -> Maybe (s a) }
ops :: StackOps ??? a
ops = ...
Now what should we fill in ???
with? It’s some type, but it’s meant to be chosen by the callee, not the caller. Does that sound familiar? Existential types to the rescue!
{-# LANGUAGE PolyKinds, KindSignatures, ExistentialQuantification #-}
module Stack where
data Packed (f :: k -> k' -> *) a = forall s. Pack (f s a)
data StackOps s a = StackOps { empty :: s a
                             , push  :: a -> s a -> s a
                             , pop   :: s a -> Maybe a
                             , shift :: s a -> Maybe (s a) }
ops :: Packed StackOps a
ops = Pack ...
The key difference here is Packed
. It lets us take a type function and instantiate it with some type variable and hide our choice from the user. This means that we can even drop the whole newtype
from the implementation of ops
ops :: Packed StackOps a
ops = Pack $ StackOps { empty = []
                      , push  = (:)
                      , pop   = fmap fst . uncons
                      , shift = fmap snd . uncons }
  where uncons []       = Nothing
        uncons (x : xs) = Just (x, xs)
Now that we’ve eliminated the Stack
definition from the top level, we can actually just drop the notion that this is in a separate module.
One thing that strikes me as unpleasant is how Packed
is defined; we must jump through some hoops to support StackOps
being polymorphic in two arguments, not just one.
We could get around this with higher rank polymorphism and making the fields more polymorphic while making the type less so. We could also just wish for type level lambdas or something. Even some of the recent type level lens stuff could be aimed at making a general case definition of Packed
.
From the client side this definition isn’t actually so unpleasant to use either.
{-# LANGUAGE RecordWildCards #-}
someAdds :: Packed StackOps Int -> Maybe Int
someAdds (Pack StackOps{..}) = pop (push 1 empty)
With record wild cards, there’s very little boilerplate to introduce our record into scope. Now we might wonder about using a specific instance rather than abstracting over all possible instantiations.
someAdds :: Packed StackOps Int -> Maybe Int
someAdds =
  let (Pack StackOps{..}) = ops in
  pop (push 1 empty)
The resulting error message is amusing :)
Now we might wonder if we gain anything concrete from this. Did all those language extensions actually do something useful?
Well one mechanical transformation we can make is that we can change our existential type into a CPSed higher rank type.
unpackPacked :: (forall s. f s a -> r) -> Packed f a -> r
unpackPacked cont (Pack f) = cont f
someAdds' :: StackOps s Int -> Maybe Int
someAdds' StackOps{..} = pop (push 1 empty)
someAdds :: Packed StackOps Int -> Maybe Int
someAdds = unpackPacked someAdds'
Now we’ve factored out the unpacking of existentials into a function called unpackPacked
. This takes a continuation which is parametric in the existential variable, s
.
Now our body of someAdds
becomes someAdds'
, but notice something very interesting here, now s
is a normal universally quantified type variable. This means we can apply some nice properties we already have used, eg parametricity.
This is a nice effect of translating things to core constructs, all the tools we already have figured out can suddenly be applied.
Now that we’ve gone through transforming our abstract types into existential ones you can finally appreciate at least one more thing: the subtitle on Bob Harper’s blog. You can’t say you didn’t learn something useful :)
I wanted to keep this post short and sweet, so I’m going to skip some of the more interesting questions we could ask. For the curious reader, I leave you with these
How might you write a more general definition of Packed?
Cheers.
First, an apology. Sorry this has taken so long to push out. I’ve just started my first semester at Carnegie Mellon. I fully intend to keep blogging, but it’s taken a little while to get my feet under me. Happy readings :)
In this second post of my “intro to dependent types” series we’re going on a whirlwind tour of Agda. Specifically we’re going to look at translating our fauxHaskell from the last post into honest to goodness typecheckable Agda.
There are 2 main reasons to go through the extra work of using a real language rather than pseudocode
With that in mind let’s dive in!
There’s quite a bit of shared syntax between Agda and Haskell, so a Haskeller can usually guess what’s going on.
In Agda we still give definitions in much the same way (single :
though)
whereas in Haskell we’d say
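Both snippets are missing from the text; presumably each was a small definition, something like this sketch (the Agda version shown in a comment):

```haskell
-- Agda (single colon):
--   x : Bool
--   x = true
-- Haskell (double colon):
x :: Bool
x = True
```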
In fact, we even get Haskell’s nice syntactic sugar for functions.
Will desugar to a lambda.
One big difference between Haskell and Agda is that, due to Agda’s more expressive type system, type inference is woefully undecidable. Those top level signatures are sadly not optional. Some DT languages work a little harder than Agda when it comes to inference, but for a beginner this is a bit of a feature: you learn what the actual (somewhat scary) types are.
And of course, you always give type signatures in Haskell I’m sure :)
Like Haskell, function application is whitespace and functions are curried
-- We could explicitly add parens
-- foo : A → (B → C)
foo : A → B → C
foo = ...

a : A
a = ...

bar : B → C
bar = foo a
Even the data type declarations should look familiar, they’re just like GADTs syntactically.
Notice that we have this new Set
thing lurking in our code. Set
is just the kind of normal types, like *
in Haskell. In Agda there’s actually an infinite tower of these Bool : Set : Set1 : Set2 ...
, but we won’t concern ourselves with anything beyond Set
. It’s also worth noting that Agda doesn’t require any particular casing for constructors; traditionally they’re lower case.
Pattern matching in Agda is pretty much identical to Haskell. We can define something like
One big difference between Haskell and Agda is that pattern matching must be exhaustive. Nonexhaustiveness is a compiler error in Agda.
This brings me to another point worth mentioning. Remember that structural induction I mentioned the other day? Agda only allows recursion when the terms we recurse on are “smaller”.
In other words, all Agda functions are defined by structural induction. This together with the exhaustiveness restriction means that Agda programs are “total”: all Agda programs reduce to a value; they never crash or loop forever.
This can occasionally cause pain though since not all recursive functions are modelled nicely by structural induction! A classic example is merge sort. The issue is that in merge sort we want to say something like
mergeSort : List Nat → List Nat
mergeSort [] = []
mergeSort (x :: []) = x :: []
mergeSort xs = let (l, r) = split xs in
merge (mergeSort l, mergeSort r)
But wait, how would the typechecker know that l
and r
are strictly smaller than xs
? In fact, they might not be! We know that the length of length xs > 1
, but convincing the typechecker of that fact is a pain! In fact, without elaborate trickery, Agda will reject this definition.
So, apart from these restrictions for totality, Agda has pretty much been a stripped down Haskell. Let’s start seeing what Agda offers over Haskell.
There wouldn’t be much point in writing Agda if it didn’t have dependent types. In fact the two mechanisms that comprise our dependent types translate wonderfully into Agda.
First we had pi types, remember those?
Those translate almost precisely into Agda, where we’d write
The only difference is the colons! In fact, Agda’s pi types are far more general than what we’d discussed previously. The extra generality comes from what we allow A
to be. In our previous post, A
was always some normal type with the kind *
(Set
in Agda). In Agda though, we allow A
to be Set
itself. In Haskell syntax that would be something like
What could a
be then? Well anything with the kind *
is a type, like Bool
, ()
, or Nat
. So that a
is like a normal type variable in Haskell
In fact, when we generalize pi types like this, they generalize parametric polymorphism. This is kind of like how we use “big lambdas” in System F to write out polymorphism explicitly.
Here’s a definition for the identity function in Agda.
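The Agda snippet isn’t shown here; as a sketch, it takes the type as an ordinary explicit argument, which contrasts with Haskell’s implicit quantification (Agda in comments):

```haskell
{-# LANGUAGE ExplicitForAll #-}

-- Agda (sketch): the type A is an ordinary argument.
--   id : (A : Set) → A → A
--   id A x = x
-- In Haskell the quantifier is a forall; no value-level type argument.
identity :: forall a. a -> a
identity x = x
```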
This is how we actually do all parametric polymorphism in Agda, as a specific use of pi types. This comes from the idea that types are also “first class”. We can pass them around and use them as arguments to functions, even dependent arguments :)
Now our other dependently typed mechanism was our generalized generalized algebraic data types. These also translate nicely to Agda.
We indicate that we’re going to index our data on something the same way we would in Haskell++, by adding it to the type signature on the top of the data declaration.
Agda’s GGADTs also allow us to add “parameters” instead of indices. These are things which the data type may use, but each constructor handles uniformly without inspecting them.
For example a list type depends on the type of its elements, but it doesn’t poke further at the type or value of those elements. They’re handled “parametrically”.
In Agda a list would be defined as
If you’re wondering what on earth the difference is, don’t worry! You’ve in fact already used parametric/nonparametric type arguments in Haskell. In Haskell a normal algebraic type can just take several type variables and can’t try to do clever things depending on what the argument is. For example, our definition of lists
can’t do something different if a
is Int
instead of Bool
or something like that. That’s not the case with GADTs though; there we can do clever things like
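The example was dropped from the text; it was presumably something like the classic expression GADT (a sketch, with names of my choosing):

```haskell
{-# LANGUAGE GADTs #-}

-- The constructor used tells us what the type argument is.
data Expr a where
  I :: Int  -> Expr Int
  B :: Bool -> Expr Bool

-- Pattern matching refines `a`, so each branch returns a different type.
eval :: Expr a -> a
eval (I n) = n
eval (B b) = b
```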
Now we’re not treating our type argument opaquely; we can figure things out about it depending on what constructor our value uses! That’s the core of the difference between parameters and indices in Agda.
Next let’s talk about modules. Agda’s prelude is absolutely tiny. By tiny I mean essentially nonexistent. Because of this I’m using the Agda standard library heavily, and to import something in Agda we’d write
import Foo.Bar.Baz
This isn’t the same as a Haskell import though. By default, imports in Agda import a qualified name to use. To get a Haskell style import we’ll use the special shortcut
open import Foo.Bar
which is short for
import Foo.Bar
open Bar
Because Agda’s prelude is so tiny we’ll have to import things like booleans, numbers, and unit. These are all things defined in the standard library, not even the core language. Expect any Agda code we write to make heavy use of the standard library and begin with a lot of imports.
Finally, Agda’s names are somewhat… unique. Agda and its standard library are unicode heavy, meaning that instead of unit we’d type ⊤ and instead of Void
we’d use ⊥. Which is pretty nifty, but it does take some getting used to. If you’re familiar with LaTeX, the Emacs mode for Agda allows LaTeX style entry. For example ⊥ can be entered as \bot
.
The most common unicode name we’ll use is ℕ. This is just the type of natural numbers as they’re defined in Data.Nat
.
Now that we’ve seen what dependent types look like in Agda, let’s go over a few examples of their use.
First let’s import a few things
Now we can define a few simple Agda functions just to get a feel for how that looks.
not : Bool → Bool
not true = false
not false = true

and : Bool → Bool → Bool
and true b = b
and false _ = false

or : Bool → Bool → Bool
or false b = b
or true _ = true
As you can see, defining functions is mostly identical to Haskell; we just pattern match at the top level and go from there.
We can define recursive functions just like in Haskell
plus : ℕ → ℕ → ℕ
plus (suc n) m = suc (plus n m)
plus zero m = m
Now with Agda we can use our data types to encode “proofs” of sorts.
For example
data IsEven : ℕ → Set where
  evenz : IsEven zero
  evens : (n : ℕ) → IsEven n → IsEven (suc (suc n))
Now this inductively defines what it means for a natural number to be even, so that if IsEven n
exists then n
must be even. We can also state oddness
data IsOdd : ℕ → Set where
  oddo : IsOdd (suc zero)
  odds : (n : ℕ) → IsOdd n → IsOdd (suc (suc n))
Now we can construct a decision procedure which produces either a proof of evenness or oddness for all natural numbers.
open import Data.Sum -- The same thing as Either in Haskell; ⊎ is just Either
evenOrOdd : (n : ℕ) → IsEven n ⊎ IsOdd n
So we’re setting out to construct a function that, given any n
, builds up an appropriate term showing it is either even or odd.
The first two cases of this function are kinda the base cases of this recurrence.
So if we’re given zero or one, return the base case of IsEven
or IsOdd
as appropriate. Notice that instead of Left
or Right
as constructors we have inj₁
and inj₂
. They serve exactly the same purpose, just with a shinier unicode name.
Now our next step would be to handle the suc (suc n) case.
Our code is going to be like the Haskell code
case evenOrOdd n of
  Left evenProof -> Left (EvenS evenProof)
  Right oddProof -> Right (OddS oddProof)
In words, we’ll recurse and inspect the result; if we get an even proof we’ll build a bigger even proof and if we get an odd proof we’ll build a bigger odd proof.
In Agda we’ll use the with
keyword. This allows us to “extend” the current pattern matching by adding an expression to the list of expressions we’re pattern matching on.
evenOrOdd (suc (suc n)) with evenOrOdd n
evenOrOdd (suc (suc n)) | inj₁ x = ?
evenOrOdd (suc (suc n)) | inj₂ y = ?
Now we add our new expression to use for matching by saying ... with evenOrOdd n
. Then we list out the next set of possible patterns.
From here the rest of the function is quite straightforward.
evenOrOdd (suc (suc n)) | inj₁ x = inj₁ (evens n x)
evenOrOdd (suc (suc n)) | inj₂ y = inj₂ (odds n y)
Notice that we had to duplicate the whole evenOrOdd (suc (suc n))
bit of the match? It’s a bit tedious so Agda provides some sugar. If we replace that portion of the match with ...
Agda will just automatically reuse the pattern we had when we wrote with
.
Now our whole function looks like
evenOrOdd : (n : ℕ) → IsEven n ⊎ IsOdd n
evenOrOdd zero = inj₁ evenz
evenOrOdd (suc zero) = inj₂ oddo
evenOrOdd (suc (suc n)) with evenOrOdd n
... | inj₁ x = inj₁ (evens n x)
... | inj₂ y = inj₂ (odds n y)
How can we improve this? Well notice that the suc (suc n)
case involved unpacking our Either
and then immediately repacking it; this looks like something we can abstract over.
bimap : (A B C D : Set) → (A → C) → (B → D) → A ⊎ B → C ⊎ D
bimap A B C D f g (inj₁ x) = inj₁ (f x)
bimap A B C D f g (inj₂ y) = inj₂ (g y)
If we gave bimap
a more Haskellish signature
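That signature was dropped from the text; it was presumably along these lines (a sketch using Either, since ⊎ is Agda’s Either):

```haskell
-- The type variables are quantified implicitly; there are no
-- value-level type arguments as in the Agda version.
bimapEither :: (a -> c) -> (b -> d) -> Either a b -> Either c d
bimapEither f _ (Left x)  = Left (f x)
bimapEither _ g (Right y) = Right (g y)
```

(GHC ships essentially this function as `bimap` in `Data.Bifunctor`, generalized over bifunctors.)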
One interesting point to notice is that the type arguments in the Agda function (A
and B
) also appeared in the normal argument pattern! This is because we’re using the normal pi type mechanism for parametric polymorphism, so we’ll actually end up explicitly passing and receiving the types we quantify over. This messed with me quite a bit when I first starting learning DT languages, take a moment and convince yourself that this makes sense.
Now that we have bimap
, we can use it to simplify our evenOrOdd
function.
evenOrOdd : (n : ℕ) → IsEven n ⊎ IsOdd n
evenOrOdd zero = inj₁ evenz
evenOrOdd (suc zero) = inj₂ oddo
evenOrOdd (suc (suc n)) =
  bimap (IsEven n) (IsOdd n)
        (IsEven (suc (suc n))) (IsOdd (suc (suc n)))
        (evens n) (odds n) (evenOrOdd n)
We’ve gotten rid of the explicit with
, but at the cost of all those explicit type arguments! Those are both gross and obvious. Agda can clearly deduce what A
, B
, C
and D
should be from the arguments and what the return type must be. In fact, Agda provides a convenient mechanism for avoiding this boilerplate. If we simply insert _
in place of an argument, Agda will try to guess it from the information it has about the other arguments and contexts. Since these type arguments are so clear from context, Agda can guess them all
evenOrOdd : (n : ℕ) → IsEven n ⊎ IsOdd n
evenOrOdd zero = inj₁ evenz
evenOrOdd (suc zero) = inj₂ oddo
evenOrOdd (suc (suc n)) =
bimap _ _ _ _ (evens n) (odds n) (evenOrOdd n)
Now at least the code fits on one line! This also raises something interesting: the types are so strict that Agda can actually figure out parts of our programs for us! I’m not sure about you but at this point in time my brain mostly melted :) Because of this I’ll try to avoid using _
and other mechanisms for having Agda write programs for us where I can. The exception of course being situations like the above where it’s necessary for readability’s sake.
One important exception to that rule is for parametric polymorphism. It’s a royal pain to pass around types explicitly everywhere. We’re going to use an Agda feature called “implicit arguments”. You should think of these as arguments for which a _
is automatically inserted. So instead of writing
We could write
This more closely mimics what Haskell does for its parametric polymorphism. To indicate we want something to be an implicit argument, we just wrap it in {}
instead of ()
. So for example, we could rewrite bimap
as
bimap : {A B C D : Set} → (A → C) → (B → D) → A ⊎ B → C ⊎ D
bimap f g (inj₁ x) = inj₁ (f x)
bimap f g (inj₂ y) = inj₂ (g y)
To avoid all those underscores.
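Earlier the post mentioned giving bimap a more Haskellish signature; here's a sketch of what that looks like on Prelude's Either, where the type variables are quantified implicitly, just like the `{A B C D : Set}` version:

```haskell
-- bimap on Prelude's Either; the type variables a, b, c, d are
-- implicitly quantified, mirroring Agda's implicit {A B C D : Set}.
bimap' :: (a -> c) -> (b -> d) -> Either a b -> Either c d
bimap' f _ (Left x)  = Left (f x)
bimap' _ g (Right y) = Right (g y)
```

The prime on `bimap'` is just to avoid clashing with `Data.Bifunctor.bimap`, which plays the same role in modern Haskell.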
Another simple function we’ll write is that if we can construct an IsOdd n
, we can build an IsEven (suc n)
.
Now this function has two arguments, a number and a term showing that that number is odd. To write this function we’ll actually recurse on the IsOdd
term.
oddSuc : (n : ℕ) → IsOdd n → IsEven (suc n)
oddSuc .1 oddo = evens zero evenz
oddSuc .(suc (suc n)) (odds n p) = evens (suc n) (oddSuc n p)
Now if we squint hard and ignore those .
terms, this looks much like we’d expect. We build the Even
starting from evens zero evenz
. From there we just recurse and tack on an evens
constructor to scale the IsEven
term up by two.
There’s a weird thing going on here though, those .
patterns. Those are a nifty little idea in Agda: pattern matching on one thing might force another term to be some value. If we know that our IsOdd n
is oddo
, then n
must be suc zero
. Anything else would just be completely incorrect. To notate these patterns Agda forces you to prefix them with .
. You should read .Y
as “because of X, this must be Y”.
This isn’t an optional choice though, as .
patterns may do several wonky things. The most notable is that they often use pattern variables nonlinearly, notice that n
appeared twice in our second pattern clause. Without the .
this would be very illegal.
As an exercise to the reader, try to write
That wraps up this post, which came out much longer than I expected. We’ve now covered enough basics to actually discuss meaningful dependently typed programs. That’s right, we can finally kiss natural numbers goodbye in the next post!
Next time we’ll cover writing a small but interesting program and using dependent types to assure ourselves of its correctness.
As always, please comment with any questions :)
I’d like to start another series of blog posts. This time on something that I’ve wanted to write about for a while, dependent types.
There’s a noticeable lack of accessible materials introducing dependent types at a high level aimed at functional programmers. That’s what this series sets out to help fill. Therefore, if you’re a Haskell programmer and don’t understand something, it’s a bug! Please comment so I can help make this a more useful resource for you :)
There are four parts to this series, each answering one question
So first things first, what are dependent types? Most people by now have heard the unhelpful quick answer
A dependent type is a type that depends on a value, not just other types.
But that’s not helpful! What does this actually look like? To try to understand this we’re going to write some Haskell code that pushes us as close as we can get to dependent types in Haskell.
Let’s start with the flurry of extensions we need
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE UndecidableInstances #-}
Now our first definition is a standard formulation of natural numbers
Here Z
represents 0 and S
means + 1
. So you should read S Z
as 1, S (S Z)
as 2 and so on and so on.
If you’re having some trouble, this function to convert an Int
to a Nat
might help
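The elided definitions are presumably along these lines (a sketch; `fromInt` is my guess at the conversion function's name):

```haskell
-- The standard unary naturals: Z is 0 and S is (+ 1).
data Nat = Z | S Nat deriving Show

-- Convert a (non-negative) Int to a Nat by peeling off one S per unit.
fromInt :: Int -> Nat
fromInt 0 = Z
fromInt n = S (fromInt (n - 1))
```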
We can use this definition to formulate addition
This definition proceeds by “structural induction”. That’s a scary word that pops up around dependent types. It’s not all that complicated, all that it means is that we use recursion only on strictly smaller terms.
There is a way to formally define smaller: if a term is a constructor applied to several (recursive) arguments, then any argument to the constructor is strictly smaller than the original term. In a strict language, if we restrict ourselves to only structural recursion, we’re guaranteed that our function will terminate. This isn’t quite the case in Haskell since we have infinite structures.
toInt :: Nat -> Int
toInt (S n) = 1 + toInt n
toInt Z = 0
bigNumber = S bigNumber
main = print (toInt bigNumber) -- Uh oh!
Often people will cheerfully ignore this part of Haskell when talking about reasoning with Haskell and I’ll stick to that tradition (for now).
Now back to the matter at hand. Since our definition of Nat
is quite straightforward, it gets promoted to the kind level by DataKinds
.
Now we can “reflect” values back up to this new kind with a second GADTed definition of natural numbers.
Now, let’s precisely specify the somewhat handwavy term “reflection”. I’m using it in the imprecise sense, meaning that we’ve lifted a value into something isomorphic at the type level. Later we’ll use reflection to mean precisely lifting a value into the type level. That’s currently not possible since we can’t have values in our types!
What on earth could that be useful for? Well with this we can do something fancy with the definition of addition.
Now we’ve reflected our definition of addition to the type family. More than that, what we’ve written above is fairly obviously correct. We can now force our value level definition of addition to respect this type family
Now if we messed up this definition we’d get a type error!
plus' :: RNat n -> RNat m -> RNat (Plus n m)
plus' RZ n = n
plus' (RS n) m = plus' n m -- Unification error! n ~ S n
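For contrast, here's a sketch of definitions that do typecheck, reconstructing the elided `RNat` and `Plus` from the surrounding text (`rToInt` is a hypothetical helper added just to observe results):

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE TypeFamilies #-}

data Nat = Z | S Nat

-- Singleton naturals: the type index mirrors the value.
data RNat :: Nat -> * where
  RZ :: RNat 'Z
  RS :: RNat n -> RNat ('S n)

-- Type-level addition, mirroring value-level plus.
type family Plus (n :: Nat) (m :: Nat) :: Nat where
  Plus 'Z     m = m
  Plus ('S n) m = 'S (Plus n m)

-- The corrected definition: recursing keeps types in sync with Plus.
plus' :: RNat n -> RNat m -> RNat (Plus n m)
plus' RZ     m = m
plus' (RS n) m = RS (plus' n m)

-- Hypothetical helper to inspect results at runtime.
rToInt :: RNat n -> Int
rToInt RZ     = 0
rToInt (RS n) = 1 + rToInt n
```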
Super! We now have types that express strict guarantees about our program. But how usable is this?
To put it to the test, let’s try to write some code that reads two integers from standard input and prints their sum.
We can easily do this with our normal plus
Easy as pie! But what about RNat
, how can we convert a Nat
to an RNat
? Well we could try something with type classes I guess
class Reify a where
type N
reify :: a > RNat N
But wait, that doesn’t work since we can only have one instance for all Nat
s. What if we did the opposite
class Reify (n :: Nat) where
  nat :: RNat n -> Nat
This lets us go in the other direction.. but that doesn’t help us! In fact there’s no obvious way to propagate runtime values back into the types. We’re stuck.
Now, if we could add some magical extension to GHC, could we write something like the above program? Yes of course! The key idea is not to reflect up our types with data kinds, but rather just to allow the values to exist in the types on their own.
For these I propose two basic ideas
For our special function types, we allow the return type to use the supplied value. These are called pi types. We’ll give this the following syntax
(x :: A) -> B x
Where A :: *
and B :: A -> *
are some sort of type. Notice that that A
in B
’s kind isn’t the data kind promoted version, but just the honest to goodness normal value.
Now in order to allow B
to actually make use of its supplied value, our second idea lets normal types be indexed on values! Just like how GADTs can be indexed on types. We’ll call these GGADTs.
So let’s define a new version of RNat
This looks exactly like what we had before, but our semantics are different now. Those Z
’s and S
’s are meant to represent actual values, not members of some kind. There’s no promoting types to singleton kinds anymore, just plain old values being held in fancier types.
Because we can depend on normal values, we don’t even have to use our simple custom natural numbers.
Notice that we allowed our types to call functions, like +
. This can potentially be undecidable, something that we’ll address later.
Now we can write our function with a combination of these two ideas
Notice how we used pi types to change the return type dependent on the input value. Now we can feed this any old value, including ones we read from standard input.
Now, one might wonder how the typechecker could possibly know how to handle such things, after all how could it know what’ll be read from stdin!
The answer is that it doesn’t. When a value is reflected to the type level we can’t do anything with it. For example, if we had a type like
Then we would have to pattern match on n
at the value level to propagate information about n
back to the type level.
If we did something like
Then the typechecker would see that we’re matching on n
, so if we get into the 0 -> ...
branch then n
must be 0
. It can then reduce the return type to if 0 == 0 then Bool else ()
and finally Bool
. A very important thing to note here is that the typechecker doesn’t evaluate the program. It’s examining the function in isolation of all other values. This means we sometimes have to hold its hand to ensure that it can figure out that all branches have the correct type.
This means that when we use pi types we often have to pattern match on our arguments in order to help the typechecker figure out what’s going on.
To make this clear, let’s play the typechecker for this function. I’m reverting to the Nat
type since it’s nicer for pattern matching.
toRNat :: (n :: Nat) -> RNat n
toRNat Z = RZ -- We know that n is `Z` in this branch
toRNat (S n) = RS (toRNat n {- This has the type RNat n' -})

p :: (n :: Nat) -> (m :: Nat) -> RNat (plus n m)
p Z m = toRNat m
p (S n) m = RS (p n m)
First the type checker goes through toRNat
.
In the first branch we have n
equals Z
, so RZ
trivially typechecks. Next we have the case S n
.
- toRNat n has the type RNat n' by induction, where S n' = n.
- RS builds us a term of type RNat n.

Now for p. We start in much the same manner.
- If we enter the p Z m case, n is Z.
- The goal type is RNat (plus n m), which reduces to RNat m since plus Z m is by definition equal to m (look at the definition of plus to confirm this).
- We can produce a term of type RNat m easily, since we have a function toRNat :: (n :: Nat) -> RNat n.
- We apply it to m and the resulting term has the type RNat m.

In the RS case we know that we’re trying to produce a term of type RNat (plus (S n) m).

- Since we have the outer constructor of the first argument to plus, we can reduce plus (S n) m to S (plus n m) by the definition of plus.
- We now need a term of type RNat (plus n m), and that’s as simple as a recursive call.
- We apply RS to give us a term of type RNat (S (plus n m)).
- S (plus n m) is equal to plus (S n) m, so we’re done.
Notice how as we stepped through this as the typechecker we never needed to do any arbitrary reductions. We only ever reduce definitions when we have the outer constructor (WHNF) of one of the arguments.
While I’m not actually proposing adding {-# LANGUAGE PiTypes #-}
to GHC, it’s clear that with only a few orthogonal additions to System F we can get some seriously cool types.
Believe it or not, we’ve just gone through two of the most central concepts in dependent types
Not so bad was it? :) From here we’ll look in the next post how to translate our faux Haskell into actual Agda code. From there we’ll go through a few more detailed examples of pi types and GGADTs by poking through some of the Agda standard library.
Thanks for reading, I must run since I’m late for class. It’s an FP class ironically enough.
Equality seems like one of the simplest things to talk about in a theorem prover. After all, the notion of equality is something any small child can intuitively grasp. The sad bit is, while it’s quite easy to handwave about, how equality is formalized seems to be a rather complex topic.
In this post I’m going to attempt to cover a few of the main different means of “equality proofs” or identity types and the surrounding concepts. I’m opting for a slightly more informal approach in the hopes of covering more ground.
This is not really an equality type per se, but it’s worth stating explicitly what definitional equality is since I must refer to it several times throughout this post.
Two terms A
and B
being definitionally equal is a judgment, notated
Γ ⊢ A ≡ B
This is not a user level proof but rather a primitive, untyped judgment in the metatheory of the language itself. The typing rules of the language will likely include a rule along the lines of
Γ ⊢ A ≡ B, Γ ⊢ x : A
————————————————————–
Γ ⊢ x : B
So this isn’t an identity type you would prove something with, but a much more magical notion that two things are completely the same to the typechecker.
Now in most type theories we have a slightly more powerful notion of definitional equality where x ≡ y
holds not only when x
is y
by definition, but also by computation.
So in Coq for example
(2 + 2) ≡ 4
Even though “definitionally” these are entirely separate entities. In most theories, definitionally equal means “equal after inlining all definitions and normalizing”, but not in all.
In type theories that distinguish between the two, the judgment that when normalized x
is y
is called judgmental equality. I won’t distinguish between the two further because most don’t, but it’s worth noting that they can be seen as separate concepts.
This is the sort of equality that we’ll spend the rest of our time discussing. Propositional equality is a particular type constructor with the type/kind
Id : (A : Set) → A → A → Type
We should be able to prove a number of definitions like
reflexivity : (A : Set)(x : A) → Id x x
symmetry : (A : Set)(x y : A) → Id x y → Id y x
transitivity : (A : Set)(x y z : A) → Id x y → Id y z → Id x z
This is an entirely separate issue from definitional equality since propositional equality is a concept that users can hypothesize about.
One very important difference is that we can make proofs like
sanity : Id 1 2 → ⊥
Since the identity proposition is a type family which can be used just like any other proposition. This is in stark contrast to definitional equality which a user can’t even normally utter!
This is arguably the simplest form of equality. Identity types are just normal inductive types with normal induction principles. The most common is the equality given by Martin-Löf
data Id (A : Set) : A → A → Type where
Refl : (x : A) → Id x x
This yields a simple induction principle
idind : (P : (x y : A) → Id x y → Type)
→ ((x : A) → P x x (Refl x))
→ (x y : A)(p : Id x y) → P x y p
In other words, if we can prove that P
holds for the reflexivity case, then P
holds for any x
and y
where Id x y
.
We can actually phrase Id
in a number of ways, including
data Id (A : Set)(x : A) : A → Set where
Refl : Id x x
This really makes a difference in the resulting induction principle
j : (A : Set)(x : A)(P : (y : A) → Id x y → Set)
→ P x Refl
→ (y : A)(p : Id x y) → P y p
This clearly turned out a bit differently! In particular now P
is only parametrized over one value of A
, y
. This particular elimination is traditionally named j
.
These alternative phrasings can have serious impacts on proofs that use them. It also has even more subtle effects on things like heterogeneous equality which we’ll discuss later.
The fact that this only relies on simple inductive principles is also a win for typechecking. Equality/substitution fall straight out of how normal inductive types are handled! This also means that we can keep decidability within reason.
The price we pay of course is that this is much more painful to work with. An intensional identity type means the burden of constructing our equality proofs falls on users. Furthermore, we lose the ability to talk about observational equality.
Observational equality is the idea that two “thingies” are indistinguishable by any test.
It’s clear that we can prove that if Id x y
, then f x = f y
, but it’s less clear how to go the other way and prove something like
fun_ext : (A B : Set)(f g : A → B)
→ ((x : A) → Id (f x) (g x)) → Id f g
fun_ext f g p = ??
Even though this is clearly desirable. If we know that f
and g
behave exactly the same way, we’d like our equality to be able to state that. However, we don’t know that f
and g
are constructed the same way, making this impossible to prove.
This can be introduced as an axiom but to maintain our inductively defined equality type we have to sacrifice one of the following
Some of this has been avoided by regarding equality as an induction over the class of types, as in Martin-Löf’s intuitionistic type theory.
In the type theory that we’ve outlined, this isn’t expressible sadly.
Some type theories go a different route to equality, giving us back the extensionality in the process. One of those type theories is extensional type theory.
In the simplest formulation, we have intensional type theory with a new rule, reflection
Γ ⊢ p : Id x y
——————————–————
Γ ⊢ x ≡ y
This means that our normal propositional equality can be shoved back into the more magical definitional equality. This gives us a lot more power: all the typechecker’s magic and support for definitional equality can be used with our equality types!
It isn’t all puppies and kittens though; arbitrary reflection can also make things undecidable in general. For example, Martin-Löf’s system is undecidable with extensional equality.
It’s worth noting that no extensional type theory is implemented this way. Instead they’ve taken a different approach to defining types themselves!
In this model of ETT types are regarded as a partial equivalence relation (PER) over unityped (untyped if you want to get in a flamewar) lambda calculus terms.
These PERs precisely reflect the extensional equality at that “type” and we then check membership by reflexivity. So a : T
is synonymous with (a, a) ∈ T
. Notice that since we are dealing with a PER, we know that ∀ a. (a, a) ∈ T
need not hold. This is reassuring, otherwise we’d be able to prove that every type was inhabited by every term!
The actual NuRPL&friends theory is a little more complicated than that. It’s not entirely dependent on PERs and allows a few different ways of introducing types, but I find that PERs are a helpful idea.
This is another flavor of extensional type theory which is really just intensional type theory plus some axioms.
We can arrive at this type theory in a number of ways, the simplest is to add axiom K
k : (A : Set)(x : A)(P : (x : A) → Id x x → Type)
→ P x (Refl x) → (p : Id x x) → P x p
This says that if we can prove that for any property P
, P x (Refl x)
holds, then it holds for any proof that Id x x
. This is subtly different than straightforward induction on Id
because here we’re not proving a property parameterized over two different values of A
, but only one.
This is horribly inconsistent in something like homotopy type theory but lends a bit of convenience to theories where we don’t give Id
as much meaning.
Using k
we can prove that for any p q : Id x y
, then Id p q
. In Agda notation
prop : (A : Set)(x y : A)(p q : x ≡ y)
→ p ≡ q
prop A x .x refl q = k A P (λ _ → refl) x q
where P : (x : A) → x ≡ x → Set
P _ p = refl ≡ p
This can be further refined to show that we can eliminate all proofs that Id x x
are Refl x
rec : (A : Set)(P : A → Set)(x y : A)(p : P x) → x ≡ y → P y
rec A P x .x p refl = p
recreflisuseless : (A : Set)(P : A → Set)(x : A)
→ (p : P x)(eq : x ≡ x) → p ≡ rec A P x x p eq
recreflisuseless A P x p eq with prop A x x eq refl
recreflisuseless A P x p .refl | refl = refl
This form of extensional type theory still leaves a clear distinction between propositional equality and definitional equality by avoiding a reflection rule. However, with recreflisuseless
we can do much of the same things, whenever we have something that matches on an equality proof we can just remove it.
We essentially have normal propositional equality, but with the knowledge that things can only be equal in 1 way, up to propositional equality!
The next form of equality we’ll talk about is slightly different than previous ones. Heterogeneous equality is designed to coexist in some other type theory and supplement the existing form of equality.
Heterogeneous equality is most commonly defined with John Major equality
This is named after a British politician since, while it promises that any two terms can be equal regardless of their class (type), only two things from the same class can ever be equal.
Now remember how earlier I’d mentioned that how we phrase these inductive equality types can have a huge impact? Well, here we can see it, because the above definition doesn’t typecheck in Agda!
That’s because Agda is predicative, meaning that a type constructor can’t quantify over the same universe it occupies. We can however, cleverly phrase JMeq
so to avoid this
Now the constructor avoids quantifying over Set
and therefore fits inside the same universe as A
and B
.
JMeq
is usually paired with an axiom to reflect heterogeneous equality back into our normal equality proof.
reflect : (A : Set)(x y : A) → JMeq x y → Id x y
This reflection doesn’t look necessary, but arises for similar reasons that dictate that k
is unprovable.
It looks like this heterogeneous equality is a lot more trouble than it’s worth at first. It really shines when we’re working with terms that we know must be the same, but require pattern matching or other jiggering to prove.
If you’re looking for a concrete example, look no further than Observational Equality Now!. This paper shows how observational equality can be jammed into a principally intensional system!
So this has been a whirlwind tour through a lot of different type theories. I partially wrote this to gather some of this information in one (free) place. If there’s something here missing that you’d like to see added, feel free to comment or email me.
Thanks to Jon Sterling for proof reading and many subtle corrections :)
I’m going to take a quick break from arguing with people on the internet to talk about a common point of confusion with theorem provers.
People will often state things like “A program in Coq never diverges” or that “we must prove that X halts”. To an outsider, that sounds impossible! After all, isn’t the halting problem undecidable?
Now the thing to realize is that while yes the halting problem is undecidable, we’re not solving it. The halting problem essentially states
For an arbitrary Turing machine P, there is no algorithm guaranteed to terminate that will return true if P halts and false if it diverges.
In theorem provers, we cleverly avoid this road block with two simple tricks. I’m going to discuss these in the context of Coq but these ideas generalize between most theorem provers.
A program in Coq must halt. To do otherwise would introduce a logical inconsistency. So to enforce this we need to statically decide whether some program halts.
We just said that this is impossible though! To escape this paradox Coq opts for a simple idea: reject good programs.
Rather than guaranteeing to return true for every good program, we state that we’ll definitely reject all bad programs and then some.
For example, this termination checker would be logically consistent
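As a sketch, the most extreme such checker might look like this in Haskell (the `Program` type is a hypothetical stand-in for real program syntax):

```haskell
-- A hypothetical stand-in for program syntax; a real checker would
-- inspect actual terms.
newtype Program = Program String

-- The "consistent but useless" checker: it rejects every program,
-- so it certainly never accepts one that diverges.
halts :: Program -> Bool
halts _ = False
```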
It’d be useless of course, but consistent. Coq therefore accepts a certain set of programs which are known to terminate. For example, ones that limit themselves only to guarded coinduction or structural induction.
While it may be impossible to decide the termination of an arbitrary program, it’s certainly possible to prove the termination of a specific program.
When Coq’s heuristics fail, we can always resort to manually proving that our code will terminate. This may not be pleasant, but it’s certainly doable. By lifting the burden of Coq, we go from “constructing arbitrary proof of termination” to “checking arbitrary proof of termination”, which is decidable.
In Coq we can do this with well founded recursion. Simply put, well founded recursion means that we shift from using only term “size” to decide what’s a smaller recursive call to any nice binary relation. If you’re not interested in Coq specifically, you can check out your preferred proof assistant’s formalization of well founded recursion.
To this end, we define a relation for some type A : Set
, R : A -> A -> Prop
. Read R x y
as x
is smaller than y
.
Now we must show that this relation preserves some definition of “sanity”. This should mean that when a function receives x
, for any y
so that R y x
, we should be able to recurse on y. This should also mean that there’s no infinite stack of terms so that R x y
, R z x
, R w z
…. because this would mean we could recurse infinitely. To capture this idea, we must prove well_founded A R
. What’s this “well founded” thing you say?
Well it’s just
Definition well_founded A R := forall a : A, Acc R a.
This Acc
thing means “accessible”,
Inductive Acc (A : Type) (R : A -> A -> Prop) (x : A) : Prop :=
  Acc_intro : (forall y : A, R y x -> Acc R y) -> Acc R x.
So something is accessible in R
if everything less than it is also accessible.
We can easily prove that if R
is well_founded
there is no infinite chain that could lead us to infinite recursion.
Section founded.
Variable A : Set.
Variable R : A -> A -> Prop.
Variable well_founded : well_founded R.
CoInductive stream :=
| Cons : A -> stream -> stream.
CoInductive tower_of_bad : stream -> Prop :=
| OnTop : forall x y rest,
    R y x ->
    tower_of_bad (Cons y rest) ->
    tower_of_bad (Cons x (Cons y rest)).
Lemma never_on_top :
forall x, forall rest, ~ tower_of_bad (Cons x rest).
intro; induction (well_founded x); inversion 1; try subst;
match goal with
| [ H : context[~ _] |- _ ] => eapply H; eauto
end.
Qed.
Theorem no_chains :
forall xs, ~ tower_of_bad xs.
destruct 1; eapply never_on_top; eauto.
Qed.
End founded.
We’re using a powerful trick in never_on_top
, we’re inducting upon Acc
! This is the key to using well founded recursion. By inducting upon the Acc
instead of one of the terms of our function, we can easily recurse on any subterm y
, if R y x
.
This is handed to us by the lovely Fix
(uppercase).
Fix : well_founded R ->
      forall P : A -> Type,
      (forall x : A, (forall y : A, R y x -> P y) -> P x) ->
      forall x : A, P x
So Fix
is the better, cooler version of structural recursion that we were after. It lets us recurse on any y
where R y x
.
So in some sense, you can view Coq’s Fixpoint as just a specialization of Fix
where R x y
means that x
is a subterm of y
.
So in conclusion, theorem provers don’t do the impossible. Rather they have a small battery of tricks to cheat the impossible general case and simplify common cases.
Back to the internet I go.
I’ve written a few times about church representations, but never aimed at someone who’d never heard of what a church representation is. In fact, it doesn’t really seem like too many people have!
In this post I’d like to fix that :)
Simply put, a church representation (CR) is a way of representing a piece of concrete data with a function. The CR can be used through an identical way to the concrete data, but it’s comprised entirely of functions.
They were originally described by Alonzo Church as a way of modeling all data in lambda calculus, where all we have is functions.
The simplest CR I’ve found is that of tuples.
Let’s first look at our basic tuple API
Now this is trivially implemented with (,)
The church representation preserves the interface, but changes all the underlying implementations.
There’s our church pair; notice that it’s comprised only of ->
. It also makes use of higher rank types. This means that a Tuple a b
can be applied to function producing any c
and it must return something of that type.
Let’s look at how the rest of our API is implemented
And that’s it!
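Collecting the pieces described above, the whole implementation is roughly this (a sketch; `fst'` and `snd'` are primed here only to avoid clashing with the Prelude):

```haskell
{-# LANGUAGE RankNTypes #-}

-- A church pair is a function awaiting a consumer of both components.
type Tuple a b = forall c. (a -> b -> c) -> c

mkTuple :: a -> b -> Tuple a b
mkTuple a b = \f -> f a b

-- Project a component by handing the pair a selector function.
fst' :: Tuple a b -> a
fst' t = t (\a _ -> a)

snd' :: Tuple a b -> b
snd' t = t (\_ b -> b)
```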
It’s helpful to step through some reductions here
And for snd
snd (mkTuple True False)
snd (\f -> f True False)
(\f -> f True False) (\_ b -> b)
(\_ b -> b) True False
False
So we can see that these are clearly morally equivalent. The only real question here is whether, for each CR tuple, there exists a normal tuple. This isn’t immediately apparent since the function type for the CR looks a lot more general. In fact, the key to this proof lies in the forall c
part; this extra polymorphism lets us use a powerful technique called “parametricity” to prove that they’re equivalent.
I won’t actually go into such a proof now since it’s not entirely relevant, but it’s worth noting that both (,)
and Tuple
are completely isomorphic.
To convert between them is pretty straightforward
isoL :: Tuple a b -> (a, b)
isoL tup = tup (,)

isoR :: (a, b) -> Tuple a b
isoR (a, b) = \f -> f a b
Now that we have an idea of how to church representations “work” let’s go through a few more examples to start to see a pattern.
Booleans have the simplest API of all
We can build all other boolean operations on test
This API is quite simple to implement with Bool
,
But how could we represent this with functions? The answer stems from test
,
Clever readers will notice this is almost identical to test
: a boolean gets two arguments and returns one or the other.
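A sketch of the church booleans just described, using the post's true/false/test names:

```haskell
{-# LANGUAGE RankNTypes #-}

-- A church boolean: given two alternatives, pick one.
type Boolean = forall c. c -> c -> c

true, false :: Boolean
true  = \t _ -> t
false = \_ f -> f

-- test is just application of the boolean to both branches.
test :: Boolean -> c -> c -> c
test b t f = b t f
```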
We can write an isomorphism between Bool
and Boolean
as well
isoL :: Bool -> Boolean
isoL b = if b then true else false

isoR :: Boolean -> Bool
isoR b = test b True False
Now let’s talk about lists. One of the interesting things is that lists are the first recursive data type we’ve dealt with so far.
Defining the API for lists isn’t entirely clear either. We want a small set of functions that can easily cover any conceivable operations for a list.
The simplest way to do this is to realize that we can do exactly 3 things with lists.
We can represent this with 3 functions
type List a = ...
nil :: List a
cons :: a -> List a -> List a
match :: List a -> b -> (a -> List a -> b) -> b
If match
looks confusing just remember that
Is really the same as
In this way match
is just the pure functional version of pattern matching. We can actually simplify the API by realizing that rather than this awkward match
construct, we can use something cleaner.
foldr
forms a much more pleasant API to work with since it’s really the most primitive form of “recursing” on a list.
match :: List a -> (a -> List a -> b) -> b -> b
match list f b = fst $ foldr list worker (b, nil)
  where worker x (b, xs) = (f x xs, cons x xs)
The especially nice thing about foldr
is that it doesn’t mention List a
in its two “destruction” functions, all the recursion is handled in the implementation.
We can implement CR lists trivially using foldr
type List a = forall b. (a -> b -> b) -> b -> b

nil = \ _ nil -> nil
cons x xs = \ cons nil -> x `cons` xs cons nil
foldr list cons nil = list cons nil
Notice that we handle the recursion in the list type by having a b
as an argument? This is similar to how the accumulator to foldr
gets the processed tail of the list. This is a common technique for handling recursion in our church representations.
Last but not least, the isomorphism arises from foldr (:) []
,
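A sketch of that isomorphism (`toHs` and `fromHs` are my names; the List type is the one defined above):

```haskell
{-# LANGUAGE RankNTypes #-}

-- Church-encoded lists: a fold waiting for its cons and nil cases.
type List a = forall b. (a -> b -> b) -> b -> b

-- One direction of the isomorphism: folding with (:) and [].
toHs :: List a -> [a]
toHs l = l (:) []

-- The other direction: a Haskell list becomes its own fold.
fromHs :: [a] -> List a
fromHs xs = \c n -> foldr c n xs
```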
The last case that we’ll look at is Either
. Like Pair
, Either
has 3 different operations.
This is pretty easy to implement with Either
Once again, the trick to encoding this as a function falls right out of the API. In this case we use the type of or
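Here's a sketch of the church-encoded Either; `Or`, `inl`, `inr`, and `or'` are my guesses at the names of the three operations the post describes:

```haskell
{-# LANGUAGE RankNTypes #-}

-- Church-encoded Either: hand the value to one of two continuations.
type Or a b = forall c. (a -> c) -> (b -> c) -> c

inl :: a -> Or a b
inl a = \l _ -> l a

inr :: b -> Or a b
inr b = \_ r -> r b

-- The eliminator: just application, like Prelude's either.
or' :: Or a b -> (a -> c) -> (b -> c) -> c
or' e l r = e l r
```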
Last but not least, let’s quickly rattle off our isomorphism.
So now we can talk about the underlying pattern in CRs. First remember that for any type T, we have n distinct constructors T1, T2, T3 … Tn. Each constructor Ti has some number of fields Ti1, Ti2, Ti3 …
Now the church representation of such a type T
is
forall c. (T11 -> T12 -> T13 -> .. -> c)
       -> (T21 -> T22 -> T23 -> .. -> c)
       ...
       -> (Tn1 -> Tn2 -> Tn3 -> .. -> c)
       -> c
This pattern doesn’t map quite as nicely to recursive types. Here we have to take the extra step of substituting all occurrences of T
for c
in our resulting church representation.
This is actually such a pleasant pattern to work with that I’ve written a library for automatically reifying a type between its church representation and concrete form.
Hopefully you now understand what a church representation is. It’s worth noting that a lot of stuff Haskellers stumble upon daily are really church representations in disguise.
My favorite example is maybe: this function takes a success and a failure continuation along with a Maybe and produces a value. With a little bit of imagination, one can realize that this is really just a function mapping a Maybe to a church representation!
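Here's a small sketch of that observation (the CMaybe newtype and the conversion names are mine): maybe, with its arguments shuffled, is exactly the map from Maybe to its church form.

```haskell
{-# LANGUAGE RankNTypes #-}
-- Sketch: Prelude's maybe :: b -> (a -> b) -> Maybe a -> b is, modulo
-- argument order, the conversion from Maybe to its church representation.
newtype CMaybe a = CMaybe { runCMaybe :: forall c. c -> (a -> c) -> c }

toChurch :: Maybe a -> CMaybe a
toChurch m = CMaybe $ \n j -> maybe n j m

fromChurch :: CMaybe a -> Maybe a
fromChurch cm = runCMaybe cm Nothing Just
```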
If you’re thinking that CRs are pretty cool, now might be a good time to take a look at one of my previous posts on deriving them automagically.
I had a few people tell me after my last post that they would enjoy a write-up on reading extensible-effects, so here goes.
I’m going to document my process of reading through and understanding how extensible-effects is implemented. Since this is a fairly large library (about 1k lines of code), we’re not going over all of it; rather, we’re just reviewing the core modules and enough of the extra ones to get a sense for how everything is implemented.
If you’re curious or still have questions, the modules that we don’t cover should serve as a nice place for further exploration.
extensible-effects comes with quite a few modules; my find query reveals
$ find src -name "*.hs"
src/Data/OpenUnion1.hs
src/Control/Eff/Reader/Strict.hs
src/Control/Eff/Reader/Lazy.hs
src/Control/Eff/Fresh.hs
src/Control/Eff/Cut.hs
src/Control/Eff/Exception.hs
src/Control/Eff/State/Strict.hs
src/Control/Eff/State/Lazy.hs
src/Control/Eff/Writer/Strict.hs
src/Control/Eff/Writer/Lazy.hs
src/Control/Eff/Coroutine.hs
src/Control/Eff/Trace.hs
src/Control/Eff/Choose.hs
src/Control/Eff/Lift.hs
src/Control/Eff.hs
Whew! Well, I’m going to take a leap and assume that extensible-effects is structured like the mtl, in the sense that there are a few core modules and then a bunch of “utility” modules: in the mtl there’s Control.Monad.Trans, and then Control.Monad.State and a bunch of other implementations of MonadTrans.
If we assume extensible-effects is organized like this, then we need to look at Control.Eff and Data.OpenUnion1, and maybe a few other modules to get a feel for how to use these two. I’ve added Data.OpenUnion1 because it’s imported by Control.Eff and so is presumably important.
Since Data.OpenUnion1
is at the top of our dependency DAG, we’ll start with it.
So we’re starting with Data.OpenUnion1. If the authors of this code have stuck to normal Haskell naming conventions, that’s an open union of type constructors: stuff with the kind * -> *.
Happily, this module has an export list so we can at least see what’s public.
module Data.OpenUnion1( Union (..)
, SetMember
, Member
, (:>)
, inj
, prj
, prjForce
, decomp
, unsafeReUnion
) where
So we’re looking at a data type Union
, which we export everything for. Two type classes SetMember
and Member
, a type operator :>
, and a handful of functions, most likely to work with Union
.
So let’s figure out exactly what this union thing is
data Union r v = forall t. (Functor t, Typeable1 t) => Union (t v)
So Union r v is just a wrapper around some functor applied to v. This seems a little odd; what’s this r thing? The docs hint that Member t r should always hold.
Member
is a type class of two parameters with no members. In fact, grep
ing the entire source reveals that the entire definition and instances for Member
in this code base is
infixr 1 :>
data ((a :: * -> *) :> b)

class Member t r
instance Member t (t :> r)
instance Member t r => Member t (t' :> r)
So this makes it a bit clearer, :>
acts like a type level cons and Member
just checks for membership!
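To see the "type-level cons" reading in action, here's a standalone sketch. Unlike the real class, I've added a method, idx, so that membership becomes observable at runtime; everything here (idx, the Proxy plumbing, the OVERLAPPING pragma) is my own illustration, assuming a reasonably modern GHC:

```haskell
{-# LANGUAGE KindSignatures, MultiParamTypeClasses, FlexibleInstances,
             TypeOperators, ScopedTypeVariables, EmptyDataDecls #-}
import Data.Proxy

-- Sketch of Member over a type-level cons. The idx method is my own
-- addition (the real class is empty); it reports where t sits in r.
infixr 1 :>
data ((a :: * -> *) :> b)

class Member (t :: * -> *) r where
  idx :: Proxy t -> Proxy r -> Int

-- Base case: t is at the head of the list.
instance {-# OVERLAPPING #-} Member t (t :> r) where
  idx _ _ = 0

-- Recursive case: skip the head and search the tail.
instance Member t r => Member t (t' :> r) where
  idx p _ = 1 + idx p (Proxy :: Proxy r)
```

The instance resolution mirrors list membership: GHC either matches the head or recurses into the tail.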
Now Union
makes a bit more sense, especially in light of the inj
function
So Union
takes some t
in r
and hides it away in an existential applied to v
. Now this is kinda like having a great nested bunch of Either
s with every t
applied to v
.
Dual to inj
, we can define a projection from a Union
to some t
in r
. This will need to return something wrapped in Maybe
since we don’t know which member of r
our Union
is wrapping.
prj :: (Typeable1 t, Member t r) => Union r v -> Maybe (t v)
prj (Union v) = runId <$> gcast1 (Id v)
prj
does some evil Typeable
casts, but this is necessary since we’re throwing away all our type information with that existential. That Id
runId
pair is needed since gcast1
has the type
They’re just defined as
so just like Control.Monad.Identity
.
Now let’s try to figure out what this SetMember
thing is.
class Member t r => SetMember set (t :: * -> *) r | r set -> t
instance SetMember set t r => SetMember set t (t' :> r)
This is unhelpful, all we have is the recursive step with no base case! Resorting to grep reveals that our base case is defined in Control.Eff.Lift
so we’ll temporarily put this class off until then.
Now the rest of the file is defining a few functions to operate over Union
s.
First up is an unsafe “forced” version of prj
.
infixl 4 <?>
(<?>) :: Maybe a -> a -> a
Just a <?> _ = a
_      <?> a = a

prjForce :: (Typeable1 t, Member t r) => Union r v -> (t v -> a) -> a
prjForce u f = f <$> prj u <?> error "prjForce with an invalid type"
prjForce
is really exactly what it says on the label, it’s a version of prj
that throws an exception if we’re in the wrong state of Union
.
Next is a way of unsafely rejiggering the type level list that Union
is indexed over.
We need this for our last function, decomp. This function partially unfolds our Union into an Either.
decomp :: Typeable1 t => Union (t :> r) v -> Either (Union r v) (t v)
decomp u = Right <$> prj u <?> Left (unsafeReUnion u)
This provides a way to actually do some sort of induction on r
by breaking out each type piece by piece with some absurd case for when we don’t have a :> b
.
That about wraps up this little Union
library, let’s move on to see how this is actually used.
Now let’s talk about the core of extensibleeffects, Control.Eff
. As always we’ll start by taking a look at the export list
module Control.Eff(
Eff (..)
, VE (..)
, Member
, SetMember
, Union
, (:>)
, inj
, prj
, prjForce
, decomp
, send
, admin
, run
, interpose
, handleRelay
, unsafeReUnion
) where
So right away we can see that we’re re-exporting stuff from Data.OpenUnion1 as well as several new things, including the infamous Eff.
The first definition we come across in this module is VE
. VE
is either a simple value or a Union
applied to a VE
!
Right away we notice that “pure value or X” pattern we see with free monads and other abstractions over effects.
We also include a quick function to try to extract a pure value from Vals.
fromVal :: VE r w -> w
fromVal (Val w) = w
fromVal _ = error "extensible-effects: fromVal was called on a non-terminal effect."
Now we’ve finally reached the definition of Eff
!
So Eff
bears a striking resemblance to Cont
. There are two critical differences though, first is that we specialize our return type to something constructed with VE r
. The second crucial difference is that by universally quantifying over w
we sacrifice a lot of the power of Cont
, including callCC
!
Next in Control.Eff
is the instances for Eff
instance Functor (Eff r) where
  fmap f m = Eff $ \k -> runEff m (k . f)
  {-# INLINE fmap #-}

instance Applicative (Eff r) where
  pure = return
  (<*>) = ap

instance Monad (Eff r) where
  return x = Eff $ \k -> k x
  {-# INLINE return #-}
  m >>= f = Eff $ \k -> runEff m (\v -> runEff (f v) k)
  {-# INLINE (>>=) #-}
Notice that these are all really identical to Cont
s instances. Functor
adds a function to the head of the continuation. Monad
dereferences m
and feeds the result into f
. Exactly as with Cont
.
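For comparison, here's a minimal continuation monad (my own reduced version, not the one from transformers); its instances line up with Eff's almost symbol for symbol:

```haskell
-- A minimal Cont for comparison with Eff's instances.
newtype Cont r a = Cont { runCont :: (a -> r) -> r }

instance Functor (Cont r) where
  fmap f m = Cont $ \k -> runCont m (k . f)  -- compose f onto the continuation

instance Applicative (Cont r) where
  pure x = Cont $ \k -> k x                  -- feed the value straight in
  f <*> a = Cont $ \k -> runCont f (\g -> runCont a (k . g))

instance Monad (Cont r) where
  -- dereference m, feed the result into f, exactly as Eff does
  m >>= f = Cont $ \k -> runCont m (\v -> runCont (f v) k)
```

The only differences in Eff are the specialized result type VE r w and the universal quantification over w.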
Next we can look at our primitive function for handling effects
I must admit, this tripped me up for a while. Here’s how I read it, “provide a function, which when given a continuation for the rest of the program expecting an a
, produces a side effecting VE r w
and we’ll map that into Eff
”.
Remember how Union holds functors? Well, each of our effects must act as a functor and wrap itself in that union. By being open, we get the “extensible” in extensible-effects.
Next we look at how to remove effects once they’ve been added to our set of effects. In mtl-land, this is similar to the collection of runFooT functions that are used to gradually strip away layers of transformers.
The first step towards this is to transform the CPSed effectful computation Eff
, into a more manageable form, VE
This is a setup step so that we can traverse the “tree” of effects that our Eff
monad built up for us.
Next, we know that we can take an Eff
with no effects and unwrap it into a pure value. This is the “base case” for running an effectful computation.
Concerned readers may notice that we’re using a partial function, this is OK since the E
case is “morally impossible” since there is no t
so that Member t ()
holds.
Next is the function to remove just one effect from an Eff
handleRelay :: Typeable1 t
            => Union (t :> r) v -- ^ Request
            -> (v -> Eff r a)   -- ^ Relay the request
            -> (t v -> Eff r a) -- ^ Handle the request of type t
            -> Eff r a
handleRelay u loop h = either passOn h $ decomp u
  where passOn u' = send (<$> u') >>= loop
Next to send, this function gave me the most trouble. The trick was to realize that decomp will leave us in one of two cases:
1. Some other effect producing a v, a Union r v
2. An effect t producing a v, a t v
If we have a t v
, then we’re all set since we know exactly how to map that to a Eff r a
with h
.
Otherwise we need to take this effect and add it back into our computation. send (<$> u') takes the rest of the computation, that continuation, and feeds it the v that we know our effect produces. This gives us the type Eff r v
, where that outer Eff r
contains our most recent effect as well as everything else. Now to convert this to a Eff r a
we need to transform that v
to an a
. The only way to do that is to use the supplied loop
function so we just bind to that.
Last but not least is a function to modify an effect somewhere in our effectful computation. We’ll see this used later with things like local from Control.Eff.Reader, for example.
To do this we want something like handleRelay, but without removing t from r. We also need to generalize the type so that t can be anywhere in our list of effects; otherwise we’d have to prematurely solidify our stack of effects to use something like modify.
interpose :: (Typeable1 t, Functor t, Member t r)
          => Union r v
          -> (v -> Eff r a)
          -> (t v -> Eff r a)
          -> Eff r a
interpose u loop h = maybe (send (<$> u) >>= loop) h $ prj u
Now this is almost identical to handleRelay
except instead of using decomp
which will split off t
and only works when r ~ t :> r'
, we use prj
! This gives us a Maybe
and since the type of u
doesn’t need to change we just recycle that for the send (<$> u) >>= loop
sequence.
That wraps up the core of extensible-effects, and I must admit that when writing this I was still quite confused as to how to actually use Eff to implement new effects. Reading a few examples really helped clear things up for me.
The State
monad has always been the sort of classic monad example so I suppose we’ll start here.
module Control.Eff.State.Lazy( State (..)
, get
, put
, modify
, runState
, evalState
, execState
) where
So we’re not reusing the State
from Control.Monad.State
but providing our own. It looks like
So what is this supposed to do? Well, that s -> w looks like a continuation of sorts: it takes the state s and produces the resulting value. The s -> s looks like something that modify should use.
Indeed this is the case
modify :: (Typeable s, Member (State s) r) => (s -> s) -> Eff r ()
modify f = send $ \k -> inj $ State f $ \_ -> k ()

put :: (Typeable e, Member (State e) r) => e -> Eff r ()
put = modify . const
We grab the continuation from send and add a State effect on top which uses our modification function f. The continuation that State takes ignores the value it’s passed (the current state) and instead feeds the program the () it’s expecting.
get
is defined in a similar manner, but instead of modifying the state, we use State’s continuation to feed the program the current state.
So we grab the continuation, feed it to a State id
which won’t modify the state, and then inject that into our open union of effects.
Now that we have the API for working with states, let’s look at how to remove that effect.
runState :: Typeable s
         => s                    -- ^ Initial state
         -> Eff (State s :> r) w -- ^ Effect incorporating State
         -> Eff r (s, w)         -- ^ Effect containing final state and a return value
runState s0 = loop s0 . admin where
  loop s (Val x) = return (s, x)
  loop s (E u)   = handleRelay u (loop s) $
                     \(State t k) -> let s' = t s
                                     in loop s' (k s')
runState
first preps our effect to be pattern matched on with admin
. We then start loop
with the initial state.
loop
has two components, if we have run into a value, then we don’t interpret any effects, just stick the state and value together and return
them.
If we do have an effect, we use handleRelay
to split out the State s
from our effects. To handle the case where we get a VE w
, we just loop
with the current state. However, if we get a State t k, we update the state with t and pass the new state to the continuation k.
From runState we can easily derive evalState and execState.
evalState :: Typeable s => s -> Eff (State s :> r) w -> Eff r w
evalState s = fmap snd . runState s

execState :: Typeable s => s -> Eff (State s :> r) w -> Eff r s
execState s = fmap fst . runState s
That wraps up the interface for Control.Eff.State
. The nice bit is this makes it a lot clearer how to use send
, handleRelay
and a few other functions from the core.
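To internalize the shape of that runState loop, here's a standalone miniature with no open unions and just the one effect; every name (StateF, Prog, runP, and friends) is my own, but the structure mirrors the library's: a request carries a state modification and a continuation awaiting the new state.

```haskell
-- A miniature of the State effect: a request is a modification s -> s
-- plus a continuation expecting the updated state.
data StateF s k = StateF (s -> s) (s -> k)

-- A program is either a pure value or a request plus the rest.
data Prog s a = Done a | Step (StateF s (Prog s a))

getP :: Prog s s
getP = Step (StateF id Done)                    -- don't touch the state, return it

modifyP :: (s -> s) -> Prog s ()
modifyP f = Step (StateF f (const (Done ())))   -- ignore the state, return ()

-- Sequencing pushes the rest of the program into the continuation.
bindP :: Prog s a -> (a -> Prog s b) -> Prog s b
bindP (Done a) f = f a
bindP (Step (StateF t k)) f = Step (StateF t (\s -> bindP (k s) f))

-- The analogue of runState's loop: thread the state through each Step.
runP :: s -> Prog s a -> (s, a)
runP s (Done x) = (s, x)
runP s (Step (StateF t k)) = let s' = t s in runP s' (k s')
</imports>
```

The two clauses of runP correspond exactly to the Val and E cases of the real loop.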
Now we’re on to Reader
. The interesting thing here is that local
highlights how to use interpose
properly.
As always, we start by looking at what exactly this module provides
The definition of Reader
is refreshingly simple
Keen readers will note that this is just half of the State
definition which makes sense; Reader
is half of State
.
ask
is defined almost identically to get
We just feed the continuation for the program into Reader. A simple wrapper over this gives our equivalent of asks.
Next up is local
, which is the most interesting bit of this module.
local :: (Typeable e, Member (Reader e) r)
      => (e -> e)
      -> Eff r a
      -> Eff r a
local f m = do
  e <- f <$> ask
  let loop (Val x) = return x
      loop (E u)   = interpose u loop (\(Reader k) -> loop (k e))
  loop (admin m)
So local
starts by grabbing the view of the environment we’re interested in, e
. From there we define our worker function which looks a lot like runState
. The key difference is that instead of using handleRelay
we use interpose
to replace each Reader
effect with the appropriate environment. Remember that interpose
is not going to remove Reader
from the set of effects, just update each Reader
effect in the current computation.
Finally, we simply rejigger the computation with admin
and feed it to loop
.
In fact, this is very similar to how runReader
works!
runReader :: Typeable e => Eff (Reader e :> r) w -> e -> Eff r w
runReader m e = loop (admin m)
  where
    loop (Val x) = return x
    loop (E u)   = handleRelay u loop (\(Reader k) -> loop (k e))
Now between Control.Eff.Reader
and Control.Eff.State
I felt I had a pretty good handle on most of what I’d read in extensibleeffects. There was just one remaining loose end: SetMember
. Don’t remember what that was? It was a class in Data.OpenUnion1
that was conspicuously absent of detail or use.
I finally found where it seemed to be used! In Control.Eff.Lift
.
First let’s poke at the exports of this module.
This module is designed to lift an arbitrary monad into the world of effects. There’s a caveat though: since monads don’t necessarily commute, the order in which we run them is very important. Imagine for example the difference between IO (m a)
and m (IO a)
.
So to ensure that Eff can support lifted monads we have to do some evil things. First, we must require that we never have two lifted monads and that we always run the monad last. This is a little icky, but its usefulness outweighs the ugliness.
To ensure condition 1, we need SetMember
.
So we define a new instance of SetMember
. Basically this says that any Lift
is a SetMember ... r
iff Lift m
is the last item in r
.
To ensure condition number two we define runLift
with the more restrictive type
We can now look into exactly how Lift
is defined.
So this Lift
acts sort of like a “suspended bind”. We postpone actually binding the monad and simulate doing so with a continuation a -> v.
We can define our one operation with Lift
, lift
.
This works by suspending the rest of the program in our faux binding, to be unwrapped later in runLift.
runLift :: (Monad m, Typeable1 m) => Eff (Lift m :> ()) w -> m w
runLift m = loop (admin m) where
  loop (Val x) = return x
  loop (E u)   = prjForce u $ \(Lift m' k) -> m' >>= loop . k
The one interesting difference between this function and the rest of the run functions we’ve seen is that here we use prjForce
. The reason for this is that we know that r
is just Lift m :> ()
. This drastically simplifies the process and means all we’re essentially doing is transforming each Lift
into >>=
.
That wraps up our tour of the module and, with it, extensible-effects.
This post turned out a lot longer than I’d expected, but I think it was worth it. We’ve gone through the coroutine/continuation based core of extensible-effects and walked through a few different examples of how to actually use it.
If you’re still having some trouble putting the pieces together, the rest of extensible-effects is a great collection of useful examples of building effects.
I hope you had as much fun as I did with this one!
Thanks to Erik Rantapaa for reading through a much longer post than I led him to believe it would be.
One of my oldest habits with programming is reading other people’s code. I’ve been doing it almost since I started programming. For the last two years that habit has been focused on Hackage. Today I was reading the source code to the “logic programming monad” provided by logict
and wanted to blog about how I go about reading new Haskell code.
This time the code was pretty tiny: find . -name '*.hs' | xargs wc -l reveals two files with just under 400 lines of code! logict also only has two dependencies, base and the mtl, so there’s not a big worry of unfamiliar libraries.
It’s a lot easier to read this post if you have the source for logict on hand. To grab it, use cabal get
. My setup is something like
~ $ cabal get logict
~ $ cd logict-0.6.0.2
~/logict-0.6.0.2 $ cabal sandbox init
~/logict-0.6.0.2 $ cabal install --only-dependencies
I’m somewhat ashamed to admit that I use pretty primitive tooling for exploring a new codebase, it’s grep
and find
all the way! If you use a fancy IDE, perhaps you can just skip this section and take a moment to sit back and feel high-tech.
First things first is to figure out what Haskell files are here. It can be different than what’s listed on Hackage since often libraries don’t export external files.
~/logict-0.6.0.2 $ find . -name "*.hs"
./dist/build/autogen/Paths_logict.hs
./Control/Monad/Logic.hs
./Control/Monad/Logic/Class.hs
Alright, there are two source files and one sitting in dist. The dist one is almost certainly just cabal-autogenerated stuff that we don’t care about.
It also appears that there’s no src
directory and every module is publicly exported! This means that we only have two modules to worry about.
The next thing to figure out is which to read first. In this case the choice is simple: grepping for imports with
grep "import" -r Control
reveals that Control.Monad.Logic
imports Control.Monad.Logic.Class
so we start with *.Class
.
Control.Monad.Logic.Class
Alright! Now it’s actually time to start reading code.
The first thing that jumps out is the export list
Alright, so we’re exporting everything from a class MonadLogic
, as well as two functions reflect
and lnot
. Let’s go figure out what MonadLogic
is.
class (MonadPlus m) => MonadLogic m where
  msplit     :: m a -> m (Maybe (a, m a))
  interleave :: m a -> m a -> m a
  (>>-)      :: m a -> (a -> m b) -> m b
  ifte       :: m a -> (a -> m b) -> m b -> m b
  once       :: m a -> m a
The fact that this depends on MonadPlus
is pretty significant. Since most classes don’t require this I’m going to assume that it’s fairly key to either the implementation of some of these methods or to using them. Similar to how Monoid
is critical to Writer
.
The docs make it pretty clear what each member of this class does
msplit
Take a logical computation and split it into its first result and another computation that computes the rest.
interleave
This is the key difference between MonadLogic
and []
. interleave
gives fair choice between two computation. This means that every result that appears in finitely many applications of msplit
for some a
and b
, will appear in finitely many applications of msplit
to interleave a b
.
>>-
>>- is similar to interleave. Consider some code like
(a >>= k) `mplus` (b >>= k)
This is equivalent to mplus a b >>= k, but has different termination characteristics, since >>= might never terminate. >>- is described as “considering both sides of the disjunction”.
I have absolutely no idea what that means.. hopefully it’ll be clearer once we look at some implementations.
ifte
This is the equivalent of Prolog’s soft cut. We poke a logical computation, and if it can succeed at all we feed its result into the success computation; otherwise we return the failure computation.
once
once
is a clever combinator to prevent backtracking. It will grab the first result from a computation, wrap it up, and return it. This prevents backtracking further on the original computation.
Now the docs also state that everything is derivable from msplit
. These implementations look like
interleave m1 m2 = msplit m1 >>=
    maybe m2 (\(a, m1') -> return a `mplus` interleave m2 m1')

m >>- f = do (a, m') <- maybe mzero return =<< msplit m
             interleave (f a) (m' >>- f)

ifte t th el = msplit t >>= maybe el (\(a, m) -> th a `mplus` (m >>= th))

once m = do (a, _) <- maybe mzero return =<< msplit m
            return a
The first thing I notice looking at interleave is that it kinda looks like
This makes sense, since this will fairly split between xs
and ys
just like interleave
is supposed to. Here msplit
is like pattern matching, mplus
is :
, and we have to sprinkle some return
in there for kicks and giggles.
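The list version it resembles (my own transcription of the usual fair-append for lists) is:

```haskell
-- Fair interleaving of two lists: alternate elements from each source by
-- swapping the arguments on each step, so neither list starves the other.
interleaveL :: [a] -> [a] -> [a]
interleaveL []     ys = ys
interleaveL (x:xs) ys = x : interleaveL ys xs
```

Unlike (++), this remains productive even when the first list is infinite.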
Now about this mysterious >>-: the biggest difference is that each f a is interleaved, rather than mplused. This means we can fairly split between our first result, f a, and the rest of them, m' >>- f. So binding with >>- should have nice and fair behavior.
The next two are fairly clear. ifte splits its computation, and if it can, it feeds the whole stinking thing, return a `mplus` m', to the success computation; otherwise it just returns the failure computation. Nothing stunning.
once
is my favorite function. To prevent backtracking all we do is grab the first result and return
it.
So that takes care of MonadLogic
. The next thing to worry about are these two functions reflect
and lnot
.
reflect
confirms my suspicion that the dual of msplit
is mplus (return a) m'
.
reflect :: MonadLogic m => Maybe (a, m a) -> m a
reflect Nothing = mzero
reflect (Just (a, m)) = return a `mplus` m
The next function lnot
negates a logical computation. Now, this is a little misleading because the negated computation either produces one value, ()
, or is mzero
and produces nothing. This is easily accomplished with ifte
and once
That takes care of most of this file. What’s left is a bunch of MonadLogic instances for the standard monad transformers. There’s nothing too interesting in them so I won’t talk about them here. It might be worth glancing at the code if you’re interested.
One slightly odd thing I’m noticing is that each instance implements all the methods rather than just msplit. This seems a bit odd… I guess the default implementations are significantly slower? Perhaps some benchmarking is in order.
Now that we’ve finished with Control.Monad.Logic.Class, let’s move on to the main file.
Now we finally see the definition of LogicT
I have no idea how this works, but I’m guessing that this is a church version of [a]
specialized to some m
. Remember that the church version of [a]
is
Now what’s interesting here is that the church version is strongly connected to how CPSed code works. We could then imagine that mplus works like cons for church lists and yields more and more results. But again, this is just speculation.
This suspicion is confirmed by the functions to extract values out of a LogicT
computation
observeT :: Monad m => LogicT m a -> m a
observeT lt = unLogicT lt (const . return) (fail "No answer.")

observeAllT :: Monad m => LogicT m a -> m [a]
observeAllT m = unLogicT m (liftM . (:)) (return [])

observeManyT :: Monad m => Int -> LogicT m a -> m [a]
observeManyT n m
    | n <= 0 = return []
    | n == 1 = unLogicT m (\a _ -> return [a]) (return [])
    | otherwise = unLogicT (msplit m) sk (return [])
  where
    sk Nothing _ = return []
    sk (Just (a, m')) _ = (a:) `liftM` observeManyT (n - 1) m'
observeT grabs the a from the success continuation, and if no result is returned then it will evaluate fail "No answer.", which looks like the failure continuation! Looks like our suspicion is confirmed: we’re dealing with monadic church lists or some other permutation of those buzzwords.
Somehow in a package partially designed by Oleg I’m not surprised to find continuations :)
observeAllT
is quite similar, notice that we take advantage of the fact that r
is universally quantified to instantiate it to a
. This quantification is also used in observeManyT
. This quantification also prevents any LogicT
from taking advantage of the return type to do evil things with returning random values that happen to match the return type. This is what’s possible with ContT
for example.
Now we have the standard specialization and smart constructor for the nontransformer version.
type Logic = LogicT Identity
logic :: (forall r. (a -> r -> r) -> r -> r) -> Logic a
logic f = LogicT $ \k -> Identity .
                         f (\a -> runIdentity . k a . Identity) .
                         runIdentity
Look familiar? Now we can inject real church lists into a Logic
computation. I suppose this shouldn’t be surprising since [a]
functions like a slightly broken Logic a
, without any sharing or soft cut.
Now we repeat all the observe* functions for Logic. I’ll omit these since their implementations are exactly as you’d expect and not interesting.
Next we have a few type class instances
instance Functor (LogicT f) where
  fmap f lt = LogicT $ \sk fk -> unLogicT lt (sk . f) fk

instance Applicative (LogicT f) where
  pure a = LogicT $ \sk fk -> sk a fk
  f <*> a = LogicT $ \sk fk -> unLogicT f (\g fk' -> unLogicT a (sk . g) fk') fk

instance Alternative (LogicT f) where
  empty = LogicT $ \_ fk -> fk
  f1 <|> f2 = LogicT $ \sk fk -> unLogicT f1 sk (unLogicT f2 sk fk)

instance Monad (LogicT m) where
  return a = LogicT $ \sk fk -> sk a fk
  m >>= f = LogicT $ \sk fk -> unLogicT m (\a fk' -> unLogicT (f a) sk fk') fk
  fail _ = LogicT $ \_ fk -> fk
It helps for reading this if you expand sk
to “success continuation” and fk
to “failure continuation”. Since we’re dealing with church lists I suppose you could also use cons
and nil
.
What’s particularly interesting to me here is that there are no constraints on m
for these type class declarations! Let’s go through them one at a time.
Functor
is usually pretty mechanical, and this is no exception. Here we just have to change a -> m r -> m r to b -> m r -> m r. This is trivial: just compose the success continuation with f.
Applicative
is similar. pure just lifts a value into the church equivalent of a singleton list, [a]. <*> is a little bit more meaty: we first unwrap f to its underlying function g, and compose it with the success continuation we run a with. Notice that this is very similar to how Cont works; continuation passing style is necessary with church representations.
Now return
and fail
are pretty straightforward. Though this is interesting because since pattern matching calls fail
, we can just do something like
And we’ll run n
and m
until we get a Just
value.
As for >>=, its implementation is very similar to <*>: we unwrap m, feed the unwrapped a into f, and run that with our success continuation.
We’re only going to talk about one more instance for LogicT
, MonadLogic
, there are a few others but they’re mostly for MTL use and not too interesting.
instance (Monad m) => MonadLogic (LogicT m) where
  msplit m = lift $ unLogicT m ssk (return Nothing)
    where ssk a fk = return $ Just (a, lift fk >>= reflect)
We’re only implementing msplit
here, which strikes me as a bit odd since we implemented everything before. We also actually need Monad m
here so that we can use LogicT
’s MonadTrans
instance.
To split a LogicT
, we run a special success computation and return Nothing
if failure is ever called. Now there’s one more clever trick here, since we can choose what the r
is in m r
, we choose it to be Maybe (a, LogicT m a)
! That way we can take the failure case, which essentially is just the tail of the list, and push it into reflect
.
This confused me a bit so I wrote the equivalent version for church lists, where msplit
is just uncons
.