In this post I’d just like to walk through some fun code, nothing particularly theory-heavy. The code I’d like to go through is a simple little module in ML that lets you easily construct “dynamic” types. This isn’t done through the usual “really big sum of products” approach; instead it’s completely open and can be extended for every newly defined type (at runtime).
The basic idea behind this trick hinges on how exceptions work in SML. Well, really it’s not about exceptions so much as what exceptions work with. In ML we can declare new exceptions like this
exception Foo of tyarg
and this gives us a new exception constructor Foo
and we can raise and handle it like you would expect
(raise (Foo 1)) handle Foo x => x
But what’s particularly interesting is that Foo
actually has a type. Really it’s just a constructor for a special type exn
. This means we can do things like pass around exception constructors, apply them, etc, etc.
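For instance, here’s a small sketch of my own (the helper wrapAll is hypothetical, not part of anything above) showing an exception constructor handed to a higher-order function and later matched on:

```sml
(* Str : string -> exn is an ordinary value once declared *)
exception Str of string

(* a hypothetical helper: apply any exn-producing function over a list *)
fun wrapAll (mk : string -> exn) (xs : string list) : exn list =
  List.map mk xs

val packed = wrapAll Str ["a", "b"]

(* we can still pattern match to recover what we packed *)
val recovered =
  List.map (fn e => case e of Str s => s | _ => "?") packed
```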
exn
is what we might call an extensible data type: we can extend it arbitrarily. We could imagine allowing users to define their own such types, but in SML we’ve just got the one. The reason we even have this one is that it’s a great choice if you can only allow one type to be raised and handled.
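The extension really does happen at run time: an exception declaration inside a let mints a brand-new constructor every time it’s evaluated. A tiny sketch of my own, assuming nothing beyond SML itself:

```sml
(* each call evaluates the exception declaration afresh,
   producing a constructor distinct from all others *)
fun fresh () = let exception E in E end

val e1 = fresh ()
val e2 = fresh ()
(* handling e1 will never catch e2; they are different extensions of exn *)
```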
What we’re going to do is use the fact that we can generate new extensions to exn
at run time to create an exn
based structure providing a way to implement “tags”. Once we have these tags we’ll be able to implement a pair of functions
val tag : 'a tag -> 'a -> dynamic
val untag : 'a tag -> dynamic -> 'a option
So tags let us “forget” the type of some expression and treat it as some dynamic blob to be recovered at some time in the future. Concretely, we’d like to implement this signature
signature TAG =
sig
  type dynamic
  type 'a tag

  val new : unit -> 'a tag
  val tag : 'a tag -> 'a -> dynamic
  val untag : 'a tag -> dynamic -> 'a option
end
So let’s start implementing the thing. First we need to decide what the type dynamic
should be. I propose that it should be exn
. The reason being that we can always extend exn
in various ways so if we implement things with dynamic = exn
we’ll have the ability to make dynamic
“grow a new branch” to accommodate whatever we’re working with.
structure Tag :> TAG =
struct
  type dynamic = exn
end
Ok, so what should tag
be? Well it’s going to be type indexed obviously so that we can even talk about the signatures of (un)tag
, but more importantly its purpose should be to tell us how to package something up into an exn
so we can get it back out. The downside of this whole extensible data type thing is that if we forget about the constructor we used to make an exn
it’s just lost forever! A tag
will make sure that once we make a constructor to use with dynamic
we won’t find ourselves with a dynamic
and no way to inspect it.
The best way I can think of for doing this is to just bake the (un)tag operations straight into the implementation of the type.
structure Tag :> TAG =
struct
  type dynamic = exn
  type 'a tag = {into : 'a -> exn, out : exn -> 'a option}
end
Now this makes it look like tags could perform arbitrary operations in the process of tagging and untagging, but really we’re going to implement it so it’s all very simple and efficient.
In particular, we’re now in a position to define our three core operators
structure Tag :> TAG =
struct
  type dynamic = exn
  type 'a tag = {into : 'a -> exn, out : exn -> 'a option}

  fun new () : 'a tag =
    let
      exception Fresh of 'a
    in
      { into = Fresh
      , out = fn e =>
          case e of
              Fresh a => SOME a
            | _ => NONE
      }
    end

  fun tag {into, out} = into
  fun untag {into, out} = out
end
Now tag and untag are pretty simple because we basically implemented them up in new
so let’s look carefully at that. We start by first minting a new constructor for exn
. We know that this will not clash with any other exception in existence, no one else can raise it or handle it unless we explicitly give them this constructor. Now while we have access to it, we bundle the constructor into the tag
record we’re making.
into
is quite easy to implement because it’s just constructor application. out
is also straightforward: all we do is pattern match to see if the given exn
is correct. All we do in the actual matching bit is see if we’ve been given something made with our Fresh
constructor and return the included a
if we did. Handling everything else is important, otherwise this would explode horribly every time we failed to untag something.
And that’s a nice way of implementing the same sort of run-time typing you get in dynamic languages in SML. One nice advantage of this over the usual
datatype dynamic = INT of int | STRING of string | ...
approach is we can always extend our dynamic
with user-defined types. So we can do something like
datatype foo = Foo of int
val fooTag : foo Tag.tag = Tag.new ()
val d = Tag.tag fooTag (Foo 2)
val SOME (Foo 2) = Tag.untag fooTag d
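It’s worth convincing ourselves that tags made by separate calls to new never collide, even at the same type, since each call mints its own exception constructor. A quick check of my own, using the structure above:

```sml
val t1 : int Tag.tag = Tag.new ()
val t2 : int Tag.tag = Tag.new ()

val d = Tag.tag t1 1
val SOME 1 = Tag.untag t1 d  (* right tag: payload recovered *)
val NONE = Tag.untag t2 d    (* wrong tag: recovery fails safely *)
```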
There you go: this is just a very short post on a very short piece of code that lets us do something fun. Some nice things you can do now:

- Use dynamic to write an infinite loop without direct recursion
- Reimplement this structure without exn, using the generative effect of allocating a reference instead

So summer seems to be about over. I’m very happy with mine, I learned quite a lot. In particular over the last few months I’ve been reading and fiddling with a different kind of type theory than I was used to: computational type theory. This is the type theory that underlies Nuprl (or JonPRL, cough cough).
One thing that stood out to me was that you could do all these absolutely crazy things in this system that seemed impossible after 3 years of Coq and Agda. In this post I’d like to sketch some of the philosophical differences between CTT and a type theory more in the spirit of CiC.
First things first, let’s go over the more familiar notion of type theory. To develop one of these type theories you start by discussing some syntax. You lay out the syntax for some types and some terms
A ::= Σ x : A. A | Π x : A. A | ⊤ | ⊥ | ...
M ::= M M | λ x : A. M | <M, M> | π₁ M | ⋆ | ...
And now we want to describe the all-important M : A relation. This tells us that some term has some type. It is inductively defined from a finite set of inferences. Ideally, it’s even decidable, for philosophical reasons I’ve never cared too much about. In fact, it’s this relation that really governs our whole type theory; everything else is going to stem from this.
As an afterthought, we may decide that we want to identify certain terms with other terms; this is called definitional equality. It’s another inductively defined (and decidable) judgment M ≡ N : A. The thing to note here is that M ≡ N : A is uniform: it is independent of the complexity of A.
This is of some concern because it means that equality for functions is never going to be right for what we want. We have this uniformly complex judgment M ≡ N : A
but when A = Π x : B. C
the complexity should be greater and dependent on the complexity of B
and C
. That’s how it works in math after all, equality at functions is defined pointwise, something we can’t really do here if ≡
is to be decidable or just be of the same complexity no matter the type.
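To make that concrete, the pointwise equality we’d actually want at function types looks something like this (stated informally, in the notation of this post):

```latex
% equality at \Pi-types should be defined pointwise, so its complexity
% grows with the complexity of B and C -- exactly what a decidable,
% uniformly complex \equiv cannot provide
f \equiv g : \Pi\, x : B.\ C
  \quad\text{iff}\quad
\text{for all } a \equiv a' : B,\quad f\,a \equiv g\,a' : C[a/x]
```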
Now we can do lots of things with our theory. One thing we almost always want to do is go back and build an operational semantics for our terms. This operational semantics should be some judgment M ↦ M′
with the property that M ↦ N
will imply that M ≡ N
. This gives us some computational flavor in our type theory and lets us run the pieces of syntax we carved out with M : A
.
But these terms that we’ve written down aren’t really programs. They’re just serializations of the collections of rules we’ve applied to prove a proposition. There’s no ingrained notion of “running” an M, since that’s bolted on after the fact. What we have instead is this ≡ relation, which just specifies which symbols we consider equivalent, but even it was defined arbitrarily. There’s no reason ≡ needs to be a reasonable term rewriting system or anything. If we’re good at our jobs it will be; sometimes (HoTT) it’s not completely clear what that computation system even is, though we’re working to find it. So I’d describe a (good) formal type theory as an axiomatic system like any other that we can add a computational flavor to.
This leads to the first interpretation of the props-as-types correspondence. This states that the inductively defined judgments of a logic give rise to a type theory whose terms are proof terms for those same inductively defined judgments. It’s an identification of similar looking syntactic systems. It’s useful, to be sure, if you want to develop a formal type theory, but it gives us less insight into the computational nature of a logic, because we’ve reflected it into a type theory which we have no reason to suspect has a reasonable computational characterization.
Now we can look at a second flavor of type theory. In this setting the way we order our system is very different. We start with a programming language: a collection of terms and an untyped evaluation relation between them. We don’t necessarily care about all of what’s in the language. As we define types later we’ll say things like “Well, the system has to include at least X”, but we don’t need to exhaustively specify all of the system. It follows that when defining the type theory we actually have no clue how things compute. They just compute somehow. We don’t really even need the system to be strongly normalizing; it’s perfectly valid to take the lambda calculus or Perl (PerlPRL!).
So we have some terms and ↦
, on top of this we start by defining a notion of equality between terms. This equality is purely computational and has no notion of types yet (like M ≡ N : A
) because we have no types yet. This equality is sometimes denoted ~
. We usually define M ~ N to hold if and only if M runs to a value O(M₁, …, Mₙ) exactly when N runs to a value O(N₁, …, Nₙ) with the same outermost operator O and, when they both terminate, Mᵢ ~ Nᵢ for each i. By this I mean that two terms are the same if they compute in the same way: either by diverging, or by running to the same value built from ~-equal components. For more on this, you could read Howe’s paper.
So now we still have a type theory with no types. To fix this we go off and define inference rules to answer three questions:

1. When are two types equal? (A = B)
2. When is a term a member of a type? (a ∈ A)
3. When are two terms equal members of a type? (a = b ∈ A)

The first question is usually answered in a boring way, for instance, we would say that Π x : A. B = Π x : A'. B'
if we know that A = A'
and B = B'
under the assumption that we have some x ∈ A
. We then specify two and three. There we just give the rules for demonstrating that some value, which is a program existing entirely independently of the type we’re building, is in the type. Continuing with functions, we might state that
e x ∈ B (x ∈ A)
———————————————————
e ∈ Π x : A. B
Here I’m using _ (_)
as syntax for a hypothetical judgment; we have to know that e x ∈ B
under the assumption that we know that x ∈ A
. Next we have to decide what it means for two values to be equal as functions. We’re going to do this behaviourally, by specifying that they behave as equal programs when used as functions. Since we use functions by applying them all we have to do is specify that they behave equally on application
v x = v' x ∈ B (x ∈ A)
————————————————————————
v = v' ∈ Π x : A. B
Equality is determined on a per type basis. Furthermore, it’s allowed to use the equality of smaller types in its definition. This means that when defining equality for Π x : A. B
we get to use the equalities for A
and B
! We make no attempt to maintain either decidability or uniform complexity in the collections of terms specified by _ = _ ∈ _
as we did with ≡
. As another example, let’s have a look at the equality type.
A = A' a = a' ∈ A b = b' ∈ A
————————————————————————————————
I(a; b; A) = I(a'; b'; A')
a = b ∈ A
——————————————
⋆ ∈ I(a; b; A)
a = b ∈ A
——————————————————
⋆ = ⋆ ∈ I(a; b; A)
Things to notice here: first off, the various rules depend on the rules governing membership and equality in A
as we should expect. Secondly, ⋆
(the canonical occupant of I(...)
) has no type information. There’s no way to reconstruct whatever reasoning went into proving a = b ∈ A
because there’s no computational content in it. The thing on the left of the ∈
only describes the portions of our proof that involve computation and equalities in computational type theory are always computationally trivial. Therefore, they get the same witness no matter the proof, no matter the types involved. Finally, the infamous equality reflection rule is really just the principle of inversion that we’re allowed to use in reasoning about hypothetical judgments.
This leads us to the second cast of props-as-types. This one states that constructive proof has computational character. Every proof that we write in a logic like this gives us back an (untyped) program which we can run as appropriate for the theorem we’ve proven. This is the idea behind Kleene’s realizability model. Similar to what we’d do with a logical relation, we define what each type means by defining the class of appropriate programs that fit its specification. For example, we defined functions to be the class of things that behave properly under application, and proofs of equality are ⋆ when the equality is true while there are no proofs when it’s false. Another way of phrasing this correspondence is types-as-specs: types are used to identify a collection of terms that may be used in some particular way, instead of merely specifying the syntax of their terms. To read a bit more about this, see Stuart Allen’s and Bob Harper’s work; they do a good job of explaining how this plays out for type theory.
Much of the way we actually interact with type theories is not on the blackboard but through some proof assistant which mechanizes the tedious aspects of using a type theory. For formal type theory this is particularly natural. It’s decidable whether M : A
holds so the user just writes a term and says “Hey this is a proof of A
” and the computer can take care of all the work of checking it. This is the basic experience we get with Coq, Agda, Idris, and others. Even ≡
is handled without us thinking about it.
With computational type theory life is a little sadder. We can’t just write terms like we would for a formal type theory because M ∈ A
isn’t decidable! We need to help guide the computer through the process of validating that our term is well typed. This is the price we pay for having an exceptionally rich notion of M = N ∈ A
and M ∈ A
, there isn’t a snowball’s chance in hell of it being decidable¹. To make this work we switch gears and instead of trying to construct terms we start working with what’s called a program refinement logic, a PRL. A PRL is basically a sequent calculus with a central judgment of
H ≫ A ◁ e
This is going to be set up so that H ⊢ e ∈ A
holds, but there’s a crucial difference. With ∈
everything was an input. To mechanize it we would write a function accepting a context and two terms and checking whether one is a member of the other. With H ≫ A ◁ e
only H
and A
are inputs, e
should be thought of as an output. What we’ll do with this judgment is work with a tactic language to construct a derivation of H ≫ A
without even really thinking about that ◁ e
and the system will use our proof to construct the term for us. So in Agda when I want to write a sorting function what I might do is say
sort : List Nat → List Nat
sort xs = ...
I just give the definition and Agda is going to do the grunt work to make sure that I don’t apply a nat to a string or something equally nutty. In a system like (Jon|Nu|Meta|λ)prl what we do instead is define the type that our sorting function ought to have and use tactics to prove the existence of a realizer for it. By default we don’t really specify what exactly that realizer is. For example, if I was writing JonPRL maybe I’d say
|| Somehow this says a list of nats is a sorted version of another
Operator issorting : (0; 0).

Theorem sort : [(xs : List Nat) -> {ys : List Nat | issorting(ys; xs)}] {
  || Tactics go here.
}
I specify a sufficiently strong type so that if I can construct a realizer for it then I clearly have constructed a sorting algorithm. Of course we have tactics which let us say things like “I want to use this realizer” and then we have to go off and show that the candidate realizer is a valid realizer. In that situation we’re actually acting as a type checker, constructing a derivation implying e ∈ A
.
Well, that’s this summer in a nutshell. Before I finish I had one more way of looking at things. Computational type theory is not concerned with something being provable in an axiomatic system; rather, it’s about describing constructions. Brouwer’s core idea is that a proof is a mental construction, and computational type theory is a system for proving that a particular computable process actually builds the correct object. It’s a translation of Brouwer’s notion of proof into terms a computer scientist might be interested in.
¹ To be clear, this is the chance of the snowball not melting. Not the snowball’s chances of being able to decide whether or not M ∈ A
holds. Though I suppose they’re roughly the same.↩
I was reading a recent proposal to merge types and kinds in Haskell to start the transition to dependently typed Haskell. One thing that caught my eye as I was reading it was that this proposal adds * :: *
to the type system. This is of some significance because it means that once this is fully realized, Haskell will be inconsistent (as a logic) in a new way! Of course, this isn’t a huge deal since Haskell is already woefully inconsistent with
unsafePerformIO
So it’s not like we’ll be entering new territory here. All that it means is that there’s a new way to inhabit every type in Haskell. If you were using Haskell as a proof assistant you were already in for a rude awakening I’m afraid :)
This is an issue of significance though for languages like Idris or Agda where such a thing would actually render proofs useless. Famously, Martin-Löf’s original type theory did have Type : Type
(or * :: *
in Haskell spelling) and Girard managed to derive a contradiction (Girard’s paradox). I’ve always been told that the particulars of this construction are a little bit complicated but to remember that Type : Type
is bad.
In this post I’d like to prove that Type : Type
is a contradiction in JonPRL. This is a little interesting because in most proof assistants this would work in two steps: first hack the system to admit Type : Type, and then derive the contradiction from it.
OK, to be fair, in something like Agda you could use the compiler hacking they’ve already done and just say {-# OPTIONS --set-in-set #-}
or whatever the flag is. The spirit of the development is the same though
In JonPRL, I’m just going to prove this as a regular implication. We have a proposition which internalizes membership and I’ll demonstrate not(member(U{i}; U{i}))
is provable (U{i}
is how we say Type
in JonPRL). It’s the same logic as we had before.
Before we can really get to the proof we want to talk about, we should go through some of the more advanced features of JonPRL we need to use.
JonPRL is a little different than most proof assistants. For example, we can define a type of all closed terms in our language, whose equality is purely computational. This type is base
. To prove that =(a; b; base)
holds you have to prove ceq(a; b)
, the finest grain equality in JonPRL. Two terms are ceq
if they either both diverge, or both run to values built with the same outermost form whose components are pairwise ceq.
What’s particularly exciting is that you can substitute any term for any other term ceq
to it, no matter at what type it’s being used and under what hypotheses. In fact, the reduce
tactic (which performs beta reductions) can conceptually be thought of as substituting a bunch of terms for their weak-head-normal forms, which are ceq
to the original terms. The relevant literature behind this is found in Doug Howe’s “Equality in a Lazy Computation System”. There’s more in JonPRL in this regard, we also have the asymmetric version of ceq
(called approx
) but we won’t need it today.
Next, let’s talk about the image type. This is a type constructor with the following formation rule:
H ⊢ A : U{i} H ⊢ f : base
—————————————————————————————————
H ⊢ image(A; f) : U{i}
So here A
is a type and f
is anything. Things are going to be equal in image(A; f)
if we can prove that they’re of the form f w
and f w'
where w = w' ∈ A
. So image
gives us the codomain (range) of a function. What’s pretty crazy about this is that it’s not just the range of some function A → B
, we don’t really need a whole new type for that. It’s the range of literally any closed term we can apply. We can take the range of the Y combinator over pi types. We can take the range of lam(x. ⊥)
over unit
, anything we want!
This construct lets us define some really incredible things as a user of JonPRL. For example, the “squash” of a type is supposed to be a type which is occupied by <>
(and only <>
) if and only if there was an occupant of the original type. You can define these in HoTT with higher inductive types. Or, you can define these in this type theory as
Operator squash : (0).
[squash(A)] =def= [image(A; lam(x. <>))]
x ∈ squash(A)
if and only if we can construct an a
so that a ∈ A
and lam(x. <>) a ~ x
. Clearly x
must be <>
and we can construct such an a
if and only if A
is nonempty.
We can also define the set-union of two types. Something is supposed to be in the set union if and only if it’s in one or the other. To define such a thing with an image type we have
Operator union : (0).
[union(A; B)] =def= [image((x : unit + unit) * decide(x; _.A; _.B); lam(x.snd(x)))]
This one is a bit more complicated. The domain of things we’re applying our function to this time is
(x : unit + unit) * decide(x; _.A; _.B)
This is a dependent pair, sometimes called a Σ type. The first component is a boolean; if it is true
the second component is of type A
, and otherwise it’s of type B
. So for every term of type A
or B
, there’s a term of this Σ type. In fact, we can recover that original term of type A
or B
by just grabbing the second component of the term! We don’t have to worry about the type of such an operation because we’re not creating something with a function type, just something in base
.
Unions let us define an absolutely critical admissible rule in our system. JonPRL has this propositional reflection of the equality judgment and membership, but in Martin-Löf’s type theory, membership is non-negatable. By this I mean that if we have some a
so that a = a ∈ A
doesn’t hold, we won’t be able to prove =(a; a; A) -> void. See, in order to prove such a thing we first have to prove that =(a; b; A) -> void
is a type, which means proving that =(a; a; A)
is a type.
In order to prove that =(a; b; A)
is a proposition we have to prove =(a; a; A)
, =(b; b; A)
, and =(A; A; U{i})
. The process of proving these will actually also show that the corresponding judgments, a ∈ A
, b ∈ A
, and A ∈ U{i}
hold.
However, in the case that a
and b
are the same term this is just the same as proving =(a; b; A)
! So =(a; a; A)
is a proposition only if it’s true. However, we can add a rule that says that =(a; b; A)
is a proposition if a = a ∈ (A ∪ base)
and similarly for b
! This fixes our negatibility issue because we can just prove that =(a; a; base)
, something that may be true even if a
is not equal in A
. Before, having a function take a member(...) was useless: member(a; A) is just thin sugar for =(a; a; A)! member(a; A)
is a proposition if and only if a = a ∈ A
holds, in other words, it’s a proposition if and only if it’s true! With this new rule, we can prove member(a; A)
is a proposition if A ∈ U{i}
and a ∈ base
, a much weaker set of conditions that are almost always true. We can apply this special rule in JonPRL with eqeqbase
instead of just eqcd
like the rest of our equality rules.
Now let’s actually begin proving Russell’s paradox. To start, some notation.
Infix 20 "∈" := member.
Infix 40 "~" := ceq.
Infix 60 "∪" := bunion.
Prefix 40 "¬" := not.
This lets us say a ∈ b instead of member(a; b). JonPRL recently grew the ability to add transparent notation to terms; it makes our theorems a lot prettier.
Next we define the central term to our proof:
Operator Russell : ().
[Russell] =def= [{x : U{i} | ¬ (x ∈ x)}]
Here we’ve defined Russell
as shorthand for a subset type, in particular a subset of U{i}
(the universe of types). x ∈ Russell
if x ∈ U{i}
and ¬ (x ∈ x)
. Now normally we won’t be able to prove that this is a type (specifically x ∈ x
is going to be a problem), but in our case we’ll have some help from an assumption that U{i} ∈ U{i}
.
Now we begin to define a small set of tactics that we’ll want. These tactics are really where the fiddly bits of using JonPRL’s tactic system come into play. If you’re just reading this for the intuition as to why Type ∈ Type
is bad just skip this. You’ll still understand the construction even if you don’t understand these bits of the proof.
First we have a tactic which finds an occurrence of H : A + B
in the context and eliminates it. This gives us two goals, one with an A
and one with a B
. To do this we use match, which gives us something like match goal with
in Coq.
Tactic breakplus {
  @{ [H : _ + _ |- _] => elim <H>; thin <H> }
}.
Note the syntax [H : ... |- ...]
to match on a sequent. In particular here we just have _ + _
and _
. Next we have a tactic bunioneqright
. It’s to help us work with bunion
s (unions). Basically it turns =(M; N; bunion(A; B))
into
=(lam(x.snd(x)) <inr(<>), M>; lam(x.snd(x)) <inr(<>), N>; bunion(A; B))
This is actually helpful because it turns out that once we unfold bunion
we have to prove that M
and N
are in an image type, remember that bunion
is just a thin layer of sugar on top of image types. In order to prove something is in the image type it needs to be of the form f a
where f
in our case is lam(x. snd(x))
.
This is done with
Tactic bunioneqright {
  @{ [|- =(M; N; L ∪ R)] =>
       csubst [M ~ lam(x. snd(x)) <inr(<>), M>] [h.=(h;_;_)];
       aux { unfold <snd>; reduce; auto };
       csubst [N ~ lam(x. snd(x)) <inr(<>), N>] [h.=(_;h;_)];
       aux { unfold <snd>; reduce; auto }
  }
}.
The key here is csubst
. It takes a ceq
as its first argument and a “targeting”. It then tries to replace each occurrence of the left side of the equality with the right. To find each occurrence the targeting maps a variable to each occurrence. We’re allowed to use wildcards in the targeting as well. It also relegates actually proving the equality into a new subgoal. It’s easy enough to prove so we demonstrate it with aux {unfold <snd>; reduce; auto}
.
We only need to apply this tactic after eqeqbase; that applies the rule I mentioned earlier about proving equalities to be wellformed in a much more liberal environment. Therefore we wrap those two tactics into one more convenient package.
Tactic eqbasetac {
  @{ [|- =(=(M; N; A); =(M'; N'; A'); _)] =>
       eqeqbase; auto;
       bunioneqright; unfold <bunion>
  }
}.
There is one last tactic in this series, this one to prove that member(X; X) ∈ U{i'}
is wellformed (a type). It starts by unfolding member
into =(=(X; X; X); =(X; X; X); U{i'})
and then applying the new tactic. Then we do other things. These things aren’t pretty. I suggest we just ignore them.
Tactic impredicativitywftac {
  unfold <member>; eqbasetac;
  eqcd; ?{@{[|- =(_; _; base)] => auto}};
  eqcd @i'; ?{breakplus}; reduce; auto
}.
Finally we have a tactic to prove that if we have not(P)
and P
existing in the context, we can prove void. This is another nice application of match.
Tactic contradiction {
  unfold <not implies>;
  @{ [H : P -> void, H' : P |- void] =>
       elim <H> [H'];
       unfold <member>;
       auto
  }
}.
We start by unfolding not
and implies
. This gives us P -> void
and P
. From there, we just apply one to the other giving us a void as we wanted.
We’re now ready to prove our theorem. We start with
Theorem typenotintype : [¬ (U{i} ∈ U{i})] {
}.
We now have the main subgoal
Remaining subgoals:
[main] ⊢ not(member(U{i}; U{i}))
We can start by unfolding not and implies. Remember that not isn’t a built-in thing; it’s just sugar. By unfolding it we get the more primitive form, something we can actually apply the intro tactic to.
{
unfold <not implies>; intro
}
Once unfolded, we’d get a goal along the lines of member(U{i}; U{i}) -> void
. We immediately apply intro
to this though. Now we have two subgoals; one is the result of applying intro
, namely a hypothesis x : member(U{i}; U{i})
and a goal void
. The second subgoal is the “wellformedness” obligation.
We have to prove that member(U{i}; U{i})
is a type in order to apply the intro
tactic. This is a crucial difference between Coqlike systems and these proofrefinement logics. The process of demonstrating that what you’re proving is a proposition is intermingled with actually constructing the proof. It means you get to apply all the normal mathematical tools you have for proving things to be true in order to prove that they’re types. This gives us a lot of flexibility, but at the cost of sometimes annoying subgoals. They’re annotated with [aux]
(as opposed to [main]
). This means we can target them all at once using the aux tactic.
To summarize that whole paragraph as JonPRL would say it, our proof state is
[main]
1. x : member(U{i}; U{i})
⊢ void
[aux] ⊢ member(member(U{i}; U{i}); U{i'})
Let’s get rid of that auxiliary subgoal using impredicativitywftac
, this subgoal is in fact exactly what it was made for.
{
unfold <not implies>; intro;
aux { impredicativitywftac };
}
This picks off that [aux]
goal leaving us with just
[main]
1. x : member(U{i}; U{i})
⊢ void
Now we need to prove some lemmas. They state that Russell
is actually a type. This is possible to do here and only here because we’ll need to actually use x
in the process of proving this. It’s a very nice example of what explicitly proving wellformedness can give you! After all, the process of demonstrating that Russell
is a type is nontrivial and only possible in this hypothetical context, rather than just hoping that JonPRL is clever enough to figure that out for itself we get to demonstrate it locally.
We’re going to use the assert
tactic to get these lemmas. This lets us state a term, prove it as a subgoal and use it as a hypothesis in the main goal. If you’re logically minded, it’s cut.
{
unfold <not implies>; intro;
aux { impredicativitywftac };
assert [Russell ∈ U{i}] <russellwf>;
}
The thing in angle brackets is the name it will get in our hypothetical context for the main goal. This leaves us with two subgoals. The aux
one being the assertion and the main
one being allowed to assume it.
[aux]
1. x : member(U{i}; U{i})
⊢ member(Russell; U{i})
[main]
1. x : member(U{i}; U{i})
2. russellwf : member(Russell; U{i})
⊢ void
We can prove this by basically working our way towards using impredicativitywftac
. We’ll use aux
again to target the aux
subgoal. We’ll start by unfolding everything and applying eqcd
.
{
unfold <not implies>; intro;
aux { impredicativitywftac };
assert [Russell ∈ U{i}] <russellwf>;
aux {
unfold <member Russell>; eqcd; auto;
};
}
Remember that Russell
is {x : U{i} | ¬ (x ∈ x)}
We just applied eqcd
to a subset type (Russell
), so we get two subgoals. One says that U{i}
is a type, one says that if x ∈ U{i}
then ¬ (x ∈ x)
is also a type. In essence this just says that a subset type is a type if both components are types. The former goal is quite straightforward, so we apply auto to take care of it. Now we have one new subgoal to handle
[main]
1. x : =(U{i}; U{i}; U{i})
2. x' : U{i}
⊢ =(not(member(x'; x')); not(member(x'; x')); U{i})
[main]
1. x : member(U{i}; U{i})
2. russellwf : member(Russell; U{i})
⊢ void
The second subgoal is just the rest of the proof, and the first subgoal is what we want to handle. It says that if we have a type x
, then not(member(x; x))
is a type (albeit in ugly notation). To prove this we have to unfold not
. So we’ll do this and apply eqcd
again.
{
unfold <not implies>; intro;
aux { impredicativitywftac };
assert [Russell ∈ U{i}] <russellwf>;
aux {
unfold <member Russell>; eqcd; auto;
unfold <not implies>; eqcd; auto;
};
}
Remember that not(P)
desugars to P -> void
. Applying eqcd
is going to give us two subgoals, P
is a type and void
is a type. However, member(void; U{i})
is pretty easy to prove, so we apply auto
again which takes care of one of our two new goals. Now we just have
[main]
1. x : =(U{i}; U{i}; U{i})
2. x' : U{i}
⊢ =(member(x'; x'); member(x'; x'); U{i})
[main]
1. x : member(U{i}; U{i})
2. russellwf : member(Russell; U{i})
⊢ void
Now we’re getting to the root of the issue. We’re trying to prove that member(x'; x')
is a type. This is happily handled by impredicativitywftac
which will use our assumption that U{i} ∈ U{i}
because it’s smart like that.
{
unfold <not implies>; intro;
aux { impredicativitywftac };
assert [Russell ∈ U{i}] <russellwf>;
aux {
unfold <member Russell>; eqcd; auto;
unfold <not implies>; eqcd; auto;
impredicativitywftac
};
}
Now we just have that main goal with the assumption russellwf
added.
[main]
1. x : member(U{i}; U{i})
2. russellwf : member(Russell; U{i})
⊢ void
Now we have a similar wellformedness goal to assert and prove: we want to show that (Russell ∈ Russell) is a type. This one is easier; we can prove it using impredicativitywftac.
{
unfold <not implies>; intro;
aux { impredicativitywftac };
assert [Russell ∈ U{i}] <russellwf>;
aux {
unfold <member Russell>; eqcd; auto;
unfold <not implies>; eqcd; auto;
impredicativitywftac
};
assert [(Russell ∈ Russell) ∈ U{i}] <russellinrussellwf>;
aux { impredicativitywftac; cum @i; auto };
}
That cum @i
is a quirk of impredicativitywftac
. It basically means that instead of proving =(...; ...; U{i'})
we can prove =(...; ...; U{i})
since U{i}
is a universe below U{i'}
and all universes are cumulative.
Our goal is now
[main]
1. x : member(U{i}; U{i})
2. russellwf : member(Russell; U{i})
3. russellinrussellwf : member(member(Russell; Russell); U{i})
⊢ void
Ok, now that we have all these wellformedness lemmas the real reasoning can start. Our proof sketch is basically as follows:
1. Russell ∈ Russell is false. This is because if Russell were in Russell then, by the definition of Russell, it isn't in Russell.
2. If not(Russell ∈ Russell) holds, then Russell ∈ Russell holds.
Here's the first assertion:
{
unfold <not implies>; intro;
aux { impredicativitywftac };
assert [Russell ∈ U{i}] <russellwf>;
aux {
unfold <member Russell>; eqcd; auto;
unfold <not implies>; eqcd; auto;
impredicativitywftac
};
assert [(Russell ∈ Russell) ∈ U{i}] <russellinrussellwf>;
aux { impredicativitywftac; cum @i; auto };
assert [¬ (Russell ∈ Russell)] <notrussellinrussell>;
}
Here are our subgoals:
[aux]
1. x : member(U{i}; U{i})
2. russellwf : member(Russell; U{i})
3. russellinrussellwf : member(member(Russell; Russell); U{i})
⊢ not(member(Russell; Russell))
[main]
1. x : member(U{i}; U{i})
2. russellwf : member(Russell; U{i})
3. russellinrussellwf : member(member(Russell; Russell); U{i})
4. russellnotinrussell : not(member(Russell; Russell))
⊢ void
We want to prove that first one. To start, let’s unfold that not
and move member(Russell; Russell)
to the hypothesis and use it to prove void
. We do this with intro
.
{
unfold <not implies>; intro;
aux { ... };
assert [Russell ∈ U{i}] <russellwf>;
aux { ... };
assert [(Russell ∈ Russell) ∈ U{i}] <russellinrussellwf>;
aux { ... };
assert [¬ (Russell ∈ Russell)] <notrussellinrussell>;
aux {
unfold <not implies>;
intro @i; aux {assumption};
}
}
Notice that the wellformedness goal that intro
generated is handled by our assumption! After all, it’s just member(Russell; Russell) ∈ U{i}
, we already proved it. Now our subgoals look like this
[main]
1. x : member(U{i}; U{i})
2. russellwf : member(Russell; U{i})
3. russellinrussellwf : member(member(Russell; Russell); U{i})
4. x' : member(Russell; Russell)
⊢ void
[main]
1. x : member(U{i}; U{i})
2. russellwf : member(Russell; U{i})
3. russellinrussellwf : member(member(Russell; Russell); U{i})
4. russellnotinrussell : not(member(Russell; Russell))
⊢ void
Here's our clever plan:
1. If Russell ∈ Russell, then there's an X : Russell so that ceq(Russell; X) holds.
2. Since X : Russell, we can unfold it to say that X : {x : U{i} | ¬ (x ∈ x)}.
3. We can then eliminate X and derive that ¬ (X ∈ X) holds.
4. Combining this with ceq(Russell; X) gives ¬ (Russell ∈ Russell).
Let’s start explaining this to JonPRL by introducing that X
(here called R
). We’ll assert an R : Russell
such that R ~ Russell
. We do this using dependent pairs (here written (x : A) * B(x)
).
{
unfold <not implies>; intro;
aux { ... };
assert [Russell ∈ U{i}] <russellwf>;
aux { ... };
assert [(Russell ∈ Russell) ∈ U{i}] <russellinrussellwf>;
aux { ... };
assert [¬ (Russell ∈ Russell)] <notrussellinrussell>;
aux {
unfold <not implies>;
intro @i; aux {assumption};
assert [(R : Russell) * R ~ Russell] <Rwithprop>;
aux {
intro [Russell] @i; auto
};
}
}
We’ve proven this by intro
. For proving dependent products we provide an explicit witness for the first component. Basically to prove (x : A) * B(x)
we say intro [Foo]
. We then have two goals: Foo ∈ A and B(Foo). Since subgoals are fully independent of each other, we have to give the witness for the first component upfront. It's a little awkward; Jon's working on it :).
In this case we use intro [Russell]
. After this we have to prove that this witness has type Russell
and then prove the second component holds. Happily, auto
takes care of both of these obligations so intro [Russell] @i; auto
handles it all.
Now we promptly eliminate this pair. It gives us two new facts, that R : Russell
and R ~ Russell
hold.
{
unfold <not implies>; intro;
aux { ... };
assert [Russell ∈ U{i}] <russellwf>;
aux { ... };
assert [(Russell ∈ Russell) ∈ U{i}] <russellinrussellwf>;
aux { ... };
assert [¬ (Russell ∈ Russell)] <notrussellinrussell>;
aux {
unfold <not implies>;
intro @i; aux {assumption};
assert [(R : Russell) * R ~ Russell] <Rwithprop>;
aux {
intro [Russell] @i; auto
};
elim <Rwithprop>; thin <Rwithprop>
}
}
This leaves our goal as
[main]
1. x : member(U{i}; U{i})
2. russellwf : member(Russell; U{i})
3. russellinrussellwf : member(member(Russell; Russell); U{i})
4. x' : member(Russell; Russell)
5. s : Russell
6. t : ceq(s; Russell)
⊢ void
[main]
1. x : member(U{i}; U{i})
2. russellwf : member(Russell; U{i})
3. russellinrussellwf : member(member(Russell; Russell); U{i})
4. russellnotinrussell : not(member(Russell; Russell))
⊢ void
Now let’s invert on the hypothesis that s : Russell
; we want to use it to conclude that ¬ (s ∈ s)
holds since that will give us ¬ (R ∈ R)
.
{
unfold <not implies>; intro;
aux { ... };
assert [Russell ∈ U{i}] <russellwf>;
aux { ... };
assert [(Russell ∈ Russell) ∈ U{i}] <russellinrussellwf>;
aux { ... };
assert [¬ (Russell ∈ Russell)] <notrussellinrussell>;
aux {
unfold <not implies>;
intro @i; aux {assumption};
assert [(R : Russell) * R ~ Russell] <Rwithprop>;
aux {
intro [Russell] @i; auto
};
elim <Rwithprop>; thin <Rwithprop>;
unfold <Russell>; elim #5;
}
}
Now that we've unfolded all of those Russells our goal is a little bit harder to read; remember to mentally read {x : U{i} | not(member(x; x))} as Russell.
[main]
1. x : member(U{i}; U{i})
2. russellwf : member({x:U{i} | not(member(x; x))}; U{i})
3. russellinrussellwf : member(member({x:U{i} | not(member(x; x))}; {x:U{i} | not(member(x; x))}); U{i})
4. x' : member({x:U{i} | not(member(x; x))}; {x:U{i} | not(member(x; x))})
5. s : {x:U{i} | not(member(x; x))}
6. x'' : U{i}
7. [t'] : not(member(x''; x''))
8. t : ceq(x''; {x:U{i} | not(member(x; x))})
⊢ void
[main]
1. x : member(U{i}; U{i})
2. russellwf : member(Russell; U{i})
3. russellinrussellwf : member(member(Russell; Russell); U{i})
4. russellnotinrussell : not(member(Russell; Russell))
⊢ void
Now we use #7 to derive that not(member(Russell; Russell))
holds.
{
unfold <not implies>; intro;
aux { ... };
assert [Russell ∈ U{i}] <russellwf>;
aux { ... };
assert [(Russell ∈ Russell) ∈ U{i}] <russellinrussellwf>;
aux { ... };
assert [¬ (Russell ∈ Russell)] <notrussellinrussell>;
aux {
unfold <not implies>;
intro @i; aux {assumption};
assert [(R : Russell) * R ~ Russell] <Rwithprop>;
aux {
intro [Russell] @i; auto
};
elim <Rwithprop>; thin <Rwithprop>;
unfold <Russell>; elim #5;
assert [¬ member(Russell; Russell)];
aux {
unfold <Russell>;
};
}
}
This leaves us with 3 subgoals, the first one being the assertion.
[aux]
1. x : member(U{i}; U{i})
2. russellwf : member({x:U{i} | not(member(x; x))}; U{i})
3. russellinrussellwf : member(member({x:U{i} | not(member(x; x))}; {x:U{i} | not(member(x; x))}); U{i})
4. x' : member({x:U{i} | not(member(x; x))}; {x:U{i} | not(member(x; x))})
5. s : {x:U{i} | not(member(x; x))}
6. x'' : U{i}
7. [t'] : not(member(x''; x''))
8. t : ceq(x''; {x:U{i} | not(member(x; x))})
⊢ not(member({x:U{i} | not(member(x; x))}; {x:U{i} | not(member(x; x))}))
[main]
1. x : member(U{i}; U{i})
2. russellwf : member({x:U{i} | not(member(x; x))}; U{i})
3. russellinrussellwf : member(member({x:U{i} | not(member(x; x))}; {x:U{i} | not(member(x; x))}); U{i})
4. x' : member({x:U{i} | not(member(x; x))}; {x:U{i} | not(member(x; x))})
5. s : {x:U{i} | not(member(x; x))}
6. x'' : U{i}
7. [t'] : not(member(x''; x''))
8. t : ceq(x''; {x:U{i} | not(member(x; x))})
9. H : not(member(Russell; Russell))
⊢ void
[main]
1. x : member(U{i}; U{i})
2. russellwf : member(Russell; U{i})
3. russellinrussellwf : member(member(Russell; Russell); U{i})
4. russellnotinrussell : not(member(Russell; Russell))
⊢ void
Now to prove this, what we need to do is substitute the unfolded Russell
for x''
; from there it’s immediate by assumption. We perform the substitution with chypsubst
. This takes a direction in which to substitute, the hypothesis to use, and a target telling us where to apply the substitution.
{
unfold <not implies>; intro;
aux { ... };
assert [Russell ∈ U{i}] <russellwf>;
aux { ... };
assert [(Russell ∈ Russell) ∈ U{i}] <russellinrussellwf>;
aux { ... };
assert [¬ (Russell ∈ Russell)] <notrussellinrussell>;
aux {
unfold <not implies>;
intro @i; aux {assumption};
assert [(R : Russell) * R ~ Russell] <Rwithprop>;
aux {
intro [Russell] @i; auto
};
elim <Rwithprop>; thin <Rwithprop>;
unfold <Russell>; elim #5;
assert [¬ member(Russell; Russell)];
aux {
unfold <Russell>;
chypsubst ← #8 [h. ¬ (h ∈ h)];
};
}
}
This leaves us with a much more tractable goal.
[main]
1. x : member(U{i}; U{i})
2. russellwf : member({x:U{i} | not(member(x; x))}; U{i})
3. russellinrussellwf : member(member({x:U{i} | not(member(x; x))}; {x:U{i} | not(member(x; x))}); U{i})
4. x' : member({x:U{i} | not(member(x; x))}; {x:U{i} | not(member(x; x))})
5. s : {x:U{i} | not(member(x; x))}
6. x'' : U{i}
7. [t'] : not(member(x''; x''))
8. t : ceq(x''; {x:U{i} | not(member(x; x))})
⊢ not(member(x''; x''))
[main]
1. x : member(U{i}; U{i})
2. russellwf : member({x:U{i} | not(member(x; x))}; U{i})
3. russellinrussellwf : member(member({x:U{i} | not(member(x; x))}; {x:U{i} | not(member(x; x))}); U{i})
4. x' : member({x:U{i} | not(member(x; x))}; {x:U{i} | not(member(x; x))})
5. s : {x:U{i} | not(member(x; x))}
6. x'' : U{i}
7. [t'] : not(member(x''; x''))
8. t : ceq(x''; {x:U{i} | not(member(x; x))})
9. H : not(member(Russell; Russell))
⊢ void
[main]
1. x : member(U{i}; U{i})
2. russellwf : member(Russell; U{i})
3. russellinrussellwf : member(member(Russell; Russell); U{i})
4. russellnotinrussell : not(member(Russell; Russell))
⊢ void
We'd like to just apply assumption but it's not immediately applicable due to some technical details (basically we can only apply an assumption in a proof-irrelevant context, but we have to unfold Russell and introduce it to demonstrate that it's irrelevant). So just read what's left as a (very) convoluted assumption.
{
unfold <not implies>; intro;
aux { ... };
assert [Russell ∈ U{i}] <russellwf>;
aux { ... };
assert [(Russell ∈ Russell) ∈ U{i}] <russellinrussellwf>;
aux { ... };
assert [¬ (Russell ∈ Russell)] <notrussellinrussell>;
aux {
unfold <not implies>;
intro @i; aux {assumption};
assert [(R : Russell) * R ~ Russell] <Rwithprop>;
aux {
intro [Russell] @i; auto
};
elim <Rwithprop>; thin <Rwithprop>;
unfold <Russell>; elim #5;
assert [¬ member(Russell; Russell)];
aux {
unfold <Russell>;
chypsubst ← #8 [h. ¬ (h ∈ h)];
unfold <not implies>;
intro; aux { impredicativitywftac };
contradiction
};
}
}
Now we’re almost through this assertion, our subgoals look like this (pay attention to 9 and 4)
[main]
1. x : member(U{i}; U{i})
2. russellwf : member({x:U{i} | not(member(x; x))}; U{i})
3. russellinrussellwf : member(member({x:U{i} | not(member(x; x))}; {x:U{i} | not(member(x; x))}); U{i})
4. x' : member({x:U{i} | not(member(x; x))}; {x:U{i} | not(member(x; x))})
5. s : {x:U{i} | not(member(x; x))}
6. x'' : U{i}
7. [t'] : not(member(x''; x''))
8. t : ceq(x''; {x:U{i} | not(member(x; x))})
9. H : not(member(Russell; Russell))
⊢ void
[main]
1. x : member(U{i}; U{i})
2. russellwf : member(Russell; U{i})
3. russellinrussellwf : member(member(Russell; Russell); U{i})
4. russellnotinrussell : not(member(Russell; Russell))
⊢ void
Once we unfold that Russell
we have an immediate contradiction so unfold <Russell>; contradiction
solves it.
{
unfold <not implies>; intro;
aux { ... };
assert [Russell ∈ U{i}] <russellwf>;
aux { ... };
assert [(Russell ∈ Russell) ∈ U{i}] <russellinrussellwf>;
aux { ... };
assert [¬ (Russell ∈ Russell)] <notrussellinrussell>;
aux {
unfold <not implies>;
intro @i; aux {assumption};
assert [(R : Russell) * R ~ Russell] <Rwithprop>;
aux {
intro [Russell] @i; auto
};
elim <Rwithprop>; thin <Rwithprop>;
unfold <Russell>; elim #5;
assert [¬ member(Russell; Russell)];
aux {
unfold <Russell>;
chypsubst ← #8 [h. ¬ (h ∈ h)];
unfold <not implies>;
intro; aux { impredicativitywftac };
contradiction
};
unfold <Russell>; contradiction
}
}
This takes care of this subgoal, so now we’re back on the main goal. This time though we have an extra hypothesis which will provide the leverage we need to prove our next assertion.
[main]
1. x : member(U{i}; U{i})
2. russellwf : member(Russell; U{i})
3. russellinrussellwf : member(member(Russell; Russell); U{i})
4. russellnotinrussell : not(member(Russell; Russell))
⊢ void
Now we’re going to claim that Russell
is in fact a member of Russell
. This will follow from the fact that we’ve proved already that Russell
isn’t in Russell
(yeah, it seems pretty paradoxical already).
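Spelled out in ordinary mathematical notation, the argument we are in the middle of formalizing runs roughly as follows (this is a paraphrase, not JonPRL syntax):

```latex
\begin{align*}
&\text{Let } R = \{\, x : U_i \mid \neg (x \in x) \,\}.\\
&\text{If } R \in R \text{ then, by the defining property of } R,\ \neg(R \in R)
  \text{, a contradiction. Hence } \neg(R \in R).\\
&\text{But } R \in U_i \text{ and } \neg(R \in R) \text{ say exactly that } R
  \text{ satisfies its own defining predicate, so } R \in R.\\
&\text{Contradiction; therefore } U_i \in U_i \text{ is absurd.}
\end{align*}
```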
{
unfold <not implies>; intro;
aux { ... };
assert [Russell ∈ U{i}] <russellwf>;
aux { ... };
assert [(Russell ∈ Russell) ∈ U{i}] <russellinrussellwf>;
aux { ... };
assert [¬ (Russell ∈ Russell)] <notrussellinrussell>;
aux { ... };
assert [Russell ∈ Russell];
}
Giving us
[aux]
1. x : member(U{i}; U{i})
2. russellwf : member(Russell; U{i})
3. russellinrussellwf : member(member(Russell; Russell); U{i})
4. russellnotinrussell : not(member(Russell; Russell))
⊢ member(Russell; Russell)
[main]
1. x : member(U{i}; U{i})
2. russellwf : member(Russell; U{i})
3. russellinrussellwf : member(member(Russell; Russell); U{i})
4. russellnotinrussell : not(member(Russell; Russell))
5. H : member(Russell; Russell)
⊢ void
Proving this is pretty straightforward: we only have to demonstrate that not(Russell ∈ Russell) and Russell ∈ U{i} hold, both of which we have as assumptions. The rest of the proof is just more wellformedness goals.
First we unfold everything and apply eqcd
. This gives us 3 subgoals, the first two are Russell ∈ U{i}
and ¬(Russell ∈ Russell)
. Since we have these as assumptions we’ll use main {assumption}
. That will target both these goals and prove them immediately. Here by using main
we avoid applying this to the wellformedness goal, which in this case actually isn’t the assumption.
{
unfold <not implies>; intro;
aux { ... };
assert [Russell ∈ U{i}] <russellwf>;
aux { ... };
assert [(Russell ∈ Russell) ∈ U{i}] <russellinrussellwf>;
aux { ... };
assert [¬ (Russell ∈ Russell)] <notrussellinrussell>;
aux { ... };
assert [Russell ∈ Russell];
aux {
unfold <member Russell>; eqcd;
unfold <member>;
main { assumption };
};
}
This just leaves us with one awful wellformedness goal requiring us to prove that not(=(x; x; x))
is a type if x
is a type. We actually proved something similar back when we proved that Russell was wellformed. The proof is the same as then: just unfold, eqcd
and impredicativitywftac
. We use ?{!{auto}}
to only apply auto
in a subgoal where it immediately proves it. Here ?{}
says “run this or do nothing” and !{}
says “run this, if it succeeds stop, if it does anything else, fail”. This is not an interesting portion of the proof, don’t burn too many cycles trying to figure this out.
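If it helps, the behavior of these combinators is easy to model outside of JonPRL. Here's a tiny Python sketch (the names and the goal representation here are made up for illustration; JonPRL itself is written in SML and works differently). A tactic takes a goal and returns a list of subgoals, or fails:

```python
class TacticError(Exception):
    """Raised when a tactic does not apply to a goal."""

def try_(t):
    # ?{t}: run t; if it fails, leave the goal untouched
    def run(goal):
        try:
            return t(goal)
        except TacticError:
            return [goal]
    return run

def complete(t):
    # !{t}: run t, but fail unless it finishes the goal entirely
    def run(goal):
        subgoals = t(goal)
        if subgoals:
            raise TacticError("tactic left subgoals behind")
        return []
    return run

# a toy "auto" that only knows how to finish trivial goals
def auto(goal):
    if goal == "trivial":
        return []
    raise TacticError("auto failed")

# ?{!{auto}}: solves trivial goals, silently leaves anything else alone
tac = try_(complete(auto))
print(tac("trivial"))  # []
print(tac("hard"))     # ['hard']
```

The point is just that `?{!{auto}}` composes "do nothing on failure" around "all or nothing", which is why it only fires where auto finishes a goal outright.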
{
unfold <not implies>; intro;
aux { ... };
assert [Russell ∈ U{i}] <russellwf>;
aux { ... };
assert [(Russell ∈ Russell) ∈ U{i}] <russellinrussellwf>;
aux { ... };
assert [¬ (Russell ∈ Russell)] <notrussellinrussell>;
aux { ... };
assert [Russell ∈ Russell] <russellinrussell>;
aux {
unfold <member Russell>; eqcd;
unfold <member>;
main { assumption };
unfold <not implies>; eqcd; ?{!{auto}};
impredicativitywftac;
};
}
Now we just have the final subgoal to prove. We’re actually in a position to do so now.
[main]
1. x : member(U{i}; U{i})
2. russellwf : member(Russell; U{i})
3. russellinrussellwf : member(member(Russell; Russell); U{i})
4. russellnotinrussell : not(member(Russell; Russell))
5. russellinrussell : member(Russell; Russell)
⊢ void
Now that we’ve shown P
and not(P)
hold at the same time all we need to do is apply contradiction
and we’re done.
Theorem typenotintype [¬ (U{i} ∈ U{i})] {
unfold <not implies>; intro;
aux { ... };
assert [Russell ∈ U{i}] <russellwf>;
aux { ... };
assert [(Russell ∈ Russell) ∈ U{i}] <russellinrussellwf>;
aux { ... };
assert [¬ (Russell ∈ Russell)] <notrussellinrussell>;
aux { ... };
assert [Russell ∈ Russell] <russellinrussell>;
aux { ... };
contradiction
}.
And there you have it, a complete proof of Russell's paradox fully formalized in JonPRL! We actually proved a slightly stronger result than just that the type of types cannot be in itself: we proved that at any point in the hierarchy of universes (the first of which is Type/*/whatever), if you tie it off you'll get a contradiction.
I hope you found this proof interesting. Even if you’re not at all interested in JonPRL, it’s nice to see that allowing one to have U{i} ∈ U{i}
or * :: *
gives you the ability to have a type like Russell
and with it, inhabit void
. I also find it especially pleasing that we can prove something like this in JonPRL; it’s growing up so fast.
Thanks to Jon for greatly improving the original proof we had.
I wanted to write about something related to all the stuff I’ve been reading for research lately. I decided to talk about a super cool trick in a field called domain theory. It’s a method of generating a solution to a large class of recursive equations.
In order to go through this idea we’ve got some background to cover. I wanted to make this post readable even if you haven’t read too much domain theory (you do need to know what a functor/colimit is though, nothing crazy though). We’ll start with a whirlwind tutorial of the math behind domain theory. From there we’ll transform the problem of finding a solution to an equation into something categorically tractable. Finally, I’ll walk through the construction of a solution.
I decided not to show an example of applying this technique to model a language because that would warrant its own post, hopefully I’ll write about that soon :)
The basic idea behind domain theory comes from a simple problem. Suppose we want to model the lambda calculus. We want a collection of mathematical objects D so that we can treat each element of D as a function D → D, and each function D → D as an element of D
. To see why this is natural, remember that we want to turn each program E
into d ∈ D
. If E = λ x. E'
then we need to turn the function e ↦ [e/x]E'
into a term. This means D → D
needs to be embeddable in D
. On the other hand, we might have E = E' E''
in which case we need to turn E'
into a function D → D
so that we can apply it. This means we need to be able to embed D
into D → D
.
After this we can turn a lambda calculus program into a specific element of D
and reason about its properties using the ambient mathematical tools for D
. This is semantics, understanding programs by studying their meaning in some mathematical structure. In our specific case that structure is D
with the isomorphism D ≅ D → D
. However, there’s an issue! We know that D
can’t just be a set because then there cannot be such an isomorphism! In the case where D ≅ N
, then D → D ≅ R
and there’s a nice proof by diagonalization that such an isomorphism cannot exist.
So what can we do? We know there are only countably many programs, but we’re trying to state that there exists an isomorphism between our programs (countable) and functions on them (uncountable). Well the issue is that we don’t really mean all functions on D
, just the ones we can model as lambda terms. For example, the function which maps all divergent programs to 1
and all terminating ones to 0
need not be considered because there’s no lambda term for it! How do we consider “computable” functions though? It’s not obvious since we define computable functions using the lambda calculus, what we’re trying to model here. Let’s set aside this question for a moment.
Another question is how do we handle this program: (λ x. x x) (λ x. x x)
? It doesn’t have a value after all! It doesn’t behave like a normal mathematical function because applying it to something doesn’t give us back a new term, it just runs forever! To handle this we do something really clever. We stop considering just a collection of terms and instead look at terms with an ordering relation ⊑
! The idea is that ⊑ represents definedness. A program which runs to a value is more defined than a program which just loops forever. Similarly, two functions behave the same on all inputs except for 0
where one loops we could say one is more defined than the other. What we’ll do is define ⊑ abstractly and then model programs into sets with such a relation defined upon them. In order to build up this theory we need a few definitions
A partially ordered set (poset) is a set A and a binary relation ⊑ where
- a ⊑ a
- a ⊑ b and b ⊑ c implies a ⊑ c
- a ⊑ b and b ⊑ a implies a = b
We often just denote the pair <A, ⊑>
as A
when the ordering is clear. With a poset A
, of particular interest are the chains in it. A chain is a collection of elements aᵢ
so that aᵢ ⊑ aⱼ
if i ≤ j
. For example, in the partial order of natural numbers and ≤
, a chain is just a run of ascending numbers. Another fundamental concept is called a least upper bound (lub). A lub of a subset P ⊆ A
is an element x ∈ A
so that y ∈ P
implies y ⊑ x
and if this property holds for some z
also in A
, then x ⊑ z
. So a least upper bound is just the smallest thing bigger than the subset. This isn’t always guaranteed to exist, for example, in our poset of natural numbers N
, the subset N
has no upper bounds at all! When such a lub does exist, we denote it with ⊔P
. Some partial orders have an interesting property: all chains in them have least upper bounds. We call such posets complete partial orders, or cpos.
For example while N
isn’t a cpo, ω
(the natural numbers + an element greater than all of them) is! As a quick puzzle, can you show that all finite partial orders are in fact cpos?
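To make the definitions concrete (and as a nudge toward the puzzle), here's a brute-force lub computation on a tiny finite poset in Python: the divisibility ordering on {1, 2, 3, 6}. This is purely illustrative and not part of the post's development:

```python
# divisibility poset on a small carrier set
elems = [1, 2, 3, 6]

def leq(a, b):
    return b % a == 0  # a ⊑ b iff a divides b

def lub(subset):
    # collect the upper bounds of the subset, then pick the least among them
    ubs = [x for x in elems if all(leq(c, x) for c in subset)]
    least = [u for u in ubs if all(leq(u, v) for v in ubs)]
    return least[0] if least else None

print(lub([1, 2, 6]))  # 6: the lub of a finite chain is its top element
print(lub([2, 3]))     # 6: lubs can exist for non-chains too
```

In a finite poset every chain has a greatest element, which is why the puzzle works out: that greatest element is the chain's lub.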
We can define a number of basic constructions on cpos. The most common is the “lifting” operation which takes a cpo D
and returns D⊥
, a cpo with a least element ⊥
. A cpo with such a least element is called “pointed” and I’ll write that as cppo (complete pointed partial order). Another common example, given two cppos, D
and E
, we can construct D ⊗ E
. An element of this cppo is either ⊥
or <l, r>
where l ∈ D − {⊥} and r ∈ E − {⊥}. This is called the smash product because it “smashes” the ⊥s out of the components. Similarly, there are smash sums D ⊕ E
.
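As a very simplified picture of lifting and smashing, we can use Python with None standing in for ⊥. The names here are my own, and this is only an illustration of the definition:

```python
BOT = None  # stand-in for ⊥, the extra element lifting adds

def smash_pair(l, r):
    # <l, r> in D ⊗ E: any ⊥ component smashes the whole pair to ⊥
    if l is BOT or r is BOT:
        return BOT
    return (l, r)

print(smash_pair(1, "a"))  # (1, 'a')
print(smash_pair(1, BOT))  # None: the ⊥ got "smashed" out
```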
The next question is the classic algebraic question to ask about a structure: what are the interesting functions on it? We’ll in particular be interested in functions which preserve the ⊑ relation and the taking of lub’s on chains. For this we have two more definitions:
- A function f is monotone if x ⊑ y implies f(x) ⊑ f(y).
- A function f is continuous if it is monotone and, for any chain C, ⊔ f(C) = f(⊔ C).
Notably, the collection of cppos and continuous functions forms a category! This is because clearly x ↦ x
is continuous and the composition of two continuous functions is continuous. This category is called Cpo
. It’s here that we’re going to do most of our interesting constructions.
Finally, we have to discuss one important construction on Cpo
: D → E
. This is the set of continuous functions from D
to E
. The ordering on this is pointwise, meaning that f ⊑ g
if for all x ∈ D
, f(x) ⊑ g(x)
. This is a cppo where ⊥
is x ↦ ⊥
and all the lubs are determined pointwise.
This gives us most of the mathematics we need to do the constructions we’re going to want, to demonstrate something cool here’s a fun theorem which turns out to be incredibly useful: Any continuous function f : D → D
on a cppo D
has a least fixed point.
To construct this least fixed point we need to find an x so that x = f(x). To do this, note first that ⊥ ⊑ f(⊥), since ⊥ is the least element, and that by the monotonicity of f, f(x) ⊑ f(y) whenever x ⊑ y
. This means that the collection of elements fⁱ(⊥)
forms a chain with the ith element being the ith iteration of f
! Since D
is a cppo, this chain has a least upper bound: ⊔ fⁱ(⊥)
. Moreover, f(⊔ fⁱ(⊥)) = ⊔ f(fⁱ(⊥))
by the continuity of f
, but ⊔ fⁱ(⊥) = ⊥ ⊔ (⊔ f(fⁱ(⊥))) = ⊔ f(fⁱ(⊥))
so this is a fixed point! The proof that it’s a least fixed point is elided because typesetting in markdown is a bit of a bother.
So there you have it, very, very basic domain theory. I can now answer the question we weren’t sure about before, the slogan is “computable functions are continuous functions”.
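The fixed-point theorem above is easy to watch in action. Here's a Python sketch (the names are my own, and this is a finite approximation of the real construction): the cppo is partial functions on the naturals, ⊥ is the everywhere-undefined function, and we iterate the functional whose least fixed point is factorial.

```python
def bottom(n):
    return None  # ⊥: the everywhere-undefined partial function

def fact_functional(g):
    # a continuous functional on partial functions whose
    # least fixed point is the factorial function
    def h(n):
        if n == 0:
            return 1
        prev = g(n - 1)
        return None if prev is None else n * prev
    return h

def iterate(f, x, times):
    # compute f^times(x), one rung of the chain f(⊥) ⊑ f²(⊥) ⊑ ... at a time
    for _ in range(times):
        x = f(x)
    return x

approx = iterate(fact_functional, bottom, 3)
print(approx(2))  # 2: f³(⊥) is defined on 0, 1, 2
print(approx(3))  # None: ...and is still ⊥ everywhere else
print(iterate(fact_functional, bottom, 7)(6))  # 720
```

Each iterate fⁱ(⊥) is defined on a little more of the domain, and the lub of the whole chain is factorial itself.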
Cpo
So now we can get to the result that makes domain theory incredibly useful. Remember our problem from before? We wanted to find a collection D
so that
D ≅ D → D
However it wasn’t clear how to do this due to size issues. In Cpo
however, we can absolutely solve this. This huge result was due to Dana Scott. First, we make a small transformation to the problem that’s very common in these scenarios. Instead of trying to solve this equation (something we don’t have very many tools for) we’re going to instead look for the fixpoint of this functor
F(X) = X → X
The idea here is that we’re going to prove that all well behaved endofunctors on Cpo have fixpoints. By using this viewpoint we get all the powerful tools we normally have for reasoning about functors in category theory. However, there’s a problem: the above isn’t a functor! It has both positive and negative occurrences of X
so it's neither a co- nor a contravariant functor. To handle this we apply another clever trick. Let's not look at endofunctors, but rather at functors Cpoᵒᵖ × Cpo → Cpo
(I believe this should be attributed to Freyd). This is a binary functor which is covariant in the second argument and contravariant in the first. We’ll use the first argument everywhere there’s a negative occurrence of X
and the second for every positive occurrence. Take note: we need things to be contravariant in the first argument because we’re using that first argument negatively: if we didn’t do that we wouldn’t have a functor.
Now we have
F(X⁻, X⁺) = X⁻ → X⁺
This is functorial. We can also always recover the original map simply by diagonalizing: F(X) = F(X, X)
. We’ll now look for an object D
so that F(D, D) ≅ D
. Not quite a fixed point, but still equivalent to the equation we were looking at earlier.
Furthermore, we need one last critical property: we want F
to be locally continuous. This means that the maps on morphisms determined by F
should be continuous so F(⊔ P, g) = ⊔ F(P, g)
and vice versa (here P
is a set of functions). Note that such morphisms have an ordering because they belong to the pointwise ordered cppo we talked about earlier.
We have one final thing to set up before this proof: what if there are multiple non-isomorphic solutions to F? We want a further coherence condition that singles out a canonical one.
What we want is called minimal invariance. Suppose we have a D
and an i : D ≅ F(D, D)
. This is the minimal invariant solution if and only if the least fixed point of f(e) = i⁻ ∘ F(e, e) ∘ i
is id
. In other words, we want it to be the case that
d = ⊔ₓ fˣ(⊥)(d) (d ∈ D)
I mentally picture this as saying that the isomorphism is set up so that for any particular d
we choose, if we apply i
, fmap
over it, apply i
again, repeat and repeat, eventually this process will halt and we’ll run out of things to fmap
over. It’s a sort of a statement that each d ∈ D
is “finite” in a very, very handwavy sense. Don’t worry if that didn’t make much sense, it’s helpful to me but it’s just my intuition. This property has some interesting effects though: it means that if we find such a D
then (D, D)
is going to be both the initial algebra and final coalgebra of F
.
Without further ado, let's prove that every locally continuous functor F has a minimal invariant. We start by defining the following
D₀ = {⊥}
Dᵢ = F(Dᵢ₋₁, Dᵢ₋₁)
This gives us a chain of cppos that gradually get larger. How do we show that they're getting larger? By defining a section from Dᵢ
to Dⱼ
where j = i + 1
. A section is a function f
which is paired with a (unique) function f⁰
so that f⁰f = id
and ff⁰ ⊑ id
. In other words, f
embeds its domain into the codomain and f⁰
tells us how to get it out. Putting something in and taking it out is a round trip. Since the codomain may be bigger though taking something out and putting it back only approximates a round trip. Our sections are defined thusly
s₀ = x ↦ ⊥ r₀ = x ↦ ⊥
sᵢ = F(rᵢ₋₁, sᵢ₋₁) rᵢ = F(sᵢ₋₁, rᵢ₋₁)
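To see the section/retraction laws in a toy setting, here's a Python check on two tiny flat domains with None as ⊥. This is only an illustration of the laws f⁰f = id and ff⁰ ⊑ id, not of the sections built with F above:

```python
def below(a, b):
    # flat ordering: ⊥ is below everything, other elements only below themselves
    return a is None or a == b

D0 = [None, 0]
D1 = [None, 0, 1]

def s(x):
    return x  # section: just the inclusion of D0 into D1

def r(x):
    return x if x in D0 else None  # retraction: elements D0 lacks drop to ⊥

assert all(r(s(x)) == x for x in D0)       # r ∘ s = id: a perfect round trip
assert all(below(s(r(x)), x) for x in D1)  # s ∘ r ⊑ id: only approximates one
```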
It would be very instructive to work out that these definitions really are sections and retractions. Since typesetting these subscripts is a little rough, when it's clear from context I'll just write r and s. Now that we've got this increasing chain, we define an interesting object
D = {x ∈ Πᵢ Dᵢ | x.(i-1) = r(x.i)}
In other words, D
is the collection of infinitely large tuples. Each component is from one of those Dᵢs above and they cohere with each other, so using s
and r
to step up the chain takes you from one component to the next. Next we define a way to go from a single Dᵢ
to a D
: upᵢ : Dᵢ → D
where
upᵢ(x).j = x if i = j
 rᵈ(x) if i − j = d > 0
 sᵈ(x) if j − i = d > 0
Interestingly, note that πᵢ ∘ upᵢ = id
(easy proof) and that upᵢ ∘ πᵢ ⊑ id
(slightly harder proof). This means that we’ve got more sections lying around: every Dᵢ
can be fed into D
. Consider the following diagram
s s s
D0 ——> D1 ——> D2 ——> ...
I claim that D
is the colimit to this diagram where the collection of arrows mapping into it are given with upᵢ
. Seeing this is a colimit follows from the fact that πᵢ ∘ upᵢ
is just id
. Specifically, suppose we have some object C
and a family of morphisms cᵢ : Dᵢ → C
which commute properly with s
. We need to find a unique morphism h
so that cᵢ = h ∘ upᵢ
. Define h
as ⊔ᵢ cᵢπᵢ
. Then
h ∘ upⱼ = (⊔j<i cᵢsʲrʲ) ⊔ cᵢ ⊔ (⊔j>i cᵢrʲsʲ) = (⊔j<i cᵢsʲrʲ) ⊔ cᵢ
The last step follows from the fact that rʲsʲ = id
. Furthermore, sʲrʲ ⊑ id
so cᵢsʲrʲ ⊑ cᵢ
so that whole massive term just evaluates to cᵢ
as required. So we have a colimit. Notice that if we apply F
to each Dᵢ
in the diagram we end up with a new diagram.
s s s
D1 ——> D2 ——> D3 ——> ...
D
is still the colimit (all we’ve done is shift the diagram over by one) but by identical reasoning to D
being a colimit, so is F(D, D)
. This means we have a unique isomorphism i : D ≅ F(D, D)
. The fact that i
is the minimal invariant follows from the properties we get from the fact that i
comes from a colimit.
With this construction we can construct our model of the lambda calculus simply by finding the minimal invariant of the locally continuous functor F(D⁻, D⁺) = D⁻ → D⁺
(it’s worth proving it’s locally continuous). Our denotation is defined as [e]ρ ∈ D
where e
is a lambda term and ρ
is a map of the free variables of e
to other elements of D
. This is inductively defined as
[λx. e]ρ = i⁻(d ↦ [e]ρ[x ↦ d])
[e e']ρ = i([e]ρ)([e']ρ)
[x]ρ = ρ(x)
Notice here that for the two main constructions we just use i
and i⁻
to fold and unfold the denotations to treat them as functions. We could go on to prove that this denotation is sound and complete but that’s something for another post.
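As a loose illustration of these three equations, here's a tiny interpreter in Python where Python functions stand in for D and the isomorphism i/i⁻ is invisible, since Python happily treats functions as values. The term encoding is my own, and Python is of course only an informal stand-in for the actual domain:

```python
def denote(e, rho):
    # [x]ρ = ρ(x); [λx. e]ρ = i⁻(d ↦ [e]ρ[x ↦ d]); [e e']ρ = i([e]ρ)([e']ρ)
    tag = e[0]
    if tag == 'var':
        return rho[e[1]]
    if tag == 'lam':
        _, x, body = e
        return lambda d: denote(body, {**rho, x: d})
    if tag == 'app':
        _, f, a = e
        return denote(f, rho)(denote(a, rho))
    raise ValueError("unknown term")

identity = ('lam', 'x', ('var', 'x'))
# [(λx. x) (λx. x)] denotes the identity function again
print(denote(('app', identity, identity), {})(42))  # 42
```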
That’s the main result I wanted to demonstrate. With this single proof we can actually model a very large class of programming languages into Cpo
. Hopefully I’ll get around to showing how we can pull a similar trick with a relational structure on Cpo
in order to prove full abstraction. This is nicely explained in Andrew Pitts's “Relational Properties of Domains”.
If you’re interested in domain theory I learned from Gunter’s “Semantics of Programming Languages” book and recommend it.
I’ve been trying to write a blog post to this effect for a while now, hopefully this one will stick. I intend for this to be a bit more open-ended than most of my other posts, if you’re interested in seeing the updated version look here. Pull requests/issues are more than welcome on the repository. I hope you learn something from this.
Lots of people seem curious about type theory but it’s not at all clear how to go from no math background to understanding “Homotopical Patch Theory” or whatever the latest cool paper is. In this repository I’ve gathered links to some of the resources I’ve personally found helpful.
I strongly urge you to start by reading one or more of the textbooks immediately below. They give a nice self-contained introduction and a foundation for understanding the papers that follow. Don’t get hung up on any particular thing, it’s always easier to skim the first time and read closely on a second pass.
Practical Foundations of Programming Languages (PFPL)
I reference this more than any other book. It’s a very wide-ranging survey of programming languages that assumes very little background knowledge. A lot of people prefer the next book I mention but I think PFPL does a better job explaining the foundations it works from and then covers more topics I find interesting.
Types and Programming Languages (TAPL)
Another very widely used introductory book (the one I learned with). It’s good to read in conjunction with PFPL as they emphasize things differently. Notably, this includes descriptions of type inference which PFPL lacks and TAPL lacks most of PFPL’s descriptions of concurrency/interesting imperative languages. Like PFPL this is very accessible and well written.
Advanced Topics in Types and Programming Languages (ATTAPL)
Don’t feel the urge to read this all at once. It’s a bunch of fully independent but excellent chapters on a bunch of different topics. Read what looks interesting, save what doesn’t. It’s good to have in case you ever need to learn more about one of the subjects in a pinch.
One of the fun parts of taking in an interest in type theory is that you get all sorts of fun new programming languages to play with. Some major proof assistants are
Coq
Coq is one of the more widely used proof assistants and has the best introductory material by far in my opinion.
Agda
Agda is in many respects similar to Coq, but is a smaller language overall. It’s relatively easy to learn Agda after Coq so I recommend doing that. Agda has some really interesting advanced constructs like induction-recursion.
Idris
It might not be fair to put Idris in a list of “proof assistants” since it really wants to be a proper programming language. It’s one of the first serious attempts at writing a programming language with dependent types for actual programming though.
Twelf
Twelf is by far the simplest system in this list, it’s the absolute minimum a language can have and still be dependently typed. All of this makes it easy to pick up, but there are very few users and not a lot of introductory material which makes it a bit harder to get started with. It does scale up to serious use though.
Per Martin-Löf has contributed a ton to the current state of dependent type theory. So much so that it’s impossible to escape his influence. His papers on Martin-Löf Type Theory (he called it Intuitionistic Type Theory) are seminal.
If you're confused by the papers above, read the book in the next entry and try again. The book doesn't give you as good a feel for the various flavors of MLTT (which spun off into different areas of research) but is easier to follow.
It’s good to read the original papers and hear things from the horse’s mouth, but Martin-Löf is much smarter than us and it’s nice to read other people’s explanations of his material. A group of people at Chalmers have elaborated it into a book.
The Works of John Reynolds
John Reynolds’s works are similarly impressive and always a pleasure to read.
Computational Type Theory
While most dependent type theories (like the ones found in Coq, Agda, Idris…) are based on Martin-Löf’s later intensional type theories, computational type theory is different. It’s a direct descendant of his extensional type theory that has been heavily developed and forms the basis of NuPRL nowadays. The resources below describe the various parts of how CTT works.
Homotopy Type Theory
A new exciting branch of type theory. This exploits the connection between homotopy theory and type theory by treating types as spaces. It’s the subject of a lot of active research but has some really nice introductory resources even now.
Frank Pfenning’s Lecture Notes
Over the years, Frank Pfenning has accumulated lecture notes that are nothing short of heroic. They’re wonderful to read and almost as good as being in one of his lectures.
Learning category theory is necessary to understand some parts of type theory. If you decide to study categorical semantics, realizability, or domain theory eventually you’ll have to buckle down and learn at least a little. It’s actually really cool math so no harm done!
This is the absolute smallest introduction to category theory you can find that’s still useful for a computer scientist. It’s very light on what it demands for prior knowledge of pure math but doesn’t go into too much depth.
Category Theory
One of the better introductory books to category theory in my opinion. It’s notable in assuming relatively little mathematical background and for covering quite a lot of ground in a readable way.
Ed Morehouse’s Category Theory Lecture Notes
Another valuable piece of reading are these lecture notes. They cover a lot of the same areas as “Category Theory” so they can help to reinforce what you learned there as well as giving you some of the author’s perspective on how to think about these things.
While I’m not as big a fan of some of the earlier chapters, the math presented in this book is absolutely top-notch and gives a good understanding of how some cool fields (like domain theory) work.
OPLSS
The Oregon Programming Languages Summer School is a 2-week-long bootcamp on PLs held annually at the University of Oregon. It’s a wonderful event to attend but if you can’t make it they record all their lectures anyways! They’re taught by a variety of lecturers but they’re all world-class researchers.
So as a follow-up to my prior tutorial on JonPRL I wanted to demonstrate a nice example of JonPRL being used to prove something unreasonably difficult in Agda or the like. (I think I’m asking to be shown up when I say stuff like this…)
I would like to implement the conatural numbers in JonPRL but without a notion of general coinductive or even inductive types. Just the natural numbers. The fun bit is that we’re basically going to lift the definition of a coinductively defined set straight out of set theory into JonPRL!
First, let’s go through some math. How can we formalize the notion of a coinductively defined type as we’re used to in programming languages? Recall that a type is coinductively defined if it contains every term that we can eliminate according to the elimination form for our type. For example, Martin-Löf has proposed we view functions (Π-types) as coinductively defined. That is,
x : A ⊢ f(x) : B(x)
————————————————————
f : Π x : A. B(x)
In particular, there’s no assertion that f
needs to be a lambda, just that f(x)
is defined and belongs to the right type. This view of “if we can use it, it’s in the type” applies to more than just functions. Let’s suppose we have a type with the following elimination form
L : List    M : A    x : Nat, y : List ⊢ N : A
——————————————————————————————————————————————
            case(L; M; x.y.N) : A
This is more familiar to Haskellers as
case L of
[] -> M
x :: y -> N
Now if we look at the coinductively defined type built from this elimination rule we have not finite lists, but streams! There’s nothing in this elimination rule that specifies that the list be finite in length for it to terminate. All we need to be able to do is evaluate the term to either a ::
of a Nat
and a List
or nil
. This means that
fix x. cons(0; x) : List
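This is easy to see in an executable model. In the Python sketch below (the encoding is mine), a list is either 'nil' or ('cons', head, thunk-of-tail), and the eliminator only ever forces one layer of the tail, so the infinite list fix x. cons(0; x) is a perfectly good inhabitant:

```python
# Coinductive lists: 'nil' or ('cons', head, tail_thunk).
# The eliminator forces only the outermost layer, so infinite
# lists are fine as long as each individual observation terminates.

def case_list(l, on_nil, on_cons):
    if l == 'nil':
        return on_nil
    _, x, ys = l
    return on_cons(x, ys())          # force exactly one layer of the tail

# fix x. cons(0; x): infinitely many zeros, via a self-referential thunk
def zeros():
    return ('cons', 0, zeros)

head = case_list(zeros(), None, lambda x, _: x)
second = case_list(zeros(), None,
                   lambda _, ys: case_list(ys, None, lambda x, _: x))
```

Both observations return 0, and any finite number of observations terminates, even though the list itself never ends.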
Let’s now try to formalize this by describing what it means to build a coinductively defined type up as a set of terms. In particular the types we’re interested in here are algebraic ones, constructed from sums and products.
Now, unfortunately, I’m going to be a little handwavy. I’m going to act as if we’ve worked out a careful set-theoretic semantics for this programming language (like the one that exists for MLTT). This means that all the equations you see here are across sets and that these sets contain programs, so that ⊢ e : τ
means that e ∈ τ
where τ
on the right is a set.
Well we start with some equation of the form
Φ = 1 + Φ
This particular equation is actually how we would go about defining the natural numbers. If I write it in a more Haskellish notation we’d have
data Φ = Zero | Succ Φ
Next, we transform this into a function. This step is a deliberate move so we can start applying the myriad tools we know of for handling this equation.
Φ(X) = 1 + X
We now want to find some X
so that Φ(X) = X
. If we can do this, then I claim that X
is a solution to the equation given above since
X = Φ(X)
X = 1 + X
precisely mirrors the equation we had above. Such an X
is called a “fixed point” of the function Φ
. However, there’s a catch: there may well be more than one fixed point of a function! Which one do we choose? The key is that we want the coinductively defined version. Coinduction means that we should always be able to examine a term in our type and its outermost form should be 1 + ???
. Okay, let’s optimistically start by saying that X
is ⊤
(the collection of all terms).
Ah okay, this isn’t right. This works only so long as we don’t make any observations about a term we claim is in this type: the minute we pattern match, we might find that we’d claimed a function was in our type! I have not yet managed to pay my rent by saying “OK, here’s the check… just don’t try to cash it.” So perhaps we should try something else. Okay, so let’s not say ⊤
, let’s say
X = ⊤ ⋂ Φ(⊤)
Now, if e ∈ X, we know that e ∈ 1 + ???. This means that if we run e, we’ll get the correct outermost form. However, this code is still potentially broken:
case e of
Inl _ -> ...
Inr x -> case e of
Inl _ -> ...
Inr _ -> ...
This starts off as being well typed, but as we evaluate, it may actually become ill typed. If we claimed that this was a fixed point, our language would be type-unsafe. This is an unappealing quality in a type theory.
Okay, so that didn’t work. What if we fixed this code by doing
X = ⊤ ⋂ Φ(⊤) ⋂ Φ(Φ(⊤))
Now this fixes the above code, but can you imagine a snippet of code where this still gets stuck? Each time we intersect with another application of Φ we get a new type which behaves like the real fixed point for one more observation: Φⁿ(⊤) behaves like the fixed point so long as we make at most n observations. Well, we can only ever make finitely many observations, so let’s just iterate the intersection
X = ⋂ₙ Φⁿ(⊤)
So if e ∈ X
, then no matter how many times we pattern match and examine the recursive component of e
we know that it’s still in ⋂ₙ Φⁿ(⊤)
and therefore still in X
! In fact, it’s easy to prove that this is the case with two lemmas:
1. If X ⊆ Y then Φ(X) ⊆ Φ(Y)
2. For any collection S of sets, ⋂ Φ(S) = Φ(⋂ S), where we define Φ on a collection of sets by applying Φ to each component.
These two properties state the monotonicity and cocontinuity of Φ
. In fact, cocontinuity should imply monotonicity (can you see how?). We then may show that
Φ(⋂ₙ Φⁿ(⊤)) = ⋂ₙ Φ(Φⁿ(⊤))
= ⊤ ⋂ (⋂ₙ Φ(Φⁿ(⊤)))
= ⋂ₙ Φⁿ(⊤)
As desired.
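We can even watch this construction converge in a tiny executable model. Below, with the conatural numbers in mind, Φ(S) = {zero} ∪ {succ(t) | t ∈ S}; Python predicates stand in for sets and thunks let us build infinite terms (the whole encoding is my own illustration, not anything from JonPRL):

```python
# Model sets of terms as predicates. Phi(S) = {zero} ∪ {succ t | t ∈ S}.
# Tails are thunks so that we can tie the knot and build omega.

def phi(pred):
    def in_phi(t):
        if t == 'zero':
            return True
        if isinstance(t, tuple) and t[0] == 'succ':
            return pred(t[1]())      # force one layer, check the tail in S
        return False
    return in_phi

top = lambda t: True                  # ⊤ contains every term

def in_approx(t, n):
    """t ∈ Φⁿ(⊤): t survives n observations."""
    pred = top
    for _ in range(n):
        pred = phi(pred)
    return pred(t)

two = ('succ', lambda: ('succ', lambda: 'zero'))
omega = ('succ', lambda: omega)       # infinitely many succs

ok = all(in_approx(omega, n) and in_approx(two, n) for n in range(20))
bad = in_approx(('succ', lambda: 'garbage'), 2)
```

Both the finite numeral and omega survive every level of the approximation, so they're in ⋂ₙ Φⁿ(⊤); the junk term gets caught as soon as we observe twice.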
Now that we have some idea of how to formalize coinduction, can we port this to JonPRL? Well, we have natural numbers and we can take the intersection of types… Seems like a start. Looking at that example, we first need to figure out what ⊤
corresponds to. It should include all programs, which sounds like the type base
in JonPRL. However, it also should be the case that x = y ∈ ⊤
for all x
and y
. For that we need an interesting trick:
Operator top : ().
[top] =def= [isect(void; _.void)].
In prettier notation,
top ≙ ⋂ x : void. void
Now x ∈ top
if x ∈ void
for all _ ∈ void
. Hey wait a minute… No such _
exists so the if is always satisfied vacuously. Ok, that’s good. Now x = y ∈ top
if for all _ ∈ void
, x = y ∈ void
. Since no such _
exists again, all things are in fact equal in void
. We can even prove this within JonPRL
Theorem topistop :
[isect(base; x.
isect(base; y.
=(x; y; top)))] {
unfold <top>; auto
}.
This proof really just says: “Hey, I have an x : void in my context! Tell me more about that.” Now the fact that x ∈ top
is a trivial corollary since our theorem tells us that x = x ∈ top
and the former is just sugar for the latter. With this defined, we can now write down a general operator for coinduction!
Operator corec : (1).
[corec(F)] =def= [isect(nat; n. natrec(n; top; _.x. so_apply(F;x)))].
To unpack this, corec
takes one argument which binds one variable. We then intersect the type natrec(n; top; _.x.so_apply(F;x))
for all n ∈ nat
. That natrec
construct is really saying Fⁿ(⊤)
, it’s just a little obscured. Especially since we have to use so_apply
, a sort of “meta-application” which lets us apply a term binding a variable to another term. This should look familiar: it’s just how we defined the fixed point of Φ
!
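The natrec-as-iteration reading is easy to state concretely. A sketch in Python (names are mine): natrec(n; base; _.x. step(x)), when the recursive branch ignores the index, just applies step to base n times, i.e. stepⁿ(base) — so with base = ⊤ and step = F it is exactly Fⁿ(⊤):

```python
# natrec(n; base; _.x. step(x)) with no dependence on the index:
# iterate step n times starting from base, i.e. stepⁿ(base).

def natrec(n, base, step):
    acc = base
    for _ in range(n):
        acc = step(acc)
    return acc

# An ordinary numeric example of the same shape: 2⁴·1 = 16.
sixteen = natrec(4, 1, lambda x: 2 * x)
```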
For a fun demo, let’s define an F
so that cofix(F)
will give us the conatural numbers. I know that the natural numbers come from the least fixed point of X ↦ 1 + X
(because I said so above, so it must be so) so let’s define that.
Operator conatF : (0).
[conatF(X)] =def= [+(unit; X)].
This is just that X ↦ 1 + X
I wrote above in JonPRL land instead of math notation. Next we need to actually define conatural numbers using corec
.
Operator conat : ().
[conat] =def= [corec(R. conatF(R))].
Now I’ve defined this, but that’s no fun unless we can actually build some terms so that member(X; conat)
. Specifically I want to prove two things to start
member(czero; conat)
fun(member(M; conat); _.member(csucc(M); conat))
This states that conat
is closed under some zero and successor operations. Now what should those operations be? Remember what I said before, that we had this correspondence?
     X ↦  1   +   X
   Nat   Zero    Suc
Now remember that conat
is isect(nat; n....)
and when constructing a member of isect
we’re not allowed to mention that n
in it (as opposed to fun
where we do exactly that). So that means czero
has to be a member of top
and sum(unit; ...)
. The top
bit is easy, everything is in top
! That diagram above suggests inl
of something in unit
Operator czero : ().
[czero] =def= [inl(<>)].
So now we want to prove that this in fact in conat
.
Theorem zerowf : [member(czero; conat)] {
}.
Okay loading this into JonPRL gives
⊢ czero ∈ conat
From there we start by unfolding all the definitions
{
unfold <czero conat conatF corec top>
}
This gives us back the desugared goal
⊢ inl(<>) ∈ ⋂n ∈ nat. natrec(n; top; _.x.+(unit; x))
Next let’s apply all the obvious introductions so that we’re in a position to try to prove things
unfold <czero conat conatF corec top>; auto
This gives us back
1. [n] : nat
⊢ inl(<>) = inl(<>) ∈ natrec(n; top; _.x.+(unit; x))
Now we’re stuck. We want to show inl
is in something, but we’re never going to be able to do that until we can reduce that natrec(n; top; _.x.+(unit; x))
to a canonical form. Since it’s stuck on n
, let’s induct on that n
. After that, let’s immediately reduce.
{
unfold <czero conat conatF corec top>; auto; elim #1; reduce
}
Now we have two cases, the base and the inductive case.
1. [n] : nat
⊢ inl(<>) = inl(<>) ∈ top
1. [n] : nat
2. n' : nat
3. ih : inl(<>) = inl(<>) ∈ natrec(n'; top; _.x.+(unit; x))
⊢ inl(<>) = inl(<>) ∈ +(unit; natrec(n'; top; _.x.+(unit; x)))
Now that we have canonical terms on the right of the ∈, let’s let auto handle the rest.
Theorem zerowf : [member(czero; conat)] {
unfold <czero conat conatF corec top>; auto; elim #1; reduce; auto
}.
So now we have proven that czero
is in the correct type. Next, let’s figure out csucc. Going by our noses, inl
corresponded to czero
and our diagram says that inr
should correspond to csucc
. This gives us
Operator csucc : (0).
[csucc(M)] =def= [inr(M)].
Now let’s try to prove the corresponding theorem for csucc
Theorem succwf : [isect(conat; x. member(csucc(x); conat))] {
}.
Now we’re going to start off this proof like we did with our last one. Unfold everything, apply the introduction rules, and induct on n
.
{
unfold <csucc conat conatF corec top>; auto; elim #2; reduce
}
Like before, we now have two subgoals:
1. [x] : ⋂n ∈ nat. natrec(n; ⋂_ ∈ void. void; _.x.+(unit; x))
2. [n] : nat
⊢ inr(x) = inr(x) ∈ ⋂_ ∈ void. void
1. [x] : ⋂n ∈ nat. natrec(n; ⋂_ ∈ void. void; _.x.+(unit; x))
2. [n] : nat
3. n' : nat
4. ih : inr(x) = inr(x) ∈ natrec(n'; ⋂_ ∈ void. void; _.x.+(unit; x))
⊢ inr(x) = inr(x) ∈ +(unit; natrec(n'; ⋂_ ∈ void. void; _.x.+(unit; x)))
The first one looks pretty easy, that’s just foo ∈ top
, I think auto
should handle that.
{
unfold <csucc conat conatF corec top>; auto; elim #2; reduce;
auto
}
This just leaves one goal to prove
1. [x] : ⋂n ∈ nat. natrec(n; ⋂_ ∈ void. void; _.x.+(unit; x))
2. [n] : nat
3. n' : nat
4. ih : inr(x) = inr(x) ∈ natrec(n'; ⋂_ ∈ void. void; _.x.+(unit; x))
⊢ x = x ∈ natrec(n'; ⋂_ ∈ void. void; _.x.+(unit; x))
Now, as it turns out, this is nice and easy: look at what our first assumption says! Since x ∈ isect(nat; n.Foo)
and our goal is to show that x ∈ Foo(n')
this should be as easy as another call to elim
.
{
unfold <csucc conat conatF corec top>; auto; elim #2; reduce;
auto; elim #1 [n']; auto
}
Note that the [n']
bit there lets us supply the term we wish to substitute for n
while eliminating. This leaves us here:
1. [x] : ⋂n ∈ nat. natrec(n; ⋂_ ∈ void. void; _.x.+(unit; x))
2. [n] : nat
3. n' : nat
4. ih : inr(x) = inr(x) ∈ natrec(n'; ⋂_ ∈ void. void; _.x.+(unit; x))
5. y : natrec(n'; ⋂_ ∈ void. void; _.x.+(unit; x))
6. z : y = x ∈ natrec(n'; ⋂_ ∈ void. void; _.x.+(unit; x))
⊢ x = x ∈ natrec(n'; ⋂_ ∈ void. void; _.x.+(unit; x))
Now a small hiccup: we know that y = x
is in the right type, so x = x is in the right type. But how do we prove this? The answer is to substitute all occurrences of x
for y
. This is written
{
unfold <csucc conat conatF corec top>; auto; elim #2; reduce;
auto; elim #1 [n']; auto;
hypsubst ← #6 [h.=(h; h; natrec(n'; isect(void; _.void); _.x.+(unit;x)))];
}
There are three arguments here: a direction to substitute, an index telling us which hypothesis to use as the equality to substitute with, and finally a term [h. ...]. The idea with this term is that each occurrence of h tells us where we want to substitute. In our case we used h in both places where x occurs, and the ← direction says to replace the right-hand side of the equality with the left-hand side.
Actually running this gives
1. [x] : ⋂n ∈ nat. natrec(n; ⋂_ ∈ void. void; _.x.+(unit; x))
2. [n] : nat
3. n' : nat
4. ih : inr(x) = inr(x) ∈ natrec(n'; ⋂_ ∈ void. void; _.x.+(unit; x))
5. y : natrec(n'; ⋂_ ∈ void. void; _.x.+(unit; x))
6. z : y = x ∈ natrec(n'; ⋂_ ∈ void. void; _.x.+(unit; x))
⊢ y = y ∈ natrec(n'; ⋂_ ∈ void. void; _.x.+(unit; x))
1. [x] : ⋂n ∈ nat. natrec(n; ⋂_ ∈ void. void; _.x.+(unit; x))
2. [n] : nat
3. n' : nat
4. ih : inr(x) = inr(x) ∈ natrec(n'; ⋂_ ∈ void. void; _.x.+(unit; x))
5. y : natrec(n'; ⋂_ ∈ void. void; _.x.+(unit; x))
6. z : y = x ∈ natrec(n'; ⋂_ ∈ void. void; _.x.+(unit; x))
7. h : natrec(n'; ⋂_ ∈ void. void; _.x.+(unit; x))
⊢ h = h ∈ natrec(n'; ⋂_ ∈ void. void; _.x.+(unit; x)) ∈ U{i}
The first goal is the result of our substitution and it’s trivial; auto
will handle this now. The second goal is a little strange. It basically says that the result of our substitution is still a wellformed type. JonPRL’s thought process is something like this
You said you were substituting for things of this type here. However, just because x : A doesn’t mean we’re using it in all those spots as if it has type A. What if you substituted things equal in top (where everything is equal) in a spot where they’re being used as functions? This would let us prove that zero ∈ Π(...) or something silly. Convince me that when we fill in those holes with something of the type you mentioned, the goal is still a type (in a universe).
However, these well-formedness goals usually go away with auto. In fact, this completes our theorem.
Theorem succwf : [isect(conat; x. member(csucc(x); conat))] {
unfold <csucc conat conatF corec top>; auto; elim #2; reduce;
auto; elim #1 [n']; auto;
hypsubst ← #6 [h.=(h; h; natrec(n'; isect(void; _.void); _.x.+(unit;x)))];
auto
}.
Okay so we now have something kind of numberish, with zero and successor. But in order to demonstrate that this is the conatural numbers there’s one big piece missing.
The thing that distinguishes the conatural numbers from the inductive variety is the fact that we include infinite terms. In particular, I want to show that Ω (infinitely many csucc
s) belongs in our type.
In order to say Ω in JonPRL we need recursion. Specifically, we want to write
[omega] =def= [csucc(omega)].
But this doesn’t work! Instead, we’ll define the Y combinator and say
Operator omega : ().
[omega] =def= [Y(x.csucc(x))].
So what should this Y
be? Well the standard definition of Y is
Y(F) = (λ x. F (x x)) (λ x. F (x x))
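As a quick sanity check that this really computes fixed points, here's the same shape in Python (my own aside, not part of the JonPRL development). Since Python is strict, the literal Y above would loop forever, so this uses the eta-expanded variant, usually called the Z combinator; the factorial example is just to exercise it:

```python
# Z combinator: the eta-expanded Y for a strict language.
# Y(F) = (λx. F (x x)) (λx. F (x x)) diverges eagerly here,
# so we delay the self-application behind a lambda.

Z = lambda f: (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))

# fact is the fixed point of one non-recursive "unrolling" of factorial:
fact = Z(lambda rec: lambda n: 1 if n == 0 else n * rec(n - 1))
```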
Excitingly, we can just say that in JonPRL; remember that we have a full untyped computation system after all!
Operator Y : (1).
[Y(f)] =def= [ap(lam(x.so_apply(f;ap(x;x)));lam(x.so_apply(f;ap(x;x))))].
This is more or less a direct translation, we occasionally use so_apply
for the reasons I explained above. As a fun thing, try to prove that this is a fixed point, e.g. that
isect(U{i}; A. isect(fun(A; _.A); f . ceq(Y(f); ap(f; Y(f)))))
The complete proof is in the JonPRL repo under example/computationalequality.jonprl
. Anyways, we now want to prove
Theorem omegawf : [member(omega; conat)] {
}.
Let’s start with the same prelude
{
*{unfold <csucc conat conatF corec top omega Y>}; auto; elim #1;
}
Two goals just like before
1. [n] : nat
⊢ (λx. inr(x[x]))[λx. inr(x[x])] = (λx. inr(x[x]))[λx. inr(x[x])] ∈ natrec(zero; ⋂_ ∈ void. void; _.x.+(unit; x))
1. [n] : nat
2. n' : nat
3. ih : (λx. inr(x[x]))[λx. inr(x[x])] = (λx. inr(x[x]))[λx. inr(x[x])] ∈ natrec(n'; ⋂_ ∈ void. void; _.x.+(unit; x))
⊢ (λx. inr(x[x]))[λx. inr(x[x])] = (λx. inr(x[x]))[λx. inr(x[x])] ∈ natrec(succ(n'); ⋂_ ∈ void. void; _.x.+(unit; x))
The goals start to get fun now. I’ve also carefully avoided using reduce
like we did before. The reason is simple: if we reduce in the second goal, our ih
will reduce as well and we’ll end up completely stuck in a few steps (try it and see). So instead we’re going to finesse it a bit.
First let’s take care of that first goal. We can tell JonPRL to apply some tactics to just the first goal with the focus
tactic
{
*{unfold <csucc conat conatF corec top omega Y>}; auto; elim #1;
focus 0 #{reduce 1; auto};
}
Here reduce 1
says “reduce by only one step” since really omega will diverge if we just let it run. This takes care of the first goal leaving just
1. [n] : nat
2. n' : nat
3. ih : (λx. inr(x[x]))[λx. inr(x[x])] = (λx. inr(x[x]))[λx. inr(x[x])] ∈ natrec(n'; ⋂_ ∈ void. void; _.x.+(unit; x))
⊢ (λx. inr(x[x]))[λx. inr(x[x])] = (λx. inr(x[x]))[λx. inr(x[x])] ∈ natrec(succ(n'); ⋂_ ∈ void. void; _.x.+(unit; x))
Here’s the proof sketch for what’s left:
1. Rewrite the goal so its outermost form is inr(...) by stepping the term.
2. Let auto finish things off using the inductive hypothesis.
You can stop here or you can see how we actually do this. It’s somewhat tricky. The basic complication is that there’s no built-in tactic for 1. Instead we use a new type called ceq
which is “computational equality”. It ranges between two terms; no types are involved here. It’s designed so that ceq(a; b) holds if either:
1. a and b run to weak-head normal form (canonical verifications) with the same outermost form, and all the inner operands are ceq, or
2. a and b both diverge.
So if ceq(a; b) then a and b “run the same”. What’s a really cool upshot of this is that if ceq(a; b)
then if a = a ∈ A
and b = b ∈ A
then a = b ∈ A
! ceq
is the strictest equality in our system and we can rewrite with it absolutely everywhere without regards to types. Proving this requires showing the above definition forms a congruence (two things are related if their subcomponents are related).
This was a big deal because until Doug Howe came up with this proof, NuPRL/CTT was awash with rules trying to specify this idea chunk by chunk, and showing those rules were valid wasn’t trivial. Actually, you should read that paper: it’s 6 pages and the proof concept comes up a lot.
So, in order to do 1. we’re going to say “the goal and the goal if we step it twice are computationally equal” and then use this fact to substitute for the stepped version. The tactic to use here is called csubst
. It takes two arguments:
1. the ceq we’re asserting
2. a term h. ... to guide the rewrite

{
*{unfold <csucc conat conatF corec top omega Y>}; auto; elim #1;
focus 0 #{reduce 1; auto};
csubst [ceq(ap(lam(x.inr(ap(x;x))); lam(x.inr(ap(x;x))));
inr(ap(lam(x.inr(ap(x;x))); lam(x.inr(ap(x;x))))))]
[h.=(h;h; natrec(succ(n'); isect(void; _. void); _.x.+(unit; x)))];
}
This leaves us with two goals
1. [n] : nat
2. n' : nat
3. ih : (λx. inr(x[x]))[λx. inr(x[x])] = (λx. inr(x[x]))[λx. inr(x[x])] ∈ natrec(n'; ⋂_ ∈ void. void; _.x.+(unit; x))
⊢ ceq((λx. inr(x[x]))[λx. inr(x[x])]; inr((λx. inr(x[x]))[λx. inr(x[x])]))
1. [n] : nat
2. n' : nat
3. ih : (λx. inr(x[x]))[λx. inr(x[x])] = (λx. inr(x[x]))[λx. inr(x[x])] ∈ natrec(n'; ⋂_ ∈ void. void; _.x.+(unit; x))
⊢ inr((λx. inr(x[x]))[λx. inr(x[x])]) = inr((λx. inr(x[x]))[λx. inr(x[x])]) ∈ natrec(succ(n'); ⋂_ ∈ void. void; _.x.+(unit; x))
Now we have two goals. The first is the ceq proof obligation; the second is our goal post-substitution. The first one can easily be dispatched by step. step lets us prove ceq(a; b) by showing that a steps to a' in one step and then proving ceq(a'; b).
This will leave us with ceq(X; X)
which auto
can handle. The second term is… massive, but also simple. We just need to step it once and we suddenly have inr(X) = inr(X) ∈ sum(_; A)
where X = X ∈ A
is our assumption! So that can also be handled by auto
as well. That means we need to run step
on the first goal, reduce 1
on the second, and auto
on both.
Theorem omegawf : [member(omega; conat)] {
unfolds; unfold <omega Y>; auto; elim #1;
focus 0 #{reduce 1; auto};
csubst [ceq(ap(lam(x.inr(ap(x;x))); lam(x.inr(ap(x;x))));
inr(ap(lam(x.inr(ap(x;x))); lam(x.inr(ap(x;x))))))]
[h.=(h;h; natrec(succ(n'); isect(void; _. void); _.x.+(unit; x)))];
[step, reduce 1]; auto
}.
And we’ve just proved that omega ∈ conat
, a term that is certainly the canonical (heh) example of coinduction in my mind.
Whew, I actually meant for this to be a short blog post but that didn’t work out so well. Hopefully this illustrated a cool trick in computer science (intersect your way to coinduction) and in JonPRL.
Funnily enough before this was written no one had actually realized you could do coinduction in JonPRL. I’m still somewhat taken with the fact that a very minimal proof assistant like JonPRL is powerful enough to let you do this by giving you such general purpose tools as family intersection and a full computation system to work with. Okay that’s enough marketing from me.
Cheers.
Huge thanks to Jon Sterling for the idea on how to write this code and then touching up the results.
JonPRL switched to ASCII syntax, so I’ve updated this post accordingly.
I was just over at OPLSS for the last two weeks. While there I finally met Jon Sterling in person. What was particularly fun is that for the last few months he’s been creating a proof assistant called JonPRL in the spirit of Nuprl. As it turns out, it’s quite a fun project to work on, so I’ve implemented a few features in it over the last couple of days and learned more or less how it works.
Since there’s basically no documentation on it besides the readme (and of course the compiler itself), I thought I’d write down some of the stuff I’ve learned. There’s also a completely separate post to be written on the underlying type theory of Nuprl and JonPRL that’s very interesting in its own right, but this isn’t it. Hopefully I’ll get around to scribbling something about that because it’s really quite clever.
Here’s the layout of this tutorial
JonPRL is pretty easy to build and install and having it will make this post more enjoyable. You’ll need smlnj
since JonPRL is currently written in SML. This is available in most package managers (including homebrew) otherwise just grab the binary from the website. After this the following commands should get you a working executable
git clone ssh://git@github.com/jonsterling/jonprl
cd jonprl
git submodule init
git submodule update
make
(This is excitingly fast to run)
make test (If you’re doubtful)
You should now have an executable called jonprl
in the bin
folder. There’s no prelude for jonprl so that’s it. You can now just feed it files like any reasonable compiler and watch it spew (currently difficult-to-decipher) output at you.
If you’re interested in actually writing JonPRL code, you should probably install David Christiansen’s Emacs mode. Now that we’re up and running, let’s actually figure out how the language works
JonPRL is composed of really 3 different sorts of mini-languages
In Coq, these roughly correspond to Gallina, Ltac, and Vernacular respectively.
The term language is an untyped language that contains a number of constructs that should be familiar to people who have been exposed to dependent types before. The actual concrete syntax is composed of 3 basic forms:
1. Operators applied to arguments: op(arg1; arg2; arg3)
2. Variables: x
3. Binders: x.e. JonPRL has one construct for binding, x.e, built into its syntax, that things like lam or fun are built off of.
An operator in this context is really anything you can imagine having a node in an AST for a language. So something like lam is an operator, as is if
or pair
(corresponding to (,)
in Haskell). Each operator has a piece of information associated with it, called its arity. This arity tells you how many arguments an operator takes and how many variables x.y.z. ...
each is allowed to bind. For example, lam’s arity is written (1)
since it takes 1 argument which binds 1 variable. Application (ap
) has the arity (0; 0)
. It takes 2 arguments neither of which bind a variable.
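The operator/arity setup is easy to model. In this Python sketch (the encoding and the arity table are my own, chosen just for illustration), a term is an operator applied to arguments, each argument paired with the variables it binds, and an arity is a tuple of bound-variable counts:

```python
# A tiny model of JonPRL-style terms: an operator applied to arguments,
# where each argument may bind variables. Arity = tuple of binder counts.

ARITY = {'lam': (1,), 'ap': (0, 0), 'pair': (0, 0), 'fun': (0, 1)}

def check(term):
    """Check that every operator is applied according to its arity."""
    if isinstance(term, str):          # a bare variable
        return True
    op, *args = term
    shape = ARITY[op]
    if len(args) != len(shape):
        return False
    # each argument is (list_of_bound_vars, subterm)
    return all(len(bound) == n and check(sub)
               for (bound, sub), n in zip(args, shape))

# lam(x.x) applied to y, i.e. ap(lam(x.x); y):
ok = check(('ap', ([], ('lam', (['x'], 'x'))), ([], 'y')))
```

Note that the checker enforces exactly and only the arities — it knows nothing about types, just like JonPRL's term language.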
So as mentioned we have functions and application. This means we could write (λx. x) y
in JonPRL as ap(lam(x.x); y)
. The type of functions is written with fun
. Remember that JonPRL’s language has a notion of dependence so the arity is (0; 1)
. The construct fun(A; x.B)
corresponds to (x : A) → B
in Agda or forall (x : A), B
in Coq.
We also have dependent sums as well (prod
s). In Agda you would write (M , N)
to introduce a pair and Σ A (λ x → B)
to type it. In JonPRL you have pair(M; N)
and prod(A; x.B)
. To inspect a prod
we have spread
which lets us eliminate a pair. spread has arity (0; 2): you give it a prod
in the first spot and x.y.e
in the second. It’ll then replace x
with the first component and y
with the second. Can you think of how to write fst
and snd
with this?
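For the fst/snd puzzle, here's one answer modeled in Python (the encoding is mine): spread hands both components of the pair to a two-argument continuation, so the projections just return one component and ignore the other:

```python
# spread(p; x.y.e): eliminate a pair by binding both components in e.
def pair(m, n):
    return (m, n)

def spread(p, body):          # body plays the role of x.y.e
    x, y = p
    return body(x, y)

# fst keeps the first component, snd the second:
fst = lambda p: spread(p, lambda x, y: x)
snd = lambda p: spread(p, lambda x, y: y)
```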
There are sums too: inl(M)
, inr(N)
and +(A; B)
corresponds to Left
, Right
, and Either
in Haskell. For case analysis there’s decide
which has the arity (0; 1; 1)
. You should read decide(M; x.N; y.P)
as something like
case M of
Left x -> N
Right y -> P
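The decide form can be modeled the same way as spread above (again, the tags and the safe-division example are my own illustration):

```python
# Sums: inl/inr tag a value; decide(M; x.N; y.P) branches on the tag.
def inl(x): return ('inl', x)
def inr(y): return ('inr', y)

def decide(m, left, right):
    tag, v = m
    return left(v) if tag == 'inl' else right(v)

# e.g. a "safe division" producing inl(error) or inr(result):
def safe_div(a, b):
    return inl('div by zero') if b == 0 else inr(a // b)

msg = decide(safe_div(7, 0), lambda e: e, lambda r: str(r))
```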
In addition we have unit
and <>
(pronounced axe for axiom usually). Neither of these takes any arguments so we write them just as I have above. They correspond to Haskell’s type-level and value-level ()
respectively. Finally there’s void
which is sometimes called false
or empty
in theorem prover land.
You’ll notice that I presented a bunch of types as if they were normal terms in this section. That’s because in this untyped computation system, types are literally just terms. There’s no typing relation to distinguish them yet, so they float around exactly as if they were lam or something! I call them types because I’m thinking ahead to when we have a typing relation built on top of this system, but for now they are really just terms. It was still a little confusing for me to see fun(unit; _.unit) in a language without types, so I wanted to make this explicit.
Now we can introduce some more exotic terms. Later, we’re going to construct some rules around them that will make them behave the way we might expect, but for now they are just suggestively named constants:
- U{i}, the ith-level universe, used to classify all types that can be built using types other than U{i} or higher. It’s closed under terms like fun and it contains all the types of smaller universes.
- =(0; 0; 0), equality between two terms at a type. It’s a proposition that’s going to precisely mirror what’s going on later in the type theory with the equality judgment.
- member(0; 0), just like = but internalizing membership in a type into the system. Remember that normally “this has that type” is a judgment, but with this term we’re going to have a propositional counterpart to use in theorems.

In particular, it’s important to distinguish between ∈ the judgment and member the term. There’s nothing inherent in member above that makes it behave like a typing relation as you might expect. It’s on equal footing with flibbertyjibberty(0; 0; 0).
This term language contains the full untyped lambda calculus so we can write all sorts of fun programs like
lam(f. ap(lam(x. ap(f; ap(x; x))); lam(x. ap(f; ap(x; x)))))
which is just the Y combinator. In particular this means that there’s no reason that every term in this language should normalize to a value. There are plenty of terms in here that diverge and in principle, there’s nothing that rules out them doing even stranger things than that. We really only depend on them being deterministic, that e ⇒ v
and e ⇒ v'
implies that v = v'
.
The other big language in JonPRL is the language of tactics. Luckily, this is very familiar territory if you’re a Coq user. Unluckily, if you’ve never heard of Coq’s tactic mechanism this will seem completely alien. As a quick high-level idea of what tactics are:
When we’re proving something in a proof assistant we have to deal with a lot of boring mechanical details. For example, when proving A → B → A
I have to describe that I want to introduce the A
and the B
into my context, then I have to suggest using that A
the context as a solution to the goal. Bleh. All of that is pretty obvious so let’s just get the computer to do it! In fact, we can build up a DSL of composable “proof procedures” or /tactics/ to modify a particular goal we’re trying to prove so that we don’t have to think so much about the low level details of the proof being generated. In the end this DSL will generate a proof term (or derivation in JonPRL) and we’ll check that so we never have to trust the actual tactics to be sound.
In Coq this is used to great effect. In particular, see Adam Chlipala’s book for incredibly complex theorems with one-line proofs thanks to tactics.
In JonPRL the tactic system works by modifying a sequent of the form H ⊢ A
(a goal). Each time we run a tactic we get back a list of new goals to prove until eventually we get to trivial goals which produce no new subgoals. This means that when trying to prove a theorem in the tactic language we never actually see the resulting evidence generated by our proof. We just see this list of H ⊢ A
s to prove and we do so with tactics.
The tactic system is quite simple. To start, we have a number of basic tactics which are useful no matter what goal you’re attempting to prove:

- id, a tactic which does nothing
- t1; t2, which runs the t1 tactic and then runs t2 on any resulting subgoals
- *{t}, which runs t as long as t does something to the goal. If t ever fails for whatever reason it merely stops running; it doesn’t fail itself
- ?{t}, which tries to run t once. If t fails, nothing happens
- !{t}, which runs t and fails if t does anything besides complete the proof. This means that !{id}, for example, will always fail
- t1 | t2, which runs t1 and, if it fails, runs t2. Only one of the effects of t1 and t2 will be shown
- t; [t1, ..., tn], which first runs t and then runs tactic ti on the ith subgoal generated by t
- trace "some words", which will print some words to standard out. This is useful when trying to figure out why things haven’t gone your way
- fail, the opposite of id: it just fails. This is actually quite useful for forcing backtracking, and one could probably implement a makeshift !{} as t; fail

It’s helpful to see this as a sort of tree: a tactic takes one goal to a list of subgoals to prove, so we can imagine t as this part of a tree
        H
————————————————— (t)
H'    H''    H'''
If we have some tactic t2 then t; t2 will run t and then run t2 on H', H'', and H'''. Instead we could have t; [t1, t2, t3]: then we’ll run t and (assuming it succeeds) we’ll run t1 on H', t2 on H'', and t3 on H'''. This is actually how things work under the hood, composable fragments of trees :)
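Putting the combinators together, an abstract script over some tactics t1 through t5 might look like this (the tactic names are placeholders of mine; only the glue is the point):

```
?{t1}; *{t2}; (t3 | (t4; [t5, id]))
```

Read: try t1 once, doing nothing if it fails; repeatedly run t2 while it makes progress; then on each remaining subgoal run t3, and if t3 fails run t4 followed by t5 on t4’s first subgoal and nothing on its second (assuming t4 leaves two subgoals).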
Now those give us a sort of bedrock for building up scripts of tactics. We also have a bunch of tactics that actually let us manipulate the things we’re trying to prove. The 4 big ones to be aware of are

- intro
- elim #NUM
- eqcd
- memcd
The basic idea is that intro
modifies the A
part of the goal. If we’re looking at a function, so something like H ⊢ fun(A; x.B)
, this will move that A
into the context, leaving us with H, x : A ⊢ B
.
If you’re familiar with sequent calculus, intro runs the appropriate right rule for the goal. If you’re not familiar with sequent calculus, intro looks at the outermost operator of the A and runs a rule that applies when that operator is to the right of the ⊢.
Now one tricky case: what should intro do if you’re looking at a prod? Here things get a bit dicey. We might expect to get two subgoals if we run intro on H ⊢ prod(A; x.B), one which proves H ⊢ A and one which proves H ⊢ B or something, but what about the fact that x.B depends on whatever the underlying realizer (that’s the program extracted from the proof) of H ⊢ A is! Further, Nuprl and JonPRL are based around extract-style proof systems. This means that a goal shouldn’t depend on the particular piece of evidence proving another goal. So instead we have to tell intro up front what we want the evidence for H ⊢ A to be so that the H ⊢ B section may use it.
To do this we just give intro an argument. For example say we’re proving that · ⊢ prod(unit; x.unit)
, we run intro [<>]
which gives us two subgoals · ⊢ member(<>; unit)
and · ⊢ unit
. Here the []
let us denote the realizer we’re passing to intro
. In general any term arguments to a tactic will be wrapped in []
s. So the first goal says “OK, you said that this was your realizer for unit
, but is it actually a realizer for unit
?” and the second goal substitutes the given realizer into the second argument of prod
, x.unit
, and asks us to prove that. Notice how here we have to prove member(<>; unit)
? This is where that weird member
type comes in handy. It lets us sort of play type checker and guide JonPRL through the process of type checking. This is actually very crucial since type checking in Nuprl and JonPRL is undecidable.
Now how do we actually go about proving member(<>; unit)
? Well here memcd
has got our back. This tactic transforms member(A; B)
into the equivalent form =(A; A; B)
. In JonPRL and Nuprl, types are given meaning by how we interpret the equality of their members. In other words, if you give me a type you have to say both what its members are and what it means for two members to be equal.
Long ago, Stuart Allen realized we could combine the two by specifying a partial equivalence relation for a type. In this case, rather than having a separate notion of membership, we check to see if something is equal to itself under the PER; when it is, that PER behaves like a normal equivalence relation! So in JonPRL member
is actually just a very thin layer of sugar around =
which is really the core defining notion of typehood. To handle =
we have eqcd
which does clever things to handle most of the obvious cases of equality.
Finally, we have elim
. Just like intro
lets us simplify things on the right of the ⊢, elim lets us eliminate something on the left. So we tell elim
to “eliminate” the nth item in the context (they’re numbered when JonPRL prints them) with elim #n
.
Just like with anything, it’s hard to learn all the tactics without experimenting (though a complete list can be found with jonprl --list-tactics
). Let’s go look at the command language so we can actually prove some theorems.
So in JonPRL there are only 4 commands you can write at the top level:

- Operator
- [oper] =def= [term] (a definition)
- Tactic
- Theorem

The first three of these let us customize and extend the basic suite of operators and tactics JonPRL comes with. The last actually lets us state and prove theorems.
The best way to see these things is by example, so we’re going to build up a small development in JonPRL. We’re going to show that products are a monoid with unit, up to logical equivalence. There are a lot of proofs involved here:

1. prod(unit; A) entails A
2. prod(A; unit) entails A
3. A entails prod(unit; A)
4. A entails prod(A; unit)
5. prod(A; prod(B; C)) entails prod(prod(A; B); C)
6. prod(prod(A; B); C) entails prod(A; prod(B; C))

I intend to prove 1, 2, and 5. The remaining proofs are either very similar or fun puzzles to work on. We could also prove that all the appropriate entailments are inverses, and then we could say that everything is up to isomorphism.
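For instance, entailment 3 would be stated in the term language like this (a sketch of mine mirroring the translation we’re about to do for entailment 1; we’ll see how to wrap such a statement in a theorem shortly):

```
fun(U{i}; A.
 fun(A; _.
  prod(unit; _.A)))
```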
First we want a new snazzy operator to signify nondependent products since writing prod(A; x.B)
is kind of annoying. We do this using Operator
Operator prod : (0; 0).
This line declares prod
as a new operator which takes two arguments binding zero variables each. Now we really want JonPRL to know that prod
is sugar for prod
. To do this we use =def=
which gives us a way to desugar a new operator into a mess of existing ones.
[prod(A; B)] =def= [prod(A; _.B)].
Now we can change any occurrence of prod(A; B)
for prod(A; _.B)
as we’d like. Okay, so we want to prove that we have a monoid here. What’s the first step? Let’s verify that unit
is a left identity for prod
. This entails proving that for all types A
, prod(unit; A) ⊃ A
and A ⊃ prod(unit; A)
. Let’s prove these as separate theorems. Translating our first statement into JonPRL we want to prove
fun(U{i}; A.
fun(prod(unit; A); _.
A))
In Agda notation this would be written
(A : Set) → (_ : prod(unit; A)) → A
Let’s prove our first theorem. We start by writing
Theorem leftid1 :
[fun(U{i}; A.
fun(prod(unit; A); _.
A))] {
id
}.
This is the basic form of a theorem in JonPRL, a name, a term to prove, and a tactic script. Here we have id
as a tactic script, which clearly doesn’t prove our goal. When we run JonPRL on this file (C-c C-l if you’re in Emacs) you get back
[XXX.jonprl:8.39.1]: tactic 'COMPLETE' failed with goal:
⊢ funA ∈ U{i}. (prod(unit; A)) => A
Remaining subgoals:
⊢ funA ∈ U{i}. (prod(unit; A)) => A
So focus on that Remaining subgoals
bit: that’s what we have left to prove, our current goal. Now you may notice that this outputted goal is a lot prettier than our syntax! That’s because currently in JonPRL the input and output terms may not match; the latter is subject to pretty printing. In general this is great because you can read your remaining goals, but it does mean copying and pasting is a bother. There’s nothing to the left of that ⊢ yet, so let’s run the only applicable tactic we know. Delete that id
and replace it with
{
intro
}.
The goal now becomes
Remaining subgoals:
1. A : U{i}
⊢ (prod(unit; A)) => A
⊢ U{i} ∈ U{i'}
Two ⊢s means two subgoals now. One looks pretty obvious, U{i'}
is just the universe above U{i}
(so that’s like Set₁ in Agda) so it should be the case that U{i} ∈ U{i'}
by definition! So the next tactic should be something like [???, memcd; eqcd]
. Now what should that ??? be? Well we can’t use elim
because there’s one thing in the context now (A : U{i}
), but it doesn’t help us really. Instead let’s run unfold <prod>
. This is a new tactic that’s going to replace that prod
with the definition that we wrote earlier.
{
intro; [unfold <prod>, memcd; eqcd]
}
Notice here that ,
binds less tightly than ;
which is useful for saying stuff like this. This gives us
Remaining subgoals:
1. A : U{i}
⊢ (unit × A) => A
We run intro again
{
intro; [unfold <prod>, memcd; eqcd]; intro
}
Now we are in a similar position to before with two subgoals.
Remaining subgoals:
1. A : U{i}
2. _ : unit × A
⊢ A
1. A : U{i}
⊢ unit × A ∈ U{i}
The first subgoal is really what we want to be proving so let’s put a pin in that momentarily. Let’s get rid of that second subgoal with a new helpful tactic called auto
. It runs eqcd
, memcd
and intro
repeatedly and is built to take care of boring goals just like this!
{
intro; [unfold <prod>, memcd; eqcd]; intro; [id, auto]
}
Notice that we used what is a pretty common pattern in JonPRL: to work on one subgoal at a time we use []
’s and id
s everywhere except where we want to do work, in this case the second subgoal.
Now we have
Remaining subgoals:
1. A : U{i}
2. _ : unit × A
⊢ A
Cool! Having a pair of unit × A
really ought to mean that we have an A
so we can use elim
to get access to it.
{
intro; [unfold <prod>, memcd; eqcd]; intro; [id, auto];
elim #2
}
This gives us
Remaining subgoals:
1. A : U{i}
2. _ : unit × A
3. s : unit
4. t : A
⊢ A
We’ve really got the answer now: #4 is precisely our goal. For these situations there’s assumption
which is just a tactic which succeeds if what we’re trying to prove is in our context already. This will complete our proof
Theorem leftid1 :
[fun(U{i}; A.
fun(prod(unit; A); _.
A))] {
intro; [unfold <prod>, memcd; eqcd]; intro; [id, auto];
elim #2; assumption
}.
Now we know that auto
will run all of the tactics on the first line except unfold <prod>
, so what if we just run unfold <prod> first and then auto? It ought to do all the same stuff. Indeed, we can shorten our whole proof to unfold <prod>; auto; elim #2; assumption
. With this more heavily automated proof, proving our next theorem follows easily.
Theorem rightid1 :
[fun(U{i}; A.
fun(prod(A; unit); _.
A))] {
unfold <prod>; auto; elim #2; assumption
}.
Next, we have to prove associativity to complete the development that prod
is a monoid. The statement here is a bit more complex.
Theorem assoc :
[fun(U{i}; A.
fun(U{i}; B.
fun(U{i}; C.
fun(prod(A; prod(B;C)); _.
prod(prod(A;B); C)))))] {
id
}.
In Agda notation what I’ve written above is
assoc : (A B C : Set) → A × (B × C) → (A × B) × C
assoc = ?
Let’s kick things off with unfold <prod>; auto
to deal with all the boring stuff we had last time. In fact, since prod
appears in several nested places we’d have to run unfold
quite a few times. Let’s just shorten all of those invocations into *{unfold <prod>}
{
*{unfold <prod>}; auto
}
This leaves us with the state
Remaining subgoals:
1. A : U{i}
2. B : U{i}
3. C : U{i}
4. _ : A × B × C
⊢ A
1. A : U{i}
2. B : U{i}
3. C : U{i}
4. _ : A × B × C
⊢ B
1. A : U{i}
2. B : U{i}
3. C : U{i}
4. _ : A × B × C
⊢ C
In each of those goals we need to take apart the 4th hypothesis so let’s do that
{
*{unfold <prod>}; auto; elim #4
}
This leaves us with 3 subgoals still
1. A : U{i}
2. B : U{i}
3. C : U{i}
4. _ : A × B × C
5. s : A
6. t : B × C
⊢ A
1. A : U{i}
2. B : U{i}
3. C : U{i}
4. _ : A × B × C
5. s : A
6. t : B × C
⊢ B
1. A : U{i}
2. B : U{i}
3. C : U{i}
4. _ : A × B × C
5. s : A
6. t : B × C
⊢ C
The first subgoal is pretty easy, assumption
should handle that. In the other two we want to eliminate 6 and then we should be able to apply assumption. In order to deal with this we use | to encode that disjunction. In particular we want to run assumption OR elim #6; assumption, leaving us with
{
*{unfold <prod>}; auto; elim #4; (assumption | elim #6; assumption)
}
This completes the proof!
Theorem assoc :
[fun(U{i}; A.
fun(U{i}; B.
fun(U{i}; C.
fun(prod(A; prod(B;C)); _.
prod(prod(A;B); C)))))] {
*{unfold <prod>}; auto; elim #4; (assumption | elim #6; assumption)
}.
As a fun puzzle, what needs to change in this proof to prove we can associate the other way?
So we just proved a theorem… but what really just happened? I mean, how did we go from “here we have an untyped computation system with types just behaving as normal terms” to “now apply auto and we’re done!”? In this section I’d like to briefly sketch the path from untyped computation to theorems.
The path looks like this:

1. We start with our untyped language and its notion of computation. We already discussed this in great depth before.

2. We define a judgment a = b ∈ A. This is a judgment, not a term in that language; it exists in whatever metalanguage we’re using. This judgment is defined across 3 terms in our untyped language (I’m only capitalizing A out of convention). It is supposed to represent that a and b are equal elements of type A. This also gives meaning to typehood: something is a type in CTT precisely when we know what the partial equivalence relation defined by · = · ∈ A on canonical values is.
Notice here that I said partial. It isn’t the case that a = b ∈ A
presupposes that we know that a : A
and b : A
because we don’t have a notion of :
yet!
In some sense this is where we depart from a type theory like Coq or Agda’s. We have programs already and on top of them we define this 3 part judgment which interacts which computation in a few ways I’m not specifying. In Coq, we would specify one notion of equality, generic over all types, and separately specify a typing relation.
From here we can define the normal judgments of Martin-Löf’s type theory. For example, a : A
is a = a ∈ A
. We recover the judgment A type
with A = A ∈ U
(where U
here is a universe).
This means that inhabiting a universe A = A ∈ U
, isn’t necessarily inductively defined but rather negatively generated. We specify some condition a term must satisfy to occupy a universe.
Hypothetical judgments are introduced in the same way they would be in Martin-Löf’s presentations of type theory. The idea being that H ⊢ J
if J
is evident under the assumption that each term in H
has the appropriate type and furthermore that J
is functional (respects equality) with respect to what H
contains. This isn’t really a higher order judgment, but it will be defined in terms of a higher order hypothetical judgment in the metatheory.
With this we have something that walks and quacks like normal type theory. Using the normal tools of our metatheory we can formulate proofs of a : A
and do normal type theory things. This whole development is building up what is called “Computational Type Theory”. The way this diverges from MartinLof’s extensional type theory is subtle but it does directly descend from MartinLof’s famous 1979 paper “Constructive Mathematics and Computer Programming” (which you should read. Instead of my blog post).
Now there’s one final layer we have to consider, the PRL bit of JonPRL. We define a new judgment, H ⊢ A [ext a]. This judgment is cleverly set up so that two properties hold:

1. H ⊢ A [ext a] should entail that H ⊢ a : A, or H ⊢ a = a ∈ A.
2. In H ⊢ A [ext a], a is an output and H and A are inputs. In particular, this implies that in any inference for this judgment, the subgoals may not use a in their H and A.

This means that a is completely determined by H and A, which justifies my use of the term output. I mean this in the sense of Twelf and logic programming, if that’s a more familiar phrasing. It’s this judgment that we see in JonPRL! Since that a is an output, we simply hide it, leaving us with H ⊢ A
as we saw before. When we prove something with tactics in JonPRL we’re generating a derivation, a tree of inference rules which make H ⊢ A
evident for our particular H
and A
! These rules aren’t really programs though, they don’t correspond one to one with proof terms we may run like they would in Coq. The computational interpretation of our program is bundled up in that a
.
To see what I mean here we need a little bit more machinery. Specifically, let’s look at the rules for the equality around the proposition =(a; b; A)
. Remember that we have a term <>
lying around,
a = b ∈ A
————————————————————
<> = <> ∈ =(a; b; A)
So the only member of =(a; b; A)
is <>
if a = b ∈ A
actually holds. First off, notice that <> : A
and <> : B
doesn’t imply that A = B
! In another example, lam(x. x) ∈ fun(A; _.A)
for all A
! This is a natural consequence of separating our typing judgment from our programming language. Secondly, there’s not really any computation in the e
of H ⊢ =(a; b; A) [ext e]
. After all, in the end the only thing e
could be so that e : =(a; b; A)
is <>
! However, there is potentially quite a large derivation involved in showing =(a; b; A)
! For example, we might have something like this
x : =(A; B; U{i}); y : =(b; a; A) ⊢ =(a; b; B)
———————————————————————————————————————————————— Substitution
x : =(A; B; U{i}); y : =(b; a; A) ⊢ =(a; b; A)
———————————————————————————————————————————————— Symmetry
x : =(A; B; U{i}); y : =(b; a; A) ⊢ =(b; a; A)
———————————————————————————————————————————————— Assumption
Now, we write derivations of this sequent upside down: the thing we want to show starts on top and we write each rule application and subgoal below it (AI people apparently like this?). Now this was quite a derivation, but if we fill in the missing [ext e]
for this derivation from the bottom up we get this
x : =(A; B; U{i}); y : =(b; a; A) ⊢ =(a; b; B)
———————————————————————————————————————————————— Substitution [ext <>]
x : =(A; B; U{i}); y : =(b; a; A) ⊢ =(a; b; A)
———————————————————————————————————————————————— Symmetry [ext <>]
x : =(A; B; U{i}); y : =(b; a; A) ⊢ =(b; a; A)
———————————————————————————————————————————————— Assumption [ext x]
Notice how at the bottom there was some computational content (That x
signifies that we’re accessing a variable in our context) but then we throw it away right on the next line! That’s because we find that no matter what the extract was that lets us derive =(b; a; A), the only realizer it could possibly generate is <>
. Remember our conditions, if we can make evident the fact that b = a ∈ A
then <> ∈ =(b; a; A)
. Because we somehow managed to prove that b = a ∈ A
holds, we’re entitled to just use <>
to realize our proof. This means that despite our somewhat tedious derivation and the bookkeeping that we had to do to generate that program, that program reflects none of it.
This is why type checking in JonPRL is woefully undecidable: in part, the realizers that we want to type check contain none of the helpful hints that proof terms in Coq would. This also means that extraction from JonPRL proofs is built right into the system and we can actually generate cool and useful things! In Nuprl-land, folks at Cornell actually write proofs and use these realizers to run real software. From what Bob Constable said at OPLSS they can actually get these programs to run fast (within 5x of naive C code).
So to recap: in JonPRL we use tactics to build a derivation of a judgment H ⊢ A, and from that derivation the system computes the extract, the program realizing the proof.
In fact, we can see all of this happen if you call JonPRL from the command line or hit C-c C-c in Emacs! On our earlier proof we see
Operator prod : (0; 0).
⸤prod(A; B)⸥ ≝ ⸤A × B⸥.
Theorem leftid1 : ⸤⊢ funA ∈ U{i}. (prod(unit; A)) => A⸥ {
fun-intro(A.fun-intro(_.prod-elim(_; _.t.t); prod⁼(unit⁼; _.hyp⁼(A))); U⁼{i})
} ext {
lam_. lam_. spread(_; _.t.t)
}.
Theorem rightid1 : ⸤⊢ funA ∈ U{i}. (prod(A; unit)) => A⸥ {
fun-intro(A.fun-intro(_.prod-elim(_; s._.s); prod⁼(hyp⁼(A); _.unit⁼)); U⁼{i})
} ext {
lam_. lam_. spread(_; s._.s)
}.
Theorem assoc : ⸤⊢ funA ∈ U{i}. funB ∈ U{i}. funC ∈ U{i}. (prod(A; prod(B; C))) => prod(prod(A; B); C)⸥ {
fun-intro(A.fun-intro(B.fun-intro(C.fun-intro(_.independent-prod-intro(independent-prod-intro(prod-elim(_;
s.t.prod-elim(t; _._.s)); prod-elim(_; _.t.prod-elim(t;
s'._.s'))); prod-elim(_; _.t.prod-elim(t; _.t'.t')));
prod⁼(hyp⁼(A); _.prod⁼(hyp⁼(B); _.hyp⁼(C)))); U⁼{i}); U⁼{i});
U⁼{i})
} ext {
lam_. lam_. lam_. lam_. ⟨⟨spread(_; s.t.spread(t; _._.s)), spread(_; _.t.spread(t; s'._.s'))⟩, spread(_; _.t.spread(t; _.t'.t'))⟩
}.
Now we can see that those Operator
and ≝
bits are really what we typed with =def=
and Operator
in JonPRL. What’s interesting here are the theorems: there are two bits, the derivation and the extract, or realizer.
{
derivation of the sequent · ⊢ A
} ext {
the program in the untyped system extracted from our derivation
}
We can move that derivation into a different proof assistant and check it. This gives us all the information we need to check JonPRL’s reasoning, and means we don’t have to trust all of JonPRL (I wrote some of it, so I’d be a little scared to trust it :). We can also see the computational bit of our proof in the extract. For example, the computation involved in taking A × unit → A
is just lam_. lam_. spread(_; s._.s)
! This is probably closer to what you’ve seen in Coq or Idris, even though I’d say the derivation is probably more similar in spirit (just ugly and beta normal). That’s because the extract need not have any notion of typing or proof, it’s just the computation needed to produce a witness of the appropriate type. This means for a really tricky proof of equality, your extract might just be <>
! Your derivation however will always exactly reflect the complexity of your proof.
OK, so I’ve just dumped about 50 years worth of hard research in type theory into your lap which is best left to ruminate for a bit. However, before I finish up this post I wanted to do a little bit of marketing so that you can see why one might be interested in JonPRL (or Nuprl). Since we’ve embraced this idea of programs first and types as PERs, we can define some really strange types completely seamlessly. For example, in JonPRL there’s a type ⋂(A; x.B)
, it behaves a lot like fun
but with one big difference: the definition of · = · ∈ ⋂(A; x.B)
looks like this
a : A ⊢ e = e' ∈ [a/x]B
————————————————————————
e = e' ∈ ⋂(A; x.B)
Notice here that e
and e'
may not use a
anywhere in their bodies. That is, they have to be in [a/x]B
without knowing anything about a
and without even having access to it.
This is a pretty alien concept that turned out to be new in logic as well (it’s called “uniform quantification”, I believe). It turns out to be very useful in PRLs because it lets us declare things in our theorems without having them propagate into our witness. For example, we could have said
Theorem rightid1 :
[⋂(U{i}; A.
fun(prod(A; unit); _.
A))] {
unfold <prod>; auto; elim #2; assumption
}.
With the observation that our realizer doesn’t need to depend on A
at all (remember, no types!). Then the extract of this theorem is
lamx. spread(x; s._.s)
There’s no spurious lam _. ...
at the beginning! Even more wackily, we can define subsets of an existing type since realizers need not have unique types
e = e' ∈ A    [e/x]P    [e'/x]P
———————————————————————————————
    e = e' ∈ subset(A; x.P)
And in JonPRL we can now say things like “all odd numbers” by just saying subset(nat; n. ap(odd; n))
. In intensional type theories, these types are hard to deal with and still the subject of open research. In CTT they just kinda fall out because of how we thought about types in the first place. Quotients are a similarly natural conception (just define a new type with a stricter PER) but JonPRL currently lacks them (though they shouldn’t be hard to add..).
Finally, if you’re looking for one last reason to dig into **PRL, the fact that we’ve defined all our equalities extensionally means that several very useful facts just fall right out of our theory
Theorem funext :
[⋂(U{i}; A.
⋂(fun(A; _.U{i}); B.
⋂(fun(A; a.ap(B;a)); f.
⋂(fun(A; a.ap(B;a)); g.
⋂(fun(A; a.=(ap(f; a); ap(g; a); ap(B; a))); _.
=(f; g; fun(A; a.ap(B;a))))))))] {
auto; ext; ?{elim #5 [a]}; auto
}.
This means that two functions are equal in JonPRL if and only if they map equal arguments to equal output. This is quite pleasant for formalizing mathematics for example.
Whew, we went through a lot! I didn’t intend for this to be a full tour of JonPRL, just a taste of how things sort of hang together and maybe enough to get you looking through the examples. Speaking of which, JonPRL comes with quite a few examples which are going to make a lot more sense now.
Additionally, you may be interested in the documentation in the README which covers most of the primitive operators in JonPRL. As for an exhaustive list of tactics, well….
Hopefully I’ll be writing about JonPRL again soon. Until then, I hope you’ve learned something cool :)
A huge thanks to David Christiansen and Jon Sterling for tons of helpful feedback on this
Veering wildly onto the theory side compared to my last post, I’d like to look at some more Twelf code today. Specifically, I’d like to prove a fun theorem called cut admissibility (or elimination) for a particular logic: a simple intuitionistic propositional sequent calculus. I chucked the code for this over here.
If those words didn’t make any sense, here’s an incomplete primer on what we’re doing. First of all, we’re working with a flavor of logic called “sequent calculus”. Sequent calculus describes a class of logics characterized by studying “sequents”; a sequent is just an expression Γ ⊢ A
saying “A
is true under the assumption that the set of propositions, Γ
, are true”. A sequent calculus defines a couple of things:

1. What exactly A is: a calculus defines what propositions it talks about. For us, we’re only interested in a few basic connectives, so our calculus can talk about true, false, A and B (A ∧ B), A or B (A ∨ B), and A implies B (A ⇒ B).
2. Rules for inferring that Γ ⊢ A holds. We can use these inference rules to build up proofs of things in our system.
In sequent calculus there are two sorts of inference rules, left and right. A left rule takes a fact that we know and lets us reason backwards to other things we must know hold. A right rule lets us take the thing we’re trying to prove and instead prove smaller, simpler things.
More rules will follow in the Twelf code but for a nice example consider the left and right rules for ∧
,
Γ, A, B ⊢ C
———————————————
Γ, A ∧ B ⊢ C
Γ ⊢ A     Γ ⊢ B
———————————————
Γ ⊢ A ∧ B
The left rule says if we know that A ∧ B
is true, we can take it apart and try to prove our goal with assumptions that A
and B
are true. The right rule says to prove that A ∧ B
is true we need to prove A
is true and B
is true. A proof in this system is a tree of these rules, just like you’d expect in a type theory or natural deduction.
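For comparison, the standard rules for ∨ follow the same left/right pattern (my own transcription in the notation above):

```
   Γ ⊢ A          Γ ⊢ B        Γ, A ⊢ C   Γ, B ⊢ C
—————————      —————————      ————————————————————
Γ ⊢ A ∨ B      Γ ⊢ A ∨ B          Γ, A ∨ B ⊢ C
```

The two right rules say we can prove a disjunction by proving either side; the left rule is case analysis on a disjunction we already have.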
We also tacitly assume a bunch of boring rules called structural rules about our sequents hold, so that we can freely duplicate, drop and swap assumptions in Γ
. For a less awful introduction to sequent calculus Frank Pfenning has some good notes.
Now we want to prove a particular (meta)theorem about sequent calculus
Γ ⊢ A    Γ, A ⊢ B
——————————————————
      Γ ⊢ B

This theorem means a couple of different things: for example, that our system is consistent and that it admits lemmas. As it turns out, proving this theorem is hard. The basic complication is that we don’t know what form either of the first two proofs takes.
We now formalize our sequent calculus in Twelf. First we declare a type and some constants to represent propositions.
prop : type.

=>    : prop -> prop -> prop. %infix right 4 =>.
true  : prop.
false : prop.
/\    : prop -> prop -> prop. %infix none 5 /\.
\/    : prop -> prop -> prop. %infix none 5 \/.
Notice here that we use %infix to let us write A /\ B => C; since /\ binds at precedence 5, tighter than => at precedence 4, this parses as (A /\ B) => C
. Having specified these we now define what a proof is in this system. This is structured a little differently than you’d be led to believe from the above. We have an explicit type proof
which is inhabited by “proof terms” which serve as a nice shortcut to those trees generated by inference rules. Finally, we don’t explicitly represent Γ
, instead we have this thing called hyp
which is used to represent a hypothesis in Γ
. Left rules use these hypotheses and introduce new ones. Pay attention to /\/l
and /\/r
since you’ve seen the handwritten equivalents.
proof : type.
hyp : type.
init : hyp -> proof.
=>/r : (hyp -> proof) -> proof.
=>/l : (hyp -> proof) -> proof -> hyp -> proof.
true/r : proof.
false/l : hyp -> proof.
/\/r : proof -> proof -> proof.
/\/l : (hyp -> hyp -> proof) -> hyp -> proof.
\//r1 : proof -> proof.
\//r2 : proof -> proof.
\//l : (hyp -> proof) -> (hyp -> proof) -> hyp -> proof.
The right rules are at least a little intuitive, the left rules are peculiar. Essentially we have a weird CPSy feel going on here, to decompose a hypothesis you hand the hyp
to the constant along with a continuation which takes the hypotheses you should get out of the decomposition. For example for \/
we have two right rules (think Left
and Right
), then one left rule which takes two continuations and one hyp
(think either
). Finally, that init
thing is the only way to actually take a hypothesis and use it as a proof.
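If HOAS in Twelf is unfamiliar, the same encoding of proof terms can be transliterated into Haskell (this is my own sketch, not part of the post’s development; Hyp is an invented stand-in for Twelf’s hyp):

```haskell
-- Hypotheses are an abstract token type; left rules bind new ones via
-- Haskell functions, mirroring the Twelf (hyp -> proof) arrows.
newtype Hyp = Hyp Int deriving (Eq, Show)

data Proof
  = Init Hyp                               -- use a hypothesis directly
  | ImpR (Hyp -> Proof)                    -- =>/r
  | ImpL (Hyp -> Proof) Proof Hyp          -- =>/l: continuation, argument, hypothesis
  | TrueR                                  -- true/r
  | FalseL Hyp                             -- false/l
  | AndR Proof Proof                       -- /\/r
  | AndL (Hyp -> Hyp -> Proof) Hyp         -- /\/l
  | OrR1 Proof                             -- \//r1
  | OrR2 Proof                             -- \//r2
  | OrL (Hyp -> Proof) (Hyp -> Proof) Hyp  -- \//l
```

For example, the proof of A => A is just `ImpR Init`: introduce a hypothesis and immediately use it.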
We now want to unite these two pieces of syntax with a typing judgment letting us say that a proof
proves some particular prop
.
of : proof -> prop -> type.
hof : hyp -> prop -> type.
of/init : of (init H) A
  <- hof H A.
of/=>/r : of (=>/r F) (A => B)
  <- ({h} hof h A -> of (F h) B).
of/=>/l : of (=>/l C Arg F) U
  <- hof F (A => B)
  <- of Arg A
  <- ({h} hof h B -> of (C h) U).
of/true/r : of true/r true.
of/false/l : of (false/l H) A
  <- hof H false.
of//\/r : of (/\/r R L) (A /\ B)
  <- of L A
  <- of R B.
of//\/l : of (/\/l C H) U
  <- hof H (A /\ B)
  <- ({h}{h'} hof h A -> hof h' B -> of (C h h') U).
of/\//r1 : of (\//r1 L) (A \/ B)
  <- of L A.
of/\//r2 : of (\//r2 R) (A \/ B)
  <- of R B.
of/\//l : of (\//l R L H) C
  <- hof H (A \/ B)
  <- ({h} hof h A -> of (L h) C)
  <- ({h} hof h B -> of (R h) C).
In order to handle hypotheses we have this hof
judgment which handles typing various hyp
s. We introduce it just like we introduce hyp
s in those continuationy things for left rules. Sorry for dumping so much code on you all at once: it’s just a lot of machinery we need to get working in order to actually start formalizing cut.
I would like to point out a few things about this formulation of sequent calculus though. First off, it’s very Twelfy, we use the LF context to host the context of our logic using HOAS. We also basically just have void
as the type of hypotheses! Look, there’s no way to construct a hypothesis, let alone a typing derivation hof
! The idea is that we’ll just wave our hands at Twelf and say “consider our theorem in a context with hyp
s and hof
s with
%block someHofs : some {A : prop} block {h : hyp}{deriv : hof h A}.
In short, Twelf is nifty.
Now we’re almost in a position to state cut admissibility, we want to say something like
cut : of Lemma A
   -> ({h} hof h A -> of (F h) B)
   -> of ??? B
But what should that ??? be? We could just say “screw it it’s something” and leave it as an output of this lemma but experimentally (an hour of teeth gnashing later) it’s absolutely not worth the pain. Instead let’s do something clever.
Let’s first define an untyped version of cut
which works just across proofs without mind to typing derivations. We can’t declare this total because it’s just not going to work for ill-typed things; we can give it a mode though (it’s not needed) just as mechanical documentation.
cut : proof -> (hyp -> proof) -> proof -> type.
%mode cut +A +B -C.
The goal here is we’re going to state our main theorem as
of/cut : {A} of P A
      -> ({h} hof h A -> of (F h) B)
      -> cut P F C
      -> of C B
      -> type.
%mode of/cut +A +B +C -D -E.
Leaving that derivation of cut
as an output. This lets us produce not just a random term but instead a proof that that term makes “sense” somehow, along with a proof that it’s well-typed.
cut
is going to mirror the structure of of/cut
so we now need to figure out how we’re going to structure our proof. It turns out a rather nice way to do this is to organize our cuts into 4 categories. The first ones are “principal” cuts: they’re the ones where we have a right rule as our lemma and we immediately decompose that lemma in the other term with the corresponding left rule. This is sort of the case that we drive towards everywhere and it’s where the substitution bit happens.
First we have some simple cases
trivial : cut P' ([x] P) P.
p/init1 : cut (init H) ([x] P x) (P H).
p/init2 : cut P ([x] init x) P.
In trivial
we don’t use the hypothesis at all so we’re just “strengthening” here. p/init1
and p/init2
deal with the init
rule on the left or right side of the cut. If it’s on the left we have a hypothesis of the appropriate type so we just apply the function; if it’s on the right we have a proof of the appropriate type so we just return that. In the more interesting cases we have the principal cuts for some specific connectives.
p/=> : cut (=>/r F) ([x] =>/l ([y] C y x) (Arg x) x) Out'
    <- ({y} cut (=>/r F) (C y) (C' y))
    <- cut (=>/r F) Arg Arg'
    <- cut Arg' F Out
    <- cut Out C' Out'.
p//\ : cut (/\/r R L) ([x] /\/l ([y][z] C y z x) x) Out'
    <- ({x}{y} cut (/\/r R L) (C x y) (C' x y))
    <- ({x} cut R (C' x) (Out x))
    <- cut L Out Out'.
p/\//1 : cut (\//r1 L) ([x] \//l ([y] C2 y x) ([y] C1 y x) x) Out
    <- ({x} cut (\//r1 L) (C1 x) (C1' x))
    <- cut L C1' Out.
p/\//2 : cut (\//r2 L) ([x] \//l ([y] C2 y x) ([y] C1 y x) x) Out
    <- ({x} cut (\//r2 L) (C2 x) (C2' x))
    <- cut L C2' Out.
Let’s take a closer look at p/=>
, the principal cut for =>
. First off, our inputs are =>/r F
and ([x] =>/l ([y] C y x) (Arg x) x)
. The first one is just a “function” that we’re supposed to substitute into the second. Now the second is comprised of a continuation and an argument. Notice that both of these depend on x
! In order to handle this the first two lines of the proof
<- ({y} cut (=>/r F) (C y) (C' y))
<- cut (=>/r F) Arg Arg'
Are to remove that dependence. We get back a C'
and an Arg'
which don’t use the hyp
(x
). In order to do this we just recurse and cut the =>/r F
out of them. Notice that both the type and the thing we’re substituting are the same size, what decreases here is what we’re substituting into. Now we’re ready to actually do some work. First we need to get a term representing the application of F
to Arg'
. This is done with cut since it’s just substitution
<- cut Arg' F Out
But this isn’t enough, we don’t need the result of the application, we need the result of the continuation! So we have to cut the output of the application through the continuation
<- cut Out C' Out'.
This code is kinda complicated. The typed version of this took me an hour since after 2am I am charitably called an idiot. However this same general pattern holds with all the principal cuts; the extra work for =>
comes from Arg, since it’s just lying about in an input. Try to work through the case for /\
now
p//\ : cut (/\/r R L) ([x] /\/l ([y][z] C y z x) x) Out'
    <- ({x}{y} cut (/\/r R L) (C x y) (C' x y))
    <- ({x} cut R (C' x) (Out x))
    <- cut L Out Out'.
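Stripped of all the relational bookkeeping, the idea that cut is just substitution can be sketched in Haskell, where terms using a hypothesis are plain functions and the trivial and init cases collapse to constant functions and application (a toy model with invented names, not the Twelf development):

```haskell
-- A toy model: a "proof" is just a labelled tree, and a term using one
-- hypothesis is a Haskell function from proofs to proofs (HOAS).
data Proof = Axiom String | Node String [Proof] deriving (Eq, Show)

-- cut lemma body: substitute the lemma for the hypothesis in the body.
-- In the Twelf development this is a relation; here it is application.
cut :: Proof -> (Proof -> Proof) -> Proof
cut lemma body = body lemma

-- "trivial": a body that ignores its hypothesis is a constant function,
-- so cutting just strengthens the hypothesis away.
-- "p/init2": a body that is exactly its hypothesis is id,
-- so cutting returns the lemma itself.
```

The hard part of the real proof is that Twelf’s cut must rebuild the output term rule by rule rather than relying on meta-level application like this.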
After principal cuts we really just have a number of boring cases whose job it is to recurse. The first of these is called rightist substitution because it comes up if the term on the right (the part using the lemma) has a right rule first. This means we have to hunt in the subterms to go find where we’re actually using the lemma.
r/=> : cut P ([x] (=>/r ([y] F y x))) (=>/r ([y] F' y))
    <- ({x} cut P (F x) (F' x)).
r/true : cut P ([x] true/r) true/r.
r//\ : cut P ([x] /\/r (R x) (L x)) (/\/r R' L')
    <- cut P L L'
    <- cut P R R'.
r/\/1 : cut P ([x] \//r1 (L x)) (\//r1 L')
    <- cut P L L'.
r/\/2 : cut P ([x] \//r2 (R x)) (\//r2 R')
    <- cut P R R'.
Nothing here should be surprising keeping in mind that all we’re doing here is recursing. The next set of cuts is called leftist substitution. Here we are actually recursing on the term we’re trying to substitute.
l/=> : cut (=>/l ([y] C y) Arg H) ([x] P x) (=>/l ([x] C' x) Arg H)
    <- ({x} cut (C x) P (C' x)).
l/false : cut (false/l H) P (false/l H).
l//\ : cut (/\/l ([x][y] C x y) H) P (/\/l ([x][y] C' x y) H)
    <- ({x}{y} cut (C x y) P (C' x y)).
l/\/ : cut (\//l ([y] R y) ([y] L y) H) ([x] P x)
       (\//l ([y] R' y) ([y] L' y) H)
    <- ({x} cut (L x) P (L' x))
    <- ({x} cut (R x) P (R' x)).
It’s the same game but just a different target: we’re now recursing on the continuation because that’s where we somehow created a proof of A
. This means that on l/=>
we’re substituting into a left term which has three parts: a continuation from a hyp of B to a proof of C, an argument of type A, and a hypothesis of A => B.
Now we’re only interested in how we created that proof of C
, that’s the only relevant part of this substitution. The output of this case is going to have that left rule, =>/l ??? Arg H
where ???
is a replacement of C
that we get by cutting C
through P
“pointwise”. This comes through on the recursive call
<- ({x} cut (C x) P (C' x)).
For one more case, consider the left rule for \/
l/\/ : cut (\//l R L H) P
We start by trying to cut a left rule into P
so we need to produce a left rule in the output with different continuations, something like
(\//l R' L' H)
Now what should R'
and L'
be? In order to produce them we’ll throw up a pi so we can get L x
, a proof with the appropriate type to cut again. With that, we can recurse and get back the new continuation we want.
<- ({x} cut (L x) P (L' x))
<- ({x} cut (R x) P (R' x)).
There’s one last class of cuts to worry about. Think about the cases we’ve covered so far.
So what happens if we have a left rule on the right and a right rule on the left, but they don’t “match up”. By this I mean that the left rule in that right term works on a different hypothesis than the one that the function it’s wrapped in provides. In this case we just have to recurse some more
lr/=> : cut P ([x] =>/l ([y] C y x) (Arg x) H) (=>/l C' Arg' H)
    <- cut P Arg Arg'
    <- ({y} cut P (C y) (C' y)).
lr//\ : cut P ([x] /\/l ([y][z] C y z x) H) (/\/l C' H)
    <- ({y}{z} cut P (C y z) (C' y z)).
lr/\/ : cut P ([x] \//l ([y] R y x) ([y] L y x) H) (\//l R' L' H)
    <- ({y} cut P (L y) (L' y))
    <- ({y} cut P (R y) (R' y)).
When we have such an occurrence we just do like we did with right rules.
Okay, now that we’ve handled all of these cases we’re ready to type the damn thing.
of/cut : {A} of P A
      -> ({h} hof h A -> of (F h) B)
      -> cut P F C
      -> of C B
      -> type.
%mode of/cut +A +B +C -D -E.
Honestly this is less exciting than you’d think. We’ve really done all the creative work in constructing the cut
type family. All that’s left to do is check that this is correct. As an example, here’s a case that exemplifies how we verify all left-right commutative cuts.
- : of/cut _ P ([x][h] of/=>/l ([y][yh] C y yh x h) (A x h) H)
      (lr/=> CC CA) (of/=>/l C' A' H)
   <- of/cut _ P A CA A'
   <- ({y}{yh} of/cut _ P (C y yh) (CC y) (C' y yh)).
We start by trying to show that
lr/=> : cut P ([x] =>/l ([y] C y x) (Arg x) H) (=>/l C' Arg' H)
    <- cut P Arg Arg'
    <- ({y} cut P (C y) (C' y)).
Is type correct. To do this we have a derivation P
that the left term is well-typed. Notice that I’ve overloaded P
here, in the rule lr/=>
P
was a term and now it’s a typing derivation for that term. Next we have a typing derivation for [x] =>/l ([y] C y x) (Arg x) H
. This is a function which takes two arguments. x
is a hypothesis, the same as in lr/=>
, however now we have h
which is a hof
derivation that x
has a type. There’s only one way to type a usage of the left rule for =>
, with of/=>/l
so we have that next.
Finally, our output is on the next line in two parts. First we have a derivation for cut
showing how to construct the “cut out term” in this case. Next we have a new typing derivation that again uses of/=>/l
. Notice that both of these depend on some terms we get from the recursive calls here.
Since we’ve gone through all the cases already and done all the thinking, I’m not going to reproduce it all here. The intuition for how cut works is really best given by the untyped version, with the understanding that we check that it’s correct with this theorem as we did above.
To recap here’s what we did
Hope that was a little fun, cheers!
It’s been a while since I did one of these “read a package and write about it” posts. Part of this is that it turns out that most software is awful and writing about code I read just makes me grumpy. However I found something nice to write about! In this post I’d like to close a somewhat embarrassing gap in my knowledge: we’re going to walk through a streaming library.
I know that both lists and lazy IO are kind of... let’s say fragile, but I have neglected learning one of these fancy libraries that aim to solve those problems. Today we’ll be looking at one of those libraries, pipes!
Pipes provides one core type Proxy
and a few operations on it, like await
and yield
. We can pair together a pipeline of operations which can send data to their neighbors and request more data from them as they need them. With these coroutine like structures we can nicely implement efficient, streaming computations.
As always this starts by getting our hands on the code with the
~ $ cabal get pipes
~ $ cd pipes-4.1.5/
Now from here we can query all the available files to see what we’re up against
~/pipes-4.1.5 $ wc -l **/*.hs | sort -nr
4796 total
1513 src/Pipes/Tutorial.hs
854 src/Pipes/Core.hs
836 src/Pipes/Prelude.hs
517 src/Pipes.hs
380 src/Pipes/Lift.hs
272 tests/Main.hs
269 src/Pipes/Internal.hs
85 benchmarks/PreludeBench.hs
68 benchmarks/LiftBench.hs
2 Setup.hs
So the first thing I notice is that there’s this great honking module called Pipes.Tutorial
which houses a brief introduction to the pipes package. I skimmed this before starting but it doesn’t really seem to explain the implementation details. If you don’t really know what pipes is, read this tutorial now. After doing so you have exactly my knowledge of pipes!
The next interesting module here is Pipes.Internal
. I’ve found that .Internal
modules seem to house the fundamental bits of the package so we’ll start there.
This module starts with an emphatic warning
{-| This is an internal module, meaning that it is unsafe to
    import unless you understand the risks. -}
So this seems like a perfect place to start without really understanding this library :D It exports a few different functions and one type:
module Pipes.Internal (
    -- * Internal
      Proxy(..)
    , unsafeHoist
    , observe
    , X
    , closed
    ) where
I recognize one of those types: Proxy
as the central type behind the whole pipes concept, it is the type of component in the pipe line. Let’s look at how it’s actually implemented
data Proxy a' a b' b m r
    = Request a' (a  -> Proxy a' a b' b m r )
    | Respond b  (b' -> Proxy a' a b' b m r )
    | M          (m    (Proxy a' a b' b m r))
    | Pure    r
So two of those constructors, M
and Pure
, look pretty vanilla. The first one lets us lift an action in the underlying monad m
, into Proxy
. It’s a little bit weird instead of having M (m r)
we instead have M (m (Proxy ...))
however this doesn’t seem like a big deal because we have Pure
to promote an r
to a Proxy .... r
. So we can lift some m r
to Proxy a' a b' b m r
with M . fmap Pure
. It’s still not clear to me why this is a benefit though.
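To see the lifting trick concretely, here’s a self-contained sketch with a local copy of the constructors above and an invented runner runP (only safe on proxies that never Request or Respond):

```haskell
-- A local copy of the Proxy shape from above, just for experimenting.
data Proxy a' a b' b m r
  = Request a' (a  -> Proxy a' a b' b m r)
  | Respond b  (b' -> Proxy a' a b' b m r)
  | M (m (Proxy a' a b' b m r))
  | Pure r

-- lift an effect: wrap it in M, promoting its result with Pure
liftP :: Functor m => m r -> Proxy a' a b' b m r
liftP = M . fmap Pure

-- run a Proxy that never requests or responds (an invented helper)
runP :: Monad m => Proxy a' a b' b m r -> m r
runP (M m)    = m >>= runP
runP (Pure r) = return r
runP _        = error "runP: Request/Respond in a closed pipeline"
```

For example, runP (liftP action) just re-runs the underlying action, for any monad.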
The first two constructors are really cool though, Request
and Respond
. The first thing that pops into my head is that this looks like a sort of free-monad pattern (Request
and Respond
are definitely actions). This would make a lot of sense: free monad transformers nicely give rise to coroutines which are very much in line with pipes. Because of this free monad like shape, I expect that the monad instance will be like free monads and behave “like substitution”. We should chase down the leaves of this Proxy
(including under lambdas) and replace each Pure r
with f r
for >>=
and Pure (f a)
for fmap
.
To check if we’re right, we go down one line
instance Monad m => Functor (Proxy a' a b' b m) where
    fmap f p0 = go p0 where
        go p = case p of
            Request a' fa  -> Request a' (\a  -> go (fa a ))
            Respond b  fb' -> Respond b  (\b' -> go (fb' b'))
            M          m   -> M (m >>= \p' -> return (go p'))
            Pure    r      -> Pure (f r)
This looks like what I had in mind, we run down p
and in the first 3 branches we recurse. I’ll admit it looks a little intimidating but after staring at it for a bit I realized that the first 3 lines are all just variations on fmap go
! Indeed, we can rewrite this to
go p = case p of
    Request a' fa  -> Request a' (fmap go fa)
    Respond b  fb' -> Respond b  (fmap go fb')
    M          m   -> M (fmap go m)
    Pure    r      -> Pure (f r)
This makes the idea a bit clearer in my mind. Let’s look at the applicative instance next!
instance Monad m => Applicative (Proxy a' a b' b m) where
    pure = Pure
    pf <*> px = go pf where
        go p = case p of
            Request a' fa  -> Request a' (\a  -> go (fa a ))
            Respond b  fb' -> Respond b  (\b' -> go (fb' b'))
            M          m   -> M (m >>= \p' -> return (go p'))
            Pure    f      -> fmap f px
    (*>) = (>>)
First note that pure = Pure
which isn’t a stunner just from the naming. In <*>
we have the same sort of pattern as in fmap
. We race down the “function” side of <*>
and whenever we reach a Pure
we have a function from a > b
, with that function we call fmap
on the structure on the “argument” side. So we’re kind of gluing that px
onto that pf
by changing each Pure f
to fmap f px
.
Finally we have the monad instance. Of course the return
implementation is the same as for pure
but (>>=) = _bind
so the implementation of _bind
has been chucked out of the instance itself. It turns out there’s a good reason for that: _bind
has a bunch of rewrite rules attached to it.
_bind
    :: Monad m
    => Proxy a' a b' b m r
    -> (r -> Proxy a' a b' b m r')
    -> Proxy a' a b' b m r'
p0 `_bind` f = go p0 where
    go p = case p of
        Request a' fa  -> Request a' (\a  -> go (fa a ))
        Respond b  fb' -> Respond b  (\b' -> go (fb' b'))
        M          m   -> M (m >>= \p' -> return (go p'))
        Pure    r      -> f r
Now excitingly the implementation of bind
is almost exactly what we had before! Now instead of Pure f -> fmap f px
it’s Pure r -> f r
so we have something more like substitution than gluing.
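The substitution reading is easiest to see on a one-sided slice of Proxy: a made-up Producer-like type with only Respond (here Yield) and Pure (here Done). Bind walks past every yield and grafts the continuation onto the Done leaf:

```haskell
-- A Producer-like fragment of Proxy: just responds and a return value.
data Prod b r = Yield b (Prod b r) | Done r deriving (Eq, Show)

-- bindP walks past every Yield and replaces each Done r with f r,
-- exactly the shape of _bind above
bindP :: Prod b r -> (r -> Prod b r') -> Prod b r'
bindP (Yield b k) f = Yield b (bindP k f)
bindP (Done r)    f = f r

-- collect the yielded values and the final result
runProd :: Prod b r -> ([b], r)
runProd (Yield b k) = let (bs, r) = runProd k in (b : bs, r)
runProd (Done r)    = ([], r)
```

So bindP (Yield 1 (Done 2)) (\r -> Yield r (Done (r + 1))) yields 1, then 2, and returns 3: the continuation’s tree was substituted for the Done leaf.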
Now that Proxy
is a monad, we can make it a monad transformer!
instance MonadTrans (Proxy a' a b' b) where
    lift m = M (m >>= \r -> return (Pure r))
So we need to take an m a
and return Proxy a' a b' b m a
, we want to use M :: m (Proxy a' a b' b m a)
but we have an m a
, by fmap
ing Pure
we’re good to go.
From here on out it’s just a series of not-so-exciting MTL instances so we’ll skip those. There’s a couple interesting things left though! Before we get to them recall the monad transformer laws
lift . return = return
lift (m >>= f) = lift m >>= (lift . f)
In other words, lift
should “commute” with the two operations of the monad type class. This isn’t actually true by default with Proxy
, for example
return a = Pure a
lift (return a) = M (fmap Pure (return a)) = M (return (Pure a))
To solve this we have observe
. This function is supposed to normalize a Proxy
so that these laws hold.
observe :: Monad m => Proxy a' a b' b m r -> Proxy a' a b' b m r
observe p0 = M (go p0) where
    go p = case p of
        Request a' fa  -> return (Request a' (\a  -> observe (fa a )))
        Respond b  fb' -> return (Respond b  (\b' -> observe (fb' b')))
        M          m'  -> m' >>= go
        Pure    r      -> return (Pure r)
Note that go
takes a Proxy a' a b' b m r
and returns m (Proxy a' a b' b m r)
. By doing this, we can stick everything in m
with return
except for M m'
which we just unwrap and keep going. This means return (Pure a) = go (Pure a)
which is what is required for the monad transformer laws to hold.
Finally, the last thing in this file is X
which is used to represent the type for communication that cannot happen. So if we have a pipe at the beginning of the pipeline, it shouldn’t be able to ask for input from another pipe.
newtype X = X X
closed :: X > a
closed (X x) = closed x
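X here plays the same role as Void from Data.Void: since there are no (non-bottom) values of X, closed acts as an absurdity eliminator. For instance, Either X a carries no more information than a (fromRightX is an invented name for this sketch):

```haskell
-- X is uninhabited: the only way to build an X is to already have one.
newtype X = X X

-- eliminate the impossible by looping; this can never actually run
closed :: X -> a
closed (X x) = closed x

-- the Left branch is impossible, so we can dismiss it with closed
fromRightX :: Either X a -> a
fromRightX (Left x)  = closed x
fromRightX (Right a) = a
```

This is exactly how runEffect (coming up in Pipes.Core) dismisses the Request and Respond cases.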
And there are no non-bottom expressions which occupy this type so we’re good. Now that we’ve seen the internal implementation of most of Proxy
we can go look at the infrastructure pipes provides around this. Again going by the names, now that we’ve covered the internals it makes sense to move onto Pipes.Core
.
Pipes.Core
seems much closer to the actual user interface of the library, we can see that it exports a bunch of familiar sounding names:
module Pipes.Core (
Proxy
, runEffect
, respond
, (/>/)
, (//>)
, request
, (\>\)
, (>\\)
, push
, (>~>)
, (>>~)
, pull
, (>+>)
, (+>>)
, reflect
, X
, Effect
, Producer
, Pipe
, Consumer
, Client
, Server
, Effect'
, Producer'
, Consumer'
, Client'
, Server'
, (\<\)
, (/</)
, (<~<)
, (~<<)
, (<+<)
, (<\\)
, (//<)
, (<<+)
, closed
) where
Now a few of those we’ve seen before, namely Proxy
, X
, and closed
. Notice that Proxy
is exported abstractly here so we can’t write code which violates the monad transformer laws using this module.
The first new function is called runEffect
, but it has the type
Monad m => Effect m r -> m r
Which sounds great! I however have no clue what an effect is so let’s dig around the type exports first. There are a few type synonyms here
type Effect = Proxy X () () X
type Producer b = Proxy X () () b
type Pipe a b = Proxy () a () b
type Consumer a = Proxy () a () X
type Client a' a = Proxy a' a () X
type Server b' b = Proxy X () b' b
type Effect' m r = forall x' x y' y . Proxy x' x y' y m r
type Producer' b m r = forall x' x . Proxy x' x () b m r
type Consumer' a m r = forall y' y . Proxy () a y' y m r
type Server' b' b m r = forall x' x . Proxy x' x b' b m r
type Client' a' a m r = forall y' y . Proxy a' a y' y m r
Even though this looks like a lot, about half of these are actually duplicates which just use -XRankNTypes
instead of explicitly using X
. An Effect
as seen above is Proxy X () () X
. I had to double check this but Proxy takes 6 type arguments; here they are in order:
a' is the type of things that we can send up a Request
a is the type of things which a request will return
b' is what we may be sent to respond to
b is what we may respond with
m is the underlying monad we may use for effects
r is the return value
So an Effect
can only request things if it can produce an X
, and it will get back a ()
from its requests, and it can only respond with an X
and will get back a ()
after responding. Since we can never produce an X
an Effect
can never request or respond.
Similarly, a Producer
can respond
to things with b
s, but it will only ever get back a ()
after a response and it can never request
something. A Consumer
is the dual, never responding but can request
, it can only hand the code responding a ()
though.
Also in there are Client
s and Server
s which seem to be like a Consumer
and a Producer
but that can actually send meaningful messages with a request
and receive something interesting with a respond
instead of just ()
.
Okay, with these type synonyms in mind let’s go look at some code! Since an Effect
can’t request or respond, it’s really equivalent to just some monadic action.
runEffect :: Monad m => Effect m r -> m r
runEffect = go
  where
    go p = case p of
        Request v _ -> closed v
        Respond v _ -> closed v
        M       m   -> m >>= go
        Pure    r   -> return r
This lets us write runEffect
which just uses the absurdity of producing a v :: X
in order to turn a Proxy
into an m
.
runEffect
is also the first function we’ve seen to actually escape the Proxy
monad as well! It lets us convert a self-contained pipeline into just an effect, which should mean it comes up basically everywhere, just like runStateT
.
Since the Proxy
monad is abstract, we need some functions to actually be able to respond to and request things. Thus we have respond
respond :: Monad m => a > Proxy x' x a' a m a'
respond a = Respond a Pure
This is actually pretty trivial, we have a constructor after all whose job it is to Respond
to things so we just use that with the a
we have as a response. Since we have no interesting continuation yet, but we need something of type a' > Proxy x' x a' a m a'
we just use Pure
. This should be very familiar to users of free monads (remember that Pure
= return
)!
Next is something interesting, we’ve seen a lot of ways of manipulating a pipe, but never actually a way of combining two pipes so that they interact, our next function does that.
(/>/)
    :: Monad m
    => (a -> Proxy x' x b' b m a')
    -> (b -> Proxy x' x c' c m b')
    -> (a -> Proxy x' x c' c m a')
(fa />/ fb) a = fa a //> fb
Here we have two arguments, both functions to pipelines and we return a pipeline as output. Notice here that the first Proxy
is something which is going to respond
with things of type b
and expect something of type b'
in return and our second function is going to map b
s to a Proxy
which returns a b'
. This means we can replace each Respond
in the first with a call to the second function and pipe the output into our continuation for that Respond
. Indeed this matches up with the return type so I anticipate that is what shall happen. However, this function shells out to another right below it so we’ll have to look at it to confirm.
(//>)
    :: Monad m
    =>       Proxy x' x b' b m a'
    -> (b -> Proxy x' x c' c m b')
    ->       Proxy x' x c' c m a'
p0 //> fb = go p0
  where
    go p = case p of
        Request x' fx  -> Request x' (\x -> go (fx x))
        Respond b  fb' -> fb b >>= \b' -> go (fb' b')
        M          m   -> M (m >>= \p' -> return (go p'))
        Pure    a      -> Pure a
The interesting line here is Respond b fb' -> ...
which does exactly what I thought it ought to (I feel clever). In that line we run the function we have in the second argument with the data the first argument was Respond
ing with. We sort of “intercept” a message intended for downstream and just handle it right there. Since we do this for all things Respond
ing with b
s we now only respond with c
s, hence the change in type. It doesn’t affect the upstream type, but we can now take something producing values and transform them to instead run some other computation (perhaps producing something else).
In a limited case we can do something like
intercept :: Monad m
          => (b -> c)
          -> Proxy a' a b' b m r
          -> Proxy a' a b' c m r
intercept f p = p //> respond . f
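We can check this intuition for //> on a made-up Producer-like fragment of Proxy (forP, andThen, and Prod are invented names for this sketch): every yielded value is replaced by a run of the handler, whose yields take its place.

```haskell
-- A Producer-like fragment of Proxy: yields and a return value.
data Prod b r = Yield b (Prod b r) | Done r deriving (Eq, Show)

-- sequence two Prods: run the first, then feed its result to the second
andThen :: Prod c a -> (a -> Prod c r) -> Prod c r
andThen (Yield c k) f = Yield c (andThen k f)
andThen (Done a)    f = f a

-- the analogue of //>: replace each Yield b with a run of the handler
forP :: Prod b r -> (b -> Prod c ()) -> Prod c r
forP (Yield b k) f = f b `andThen` \() -> forP k f
forP (Done r)    _ = Done r

-- collect the yielded values and the final result
runProd :: Prod b r -> ([b], r)
runProd (Yield b k) = let (bs, r) = runProd k in (b : bs, r)
runProd (Done r)    = ([], r)
```

With a handler like \b -> Yield (b * 10) (Done ()) this behaves just like the intercept above: the b’s disappear and the handler’s c’s come out in their place.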
Cool! Now up next seems to be the dual of what we’ve just looked at.
request :: Monad m => a' -> Proxy a' a y' y m a
request a' = Request a' Pure
This is just what we had with respond
but using Request
instead. Similarly we have a counterpart for />/
. It again shells out to a similar, pointful, function >\\
(\>\)
    :: Monad m
    => (b' -> Proxy a' a y' y m b)
    -> (c' -> Proxy b' b y' y m c)
    -> (c' -> Proxy a' a y' y m c)
(fb' \>\ fc') c' = fb' >\\ fc' c'
(>\\)
    :: Monad m
    => (b' -> Proxy a' a y' y m b)
    ->        Proxy b' b y' y m c
    ->        Proxy a' a y' y m c
fb' >\\ p0 = go p0
  where
    go p = case p of
        Request b' fb  -> fb' b' >>= \b -> go (fb b)
        Respond x  fx' -> Respond x (\x' -> go (fx' x'))
        M          m   -> M (m >>= \p' -> return (go p'))
        Pure    a      -> Pure a
I’d expect that this function does sort of what the other did before. It’ll take Request
s and “answer” them inline by replacing each one with a call to the other function. In fact, when you think about it, what the hell is the difference between a request and a response? They’re completely symmetric! They both transmit information, sending one type in one direction and one type in the other. So we should have exactly the same code that just happens to use Request
instead of Respond
, which is indeed what we have.
The only real difference here is in the argument order which hints at the fact that we’re going to break symmetry sooner or later, it just hasn’t happened yet.
Next up is
push :: Monad m => a -> Proxy a' a a' a m r
push = go
  where
    go a = Respond a (\a' -> Request a' go)
push
takes a seed a
and chucks it down the pipeline. Once it gets a response, it throws it up the pipeline with Request
and when it gets a response (something of type a
) it starts the whole process over again. Now the process starts by sending values down, there’s no reason why we can’t do the reverse and start by asking for a value
pull :: Monad m => a' -> Proxy a' a a' a m r
pull = go
  where
    go a' = Request a' (\a -> Respond a go)
which conveniently is right near by. Now push
and pull
each give rise to a form of composition which takes two Proxy
s and glues them together. The first is
(>~>)
    :: Monad m
    => (_a -> Proxy a' a b' b m r)
    -> ( b -> Proxy b' b c' c m r)
    -> (_a -> Proxy a' a c' c m r)
This takes two Proxy
s which can communicate with each other and gives back a Proxy
which has internalized this dialogue. This shells out to the pointful version, >>~
(>>~)
    :: Monad m
    =>       Proxy a' a b' b m r
    -> (b -> Proxy b' b c' c m r)
    ->       Proxy a' a c' c m r
p >>~ fb = case p of
    Request a' fa  -> Request a' (\a -> fa a >>~ fb)
    Respond b  fb' -> fb' +>> fb b
    M          m   -> M (m >>= \p' -> return (p' >>~ fb))
    Pure    r      -> Pure r
For this code we walk down the tree and recurse in all cases except where we have a Respond
. This should send some information to that function we got as an argument and then use that response to continue, so we want some way of taking a Proxy b' b c' c m r
and a b' -> Proxy a' a b' b m r
and giving back a Proxy a' a c' c m r
. This looks like the exact dual to >>~
and indeed is the equivalent in the pull
version.
(+>>)
    :: Monad m
    => (b' -> Proxy a' a b' b m r)
    ->        Proxy b' b c' c m r
    ->        Proxy a' a c' c m r
fb' +>> p = case p of
    Request b' fb  -> fb' b' >>~ fb
    Respond c  fc' -> Respond c (\c' -> fb' +>> fc' c')
    M          m   -> M (m >>= \p' -> return (fb' +>> p'))
    Pure    r      -> Pure r
This does the exact opposite of >>~
. It walks around recursing until we get to a Request
, this should transfer control up to that function b' -> Proxy ...
and it does by calling >>~
. So these two operators +>>
and >>~
work together to join up two Proxy
functions by using one to answer the other’s Request
and Respond
s. The symmetry breaking here is who should we inspect “first” so to speak. If we start with the upstream one then the second one is only run when a value is push
ed down to it, and if we start with the downstream one we only run the upstream version when we pull
something from it. Nifty.
One thing to note, what happens when one of these Proxy
s give up and return
? This potential situation is reflected in the fact that both of these Proxy
s must return an r
. Therefore, whenever one of these returns and we’re currently running it (the upstream for >>~
, downstream for +>>
) we can just return the value and be done with the whole thing. In this sense composing a Proxy
has this short circuiting property, at any point in the pipeline you can just give up and return
something!
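This dance between the two operators, including the short-circuiting, can be sketched with a simplified one-directional model (my own toy, not pipes: requests carry no data, and both sides share a return type):

```haskell
-- A simplified Proxy: await values of type a, yield values of type b.
data P a b r
  = Request (a -> P a b r)   -- wait for an upstream value
  | Respond b (P a b r)      -- send a value downstream
  | Pure r                   -- give up and return

-- downstream-biased composition, in the style of +>>
compose :: P x a r -> P a b r -> P x b r
compose up (Request f)   = upstream up f
compose up (Respond b k) = Respond b (compose up k)
compose _  (Pure r)      = Pure r   -- downstream gave up: short circuit

-- run the upstream until it responds, then hand the value downstream
upstream :: P x a r -> (a -> P a b r) -> P x b r
upstream (Request g)   f = Request (\x -> upstream (g x) f)
upstream (Respond a k) f = compose k (f a)
upstream (Pure r)      _ = Pure r   -- upstream gave up: short circuit

-- extract the result of a finished pipeline
result :: P a b r -> Maybe r
result (Pure r) = Just r
result _        = Nothing
```

With a producer that yields two numbers and a consumer that requests two and sums them, the composite returns the sum; with a consumer that asks for a third value, the composite returns whatever the exhausted producer returned, showing the short-circuit.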
Remember before how I was ranting about how Request
and Respond
were really the same damn thing, it turns out I’m not the only one who thought that
reflect :: Monad m => Proxy a' a b' b m r -> Proxy b b' a a' m r
reflect = go
  where
    go p = case p of
        Request a' fa  -> Respond a' (\a  -> go (fa a ))
        Respond b  fb' -> Request b  (\b' -> go (fb' b'))
        M          m   -> M (m >>= \p' -> return (go p'))
        Pure    r      -> Pure r
Looking at the type here is really telling, all we do to switch the upstream and downstream ends is swap the constructors Request
and Respond
! That actually wraps up the core of pipes, the rest is just a bunch of synonyms with the arguments flipped!
Now that we’ve finished up Pipes.Core
it’s not clear where to go so I decided to go look at the top level Pipes
module since between the .Internal
and .Core
modules we should have covered a lot of it. It turns out the top level only imports those two modules so we can now go through that!
Really the top level package Pipes
just re-exports some stuff and defines some thin layers over the rest
module Pipes (
Proxy
, X
, Effect
, Effect'
, runEffect
, Producer
, Producer'
, yield
, for
, (~>)
, (<~)
, Consumer
, Consumer'
, await
, (>~)
, (~<)
, Pipe
, cat
, (>->)
, (<-<)
, ListT(..)
, runListT
, Enumerable(..)
, next
, each
, every
, discard
, module Control.Monad
, module Control.Monad.IO.Class
, module Control.Monad.Trans.Class
, module Control.Monad.Morph
, module Data.Foldable
)
Now what haven’t we seen, the first thing is this yield
construct which turns out to be a snazzier name for respond
with a nicer type.
yield :: Monad m => a -> Producer' a m ()
yield = respond
Similarly, for
is just a synonym for (//>)
(first joiner we went through) and ~>
is the point free version. On the other end we have stuff overlaying request
and friends but they’re not quite symmetric
await :: Monad m => Consumer' a m a
await = request ()
(>~)
    :: Monad m
    => Proxy a' a y' y m b
    -> Proxy () b y' y m c
    -> Proxy a' a y' y m c
p1 >~ p2 = (\() -> p1) >\\ p2
So we need to cope with the fact that request
can actually transfer interesting data down as well as up; in the basic case, though, we just assume that we’re dealing with ()
s. Also note that >~
is biased to the downstream Proxy
, it starts by running it and whenever we actually request something (by sending up a ()
) we run p1
. This function lets us compose Proxy
s directly, not just functions to Proxy
s, so that’s one nice effect.
Finally, we see our first example of a pipe
cat :: Monad m => Pipe a a m r
cat = pull ()
cat
works by requesting something upstream immediately and passing it downstream. Nothing interesting except that it combines great with other Proxy
s. Say for example we have a random number generator, we can easily create a Proxy
producing random numbers with
randoms = lift getRandomNumber >~ cat
we use >~
to replace each request
in cat
with a call to getRandomNumber
which will be immediately pushed downstream. Similarly, we can use cat
to push everything into some computation. If we want to debug a pipe by just printing everything we could say
printAll = for cat (lift . print)
So cat
is a nice way of lifting something to work across Proxy
s of values if nothing else.
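Putting the pieces we have so far together, here is a minimal sketch (assuming the pipes package is installed) of each, for, and lift cooperating: each turns a list into a Producer, and for substitutes an effect in for every yield.

```haskell
import Pipes

-- Enumerate a list as a Producer and print each element it yields.
main :: IO ()
main = runEffect $ for (each [1 .. 3 :: Int]) (lift . print)
```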
Next is the common case of composing two Proxy
s,
(>->)
    :: Monad m
    => Proxy a' a () b m r
    -> Proxy () b c' c m r
    -> Proxy a' a c' c m r
p1 >-> p2 = (\() -> p1) +>> p2
>->
makes it easy to join up two Proxy
s that don’t send any interesting data “up” with requests. >->
starts by running p2
using +>>
and whenever p2
requests something it goes and runs p1
for a while. This function lets us connect a Pipe
to Pipe
or Producer
to Consumer
for example.
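A quick sketch of such a connection (assuming the pipes package; P.map and P.print live in Pipes.Prelude, which we haven’t covered here):

```haskell
import Pipes
import qualified Pipes.Prelude as P

-- Producer >-> Pipe >-> Consumer: double each number and print it.
main :: IO ()
main = runEffect $ each [1 .. 5 :: Int] >-> P.map (* 2) >-> P.print
</imports>
```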
Finally, we wrap up this module with the definition of ListT
inside it. Using Producer
we can define a non-broken version of ListT
newtype ListT m a = Select { enumerate :: Producer a m () }

instance (Monad m) => Functor (ListT m) where
    fmap f p = Select (for (enumerate p) (\a -> yield (f a)))

instance (Monad m) => Applicative (ListT m) where
    pure a = Select (yield a)
    mf <*> mx = Select (
        for (enumerate mf) (\f ->
        for (enumerate mx) (\x ->
        yield (f x) ) ) )

instance (Monad m) => Monad (ListT m) where
    return a = Select (yield a)
    m >>= f  = Select (for (enumerate m) (\a -> enumerate (f a)))
    fail _   = mzero
What’s kinda nifty here is we just use a Producer
returning a ()
to represent our list. Here we can use for
to access every yield x
which corresponds to our “list” having an entry x
! From there this is really just the standard set of instances for a list! In particular >>=
is concatMap
for producers. That about wraps up this module and I’ll end the blog post with it.
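To see this ListT in action, here is a small sketch (assuming the pipes package, which exports Select and enumerate): the do-block enumerates every combination, just like the ordinary list monad would.

```haskell
import Pipes

-- Each bind enumerates a Producer; the result yields all pairs.
pairs :: ListT IO (Int, Char)
pairs = do
  x <- Select (each [1, 2])
  c <- Select (each "ab")
  return (x, c)

main :: IO ()
main = runEffect $ for (enumerate pairs) (lift . print)
```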
I didn’t actually go through all of pipes
here, just the “core operations” which everything else is built on top of. In particular, I urge you to go read how Pipes.Prelude
is implemented. Just like implementing the Haskell prelude is a good exercise the same is true of pipes.
It turned out that pipes
isn’t all that awful on the inside; it’s a library built around a specific free-monad-like structure with a couple of different methods of joining two computations together. In particular there were a few different notions of composition which really define pipes:

- sequencing with >>=
- substituting each Respond with another function using //> (or for in non-infix speak)
- substituting each Request with another function using >\\
- point-ful composition with +>> and >>~
Hope you learned as much as I did, cheers.
I’m a fan of articles like this one which set out to explain a really complicated subject in 600 words or less. I wanted to write one with a similar goal for compiling a language like Haskell. To help with this I’ve broken down what most compilers for a lazy language do into 5 different phases and spent 200 words explaining how they work. This isn’t really intended to be a tutorial on how to implement a compiler; I just want to make it less magical.
I assume that you know how a lazy functional language looks (this isn’t a tutorial on Haskell) and a little about how your machine works since I make a few references to how some lower level details are compiled. These will make more sense if you know such things, but they’re not necessary.
And the word-count clock starts… now.
Our interactions with compilers usually involve treating them as a huge function from string to string. We give them a string (our program) and it gives us back a string (the compiled code). However, on the inside the compiler does all sorts of stuff to that string we gave it and most of those operations are inconvenient to do as string operations. In the first part of the compiler, we convert the string into an abstract syntax tree. This is a data structure in the compiler which represents the string, but in a form that’s much easier for the rest of the compiler to work with.
The process of going String -> AST is called “parsing”. It has a lot of (kinda stuffy IMO) theory behind it. This is the only part of the compiler where the syntax actually matters and is usually the smallest part of the compiler.
Now that we’ve constructed an abstract syntax tree we want to make sure that the program “makes sense”. Here “make sense” just means that the program’s types are correct. The process for checking that a program type checks involves following a bunch of rules of the form “A has type T if B has type T1 and C has type…”. All of these rules together constitute the type system for our language. As an example, in Haskell f a
has the type T2
if f
has the type T1 -> T2
and a
has the type T1
.
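That application rule reads naturally as code. Here is a self-contained sketch over a tiny, made-up type language (the names and types are invented for illustration):

```haskell
-- A two-constructor type language: f a has type T2 exactly when
-- f has type T1 -> T2 and a has type T1.
data Type = TInt | TArrow Type Type deriving (Eq, Show)

-- Check an application, given the already-known types of f and a.
checkApp :: Type -> Type -> Maybe Type
checkApp (TArrow t1 t2) targ
  | t1 == targ = Just t2
checkApp _ _ = Nothing

main :: IO ()
main = do
  print (checkApp (TArrow TInt TInt) TInt)  -- Just TInt
  print (checkApp TInt TInt)                -- Nothing: not a function
```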
There’s a small wrinkle in this story though: most languages require some type inference. This makes things 10x harder because we have to figure out the types of everything as we go! Type inference isn’t even possible in a lot of languages and some clever contortions are often needed to keep a language inferrable.
However, once we’ve done all of this the program is correct enough to compile. Past type checking, if the compiler raises an error it’s a compiler bug.
Now that we’re free of the constraints of having to report errors to the user things really get fun in the compiler. Now we start simplifying the language by converting a language feature into a mess of other, simpler language features. Sometimes we convert several features into specific instances of one more general feature. For example, we might convert our big fancy pattern language into a simpler one by elaborating each case
into a bunch of nested case
s.
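Here is a hedged, self-contained illustration of that elaboration (the function and its types are invented for the example):

```haskell
-- A fancy nested pattern...
describe :: Maybe (Either Int Bool) -> String
describe (Just (Left _))  = "an int"
describe (Just (Right _)) = "a bool"
describe Nothing          = "nothing"

-- ...and what a compiler might elaborate it into: only flat,
-- single-level case expressions remain.
describe' :: Maybe (Either Int Bool) -> String
describe' m = case m of
  Nothing -> "nothing"
  Just e  -> case e of
    Left _  -> "an int"
    Right _ -> "a bool"

main :: IO ()
main = print (map describe' [Nothing, Just (Left 1), Just (Right True)])
```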
Each time we remove a feature we end up with a slightly different language. This progression of languages in the compiler are called the “intermediate languages” (ILs). Each of these ILs have their own AST as well! In a good compiler we’ll have a lot of ILs as it makes the compiler much more maintainable.
An important part of choosing an IL is making it amenable to various optimizations. When the compiler is working with each IL it applies a set of optimizations to the program. For example, constant folding rewrites 1 + 1
to 2
during compile time.
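A constant-folding pass like the one just mentioned can be sketched in a few lines over a miniature, made-up expression IL:

```haskell
-- A tiny expression IL with literals, variables, and addition.
data Expr = Lit Int | Var String | Add Expr Expr deriving (Eq, Show)

-- Fold additions of two known literals down to a single literal.
fold :: Expr -> Expr
fold (Add l r) = case (fold l, fold r) of
  (Lit a, Lit b) -> Lit (a + b)  -- 1 + 1 becomes 2 at compile time
  (l', r')       -> Add l' r'
fold e = e

main :: IO ()
main = print (fold (Add (Lit 1) (Lit 1)))  -- Lit 2
```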
At some point in the compiler, we have to deal with the fact that we’re compiling a lazy language. One nice way is to use a spineless tagless graph machine (STG machine).
How an STG machine works is a little complicated but here’s the gist
During this portion of the compiler, we’d transform our last IL into a C-like language which actually works in terms of pushing, popping, and entering closures.
The key idea here that makes laziness work is that a closure defers work! It’s not a value, it’s a recipe for how to compute a value when we need it. Also note, all calls are tail calls since function calls are just a special case of entering a closure.
Another really beautiful idea in the STG machine is that closures evaluate themselves. This means closures present a uniform interface no matter what, all the details are hidden in that bundled up code. (I’m totally out of words to say this, but screw it it’s really cool).
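The self-evaluating-closure idea can be modeled in plain Haskell with IORefs. This is only a sketch of the concept, not what a real runtime literally does: entering a thunk runs whatever code is stored inside, and after the first entry the code overwrites itself with the finished value.

```haskell
import Data.IORef

-- A closure is a recipe for a value, not the value itself.
newtype Thunk a = Thunk (IORef (IO a))

delay :: IO a -> IO (Thunk a)
delay act = do
  ref <- newIORef (return undefined)
  writeIORef ref $ do
    v <- act
    writeIORef ref (return v)  -- self-update: later entries are free
    return v
  return (Thunk ref)

-- Entering a thunk just runs the code it currently holds.
enter :: Thunk a -> IO a
enter (Thunk ref) = readIORef ref >>= id

main :: IO ()
main = do
  t <- delay (putStrLn "working..." >> return (21 * 2))
  x <- enter t   -- prints "working..."
  y <- enter t   -- silent: the thunk already updated itself
  print (x + y)  -- 84
```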
Finally, after compiling to the STG machine we’re ready to output the target code. This bit is very dependent on what exactly we’re targeting.
If we’re targeting assembly, we have a few things to do. First, we have to switch from using variables to registers. This process is called register allocation and we basically slot each variable into an available register. If we run out, we store variables in memory and load them back in as we need them.
In addition to register allocation, we have to compile those C-like language constructs to assembly. This means converting procedures into a label and some instructions, pattern matches into something like a jump table, and so on. This is also where we’d apply low-level, bit-twiddling optimizations.
Okay, clock off.
Hopefully that was helpful even if you don’t care that much about lazy languages (most of these ideas apply in any compiler). In particular, I hope that you now believe me when I say that lazy languages aren’t magical. In fact, the worry of how to implement laziness only really came up in one section of the compiler!
Now I have a question for you dear reader, what should I elaborate on? With summer ahead, I’ll have some free time soon. Is there anything else that you would like to see written about? (Just not parsing please)