Code and Co

Runtime Tagging

Danny Gratzer — Mon, 12 Oct 2015 00:00:00 UT

Posted on October 12, 2015

Tags: sml

In this post I’d just like to walk through some fun code, nothing particularly theory-y. The code I’d like to go through is a simple little module in ML that lets you easily construct “dynamic” types. This isn’t through the usual “really big sum of products” approach but instead is completely open and can be extended for every newly defined type (at runtime).

The Basic Idea

The basic idea behind this trick hinges on how exceptions work in SML. Well, really it’s not about exceptions so much as what exceptions work with. In ML we can declare new exceptions like this

    exception Foo of tyarg

and this gives us a new exception constructor Foo and we can raise and handle it like you would expect

    (raise (Foo 1)) handle Foo x => x

But what’s particularly interesting is that Foo actually has a type. Really it’s just a constructor for a special type exn. This means we can do things like pass around exception constructors, apply them, etc, etc.
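Here is a small sketch of what “Foo is just a constructor for exn” buys us. The names mk, es and payload are made up for illustration, and Foo is given an int argument purely to keep the example concrete.

```sml
exception Foo of int

(* Foo is an ordinary function from int to exn, so it can be bound,
   passed around, and mapped over a list like any other function. *)
val mk : int -> exn = Foo
val es : exn list = List.map Foo [1, 2, 3]

(* Values of type exn can be inspected by pattern matching without
   ever being raised. *)
fun payload (e : exn) : int option =
  case e of
    Foo n => SOME n
  | _     => NONE

val first = payload (hd es)   (* SOME 1 *)
val other = payload Div       (* NONE: Div is a different constructor *)
```

Nothing here raises anything; exn is being used purely as a data type.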

exn is what we might call an extensible data type: we can extend it arbitrarily. We could imagine allowing users to define their own such types but in SML we’ve just got the one. The reason we even have this one is because it’s a great choice if you can only allow one type to be raised and handled.

What we’re going to do is use the fact that we can generate new extensions to exn at run time to create an exn based structure providing a way to implement “tags”. Once we have these tags we’ll be able to implement a pair of functions

    val tag   : 'a tag -> 'a -> dynamic
    val untag : 'a tag -> dynamic -> 'a option

So tags let us “forget” the type of some expression and treat it as some dynamic blob to be recovered at some time in the future. Concretely, we’d like to implement this signature

    signature TAG =
    sig
      type dynamic
      type 'a tag

      val new    : unit -> 'a tag
      val tag    : 'a tag -> 'a -> dynamic
      val untag  : 'a tag -> dynamic -> 'a option
    end

The Implementation

So let’s start implementing the thing. First we need to decide what the type dynamic should be. I propose that it should be exn. The reason being that we can always extend exn in various ways so if we implement things with dynamic = exn we’ll have the ability to make dynamic “grow a new branch” to accommodate whatever we’re working with.

    structure Tag :> TAG =
    struct
      type dynamic = exn
    end

Ok, so what should tag be? Well it’s going to be type indexed obviously so that we can even talk about the signatures of (un)tag, but more importantly its purpose should be to tell us how to package something up into an exn so we can get it back out. The downside of this whole extensible data type thing is that if we forget about the constructor we used to make an exn it’s just lost forever! A tag will make sure that once we make a constructor to use with dynamic we won’t find ourselves with a dynamic and no way to inspect it.

The best way I can think of for doing this is to just bake the (un)tag operations straight into the implementation of the type.

    structure Tag :> TAG =
    struct
      type dynamic = exn
      type 'a tag  = {into : 'a -> exn, out : exn -> 'a option}
    end

Now this makes it look like tags could perform arbitrary operations in the process of tagging and untagging, but really we’re going to implement it so it’s all very simple and efficient.

In particular, we’re now in a position to define our three core operators

    structure Tag :> TAG =
    struct
      type dynamic = exn
      type 'a tag  = {into : 'a -> exn, out : exn -> 'a option}

      fun new () : 'a tag =
        let
          exception Fresh of 'a
        in
          { into = Fresh
          , out = fn e =>
              case e of Fresh a => SOME a | _ => NONE
          }
        end

      fun tag {into, out} = into
      fun untag {into, out} = out
    end

Now tag and untag are pretty simple because we basically implemented them up in new so let’s look carefully at that. We start by first minting a new constructor for exn. We know that this will not clash with any other exception in existence, no one else can raise it or handle it unless we explicitly give them this constructor. Now while we have access to it, we bundle the constructor into the tag record we’re making.

into is quite easy to implement because it’s just constructor application. out is also straightforward: all we do is pattern match to see if we’ve been given something made with our Fresh constructor and return the included a if we have. Handling everything else with the wildcard case is important, otherwise this would explode horribly every time we failed to untag something.
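To see that each call to new really mints its own constructor, here is a self-contained sketch that repeats the structure from above and then makes two tags at the same type. The names t1, t2, d, hit and miss are only for illustration.

```sml
signature TAG =
sig
  type dynamic
  type 'a tag
  val new   : unit -> 'a tag
  val tag   : 'a tag -> 'a -> dynamic
  val untag : 'a tag -> dynamic -> 'a option
end

structure Tag :> TAG =
struct
  type dynamic = exn
  type 'a tag  = {into : 'a -> exn, out : exn -> 'a option}

  fun new () : 'a tag =
    let
      exception Fresh of 'a
    in
      { into = Fresh
      , out = fn e => case e of Fresh a => SOME a | _ => NONE
      }
    end

  fun tag ({into, out} : 'a tag) = into
  fun untag ({into, out} : 'a tag) = out
end

(* Two tags at the same type: each call to new evaluates the exception
   declaration again and so gets its own constructor. *)
val t1 : int Tag.tag = Tag.new ()
val t2 : int Tag.tag = Tag.new ()

val d = Tag.tag t1 42

val hit  = Tag.untag t1 d   (* SOME 42 *)
val miss = Tag.untag t2 d   (* NONE: t2's Fresh is a different constructor *)
```

So a tag behaves more like a capability than a type name: sharing the type int is not enough to open d, you need the very tag that sealed it.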

And there’s a nice way of implementing the same sort of run time typing you get in dynamic languages in SML. One nice advantage of this over the usual

    datatype dynamic = INT of int | STRING of string | ...

approach is we can always extend our dynamic with user defined types. So we can do something like

    datatype foo = Foo of int
    val fooTag = Tag.new () : foo Tag.tag
    val d = Tag.tag fooTag (Foo 2)
    val SOME (Foo 2) = Tag.untag fooTag d

Wrap Up

There you go, this is just a very short post on a very short piece of code that lets us do something fun. Some nice things you can do now:

Use the inherently recursive nature of dynamic to write an infinite loop without direct recursion
Do the same thing, but without using exn and using the generative effect of allocating a reference instead
etc
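For the curious, here is one possible sketch of the reference-based variant. It is an assumption about what such a solution could look like rather than a canonical answer: the structure name RefTag and the field names store and clear are invented here. A dynamic is a pair of thunks that write to, and then empty, the private cell belonging to the tag that created it.

```sml
signature TAG =
sig
  type dynamic
  type 'a tag
  val new   : unit -> 'a tag
  val tag   : 'a tag -> 'a -> dynamic
  val untag : 'a tag -> dynamic -> 'a option
end

structure RefTag :> TAG =
struct
  (* store puts the hidden value into its own tag's cell,
     clear empties that cell again. *)
  type dynamic = {store : unit -> unit, clear : unit -> unit}
  type 'a tag  = {into : 'a -> dynamic, out : dynamic -> 'a option}

  fun new () : 'a tag =
    let
      (* Allocating a ref is generative: every call gets a new cell. *)
      val cell : 'a option ref = ref NONE
    in
      { into = fn a => { store = fn () => cell := SOME a
                       , clear = fn () => cell := NONE }
      , out = fn {store, clear} =>
          let
            val () = store ()      (* writes to the creating tag's cell *)
            val result = !cell     (* SOME only if that cell is ours *)
            val () = clear ()
          in
            result
          end
      }
    end

  fun tag ({into, out} : 'a tag) = into
  fun untag ({into, out} : 'a tag) = out
end

val t : string RefTag.tag = RefTag.new ()
val u : string RefTag.tag = RefTag.new ()
val d = RefTag.tag t "hello"

val hit  = RefTag.untag t d   (* SOME "hello" *)
val miss = RefTag.untag u d   (* NONE: store wrote to t's cell, not u's *)
```

The trade-off compared with the exn version is that untagging is now a side-effecting operation, so this sketch is not safe to use from multiple threads without extra care.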

Two Different Flavors of Type Theory

Danny Gratzer — Sun, 27 Sep 2015 00:00:00 UT

Posted on September 27, 2015

Tags: types

So summer seems to be about over. I’m very happy with mine, I learned quite a lot. In particular over the last few months I’ve been reading and fiddling with a different kind of type theory than I was used to: computational type theory. This is the type theory that underlies Nuprl (or JonPRL cough cough).

One thing that stood out to me was that you could do all these absolutely crazy things in this system that seemed impossible after 3 years of Coq and Agda. In this post I’d like to sketch some of the philosophical differences between CTT and a type theory more in the spirit of CiC.

Formal Type Theory and Props-as-Types #1

First things first, let’s go over the more familiar notion of type theory. To develop one of these type theories you start by discussing some syntax. You lay out the syntax for some types and some terms

A ::= Σ x : A. A | Π x : A. A | ⊤ | ⊥ | ...
M ::= M M | λ x : A. M | <M, M> | π₁ M | ⋆ | ...

And now we want to describe the all important M : A relation. This tells us that some term has some type. It’s inductively defined from a finite set of inferences. Ideally, it’s even decidable for philosophical reasons I’ve never cared too much about. In fact, it’s this relation that really governs our whole type theory, everything else is going to stem from this.

As an afterthought, we may decide that we want to identify certain terms with other terms; this is called definitional equality. It’s another inductively defined (and decidable) judgment M ≡ N : A. Two quick things to note here:

Definitional equality is completely arbitrary; it exists in the way it does because we defined it that way and for no other reason
The complexity of proving M ≡ N : A is independent of the complexity of A

The last point is of some concern because it means that equality for functions is never going to be right for what we want. We have this uniformly complex judgment M ≡ N : A but when A = Π x : B. C the complexity should be greater and dependent on the complexity of B and C. That’s how it works in math after all: equality at functions is defined pointwise, something we can’t really do here if ≡ is to be decidable or just be of the same complexity no matter the type.

Now we can do lots of things with our theory. One thing we almost always want to do is now go back and build an operational semantics for our terms. This operational semantics should be some judgment M ↦ M' with the property that M ↦ N will imply that M ≡ N. This gives us some computational flavor in our type theory and lets us run the pieces of syntax we carved out with M : A.

But these terms that we’ve written down aren’t really programs. They’re just serializations of the collections of rules we’ve applied to prove a proposition. There’s no ingrained notion of “running” an M since it’s built on after the fact. What we have instead is this ≡ relation which just specifies which symbols we consider equivalent, but even it was defined arbitrarily. There’s no reason ≡ needs to be a reasonable term rewriting system or anything. If we’re good at our jobs it will be; sometimes (HoTT) it’s not completely clear what that computation system is even though we’re working to find it. So I’d describe a (good) formal type theory as an axiomatic system like any other that we can add a computational flavor to.

This leads to the first interpretation of the props-as-types correspondence. This states that the inductively defined judgments of a logic give rise to a type theory whose terms are proof terms for those same inductively defined judgments. It’s an identification of similar looking syntactic systems. It’s useful to be sure if you want to develop a formal type theory, but it gives us less insight into the computational nature of a logic because we’ve reflected into a type theory which we have no reason to suspect has a reasonable computational characterization.

Behavioural/Computational Type Theory and Props-as-Types #2

Now we can look at a second flavor of type theory. In this setting the way we order our system is very different. We start with a programming language, a collection of terms and an untyped evaluation relation between them. We don’t necessarily care about all of what’s in the language. As we define types later we’ll say things like “Well, the system has to include at least X” but we don’t need to exhaustively specify all of the system. It follows that we have actually no clue when defining the type theory how things compute. They just compute somehow. We don’t really even want the system to be strongly normalizing, it’s perfectly valid to take the lambda calculus or Perl (PerlPRL!).

So we have some terms and ↦. On top of this we start by defining a notion of equality between terms. This equality is purely computational and has no notion of types yet (unlike M ≡ N : A) because we have no types yet. This equality is sometimes denoted ~. We usually define it as M ~ N if and only if M ↦ O(Ms) exactly when N ↦ O(Ns), and, if they terminate, then Ms ~ Ns. By this I mean that two terms are the same if they compute in the same way, either by diverging or running to the same value built from ~ equal components. For more on this, you could read Howe’s paper.

So now we still have a type theory with no types. To fix this we go off and define inferences to answer three questions about each type.

What other values denote types equal to it? (A = B)
What values are in the type? (a ∈ A)
What values are considered equal at that type? (a = b ∈ A)

The first question is usually answered in a boring way, for instance, we would say that Π x : A. B = Π x : A'. B' if we know that A = A' and B = B' under the assumption that we have some x ∈ A. We then specify two and three. There we just give the rules for demonstrating that some value, which is a program existing entirely independently of the type we’re building, is in the type. Continuing with functions, we might state that

  e x ∈ B (x ∈ A)
———————————————————
  e ∈ Π x : A. B

Here I’m using _ (_) as syntax for a hypothetical judgment: we have to know that e x ∈ B under the assumption that we know that x ∈ A. Next we have to decide what it means for two values to be equal as functions. We’re going to do this behaviourally, by specifying that they behave as equal programs when used as functions. Since we use functions by applying them all we have to do is specify that they behave equally on application

 v x = v' x ∈ B (x ∈ A)
————————————————————————
  v = v' ∈ Π x : A. B
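As a loose illustration of “equal on application”, here are two syntactically different SML functions that agree on every argument we try. The names double, twice, samples and agree are invented for this sketch, and a finite spot-check like this is of course not a proof: the rule above quantifies over every x ∈ A, which is exactly why this equality is not decidable in general.

```sml
(* Different programs, same behaviour at int -> int. *)
fun double (x : int) : int = x + x
fun twice  (x : int) : int = 2 * x

(* We can only ever test finitely many points from inside the language. *)
val samples = [0, 1, 2, 3, 10, ~7]
val agree = List.all (fn n => double n = twice n) samples
```

No judgment that only compares the syntax of double and twice will identify them, but the behavioural rule does.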

Equality is determined on a per type basis. Furthermore, it’s allowed to use the equality of smaller types in its definition. This means that when defining equality for Π x : A. B we get to use the equalities for A and B! We make no attempt to maintain either decidability or uniform complexity in the collections of terms specified by _ = _ ∈ _ as we did with ≡. As another example, let’s have a look at the equality type.

 A = A'  a = a' ∈ A  b = b' ∈ A
 ————————————————————————————————
    I(a; b; A) = I(a'; b'; A')


   a = b ∈ A
 ——————————————
 ⋆ ∈ I(a; b; A)

     a = b ∈ A
 ——————————————————
 ⋆ = ⋆ ∈ I(a; b; A)

Things to notice here, first off the various rules depend on the rules governing membership and equality in A as we should expect. Secondly, ⋆ (the canonical occupant of I(...)) has no type information. There’s no way to reconstruct whatever reasoning went into proving a = b ∈ A because there’s no computational content in it. The thing on the left of the ∈ only describes the portions of our proof that involve computation and equalities in computational type theory are always computationally trivial. Therefore, they get the same witness no matter the proof, no matter the types involved. Finally, the infamous equality reflection rule is really just the principle of inversion that we’re allowed to use in reasoning about hypothetical judgments.

This leads us to the second cast of props-as-types. This one states that constructive proof has computational character. Every proof that we write in a logic like this gives us back an (untyped) program which we can run as appropriate for the theorem we’ve proven. This is the idea behind Kleene’s realizability model. Similar to what we’d do with a logical relation we define what each type means by defining the class of appropriate programs that fit its specification. For example, we defined functions to be the class of things that apply and proofs of equality are ⋆ when the equality is true and there are no proofs when it’s false. Another way of phrasing this correspondence is types-as-specs. Types are used to identify a collection of terms that may be used in some particular way instead of merely specifying the syntax of their terms. To read a bit more about this, see Stuart Allen’s and Bob Harper’s work on the subject, which does a good job of explaining how this plays out for type theory.

Building Proof Assistants

A lot of the ways we actually interact with type theories is not on the blackboard but through some proof assistant which mechanizes the tedious aspects of using a type theory. For formal type theory this is particularly natural. It’s decidable whether M : A holds so the user just writes a term and says “Hey this is a proof of A” and the computer can take care of all the work of checking it. This is the basic experience we get with Coq, Agda, Idris, and others. Even ≡ is handled without us thinking about it.

With computational type theory life is a little sadder. We can’t just write terms like we would for a formal type theory because M ∈ A isn’t decidable! We need to help guide the computer through the process of validating that our term is well typed. This is the price we pay for having an exceptionally rich notion of M = N ∈ A and M ∈ A, there isn’t a snowball’s chance in hell of it being decidable ¹. To make this work we switch gears and instead of trying to construct terms we start working with what’s called a program refinement logic, a PRL. A PRL is basically a sequent calculus with a central judgment of

H ≫ A ◁ e

This is going to be set up so that H ⊢ e ∈ A holds, but there’s a crucial difference. With ∈ everything was an input. To mechanize it we would write a function accepting a context and two terms and checking whether one is a member of the other. With H ≫ A ◁ e only H and A are inputs, e should be thought of as an output. What we’ll do with this judgment is work with a tactic language to construct a derivation of H ≫ A without even really thinking about that ◁ e and the system will use our proof to construct the term for us. So in Agda when I want to write a sorting function what I might do is say

    sort : List Nat → List Nat
    sort xs = ...

I just give the definition and Agda is going to do the grunt work to make sure that I don’t apply a nat to a string or something equally nutty. In a system like (Jon|Nu|Meta|λ)prl what we do instead is define the type that our sorting function ought to have and use tactics to prove the existence of a realizer for it. By default we don’t really specify what exactly that realizer is. For example, if I were writing JonPRL maybe I’d say

    ||| Somehow this says a list of nats is a sorted version of another
    Operator is-sorting : (0; 0).

    Theorem sort : [(xs : List Nat) {ys : List Nat | is-sorting(ys; xs)}] {
      ||| Tactics go here.
    }

I specify a sufficiently strong type so that if I can construct a realizer for it then I clearly have constructed a sorting algorithm. Of course we have tactics which let us say things like “I want to use this realizer” and then we have to go off and show that the candidate realizer is a valid realizer. In that situation we’re actually acting as a type checker, constructing a derivation implying e ∈ A.
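To make the “H and A are inputs, e is an output” idea concrete, here is a toy refiner for a logic with nothing but implication. Everything in it (prop, term, tactic, refine) is a hypothetical miniature written for this post’s purposes and is not how JonPRL or Nuprl are implemented. The point is only the shape of refine: it is handed a sequent and a proof script and gives back the extracted program.

```sml
(* Propositions of the toy logic and the untyped programs realizing them. *)
datatype prop = Atom of string | Imp of prop * prop
datatype term = Var of string | Lam of string * term

(* A sequent H >> A with named hypotheses. *)
type sequent = (string * prop) list * prop

(* A proof script. *)
datatype tactic
  = Intro of string * tactic   (* H >> A -> B  becomes  H, x : A >> B *)
  | Hyp of string              (* close the goal using hypothesis x *)

(* refine takes H and A as inputs and returns the extract e as output.
   NONE means the script does not prove the goal. *)
fun refine ((hyps, goal) : sequent) (t : tactic) : term option =
  case (t, goal) of
    (Intro (x, rest), Imp (a, b)) =>
      Option.map (fn body => Lam (x, body))
                 (refine ((x, a) :: hyps, b) rest)
  | (Hyp x, _) =>
      (case List.find (fn (y, _) => y = x) hyps of
         SOME (_, a) => if a = goal then SOME (Var x) else NONE
       | NONE => NONE)
  | _ => NONE

(* Proving A -> B -> A never mentions a lambda term, yet we get one back. *)
val k = refine ([], Imp (Atom "A", Imp (Atom "B", Atom "A")))
               (Intro ("x", Intro ("y", Hyp "x")))
(* k = SOME (Lam ("x", Lam ("y", Var "x"))) *)

(* A script that does not fit the goal produces no extract. *)
val bad = refine ([], Atom "A") (Hyp "x")
```

Contrast this with a checker, which would take the term Lam ("x", Lam ("y", Var "x")) as a third input and merely answer yes or no.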

Wrap Up

Well, that’s this summer in a nutshell. Before I finish, I have one more possible way of looking at things. Computational type theory is not concerned with something being provable in an axiomatic system, rather it’s about describing constructions. Brouwer’s core idea is that a proof is a mental construction and computational type theory is a system for proving that a particular computable process actually builds the correct object. It’s a translation of Brouwer’s notion of proof into terms a computer scientist might be interested in.

¹ To be clear, this is the chance of the snowball not melting. Not the snowball’s chances of being able to decide whether or not M ∈ A holds. Though I suppose they’re roughly the same.

Type is not in Type

Danny Gratzer — Wed, 26 Aug 2015 00:00:00 UT

Posted on August 26, 2015

Tags: jonprl, types, haskell

I was reading a recent proposal to merge types and kinds in Haskell to start the transition to dependently typed Haskell. One thing that caught my eye as I was reading it was that this proposal adds * :: * to the type system. This is of some significance because it means that once this is fully realized, Haskell will be inconsistent (as a logic) in a new way! Of course, this isn’t a huge deal since Haskell is already woefully inconsistent with

unsafePerformIO
Recursive bindings
Recursive types
Exceptions
…

So it’s not like we’ll be entering new territory here. All that it means is that there’s a new way to inhabit every type in Haskell. If you were using Haskell as a proof assistant you were already in for a rude awakening I’m afraid :)

This is an issue of significance though for languages like Idris or Agda where such a thing would actually render proofs useless. Famously, Martin-Löf’s original type theory did have Type : Type (or * :: * in Haskell spelling) and Girard managed to derive a contradiction (Girard’s paradox). I’ve always been told that the particulars of this construction are a little bit complicated but to remember that Type : Type is bad.

In this post I’d like to prove that Type : Type is a contradiction in JonPRL. This is a little interesting because in most proof assistants this would work in two steps

Hack the compiler to add the rule Type : Type
Construct a contradiction and check it with the modified compiler

OK to be fair, in something like Agda you could use the compiler hacking they’ve already done and just say {-# OPTIONS --type-in-type #-}. The spirit of the development is the same though.

In JonPRL, I’m just going to prove this as a regular implication. We have a proposition which internalizes membership and I’ll demonstrate not(member(U{i}; U{i})) is provable (U{i} is how we say Type in JonPRL). It’s the same logic as we had before.

Background on JonPRL

Before we can really get to the proof we want to talk about, we should go through some of the more advanced features of JonPRL we need to use.

JonPRL is a little different than most proof assistants. For example, we can define a type of all closed terms in our language whose equality is purely computational. This type is base. To prove that =(a; b; base) holds you have to prove ceq(a; b), the finest grain equality in JonPRL. Two terms are ceq if they

Both diverge, or
Run to the same outermost form and have ceq components

What’s particularly exciting is that you can substitute any term for any other term ceq to it, no matter at what type it’s being used and under what hypotheses. In fact, the reduce tactic (which performs beta reductions) can conceptually be thought of as substituting a bunch of terms for their weak-head-normal forms which are ceq to the original terms. The relevant literature behind this is found in Doug Howe’s “Equality in a Lazy Computation System”. There’s more in JonPRL in this regard, we also have the asymmetric version of ceq (called approx) but we won’t need it today.

Next, let’s talk about the image type. This is a type constructor with the following formation rule:

 H ⊢ A : U{i}        H ⊢ f : base
 —————————————————————————————————
      H ⊢ image(A; f) : U{i}

So here A is a type and f is anything. Things are going to be equal in image(A; f) if we can prove that they’re of the form f w and f w' where w = w' ∈ A. So image gives us the codomain (range) of a function. What’s pretty crazy about this is that it’s not just the range of some function A → B, we don’t really need a whole new type for that. It’s the range of literally any closed term we can apply. We can take the range of the Y combinator over pi types. We can take the range of lam(x. ⊥) over unit, anything we want!

This construct lets us define some really incredible things as a user of JonPRL. For example, the “squash” of a type is supposed to be a type which is occupied by <> (and only <>) if and only if there was an occupant of the original type. You can define these in HoTT with higher inductive types. Or, you can define these in this type theory as

    Operator squash : (0).
    [squash(A)] =def= [image(A; lam(x. <>))]

x ∈ squash(A) if and only if we can construct an a so that a ∈ A and lam(x. <>) a ~ x. Clearly x must be <> and we can construct such an a if and only if A is nonempty.

We can also define the set-union of two types. Something is supposed to be in the set union if and only if it’s in one or the other. To define such a thing with an image type we have

    Operator union : (0; 0).
    [union(A; B)] =def= [image((x : unit + unit) * decide(x; _.A; _.B); lam(x.snd(x)))]

This one is a bit more complicated. The domain of things we’re applying our function to this time is

    (x : unit + unit) * decide(x; _.A; _.B)

This is a dependent pair, sometimes called a Σ type. The first component is a boolean; if it is true the second component is of type A, and otherwise it’s of type B. So for every term of type A or B, there’s a term of this Σ type. In fact, we can recover that original term of type A or B by just grabbing the second component of the term! We don’t have to worry about the type of such an operation because we’re not creating something with a function type, just something in base.

Unions let us define an absolutely critical admissible rule in our system. JonPRL has this propositional reflection of the equality judgment and membership, but in Martin-Löf’s type theory, membership is non-negatable. By this I mean that if we have some a so that a = a ∈ A doesn’t hold, we won’t be able to prove =(a; a; A) -> void. See, in order to prove such a thing we first have to prove that =(a; a; A) -> void is a type, which means proving that =(a; a; A) is a type.

In order to prove that =(a; b; A) is a proposition we have to prove =(a; a; A), =(b; b; A), and =(A; A; U{i}). The process of proving these will actually also show that the corresponding judgments, a ∈ A, b ∈ A, and A ∈ U{i} hold.

However, in the case that a and b are the same term this is just the same as proving =(a; b; A)! So =(a; a; A) is a proposition only if it’s true. However, we can add a rule that says that =(a; b; A) is a proposition if a = a ∈ (A ∪ base) and similarly for b! This fixes our negatability issue because we can just prove =(a; a; base), something that may be true even if a is not equal to itself in A. Before, having a function take a member(...) was useless: member(a; A) is just thin sugar for =(a; a; A), so member(a; A) is a proposition if and only if a = a ∈ A holds. In other words, it’s a proposition if and only if it’s true! With this new rule, we can prove member(a; A) is a proposition if A ∈ U{i} and a ∈ base, a much weaker set of conditions that are almost always true. We can apply this special rule in JonPRL with eq-eq-base instead of just eq-cd like the rest of our equality rules.

The Main Result

Now let’s actually begin proving Russell’s paradox. To start, some notation.

    Infix 20 "∈" := member.
    Infix 40 "~" := ceq.
    Infix 60 "∪" := bunion.
    Prefix 40 "¬" := not.

This lets us say a ∈ b instead of member(a; b). JonPRL recently grew this ability to add transparent notation to terms, it makes our theorems a lot prettier.

Next we define the central term to our proof:

    Operator Russell : ().
    [Russell] =def= [{x : U{i} | ¬ (x ∈ x)}]

Here we’ve defined Russell as shorthand for a subset type, in particular a subset of U{i} (the universe of types). x ∈ Russell if x ∈ U{i} and ¬ (x ∈ x). Now normally we won’t be able to prove that this is a type (specifically x ∈ x is going to be a problem), but in our case we’ll have some help from an assumption that U{i} ∈ U{i}.

Now we begin to define a small set of tactics that we’ll want. These tactics are really where the fiddly bits of using JonPRL’s tactic system come into play. If you’re just reading this for the intuition as to why Type ∈ Type is bad just skip this. You’ll still understand the construction even if you don’t understand these bits of the proof.

First we have a tactic which finds an occurrence of H : A + B in the context and eliminates it. This gives us two goals, one with an A and one with a B. To do this we use match, which gives us something like match goal with in Coq.

    Tactic break-plus {
      @{ [H : _ + _ |- _] => elim <H>; thin <H> }
    }.

Note the syntax [H : ... |- ...] to match on a sequent. In particular here we just have _ + _ and _. Next we have a tactic bunion-eq-right. It’s to help us work with bunions (unions). Basically it turns =(M; N; bunion(A; B)) into

    =(lam(x.snd(x)) <<>, M>; lam(x.snd(x)) <<>, N>; bunion(A; B))

This is actually helpful because it turns out that once we unfold bunion we have to prove that M and N are in an image type, remember that bunion is just a thin layer of sugar on top of image types. In order to prove something is in the image type it needs to be of the form f a where f in our case is lam(x. snd(x)).

This is done with

    Tactic bunion-eq-right {
      @{ [|- =(M; N; L ∪ R)] =>
           csubst [M ~ lam(x. snd(x)) <inr(<>), M>] [h.=(h;_;_)];
           aux { unfold ; reduce; auto };
           csubst [N ~ lam(x. snd(x)) <inr(<>), N>] [h.=(_;h;_)];
           aux { unfold ; reduce; auto };
      }
    }.

The key here is csubst. It takes a ceq as its first argument and a “targeting”. It then tries to replace each occurrence of the left side of the equality with the right. To find each occurrence the targeting maps a variable to each occurrence. We’re allowed to use wildcards in the targeting as well. It also relegates actually proving the equality into a new subgoal. It’s easy enough to prove so we demonstrate it with aux {unfold ; reduce; auto}.

We only need to apply this tactic after eq-eq-base, this applies that rule I mentioned earlier about proving equalities to be well-formed in a much more liberal environment. Therefore we wrap those two tactics into one more convenient package.

    Tactic eq-base-tac {
      @{ [|- =(=(M; N; A); =(M'; N'; A'); _)] =>
           eq-eq-base; auto;
           bunion-eq-right; unfold <bunion>
       }
    }.

There is one last tactic in this series, this one to prove that member(X; X) ∈ U{i'} is well-formed (a type). It starts by unfolding member into =(=(X; X; X); =(X; X; X); U{i}) and then applying the new tactic. Then we do other things. These things aren’t pretty. I suggest we just ignore them.

    Tactic impredicativity-wf-tac {
      unfold <member>; eq-base-tac;
      eq-cd; ?{@{[|- =(_; _; base)] => auto}};
      eq-cd @i'; ?{break-plus}; reduce; auto
    }.

Finally we have a tactic to prove void when we have both not(P) and P in the context. This is another nice application of match

    Tactic contradiction {
      unfold <not implies>;
      @{ [H : P -> void, H' : P |- void] =>
           elim <H> [H'];
           unfold ;
           auto
       }
    }.

We start by unfolding not and implies. This gives us P -> void and P. From there, we just apply one to the other giving us a void as we wanted.
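Read through the props-as-types lens, this tactic builds nothing more than a function application. A rough SML rendering, with void, neg, contradiction and absurd all being names invented for this sketch, might look like the following. SML is not a consistent logic (nontermination and exceptions inhabit every type), so this illustrates the shape of the term rather than its logical force.

```sml
(* An uninhabited type: building a void requires already having one. *)
datatype void = Void of void

(* not(P) is sugar for P -> void. *)
type 'p neg = 'p -> void

(* Given H : P -> void and H' : P, apply one to the other. *)
fun contradiction (h : 'p neg) (h' : 'p) : void = h h'

(* From void, anything follows. *)
fun absurd (Void v : void) : 'a = absurd v
```

The realizer for the contradiction is just h h', which matches the “apply one to the other” description above.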

We’re now ready to prove our theorem. We start with

    Theorem type-not-in-type : [¬ (U{i} ∈ U{i})] {
    }.

We now have the main subgoal

Remaining subgoals:

[main] ⊢ not(member(U{i}; U{i}))

We can start by unfolding not and implies. Remember that not isn’t a built in thing, it’s just sugar. By unfolding it we get the more primitive form, something that we can actually apply the intro tactic to.

    {
      unfold <not implies>; intro
    }

Once unfolded, we’d get a goal along the lines of member(U{i}; U{i}) -> void. We immediately apply intro to this though. Now we have two subgoals; one is the result of applying intro, namely a hypothesis x : member(U{i}; U{i}) and a goal void. The second subgoal is the “well-formedness” obligation.

We have to prove that member(U{i}; U{i}) is a type in order to apply the intro tactic. This is a crucial difference between Coq-like systems and these proof-refinement logics. The process of demonstrating that what you’re proving is a proposition is intermingled with actually constructing the proof. It means you get to apply all the normal mathematical tools you have for proving things to be true in order to prove that they’re types. This gives us a lot of flexibility, but at the cost of sometimes annoying subgoals. They’re annotated with [aux] (as opposed to [main]). This means we can target them all at once using the aux tactic.

To summarize that whole paragraph as JonPRL would say it, our proof state is

[main]
1. x : member(U{i}; U{i})
⊢ void

[aux] ⊢ member(member(U{i}; U{i}); U{i'})

Let’s get rid of that auxiliary subgoal using impredicativity-wf-tac; this subgoal is in fact exactly what it was made for.

    {
      unfold <not implies>; intro;
      aux { impredicativity-wf-tac };
    }

This picks off that [aux] goal leaving us with just

[main]
1. x : member(U{i}; U{i})
⊢ void

Now we need to prove some lemmas. They state that Russell is actually a type. This is possible to do here and only here because we’ll need to actually use x in the process of proving this. It’s a very nice example of what explicitly proving well-formedness can give you! After all, the process of demonstrating that Russell is a type is nontrivial and only possible in this hypothetical context, rather than just hoping that JonPRL is clever enough to figure that out for itself we get to demonstrate it locally.

We’re going to use the assert tactic to get these lemmas. This lets us state a term, prove it as a subgoal and use it as a hypothesis in the main goal. If you’re logically minded, it’s cut.

    {
      unfold ; intro;
      aux { impredicativity-wf-tac };

      assert [Russell ∈ U{i}] <russell-wf>;
    }

The name in angle brackets after the assertion is the name the new hypothesis gets in the context of the main goal. This leaves us with two subgoals: the aux one is the assertion itself, and the main one is allowed to assume it.

[aux]
1. x : member(U{i}; U{i})
⊢ member(Russell; U{i})

[main]
1. x : member(U{i}; U{i})
2. russell-wf : member(Russell; U{i})
⊢ void

We can prove this by basically working our way towards using impredicativity-wf-tac. We’ll use aux again to target the aux subgoal. We’ll start by unfolding everything and applying eq-cd.

    {
      unfold ; intro;
      aux { impredicativity-wf-tac };

      assert [Russell ∈ U{i}] ;
      aux {
        unfold ; eq-cd; auto;
      };
    }

Remember that Russell is {x : U{i} | ¬ (x ∈ x)}

We just applied eq-cd to a subset type (Russell), so we get two subgoals. One says that U{i} is a type; the other says that if x ∈ U{i} then ¬ (x ∈ x) is also a type. In essence this just says that a subset type is a type if both components are types. The former goal is quite straightforward, so we apply auto to take care of it. Now we have one new subgoal to handle

[main]
1. x : =(U{i}; U{i}; U{i})
2. x' : U{i}
⊢ =(not(member(x'; x')); not(member(x'; x')); U{i})

[main]
1. x : member(U{i}; U{i})
2. russell-wf : member(Russell; U{i})
⊢ void

The second subgoal is just the rest of the proof, and the first subgoal is what we want to handle. It says that if we have a type x, then not(member(x; x)) is a type (albeit in ugly notation). To prove this we have to unfold not. So we’ll do this and apply eq-cd again.

    {
      unfold ; intro;
      aux { impredicativity-wf-tac };

      assert [Russell ∈ U{i}] ;
      aux {
        unfold ; eq-cd; auto;
        unfold ; eq-cd; auto;
      };
    }

Remember that not(P) desugars to P -> void. Applying eq-cd is going to give us two subgoals, P is a type and void is a type. However, member(void; U{i}) is pretty easy to prove, so we apply auto again which takes care of one of our two new goals. Now we just have

[main]
1. x : =(U{i}; U{i}; U{i})
2. x' : U{i}
⊢ =(member(x'; x'); member(x'; x'); U{i})

[main]
1. x : member(U{i}; U{i})
2. russell-wf : member(Russell; U{i})
⊢ void

Now we’re getting to the root of the issue. We’re trying to prove that member(x'; x') is a type. This is happily handled by impredicativity-wf-tac which will use our assumption that U{i} ∈ U{i} because it’s smart like that.

    {
      unfold ; intro;
      aux { impredicativity-wf-tac };

      assert [Russell ∈ U{i}] ;
      aux {
        unfold ; eq-cd; auto;
        unfold ; eq-cd; auto;
        impredicativity-wf-tac
      };
    }

Now we just have that main goal with the assumption russell-wf added.

[main]
1. x : member(U{i}; U{i})
2. russell-wf : member(Russell; U{i})
⊢ void

Now we have a similar well-formedness goal to assert and prove. We want to prove that member(Russell; Russell) is a type. This one is easier, though; impredicativity-wf-tac handles it almost directly.

    {
      unfold ; intro;
      aux { impredicativity-wf-tac };

      assert [Russell ∈ U{i}] ;
      aux {
        unfold ; eq-cd; auto;
        unfold ; eq-cd; auto;
        impredicativity-wf-tac
      };

      assert [(Russell ∈ Russell) ∈ U{i}] ;
      aux { impredicativity-wf-tac; cum @i; auto };
    }

That cum @i is a quirk of impredicativity-wf-tac. It basically means that instead of proving =(...; ...; U{i'}) we can prove =(...; ...; U{i}) since U{i} is a universe below U{i'} and all universes are cumulative.

Our goal is now

[main]
1. x : member(U{i}; U{i})
2. russell-wf : member(Russell; U{i})
3. russell-in-russell-wf : member(member(Russell; Russell); U{i})
⊢ void

Ok, now that we have all these well-formedness lemmas, the real reasoning can start. Our proof sketch is basically as follows

Prove that Russell ∈ Russell is false. This is because if Russell were in Russell then, by the definition of Russell, it isn’t in Russell.
Since not(Russell ∈ Russell) holds, Russell ∈ Russell holds.
Hilarity ensues.
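
Stripped of all the well-formedness bookkeeping, this sketch is the usual argument that no proposition can be equivalent to its own negation. Here is a minimal Lean 4 rendering of just that core. It is only a sketch: P is a stand-in for Russell ∈ Russell, the hypothesis h is a stand-in for unfolding the definition of Russell, and neither name is part of the JonPRL development.

```lean
-- P plays the role of `Russell ∈ Russell`;
-- h plays the role of "Russell is in Russell exactly when it isn't".
theorem russell_core (P : Prop) (h : P ↔ ¬P) : False :=
  -- Step 1: P is false, because P would refute itself.
  have hnp : ¬P := fun hp => (h.mp hp) hp
  -- Step 2: since ¬P holds, P holds. Step 3: hilarity ensues.
  hnp (h.mpr hnp)
```

The JonPRL proof follows the same three steps; the extra work is all in convincing the system that the statements involved are types in the first place.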

Here’s the first assertion:

    {
      unfold ; intro;
      aux { impredicativity-wf-tac };

      assert [Russell ∈ U{i}] ;
      aux {
        unfold ; eq-cd; auto;
        unfold ; eq-cd; auto;
        impredicativity-wf-tac
      };

      assert [(Russell ∈ Russell) ∈ U{i}] ;
      aux { impredicativity-wf-tac; cum @i; auto };

      assert [¬ (Russell ∈ Russell)] ;
    }

Here are our subgoals:

[aux]
1. x : member(U{i}; U{i})
2. russell-wf : member(Russell; U{i})
3. russell-in-russell-wf : member(member(Russell; Russell); U{i})
⊢ not(member(Russell; Russell))

[main]
1. x : member(U{i}; U{i})
2. russell-wf : member(Russell; U{i})
3. russell-in-russell-wf : member(member(Russell; Russell); U{i})
4. russell-not-in-russell : not(member(Russell; Russell))
⊢ void

We want to prove that first one. To start, let’s unfold that not and move member(Russell; Russell) into the hypotheses so we can use it to prove void. We do this with intro.

    {
      unfold ; intro;
      aux { ... };

      assert [Russell ∈ U{i}] ;
      aux { ... };

      assert [(Russell ∈ Russell) ∈ U{i}] ;
      aux { ... };

      assert [¬ (Russell ∈ Russell)] ;
      aux {
        unfold ;
        intro @i; aux {assumption};
      }
    }

Notice that the well-formedness goal that intro generated is handled by our assumption! After all, it’s just member(Russell; Russell) ∈ U{i}, which we already proved. Now our subgoals look like this

[main]
1. x : member(U{i}; U{i})
2. russell-wf : member(Russell; U{i})
3. russell-in-russell-wf : member(member(Russell; Russell); U{i})
4. x' : member(Russell; Russell)
⊢ void

[main]
1. x : member(U{i}; U{i})
2. russell-wf : member(Russell; U{i})
3. russell-in-russell-wf : member(member(Russell; Russell); U{i})
4. russell-not-in-russell : not(member(Russell; Russell))
⊢ void

Here’s our clever plan

Since Russell ∈ Russell, there’s an X : Russell so that ceq(Russell; X) holds
Since X : Russell, we can unfold it to say that X : {x : U{i} | ¬ (x ∈ x)}
We can apply the elimination principle for subset types to X and derive that ¬ (X ∈ X)
Rewriting by ceq(Russell; X) gives ¬ (Russell ∈ Russell)
Now we have a contradiction

Let’s start explaining this to JonPRL by introducing that X (here called R). We’ll assert an R : Russell such that R ~ Russell. We do this using dependent pairs (here written (x : A) * B(x)).

    {
      unfold ; intro;
      aux { ... };

      assert [Russell ∈ U{i}] ;
      aux { ... };

      assert [(Russell ∈ Russell) ∈ U{i}] ;
      aux { ... };

      assert [¬ (Russell ∈ Russell)] ;
      aux {
        unfold ;
        intro @i; aux {assumption};
        assert [(R : Russell) * R ~ Russell] ;
        aux {
          intro [Russell] @i; auto
        };
      }
    }

We’ve proven this by intro. For proving dependent pairs we provide an explicit witness for the first component. Basically, to prove (x : A) * B(x) we say intro [Foo]. We then have two goals: Foo ∈ A and B(Foo). Since subgoals are fully independent of each other, we have to give the witness for the first component upfront. It’s a little awkward, Jon’s working on it :).

In this case we use intro [Russell]. After this we have to prove that this witness has type Russell and then prove the second component holds. Happily, auto takes care of both of these obligations so intro [Russell] @i; auto handles it all.

Now we promptly eliminate this pair. It gives us two new facts, that R : Russell and R ~ Russell hold.

    {
      unfold ; intro;
      aux { ... };

      assert [Russell ∈ U{i}] ;
      aux { ... };

      assert [(Russell ∈ Russell) ∈ U{i}] ;
      aux { ... };

      assert [¬ (Russell ∈ Russell)] ;
      aux {
        unfold ;
        intro @i; aux {assumption};
        assert [(R : Russell) * R ~ Russell] ;
        aux {
          intro [Russell] @i; auto
        };

        elim ; thin 
      }
    }

This leaves our goal as

[main]
1. x : member(U{i}; U{i})
2. russell-wf : member(Russell; U{i})
3. russell-in-russell-wf : member(member(Russell; Russell); U{i})
4. x' : member(Russell; Russell)
5. s : Russell
6. t : ceq(s; Russell)
⊢ void

[main]
1. x : member(U{i}; U{i})
2. russell-wf : member(Russell; U{i})
3. russell-in-russell-wf : member(member(Russell; Russell); U{i})
4. russell-not-in-russell : not(member(Russell; Russell))
⊢ void

Now let’s invert on the hypothesis that s : Russell; we want to use it to conclude that ¬ (s ∈ s) holds since that will give us ¬ (R ∈ R).

    {
      unfold ; intro;
      aux { ... };

      assert [Russell ∈ U{i}] ;
      aux { ... };

      assert [(Russell ∈ Russell) ∈ U{i}] ;
      aux { ... };

      assert [¬ (Russell ∈ Russell)] ;
      aux {
        unfold ;
        intro @i; aux {assumption};
        assert [(R : Russell) * R ~ Russell] ;
        aux {
          intro [Russell] @i; auto
        };

        elim ; thin ;
        unfold ; elim #5;
      }
    }

Now that we’ve unfolded all of those Russells our goal is a little bit harder to read; remember to mentally read {x : U{i} | not(member(x; x))} as Russell.

[main]
1. x : member(U{i}; U{i})
2. russell-wf : member({x:U{i} | not(member(x; x))}; U{i})
3. russell-in-russell-wf : member(member({x:U{i} | not(member(x; x))}; {x:U{i} | not(member(x; x))}); U{i})
4. x' : member({x:U{i} | not(member(x; x))}; {x:U{i} | not(member(x; x))})
5. s : {x:U{i} | not(member(x; x))}
6. x'' : U{i}
7. [t'] : not(member(x''; x''))
8. t : ceq(x''; {x:U{i} | not(member(x; x))})
⊢ void

[main]
1. x : member(U{i}; U{i})
2. russell-wf : member(Russell; U{i})
3. russell-in-russell-wf : member(member(Russell; Russell); U{i})
4. russell-not-in-russell : not(member(Russell; Russell))
⊢ void

Now we use #7 to derive that not(member(Russell; Russell)) holds.

    {
      unfold ; intro;
      aux { ... };

      assert [Russell ∈ U{i}] ;
      aux { ... };

      assert [(Russell ∈ Russell) ∈ U{i}] ;
      aux { ... };

      assert [¬ (Russell ∈ Russell)] ;
      aux {
        unfold ;
        intro @i; aux {assumption};
        assert [(R : Russell) * R ~ Russell] ;
        aux {
          intro [Russell] @i; auto
        };

        elim ; thin ;
        unfold ; elim #5;

        assert [¬ member(Russell; Russell)];
        aux {
          unfold ;
        };
      }
    }

This leaves us with 3 subgoals, the first one being the assertion.

[aux]
1. x : member(U{i}; U{i})
2. russell-wf : member({x:U{i} | not(member(x; x))}; U{i})
3. russell-in-russell-wf : member(member({x:U{i} | not(member(x; x))}; {x:U{i} | not(member(x; x))}); U{i})
4. x' : member({x:U{i} | not(member(x; x))}; {x:U{i} | not(member(x; x))})
5. s : {x:U{i} | not(member(x; x))}
6. x'' : U{i}
7. [t'] : not(member(x''; x''))
8. t : ceq(x''; {x:U{i} | not(member(x; x))})
⊢ not(member({x:U{i} | not(member(x; x))}; {x:U{i} | not(member(x; x))}))

[main]
1. x : member(U{i}; U{i})
2. russell-wf : member({x:U{i} | not(member(x; x))}; U{i})
3. russell-in-russell-wf : member(member({x:U{i} | not(member(x; x))}; {x:U{i} | not(member(x; x))}); U{i})
4. x' : member({x:U{i} | not(member(x; x))}; {x:U{i} | not(member(x; x))})
5. s : {x:U{i} | not(member(x; x))}
6. x'' : U{i}
7. [t'] : not(member(x''; x''))
8. t : ceq(x''; {x:U{i} | not(member(x; x))})
9. H : not(member(Russell; Russell))
⊢ void

[main]
1. x : member(U{i}; U{i})
2. russell-wf : member(Russell; U{i})
3. russell-in-russell-wf : member(member(Russell; Russell); U{i})
4. russell-not-in-russell : not(member(Russell; Russell))
⊢ void

Now to prove this, what we need to do is substitute the unfolded Russell for x''; from there it’s immediate by assumption. We perform the substitution with chyp-subst. This takes a direction in which to substitute, the hypothesis to use, and a pattern telling us where to apply the substitution.

    {
      unfold ; intro;
      aux { ... };

      assert [Russell ∈ U{i}] ;
      aux { ... };

      assert [(Russell ∈ Russell) ∈ U{i}] ;
      aux { ... };

      assert [¬ (Russell ∈ Russell)] ;
      aux {
        unfold ;
        intro @i; aux {assumption};
        assert [(R : Russell) * R ~ Russell] ;
        aux {
          intro [Russell] @i; auto
        };

        elim ; thin ;
        unfold ; elim #5;

        assert [¬ member(Russell; Russell)];
        aux {
          unfold ;
          chyp-subst ← #8 [h. ¬ (h ∈ h)];
        };
      }
    }

This leaves us with a much more tractable goal.

[main]
1. x : member(U{i}; U{i})
2. russell-wf : member({x:U{i} | not(member(x; x))}; U{i})
3. russell-in-russell-wf : member(member({x:U{i} | not(member(x; x))}; {x:U{i} | not(member(x; x))}); U{i})
4. x' : member({x:U{i} | not(member(x; x))}; {x:U{i} | not(member(x; x))})
5. s : {x:U{i} | not(member(x; x))}
6. x'' : U{i}
7. [t'] : not(member(x''; x''))
8. t : ceq(x''; {x:U{i} | not(member(x; x))})
⊢ not(member(x''; x''))

[main]
1. x : member(U{i}; U{i})
2. russell-wf : member({x:U{i} | not(member(x; x))}; U{i})
3. russell-in-russell-wf : member(member({x:U{i} | not(member(x; x))}; {x:U{i} | not(member(x; x))}); U{i})
4. x' : member({x:U{i} | not(member(x; x))}; {x:U{i} | not(member(x; x))})
5. s : {x:U{i} | not(member(x; x))}
6. x'' : U{i}
7. [t'] : not(member(x''; x''))
8. t : ceq(x''; {x:U{i} | not(member(x; x))})
9. H : not(member(Russell; Russell))
⊢ void

[main]
1. x : member(U{i}; U{i})
2. russell-wf : member(Russell; U{i})
3. russell-in-russell-wf : member(member(Russell; Russell); U{i})
4. russell-not-in-russell : not(member(Russell; Russell))
⊢ void

We’d like to just apply assumption but it’s not immediately applicable due to some technical details (basically we can only apply an assumption in a proof irrelevant context but we have to unfold Russell and introduce it to demonstrate that it’s irrelevant). So just read what’s left as a (very) convoluted assumption.

    {
      unfold ; intro;
      aux { ... };

      assert [Russell ∈ U{i}] ;
      aux { ... };

      assert [(Russell ∈ Russell) ∈ U{i}] ;
      aux { ... };

      assert [¬ (Russell ∈ Russell)] ;
      aux {
        unfold ;
        intro @i; aux {assumption};
        assert [(R : Russell) * R ~ Russell] ;
        aux {
          intro [Russell] @i; auto
        };

        elim ; thin ;
        unfold ; elim #5;

        assert [¬ member(Russell; Russell)];
        aux {
          unfold ;
          chyp-subst ← #8 [h. ¬ (h ∈ h)];
          unfold ;
          intro; aux { impredicativity-wf-tac };
          contradiction
        };
      }
    }

Now we’re almost through this assertion. Our subgoals look like this (pay attention to 9 and 4)

[main]
1. x : member(U{i}; U{i})
2. russell-wf : member({x:U{i} | not(member(x; x))}; U{i})
3. russell-in-russell-wf : member(member({x:U{i} | not(member(x; x))}; {x:U{i} | not(member(x; x))}); U{i})
4. x' : member({x:U{i} | not(member(x; x))}; {x:U{i} | not(member(x; x))})
5. s : {x:U{i} | not(member(x; x))}
6. x'' : U{i}
7. [t'] : not(member(x''; x''))
8. t : ceq(x''; {x:U{i} | not(member(x; x))})
9. H : not(member(Russell; Russell))
⊢ void

[main]
1. x : member(U{i}; U{i})
2. russell-wf : member(Russell; U{i})
3. russell-in-russell-wf : member(member(Russell; Russell); U{i})
4. russell-not-in-russell : not(member(Russell; Russell))
⊢ void

Once we unfold that Russell we have an immediate contradiction so unfold ; contradiction solves it.

    {
      unfold ; intro;
      aux { ... };

      assert [Russell ∈ U{i}] ;
      aux { ... };

      assert [(Russell ∈ Russell) ∈ U{i}] ;
      aux { ... };

      assert [¬ (Russell ∈ Russell)] ;
      aux {
        unfold ;
        intro @i; aux {assumption};
        assert [(R : Russell) * R ~ Russell] ;
        aux {
          intro [Russell] @i; auto
        };

        elim ; thin ;
        unfold ; elim #5;

        assert [¬ member(Russell; Russell)];
        aux {
          unfold ;
          chyp-subst ← #8 [h. ¬ (h ∈ h)];
          unfold ;
          intro; aux { impredicativity-wf-tac };
          contradiction
        };

        unfold ; contradiction
      }
    }

This takes care of this subgoal, so now we’re back on the main goal. This time though we have an extra hypothesis which will provide the leverage we need to prove our next assertion.

[main]
1. x : member(U{i}; U{i})
2. russell-wf : member(Russell; U{i})
3. russell-in-russell-wf : member(member(Russell; Russell); U{i})
4. russell-not-in-russell : not(member(Russell; Russell))
⊢ void

Now we’re going to claim that Russell is in fact a member of Russell. This will follow from the fact that we’ve proved already that Russell isn’t in Russell (yeah, it seems pretty paradoxical already).

    {
      unfold ; intro;
      aux { ... };

      assert [Russell ∈ U{i}] ;
      aux { ... };

      assert [(Russell ∈ Russell) ∈ U{i}] ;
      aux { ... };

      assert [¬ (Russell ∈ Russell)] ;
      aux { ... };

      assert [Russell ∈ Russell];
    }

Giving us

[aux]
1. x : member(U{i}; U{i})
2. russell-wf : member(Russell; U{i})
3. russell-in-russell-wf : member(member(Russell; Russell); U{i})
4. russell-not-in-russell : not(member(Russell; Russell))
⊢ member(Russell; Russell)

[main]
1. x : member(U{i}; U{i})
2. russell-wf : member(Russell; U{i})
3. russell-in-russell-wf : member(member(Russell; Russell); U{i})
4. russell-not-in-russell : not(member(Russell; Russell))
5. H : member(Russell; Russell)
⊢ void

Proving this is pretty straightforward: we only have to demonstrate that not(Russell ∈ Russell) and Russell ∈ U{i}, both of which we have as assumptions. The rest of the proof is just more well-formedness goals.

First we unfold everything and apply eq-cd. This gives us 3 subgoals, the first two are Russell ∈ U{i} and ¬(Russell ∈ Russell). Since we have these as assumptions we’ll use main {assumption}. That will target both these goals and prove them immediately. Here by using main we avoid applying this to the well-formedness goal, which in this case actually isn’t the assumption.

    {
      unfold ; intro;
      aux { ... };

      assert [Russell ∈ U{i}] ;
      aux { ... };

      assert [(Russell ∈ Russell) ∈ U{i}] ;
      aux { ... };

      assert [¬ (Russell ∈ Russell)] ;
      aux { ... };

      assert [Russell ∈ Russell];
      aux {
        unfold ; eq-cd;
        unfold ;

        main { assumption };
      };
    }

This just leaves us with one awful well-formedness goal requiring us to prove that not(=(x; x; x)) is a type if x is a type. We actually proved something similar back when we proved that Russell was well-formed. The proof is the same as then: just unfold, eq-cd and impredicativity-wf-tac. We use ?{!{auto}} to only apply auto in a subgoal where it immediately proves it. Here ?{} says “run this or do nothing” and !{} says “run this, if it succeeds stop, if it does anything else, fail”. This is not an interesting portion of the proof, so don’t burn too many cycles trying to figure this out.

    {
      unfold ; intro;
      aux { ... };

      assert [Russell ∈ U{i}] ;
      aux { ... };

      assert [(Russell ∈ Russell) ∈ U{i}] ;
      aux { ... };

      assert [¬ (Russell ∈ Russell)] ;
      aux { ... };

      assert [Russell ∈ Russell] ;
      aux {
        unfold ; eq-cd;
        unfold ;

        main { assumption };
        unfold ; eq-cd; ?{!{auto}};
        impredicativity-wf-tac;
      };
    }
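
If the ?{!{auto}} idiom is still puzzling, here is a toy model of the two combinators in Python. To be clear, this is not how JonPRL implements them: the representation of a tactic as a function from a goal to a list of remaining subgoals, and the names attempt, complete, toy_auto and TacticFailure, are all invented for this sketch. It only mirrors the behaviour described above.

```python
class TacticFailure(Exception):
    """Raised by a (toy) tactic that cannot be applied."""


def attempt(tactic):
    """Model of ?{t}: run t, or leave the goal untouched if t fails."""
    def run(goal):
        try:
            return tactic(goal)
        except TacticFailure:
            return [goal]
    return run


def complete(tactic):
    """Model of !{t}: succeed only if t leaves no subgoals behind."""
    def run(goal):
        remaining = tactic(goal)
        if remaining:
            raise TacticFailure("tactic did not finish the goal")
        return []
    return run


def toy_auto(goal):
    # Hypothetical stand-in for auto: it finishes "easy" goals and only
    # makes partial progress on everything else.
    if goal.startswith("easy"):
        return []
    return [goal + " (simplified)"]


careful_auto = attempt(complete(toy_auto))
print(careful_auto("easy: void ∈ U{i}"))     # []
print(careful_auto("hard: (x ∈ x) ∈ U{i}"))  # ['hard: (x ∈ x) ∈ U{i}']
```

The point of the combination is visible in the second call: plain toy_auto would have left a half-simplified goal behind, while the wrapped version leaves the hard goal exactly as it was for the next tactic to pick up.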

Now we just have the final subgoal to prove. We’re actually in a position to do so now.

[main]
1. x : member(U{i}; U{i})
2. russell-wf : member(Russell; U{i})
3. russell-in-russell-wf : member(member(Russell; Russell); U{i})
4. russell-not-in-russell : not(member(Russell; Russell))
5. russell-in-russell : member(Russell; Russell)
⊢ void

Now that we’ve shown P and not(P) hold at the same time all we need to do is apply contradiction and we’re done.

    Theorem type-not-in-type : [¬ (U{i} ∈ U{i})] {
      unfold ; intro;
      aux { ... };

      assert [Russell ∈ U{i}] ;
      aux { ... };

      assert [(Russell ∈ Russell) ∈ U{i}] ;
      aux { ... };

      assert [¬ (Russell ∈ Russell)] ;
      aux { ... };

      assert [Russell ∈ Russell] ;
      aux { ... };

      contradiction
    }.

And there you have it, a complete proof of Russell’s paradox fully formalized in JonPRL! We actually proved a slightly stronger result than just that the type of types cannot be in itself, we proved that at any point in the hierarchy of universes (the first of which is Type/*/whatever) if you tie it off, you’ll get a contradiction.

Wrap Up

I hope you found this proof interesting. Even if you’re not at all interested in JonPRL, it’s nice to see that allowing one to have U{i} ∈ U{i} or * :: * gives you the ability to have a type like Russell and with it, inhabit void. I also find it especially pleasing that we can prove something like this in JonPRL; it’s growing up so fast.

Thanks to Jon for greatly improving the original proof we had.

Solving Recursive Equations

Danny Gratzer — Fri, 14 Aug 2015 00:00:00 UT

Posted on August 14, 2015

Tags: types

I wanted to write about something related to all the stuff I’ve been reading for research lately. I decided to talk about a super cool trick in a field called domain theory. It’s a method of generating a solution to a large class of recursive equations.

In order to go through this idea we’ve got some background to cover. I wanted to make this post readable even if you haven’t read too much domain theory (you do need to know what a functor and a colimit are, but nothing crazy). We’ll start with a whirlwind tutorial of the math behind domain theory. From there we’ll transform the problem of finding a solution to an equation into something categorically tractable. Finally, I’ll walk through the construction of a solution.

I decided not to show an example of applying this technique to model a language because that would warrant its own post, hopefully I’ll write about that soon :)

Basic Domain Theory

The basic idea with domain theory comes from a simple problem. Suppose we want to model the lambda calculus. We want a collection of mathematical objects D so that we can treat each element of D as a function D → D and each function D → D as an element of D. To see why this is natural, remember that we want to turn each program E into d ∈ D. If E = λ x. E' then we need to turn the function e ↦ [e/x]E' into a term. This means D → D needs to be embeddable in D. On the other hand, we might have E = E' E'' in which case we need to turn E' into a function D → D so that we can apply it. This means we need to be able to embed D into D → D.

After this we can turn a lambda calculus program into a specific element of D and reason about its properties using the ambient mathematical tools for D. This is semantics, understanding programs by studying their meaning in some mathematical structure. In our specific case that structure is D with the isomorphism D ≅ D → D. However, there’s an issue! We know that D can’t just be a set because then there cannot be such an isomorphism! In the case where D ≅ N, then D → D ≅ R and there’s a nice proof by diagonalization that such an isomorphism cannot exist.
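
The diagonal argument is easy to check by brute force on a small set. The snippet below is just an illustration: the names D, swap and diagonal are made up for the example, and it assumes D has at least two elements so that a map with no fixed point exists.

```python
from itertools import product

D = [0, 1, 2]


def swap(d):
    # A map D -> D with no fixed point (needs at least two elements).
    return (d + 1) % len(D)


# Every function D -> D, represented as a tuple of outputs indexed by input.
all_functions = list(product(D, repeat=len(D)))


def diagonal(e):
    # e assigns a function (a tuple) to each element of D.
    # The diagonal function disagrees with e[d] at input d, for every d.
    return tuple(swap(e[d][d]) for d in D)


# For every candidate map e : D -> (D -> D), the diagonal function is
# never in the image of e, so e is never surjective.
missed_everywhere = all(
    diagonal(e) not in e
    for e in product(all_functions, repeat=len(D))
)
print(missed_everywhere)  # True
```

Nothing here depends on D being finite except the exhaustive loop; the same diagonal function defeats any proposed surjection from a set onto its full function space.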

So what can we do? We know there are only countably many programs, but we’re trying to state that there exists an isomorphism between our programs (countable) and functions on them (uncountable). Well the issue is that we don’t really mean all functions on D, just the ones we can model as lambda terms. For example, the function which maps all divergent programs to 1 and all terminating ones to 0 need not be considered because there’s no lambda term for it! How do we consider “computable” functions though? It’s not obvious since we define computable functions using the lambda calculus, what we’re trying to model here. Let’s set aside this question for a moment.

Another question is how do we handle this program: (λ x. x x) (λ x. x x)? It doesn’t have a value after all! It doesn’t behave like a normal mathematical function because applying it to something doesn’t give us back a new term, it just runs forever! To handle this we do something really clever. We stop considering just a collection of terms and instead look at terms with an ordering relation ⊑! The idea is that ⊑ represents definedness. A program which runs to a value is more defined than a program which just loops forever. Similarly, if two functions behave the same on all inputs except for 0, where one of them loops, we could say that one is more defined than the other. What we’ll do is define ⊑ abstractly and then model programs into sets with such a relation defined upon them. In order to build up this theory we need a few definitions

A partially ordered set (poset) is a set A and a binary relation ⊑ where

a ⊑ a
a ⊑ b and b ⊑ c implies a ⊑ c
a ⊑ b and b ⊑ a implies a = b

We often just denote the pair as A when the ordering is clear. With a poset A, of particular interest are chains in it. A chain is a collection of elements aᵢ so that aᵢ ⊑ aⱼ if i ≤ j. For example, in the partial order of natural numbers and ≤, a chain is just a run of ascending numbers. Another fundamental concept is called a least upper bound (lub). A lub of a subset P ⊆ A is an element x ∈ A so that y ∈ P implies y ⊑ x and, if this property holds for some z also in A, then x ⊑ z. So a least upper bound is just the smallest thing bigger than the subset. This isn’t always guaranteed to exist; for example, in our poset of natural numbers N, the subset N has no upper bounds at all! When such a lub does exist, we denote it with ⊔P. Some partial orders have an interesting property: all chains in them have least upper bounds. We call these posets complete partial orders or cpos.

For example while N isn’t a cpo, ω (the natural numbers + an element greater than all of them) is! As a quick puzzle, can you show that all finite partial orders are in fact CPOs?
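
The definition of a least upper bound translates directly into a brute-force search when the poset is finite. Here is a small sketch; the helper names upper_bounds and lub are mine, and the example poset (divisors of 12 ordered by divisibility) is chosen only because it is small enough to check by hand.

```python
def upper_bounds(elements, leq, subset):
    """All x in the poset with y ⊑ x for every y in the subset."""
    return [x for x in elements if all(leq(y, x) for y in subset)]


def lub(elements, leq, subset):
    """Return the least upper bound of subset, or None if there is none."""
    bounds = upper_bounds(elements, leq, subset)
    # The lub is the upper bound that sits below every other upper bound.
    least = [x for x in bounds if all(leq(x, z) for z in bounds)]
    return least[0] if least else None


divisors = [1, 2, 3, 4, 6, 12]
divides = lambda a, b: b % a == 0

print(lub(divisors, divides, [2, 3]))         # 6
print(lub(divisors, divides, [4, 6]))         # 12
print(lub([1, 2, 3, 4, 6], divides, [4, 6]))  # None
```

The last call removes 12 from the poset, so {4, 6} no longer has any upper bound at all, which is the finite analogue of N having no upper bound inside N.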

We can define a number of basic constructions on cpos. The most common is the “lifting” operation which takes a cpo D and returns D⊥, a cpo with a least element ⊥. A cpo with such a least element is called “pointed” and I’ll write that as cppo (complete pointed partial order). Another common example: given two cppos, D and E, we can construct D ⊗ E. An element of this cppo is either ⊥ or a pair ⟨l, r⟩ where l ∈ D - {⊥} and r ∈ E - {⊥}. This is called the smash product because it “smashes” the ⊥s out of the components. Similarly, there are smash sums D ⊕ E.
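
To make the two constructions concrete, here is a tiny sketch for the simplest case, where we lift a plain set of values into a flat cppo. Everything in it is an assumption of the example: ⊥ is represented by None, and lift, smash, flat_leq and smash_leq are names I made up. It does not handle lifting a cpo that already has interesting structure.

```python
BOTTOM = None  # stands in for ⊥


def lift(values):
    """Flat cppo: the given values plus a new least element."""
    return [BOTTOM] + list(values)


def flat_leq(a, b):
    # In a flat cppo, ⊥ is below everything and nothing else is related.
    return a is BOTTOM or a == b


def smash(d, e):
    """Smash product: ⊥ plus all pairs of non-⊥ elements."""
    return [BOTTOM] + [(l, r)
                       for l in d if l is not BOTTOM
                       for r in e if r is not BOTTOM]


def smash_leq(p, q, leq_d, leq_e):
    # ⊥ is below everything; pairs are compared componentwise.
    if p is BOTTOM:
        return True
    if q is BOTTOM:
        return False
    return leq_d(p[0], q[0]) and leq_e(p[1], q[1])


bool_lifted = lift([True, False])
print(len(smash(bool_lifted, bool_lifted)))  # 5
```

The ordinary product of these two three-element cppos would have nine elements; the smash product identifies every pair containing a ⊥ with the single ⊥, leaving five.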

The next question is the classic algebraic question to ask about a structure: what are the interesting functions on it? We’ll in particular be interested in functions which preserve the ⊑ relation and the taking of lub’s on chains. For this we have two more definitions:

A function f is monotone if x ⊑ y implies f(x) ⊑ f(y)
A function f is continuous if it is monotone and for all chains C, ⊔ f(C) = f(⊔ C).
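
Both conditions can be tested mechanically on a finite example. The sketch below uses the subsets of {0, 1} ordered by inclusion, where the lub of a family of sets is their union; the names is_monotone, preserves_lub, add_zero and complement are invented for the illustration.

```python
from itertools import combinations


def powerset(xs):
    return [frozenset(c)
            for n in range(len(xs) + 1)
            for c in combinations(xs, n)]


elements = powerset([0, 1])
leq = lambda a, b: a <= b  # subset inclusion


def is_monotone(f):
    return all(leq(f(x), f(y))
               for x in elements for y in elements if leq(x, y))


def preserves_lub(f, chain):
    # The chain is listed in ascending order, so its lub is its last element.
    # In a powerset the lub of the image is the union of the image.
    image = [f(x) for x in chain]
    lub_of_image = frozenset().union(*image)
    return lub_of_image == f(chain[-1])


add_zero = lambda s: s | {0}
complement = lambda s: frozenset([0, 1]) - s
chain = [frozenset(), frozenset([1]), frozenset([0, 1])]

print(is_monotone(add_zero), preserves_lub(add_zero, chain))      # True True
print(is_monotone(complement), preserves_lub(complement, chain))  # False False
```

On a finite poset every chain contains its own lub, so monotone functions are automatically continuous there. The two notions only come apart on infinite chains, which is exactly where they matter for modelling programs.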

Notably, the collection of cppos and continuous functions form a category! This is because clearly x ↦ x is continuous and the composition of two continuous functions is continuous. This category is called Cpo. It’s here that we’re going to do most of our interesting constructions.

Finally, we have to discuss one important construction on Cpo: D → E. This is the set of continuous functions from D to E. The ordering on this is pointwise, meaning that f ⊑ g if for all x ∈ D, f(x) ⊑ g(x). This is a cppo where ⊥ is x ↦ ⊥ and all the lubs are determined pointwise.

This gives us most of the mathematics we need to do the constructions we’re going to want. To demonstrate something cool, here’s a fun theorem which turns out to be incredibly useful: any continuous function f : D → D on a cppo D has a least fixed point.

To construct this least fixed point we need to find an x so that x = f(x). To do this, note first that ⊥ ⊑ f(⊥) because ⊥ is the least element, and that by the monotonicity of f, f(x) ⊑ f(y) whenever x ⊑ y. Applying f repeatedly to ⊥ ⊑ f(⊥) gives fⁱ(⊥) ⊑ fⁱ⁺¹(⊥), so the collection of elements fⁱ(⊥) forms a chain with the ith element being the ith iteration of f! Since D is a cppo, this chain has a least upper bound: ⊔ fⁱ(⊥). Moreover, f(⊔ fⁱ(⊥)) = ⊔ f(fⁱ(⊥)) by the continuity of f, but ⊔ fⁱ(⊥) = ⊥ ⊔ (⊔ f(fⁱ(⊥))) = ⊔ f(fⁱ(⊥)) so this is a fixed point! The proof that it’s a least fixed point is elided because typesetting in markdown is a bit of a bother.
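
On a finite cppo the chain ⊥, f(⊥), f(f(⊥)), ... has to stabilise after finitely many steps, so the construction can simply be run. Here is a minimal sketch; least_fixed_point, edges and step are names chosen for the example, and the loop is only guaranteed to terminate because the lattice of node sets here is finite (in general the lub is a genuine limit).

```python
def least_fixed_point(f, bottom):
    """Iterate f from bottom until the chain bottom, f(bottom), ... stabilises."""
    x = bottom
    while True:
        nxt = f(x)
        if nxt == x:
            return x
        x = nxt


# A small graph: nodes 0, 1, 2 form a cycle; 3 and 4 are unreachable from 0.
edges = {0: [1], 1: [2], 2: [0], 3: [4], 4: []}


def step(reached):
    # Monotone on sets of nodes ordered by inclusion:
    # keep what we have, add node 0, and add all successors.
    return frozenset({0}) | reached | {m for n in reached for m in edges[n]}


print(sorted(least_fixed_point(step, frozenset())))  # [0, 1, 2]
```

Note that the set of all five nodes is also a fixed point of step, just not the least one. Starting the iteration from ⊥ (the empty set) is what picks out the smallest solution, which here is exactly the set of nodes reachable from 0.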

So there you have it, very, very basic domain theory. I can now answer the question we weren’t sure about before, the slogan is “computable functions are continuous functions”.

Solving Recursive Equations in `Cpo`

So now we can get to the result that shows domain theory to be incredibly useful. Remember our problem before? We wanted to find a collection D so that

D ≅ D → D

However it wasn’t clear how to do this due to size issues. In Cpo however, we can absolutely solve this. This huge result was due to Dana Scott. First, we make a small transformation to the problem that’s very common in these scenarios. Instead of trying to solve this equation (something we don’t have very many tools for) we’re going to instead look for the fixpoint of this functor

F(X) = X → X

The idea here is that we’re going to prove that all well behaved endofunctors on Cpo have fixpoints. By using this viewpoint we get all the powerful tools we normally have for reasoning about functors in category theory. However, there’s a problem: the above isn’t a functor! It has both positive and negative occurrences of X so it’s neither a covariant nor a contravariant functor. To handle this we apply another clever trick. Let’s not look at endofunctors, but rather functors Cpoᵒᵖ × Cpo → Cpo (I believe this should be attributed to Freyd). This is a binary functor which is covariant in the second argument and contravariant in the first. We’ll use the first argument everywhere there’s a negative occurrence of X and the second for every positive occurrence. Take note: we need things to be contravariant in the first argument because we’re using that first argument negatively: if we didn’t do that we wouldn’t have a functor.

Now we have

F(X⁻, X⁺) = X⁻ → X⁺

This is functorial. We can also always recover the original map simply by diagonalizing: F(X) = F(X, X). We’ll now look for an object D so that F(D, D) ≅ D. Not quite a fixed point, but still equivalent to the equation we were looking at earlier.
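
It may help to see what this functor does to morphisms. Given f : A' → A and g : B → B', the map F(f, g) sends h : A → B to g ∘ h ∘ f. The sketch below writes that down with ordinary Python functions and spot-checks the two functor laws on a few inputs; arrow_map, compose and the sample functions are all names invented for this example, and checking on range(10) is of course evidence rather than proof.

```python
def arrow_map(f, g):
    """Action of F(X⁻, X⁺) = X⁻ → X⁺ on a pair of morphisms.

    f : A' -> A is used contravariantly (pre-composition) and
    g : B -> B' covariantly (post-composition), so the result
    maps functions A -> B to functions A' -> B'.
    """
    return lambda h: lambda x: g(h(f(x)))


compose = lambda p, q: lambda x: p(q(x))
identity = lambda x: x

h = lambda n: n * 2
f1, f2 = (lambda n: n + 1), (lambda n: n * 3)
g1, g2 = (lambda n: n - 5), (lambda n: n * n)

# Composition law: F(f1, g1) ∘ F(f2, g2) = F(f2 ∘ f1, g1 ∘ g2).
# Note how the order of the f's flips: that is the contravariance.
lhs = arrow_map(f1, g1)(arrow_map(f2, g2)(h))
rhs = arrow_map(compose(f2, f1), compose(g1, g2))(h)
print(all(lhs(n) == rhs(n) for n in range(10)))  # True

# Identity law: F(id, id) = id.
same = arrow_map(identity, identity)(h)
print(all(same(n) == h(n) for n in range(10)))   # True
```

If we had tried to use a single argument for both positions, there would be no way to make the composition law type-check, which is the concrete form of the “isn’t a functor” problem above.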

Furthermore, we need one last critical property, we want F to be locally continuous. This means that the maps on morphisms determined by F should be continuous so F(⊔ P, g) = ⊔ F(P, g) and vice-versa (here P is a set of functions). Note that such morphisms have an ordering because they belong to the pointwise ordered cppo we talked about earlier.

We have one final thing to set up before this proof: what if there are multiple non-isomorphic solutions to F? We want a further coherence condition that’s going to provide us with 2 things

An ability to uniquely determine a solution
A powerful proof technique that isolates us from the particulars of the construction

What we want is called minimal invariance. Suppose we have a D and an i : D ≅ F(D, D). This is the minimal invariant solution if and only if the least fixed point of f(e) = i⁻¹ ∘ F(e, e) ∘ i is id. In other words, we want it to be the case that

d = ⊔ₓ fˣ(⊥)(d) (d ∈ D)

I mentally picture this as saying that the isomorphism is set up so that for any particular d we choose, if we apply i, fmap over it, apply i again, repeat and repeat, eventually this process will halt and we’ll run out of things to fmap over. It’s a sort of a statement that each d ∈ D is “finite” in a very, very handwavy sense. Don’t worry if that didn’t make much sense, it’s helpful to me but it’s just my intuition. This property has some interesting effects though: it means that if we find such a D then (D, D) is going to be both the initial algebra and final coalgebra of F.

Without further ado, let’s prove that every locally continuous functor F has a minimal invariant. We start by defining the following

D₀ = {⊥}
Dᵢ  = F(Dᵢ₋₁, Dᵢ₋₁)

This gives us a chain of cppos that gradually get larger. How do we show that they’re getting larger? By defining a section from Dᵢ to Dⱼ where j = i + 1. A section is a function f which is paired with a (unique) function f⁰ so that f⁰f = id and ff⁰ ⊑ id. In other words, f embeds its domain into the codomain and f⁰ tells us how to get it out. Putting something in and taking it out is a round trip. Since the codomain may be bigger, though, taking something out and putting it back only approximates a round trip. Our sections are defined thusly

s₀ = x ↦ ⊥         r₀ = x ↦ ⊥
sᵢ  = F(rᵢ₋₁, sᵢ₋₁)   rᵢ = F(sᵢ₋₁, rᵢ₋₁)

It would be very instructive to work out that these definitions are actually sections and retractions. Since typesetting these subscripts is a little rough, if it’s clear from context I’ll just write r and s. Now that we’ve got this increasing chain, we define an interesting object

 D = {x ∈ Πᵢ Dᵢ | x.(i-1) = r(x.i)}

In other words, D is the collection of infinitely large tuples. Each component is from one of those Dᵢs above and they cohere with each other, so using r to step down the chain takes you from one component to the previous one. Next we define a way to go from a single Dᵢ to a D: upᵢ : Dᵢ → D where

upᵢ(x).j =  x    if i = j
         | rᵈ(x) if i - j = d > 0
         | sᵈ(x) if j - i = d > 0

Interestingly, note that πᵢ ∘ upᵢ = id (easy proof) and that upᵢ ∘ πᵢ ⊑ id (slightly harder proof). This means that we’ve got more sections lying around: every Dᵢ can be fed into D. Consider the following diagram

    s      s      s
D0 ——> D1 ——> D2 ——> ...

I claim that D is the colimit of this diagram where the collection of arrows mapping into it is given by upᵢ. Seeing this is a colimit follows from the fact that πᵢ ∘ upᵢ is just id. Specifically, suppose we have some object C and a family of morphisms cᵢ : Dᵢ → C which commute properly with s. We need to find a unique morphism h so that cᵢ = h ∘ upᵢ. Define h as ⊔ᵢ cᵢπᵢ. Then

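h ∘ upⱼ = ⊔ᵢ (cᵢ ∘ πᵢ ∘ upⱼ)
        = (⊔ {cᵢ ∘ rᵈ | i < j, d = j - i}) ⊔ (⊔ {cᵢ ∘ sᵈ | i ≥ j, d = i - j})
        = (⊔ {cⱼ ∘ sᵈ ∘ rᵈ | i < j, d = j - i}) ⊔ cⱼ
        = cⱼ

The second step just unfolds the definition of upⱼ. The third uses the fact that the cᵢ commute with s, so cᵢ ∘ sᵈ = cⱼ when i ≥ j and cᵢ = cⱼ ∘ sᵈ when i < j. The last step follows from the fact that sᵈrᵈ ⊑ id, so cⱼsᵈrᵈ ⊑ cⱼ and that whole massive term just evaluates to cⱼ as required. So we have a colimit. Notice that if we apply F to each Dᵢ in the diagram we end up with a new diagram.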
    s      s      s
D1 ——> D2 ——> D3 ——> ...
D is still the colimit (all we’ve done is shift the diagram over by one) but by identical reasoning to D being a colimit, so is F(D, D). This means we have a unique isomorphism i : D ≅ F(D, D). The fact that i is the minimal invariant follows from the properties we get from the fact that i comes from a colimit.
With this construction we can construct our model of the lambda calculus simply by finding the minimal invariant of the locally continuous functor F(D⁻, D⁺) = D⁻ → D⁺ (it’s worth proving it’s locally continuous). Our denotation is defined as [e]ρ ∈ D where e is a lambda term and ρ is a map of the free variables of e to other elements of D. This is inductively defined as
[λx. e]ρ = i⁻(d ↦ [e]ρ[x ↦ d])
[e e']ρ = i([e]ρ)([e']ρ)
[x]ρ = ρ(x)
Notice here that for the two main constructions we just use i and i⁻ to fold and unfold the denotations to treat them as functions. We could go on to prove that this denotation is sound and complete but that’s something for another post.
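To make those three equations a bit more concrete, here’s a minimal Python sketch. It’s my own illustration, not part of the construction: Python’s untyped functions stand in for D, so the hypothetical helpers fold and unfold (playing the roles of i⁻ and i) are just identities, and since Python is strict this is a call-by-value reading of the equations. The names Var, Lam, App and denote are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Lam:
    param: str
    body: object


@dataclass(frozen=True)
class App:
    fn: object
    arg: object


def fold(f):
    """Plays the role of i⁻: treat a function on D as an element of D."""
    return f


def unfold(d):
    """Plays the role of i: treat an element of D as a function on D."""
    return d


def denote(term, env):
    """[term]env, following the three equations case by case."""
    if isinstance(term, Var):
        return env[term.name]
    if isinstance(term, Lam):
        # [λx. e]ρ = i⁻(d ↦ [e]ρ[x ↦ d])
        return fold(lambda d: denote(term.body, {**env, term.param: d}))
    if isinstance(term, App):
        # [e e']ρ = i([e]ρ)([e']ρ)
        return unfold(denote(term.fn, env))(denote(term.arg, env))
    raise TypeError(f"not a lambda term: {term!r}")
```

For instance, denoting (λx. λy. x) a b in an environment that maps a and b to two values gives back the value of a.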
Wrap Up
That’s the main result I wanted to demonstrate. With this single proof we can actually model a very large class of programming languages in Cpo. Hopefully I’ll get around to showing how we can pull a similar trick with a relational structure on Cpo in order to prove full abstraction. This is nicely explained in Andrew Pitts’s “Relational Properties of Domains”.
If you’re interested in domain theory I learned from Gunter’s “Semantics of Programming Languages” book and recommend it.

          
          



Learn Type Theory
Danny Gratzer — Fri, 14 Aug 2015 00:00:00 UT

    Posted on August 14, 2015
    


    
    Tags: types
    


I’ve been trying to write a blog post to this effect for a while now, hopefully this one will stick. I intend for this to be a bit more open-ended than most of my other posts, if you’re interested in seeing the updated version look here. Pull requests/issues are more than welcome on the repository. I hope you learn something from this.
Lots of people seem curious about type theory but it’s not at all clear how to go from no math background to understanding “Homotopical Patch Theory” or whatever the latest cool paper is. In this repository I’ve gathered links to some of the resources I’ve personally found helpful.
Reading Advice
I strongly urge you to start by reading one or more of the textbooks immediately below. They give a nice self-contained introduction and a foundation for understanding the papers that follow. Don’t get hung up on any particular thing, it’s always easier to skim the first time and read closely on a second pass.
The Resources
Textbooks

Practical Foundations of Programming Languages (PFPL)
I reference this more than any other book. It’s a very wide ranging survey of programming languages that assumes very little background knowledge. A lot of people prefer the next book I mention but I think PFPL does a better job explaining the foundations it works from and then covers more topics I find interesting.

Online copy
Dead-tree copy

Types and Programming Languages (TAPL)
Another very widely used introductory book (the one I learned with). It’s good to read in conjunction with PFPL as they emphasize things differently. Notably, this includes descriptions of type inference which PFPL lacks and TAPL lacks most of PFPL’s descriptions of concurrency/interesting imperative languages. Like PFPL this is very accessible and well written.

Online supplements
Dead-tree copy

Advanced Topics in Types and Programming Languages (ATTAPL)
Don’t feel the urge to read this all at once. It’s a bunch of fully independent but excellent chapters on a bunch of different topics. Read what looks interesting, save what doesn’t. It’s good to have in case you ever need to learn more about one of the subjects in a pinch.

Online supplements
Dead-tree copy


Proof Assistants
One of the fun parts of taking an interest in type theory is that you get all sorts of fun new programming languages to play with. Some major proof assistants are

Coq
Coq is one of the more widely used proof assistants and has the best introductory material by far in my opinion.

Official site
Software Foundations
Certified Programming with Dependent Types
The paper on the calculus of constructions
The paper on the calculus of inductive constructions (What Coq is based on)

Agda
Agda is in many respects similar to Coq, but is a smaller language overall. It’s relatively easy to learn Agda after Coq so I recommend doing that. Agda has some really interesting advanced constructs like induction-recursion.

Official site
Tutorial
Records tutorial
Conor McBride’s fun Agda code

Idris
It might not be fair to put Idris in a list of “proof assistants” since it really wants to be a proper programming language. It’s one of the first serious attempts at writing a programming language with dependent types for actual programming though.

Official site
Quick tutorial
A list of talks on Idris
David Christiansen’s cool talk

Twelf
Twelf is by far the simplest system in this list, it’s the absolute minimum a language can have and still be dependently typed. All of this makes it easy to pick up, but there are very few users and not a lot of introductory material which makes it a bit harder to get started with. It does scale up to serious use though.

Official site
Wiki Tutorials
My tutorial
The paper on LF, the underlying system of Twelf


Type Theory

The Works of Per Martin-Löf
Per Martin-Löf has contributed a ton to the current state of dependent type theory. So much so that it’s impossible to escape his influence. His papers on Martin-Löf Type Theory (he called it Intuitionistic Type Theory) are seminal.
If you’re confused by these papers read the book in the next entry and try again. The book doesn’t give you as good a feel for the various flavors of MLTT (which spun off into different areas of research) but is easier to follow.

1972
1979
1984

Programming In Martin-Löf’s Type Theory
It’s good to read the original papers and hear things from the horse’s mouth, but Martin-Löf is much smarter than us and it’s nice to read other people’s explanations of his material. A group of people at Chalmers have elaborated it into a book.

Online link

The Works of John Reynolds
John Reynolds’s works are similarly impressive and always a pleasure to read.

Types, Abstraction and Parametric Polymorphism (Parametricity for System F)
A Logic For Shared Mutable State
Course notes on separation logic
Course notes on denotational semantics

Computational Type Theory
While most dependent type theories (like the ones found in Coq, Agda, Idris…) are based on Martin-Löf’s later intensional type theories, computational type theory is different. It’s a direct descendant of his extensional type theory that has been heavily developed and forms the basis of NuPRL nowadays. The resources below describe the various parts of how CTT works.

Type Theory and its Meaning Explanations
A Non-Type-Theoretic Definition of Martin-Löf’s Types
Constructing a type system over operational semantics (Similar to the above, they’re helpful to read together)
Equality in Lazy Computation Systems (of general interest)
Naive Computational Type Theory
Innovations in CTT using NuPRL

Homotopy Type Theory
A new exciting branch of type theory. This exploits the connection between homotopy theory and type theory by treating types as spaces. It’s the subject of a lot of active research but has some really nice introductory resources even now.

The HoTT book
Student’s Notes on HoTT


Proof Theory

Frank Pfenning’s Lecture Notes
Over the years, Frank Pfenning has accumulated lecture notes that are nothing short of heroic. They’re wonderful to read and almost as good as being in one of his lectures.

Introductory Course
Linear Logic
Modal Logic


Category Theory
Learning category theory is necessary to understand some parts of type theory. If you decide to study categorical semantics, realizability, or domain theory eventually you’ll have to buckle down and learn a little at least. It’s actually really cool math so no harm done!

Category Theory for Computer Scientists
This is the absolute smallest introduction to category theory you can find that’s still useful for a computer scientist. It’s very light on what it demands for prior knowledge of pure math but doesn’t go into too much depth.

Early version available online
Dead-tree version

Category Theory
One of the better introductory books to category theory in my opinion. It’s notable in assuming relatively little mathematical background and for covering quite a lot of ground in a readable way.

Dead-tree version

Ed Morehouse’s Category Theory Lecture Notes
These lecture notes are another valuable piece of reading. They cover a lot of the same areas as “Category Theory” so they can help to reinforce what you learned there as well as giving you some of the author’s perspective on how to think about these things.

Online copy


Other Goodness

Gunter’s “Semantics of Programming Languages”
While I’m not as big a fan of some of the earlier chapters, the math presented in this book is absolutely top-notch and gives a good understanding of how some cool fields (like domain theory) work.

Dead-tree version

OPLSS
The Oregon Programming Languages Summer School is a two-week-long bootcamp on PLs held annually at the University of Oregon. It’s a wonderful event to attend but if you can’t make it they record all their lectures anyways! They’re taught by a variety of lecturers but they’re all world class researchers.

2012
2013
2014
2015



          
          



Coinduction in JonPRL for Low Low Prices
Danny Gratzer — Fri, 17 Jul 2015 00:00:00 UT

    Posted on July 17, 2015
    


    
    Tags: jonprl
    


So as a follow up to my prior tutorial on JonPRL I wanted to demonstrate a nice example of JonPRL being used to prove something

Interesting
Unreasonably difficult in Agda or the like
I think I’m asking to be shown up when I say stuff like this…

I would like to implement the conatural numbers in JonPRL but without a notion of general coinductive or even inductive types. Just the natural numbers. The fun bit is that we’re basically going to lift the definition of a coinductively defined set straight out of set theory into JonPRL!
Math Stuff
First, let’s go through some math. How can we formalize the notion of a coinductively defined type as we’re used to in programming languages? Recall that something is coinductively defined if it contains all terms so that we can eliminate the term according to the elimination form for our type. For example, Martin-Löf has proposed we view functions (Π-types) as coinductively defined. That is,
x : A ⊢ f(x) : B(x)
————————————————————
 f : Π x : A. B(x)
In particular, there’s no assertion that f needs to be a lambda, just that f(x) is defined and belongs to the right type. This view of “if we can use it, it’s in the type” applies to more than just functions. Let’s suppose we have a type with the following elimination form
L : List  M : A  x : Nat, y : List ⊢ N : A
——————————————————————————————————————
      case(L; M; x.y.N) : A
This is more familiar to Haskellers as
case L of
  [] -> M
  x :: y -> N
Now if we look at the coinductively defined type built from this elimination rule we have not finite lists, but streams! There’s nothing in this elimination rule that specifies that the list be finite in length for it to terminate. All we need to be able to do is evaluate the term to either a :: of a Nat and a List or nil. This means that
fix x. cons(0; x) : List
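Here’s a small Python sketch of that idea. The names nil, zeros and take are mine and purely illustrative. A List here is any thunk that, when run, produces either nil or a cons of a Nat and another List; nothing demands that the unfolding ever stops, only that each single observation does.

```python
def nil():
    """The empty list: running it yields the nil form."""
    return ("nil",)


def zeros():
    """fix x. cons(0; x): running it yields a cons whose tail is itself."""
    return ("cons", 0, zeros)


def take(n, lst):
    """Make at most n observations of lst, collecting the heads we see."""
    heads = []
    for _ in range(n):
        cell = lst()  # evaluate to the outermost form
        if cell[0] == "nil":
            break
        _, head, lst = cell
        heads.append(head)
    return heads
```

Every call to take terminates on zeros even though zeros has no last element, which is all the elimination rule ever asked for.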
Let’s now try to formalize this by describing what it means to build a coinductive type up as a set of terms. In particular the types we’re interested in here are algebraic ones, constructed from sums and products.
Now unfortunately I’m going to be a little handwavy. I’m going to act as if we’ve worked out a careful set theoretic semantics for this programming language (like the one that exists for MLTT). This means that all the equations you see here are across sets and that these sets contain programs so that ⊢ e : τ means that e ∈ τ where τ on the right is a set.
Well we start with some equation of the form
Φ = 1 + Φ
This particular equation is actually how we would go about defining the natural numbers. If I write it in a more Haskellish notation we’d have
data Φ = Zero | Succ Φ
Next, we transform this into a function. This step is a deliberate move so we can start applying the myriad tools we know of for handling this equation.
Φ(X) = 1 + X
We now want to find some X so that Φ(X) = X. If we can do this, then I claim that X is a solution to the equation given above since
X = Φ(X)
X = 1 + X
precisely mirrors the equation we had above. Such an X is called a “fixed point” of the function Φ. However, there’s a catch: there may well be more than one fixed point of a function! Which one do we choose? The key is that we want the coinductively defined version. Coinduction means that we should always be able to examine a term in our type and its outermost form should be 1 + ???. Okay, let’s optimistically start by saying that X is ⊤ (the collection of all terms).
Ah okay, this isn’t right. This works only so long as we don’t make any observations about a term we claim is in this type. The minute we pattern match, we might have found we claimed a function was in our type! I have not yet managed to pay my rent by saying “OK, here’s the check… but don’t try to use it and it’s rent”. So perhaps we should try something else. Okay, so let’s not say ⊤, let’s say
X = ⊤ ⋂ Φ(⊤)
Now, if e ∈ X, we know that e ∈ 1 + ???. This means that if we run e ∈ X, we’ll get the correct outermost form. However, this code is still potentially broken:
    case e of
      Inl _ -> ...
      Inr x -> case x of
                 Inl _ -> ...
                 Inr _ -> ...
This starts off as being well typed, but as we evaluate, it may actually become ill typed. If we claimed that this was a fixed point to our language, our language would be type-unsafe. This is an unappealing quality in a type theory.
Okay, so that didn’t work. What if we fixed this code by doing
X = ⊤ ⋂ Φ(⊤) ⋂ Φ(Φ(⊤))
Now this fixes the above code, but can you imagine a snippet of code where this still gets stuck? So each time we intersect X with Φ(X) we get a new type which behaves like the real fixed point so long as we only observe n + 1 times where X behaves like the fixed point for n observations. Well, we can only make finitely many observations so let’s just iterate such an intersection
X = ⋂ₙ Φⁿ(⊤)
So if e ∈ X, then no matter how many times we pattern match and examine the recursive component of e we know that it’s still in ⋂ₙ Φⁿ(⊤) and therefore still in X! In fact, it’s easy to prove that this is the case with two lemmas

If X ⊆ Y then Φ(X) ⊆ Φ(Y)
If I have a collection S of sets, then ⋂ Φ(S) = Φ(⋂ S) where we define Φ on a collection of sets by applying Φ to each component.

These two properties state the monotonicity and cocontinuity of Φ. In fact, cocontinuity should imply monotonicity (can you see how?). We then may show that
Φ(⋂ₙ Φⁿ(⊤)) = ⋂ₙ Φ(Φⁿ(⊤))
             = ⊤ ⋂ (⋂ₙ Φ(Φⁿ(⊤)))
             = ⋂ₙ Φⁿ(⊤)
As desired.
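If you’d like to watch the intersection converge, here’s a toy Python sketch. Everything in it is an assumption made for illustration: instead of all programs, ⊤ is a six element universe of made-up terms, and the table OBSERVE records what each one evaluates to. With a finite universe the chain ⊤ ⊇ Φ(⊤) ⊇ Φ(Φ(⊤)) ⊇ … must stabilize, so intersecting the chain up to that point is the same as intersecting all of it.

```python
# What each (hypothetical) term runs to: inl, inr of another term, or junk.
OBSERVE = {
    "zero": ("inl", None),
    "one": ("inr", "zero"),
    "omega": ("inr", "omega"),
    "bad": ("lam", None),          # a function pretending to be a number
    "sneaky": ("inr", "bad"),      # fine for one observation, then stuck
    "sneakier": ("inr", "sneaky"), # fine for two observations, then stuck
}
TOP = frozenset(OBSERVE)


def phi(xs):
    """Φ(X) = 1 + X: terms that run to inl, or to inr of something in X."""
    return frozenset(
        term
        for term, (tag, rest) in OBSERVE.items()
        if tag == "inl" or (tag == "inr" and rest in xs)
    )


def approximations():
    """The chain ⊤, Φ(⊤), Φ(Φ(⊤)), ... up to the point where it stops changing."""
    chain = [TOP]
    while phi(chain[-1]) != chain[-1]:
        chain.append(phi(chain[-1]))
    return chain


def greatest_fixed_point():
    """⋂ₙ Φⁿ(⊤), computed from the (finite) chain of approximations."""
    return frozenset.intersection(*approximations())


def least_fixed_point():
    """For contrast: iterate Φ upward from the empty set instead."""
    current = frozenset()
    while phi(current) != current:
        current = phi(current)
    return current
```

Each round of Φ throws out the terms that get stuck one observation deeper: first bad, then sneaky, then sneakier. What survives is zero, one and omega, and applying Φ to that set gives it straight back. Iterating up from the empty set also lands on a fixed point, but one without omega, which is exactly the “which fixed point do we choose?” distinction.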
The Code
Now that we have some idea of how to formalize coinduction, can we port this to JonPRL? Well, we have natural numbers and we can take the intersection of types… Seems like a start. Looking at that example, we first need to figure out what ⊤ corresponds to. It should include all programs, which sounds like the type base in JonPRL. However, it also should be the case that x = y ∈ ⊤ for all x and y. For that we need an interesting trick:
    Operator top : ().
    [top] =def= [isect(void; _.void)].
In prettier notation,
top ≙ ⋂ x : void. void
Now x ∈ top if x ∈ void for all _ ∈ void. Hey wait a minute… No such _ exists so the if is always satisfied vacuously. Ok, that’s good. Now x = y ∈ top if for all _ ∈ void, x = y ∈ void. Since no such _ exists again, all things are in fact equal in top. We can even prove this within JonPRL
    Theorem top-is-top :
      [isect(base; x.
       isect(base; y.
       =(x; y; top)))] {
      unfold ; auto
    }.
This proof is really just:

Unfold all the definitions.
Hey! There’s a x : void in my context! Tell me more about that.
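The vacuous-truth trick is easy to replay outside of JonPRL. Below is a rough Python sketch under a big simplifying assumption: a type is represented only by its equality relation (x is a member when x = x holds), plus a finite list of elements when we need to quantify over it. The class PerType and the functions here are hypothetical names for this illustration and say nothing about how JonPRL is implemented.

```python
class PerType:
    """A type presented by its equality; membership means being equal to yourself."""

    def __init__(self, equal, elements=None):
        self.equal = equal
        self.elements = elements  # finite enumeration, only needed for index types

    def member(self, x):
        return self.equal(x, x)


# void has no elements, so nothing is equal to anything in it.
void = PerType(equal=lambda x, y: False, elements=[])


def isect(index, family):
    """⋂ i : index. family(i): equal here means equal in every component."""
    return PerType(
        equal=lambda x, y: all(family(i).equal(x, y) for i in index.elements)
    )


# top ≙ ⋂ x : void. void
top = isect(void, lambda _: void)
```

Since void.elements is empty, the all(...) inside top never has anything to check, so any two things at all come out equal in top.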

Now the fact that x ∈ top is a trivial corollary since our theorem tells us that x = x ∈ top and the former is just sugar for the latter. With this defined, we can now write down a general operator for coinduction!
    Operator corec : (1).
    [corec(F)] =def= [isect(nat; n. natrec(n; top; _.x. so_apply(F;x)))].
To unpack this, corec takes one argument which binds one variable. We then intersect the type natrec(n; top; _.x.so_apply(F;x)) for all n ∈ nat. That natrec construct is really saying Fⁿ(⊤), it’s just a little obscured. Especially since we have to use so_apply, a sort of “meta-application” which lets us apply a term binding a variable to another term. This should look familiar, it’s just how we defined the fixed point of Φ!
For a fun demo, let’s define an F so that corec(F) will give us the conatural numbers. I know that the natural numbers come from the least fixed point of X ↦ 1 + X (because I said so above, so it must be so) so let’s define that.
    Operator conatF : (0).
    [conatF(X)] =def= [+(unit; X)].
This is just that X ↦ 1 + X I wrote above in JonPRL land instead of math notation. Next we need to actually define conatural numbers using corec.
    Operator conat : ().
    [conat] =def= [corec(R. conatF(R))].
Now I’ve defined this, but that’s no fun unless we can actually build some terms so that member(X; conat). Specifically I want to prove two things to start

member(czero; conat)
fun(member(M; conat); _.member(csucc(M); conat))

This states that conat is closed under some zero and successor operations. Now what should those operations be? Remember what I said before, that we had this correspondence?
X    ↦   1   +   X
Nat     Zero   Suc X
Now remember that conat is isect(nat; n....) and when constructing a member of isect we’re not allowed to mention that n in it (as opposed to fun where we do exactly that). So that means czero has to be a member of top and sum(unit; ...). The top bit is easy, everything is in top! That diagram above suggests inl of something in unit
    Operator czero : ().
    [czero] =def= [inl(<>)].
So now we want to prove that this is in fact in conat.
    Theorem zero-wf : [member(czero; conat)] {

    }.
Okay loading this into JonPRL gives
⊢ czero ∈ conat
From there we start by unfolding all the definitions
    {
       unfold 
    }
This gives us back the desugared goal
⊢ inl(<>) ∈ ⋂n ∈ nat. natrec(n; top; _.x.+(unit; x))
Next let’s apply all the obvious introductions so that we’re in a position to try to prove things
    unfold ; auto
This gives us back
1. [n] : nat
⊢ inl(<>) = inl(<>) ∈ natrec(n; top; _.x.+(unit; x))
Now we’re stuck. We want to show inl is in something, but we’re never going to be able to do that until we can reduce that natrec(n; top; _.x.+(unit; x)) to a canonical form. Since it’s stuck on n, let’s induct on that n. After that, let’s immediately reduce.
    {
      unfold ; auto; elim #1; reduce
    }
Now we have two cases, the base and inductive case.
1. [n] : nat
⊢ inl(<>) = inl(<>) ∈ top


1. [n] : nat
2. n' : nat
3. ih : inl(<>) = inl(<>) ∈ natrec(n'; top; _.x.+(unit; x))
⊢ inl(<>) = inl(<>) ∈ +(unit; natrec(n'; top; _.x.+(unit; x)))
Now that we have canonical terms on the right of the ∈, let’s let auto handle the rest.
    Theorem zero-wf : [member(czero; conat)] {
      unfold ; auto; elim #1; reduce; auto
    }.
So now we have proven that czero is in the correct type. Let’s figure out csucc? Going by our noses, inl corresponded to czero and our diagram says that inr should correspond to csucc. This gives us
    Operator csucc : (0).
    [csucc(M)] =def= [inr(M)].
Now let’s try to prove the corresponding theorem for csucc
    Theorem succ-wf : [isect(conat; x. member(csucc(x); conat))] {
    }.
Now we’re going to start off this proof like we did with our last one. Unfold everything, apply the introduction rules, and induct on n.
    {
      unfold ; auto; elim #2; reduce
    }
Like before, we now have two subgoals:
1. [x] : ⋂n ∈ nat. natrec(n; ⋂_ ∈ void. void; _.x.+(unit; x))
2. [n] : nat
⊢ inr(x) = inr(x) ∈ ⋂_ ∈ void. void


1. [x] : ⋂n ∈ nat. natrec(n; ⋂_ ∈ void. void; _.x.+(unit; x))
2. [n] : nat
3. n' : nat
4. ih : inr(x) = inr(x) ∈ natrec(n'; ⋂_ ∈ void. void; _.x.+(unit; x))
⊢ inr(x) = inr(x) ∈ +(unit; natrec(n'; ⋂_ ∈ void. void; _.x.+(unit; x)))
The first one looks pretty easy, that’s just foo ∈ top, I think auto should handle that.
    {
      unfold ; auto; elim #2; reduce;
      auto
    }
This just leaves one goal to prove
1. [x] : ⋂n ∈ nat. natrec(n; ⋂_ ∈ void. void; _.x.+(unit; x))
2. [n] : nat
3. n' : nat
4. ih : inr(x) = inr(x) ∈ natrec(n'; ⋂_ ∈ void. void; _.x.+(unit; x))
⊢ x = x ∈ natrec(n'; ⋂_ ∈ void. void; _.x.+(unit; x))
Now, as it turns out, this is nice and easy: look at what our first assumption says! Since x ∈ isect(nat; n.Foo) and our goal is to show that x ∈ Foo(n') this should be as easy as another call to elim.
    {
      unfold ; auto; elim #2; reduce;
      auto; elim #1 [n']; auto
    }
Note that the [n'] bit there lets us supply the term we wish to substitute for n while eliminating. This leaves us here:
1. [x] : ⋂n ∈ nat. natrec(n; ⋂_ ∈ void. void; _.x.+(unit; x))
2. [n] : nat
3. n' : nat
4. ih : inr(x) = inr(x) ∈ natrec(n'; ⋂_ ∈ void. void; _.x.+(unit; x))
5. y : natrec(n'; ⋂_ ∈ void. void; _.x.+(unit; x))
6. z : y = x ∈ natrec(n'; ⋂_ ∈ void. void; _.x.+(unit; x))
⊢ x = x ∈ natrec(n'; ⋂_ ∈ void. void; _.x.+(unit; x))
Now a small hiccup: we know that y = x is in the right type, so x = x is in the right type. But how do we prove this? The answer is to replace all occurrences of x with y. This is written
    {
      unfold ; auto; elim #2; reduce;
      auto; elim #1 [n']; auto;
      hyp-subst ← #6 [h.=(h; h; natrec(n'; isect(void; _.void); _.x.+(unit;x)))];
    }
There are three arguments here, a direction to substitute, an index telling us which hypothesis to use as the equality to substitute with and finally, a term [h. ...]. The idea with this term is that each occurrence of h tells us where we want to substitute. In our case we used h in two places: both where we use x, and the direction says to replace the right hand side of the equality with the left side of the equality.
Actually running this gives
1. [x] : ⋂n ∈ nat. natrec(n; ⋂_ ∈ void. void; _.x.+(unit; x))
2. [n] : nat
3. n' : nat
4. ih : inr(x) = inr(x) ∈ natrec(n'; ⋂_ ∈ void. void; _.x.+(unit; x))
5. y : natrec(n'; ⋂_ ∈ void. void; _.x.+(unit; x))
6. z : y = x ∈ natrec(n'; ⋂_ ∈ void. void; _.x.+(unit; x))
⊢ y = y ∈ natrec(n'; ⋂_ ∈ void. void; _.x.+(unit; x))


1. [x] : ⋂n ∈ nat. natrec(n; ⋂_ ∈ void. void; _.x.+(unit; x))
2. [n] : nat
3. n' : nat
4. ih : inr(x) = inr(x) ∈ natrec(n'; ⋂_ ∈ void. void; _.x.+(unit; x))
5. y : natrec(n'; ⋂_ ∈ void. void; _.x.+(unit; x))
6. z : y = x ∈ natrec(n'; ⋂_ ∈ void. void; _.x.+(unit; x))
7. h : natrec(n'; ⋂_ ∈ void. void; _.x.+(unit; x))
⊢ h = h ∈ natrec(n'; ⋂_ ∈ void. void; _.x.+(unit; x)) ∈ U{i}
The first goal is the result of our substitution and it’s trivial; auto will handle this now. The second goal is a little strange. It basically says that the result of our substitution is still a well-formed type. JonPRL’s thought process is something like this

You said you were substituting for things of this type here. However, I know that just because x : A doesn’t mean we’re using it in all those spots as if it has type A. What if you substitute things equal in top (always equal) for when they’re being used as functions! This would let us prove that zero ∈ Π(...) or something silly. Convince me that when we fill in those holes with something of the type you mentioned, the goal is still a type (in a universe).

However, these well-formedness goals usually go away with auto. This completes our theorem in fact.
    Theorem succ-wf : [isect(conat; x. member(csucc(x); conat))] {
      unfold ; auto; elim #2; reduce;
      auto; elim #1 [n']; auto;
      hyp-subst ← #6 [h.=(h; h; natrec(n'; isect(void; _.void); _.x.+(unit;x)))];
      auto
    }.
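As a sanity check on what we just proved, here’s a Python sketch of the same definitions. It rests on assumptions of mine: a type is modelled as a membership test, a term is a thunk that runs to its outermost form, and the helper corec_approx (a made-up name) only builds the n-th component Fⁿ(⊤) of the intersection. We can sample finitely many n this way, which is evidence, not a proof.

```python
def natrec(n, base, step):
    """natrec(n; base; m.x.step): apply step n times, starting from base."""
    result = base
    for m in range(n):
        result = step(m, result)
    return result


def top(_term):
    """Everything is a member of top."""
    return True


def conatF(X):
    """+(unit; X): terms running to inl(<>) or to inr(t) with t in X."""
    def member(term):
        value = term()  # run the term to its outermost form
        if value[0] == "inl":
            return value[1] == ()
        if value[0] == "inr":
            return X(value[1])
        return False
    return member


def corec_approx(F, n):
    """The n-th component of corec(F): F applied n times to top."""
    return natrec(n, top, lambda _m, X: F(X))


def czero():
    return ("inl", ())


def csucc(m):
    return lambda: ("inr", m)


def bogus():
    """Not a sum at all, so it should only survive zero observations."""
    return ("lam", None)
```

Checking czero or csucc(csucc(czero)) against corec_approx(conatF, n) succeeds for every n you try, mirroring the two theorems. csucc(bogus), on the other hand, passes for n = 0 and n = 1 and fails from n = 2 on, which is why membership in conat has to hold for all n at once.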
Okay so we now have something kind of number-ish, with zero and successor. But in order to demonstrate that this is the conatural numbers there’s one big piece missing.
The Clincher
The thing that distinguishes the conatural numbers from the inductive variety is the fact that we include infinite terms. In particular, I want to show that Ω (infinitely many csuccs) belongs in our type.
In order to say Ω in JonPRL we need recursion. Specifically, we want to write
    [omega] =def= [csucc(omega)].
But this doesn’t work! Instead, we’ll define the Y combinator and say
    Operator omega : ().
    [omega] =def= [Y(x.csucc(x))].
So what should this Y be? Well the standard definition of Y is
Y(F) = (λ x. F (x x)) (λ x. F (x x))
Excitingly, we can just say that in JonPRL; remember that we have a full untyped computation system after all!
    Operator Y : (1).
    [Y(f)] =def= [ap(lam(x.so_apply(f;ap(x;x)));lam(x.so_apply(f;ap(x;x))))].
This is more or less a direct translation, we occasionally use so_apply for the reasons I explained above. As a fun thing, try to prove that this is a fixed point, e.g. that
    isect(U{i}; A. isect(fun(A; _.A); f . ceq(Y(f); ap(f; Y(f)))))
The complete proof is in the JonPRL repo under example/computational-equality.jonprl. Anyways, we now want to prove
    Theorem omega-wf : [member(omega; conat)] {

    }.
Let’s start with the same prelude
    {
      *{unfold }; auto; elim #1;
    }
Two goals just like before
1. [n] : nat
⊢ (λx. inr(x[x]))[λx. inr(x[x])] = (λx. inr(x[x]))[λx. inr(x[x])] ∈ natrec(zero; ⋂_ ∈ void. void; _.x.+(unit; x))


1. [n] : nat
2. n' : nat
3. ih : (λx. inr(x[x]))[λx. inr(x[x])] = (λx. inr(x[x]))[λx. inr(x[x])] ∈ natrec(n'; ⋂_ ∈ void. void; _.x.+(unit; x))
⊢ (λx. inr(x[x]))[λx. inr(x[x])] = (λx. inr(x[x]))[λx. inr(x[x])] ∈ natrec(succ(n'); ⋂_ ∈ void. void; _.x.+(unit; x))
The goals start to get fun now. I’ve also carefully avoided using reduce like we did before. The reason is simple, if we reduce in the second goal, our ih will reduce as well and we’ll end up completely stuck in a few steps (try it and see). So instead we’re going to finesse it a bit.
First let’s take care of that first goal. We can tell JonPRL to apply some tactics to just the first goal with the focus tactic
    {
      *{unfold }; auto; elim #1;
      focus 0 #{reduce 1; auto};
    }
Here reduce 1 says “reduce by only one step” since really omega will diverge if we just let it run. This takes care of the first goal leaving just
1. [n] : nat
2. n' : nat
3. ih : (λx. inr(x[x]))[λx. inr(x[x])] = (λx. inr(x[x]))[λx. inr(x[x])] ∈ natrec(n'; ⋂_ ∈ void. void; _.x.+(unit; x))
⊢ (λx. inr(x[x]))[λx. inr(x[x])] = (λx. inr(x[x]))[λx. inr(x[x])] ∈ natrec(succ(n'); ⋂_ ∈ void. void; _.x.+(unit; x))
Here’s the proof sketch for what’s left

Reduce the goal by one step but carefully avoid touching the ih
Step everything by one
The result follows by intro and assumption

You can stop here or you can see how we actually do this. It’s somewhat tricky. The basic complication is that there’s no built-in tactic for the first step. Instead we use a new type called ceq which is “computational equality”. It ranges between two terms, no types involved here. It’s designed to work thusly: if ceq(a; b), then either

a and b run to weak-head normal form (canonical verifications) with the same outermost form, and all the inner operands are ceq
a and b both diverge

So if ceq(a; b) then a and b “run the same”. What’s a really cool upshot of this is that if ceq(a; b) then if a = a ∈ A and b = b ∈ A then a = b ∈ A! ceq is the strictest equality in our system and we can rewrite with it absolutely everywhere without regards to types. Proving this requires showing the above definition forms a congruence (two things are related if their subcomponents are related).
This was a big deal because until Doug Howe came up with this proof NuPRL/CTT was awash with rules trying to specify this idea chunk by chunk and showing those rules were valid wasn’t trivial. Actually, you should read that paper: it’s 6 pages and the proof concept comes up a lot.
So, in order to do the first step we’re going to say “the goal and the goal if we step it twice are computationally equal” and then use this fact to substitute for the stepped version. The tactic to use here is called csubst. It takes two arguments

The ceq we’re asserting
Another h. term to guide the rewrite

    {
      *{unfold }; auto; elim #1;
      focus 0 #{reduce 1; auto};
      csubst [ceq(ap(lam(x.inr(ap(x;x))); lam(x.inr(ap(x;x))));
                  inr(ap(lam(x.inr(ap(x;x))); lam(x.inr(ap(x;x))))))]
         [h.=(h;h; natrec(succ(n'); isect(void; _. void); _.x.+(unit; x)))];
    }
This leaves us with two goals
1. [n] : nat
2. n' : nat
3. ih : (λx. inr(x[x]))[λx. inr(x[x])] = (λx. inr(x[x]))[λx. inr(x[x])] ∈ natrec(n'; ⋂_ ∈ void. void; _.x.+(unit; x))
⊢ ceq((λx. inr(x[x]))[λx. inr(x[x])]; inr((λx. inr(x[x]))[λx. inr(x[x])]))


1. [n] : nat
2. n' : nat
3. ih : (λx. inr(x[x]))[λx. inr(x[x])] = (λx. inr(x[x]))[λx. inr(x[x])] ∈ natrec(n'; ⋂_ ∈ void. void; _.x.+(unit; x))
⊢ inr((λx. inr(x[x]))[λx. inr(x[x])]) = inr((λx. inr(x[x]))[λx. inr(x[x])]) ∈ natrec(succ(n'); ⋂_ ∈ void. void; _.x.+(unit; x))
Now we have two goals. The first is that ceq proof obligation. The second is our goal post-substitution. The first one can easily be dispatched by step. step lets us prove ceq by saying

ceq(a; b) holds if
a steps to a' in one step
ceq(a'; b)

This will leave us with ceq(X; X) which auto can handle. The second goal is… massive. But also simple. We just need to step it once and we suddenly have inr(X) = inr(X) ∈ sum(_; A) where X = X ∈ A is our assumption! So that can be handled by auto as well. That means we need to run step on the first goal, reduce 1 on the second, and auto on both.
    Theorem omega-wf : [member(omega; conat)] {
      unfolds; unfold ; auto; elim #1;
      focus 0 #{reduce 1; auto};
      csubst [ceq(ap(lam(x.inr(ap(x;x))); lam(x.inr(ap(x;x))));
                  inr(ap(lam(x.inr(ap(x;x))); lam(x.inr(ap(x;x))))))]
             [h.=(h;h; natrec(succ(n'); isect(void; _. void); _.x.+(unit; x)))];
      [step, reduce 1]; auto
    }.
And we’ve just proved that omega ∈ conat, a term that is certainly the canonical (heh) example of coinduction in my mind.
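The computational fact that the csubst step relies on, that omega takes one step to inr(omega), can be checked outside of JonPRL. Here is a minimal sketch in Python (not JonPRL) of one-step reduction for just the fragment of the term language used above. The tuple encoding and the names subst and step are assumptions made for illustration, and the substitution is naive, which is only safe here because the terms involved are closed.

```python
# Terms are nested tuples: ("var", x), ("lam", x, body), ("ap", f, a), ("inr", e).
def subst(term, name, value):
    """Replace free occurrences of `name` in `term` by the closed term `value`."""
    tag = term[0]
    if tag == "var":
        return value if term[1] == name else term
    if tag == "lam":
        # A binder with the same name shadows `name`, so stop there.
        if term[1] == name:
            return term
        return ("lam", term[1], subst(term[2], name, value))
    if tag == "ap":
        return ("ap", subst(term[1], name, value), subst(term[2], name, value))
    if tag == "inr":
        return ("inr", subst(term[1], name, value))
    raise ValueError(f"unknown term: {term!r}")

def step(term):
    """Take one head-reduction step; return None if the term is canonical or stuck."""
    if term[0] == "ap":
        fn, arg = term[1], term[2]
        if fn[0] == "lam":
            return subst(fn[2], fn[1], arg)  # beta reduction
        reduced = step(fn)
        return None if reduced is None else ("ap", reduced, arg)
    return None  # lam and inr are canonical, a bare var is stuck

# w = lam(x. inr(ap(x; x))) and omega = ap(w; w)
w = ("lam", "x", ("inr", ("ap", ("var", "x"), ("var", "x"))))
omega = ("ap", w, w)
```

In this model step(omega) is ("inr", omega), which is the shape of the ceq we asserted: the left side steps once to the right side, and after that there is nothing left to do but reflexivity.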
Wrap Up
Whew, I actually meant for this to be a short blog post but that didn’t work out so well. Hopefully this illustrated a cool trick in computer science (intersect your way to coinduction) and in JonPRL.
Funnily enough before this was written no one had actually realized you could do coinduction in JonPRL. I’m still somewhat taken with the fact that a very minimal proof assistant like JonPRL is powerful enough to let you do this by giving you such general purpose tools as family intersection and a full computation system to work with. Okay that’s enough marketing from me.
Cheers.
Huge thanks to Jon Sterling for the idea on how to write this code and then touching up the results.

          
          



A Basic Tutorial on JonPRL
Danny Gratzer — Mon, 06 Jul 2015 00:00:00 UT

Posted on July 6, 2015

Tags: jonprl, types
    


JonPRL switched to ASCII syntax so I’ve updated this post accordingly
I was just over at OPLSS for the last two weeks. While there I finally met Jon Sterling in person. What was particularly fun is that for the last few months he’s been creating a proof assistant called JonPRL in the spirit of Nuprl. As it turns out, it’s quite a fun project to work on so I’ve implemented a few features in it over the last couple of days and learned more or less how it works.
Since there’s basically no documentation on it besides the readme and of course the compiler, I thought I’d write down some of the stuff I’ve learned. There’s also a completely separate post on the underlying type theory for Nuprl and JonPRL that’s very interesting in its own right but this isn’t it. Hopefully I’ll get around to scribbling something about that because it’s really quite clever.
Here’s the layout of this tutorial

First we start with a whirlwind tutorial. I’ll introduce the basic syntax and we’ll go through some simple proofs together
I’ll then dive into some of the rationale behind JonPRL’s theory. This should help you understand why some things work how they do
I’ll show off a few of JonPRL’s more unique features and (hopefully) interest you enough to start fiddling on your own

Getting JonPRL
JonPRL is pretty easy to build and install and having it will make this post more enjoyable. You’ll need smlnj since JonPRL is currently written in SML. This is available in most package managers (including homebrew) otherwise just grab the binary from the website. After this the following commands should get you a working executable

git clone ssh://git@github.com/jonsterling/jonprl
cd jonprl
git submodule init
git submodule update
make (This is excitingly fast to run)
make test (If you’re doubtful)

You should now have an executable called jonprl in the bin folder. There’s no prelude for jonprl so that’s it. You can now just feed it files like any reasonable compiler and watch it spew (currently difficult-to-decipher) output at you.
If you’re interested in actually writing JonPRL code, you should probably install David Christiansen’s Emacs mode. Now that we’re up and running, let’s actually figure out how the language works.
The Different Languages in JonPRL
JonPRL is really composed of 3 different sorts of mini-languages

The term language
The tactic language
The language of commands to the proof assistant

In Coq, these roughly correspond to Gallina, Ltac, and Vernacular respectively.
The Term Language
The term language is an untyped language that contains a number of constructs that should be familiar to people who have been exposed to dependent types before. The actual concrete syntax is composed of 3 basic forms:

We can apply an “operator” (I’ll clarify this in a moment) with op(arg1; arg2; arg3).
We have variables with x.
And we have abstraction with x.e. JonPRL has one construct for binding x.e built into its syntax, that things like lam or fun are built off of.

An operator in this context is really anything you can imagine having a node in an AST for a language. So something like lam is an operator, as is if or pair (corresponding to (,) in Haskell). Each operator has a piece of information associated with it, called its arity. This arity tells you how many arguments an operator takes and how many variables x.y.z. ... each is allowed to bind. For example, lam has an arity written (1) since it takes 1 argument which binds 1 variable. Application (ap) has the arity (0; 0). It takes 2 arguments, neither of which binds a variable.
So as mentioned we have functions and application. This means we could write (λx.x) y in JonPRL as ap(lam(x.x); y). The type of functions is written with fun. Remember that JonPRL’s language has a notion of dependence so the arity is (0; 1). The construct fun(A; x.B) corresponds to (x : A) → B in Agda or forall (x : A), B in Coq.
We also have dependent sums (prods). In Agda you would write (M , N) to introduce a pair and Σ A λ x → B to type it. In JonPRL you have pair(M; N) and prod(A; x.B). To inspect a pair we have spread, which lets us eliminate a pair. It has arity (0; 2): you give it a pair in the first spot and x.y.e in the second. It’ll then replace x with the first component and y with the second. Can you think of how to write fst and snd with this?
There’s sums, so inl(M), inr(N) and +(A; B) corresponds to Left, Right, and Either in Haskell. For case analysis there’s decide which has the arity (0; 1; 1). You should read decide(M; x.N; y.P) as something like
    case M of
      Left x -> N
      Right y -> P
In addition we have unit and <> (pronounced axe for axiom usually). Neither of these takes any arguments so we write them just as I have above. They correspond to Haskell’s type- and value-level () respectively. Finally there’s void which is sometimes called false or empty in theorem prover land.
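Since arities carry all of the structure of the term language, it can help to see them checked mechanically. The following is a rough sketch in Python rather than anything taken from JonPRL’s implementation. The ARITY table copies the arities stated above; the entries for pair, prod, inl, inr, +, unit, <> and void are not stated in this post and are inferred from how those operators are written. The helpers var, op, bind and well_formed are hypothetical names.

```python
# Arity of each operator: one entry per argument, giving how many variables it binds.
ARITY = {
    "lam": (1,), "ap": (0, 0), "fun": (0, 1),
    "pair": (0, 0), "prod": (0, 1), "spread": (0, 2),
    "inl": (0,), "inr": (0,), "+": (0, 0), "decide": (0, 1, 1),
    "unit": (), "<>": (), "void": (),
}

def var(name):
    return ("var", name)

def bind(*names_then_body):
    """bind("x", "y", e) stands for the abstraction x.y.e; bind(e) binds nothing."""
    *names, body = names_then_body
    return (tuple(names), body)

def op(name, *args):
    """Build op(arg1; ...; argn); every argument is made with bind."""
    return ("op", name, args)

def well_formed(term):
    """Check that every operator is used according to its arity."""
    if term[0] == "var":
        return True
    _, name, args = term
    arity = ARITY.get(name)
    if arity is None or len(arity) != len(args):
        return False
    return all(len(names) == n and well_formed(body)
               for n, (names, body) in zip(arity, args))

# ap(lam(x.x); y)
identity_applied = op("ap", bind(op("lam", bind("x", var("x")))), bind(var("y")))
# lam(x) with no binder is ill-formed, because lam expects one bound variable
bad_lam = op("lam", bind(var("x")))
# spread(p; x.y.x)
first = op("spread", bind(var("p")), bind("x", "y", var("x")))
```

Here well_formed accepts identity_applied and first but rejects bad_lam. Notice that nothing in this check knows anything about types; it only counts arguments and binders.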
You’ll notice that I presented a bunch of types as if they were normal terms in this section. That’s because in this untyped computation system, types are literally just terms. There’s no typing relation to distinguish them yet so they just float around exactly as if they were lam or something! I call them types because I’m thinking of later when we have a typing relation built on top of this system but for now they are really just terms. It was still a little confusing for me to see fun(unit; _.unit) in a language without types, so I wanted to make this explicit.
Now we can introduce some more exotic terms. Later, we’re going to construct some rules around them that are going to make them behave the way we might expect but for now, they are just suggestively named constants.

U{i}, the ith level universe used to classify all types that can be built using types other than U{i} or higher. It’s closed under terms like fun and it contains all the types of smaller universes
=(0; 0; 0) this is equality between two terms at a type. It’s a proposition that’s going to precisely mirror what’s going on later in the type theory with the equality judgment
member(0; 0) this is just like = but internalizes membership in a type into the system. Remember that normally “This has that type” is a judgment but with this term we’re going to have a propositional counterpart to use in theorems.

In particular it’s important to distinguish the difference between ∈ the judgment and member the term. There’s nothing inherent in member above that makes it behave like a typing relation as you might expect. It’s on equal footing with flibbertyjibberty(0; 0; 0).
This term language contains the full untyped lambda calculus so we can write all sorts of fun programs like
    lam(f.ap(lam(x.ap(f;(ap(x;x)))); lam(x.ap(f;(ap(x;x))))))
which is just the Y combinator. In particular this means that there’s no reason that every term in this language should normalize to a value. There are plenty of terms in here that diverge and in principle, there’s nothing that rules out them doing even stranger things than that. We really only depend on them being deterministic, that e ⇒ v and e ⇒ v' implies that v = v'.
Tactics
The other big language in JonPRL is the language of tactics. Luckily, this is very familiar territory if you’re a Coq user. Unluckily, if you’ve never heard of Coq’s tactic mechanism this will seem completely alien. As a quick high level idea for what tactics are:
When we’re proving something in a proof assistant we have to deal with a lot of boring mechanical details. For example, when proving A → B → A I have to describe that I want to introduce the A and the B into my context, then I have to suggest using that A in the context as a solution to the goal. Bleh. All of that is pretty obvious so let’s just get the computer to do it! In fact, we can build up a DSL of composable “proof procedures” or tactics to modify a particular goal we’re trying to prove so that we don’t have to think so much about the low level details of the proof being generated. In the end this DSL will generate a proof term (or derivation in JonPRL) and we’ll check that so we never have to trust the actual tactics to be sound.
In Coq this is used to great effect. In particular see Adam Chlipala’s book to see incredibly complex theorems with one-line proofs thanks to tactics.
In JonPRL the tactic system works by modifying a sequent of the form H ⊢ A (a goal). Each time we run a tactic we get back a list of new goals to prove until eventually we get to trivial goals which produce no new subgoals. This means that when trying to prove a theorem in the tactic language we never actually see the resulting evidence generated by our proof. We just see this list of H ⊢ As to prove and we do so with tactics.
The tactic system is quite simple, to start we have a number of basic tactics which are useful no matter what goal you’re attempting to prove

id a tactic which does nothing
t1; t2 this runs the t1 tactic and runs t2 on any resulting subgoals
*{t} this runs t as long as t does something to the goal. If t ever fails for whatever reason it merely stops running, it doesn’t fail itself
?{t} tries to run t once. If t fails nothing happens
!{t} runs t and if t does anything besides complete the proof it fails. This means that !{id} for example will always fail.
t1 | t2 runs t1 and if it fails it runs t2. Only one of the effects for t1 and t2 will be shown.
t; [t1, ..., tn] first runs t and then runs tactic ti on the ith subgoal generated by t
trace "some words" will print some words to standard out. This is useful when trying to figure out why things haven’t gone your way.
fail is the opposite of id, it just fails. This is actually quite useful for forcing backtracking and one could probably implement a makeshift !{} as t; fail.
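To make the list above a bit more concrete, here is a small model of these combinators in Python. This is a sketch under some loud assumptions, not how JonPRL implements them: a tactic is modelled as a function from a goal to a list of subgoals, failure is the hypothetical exception TacticFailure, and there is no backtracking beyond what orelse does. The names id_tac, then, orelse, attempt, repeat, complete and countdown are all made up for the example.

```python
class TacticFailure(Exception):
    """Raised when a tactic does not apply to a goal."""

def id_tac(goal):
    """id: do nothing, leaving the goal as the only subgoal."""
    return [goal]

def fail(goal):
    """fail: never applies."""
    raise TacticFailure("fail")

def then(t1, t2):
    """t1; t2: run t1, then run t2 on every resulting subgoal."""
    def tactic(goal):
        return [g2 for g1 in t1(goal) for g2 in t2(g1)]
    return tactic

def orelse(t1, t2):
    """t1 | t2: run t1 and fall back to t2 if t1 fails."""
    def tactic(goal):
        try:
            return t1(goal)
        except TacticFailure:
            return t2(goal)
    return tactic

def attempt(t):
    """?{t}: run t once; if it fails nothing happens."""
    return orelse(t, id_tac)

def repeat(t):
    """*{t}: keep running t while it changes the goal; never fails itself."""
    def tactic(goal):
        try:
            subgoals = t(goal)
        except TacticFailure:
            return [goal]
        if subgoals == [goal]:
            return [goal]  # t made no progress, so stop
        return [g2 for g1 in subgoals for g2 in tactic(g1)]
    return tactic

def complete(t):
    """!{t}: run t and fail unless it finishes the proof (no subgoals left)."""
    def tactic(goal):
        if t(goal):
            raise TacticFailure("subgoals remain")
        return []
    return tactic

def countdown(goal):
    """Toy tactic on integer goals: 0 is trivial, n > 0 reduces to n - 1."""
    if goal < 0:
        raise TacticFailure("negative goal")
    return [] if goal == 0 else [goal - 1]
```

In this model repeat(countdown)(3) returns the empty list of subgoals, complete(id_tac) fails on every goal just like !{id}, and then(countdown, fail) succeeds exactly when countdown leaves no subgoals, which is the makeshift !{} mentioned above.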

It’s helpful to see this as a sort of tree, a tactic takes one goal to a list of subgoals to prove so we can imagine t as this part of a tree
      H
      |
———–————————— (t)
H'  H''  H'''
If we have some tactic t2 then t; t2 will run t and then run t2 on H', H'', and H'''. Instead we could have t; [t1, t2, t3] then we’ll run t and (assuming it succeeds) we’ll run t1 on H', t2 on H'', and t3 on H'''. This is actually how things work under the hood, composable fragments of trees :)
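The per-subgoal form can be sketched the same way. Again this is a Python model with assumed names (then_each, done, split and TacticFailure are hypothetical), where a tactic maps a goal to a list of subgoals. Treating a mismatch between the number of subgoals and the number of tactics as a failure is my assumption, not something I have checked against JonPRL.

```python
class TacticFailure(Exception):
    """Raised when a tactic does not apply to a goal."""

def then_each(t, tactics):
    """t; [t1, ..., tn]: run t, then run the ith tactic on the ith subgoal."""
    def tactic(goal):
        subgoals = t(goal)
        if len(subgoals) != len(tactics):
            # Assumption: a length mismatch counts as a failure.
            raise TacticFailure("wrong number of tactics")
        return [g for sub, ti in zip(subgoals, tactics) for g in ti(sub)]
    return tactic

def id_tac(goal):
    """Leave the goal alone."""
    return [goal]

def done(goal):
    """Close any goal, standing in for something like auto on an easy goal."""
    return []

def split(goal):
    """The tactic from the picture: H becomes H', H'' and H'''."""
    return [goal + "'", goal + "''", goal + "'''"]

# Close the first and third subgoals and keep working on the second.
focus_second = then_each(split, [done, id_tac, done])
```

With these definitions focus_second("H") leaves only H'' behind, which is the shape of the [id, auto] idiom: every position does its own thing and the results are glued back together in order.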
Now those give us a sort of bedrock for building up scripts of tactics. We also have a bunch of tactics that actually let us manipulate things we’re trying to prove. The 4 big ones to be aware of are

intro
elim #NUM
eq-cd
mem-cd

The basic idea is that intro modifies the A part of the goal. If we’re looking at a function, so something like H ⊢ fun(A; x.B), this will move that A into the context, leaving us with H, x : A ⊢ B.
If you’re familiar with sequent calculus intro runs the appropriate right rule for the goal. If you’re not familiar with sequent calculus intro looks at the outermost operator of the A and runs a rule that applies when that operator is to the right of the ⊢.
Now one tricky case is what should intro do if you’re looking at a prod? Well now things get a bit dicey. We might expect to get two subgoals if we run intro on H ⊢ prod(A; x.B), one which proves H ⊢ A and one which proves H ⊢ B or something, but what about the fact that x.B depends on the underlying realizer (that’s the program extracted from the proof) of H ⊢ A? Further, Nuprl and JonPRL are based around extract-style proof systems. This means that a goal shouldn’t depend on the particular piece of evidence proving another goal. So instead we have to tell intro up front what we want the evidence for H ⊢ A to be so that the H ⊢ B section may use it.
To do this we just give intro an argument. For example say we’re proving that · ⊢ prod(unit; x.unit), we run intro [<>] which gives us two subgoals · ⊢ member(<>; unit) and · ⊢ unit. Here the [] lets us denote the realizer we’re passing to intro. In general any term arguments to a tactic will be wrapped in []s. So the first goal says “OK, you said that this was your realizer for unit, but is it actually a realizer for unit?” and the second goal substitutes the given realizer into the second argument of prod, x.unit, and asks us to prove that. Notice how here we have to prove member(<>; unit)? This is where that weird member type comes in handy. It lets us sort of play type checker and guide JonPRL through the process of type checking. This is actually very crucial since type checking in Nuprl and JonPRL is undecidable.
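Here is a rough Python model of what intro does to a sequent in these two cases. It is only a sketch: sequents are (hypotheses, goal) pairs, terms are strings or tuples in an encoding I made up, substitution is naive (no capture avoidance), and the extra well-formedness subgoal that intro produces for fun is left out. The names subst and intro are hypothetical and not JonPRL internals.

```python
def subst(term, name, value):
    """Substitute `value` for `name` in `term` (naive: assumes no variable capture)."""
    if isinstance(term, str):
        return value if term == name else term
    tag = term[0]
    if tag in ("fun", "prod"):
        _, dom, bound, body = term
        # The bound variable shadows `name` inside the body.
        new_body = body if bound == name else subst(body, name, value)
        return (tag, subst(dom, name, value), bound, new_body)
    return (tag,) + tuple(subst(t, name, value) for t in term[1:])

def intro(sequent, realizer=None):
    """Model of intro on goals of the form fun(A; x.B) and prod(A; x.B)."""
    hyps, goal = sequent
    tag = goal[0] if isinstance(goal, tuple) else None
    if tag == "fun":
        _, dom, bound, body = goal
        # H ⊢ fun(A; x.B) becomes H, x : A ⊢ B (typing subgoal for A omitted).
        return [(hyps + ((bound, dom),), body)]
    if tag == "prod":
        if realizer is None:
            raise ValueError("intro on prod needs a realizer, as in intro [<>]")
        _, dom, bound, body = goal
        # First check the realizer, then prove B with the realizer substituted in.
        return [(hyps, ("member", realizer, dom)),
                (hyps, subst(body, bound, realizer))]
    raise ValueError("intro does not apply to this goal")

# · ⊢ prod(unit; x.unit) with intro [<>]
goal = ((), ("prod", "unit", "x", "unit"))
subgoals = intro(goal, realizer="<>")
```

Running this gives back the two subgoals from the text, · ⊢ member(<>; unit) and · ⊢ unit. If the second component actually mentioned x, say prod(unit; x.=(x; x; unit)), the second subgoal would have <> substituted for x.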
Now how do we actually go about proving member(<>; unit)? Well here mem-cd has got our back. This tactic transforms member(A; B) into the equivalent form =(A; A; B). In JonPRL and Nuprl, types are given meaning by how we interpret the equality of their members. In other words, if you give me a type you have to say

What canonical terms are in that type
What it means for two canonical members to be equal

Long ago, Stuart Allen realized we could combine the two by specifying a partial equivalence relation for a type. In this case rather than having a separate notion of membership we check to see if something is equal to itself under the PER because when it is that PER behaves like a normal equivalence relation! So in JonPRL member is actually just a very thin layer of sugar around = which is really the core defining notion of typehood. To handle = we have eq-cd which does clever things to handle most of the obvious cases of equality.
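The PER idea is easy to play with in a toy setting. Below is a minimal sketch in Python, where a type is modelled as nothing more than a two-argument predicate on values. Everything here is an illustration rather than JonPRL’s actual semantics: the two-element type two, the finite list of sample inputs used to compare functions, and the names fun, member and flip are all assumptions of the example.

```python
def unit(a, b):
    """a = b ∈ unit holds only when both are the canonical value <>."""
    return a == "<>" and b == "<>"

def two(a, b):
    """A hypothetical two-element type with canonical values "tt" and "ff"."""
    return a in ("tt", "ff") and a == b

def fun(dom_values, dom, cod):
    """f = g ∈ fun(A; _.B): equal inputs are sent to equal outputs.

    `dom_values` enumerates candidate inputs so that the check is finite.
    """
    def per(f, g):
        return all(cod(f(a), g(b))
                   for a in dom_values for b in dom_values if dom(a, b))
    return per

def member(a, per):
    """member(a; A) is just sugar for =(a; a; A)."""
    return per(a, a)

def flip(x):
    return "ff" if x == "tt" else "tt"

# Candidate inputs deliberately include "<>", which is not a member of two.
two_to_two = fun(["tt", "ff", "<>"], two, two)
```

In this model member("<>", unit) holds while member("tt", unit) does not, so the relation really is partial: things outside the type are not even related to themselves. The identity function and lambda x: flip(flip(x)) come out equal in two_to_two even though they are different programs.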
Finally, we have elim. Just like intro lets us simplify things on the right of the ⊢, elim lets us eliminate something on the left. So we tell elim to “eliminate” the nth item in the context (they’re numbered when JonPRL prints them) with elim #n.
Just like with anything, it’s hard to learn all the tactics without experimenting (though a complete list can be found with jonprl --list-tactics). Let’s go look at the command language so we can actually prove some theorems.
Commands
So in JonPRL there are only 4 commands you can write at the top level

Operator
[oper] =def= [term] (A definition)
Tactic
Theorem

The first three of these let us customize and extend the basic suite of operators and tactics JonPRL comes with. The last actually lets us state and prove theorems.
The best way to see these things is by example so we’re going to build up a small development in JonPRL. We’re going to show that products form a monoid with unit up to some logical equivalence. There are a lot of proofs involved here

1. prod(unit; A) entails A
2. prod(A; unit) entails A
3. A entails prod(unit; A)
4. A entails prod(A; unit)
5. prod(A; prod(B; C)) entails prod(prod(A; B); C)
6. prod(prod(A; B); C) entails prod(A; prod(B; C))

I intend to prove 1, 2, and 5. The remaining proofs are either very similar or fun puzzles to work on. We could also prove that all the appropriate entailments are inverses and then we could say that everything is up to isomorphism.
First we want a new snazzy operator to signify nondependent products since writing prod(A; x.B) is kind of annoying. We do this using Operator
    Operator prod : (0; 0).
This line declares prod as a new operator which takes two arguments binding zero variables each. Now we really want JonPRL to know that this two-argument prod is sugar for the dependent prod whose second argument ignores its bound variable. To do this we use =def= which gives us a way to desugar a new operator into a mess of existing ones.
    [prod(A; B)] =def= [prod(A; _.B)].
Now we can change any occurrence of prod(A; B) for prod(A; _.B) as we’d like. Okay, so we want to prove that we have a monoid here. What’s the first step? Let’s verify that unit is a left identity for prod. This entails proving that for all types A, prod(unit; A) ⊃ A and A ⊃ prod(unit; A). Let’s prove these as separate theorems. Translating our first statement into JonPRL we want to prove
    fun(U{i}; A.
    fun(prod(unit; A); _.
    A))
In Agda notation this would be written
    (A : Set) → (_ : unit × A) → A
Let’s prove our first theorem, we start by writing
    Theorem left-id1 :
      [fun(U{i}; A.
       fun(prod(unit; A); _.
       A))] {
      id
    }.
This is the basic form of a theorem in JonPRL, a name, a term to prove, and a tactic script. Here we have id as a tactic script, which clearly doesn’t prove our goal. When we run JonPRL on this file (C-c C-l if you’re in Emacs) you get back
[XXX.jonprl:8.3-9.1]: tactic 'COMPLETE' failed with goal:
⊢ funA ∈ U{i}. (prod(unit; A)) => A

Remaining subgoals:
⊢ funA ∈ U{i}. (prod(unit; A)) => A
So focus on that Remaining subgoals bit, that’s what we have left to prove, it’s our current goal. Now you may notice that this outputted goal is a lot prettier than our syntax! That’s because currently in JonPRL the input and outputted terms may not match, the latter is subject to pretty printing. In general this is great because you can read your remaining goals, but it does mean copying and pasting is a bother. There’s nothing to the left of that ⊢ yet so let’s run the only applicable tactic we know. Delete that id and replace it with
    {
      intro
    }.
The goal now becomes
Remaining subgoals:
1. A : U{i}
⊢ (prod(unit; A)) => A

⊢ U{i} ∈ U{i'}
Two ⊢s means two subgoals now. One looks pretty obvious, U{i'} is just the universe above U{i} (so that’s like Set₁ in Agda) so it should be the case that U{i} ∈ U{i'} by definition! So the next tactic should be something like [???, mem-cd; eq-cd]. Now what should that ??? be? Well, elim won’t get us anywhere: there is one thing in the context now (A : U{i}), but eliminating it doesn’t really help us. Instead let’s run unfold . This is a new tactic that’s going to replace that prod with the definition that we wrote earlier.
    {
      intro; [unfold , mem-cd; eq-cd]
    }
Notice here that , binds less tightly than ; which is useful for saying stuff like this. This gives us
Remaining subgoals:

1. A : U{i}
⊢ (unit × A) => A
We run intro again
    {
      intro; [unfold , mem-cd; eq-cd]; intro
    }
Now we are in a similar position to before with two subgoals.
    Remaining subgoals:

    1. A : U{i}
    2. _ : unit × A
    ⊢ A


    1. A : U{i}
    ⊢ unit × A ∈ U{i}
The first subgoal is really what we want to be proving so let’s put a pin in that momentarily. Let’s get rid of that second subgoal with a new helpful tactic called auto. It runs eq-cd, mem-cd and intro repeatedly and is built to take care of boring goals just like this!
    {
      intro; [unfold , mem-cd; eq-cd]; intro; [id, auto]
    }
Notice that we used what is a pretty common pattern in JonPRL, to work on one subgoal at a time we use []’s and ids everywhere except where we want to do work, in this case the second subgoal.
Now we have
Remaining subgoals:

1. A : U{i}
2. _ : unit × A
⊢ A
Cool! Having a pair of unit × A really ought to mean that we have an A so we can use elim to get access to it.
    {
      intro; [unfold , mem-cd; eq-cd]; intro; [id, auto];
      elim #2
    }
This gives us
Remaining subgoals:

1. A : U{i}
2. _ : unit × A
3. s : unit
4. t : A
⊢ A
We’ve really got the answer now, #4 is precisely our goal. For these situations there’s assumption which is just a tactic which succeeds if what we’re trying to prove is in our context already. This will complete our proof
    Theorem left-id1 :
      [fun(U{i}; A.
       fun(prod(unit; A); _.
       A))] {
      intro; [unfold , mem-cd; eq-cd]; intro; [id, auto];
      elim #2; assumption
    }.
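To see the whole proof as one piece of plumbing, here is a tiny Python model of the three tactics that do the real work: intro, elim and assumption. It is a sketch with plenty of assumptions. Sequents are (hypotheses, goal) pairs, the nondependent prod is encoded as a three-element tuple, the well-formedness subgoals that unfold, mem-cd, eq-cd and auto take care of are simply not generated, and every name below is hypothetical.

```python
class TacticFailure(Exception):
    """Raised when a tactic does not apply to a goal."""

def tag_of(term):
    """Operator name of a term, or None for a bare variable or constant."""
    return term[0] if isinstance(term, tuple) else None

def then(t1, t2):
    """t1; t2: run t1, then run t2 on every resulting subgoal."""
    return lambda seq: [g2 for g1 in t1(seq) for g2 in t2(g1)]

def intro(seq):
    """H ⊢ fun(A; x.B) becomes H, x : A ⊢ B (typing subgoal omitted)."""
    hyps, goal = seq
    if tag_of(goal) != "fun":
        raise TacticFailure("intro: goal is not a function type")
    _, dom, bound, body = goal
    return [(hyps + ((bound, dom),), body)]

def elim(n):
    """elim #n on a hypothesis of type prod(A; B) adds s : A and t : B."""
    def tactic(seq):
        hyps, goal = seq
        if not 1 <= n <= len(hyps):
            raise TacticFailure("elim: no such hypothesis")
        _, ty = hyps[n - 1]
        if tag_of(ty) != "prod":
            raise TacticFailure("elim: hypothesis is not a product")
        _, left, right = ty
        return [(hyps + (("s", left), ("t", right)), goal)]
    return tactic

def assumption(seq):
    """Succeed with no subgoals if the goal already appears in the context."""
    hyps, goal = seq
    if any(ty == goal for _, ty in hyps):
        return []
    raise TacticFailure("assumption: goal is not in the context")

# ⊢ fun(U{i}; A. fun(prod(unit; A); _. A))
left_id1 = ((), ("fun", ("U", "i"), "A", ("fun", ("prod", "unit", "A"), "_", "A")))
proof = then(intro, then(intro, then(elim(2), assumption)))
```

Running proof(left_id1) returns an empty list of subgoals, mirroring the script above: two intros, elim #2, then assumption. Dropping the elim step makes assumption fail, since A itself is not in the context until we take the pair apart.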
Now we know that auto will run all of the tactics on the first line except unfold , so what if we just unfold  first and run auto? It ought to do all the same stuff. Indeed we can shorten our whole proof to unfold ; auto; elim #2; assumption. With this more heavily automated proof, proving our next theorem follows easily.
    Theorem right-id1 :
      [fun(U{i}; A.
       fun(prod(A; unit); _.
       A))] {
      unfold ; auto; elim #2; assumption
    }.
Next, we have to prove associativity to complete the development that prod is a monoid. The statement here is a bit more complex.
    Theorem assoc :
      [fun(U{i}; A.
       fun(U{i}; B.
       fun(U{i}; C.
       fun(prod(A; prod(B;C)); _.
       prod(prod(A;B); C)))))] {
      id
    }.
In Agda notation what I’ve written above is
    assoc : (A B C : Set) → A × (B × C) → (A × B) × C
    assoc = ?
Let’s kick things off with unfold ; auto to deal with all the boring stuff we had last time. In fact, since prod appears in several nested places we’d have to run unfold quite a few times. Let’s just shorten all of those invocations into *{unfold }
    {
      *{unfold }; auto
    }
This leaves us with the state
Remaining subgoals:
1. A : U{i}
2. B : U{i}
3. C : U{i}
4. _ : A × B × C
⊢ A


1. A : U{i}
2. B : U{i}
3. C : U{i}
4. _ : A × B × C
⊢ B


1. A : U{i}
2. B : U{i}
3. C : U{i}
4. _ : A × B × C
⊢ C
In each of those goals we need to take apart the 4th hypothesis so let’s do that
    {
      *{unfold }; auto; elim #4
    }
This leaves us with 3 subgoals still
1. A : U{i}
2. B : U{i}
3. C : U{i}
4. _ : A × B × C
5. s : A
6. t : B × C
⊢ A


1. A : U{i}
2. B : U{i}
3. C : U{i}
4. _ : A × B × C
5. s : A
6. t : B × C
⊢ B


1. A : U{i}
2. B : U{i}
3. C : U{i}
4. _ : A × B × C
5. s : A
6. t : B × C
⊢ C
The first subgoal is pretty easy, assumption should handle that. In the other two we want to eliminate 6 and then we should be able to apply assumption. In order to deal with this we use | to encode that disjunction. In particular we want to run assumption OR elim #6; assumption leaving us with
    {
      *{unfold }; auto; elim #4; (assumption | elim #6; assumption)
    }
This completes the proof!
    Theorem assoc :
      [fun(U{i}; A.
       fun(U{i}; B.
       fun(U{i}; C.
       fun(prod(A; prod(B;C)); _.
       prod(prod(A;B); C)))))] {
      *{unfold }; auto; elim #4; (assumption | elim #6; assumption)
    }.
As a fun puzzle, what needs to change in this proof to prove we can associate the other way?
What on earth did we just do!?
So we just proved a theorem.. but what really just happened? I mean how did we go from “Here we have an untyped computation system with types just behaving as normal terms” to “Now apply auto and we’re done!”. In this section I’d like to briefly sketch the path from untyped computation to theorems.
The path looks like this

We start with our untyped language and its notion of computation
We already discussed this in great depth before.
We define a judgment a = b ∈ A.
This is a judgment, not a term in that language. It exists in whatever metalanguage we’re using. This judgment is defined across 3 terms in our untyped language (I’m only capitalizing A out of convention). This is supposed to represent that a and b are equal elements of type A. This also gives meaning to typehood: something is a type in CTT precisely when we know what the partial equivalence relation defined by - = - ∈ A on canonical values is.
Notice here that I said partial. It isn’t the case that a = b ∈ A presupposes that we know that a : A and b : A because we don’t have a notion of : yet!
In some sense this is where we depart from a type theory like Coq or Agda’s. We have programs already and on top of them we define this 3 part judgment which interacts with computation in a few ways I’m not specifying. In Coq, we would specify one notion of equality, generic over all types, and separately specify a typing relation.
From here we can define the normal judgments of Martin-Lof’s type theory. For example, a : A is a = a ∈ A. We recover the judgment A type with A = A ∈ U (where U here is a universe).
This means that inhabiting a universe A = A ∈ U, isn’t necessarily inductively defined but rather negatively generated. We specify some condition a term must satisfy to occupy a universe.

Hypothetical judgments are introduced in the same way they would be in Martin-Lof’s presentations of type theory. The idea being that H ⊢ J if J is evident under the assumption that each term in H has the appropriate type and furthermore that J is functional (respects equality) with respect to what H contains. This isn’t really a higher order judgment, but it will be defined in terms of a higher order hypothetical judgment in the metatheory.
With this we have something that walks and quacks like normal type theory. Using the normal tools of our metatheory we can formulate proofs of a : A and do normal type theory things. This whole development is building up what is called “Computational Type Theory”. The way this diverges from Martin-Lof’s extensional type theory is subtle but it does directly descend from Martin-Lof’s famous 1979 paper “Constructive Mathematics and Computer Programming” (which you should read. Instead of my blog post).
Now there’s one final layer we have to consider, the PRL bit of JonPRL. We define a new judgment, H ⊢ A [ext a]. This judgment is cleverly set up so two properties hold

H ⊢ A [ext a] should entail that H ⊢ a : A or H ⊢ a = a ∈ A
In H ⊢ A [ext a], a is an output and H and A are inputs. In particular, this implies that in any inference for this judgment, the subgoals may not use a in their H and A.

This means that a is completely determined by H and A which justifies my use of the term output. I mean this in the sense of Twelf and logic programming if that’s a more familiar phrasing. It’s this judgment that we see in JonPRL! Since that a is output we simply hide it, leaving us with H ⊢ A as we saw before. When we prove something with tactics in JonPRL we’re generating a derivation, a tree of inference rules which make H ⊢ A evident for our particular H and A! These rules aren’t really programs though, they don’t correspond one to one with proof terms we may run like they would in Coq. The computational interpretation of our program is bundled up in that a.
To see what I mean here we need a little bit more machinery. Specifically, let’s look at the rules for the equality around the proposition =(a; b; A). Remember that we have a term <> lying around,
     a = b ∈ A
————————————————————
<> = <> ∈ =(a; b; A)
So the only member of =(a; b; A) is <> if a = b ∈ A actually holds. First off, notice that <> : A and <> : B doesn’t imply that A = B! In another example, lam(x. x) ∈ fun(A; _.A) for all A! This is a natural consequence of separating our typing judgment from our programming language. Secondly, there’s not really any computation in the e of H ⊢ =(a; b; A) (e). After all, in the end the only thing e could be so that e : =(a; b; A) is <>! However, there is potentially quite a large derivation involved in showing =(a; b; A)! For example, we might have something like this
x : =(A; B; U{i}); y : =(b; a; A) ⊢ =(a; b; B)
———————————————————————————————————————————————— Substitution
x : =(A; B; U{i}); y : =(b; a; A) ⊢ =(a; b; A)
———————————————————————————————————————————————— Symmetry
x : =(A; B; U{i}); y : =(b; a; A) ⊢ =(b; a; A)
———————————————————————————————————————————————— Assumption
Now we write derivations of this sequent upside down, so the thing we want to show starts on top and we write each rule application and subgoal below it (AI people apparently like this?). Now this was quite a derivation, but if we fill in the missing [ext e] for this derivation from the bottom up we get this
x : =(A; B; U{i}); y : =(b; a; A) ⊢ =(a; b; B)
———————————————————————————————————————————————— Substitution [ext <>]
x : =(A; B; U{i}); y : =(b; a; A) ⊢ =(a; b; A)
———————————————————————————————————————————————— Symmetry     [ext <>]
x : =(A; B; U{i}); y : =(b; a; A) ⊢ =(b; a; A)
———————————————————————————————————————————————— Assumption   [ext y]
Notice how at the bottom there was some computational content (that y signifies that we’re accessing a variable in our context) but then we throw it away right on the next line! That’s because we find that no matter what the extract was that lets us derive =(b; a; A), the only realizer it could possibly generate is <>. Remember our conditions, if we can make evident the fact that b = a ∈ A then <> ∈ =(b; a; A). Because we somehow managed to prove that b = a ∈ A holds, we’re entitled to just use <> to realize our proof. This means that despite our somewhat tedious derivation and the bookkeeping that we had to do to generate that program, that program reflects none of it.
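The way a derivation collapses into its extract can be modelled in a few lines. The following Python sketch is my own reading of the extraction behaviour described in this post (functions become lam, eliminating a product becomes spread, using a hypothesis becomes a variable, and equality rules always give <>). The tuple encoding of derivations, the rule names, and the ("wf",) placeholder for well-formedness subderivations are all assumptions for illustration.

```python
def extract(derivation):
    """Compute the realizer of a derivation, in the spirit of the [ext e] annotations."""
    rule = derivation[0]
    if rule == "hyp":
        # Using a hypothesis: the extract is that variable.
        return ("var", derivation[1])
    if rule == "fun-intro":
        # ("fun-intro", x, body, wellformedness): the typing side goal is dropped.
        _, x, body, _wf = derivation
        return ("lam", x, extract(body))
    if rule == "prod-elim":
        # ("prod-elim", z, s, t, body): take the pair z apart as s and t.
        _, z, s, t, body = derivation
        return ("spread", ("var", z), s, t, extract(body))
    if rule in ("symmetry", "substitution"):
        # Equality rules: whatever the premises extracted to, the realizer is <>.
        return "<>"
    raise ValueError(f"unknown rule: {rule!r}")

# The derivation above: Substitution, then Symmetry, then Assumption on y.
equality_derivation = ("substitution", ("symmetry", ("hyp", "y")))

# A derivation shaped like the proof that prod(unit; A) entails A.
left_id1_derivation = (
    "fun-intro", "A",
    ("fun-intro", "p", ("prod-elim", "p", "s", "t", ("hyp", "t")), ("wf",)),
    ("wf",),
)
```

Here extract(equality_derivation) is just "<>" no matter how deep the derivation gets, while extract(left_id1_derivation) is the term lam(A. lam(p. spread(p; s.t.t))) in tuple form. The well-formedness subderivations never show up in the result at all.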
This is why type checking in JonPRL is woefully undecidable: in part, the realizers that we want to type check contain none of the helpful hints that proof terms in Coq would. This also means that extraction from JonPRL proofs is built right into the system and we can actually generate cool and useful things! In Nuprl-land, folks at Cornell actually write proofs and use these realizers to run real software. From what Bob Constable said at OPLSS they can actually get these programs to run fast (within 5x of naive C code).
So to recap, in JonPRL we

See H ⊢ A
Use tactics to generate a derivation of this judgment
Once this derivation is generated, we can extract the computational content as a program in our untyped system

In fact, we can see all of this happen if you call JonPRL from the command line or hit C-c C-c in emacs! On our earlier proof we see
Operator prod : (0; 0).
⸤prod(A; B)⸥ ≝ ⸤A × B⸥.

Theorem left-id1 : ⸤⊢ funA ∈ U{i}. (prod(unit; A)) => A⸥ {
  fun-intro(A.fun-intro(_.prod-elim(_; _.t.t); prod⁼(unit⁼; _.hyp⁼(A))); U⁼{i})
} ext {
  lam_. lam_. spread(_; _.t.t)
}.

Theorem right-id1 : ⸤⊢ funA ∈ U{i}. (prod(A; unit)) => A⸥ {
  fun-intro(A.fun-intro(_.prod-elim(_; s._.s); prod⁼(hyp⁼(A); _.unit⁼)); U⁼{i})
} ext {
  lam_. lam_. spread(_; s._.s)
}.

Theorem assoc : ⸤⊢ funA ∈ U{i}. funB ∈ U{i}. funC ∈ U{i}. (prod(A; prod(B; C))) => prod(prod(A; B); C)⸥ {
  fun-intro(A.fun-intro(B.fun-intro(C.fun-intro(_.independent-prod-intro(independent-prod-intro(prod-elim(_;
  s.t.prod-elim(t; _._.s)); prod-elim(_; _.t.prod-elim(t;
  s'._.s'))); prod-elim(_; _.t.prod-elim(t; _.t'.t')));
  prod⁼(hyp⁼(A); _.prod⁼(hyp⁼(B); _.hyp⁼(C)))); U⁼{i}); U⁼{i});
  U⁼{i})
} ext {
  lam_. lam_. lam_. lam_. ⟨⟨spread(_; s.t.spread(t; _._.s)), spread(_; _.t.spread(t; s'._.s'))⟩, spread(_; _.t.spread(t; _.t'.t'))⟩
}.
Now we can see that those Operator and ≝ bits are really what we typed with =def= and Operator in JonPRL, what’s interesting here are the theorems. There’s two bits, the derivation and the extract or realizer.
{
  derivation of the sequent · ⊢ A
} ext {
  the program in the untyped system extracted from our derivation
}
We can move that derivation into a different proof assistant and check it. This gives us all the information we need to check JonPRL’s reasoning and helps us not trust all of JonPRL (I wrote some of it so I’d be a little scared to trust it :). We can also see the computational bit of our proof in the extract. For example, the computation involved in taking A × unit → A is just lam_. lam_. spread(_; s._.s)! This is probably closer to what you’ve seen in Coq or Idris, even though I’d say the derivation is probably more similar in spirit (just ugly and beta normal). That’s because the extract need not have any notion of typing or proof, it’s just the computation needed to produce a witness of the appropriate type. This means for a really tricky proof of equality, your extract might just be <>! Your derivation however will always exactly reflect the complexity of your proof.
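Since the extracts are just untyped programs, we can transcribe them into any language and run them. Here is a hand translation of the three extracts above into Python. This is only a sketch: pairs become Python tuples, spread is a hypothetical helper, and the type arguments A, B and C are passed as None since the programs never look at them.

```python
def spread(pair, body):
    """spread(p; x.y.e): take a pair apart and hand both components to the body."""
    first, second = pair
    return body(first, second)

# lam_. lam_. spread(_; _.t.t)
left_id1 = lambda _A: lambda p: spread(p, lambda _s, t: t)

# lam_. lam_. spread(_; s._.s)
right_id1 = lambda _A: lambda p: spread(p, lambda s, _t: s)

# lam_. lam_. lam_. lam_. <<first of p, first of second of p>, second of second of p>
assoc = lambda _A: lambda _B: lambda _C: lambda p: (
    (spread(p, lambda s, t: spread(t, lambda _x, _y: s)),
     spread(p, lambda _s, t: spread(t, lambda s2, _y: s2))),
    spread(p, lambda _s, t: spread(t, lambda _x, t2: t2)),
)

reassociated = assoc(None)(None)(None)((1, (2, 3)))
```

The value of reassociated is ((1, 2), 3), so the extract of assoc really is the program that reshuffles a nested pair. The leading lambdas that ignore their argument are the price of quantifying over A, B and C with fun.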
Killer features
OK, so I’ve just dumped about 50 years worth of hard research in type theory into your lap which is best left to ruminate for a bit. However, before I finish up this post I wanted to do a little bit of marketing so that you can see why one might be interested in JonPRL (or Nuprl). Since we’ve embraced this idea of programs first and types as PERs, we can define some really strange types completely seamlessly. For example, in JonPRL there’s a type ⋂(A; x.B), it behaves a lot like fun but with one big difference, the definition of - = - ∈ ⋂(A; x.B) looks like this
a : A ⊢ e = e' ∈ [a/x]B
————————————————————————
   e = e' ∈ ⋂(A; x.B)
Notice here that e and e' may not use a anywhere in their bodies. That is, they have to be in [a/x]B without knowing anything about a and without even having access to it.
This is a pretty alien concept that turned out to be new in logic as well (it’s called “uniform quantification” I believe). It turns out to be very useful in PRLs because it lets us declare things in our theorems without having them propagate into our witness. For example, we could have said
    Theorem right-id1 :
          [⋂(U{i}; A.
           fun(prod(A; unit); _.
           A))] {
          unfold ; auto; elim #2; assumption
        }.
With the observation that our realizer doesn’t need to depend on A at all (remember, no types!). Then the extract of this theorem is
lamx. spread(x; s._.s)
There’s no spurious lam _. ... at the beginning! Even more wackily, we can define subsets of an existing type since realizers need not have unique types
e = e' ∈ A  [e/x]P  [e'/x]P
————————————————————————————
  e = e' ∈ subset(A; x.P)
And in JonPRL we can now say things like “all odd numbers” by just saying subset(nat; n. ap(odd; n)). In intensional type theories, these types are hard to deal with and still the subject of open research. In CTT they just kinda fall out because of how we thought about types in the first place. Quotients are a similarly natural conception (just define a new type with a stricter PER) but JonPRL currently lacks them (though they shouldn’t be hard to add..).
Finally, if you’re looking for one last reason to dig into **PRL, the fact that we’ve defined all our equalities extensionally means that several very useful facts just fall right out of our theory
    Theorem fun-ext :
      [⋂(U{i}; A.
       ⋂(fun(A; _.U{i}); B.
       ⋂(fun(A; a.ap(B;a)); f.
       ⋂(fun(A; a.ap(B;a)); g.

       ⋂(fun(A; a.=(ap(f; a); ap(g; a); ap(B; a))); _.
       =(f; g; fun(A; a.ap(B;a))))))))] {
      auto; ext; ?{elim #5 [a]}; auto
    }.
This means that two functions are equal in JonPRL if and only if they map equal arguments to equal output. This is quite pleasant for formalizing mathematics for example.
Wrap Up
Whew, we went through a lot! I didn’t intend for this to be a full tour of JonPRL, just a taste of how things sort of hang together and maybe enough to get you looking through the examples. Speaking of which, JonPRL comes with quite a few examples which are going to make a lot more sense now.
Additionally, you may be interested in the documentation in the README which covers most of the primitive operators in JonPRL. As for an exhaustive list of tactics, well….
Hopefully I’ll be writing about JonPRL again soon. Until then, I hope you’ve learned something cool :)
A huge thanks to David Christiansen and Jon Sterling for tons of helpful feedback on this

          
          



Proving Cut Admissibility in Twelf
Danny Gratzer — Mon, 29 Jun 2015 00:00:00 UT

    Posted on June 29, 2015
    


    
    Tags: twelf, types
    


Veering wildly onto the theory side compared to my last post, I’d like to look at some more Twelf code today. Specifically, I’d like to prove a fun theorem called cut admissibility (or elimination) for a particular logic: a simple intuitionistic propositional sequent calculus. I chucked the code for this over here.
Background
If those words didn’t make any sense, here’s an incomplete primer on what we’re doing here. First of all, we’re working with a flavor of logic called “sequent calculus”. Sequent calculus describes a class of logics characterized by studying “sequents”; a sequent is just an expression Γ ⊢ A saying “A is true under the assumption that the set of propositions, Γ, are true”. A sequent calculus defines a couple things

What exactly A is, a calculus defines what propositions it talks about
For us we’re only interested in a few basic connectives, so our calculus can talk about true, false, A and B (A ∧ B), A or B (A ∨ B), and A implies B (A ⇒ B).
Rules for inferring that Γ ⊢ A holds. We can use these inference rules to build up proofs of things in our system.
In sequent calculus there are two sorts of inference rules, left and right. A left rule takes a fact that we know and lets us reason backwards to other things we must know hold. A right rule lets us take the thing we’re trying to prove and instead prove smaller, simpler things.
More rules will follow in the Twelf code but for a nice example consider the left and right rules for ∧,
  Γ, A, B ⊢ C
 ———————————————
  Γ, A ∧ B ⊢ C

 Γ ⊢ A    Γ ⊢ B
 ———————————————
    Γ ⊢ A ∧ B
The left rule says if we know that A ∧ B is true, we can take it apart and try to prove our goal with assumptions that A and B are true. The right rule says to prove that A ∧ B is true we need to prove A is true and B is true. A proof in this system is a tree of these rules just like you’d expect in a type theory or natural deduction.

We also tacitly assume a bunch of boring rules called structural rules about our sequents hold, so that we can freely duplicate, drop and swap assumptions in Γ. For a less awful introduction to sequent calculus Frank Pfenning has some good notes.
Now we want to prove a particular (meta)theorem about sequent calculus

if Γ ⊢ A
and Γ, A ⊢ B
then Γ ⊢ B

This theorem means a couple of different things: for example, our system is consistent, and our system also admits lemmas. As it turns out, proving this theorem is hard. The basic complication is that we don’t know what form either of the first two proofs takes.
The Twelf Stuff
We now formalize our sequent calculus in Twelf. First we declare a type and some constants to represent propositions.
    prop  : type.

    =>    : prop -> prop -> prop. %infix right 4 =>.
    true  : prop.
    false : prop.
    /\    : prop -> prop -> prop. %infix none 5 /\.
    \/    : prop -> prop -> prop. %infix none 5 \/.
Notice here that we use infix to let us write A /\ B => C. Having specified these we now define what a proof is in this system. This is structured a little differently than you’d be led to believe from the above. We have an explicit type proof which is inhabited by “proof terms” which serve as a nice shortcut to those trees generated by inference rules. Finally, we don’t explicitly represent Γ, instead we have this thing called hyp which is used to represent a hypothesis in Γ. Left rules use these hypotheses and introduce new ones. Pay attention to /\/l and /\/r since you’ve seen the handwritten equivalents.
    proof   : type.
    hyp     : type.

    init    : hyp -> proof.

    =>/r    : (hyp -> proof) -> proof.
    =>/l    : (hyp -> proof) -> proof -> hyp -> proof.

    true/r  : proof.

    false/l : hyp -> proof.

    /\/r    : proof -> proof -> proof.
    /\/l    : (hyp -> hyp -> proof) -> hyp -> proof.

    \//r1   : proof -> proof.
    \//r2   : proof -> proof.
    \//l    : (hyp -> proof) -> (hyp -> proof) -> hyp -> proof.
The right rules are at least a little intuitive, the left rules are peculiar. Essentially we have a weird CPS-y feel going on here: to decompose a hypothesis you hand the hyp to the constant along with a continuation which takes the hypotheses you should get out of the decomposition. For example for \/ we have two right rules (think Left and Right), then one left rule which takes two continuations and one hyp (think either). Finally, that init thing is the only way to actually take a hypothesis and use it as a proof.
We now want to unite these two pieces of syntax with a typing judgment letting us say that a proof proves some particular prop.
    of         : proof -> prop -> type.
    hof        : hyp   -> prop -> type.

    of/init    : of (init H) A
                  <- hof H A.

    of/=>/r    : of (=>/r F) (A => B)
                  <- ({h} hof h A -> of (F h) B).
    of/=>/l    : of (=>/l C Arg F) U
                  <- hof F (A => B)
                  <- of Arg A
                  <- ({h} hof h B -> of (C h) U).

    of/true/r  : of true/r true.

    of/false/l : of (false/l H) A
                  <- hof H false.

    of//\/r    : of (/\/r R L) (A /\ B)
                  <- of L A
                  <- of R B.
    of//\/l    : of (/\/l C H) U
                  <- hof H (A /\ B)
                  <- ({h}{h'} hof h A -> hof h' B -> of (C h h') U).

    of/\//r1   : of (\//r1 L) (A \/ B)
                  <- of L A.
    of/\//r2   : of (\//r2 R) (A \/ B)
                  <- of R B.
    of/\//l    : of (\//l R L H) C
                  <- hof H (A \/ B)
                  <- ({h} hof h A -> of (L h) C)
                  <- ({h} hof h B -> of (R h) C).
In order to handle hypotheses we have this hof judgment which handles typing various hyps. We introduce it just like we introduce hyps in those continuation-y things for left rules. Sorry for dumping so much code on you all at once: it’s just a lot of machinery we need to get working in order to actually start formalizing cut.
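For readers who don't speak Twelf, here is a rough Haskell analogue of prop, proof and the of/hof judgments. It is a sketch under some assumptions of mine, not a translation of the development: hypotheses are plain Ints, the hof facts live in an association list, and of becomes a Boolean checker. All the Haskell names are made up for illustration.

```haskell
-- Propositions, mirroring the Twelf constants (constructor names are mine).
infixr 4 :=>
infix  5 :&:, :|:

data Prop = Prop :=> Prop | T | F | Prop :&: Prop | Prop :|: Prop
  deriving (Eq, Show)

-- Assumption of this sketch: a hypothesis is just a number.
type Hyp = Int

-- Proof terms; binders are Haskell functions, like the HOAS encoding.
data Proof
  = Init Hyp
  | ImpR (Hyp -> Proof)
  | ImpL (Hyp -> Proof) Proof Hyp
  | TrueR
  | FalseL Hyp
  | AndR Proof Proof                       -- same argument order as the post: R then L
  | AndL (Hyp -> Hyp -> Proof) Hyp
  | OrR1 Proof
  | OrR2 Proof
  | OrL (Hyp -> Proof) (Hyp -> Proof) Hyp  -- same argument order as the post: R, L, H

-- check ctx p a plays the role of "of p a"; ctx holds the hof facts.
check :: [(Hyp, Prop)] -> Proof -> Prop -> Bool
check ctx proof goal = case proof of
  Init h -> lookup h ctx == Just goal
  ImpR f
    | a :=> b <- goal -> check ((n, a) : ctx) (f n) b
  ImpL c arg h
    | Just (a :=> b) <- lookup h ctx ->
        check ctx arg a && check ((n, b) : ctx) (c n) goal
  TrueR -> goal == T
  FalseL h -> lookup h ctx == Just F
  AndR r l
    | a :&: b <- goal -> check ctx l a && check ctx r b
  AndL c h
    | Just (a :&: b) <- lookup h ctx ->
        check ((n + 1, b) : (n, a) : ctx) (c n (n + 1)) goal
  OrR1 l
    | a :|: _ <- goal -> check ctx l a
  OrR2 r
    | _ :|: b <- goal -> check ctx r b
  OrL r l h
    | Just (a :|: b) <- lookup h ctx ->
        check ((n, a) : ctx) (l n) goal && check ((n, b) : ctx) (r n) goal
  _ -> False
  where
    -- Fresh because check only ever adds hypotheses 0, 1, 2, ... in order.
    n = length ctx

-- Closed proofs start from an empty context.
proves :: Proof -> Prop -> Bool
proves = check []

-- A proof term for A /\ B => B /\ A, for any A and B.
swapProof :: Proof
swapProof = ImpR (\h -> AndL (\x y -> AndR (Init x) (Init y)) h)
```

For example, proves swapProof ((T :&: F) :=> (F :&: T)) holds, while the same term does not check against (T :&: F) :=> (T :&: F). The Twelf version gets the fresh-name bookkeeping for free from the LF context, which is a large part of its appeal.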
I would like to point out a few things about this formulation of sequent calculus though. First off, it’s very Twelf-y, we use the LF context to host the context of our logic using HOAS. We also basically just have void as the type of hypotheses! Look, there’s no way to construct a hypothesis, let alone a typing derivation hof! The idea is that we’ll just wave our hands at Twelf and say “consider our theorem in a context with hyps and hofs” with
    %block someHofs : some {A : prop} block {h : hyp}{deriv : hof h A}.
In short, Twelf is nifty.
The Theorem
Now we’re almost in a position to state cut admissibility, we want to say something like
    cut : of Lemma A
            -> ({h} hof h A -> of (F h) B)
            -> of ??? B
But what should that ??? be? We could just say “screw it it’s something” and leave it as an output of this lemma but experimentally (an hour of teeth gnashing later) it’s absolutely not worth the pain. Instead let’s do something clever.
Let’s first define an untyped version of cut which works just across proofs without mind to typing derivations. We can’t declare this total because it’s just not going to work for ill-typed things, we can give it a mode though (it’s not needed) just as mechanical documentation.
    cut : proof -> (hyp -> proof) -> proof -> type.
    %mode cut +A +B -C.
The goal here is we’re going to state our main theorem as
    of/cut : {A} of P A
              -> ({h} hof h A -> of (F h) B)
              -> cut P F C
              -> of C B
              -> type.
    %mode of/cut +A +B +C -D -E.
Leaving that derivation of cut as an output. This lets us produce not just a random term but instead a proof that that term makes “sense” somehow along with a proof that it’s well typed.
cut is going to mirror the structure of of/cut so we now need to figure out how we’re going to structure our proof. It turns out a rather nice way to do this is to organize our cuts into 4 categories. The first one are “principal” cuts, they’re the ones where we have a right rule as our lemma and we immediately decompose that lemma in the other term with the corresponding left rule. This is sort of the case that we drive towards everywhere and it’s where the substitution bit happens.
First we have some simple cases
    trivial : cut P' ([x] P) P.
    p/init1 : cut (init H) ([x] P x) (P H).
    p/init2 : cut P ([x] init x) P.
In trivial we don’t use the hypothesis at all so we’re just “strengthening” here. p/init1 and p/init2 deal with the init rule on the left or right side of the cut. If it’s on the left we have a hypothesis of the appropriate type so we just apply the function. If it’s on the right we have a proof of the appropriate type so we just return that. In the more interesting cases we have the principal cuts for some specific connectives.
    p/=>   : cut (=>/r F) ([x] =>/l ([y] C y x) (Arg x) x) Out'
              <- ({y} cut (=>/r F) (C y) (C' y))
              <- cut (=>/r F) Arg Arg'
              <- cut Arg' F Out
              <- cut Out C' Out'.
    p//\   : cut (/\/r R L) ([x] /\/l ([y][z] C y z x) x) Out'
              <- ({x}{y} cut (/\/r R L) (C x y) (C' x y))
              <- ({x} cut R (C' x) (Out x))
              <- cut L Out Out'.
    p/\//1 : cut (\//r1 L) ([x] \//l ([y] C2 y x) ([y] C1 y x) x) Out
              <- ({x} cut (\//r1 L) (C1 x) (C1' x))
              <- cut L C1' Out.
    p/\//2 : cut (\//r2 L) ([x] \//l ([y] C2 y x) ([y] C1 y x) x) Out
              <- ({x} cut (\//r2 L) (C2 x) (C2' x))
              <- cut L C2' Out.
Let’s take a closer look at p/=>, the principal cut for =>. First off, our inputs are =>/r F and ([x] =>/l ([y] C y x) (Arg x) x). The first one is just a “function” that we’re supposed to substitute into the second. Now the second is comprised of a continuation and an argument. Notice that both of these depend on x! In order to handle this the first two lines of the proof
           <- ({y} cut (=>/r F) (C y) (C' y))
           <- cut (=>/r F) Arg Arg'
Are to remove that dependence. We get back a C' and an Arg' which don’t use the hyp (x). In order to do this we just recurse and cut the =>/r F out of them. Notice that both the type and the thing we’re substituting are the same size, what decreases here is what we’re substituting into. Now we’re ready to actually do some work. First we need to get a term representing the application of F to Arg'. This is done with cut since it’s just substitution
              <- cut Arg' F Out
But this isn’t enough, we don’t need the result of the application, we need the result of the continuation! So we have to cut the output of the application through the continuation
              <- cut Out C' Out'.
This code is kinda complicated. The typed version of this took me an hour since after 2am I am charitably called an idiot. However this same general pattern holds with all the principal cuts

1. The subterms of the target could depend on what we’re substituting for, so get rid of that dependence with a recursive call.
2. Get the “result”, which is just the term corresponding to the hypothesis the continuation is expecting. This is pretty trivial in all cases except => since it’s just lying about in an input.
3. Get the actual result by cutting the term from step 2 through the continuation.

Try to work through the case for /\ now
    p//\   : cut (/\/r R L) ([x] /\/l ([y][z] C y z x) x) Out'
              <- ({x}{y} cut (/\/r R L) (C x y) (C' x y))
              <- ({x} cut R (C' x) (Out x))
              <- cut L Out Out'.
After principal cuts we really just have a number of boring cases whose job it is to recurse. The first of these is called rightist substitution because it comes up if the term on the right (the part using the lemma) has a right rule first. This means we have to hunt in the subterms to go find where we’re actually using the lemma.
    r/=> : cut P ([x] (=>/r ([y] F y x))) (=>/r ([y] F' y))
            <- ({x} cut P (F x) (F' x)).
    r/true : cut P ([x] true/r) true/r.
    r//\ : cut P ([x] /\/r (R x) (L x)) (/\/r R' L')
            <- cut P L L'
            <- cut P R R'.
    r/\/1 : cut P ([x] \//r1 (L x)) (\//r1 L')
             <- cut P L L'.
    r/\/2 : cut P ([x] \//r2 (R x)) (\//r2 R')
             <- cut P R R'.
Nothing here should be surprising keeping in mind that all we’re doing here is recursing. The next set of cuts is called leftist substitution. Here we are actually recursing on the term we’re trying to substitute.
    l/=>    : cut (=>/l ([y] C y) Arg H) ([x] P x) (=>/l ([x] C' x) Arg H)
               <- ({x} cut (C x) P (C' x)).
    l/false : cut (false/l H) P (false/l H).
    l//\    : cut (/\/l ([x][y] C x y) H) P (/\/l ([x][y] C' x y) H)
               <- ({x}{y} cut (C x y) P (C' x y)).
    l/\/    : cut (\//l ([y] R  y) ([y] L  y) H) ([x] P x)
                  (\//l ([y] R' y) ([y] L' y) H)
               <- ({x} cut (L x) P (L' x))
               <- ({x} cut (R x) P (R' x)).
It’s the same game but just a different target, we’re now recursing on the continuation because that’s where we somehow created a proof of A. This means that on l/=> we’re substituting a left term which has three parts

A continuation, hyp B to proof of C
An argument of type A
A hypothesis of type A -> B

Now we’re only interested in how we created that proof of C, that’s the only relevant part of this substitution. The output of this case is going to have that left rule, =>/l ??? Arg H, where ??? is a replacement of C that we get by cutting C through P “pointwise”. This comes through on the recursive call
               <- ({x} cut (C x) P (C' x)).
For one more case, consider the left rule for \/
    l/\/    : cut (\//l R L H) P
We start by trying to cut a left rule into P so we need to produce a left rule in the output with different continuations, something like
               (\//l R' L' H)
Now what should R' and L' be? In order to produce them we’ll throw up a pi so we can get L x, a proof with the appropriate type to cut again. With that, we can recurse and get back the new continuation we want.
               <- ({x} cut (L x) P (L' x))
               <- ({x} cut (R x) P (R' x)).
There’s one last class of cuts to worry about, think about the cases we’ve covered so far

The left term is a left rule (leftist substitution)
The right term is a right rule (rightist substitution)
The terms match up and we substitute

So what happens if we have a left rule on the right and a right rule on the left, but they don’t “match up”? By this I mean that the left rule in that right term works on a different hypothesis than the one that the function it’s wrapped in provides. In this case we just have to recurse some more
    lr/=> : cut P ([x] =>/l ([y] C y x) (Arg x) H) (=>/l C' Arg' H)
             <- cut P Arg Arg'
             <- ({y} cut P (C y) (C' y)).
    lr//\ : cut P ([x] /\/l ([y][z] C y z x) H) (/\/l C' H)
             <- ({y}{z} cut P (C y z) (C' y z)).
    lr/\/ : cut P ([x] \//l ([y] R y x) ([y] L y x) H) (\//l R' L' H)
             <- ({y} cut P (L y) (L' y))
             <- ({y} cut P (R y) (R' y)).
When we have such an occurrence we just do like we did with right rules.
Okay, now that we’ve handled all of these cases we’re ready to type the damn thing.
    of/cut : {A} of P A
              -> ({h} hof h A -> of (F h) B)
              -> cut P F C
              -> of C B
              -> type.
    %mode of/cut +A +B +C -D -E.
Honestly this is less exciting than you’d think. We’ve really done all the creative work in constructing the cut type family. All that’s left to do is check that this is correct. As an example, here’s a case that exemplifies how we verify all left-right commutative cuts.
    - : of/cut _ P ([x][h] of/=>/l ([y][yh] C y yh x h) (A x h) H)
         (lr/=> CC CA) (of/=>/l C' A' H)
         <- of/cut _ P A CA A'
         <- ({y}{yh} of/cut _ P (C y yh) (CC y) (C' y yh)).
We start by trying to show that
    lr/=> : cut P ([x] =>/l ([y] C y x) (Arg x) H) (=>/l C' Arg' H)
             <- cut P Arg Arg'
             <- ({y} cut P (C y) (C' y)).
Is type correct. To do this we have a derivation P that the left term is well-typed. Notice that I’ve overloaded P here, in the rule lr/=> P was a term and now it’s a typing derivation for that term. Next we have a typing derivation for [x] =>/l ([y] C y x) (Arg x) H. This is a function which takes two arguments. x is a hypothesis, the same as in lr/=>, however now we have h which is a hof derivation that x has a type. There’s only one way to type a usage of the left rule for =>, with of/=>/l so we have that next.
Finally, our output is on the next line in two parts. First we have a derivation for cut showing how to construct the “cut out term” in this case. Next we have a new typing derivation that again uses of/=>/l. Notice that both of these depend on some terms we get from the recursive calls here.
Since we’ve gone through all the cases already and done all the thinking, I’m not going to reproduce it all here. The intuition for how cut works is really best given by the untyped version with the understanding that we check that it’s correct with this theorem as we did above.
Wrap Up
To recap here’s what we did

Define sequent calculus’ syntax
Define typing rules across it
Define how an untyped version of cut works across it
Validate the correctness of cut by

Hope that was a little fun, cheers!

          
          



Examining Hackage: pipes
Danny Gratzer — Mon, 01 Jun 2015 00:00:00 UT

    Posted on June  1, 2015
    


    
    Tags: haskell
    


It’s been a while since I did one of these “read a package and write about it” posts. Part of this is that it turns out that most software is awful and writing about code I read just makes me grumpy. However I found something nice to write about! In this post I’d like to close a somewhat embarrassing gap in my knowledge: we’re going to walk through a streaming library.
I know that both lists and lazy-IO are kind of.. let’s say fragile but have neglected learning one of these fancy libraries that aim to solve those problems. Today we’ll be looking at one of those libraries, pipes!
Pipes provides one core type Proxy and a few operations on it, like await and yield. We can pair together a pipeline of operations which can send data to their neighbors and request more data from them as they need them. With these coroutine like structures we can nicely implement efficient, streaming computations.
Getting The Code
As always this starts by getting our hands on the code with cabal
~ $ cabal get pipes
~ $ cd pipes-4.1.5/
Now from here we can query all the available files to see what we’re up against
~/pipes-4.1.5 $ wc -l **/*.hs | sort -nr
  4796 total
  1513 src/Pipes/Tutorial.hs
   854 src/Pipes/Core.hs
   836 src/Pipes/Prelude.hs
   517 src/Pipes.hs
   380 src/Pipes/Lift.hs
   272 tests/Main.hs
   269 src/Pipes/Internal.hs
    85 benchmarks/PreludeBench.hs
    68 benchmarks/LiftBench.hs
     2 Setup.hs
So the first thing I notice is that there’s this great honking module called Pipes.Tutorial which houses a brief introduction to the pipes package. I skimmed this before starting but it doesn’t really seem to explain the implementation details.. If you don’t really know what pipes is, read this tutorial now. After doing so you have exactly my knowledge of pipes!
The next interesting module here is Pipes.Internal. I’ve found that .Internal modules seem to house the fundamental bits of the package so we’ll start there.
Pipes.Internal
This module starts with an emphatic warning
    {-| This is an internal module, meaning that it is unsafe to
        import unless you understand the risks. -}
So this seems like a perfect place to start without really understanding this library :D It exports a few different functions and one type:
    module Pipes.Internal (
        -- * Internal
          Proxy(..)
        , unsafeHoist
        , observe
        , X
        , closed
        ) where
I recognize one of those types: Proxy is the central type behind the whole pipes concept, it is the type of a component in the pipeline. Let’s look at how it’s actually implemented
    data Proxy a' a b' b m r
        = Request a' (a  -> Proxy a' a b' b m r )
        | Respond b  (b' -> Proxy a' a b' b m r )
        | M          (m    (Proxy a' a b' b m r))
        | Pure    r
So two of those constructors, M and Pure, look pretty vanilla. The first one lets us lift an action in the underlying monad m into Proxy. It’s a little bit weird that instead of having M (m r) we have M (m (Proxy ...)), however this doesn’t seem like a big deal because we have Pure to promote an r to a Proxy .... r. So we can lift some m r to Proxy a' a b' b m r with M . fmap Pure. It’s still not clear to me why this is a benefit though.
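To make M . fmap Pure concrete, here is a small self-contained sketch. It copies the Proxy declaration from above; liftAction and respondIfPositive are hypothetical names of mine, not part of pipes. The second function shows one thing the nested shape permits (I'm not claiming it is the library's motivation): the result of the action can decide what the rest of the Proxy looks like, which a bare M (m r) leaf could not express.

```haskell
-- Local copy of the Proxy type from Pipes.Internal.
data Proxy a' a b' b m r
  = Request a' (a  -> Proxy a' a b' b m r)
  | Respond b  (b' -> Proxy a' a b' b m r)
  | M          (m    (Proxy a' a b' b m r))
  | Pure    r

-- Lift an action by wrapping its result in Pure, then in M.
liftAction :: Functor m => m r -> Proxy a' a b' b m r
liftAction = M . fmap Pure

-- Because M wraps a whole Proxy, the action's result picks the next step:
-- either respond with the number or finish immediately.
respondIfPositive :: Functor m => m Int -> Proxy a' a () Int m ()
respondIfPositive act = M (fmap next act)
  where
    next n
      | n > 0     = Respond n Pure
      | otherwise = Pure ()
```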
The first two constructors are really cool though, Request and Respond. The first thing that pops into my head is that this looks like a sort of free-monad pattern. Look how we’ve got

The piece of data a user should input to an action (and Request and Respond are definitely actions)
This continuation for a second argument which takes a term of the type returned by the action to another piece of pipe

This would make a lot of sense, free monad transformers nicely give rise to coroutines which are very much in line with pipes. Because of this free monad like shape, I expect that the monad instance will be like free monads and behave “like substitution”. We should chase down the leaves of this Proxy (including under lambdas) and replace each Pure r with f r for >>= and Pure (f r) for fmap.
To check if we’re right, we go down one line
    instance Monad m => Functor (Proxy a' a b' b m) where
        fmap f p0 = go p0 where
            go p = case p of
                Request a' fa  -> Request a' (\a  -> go (fa  a ))
                Respond b  fb' -> Respond b  (\b' -> go (fb' b'))
                M          m   -> M (m >>= \p' -> return (go p'))
                Pure    r      -> Pure (f r)
This looks like what I had in mind, we run down p and in the first 3 branches we recurse. I’ll admit it looks a little intimidating but after staring at it for a bit I realized that the first 3 lines are all just variations on fmap go! Indeed, we can rewrite this to
            go p = case p of
                Request a' fa  -> Request a' (fmap go fa)
                Respond b  fb' -> Respond b  (fmap go fb')
                M          m   -> M (fmap go m)
                Pure    r      -> Pure (f r)
This makes the idea a bit clearer in my mind. Let’s look at the applicative instance next!
    instance Monad m => Applicative (Proxy a' a b' b m) where
        pure      = Pure
        pf <*> px = go pf where
            go p = case p of
                Request a' fa  -> Request a' (\a  -> go (fa  a ))
                Respond b  fb' -> Respond b  (\b' -> go (fb' b'))
                M          m   -> M (m >>= \p' -> return (go p'))
                Pure    f      -> fmap f px
        (*>) = (>>)
First note that pure = Pure which isn’t a stunner just from the naming. In <*> we have the same sort of pattern as in fmap. We race down the “function” side of <*> and whenever we reach a Pure we have a function from a -> b, with that function we call fmap on the structure on the “argument” side. So we’re kind of gluing that px onto that pf by changing each Pure f to fmap f px.
Finally we have the monad instance. Of course the return implementation is the same as for pure but (>>=) = _bind so the implementation of _bind has been chucked out of the instance itself. It turns out there’s a good reason for that: _bind has a bunch of rewrite rules attached to it.
    _bind
        :: Monad m
        => Proxy a' a b' b m r
        -> (r -> Proxy a' a b' b m r')
        -> Proxy a' a b' b m r'
    p0 `_bind` f = go p0 where
        go p = case p of
            Request a' fa  -> Request a' (\a  -> go (fa  a ))
            Respond b  fb' -> Respond b  (\b' -> go (fb' b'))
            M          m   -> M (m >>= \p' -> return (go p'))
            Pure    r      -> f r
Now excitingly the implementation of bind is almost exactly what we had before! Now instead of Pure f -> fmap f px it’s Pure r -> f r so we have something more like substitution than gluing.
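Here is a self-contained sketch that puts the three instances side by side and runs a tiny example, so the "substitution at the leaves" reading can be checked by hand. It deviates from the library in ways I should flag: it uses the fmap go spelling from earlier, asks only for Functor m, and has no rewrite rules. respond is written inline, and collect and example are hypothetical helpers of mine.

```haskell
import Data.Functor.Identity (Identity (..))

data Proxy a' a b' b m r
  = Request a' (a  -> Proxy a' a b' b m r)
  | Respond b  (b' -> Proxy a' a b' b m r)
  | M          (m    (Proxy a' a b' b m r))
  | Pure    r

instance Functor m => Functor (Proxy a' a b' b m) where
  fmap f = go
    where
      go p = case p of
        Request a' fa  -> Request a' (fmap go fa)
        Respond b  fb' -> Respond b  (fmap go fb')
        M          m   -> M (fmap go m)
        Pure    r      -> Pure (f r)        -- relabel the leaf

instance Functor m => Applicative (Proxy a' a b' b m) where
  pure = Pure
  pf <*> px = go pf
    where
      go p = case p of
        Request a' fa  -> Request a' (fmap go fa)
        Respond b  fb' -> Respond b  (fmap go fb')
        M          m   -> M (fmap go m)
        Pure    f      -> fmap f px         -- glue px onto the leaf

instance Functor m => Monad (Proxy a' a b' b m) where
  p0 >>= f = go p0
    where
      go p = case p of
        Request a' fa  -> Request a' (fmap go fa)
        Respond b  fb' -> Respond b  (fmap go fb')
        M          m   -> M (fmap go m)
        Pure    r      -> f r               -- substitute at the leaf

respond :: b -> Proxy a' a b' b m b'
respond b = Respond b Pure

-- Hypothetical helper: answer every continuation with () and collect
-- the responded values together with the final result.
collect :: Proxy a' () () b Identity r -> ([b], r)
collect p = case p of
  Request _ k -> collect (k ())
  Respond b k -> let (bs, r) = collect (k ()) in (b : bs, r)
  M m         -> collect (runIdentity m)
  Pure r      -> ([], r)

example :: Proxy a' () () Int Identity String
example = do
  respond 1
  respond 2
  return "done"
```

Here collect example evaluates to ([1,2],"done"): each bind replaced the Pure at the end of the previous step with the next step.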
Now that Proxy is a monad, we can make it a monad transformer!
    instance MonadTrans (Proxy a' a b' b) where
        lift m = M (m >>= \r -> return (Pure r))
So we need to take an m a and return Proxy a' a b' b m a, we want to use M :: m (Proxy a' a b' b m a) but we have an m a, by fmaping Pure we’re good to go.
From here on out it’s just a series of not so exciting MTL instances so we’ll skip those.. There’s a couple interesting things left though! Before we get to them recall the monad transformer laws

return  = lift . return
lift (m >>= f) = lift m >>= (lift . f)

In other words, lift should “commute” with the two operations of the monad type class. This isn’t actually true by default with Proxy, for example
return a = Pure a
lift (return a) = M (fmap Pure (return a)) = M (return (Pure a))
To solve this we have observe. This function is supposed to normalize a Proxy so that these laws hold.
    observe :: Monad m => Proxy a' a b' b m r -> Proxy a' a b' b m r
    observe p0 = M (go p0) where
        go p = case p of
            Request a' fa  -> return (Request a' (\a  -> observe (fa  a )))
            Respond b  fb' -> return (Respond b  (\b' -> observe (fb' b')))
            M          m'  -> m' >>= go
            Pure    r      -> return (Pure r)
Note that go takes a Proxy a' a b' b m r and returns m (Proxy a' a b' b m r). By doing this, we can stick everything in m with return except for M m' which we just unwrap and keep going. This means return (Pure a) = go (Pure a) which is what is required for the monad transformer laws to hold.
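A small sketch to see the normalization happen. It copies Proxy and observe from above and writes lift as a plain function, liftP, so no transformers import is needed. shape is a hypothetical helper of mine that prints the constructor skeleton over Identity.

```haskell
import Data.Functor.Identity (Identity (..))

data Proxy a' a b' b m r
  = Request a' (a  -> Proxy a' a b' b m r)
  | Respond b  (b' -> Proxy a' a b' b m r)
  | M          (m    (Proxy a' a b' b m r))
  | Pure    r

-- The body of lift from the MonadTrans instance, as a standalone function.
liftP :: Monad m => m r -> Proxy a' a b' b m r
liftP m = M (m >>= \r -> return (Pure r))

observe :: Monad m => Proxy a' a b' b m r -> Proxy a' a b' b m r
observe p0 = M (go p0) where
    go p = case p of
        Request a' fa  -> return (Request a' (\a  -> observe (fa  a )))
        Respond b  fb' -> return (Respond b  (\b' -> observe (fb' b')))
        M          m'  -> m' >>= go
        Pure    r      -> return (Pure r)

-- Hypothetical helper: show only the constructors, down to the first
-- Request, Respond or Pure.
shape :: Proxy a' a b' b Identity r -> String
shape p = case p of
  Request _ _ -> "Request"
  Respond _ _ -> "Respond"
  M m         -> "M (" ++ shape (runIdentity m) ++ ")"
  Pure _      -> "Pure"
```

With these, shape (Pure 'x') is "Pure" while shape (liftP (Identity 'x')) is "M (Pure)", so the two sides of the first law differ structurally. After observe both become "M (Pure)".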
Finally, the last thing in this file is X which is used to represent the type for communication that cannot happen. So if we have a pipe at the beginning of the pipeline, it shouldn’t be able to ask for input from another pipe.
    newtype X = X X

    closed :: X -> a
    closed (X x) = closed x
And there are no non-bottom expressions which occupy this type so we’re good. Now that we’ve seen the internal implementation of most of Proxy we can go look at the infrastructure pipes provides around this. Again going by the names, now that we’ve covered the internals it makes sense to move onto Pipes.Core.
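As a tiny illustration of how closed gets used, here is a sketch with a hypothetical function onlyLeft (my name, not from pipes). Since no total value of type X exists, the Right branch can never actually run, and closed lets us say so to the type checker. Data.Void in base offers the same idea as Void and absurd.

```haskell
newtype X = X X

closed :: X -> a
closed (X x) = closed x

-- The Right case is unreachable for total inputs, so we discharge it
-- with closed instead of inventing a result.
onlyLeft :: Either a X -> a
onlyLeft (Left a)  = a
onlyLeft (Right x) = closed x
```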
Pipes.Core
Pipes.Core seems much closer to the actual user interface of the library, we can see that it exports a bunch of familiar sounding names:
    module Pipes.Core (
          Proxy
        , runEffect
        , respond
        , (/>/)
        , (//>)
        , request
        , (\>\)
        , (>\\)
        , push
        , (>~>)
        , (>>~)
        , pull
        , (>+>)
        , (+>>)
        , reflect
        , X
        , Effect
        , Producer
        , Pipe
        , Consumer
        , Client
        , Server
        , Effect'
        , Producer'
        , Consumer'
        , Client'
        , Server'
        , (\<\)
        , (/</)
        , (<~<)
        , (~<<)
        , (<+<)
        , (<\\)
        , (//<)
        , (<<+)
        , closed
        ) where
Now a few of those we’ve seen before, namely Proxy, X, and closed. Notice that Proxy is exported abstractly here so we can’t write code which violates the monad transformer laws using this module.
The first new function is called runEffect, but it has the type
    Monad m => Effect m r -> m r
Which sounds great! I however have no clue what an effect is so let’s dig around the type exports first. There are a few type synonyms here
    type Effect = Proxy X () () X
    type Producer b = Proxy X () () b
    type Pipe a b = Proxy () a () b
    type Consumer a = Proxy () a () X
    type Client a' a = Proxy a' a () X
    type Server b' b = Proxy X () b' b
    type Effect' m r = forall x' x y' y . Proxy x' x y' y m r
    type Producer' b m r = forall x' x . Proxy x' x () b m r
    type Consumer' a m r = forall y' y . Proxy () a y' y m r
    type Server' b' b m r = forall x' x . Proxy x' x b' b m r
    type Client' a' a m r = forall y' y . Proxy a' a y' y m r
Even though this looks like a lot, about half of these are actually duplicates which just use -XRankNTypes instead of explicitly using X. An Effect as seen above is Proxy X () () X.. I had to double check this but Proxy takes 6 type arguments, here they are in order

a' is the type of things that we can send up a Request
a is the type of things which a request will return
b' is what we may be sent to respond to
b is what we may respond with
m is the underlying monad we may use for effects
r is the return value

So an Effect can only request things if it can produce an X, and it will get back a () from its requests, and it can only respond with an X and will get back a () after responding. Since we can never produce an X an Effect can never request or respond.
Similarly, a Producer can respond to things with bs, but it will only ever get back a () after a response and it can never request something. A Consumer is the dual, never responding but can request, it can only hand the code responding a () though.
Also in there are Clients and Servers which seem to be like a Consumer and a Producer but that can actually send meaningful messages with a request and receive something interesting with a respond instead of just ().
Okay, with these type synonyms in mind let’s go look at some code! Since an Effect can’t request or respond, it’s really equivalent to just some monadic action.
    runEffect :: Monad m => Effect m r -> m r
    runEffect = go
      where
        go p = case p of
            Request v _ -> closed v
            Respond v _ -> closed v
            M       m   -> m >>= go
            Pure    r   -> return r
This lets us write runEffect, which just uses the absurdity of producing a v :: X in order to turn a Proxy into an m.
runEffect is also the first function we’ve seen that actually escapes the Proxy monad! It lets us convert a self-contained pipeline into just an effect, which should mean it comes up basically everywhere, just like runStateT.
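To make the shape of runEffect concrete, here is a minimal sketch that does not depend on pipes at all. It redeclares a cut-down Proxy with the same four constructors, and it assumes that Void and absurd from Data.Void can stand in for X and closed. The sumTo effect is a made-up example, not something from the library.

```haskell
import Data.Functor.Identity (runIdentity)
import Data.Void (Void, absurd)

-- A cut-down copy of the Proxy type, redeclared so the sketch stands alone.
data Proxy a' a b' b m r
    = Request a' (a  -> Proxy a' a b' b m r)
    | Respond b  (b' -> Proxy a' a b' b m r)
    | M (m (Proxy a' a b' b m r))
    | Pure r

-- Assumption: Void plays the role of X, absurd the role of closed.
type Effect = Proxy Void () () Void

runEffect :: Monad m => Effect m r -> m r
runEffect p = case p of
    Request v _ -> absurd v          -- impossible: nobody can build a Void
    Respond v _ -> absurd v          -- impossible for the same reason
    M       m   -> m >>= runEffect   -- run one layer of effects, keep going
    Pure    r   -> return r

-- Hypothetical example: an Effect built only from M and Pure.
-- It adds up n + (n - 1) + ... + 1 for a non-negative n.
sumTo :: Monad m => Int -> Effect m Int
sumTo n = go n 0
  where
    go 0 acc = Pure acc
    go k acc = M (return (go (k - 1) (acc + k)))

total :: Int
total = runIdentity (runEffect (sumTo 4))
```

Loading this in GHCi and evaluating total gives 10. The only constructors an Effect can really contain are M and Pure, which is why it collapses to a plain monadic action.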
Since the Proxy monad is abstract, we need some functions to actually be able to request and respond to things. The first of these is respond
    respond :: Monad m => a -> Proxy x' x a' a m a'
    respond a = Respond a Pure
This is actually pretty trivial, we have a constructor after all whose job it is to Respond to things so we just use that with the a we have as a response. Since we have no interesting continuation yet, but we need something of type a' -> Proxy x' x a' a m a' we just use Pure. This should be very familiar to users of free monads (remember that Pure = return)!
Next is something interesting, we’ve seen a lot of ways of manipulating a pipe, but never actually a way of combining two pipes so that they interact, our next function does that.
    (/>/)
        :: Monad m
        => (a -> Proxy x' x b' b m a')
        -> (b -> Proxy x' x c' c m b')
        -> (a -> Proxy x' x c' c m a')
    (fa />/ fb) a = fa a //> fb
Here we have two arguments, both functions to pipelines, and we return a pipeline as output. Notice here that the first Proxy is something which is going to respond with things of type b and expect something of type b' in return, and our second function is going to map bs to a Proxy which returns a b'. This means we can replace each Respond in the first with a call to the second function and pipe the output into our continuation for that Respond. Indeed this matches up with the return type, so I anticipate that this is what will happen. However, this function shells out to another right below it so we’ll have to look at it to confirm.
    (//>)
        :: Monad m
        =>       Proxy x' x b' b m a'
        -> (b -> Proxy x' x c' c m b')
        ->       Proxy x' x c' c m a'
    p0 //> fb = go p0
      where
        go p = case p of
            Request x' fx  -> Request x' (\x -> go (fx x))
            Respond b  fb' -> fb b >>= \b' -> go (fb' b')
            M          m   -> M (m >>= \p' -> return (go p'))
            Pure       a   -> Pure a
The interesting line here is Respond b fb' -> ... which does exactly what I thought it ought to (I feel clever). In that line we run the function we have in the second argument with the data the first argument was Responding with. We sort of “intercept” a message intended for downstream and just handle it right there. Since we do this for all things Responding with bs, we now only respond with cs, hence the change in type. It doesn’t affect the upstream type, but we can now take something producing values and transform them to instead run some other computation (perhaps producing something else).
In a limited case we can do something like
    intercept :: Monad m
              => (b -> c)
              -> Proxy a' a b' b m r
              -> Proxy a' a b' c m r
    intercept f p = p //> respond . f
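Here is a rough, self-contained sketch of that substitution at work. It is not the library code: Proxy is redeclared locally, sequencing is a plain function called bind instead of a Monad instance, and each and collect are hypothetical helpers that exist only to build and observe a small producer.

```haskell
import Data.Functor.Identity (Identity, runIdentity)
import Data.Void (Void, absurd)

data Proxy a' a b' b m r
    = Request a' (a  -> Proxy a' a b' b m r)
    | Respond b  (b' -> Proxy a' a b' b m r)
    | M (m (Proxy a' a b' b m r))
    | Pure r

-- Sequencing, written as a plain function instead of a Monad instance.
bind :: Functor m
     => Proxy a' a b' b m r -> (r -> Proxy a' a b' b m s) -> Proxy a' a b' b m s
bind p f = case p of
    Request a' k -> Request a' (\a  -> bind (k a)  f)
    Respond b  k -> Respond b  (\b' -> bind (k b') f)
    M       m    -> M (fmap (`bind` f) m)
    Pure    r    -> f r

respond :: a -> Proxy x' x a' a m a'
respond a = Respond a Pure

-- Replace every Respond in p0 with a call to fb.
(//>) :: Functor m
      => Proxy x' x b' b m a'
      -> (b -> Proxy x' x c' c m b')
      -> Proxy x' x c' c m a'
p0 //> fb = go p0
  where
    go p = case p of
        Request x' fx  -> Request x' (go . fx)
        Respond b  fb' -> fb b `bind` (go . fb')
        M       m      -> M (fmap go m)
        Pure    a      -> Pure a

intercept :: Functor m => (b -> c) -> Proxy a' a b' b m r -> Proxy a' a b' c m r
intercept f p = p //> (respond . f)

-- Hypothetical helper: respond with each element of a list in turn.
each :: Functor m => [b] -> Proxy x' x () b m ()
each = foldr (\b rest -> respond b `bind` \() -> rest) (Pure ())

-- Hypothetical helper: gather everything a pure producer responds with,
-- answering every Respond with ().
collect :: Proxy Void () () b Identity r -> [b]
collect p = case p of
    Request v _ -> absurd v
    Respond b k -> b : collect (k ())
    M       m   -> collect (runIdentity m)
    Pure    _   -> []

doubled :: [Int]
doubled = collect (intercept (* 2) (each [1, 2, 3]))
```

Evaluating doubled gives [2,4,6]: every value the producer responded with passed through the intercepting function before reaching downstream. The parentheses around respond . f are only needed because this sketch declares no fixity for //>.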
Cool! Now up next seems to be the dual of what we’ve just looked at.
    request :: Monad m => a' -> Proxy a' a y' y m a
    request a' = Request a' Pure
This is just what we had with respond but using Request instead. Similarly we have a counterpart for />/. It again shells out to a similar, pointful function, >\\
    (\>\)
        :: Monad m
        => (b' -> Proxy a' a y' y m b)
        -> (c' -> Proxy b' b y' y m c)
        -> (c' -> Proxy a' a y' y m c)
    (fb' \>\ fc') c' = fb' >\\ fc' c'

    (>\\)
        :: Monad m
        => (b' -> Proxy a' a y' y m b)
        ->        Proxy b' b y' y m c
        ->        Proxy a' a y' y m c
    fb' >\\ p0 = go p0
      where
        go p = case p of
            Request b' fb  -> fb' b' >>= \b -> go (fb b)
            Respond x  fx' -> Respond x (\x' -> go (fx' x'))
            M          m   -> M (m >>= \p' -> return (go p'))
            Pure       a   -> Pure a
I’d expect that this function does sort of what the other did before. It’ll take Requests and “answer” them inline by replacing each one with a call to the other function. In fact, when you think about it, what the hell is the difference between a request and a response? They’re completely symmetric! They both transmit information, sending one type in one direction and one type in the other. So we should have exactly the same code that just happens to use Request instead of Respond, which is indeed what we have.
The only real difference here is in the argument order which hints at the fact that we’re going to break symmetry sooner or later, it just hasn’t happened yet.
Next up is
    push :: Monad m => a -> Proxy a' a a' a m r
    push = go
      where
        go a = Respond a (\a' -> Request a' go)
push takes a seed a and chucks it down the pipeline. Once it gets a response, it throws it up the pipeline with Request and when it gets a response (something of type a) it starts the whole process over again. Now the process starts by sending values down, there’s no reason why we can’t do the reverse and start by asking for a value
    pull :: Monad m => a' -> Proxy a' a a' a m r
    pull = go
      where
        go a' = Request a' (\a -> Respond a go)
which conveniently is right near by. Now push and pull each give rise to a form of composition which takes two Proxys and glues them together. The first is
    (>~>)
        :: Monad m
        => (_a -> Proxy a' a b' b m r)
        -> ( b -> Proxy b' b c' c m r)
        -> (_a -> Proxy a' a c' c m r)
This takes two Proxys which can communicate with each other and gives back a Proxy which has internalized this dialogue. This shells out to the pointful version, >>~
    (>>~)
        :: Monad m
        =>       Proxy a' a b' b m r
        -> (b -> Proxy b' b c' c m r)
        ->       Proxy a' a c' c m r
    p >>~ fb = case p of
        Request a' fa  -> Request a' (\a -> fa a >>~ fb)
        Respond b  fb' -> fb' +>> fb b
        M          m   -> M (m >>= \p' -> return (p' >>~ fb))
        Pure       r   -> Pure r
For this code we walk down the tree and recurse in all cases except where we have a Respond. This should send some information to that function we got as an argument and then use that response to continue, so we want some way of taking a Proxy b' b c' c m r and a b' -> Proxy a' a b' b m r and giving back a Proxy a' a c' c m r. This looks like the exact dual to >>~ and indeed is the equivalent in the pull version.
    (+>>)
        :: Monad m
        => (b' -> Proxy a' a b' b m r)
        ->        Proxy b' b c' c m r
        ->        Proxy a' a c' c m r
    fb' +>> p = case p of
        Request b' fb  -> fb' b' >>~ fb
        Respond c  fc' -> Respond c (\c' -> fb' +>> fc' c')
        M          m   -> M (m >>= \p' -> return (fb' +>> p'))
        Pure       r   -> Pure r
This does the exact opposite of >>~. It walks around recursing until we get to a Request, which should transfer control up to that function b' -> Proxy ..., and it does so by calling >>~. So these two operators, +>> and >>~, work together to join up two Proxy functions by using one to answer the other’s Requests and Responds. The symmetry breaking here is in who we should inspect “first”, so to speak. If we start with the upstream one (>>~), then the downstream one is only run when a value is pushed down to it, and if we start with the downstream one (+>>), we only run the upstream one when we pull something from it. Nifty.
One thing to note, what happens when one of these Proxys gives up and returns? This potential situation is reflected in the fact that both of these Proxys must return an r. Therefore, whenever one of these returns and we’re currently running it (the upstream for >>~, downstream for +>>) we can just return the value and be done with the whole thing. In this sense composing a Proxy has this short circuiting property, at any point in the pipeline you can just give up and return something!
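The back-and-forth between +>> and >>~, including the short circuiting, is easier to see on a tiny example. The following is a sketch under a few assumptions: Proxy is redeclared locally rather than imported, and connect, fromList, mapP and collect are hypothetical names introduced only for this demonstration.

```haskell
import Data.Functor.Identity (Identity, runIdentity)
import Data.Void (Void, absurd)

data Proxy a' a b' b m r
    = Request a' (a  -> Proxy a' a b' b m r)
    | Respond b  (b' -> Proxy a' a b' b m r)
    | M (m (Proxy a' a b' b m r))
    | Pure r

-- Pull side: run downstream until it Requests, then hand control upstream.
(+>>) :: Functor m
      => (b' -> Proxy a' a b' b m r) -> Proxy b' b c' c m r -> Proxy a' a c' c m r
fb' +>> p = case p of
    Request b' fb  -> fb' b' >>~ fb
    Respond c  fc' -> Respond c (\c' -> fb' +>> fc' c')
    M       m      -> M (fmap (fb' +>>) m)
    Pure    r      -> Pure r

-- Push side: run upstream until it Responds, then hand control downstream.
(>>~) :: Functor m
      => Proxy a' a b' b m r -> (b -> Proxy b' b c' c m r) -> Proxy a' a c' c m r
p >>~ fb = case p of
    Request a' fa  -> Request a' (\a -> fa a >>~ fb)
    Respond b  fb' -> fb' +>> fb b
    M       m      -> M (fmap (>>~ fb) m)
    Pure    r      -> Pure r

-- Hypothetical helper: start with the downstream end, requests carry only ().
connect :: Functor m
        => Proxy a' a () b m r -> Proxy () b c' c m r -> Proxy a' a c' c m r
connect up down = (\() -> up) +>> down

-- Hypothetical producer: respond with each list element, then return.
fromList :: [b] -> Proxy x' x () b m ()
fromList = foldr (\b rest -> Respond b (\() -> rest)) (Pure ())

-- Hypothetical pipe: request an input, respond with f of it, repeat forever.
mapP :: (a -> b) -> Proxy () a () b m r
mapP f = Request () (\a -> Respond (f a) (\() -> mapP f))

-- Hypothetical helper: gather everything a pure producer responds with.
collect :: Proxy Void () () b Identity r -> [b]
collect p = case p of
    Request v _ -> absurd v
    Respond b k -> b : collect (k ())
    M       m   -> collect (runIdentity m)
    Pure    _   -> []

result :: [Int]
result = collect (connect (fromList [1, 2, 3]) (mapP (+ 1)))
```

Evaluating result gives [2,3,4]. Note that mapP never returns on its own. The composite still terminates because fromList eventually hits Pure (), and the Pure case of >>~ ends the whole pipeline at that point.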
Remember before how I was ranting about how Request and Respond were really the same damn thing, it turns out I’m not the only one who thought that
    reflect :: Monad m => Proxy a' a b' b m r -> Proxy b b' a a' m r
    reflect = go
      where
        go p = case p of
            Request a' fa  -> Respond a' (\a  -> go (fa  a ))
            Respond b  fb' -> Request b  (\b' -> go (fb' b'))
            M          m   -> M (m >>= \p' -> return (go p'))
            Pure    r      -> Pure r
Looking at the type here is really telling, all we do to switch the upstream and downstream ends is swap the constructors Request and Respond! That actually wraps up the core of pipes, the rest is just a bunch of synonyms with the arguments flipped!
Now that we’ve finished up Pipes.Core it’s not clear where to go so I decided to go look at the top level Pipes module since between the .Internal and .Core modules we should have covered a lot of it. It turns out the top level only imports those two modules so we can now go through that!
Pipes
Really the top level module Pipes just re-exports some stuff and defines some thin layers over the rest
    module Pipes (
          Proxy
        , X
        , Effect
        , Effect'
        , runEffect
        , Producer
        , Producer'
        , yield
        , for
        , (~>)
        , (<~)
        , Consumer
        , Consumer'
        , await
        , (>~)
        , (~<)
        , Pipe
        , cat
        , (>->)
        , (<-<)
        , ListT(..)
        , runListT
        , Enumerable(..)
        , next
        , each
        , every
        , discard
        , module Control.Monad
        , module Control.Monad.IO.Class
        , module Control.Monad.Trans.Class
        , module Control.Monad.Morph
        , module Data.Foldable
        )
Now what haven’t we seen, the first thing is this yield construct which turns out to be a snazzier name for respond with a nicer type.
    yield :: Monad m => a -> Producer' a m ()
    yield = respond
Similarly, for is just a synonym for (//>) (the first joiner we went through) and ~> is the point-free version. On the other end we have stuff overlaying request and friends but they’re not quite symmetric
    await :: Monad m => Consumer' a m a
    await = request ()

    (>~)
        :: Monad m
        => Proxy a' a y' y m b
        -> Proxy () b y' y m c
        -> Proxy a' a y' y m c
    p1 >~ p2 = (\() -> p1) >\\ p2
So we need to cope with the fact request can actually transfer interesting data down as well as up, in the basic case though we just assume that we’re dealing with ()s. Also note that >~ is biased to the downstream Proxy, it starts by running it and whenever we actually request something (by sending up a ()) we run p1. This function lets us compose Proxys, not functions to Proxys so that’s one nice effect.
Finally, we see our first example of a pipe
    cat :: Monad m => Pipe a a m r
    cat = pull ()
cat works by requesting something upstream immediately and passing it downstream. Nothing interesting except that it combines great with other Proxys. Say for example we have a random number generator, we can easily create a Proxy producing random numbers with
    randoms = lift getRandomNumber >~ cat
We use >~ to replace each request in cat with a call to getRandomNumber, whose result will be immediately pushed downstream. Similarly, we can use cat to push everything into some computation. If we want to debug a pipe by just printing everything we could say
    printAll = for cat (lift . print)
So cat is a nice way of lifting something to work across Proxys of values if nothing else.
Next is the common case of composing two Proxys,
    (>->)
        :: Monad m
        => Proxy a' a () b m r
        -> Proxy () b c' c m r
        -> Proxy a' a c' c m r
    p1 >-> p2 = (\() -> p1) +>> p2
>-> makes it easy to join up to Proxys that don’t send any interesting data “up” with requests. >-> starts by running p2 using +>> and whenever p2 requests something it goes and runs p1 for a while. This function lets us connect a Pipe to Pipe or Producer to Consumer for example.
Finally, we wrap up this module with the definition of ListT inside it. Using Producer we can define a nonbroken version of ListT
    newtype ListT m a = Select { enumerate :: Producer a m () }

    instance (Monad m) => Functor (ListT m) where
        fmap f p = Select (for (enumerate p) (\a -> yield (f a)))

    instance (Monad m) => Applicative (ListT m) where
        pure a = Select (yield a)
        mf <*> mx = Select (
            for (enumerate mf) (\f ->
            for (enumerate mx) (\x ->
            yield (f x) ) ) )

    instance (Monad m) => Monad (ListT m) where
        return a = Select (yield a)
        m >>= f  = Select (for (enumerate m) (\a -> enumerate (f a)))
        fail _   = mzero
What’s kinda nifty here is we just use a Producer returning a () to represent our list. Here we can use for to access every yield x which corresponds to our “list” having an entry x! From there this is really just the standard set of instances for a list! In particular >>= is concatMap for producers. That about wraps up this module and I’ll end the blog post with it.
Wrap Up
I didn’t actually go through all of pipes here, just the “core operations” which everything else is built on top of. In particular, I urge you to go read how Pipes.Prelude is implemented. Just like implementing the Haskell prelude is a good exercise the same is true of pipes.
It turned out that pipes isn’t all that awful on the inside. It’s a library built around a specific free-monad-like structure with a couple of different methods of joining two computations together. In particular there were a few different notions of composition which really define pipes

Sequential composition: running one machine and then another is given with >>=
Take one computation and replace all its Responds with another function using //> or for in non-infix speak
Take one computation and replace all its Requests with another function using >\\
Take two computations and compose them together so a request from one is satisfied by running the other until a respond using +>> and >>~

Hope you learned as much as I did, cheers.

          
          



Compiling a Lazy Language in 1,000 words
Danny Gratzer — Tue, 19 May 2015 00:00:00 UT

Posted on May 19, 2015

Tags: compilers, haskell
    


I’m a fan of articles like this one which set out to explain a really complicated subject in 600 words or less. I wanted to write one with a similar goal for compiling a language like Haskell. To help with this I’ve broken down what most compilers for a lazy language do into 5 different phases and spent 200 words explaining how they work. This isn’t really intended to be a tutorial on how to implement a compiler, I just want to make it less magical.
I assume that you know how a lazy functional language looks (this isn’t a tutorial on Haskell) and a little about how your machine works since I make a few references to how some lower level details are compiled. These will make more sense if you know such things, but they’re not necessary.
And the word-count-clock starts… now.
Parsing
Our interactions with compilers usually involve treating them as a huge function from string to string. We give them a string (our program) and they give us back a string (the compiled code). However, on the inside the compiler does all sorts of stuff to that string we gave it and most of those operations are inconvenient to do as string operations. In the first part of the compiler, we convert the string into an abstract syntax tree. This is a data structure in the compiler which represents the string, but in

A more abstract way: it doesn’t have details such as whitespace or comments
A more convenient way: it lets the compiler perform the operations it wants efficiently

The process of going String -> AST is called “parsing”. It has a lot of (kinda stuffy IMO) theory behind it. This is the only part of the compiler where the syntax actually matters and is usually the smallest part of the compiler.
Examples:

Purescript
Elm

Type Checking
Now that we’ve constructed an abstract syntax tree we want to make sure that the program “makes sense”. Here “make sense” just means that the program’s types are correct. The process for checking that a program type checks involves following a bunch of rules of the form “A has type T if B has type T1 and C has type…”. All of these rules together constitute the type system for our language. As an example, in Haskell f a has the type T2 if f has the type T1 -> T2 and a has the type T1.
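As a rough illustration of what such rules look like in code, here is a toy checker. The language, the Type and Expr definitions and the typeOf function are all made up for this sketch, and lambdas carry a type annotation so that the checker never has to guess anything.

```haskell
data Type
    = TInt
    | TFun Type Type
    deriving (Eq, Show)

data Expr
    = Lit Int
    | Var String
    | Lam String Type Expr   -- the binder is annotated with its type
    | App Expr Expr

-- The typing context: which variables are in scope, and at what type.
type Env = [(String, Type)]

-- Each equation is one typing rule. Nothing means "does not type check".
typeOf :: Env -> Expr -> Maybe Type
typeOf _   (Lit _)     = Just TInt
typeOf env (Var x)     = lookup x env
typeOf env (Lam x t b) = TFun t <$> typeOf ((x, t) : env) b
typeOf env (App f a)   = do
    tf <- typeOf env f
    ta <- typeOf env a
    case tf of
        -- f a has type t2 if f has type t1 -> t2 and a has type t1
        TFun t1 t2 | t1 == ta -> Just t2
        _                     -> Nothing
```

With these definitions, typeOf [] (App (Lam "x" TInt (Var "x")) (Lit 1)) is Just TInt, while typeOf [] (App (Lit 1) (Lit 2)) is Nothing because a number is not a function.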
There’s a small wrinkle in this story though: most languages require some type inference. This makes things 10x harder because we have to figure out the types of everything as we go! Type inference isn’t even possible in a lot of languages, and some clever contortions are often needed to keep a language’s types inferable.
However, once we’ve done all of this the program is correct enough to compile. Past type checking, if the compiler raises an error it’s a compiler bug.
Examples:

A type inferencer for Mini-ML

Optimizations/Simplifications
Now that we’re free of the constraints of having to report errors to the user things really get fun in the compiler. Now we start simplifying the language by converting a language feature into a mess of other, simpler language features. Sometimes we convert several features into specific instances of one more general feature. For example, we might convert our big fancy pattern language into a simpler one by elaborating each case into a bunch of nested cases.
Each time we remove a feature we end up with a slightly different language. The languages in this progression are called the “intermediate languages” (ILs). Each of these ILs has its own AST as well! In a good compiler we’ll have a lot of ILs as it makes the compiler much more maintainable.
An important part of choosing an IL is making it amenable to various optimizations. When the compiler is working with each IL it applies a set of optimizations to the program. For example

Constant folding, converting 1 + 1 to 2 during compile time
Inlining, copy-pasting the body of smaller functions where they’re called
Fusion, turning multiple passes over a datastructure into a single one
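To give a flavour of the first of these, here is a small sketch of constant folding over a made-up expression IL. The Expr type and constFold are assumptions for illustration, not taken from any particular compiler.

```haskell
data Expr
    = Num Int
    | Var String
    | Add Expr Expr
    | Mul Expr Expr
    deriving (Eq, Show)

-- Fold the children first, then combine them if both became constants.
constFold :: Expr -> Expr
constFold (Add a b) = case (constFold a, constFold b) of
    (Num x, Num y) -> Num (x + y)
    (a', b')       -> Add a' b'
constFold (Mul a b) = case (constFold a, constFold b) of
    (Num x, Num y) -> Num (x * y)
    (a', b')       -> Mul a' b'
constFold e = e

example :: Expr
example = constFold (Mul (Var "x") (Add (Num 1) (Num 1)))
```

Here example evaluates to Mul (Var "x") (Num 2): the 1 + 1 was computed at "compile time" while the part that depends on x is left alone.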

Examples:

Pattern matching
A nice demonstration of many ILs

Spineless, Tagless, and Generally Wimpy IL
At some point in the compiler, we have to deal with the fact we’re compiling a lazy language. One nice way is to use a spineless, tagless G-machine (STG machine).
How an STG machine works is a little complicated but here’s the gist

An expression becomes a closure/thunk, a bundling of code to compute the expression and the data it needs. These closures may depend on several arguments being supplied
We have a stack for arguments and another for continuations. A continuation is some code which takes the value returned from an expression and does something with it, like pattern match on it
To evaluate an expression we push the arguments it needs onto the stack and “enter” the corresponding closure, running the code in it
When the expression has evaluated itself it will pop the next continuation off the stack and give it the resulting value

During this portion of the compiler, we’d transform our last IL into a C-like language which actually works in terms of pushing, popping, and entering closures.
The key idea here that makes laziness work is that a closure defers work! It’s not a value, it’s a recipe for how to compute a value when we need it. Also note, all calls are tail calls since function calls are just a special case of entering a closure.
Another really beautiful idea in the STG machine is that closures evaluate themselves. This means closures present a uniform interface no matter what, all the details are hidden in that bundled up code. (I’m totally out of words to say this, but screw it it’s really cool).
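The "recipe, not a value" idea can be simulated in a few lines. This is only a sketch of the concept: the Closure type and the newClosure and enter functions are invented here, there are no stacks or continuations, and a real STG implementation lays closures out very differently in memory.

```haskell
import Data.IORef (IORef, modifyIORef, newIORef, readIORef, writeIORef)

-- A closure is either code that has not run yet or the value it produced.
data Closure a
    = Thunk (IO a)
    | Value a

newClosure :: IO a -> IO (IORef (Closure a))
newClosure = newIORef . Thunk

-- "Entering" a closure: callers never need to know whether it has been
-- evaluated already, the closure sorts that out itself.
enter :: IORef (Closure a) -> IO a
enter ref = do
    c <- readIORef ref
    case c of
        Value v    -> return v
        Thunk code -> do
            v <- code
            writeIORef ref (Value v)   -- overwrite the recipe with the result
            return v

-- Enter the same closure twice and count how often its code actually ran.
demo :: IO (Int, Int, Int)
demo = do
    runs <- newIORef 0
    c    <- newClosure (modifyIORef runs (+ 1) >> return (6 * 7))
    x    <- enter c
    y    <- enter c
    n    <- readIORef runs
    return (x, y, n)
```

Running demo yields (42,42,1): both entries see the same value, but the deferred work happened only once.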
Examples:

My writeup
ezyang’s (better) writeup
The paper
Another paper

Code Generation
Finally, after compiling to the STG machine we’re ready to output the target code. This bit is very dependent on what exactly we’re targeting.
If we’re targeting assembly, we have a few things to do. First, we have to switch from using variables to registers. This process is called register allocation and we basically slot each variable into an available register. If we run out, we store variables in memory and load them in as we need them.
In addition to register allocation, we have to compile those C-like language constructs to assembly. This means converting procedures into a label and some instructions, pattern matches into something like a jump table and so on. This is also where we’d apply low-level, bit-twiddling optimizations.
Examples:

LLVM Code Generation
Any good compilers book

Conclusion
Okay, clock off.
Hopefully that was helpful even if you don’t care that much about lazy languages (most of these ideas apply in any compiler). In particular, I hope that you now believe me when I say that lazy languages aren’t magical. In fact, the worry of how to implement laziness only really came up in one section of the compiler!
Now I have a question for you dear reader, what should I elaborate on? With summer ahead, I’ll have some free time soon. Is there anything else that you would like to see written about? (Just not parsing please)

          
          



A Proof of Church Rosser in Twelf
Danny Gratzer — Tue, 05 May 2015 00:00:00 UT

Posted on May 5, 2015

Tags: twelf, types
    


An important property in any term rewriting system, a system of rules for saying one term can be rewritten into another, is called confluence. In a term rewriting system more than one rule may apply at a time, confluence states that it doesn’t matter in what order we apply these rules. In other words, there’s some sort of diamond property in our system
                 Starting Term
                    /     \
                   /       \
          Rule 1  /         \ Rule 2
                 /           \
                /             \
               B               C
                \              /
         A bunch \     of     / rules later
                  \          /
                   \        /
                    \      /
                 Same end point
In words (and not a crappy ascii picture)

Suppose we have some term A
The system lets us rewrite A to B
The system lets us rewrite A to C

Then two things hold

The system lets us rewrite B to D in some number of rewrites
The system lets us rewrite C to D with a different series of rewrites

In the specific case of lambda calculus, confluence is referred to as the “Church-Rosser Theorem”. This theorem has several important corollaries, including that the normal form of any lambda term is unique. To see this, remember that a normal form is always “at the bottom” of diamonds like the one we drew above. This means that if some term had multiple steps to take, they all must converge before one of them reaches a normal form. If any of them did hit a normal form first, they couldn’t complete the diamond.
Proving Church-Rosser
In this post I’d like to go over a proof of the Church Rosser theorem in Twelf, everyone’s favorite mechanized metalogic. To follow along if you don’t know Twelf, perhaps some shameless self linking will help.
We need to start by actually defining lambda calculus. In keeping with Twelf style, we laugh at those restricted by the bounds of inductive types and use higher order abstract syntax to get binding for free.
    term : type.
    ap   :  term -> term  -> term.
    lam  : (term -> term) -> term.
We have two constructors. The first, ap, applies one term to another. The interesting one here is lam, which embeds the LF function space, term -> term, into term. This actually makes sense because term isn’t an inductive type, just a type family with a few members. There’s no underlying induction principle with which we can derive contradictions. To be perfectly honest I’m not sure how the proof of soundness of something like Twelf’s %total mechanism proceeds. If a reader is feeling curious, I believe this is the appropriate paper to read.
With this, something like λx. x x is written as lam [x] ap x x.
Now on to evaluation. We want to talk about things as a term rewriting system, so we opt for a small step evaluation approach.
    step     : term -> term -> type.
    step/b   : step (ap (lam F) A) (F A).
    step/ap1 : step (ap F A) (ap F' A)
                <- step F F'.
    step/ap2 : step (ap F A) (ap F A')
                <- step A A'.
    step/lam : step (lam [x] M x) (lam [x] M' x)
                <- ({x} step (M x) (M' x)).

    step* : term -> term -> type.
    step*/z : step* A A.
    step*/s : step* A C
               <- step A B
               <- step* B C.
We start with the 4 sorts of steps you can make in this system. 3 of them are merely “if you can step somewhere else, you can pull the rewrite out”, I’ve heard these referred to as compatibility rules. This is what ap1, ap2 and lam do, lam being the most interesting since it deals with going under a binder. Finally, the main rule is step/b which defines beta reduction. Note that HOAS gives us this for free as application.
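For readers more at home in Haskell, here is a loose analogue of these rules. It is only a sketch: step/lam is left out because a Haskell function cannot be inspected the way an LF function can, Atom is an extra constructor added purely so that results can be printed, and steps and render are hypothetical names.

```haskell
data Term
    = Ap Term Term
    | Lam (Term -> Term)   -- higher order abstract syntax, as in the Twelf code
    | Atom String          -- assumption: free constants, only for printing

-- All terms reachable in exactly one step, using step/b, step/ap1 and
-- step/ap2. Reduction under a binder (step/lam) is omitted.
steps :: Term -> [Term]
steps (Ap f a) =
       [ body a  | Lam body <- [f] ]   -- step/b: beta is just application
    ++ [ Ap f' a | f' <- steps f ]     -- step/ap1
    ++ [ Ap f a' | a' <- steps a ]     -- step/ap2
steps _ = []

-- Print a term, inventing the names x0, x1, ... for bound variables.
render :: Term -> String
render = go 0
  where
    go :: Int -> Term -> String
    go _ (Atom s) = s
    go n (Ap f a) = "(" ++ go n f ++ " " ++ go n a ++ ")"
    go n (Lam b)  =
        "(\\x" ++ show n ++ ". " ++ go (n + 1) (b (Atom ("x" ++ show n))) ++ ")"

-- (\x. x x) ((\y. y) a) has two different redexes to choose from.
example :: Term
example = Ap (Lam (\x -> Ap x x)) (Ap (Lam (\y -> y)) (Atom "a"))

reducts :: [String]
reducts = map render (steps example)
```

The example term has two distinct one-step reducts, one from beta-reducing the outer application and one from stepping its argument. This is exactly the kind of fork that confluence promises can always be closed again.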
Finally, step* is for a series of steps. We either have no steps, or a step followed by another series of steps. Now we want to prove a couple theorems about our system. These are mostly the lifting of the “compatibility rules” up to working on step*s. The first is the lifting of ap1.
     step*/left : step* F F' -> step* (ap F A) (ap F' A) -> type.
     %mode +{F : term} +{F' : term} +{A : term} +{In : step* F F'}
     -{Out : step* (ap F A) (ap F' A)} (step*/left In Out).

     - : step*/left step*/z step*/z.
     - : step*/left (step*/s S* S) (step*/s S'* (step/ap1 S))
          <- step*/left S* S'*.

     %worlds (lam-block) (step*/left _ _).
     %total (T) (step*/left T _).
Note, the mode specification I’m using is a little peculiar. It needs to be this verbose because otherwise A mode-errors. Type inference is peculiar.
The theorem says that if F steps to F' in several steps, for all A, ap F A steps to ap F' A in many steps. The actual proof is quite boring, we just recurse and apply step/ap1 until everything type checks.
Note that the world specification for step*/left is a little strange. We use the block lam-block because one of our later theorems needs it. The block is just
    %block lam-block : block {x : term}.
We need to annotate this on all our theorems because Twelf’s world subsumption checker isn’t convinced that lam-block can subsume the empty worlds we check some of our theorems in. Ah well.
Similarly to step*/left there is step*/right. The proof is 1 character off so I won’t duplicate it.
    step*/right : step* A A' -> step* (ap F A) (ap F A') -> type.
Finally, we have step*/lam, the lifting of the compatibility rule for lambdas. This one is a little more fun since it actually works by pattern matching on functions.
     step*/lam : ({x} step* (F x) (F' x))
                  -> step* (lam F) (lam F')
                  -> type.
     %mode step*/lam +A -B.

     - : step*/lam ([x] step*/z) step*/z.
     - : step*/lam ([x] step*/s (S* x) (S x))
          (step*/s S'* (step/lam S))
          <- step*/lam S* S'*.

     %worlds (lam-block) (step*/lam _ _).
     %total (T) (step*/lam T _).
What’s fun here is that we’re inducting on a dependent function. So the first case matches [x] step*/z and the second [x] step*/s (S* x) (S x). Other than that we just use step/lam to lift up S and recurse to lift up S* in the second case.
We need one final (more complicated) lemma about substitution. It states that if A steps to A', then F A steps to F A' in many steps for all F. This proceeds by induction on the structure of F. This is the lemma that actually needs the world with lam-blocks. First off, here’s the formal statement in Twelf
    subst : {F} step A A' -> step* (F A) (F A') -> type.
    %mode subst +A +B -C.
Now the actual proof. The first two cases are for constant functions and the identity function
    - : subst ([x] A) S step*/z.
    - : subst ([x] x) S (step*/s step*/z S).
In the case of the constant functions the results of F A and F A' are the same so we don’t need to step at all. In the case of the identity function we just step with the step from A to A'.
In the next case, we deal with nested lambdas.
     - : subst ([x] lam ([y] F y x)) S S'*
          <- ({y} subst (F y) S (S* y))
          <- step*/lam S* S'*.
Here we recurse, but we carefully do this under a pi type. The reason for doing this is because we’re recursing on the open body of the inner lambda. This has a free variable and we need a pi type in order to actually apply F to something to get at the body. Otherwise this just uses step*/lam to lift the step across the body to the step across lambdas.
Finally, application.
     - : subst ([x] ap (F x) (A x)) S S*
          <- subst F S F*
          <- subst A S A*
          <- step*/left F* S1*
          <- step*/right A* S2*
          <- join S1* S2* S*.
This looks complicated, but isn’t so bad. We first recurse, and then use various compatibility lemmas to actually plumb the results of the recursive calls to the right parts of the final term. Since there are two individual pieces of stepping, one for the argument and one for the function, we use join to slap them together.
With this, we’ve got all our lemmas
    %worlds (lam-block) (subst _ _ _).
    %total (T) (subst T _ _).
The Main Theorem
Now that we have all the pieces in place, we’re ready to state and prove confluence. Here’s our statement in Twelf
    confluent : step A B -> step A C -> step* B D -> step* C D -> type.
    %mode confluent +A +B -C -D.
Unfortunately, there’s a bit of a combinatorial explosion with this. There are approximately 3 * 3 + 1 = 10 cases for this theorem. And thanks to the lemmas we’ve proven, they’re all boring.
First we have the cases where step A B is a step/ap1.
     - : confluent (step/ap1 S1) (step/ap1 S2) S1'* S2'*
          <- confluent S1 S2 S1* S2*
          <- step*/left S1* S1'*
          <- step*/left S2* S2'*.
     - : confluent (step/ap1 S1) (step/ap2 S2)
          (step*/s step*/z (step/ap2 S2))
          (step*/s step*/z (step/ap1 S1)).
     - : confluent (step/ap1 (step/lam F) : step (ap _ A) _) step/b
          (step*/s step*/z step/b) (step*/s step*/z (F A)).
In the first case, we have two ap1s. We recurse on the smaller S1 and S2 and then immediately use one of our lemmas to lift the results of the recursive call, which step the function part of the ap we’re looking at, to work across the whole ap term. In the second case there, we’re stepping the function in one, and the argument in the other. In order to bring these to a common term we just apply the first step to the resulting term of the second step and vice versa. This means that we’re doing something like this
                 F A
                /   \
           S1  /     \ S2
              /       \
            F' A     F  A'
              \       /
           S2  \     /  S1
                \   /
                F' A'
This clearly commutes so this case goes through. For the final case, we’re applying a lambda to some term so we can beta reduce. On one side we step the body of the lambda somehow, and on the other we immediately substitute. Now we do something clever. What is a proof that lam A steps to lam B? It’s a proof that for any x, A x steps to B x. In fact, it’s just a function from x to such a step from A x to B x. So we have that lying around in F. So to step from the beta-reduced term G A to G' A all we do is apply F to A! The other direction is just beta-reducing ap (lam G') A to the desired G' A.
In the next set of cases we deal with ap2!
     - : confluent (step/ap2 S1) (step/ap2 S2) S1'* S2'*
          <- confluent S1 S2 S1* S2*
          <- step*/right S1* S1'*
          <- step*/right S2* S2'*.
     - : confluent (step/ap2 S1) (step/ap1 S2)
          (step*/s step*/z (step/ap1 S2))
          (step*/s step*/z (step/ap2 S1)).
     - : confluent (step/ap2 S) (step/b : step (ap (lam F) _) _)
          (step*/s step*/z step/b) S1*
          <- subst F S S1*.
The first two cases are almost identical to what we’ve seen before. The key difference here is in the third case. This is again where we’re stepping something on one side and beta-reducing on the other. We can’t use the nice free stepping provided by F here since we’re stepping the argument, not the function. For this we appeal to subst, which lets us step F A to F A' using S1* exactly as required. The other direction is trivial just like it was in the ap1 case: we just have to step ap (lam F) A' to F A', which is done with beta reduction.
I’m not going to detail the cases to do with step/b as the first argument because they’re just mirrors of the cases we’ve looked at before. That only leaves us with one more case, the case for step/lam.
     - : confluent (step/lam F1) (step/lam F2) F1'* F2'*
          <- ({x} confluent (F1 x) (F2 x) (F1* x) (F2* x))
          <- step*/lam F1* F1'*
          <- step*/lam F2* F2'*.
This is just like all the other “diagonal” cases, like confluent (ap1 S1) (ap1 S2) .... We first recurse (this time using a pi to unbind the body of the lambda) and then use compatibility rules in order to get something we can give back from confluent. And with this, we can actually prove that lambda calculus is confluent.
    %worlds (lam-block) (confluent _ _ _ _).
    %total (T) (confluent T _ _ _).
Wrap Up
We went through a fairly significant proof here, but the end results were interesting at least. One nice thing this proof illustrates is how well HOAS lets us encode these proofs. It’s a very Twelf-y approach to use lambdas to represent bindings. All in all, it’s a fun proof.

          
          



Bracket Abstraction: The Smallest PL You've Ever Seen
Danny Gratzer — Fri, 01 May 2015 00:00:00 UT

    Posted on May  1, 2015
    


    
    Tags: types, haskell
    


It’s well known that lambda calculus is an extremely small, Turing complete language. In fact, most programming languages over the last 5 years have grown some (typed and/or broken) embedding of lambda calculus with aptly named lambdas.
This is wonderful and everything but lambda calculus is actually a little complicated. It’s centred around binding and substituting for variables. While this is elegant, it’s a little difficult to formalize mathematically. It’s natural to wonder whether we can avoid dealing with variables by building up all our lambda terms from a special privileged few.
These systems (sometimes called combinator calculi) are quite pleasant to model formally, but how do we know that our system is complete? In this post I’d like to go over translating any lambda calculus program into a particular combinator calculus, SK calculus.
What is SK Combinator Calculus?
SK combinator calculus is a language with exactly 3 types of expressions.

We can apply one term to another, e e,
We have one term s,
We have another term k.

Besides the obvious ones, there are two main rules for this system:

s a b c = (a c) (b c)
k a b = a

And that’s it. What makes SK calculus so remarkable is how minimal it is. We now show that it’s Turing complete by translating lambda calculus into it.
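To make the two rules concrete, here is a minimal sketch of a reducer for them. The names Term, Free, step, and normalize are hypothetical and not part of the post's code; Free stands in for an arbitrary unknown term, and the step bound is there because SK terms need not terminate.

```haskell
-- Hypothetical names throughout: Term, Free, step, normalize.
data Term = S | K | Free String | App Term Term
  deriving (Eq, Show)

-- One leftmost-outermost reduction step, if any rule applies.
step :: Term -> Maybe Term
step (App (App K a) _)         = Just a                          -- k a b = a
step (App (App (App S a) b) c) = Just (App (App a c) (App b c))  -- s a b c = (a c) (b c)
step (App f x) = case step f of
  Just f' -> Just (App f' x)      -- reduce inside the function position first
  Nothing -> App f <$> step x     -- otherwise try the argument
step _ = Nothing

-- Apply at most n steps; the bound guards against nonterminating terms.
normalize :: Int -> Term -> Term
normalize n t
  | n <= 0    = t
  | otherwise = maybe t (normalize (n - 1)) (step t)
```

For instance, normalize 10 (App (App (App S K) K) (Free "x")) first rewrites to (k x) (k x) by the s rule and then to x by the k rule, so s k k behaves as the identity.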
Bracket Abstraction
First things first, let’s just define how to represent both SK calculus and lambda calculus in our Haskell program.
    data Lam = Var Int | Ap Lam Lam | Lam Lam
    data SK  = S | K | SKAp SK SK
Now we begin by defining a translation from a simplified lambda calculus to SK calculus. This simplified calculus is just SK supplemented with variables. By defining this step, the actual transformation becomes remarkably crisp.
    data SKH = Var' Int | S' | K' | SKAp' SKH SKH
Note that SKH has variables, but no way to bind them. In order to remove a variable, we have bracket. bracket has the property that replacing Var' 0 in a term, e, with a term, e', (and shifting every other variable down by one) is the same as SKAp' (bracket e) e'.
    -- Remove one variable
    bracket :: SKH -> SKH
    bracket (Var' 0) = SKAp' (SKAp' S' K') K'
    -- Any other variable ignores the argument, so wrap it in K'
    bracket (Var' i) = SKAp' K' (Var' (i - 1))
    bracket (SKAp' l r) = SKAp' (SKAp' S' (bracket l)) (bracket r)
    -- S' and K' ignore the argument as well
    bracket x = SKAp' K' x
If we’re at Var' 0 we replace the variable with the term s k k. This has the property that (s k k) A = A. It’s traditional to abbreviate s k k as i (leading to the name SKI calculus) but i is strictly unnecessary as we can see.
If we’re at any other variable, or at s or k, the term doesn’t mention the variable we’re removing. The bracketed term still gets applied to an argument though, so we wrap it in k, which throws that argument away: k A C = A. Without this wrapper the specification fails; for example λx. λy. x would translate to s k k instead of something that behaves like k.
If we’re at an application, we do something really clever. We have two terms which both have a free variable, so we bracket them and use S to supply the free variable to both of them! Remember that
s (bracket A) (bracket B) C = ((bracket A) C) ((bracket B) C)
which is exactly what we require by the specification of bracket.
Now that we have a way to remove free variables from an SKH term, we can close off a term with no free variables to give back a normal SK term.
    close :: SKH -> SK
    close (Var' _) = error "Not closed"
    close S' = S
    close K' = K
    close (SKAp' l r) = SKAp (close l) (close r)
Now our translator can be written nicely.
    l2h :: Lam -> SKH
    l2h (Var i) = Var' i
    l2h (Ap l r) = SKAp' (l2h l) (l2h r)
    l2h (Lam h) = bracket (l2h h)

    translate :: Lam -> SK
    translate = close . l2h
l2h is the main worker in this function. It works over SKH terms because it needs to deal with open terms during the translation. However, every time we go under a binder we call bracket afterwards, removing the free variable we just introduced.
This means that if we call l2h on a closed lambda term we get back a closed SKH term. This justifies using close after the toplevel call to l2h in translate which wraps up our conversion.
For funsies, translating the Y combinator, λf. (λx. f (x x)) (λx. f (x x)), gives back this mess
(s ((s ((s (k s)) ((s (k k)) ((s k) k)))) ((s ((s (k s)) ((s ((s (k s)) (k k))) (k k)))) ((s ((s (k s)) (k k))) (k k)))))
((s ((s (k s)) ((s (k k)) ((s k) k)))) ((s ((s (k s)) ((s ((s (k s)) (k k))) (k k)))) ((s ((s (k s)) (k k))) (k k))))
Completely useless, but kinda fun to look at. More interestingly, the self-application term λx. x x (apply it to itself and you get the canonical nonterminating lambda term) gives back s i i, much more readable.
Wrap Up
Now that we’ve performed this translation we have a very nice proof of the Turing completeness of SK calculus. This has some nice upshots: folks who study things like realizability models of constructive logics use Partial Combinatory Algebras as a model of computation. This is essentially an algebraic model of SK calculus.
If nothing else, it’s really quite crazy that such a small language is capable of simulating any computable function across numbers.

          
          



Compiling With CPS
Danny Gratzer — Thu, 30 Apr 2015 00:00:00 UT

    Posted on April 30, 2015
    


    
    Tags: compilers, haskell
    


Hello folks. It’s been a busy month so I haven’t had much of a chance to write but I think now’s a good time to talk about another compiler related subject: continuation passing style conversion.
When you’re compiling a functional language (in a sane way) your compiler mostly consists of phases which run over the AST and simplify it. For example in a language with pattern matching, it’s almost certainly the case that we can write something like
    case x of
      (1, 2) -> True
      (_, _) -> False
Wonderfully concise code. However, it’s hard to compile nested patterns like that. In the compiler, we might simplify this to
    case x of
     (a, b) -> case a of
                 1 -> case b of
                        2 -> True
                        _ -> False
                 _ -> False
note to future me, write a pattern matching compiler
We’ve transformed our large nested pattern into a series of simpler, unnested patterns. The benefit here is that this maps more directly to a series of conditionals (or jumps).
Now one of the biggest decisions in any compiler is what to do with expressions. We want to get rid of complicated nested expressions because chances are our compilation target doesn’t support them. In my second to last post we transformed a functional language into something like SSA. In this post, we’re going to walk through a different intermediate representation: CPS.
What is CPS
CPS is a restriction of how a functional language works. In CPS we don’t have nested expressions anymore. We instead have a series of lets which telescope out and each bind a “flat” expression. This is the process of “removing expressions” from our language. A compiler is probably targeting something with a much weaker notion of expressions (like assembly) and so we change our tree-like structure into something more linear.
Additionally, no functions return. Instead, they take a continuation and when they’re about to return they instead pass their value to it. This means that conceptually, all functions are transformed from a -> b to (a, b -> void) -> void. Logically, this is actually a reasonable thing to do: it corresponds to mapping a proposition b to ¬ ¬ b. What’s cool here is that since each function call calls a continuation instead of returning its result, we can imagine that each function is just transferring control over to some other part of the program instead of returning. This leads to a very slick and efficient way of implementing CPSed function calls as we’ll see.
This means we’d change something like
    fact n = if n == 0 then 1 else n * fact (n - 1)
into
    fact n k =
      if n == 0
      then k 1
      else let n' = n - 1 in
           fact n' (\r ->
                          let r' = n * r in
                          k r')
To see what’s going on here we

Added an extra argument to fact, its return continuation
In the first branch, we pass the result to the continuation instead of returning it
In the next branch, we lift the nested expression n - 1 into a flat let binding
We add an extra argument to the recursive call, the continuation
In this continuation, we multiply the result of the recursive call by n (Note here that we did close over n, this lambda is a real lambda)
Finally, we pass the final result to the original continuation k.

The only tree-style-nesting here comes from the top if expression, everything else is completely linear.
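The transformation above can be run directly in Haskell. This is a small sketch, not part of the compiler built later in the post; the name factCPS and the polymorphic answer type r are choices made for this illustration.

```haskell
-- Direct style, as written in the post.
fact :: Integer -> Integer
fact n = if n == 0 then 1 else n * fact (n - 1)

-- CPS: the extra argument k is the return continuation.
-- The answer type r is left polymorphic, which means factCPS
-- can only produce a result by calling k.
factCPS :: Integer -> (Integer -> r) -> r
factCPS n k
  | n == 0    = k 1
  | otherwise =
      let n' = n - 1
      in factCPS n' (\r -> let r' = n * r in k r')
```

Passing id as the initial continuation recovers the direct-style answer: factCPS 5 id evaluates to 120, the same as fact 5.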
Let’s formalize this process by converting Simply Typed Lambda Calculus (STLC) to CPS form.
STLC to CPS
First things first, we specify an AST for normal STLC.
    data Tp = Arr Tp Tp | Int deriving Show

    data Op = Plus | Minus | Times | Divide

    -- The Tp in App refers to the return type, it's helpful later
    data Exp a = App (Exp a) (Exp a) Tp

               | Lam Tp (Scope () Exp a)
               | Num Int
                 -- No need for binding here since we have Minus
               | IfZ (Exp a) (Exp a) (Exp a)
               | Binop Op (Exp a) (Exp a)
               | Var a
We’ve supplemented our lambda calculus with natural numbers and some binary operations because it makes things a bit more fun. Additionally, we’re using bound to deal with bindings for lambdas. This means there’s a terribly boring monad instance lying around that I won’t bore you with.
To convert to CPS, we first need to figure out how to convert our types. Since CPS functions never return we want them to go to Void, the unoccupied type. However, since our language doesn’t allow Void outside of continuations, and doesn’t allow functions that don’t go to Void, let’s bundle them up into one new type Cont a which is just a function from a -> Void. However, this presents us with a problem, how do we turn an Arr a b into this style of function? It seems like our function should take two arguments, a and b -> Void so that it can produce a Void of its own. However, this requires products since currying isn’t possible with the restriction that all functions return Void! Therefore, we supplement our CPS language with pairs and projections for them.
Now we can write the AST for CPS types and a conversion between Tp and it.
    data CTp = Cont CTp | CInt | CPair CTp CTp

    cpsTp :: Tp -> CTp
    cpsTp (Arr l r) = Cont $ CPair (cpsTp l) (Cont (cpsTp r))
    cpsTp Int = CInt
The only interesting thing here is how we translate function types, but we talked about that above. Now for expressions.
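As a worked example of the type translation, here is a self-contained copy of the two types with cpsTp applied to a first-order and a higher-order function type. The deriving clauses and the names intToInt and higherOrder are additions for this sketch.

```haskell
data Tp = Arr Tp Tp | Int deriving (Eq, Show)

data CTp = Cont CTp | CInt | CPair CTp CTp deriving (Eq, Show)

cpsTp :: Tp -> CTp
cpsTp (Arr l r) = Cont (CPair (cpsTp l) (Cont (cpsTp r)))
cpsTp Int = CInt

-- Int -> Int becomes a continuation taking (argument, return continuation).
intToInt :: CTp
intToInt = cpsTp (Arr Int Int)
-- Cont (CPair CInt (Cont CInt))

-- (Int -> Int) -> Int: the argument is itself translated recursively.
higherOrder :: CTp
higherOrder = cpsTp (Arr (Arr Int Int) Int)
-- Cont (CPair (Cont (CPair CInt (Cont CInt))) (Cont CInt))
```

Note that every Arr turns into exactly one Cont around a CPair, so the shape of the original type is still visible in the result.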
We want to define a new data type that encapsulates the restrictions of CPS. In order to do this we factor out our data types into “flat expressions” and “CPS expressions”. Flat expressions are things like values and variables while CPS expressions contain things like “Jump to this continuation” or “Branch on this flat expression”. Finally, there’s let expressions to perform various operations on expressions.
    data LetBinder a = OpBind Op (FlatExp a) (FlatExp a)
                     | ProjL a
                     | ProjR a
                     | Pair (FlatExp a) (FlatExp a)

    data FlatExp a = CNum Int | CVar a | CLam CTp a (CExp a)

    data CExp a = Let a (LetBinder a) (CExp a)
                | CIf (FlatExp a) (CExp a) (CExp a)
                | Jump (FlatExp a) (FlatExp a)
                | Halt (FlatExp a)
Lets let us bind the results of a few “primitive operations” across values and variables to a fresh variable. This is where things like “incrementing a number” happen. Additionally, in order to create a pair or access its elements we need to use a Let.
Notice that here application is spelled Jump, hinting that it really is just a jmp and not dealing with the stack in any way. Since they’re all jumps, we cannot overflow the stack as we could with a normal calling convention. To seal off the chain of function calls we have Halt; it takes a FlatExp and returns it as the result of the program.
Expressions here are also parameterized over variables but we can’t use bound with them (for reasons that deserve a blogpost-y rant :). Because of this we settle for just ensuring that each a is globally unique.
So now instead of having a bunch of nested Exps, we have flat expressions which compute exactly one thing and linearize the tree of expressions into a series of flat ones with let binders. It’s still not quite “linear” since both lambdas and if branches let us have something tree-like.
We can now define conversion to CPS with one major helper function
    cps :: (Eq a, Enum a)
        => Exp a
        -> (FlatExp a -> Gen a (CExp a))
        -> Gen a (CExp a)
This takes an expression, a “continuation” and produces a CExp. We have some monad-gen stuff going on here because we need unique variables. The “continuation” is an actual Haskell function. So our function breaks an expression down to a FlatExp and then feeds it to the continuation.
    cps (Var a) c = c (CVar a)
    cps (Num i) c = c (CNum i)
The first two cases are easy since variables and numbers are already flat expressions, they go straight into the continuation.
    cps (IfZ i t e) c = cps i $ \ic -> CIf ic <$> cps t c <*> cps e c
For IfZ we first recurse on the i. Then once we have a flattened computation representing i, we use CIf and recurse.
    cps (Binop op l r) c =
      cps l $ \fl ->
      cps r $ \fr ->
      gen >>= \out ->
      Let out (OpBind op fl fr) <$> c (CVar out)
Like before, we use cps to recurse on the left and right sides of the expression. This gives us two flat expressions which we use with OpBind to compute the result and bind it to out. Now that we have a variable for the result we just toss it to the continuation.
    cps (Lam tp body) c = do
      [pairArg, newCont, newArg] <- replicateM 3 gen
      let body' = instantiate1 (Var newArg) body
      cbody <- cps body' (return . Jump (CVar newCont))
      c (CLam (cpsTp tp) pairArg
         $ Let newArg  (ProjL pairArg)
         $ Let newCont (ProjR pairArg)
         $ cbody)
Converting a lambda is a little bit more work. It needs to take a pair so a lot of the work is projecting out the left component (the argument) and the right component (the continuation). With those two things in hand we recurse in the body using the continuation supplied as an argument. The actual code makes this process look a little out of order. Remember that we only use cbody once we’ve bound the projections to newArg and newCont respectively.
    cps (App l r tp) c = do
      arg <- gen
      cont <- CLam (cpsTp tp) arg <$> c (CVar arg)
      cps l $ \fl ->
        cps r $ \fr ->
        gen >>= \pair ->
        return $ Let pair (Pair fr cont) (Jump fl (CVar pair))
For application we just create a lambda for the current continuation. We then evaluate the left and right sides of the application using recursive calls. Now that we have a function to jump to, we create a pair of the argument and the continuation and bind it to a name. From there, we just jump to fl, the function. Turning the continuation into a lambda is a little strange; it’s also why we needed an annotation for App. The lambda uses the return type of the application and constructs a continuation that maps a to c a. Note that c a is a Haskell expression producing a CExp a.
    convert :: Exp Int -> CExp Int
    convert = runGen . flip cps (return . Halt)
With this, we’ve written a nice little compiler pass to convert expressions into their CPS forms. By doing this we’ve “eliminated expressions”. Everything is now flat and evaluation basically proceeds by evaluating one small computation and using the result to compute another and another.
There are still some things left to compile out before this is machine code though

Closures - these can be converted to explicitly pass records with closure conversion
Hoist lambdas out of nested scope - this gets rid of anonymous functions, something we don’t have in C or assembly
Make allocation explicit - Allocate a block of memory for a group of let statements and have them explicitly move the results of their computations to it
Register allocation - Cleverly choose whether to store some particular variable in a register or load it in as needed.

Once we’ve done these steps we’ve basically written a compiler. However, they’re all influenced by the fact that we’ve compiled out expressions and (really) function calls with our conversion to CPS, it makes the process much much simpler.
Wrap Up
CPS conversion is a nice alternative to something like STG machines for lazy languages or SSA for imperative ones. As far as I’m aware the main SML interpreter (SML/NJ) compiles code in this way. As does Ur/Web if I’m not crazy. Additionally, the course entitled “Higher Order, Typed Compilation” which is taught here at CMU uses CPS conversion to make compiling SML really quite pleasant.
In fact, someone (Andrew Appel?) once wrote a paper that noted that SSA and CPS are actually the same. The key difference is that in SSA we merge multiple blocks together using the phi function. In CPS, we just let multiple source blocks jump to the same destination block (continuation). You can see this in our conversion of IfZ to CPS, instead of using phi to merge in the two branches, they both just use the continuation to jump to the remainder of the program. It makes things a little simpler because no one person needs to worry about
Finally, if you’re compiling a language like Scheme with call/cc, using CPS conversion makes the whole thing completely trivial. All you do is define call/cc at the CPS level as
call/cc (f, c) = f ((λ (x, c') → c x), c)
So instead of using the continuation supplied to us in the expression we give to f, we use the one for the whole call/cc invocation! This causes us to not return into the body of f but instead to carry on the rest of the program as if f had returned whatever value x is. This is how my old Scheme compiler did things, I put figuring out how to implement call/cc off for a week before I realized it was a 10 minute job!
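Here is a minimal sketch of the same definition in Haskell, with the pair (f, c) curried into two arguments. The type synonym CPS and the example productCPS are hypothetical names introduced for illustration; this is not the code from the Scheme compiler mentioned above.

```haskell
-- A CPS computation producing an a, with final answer type r.
type CPS r a = (a -> r) -> r

-- call/cc (f, c) = f ((\(x, c') -> c x), c)
-- The escape function ignores its own continuation c' and
-- resumes the continuation k of the whole callCC invocation.
callCC :: ((a -> CPS r b) -> CPS r a) -> CPS r a
callCC f k = f (\x _ -> k x) k

-- Multiply a list, escaping with the marker -1 as soon as a zero is seen.
productCPS :: [Int] -> CPS r Int
productCPS xs = callCC (\exit -> go exit xs)
  where
    go _    []       k = k 1
    go exit (0 : _)  k = exit (-1) k   -- k is dropped; pending multiplications never run
    go exit (y : ys) k = go exit ys (\p -> k (y * p))
```

productCPS [1, 2, 3] id evaluates to 6, while productCPS [2, 0, 3] id evaluates to -1 rather than -2: the pending multiplication by 2 lives in the continuation that exit throws away.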
Hope this was helpful!

          
          



SML for Haskellers
Danny Gratzer — Fri, 24 Apr 2015 00:00:00 UT

    Posted on April 24, 2015
    


    
    Tags: sml, haskell
    


Inspired by ezyang’s OCaml for Haskellers I decided to write something similar for SML. If you already know OCaml I also recommend Adam Chlipala’s guide.
I’ll follow mostly the same structure as Edward’s article so we’ll have
{- Haskell -}
(* SML *)
What Do They Have in Common
SML and Haskell have quite a lot in common
Common types:
()   | Int | Integer    | Char | Bool | String | Double | (A,  B,  C)
unit | int | IntInf.int | char | bool | string | real   | (A * B * C)
Literals:
() | 1 | 'a'  | True | "hello" | 3.14 | (1, 2, 3)
() | 1 | #'a' | true | "hello" | 3.14 | (1, 2, 3)
Common operators
== | /= | not | &&      | ||     | ++ | !!
=  | <> | not | andalso | orelse | ^  | String.sub
Type variables:
a  b
'a 'AnythingGoes
Function application:
f x y z
f x y z
Lambdas:
\x -> ...
fn x => ...
If:
if True then 1 else 0
if true then 1 else 0
Pattern matching
case x of
  Nothing -> ...
  Just a -> ...

case x of
   NONE => ...
 | SOME a => ...
Top level functions support pattern matching in both:
factorial 0 = 1
factorial n = n * factorial (n - 1)

fun factorial 0 = 1
  | factorial n = n * factorial (n - 1)
Top level bindings can be declared without the sugar for currying as well.
f = \x -> \y -> x
val f = fn x => fn y => x
We can have top level patterns in both as well:
 (a, b) = (1, 2)
 val (a, b) = (1, 2)
Type synonyms:
type Three a b = (a, a, b)
type ('a, 'b) three = 'a * 'a * 'b
Data types:
data List a = Cons a (List a) | Nil
datatype 'a list = Cons of 'a * 'a list | Nil
Notice that in ML data type constructors can only take one argument. This means they often end up taking a tuple (or record). They are however normal functions unlike in OCaml.
Type annotations:
f :: a -> a
f x = x

fun f (x : 'a) : 'a = x
Type annotations for expressions:
(1 + 1 :: Int)
(1 + 1 :  int)
Let bindings:
let x = 1     in x + x
let val x = 1 in x + x end
Declare a new mutable reference:
newIORef True
ref true
Modify a mutable reference:
writeIORef r False
r := false
Read a mutable reference:
readIORef r
! r
Making exceptions:
data MyExn = Exn String; instance Exception ... where
exception Exn of string
Raising an exception:
throw (Exn "uh oh")
raise Exn "uh oh"
Catching an exception:
catch e $ \(Exn s) -> s
e handle Exn s => s
Since SML isn’t a purely functional language, none of the last couple of constructs listed live in anything monadic like their Haskell siblings. The type of r := false is just unit, not IO () or something.
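For contrast, here is a sketch of the Haskell side of those last few constructs written out in full, to show where IO appears. The names Exn, refDemo, and exnDemo are made up for this example; the library functions come from Data.IORef and Control.Exception in base.

```haskell
import Control.Exception (Exception, catch, throwIO)
import Data.IORef (newIORef, readIORef, writeIORef)

-- exception Exn of string
newtype Exn = Exn String deriving Show
instance Exception Exn

-- Every operation on the reference is an IO action.
refDemo :: IO Bool
refDemo = do
  r <- newIORef True      -- val r = ref true
  writeIORef r False      -- r := false   (type IO (), not unit)
  readIORef r             -- ! r

-- (raise Exn "uh oh") handle Exn s => s
exnDemo :: IO String
exnDemo = throwIO (Exn "uh oh") `catch` \(Exn s) -> pure s
```

Running refDemo yields False and exnDemo yields "uh oh"; in both cases the result only becomes available inside IO, which is exactly the difference from the SML versions.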
What Is SML Missing
Aside from the obvious things, like SML being strict so it’s missing pervasive lazy evaluation, SML is missing some things from Haskell.
The biggest gap I stumble across in SML is the lack of higher kinded polymorphism:
data Fix f = Fix (f (Fix f))
datatype 'f fix = Fix of ('f fix) 'f (* Urk, syntax error *)
Even applying a type variable is a syntax error! As this might suggest to you, SML’s type system is much simpler than what we have in Haskell. It doesn’t have a notion of type families, GADTs, fancy kinds, data type promotion, etc, etc. SML is really limited to the areas of the Haskell type system you’d be accustomed to after reading Learn You A Haskell! Just algebraic data types, functions, and polymorphism.
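To show what applying a type variable buys you, here is a small Haskell sketch that uses Fix with a list-shaped functor. ListF, cata, fromList, and sumF are names chosen for this example and are not from the post.

```haskell
-- f is a type variable of kind * -> *, applied to an argument:
-- precisely the thing SML's syntax rules out.
newtype Fix f = Fix (f (Fix f))

-- One layer of a list, with the recursive position abstracted as r.
data ListF a r = NilF | ConsF a r

instance Functor (ListF a) where
  fmap _ NilF        = NilF
  fmap g (ConsF a r) = ConsF a (g r)

-- A generic fold that works for any functor f.
cata :: Functor f => (f a -> a) -> Fix f -> a
cata alg (Fix x) = alg (fmap (cata alg) x)

fromList :: [a] -> Fix (ListF a)
fromList = foldr (\a r -> Fix (ConsF a r)) (Fix NilF)

sumF :: Fix (ListF Int) -> Int
sumF = cata alg
  where
    alg NilF        = 0
    alg (ConsF a r) = a + r
```

sumF (fromList [1, 2, 3]) evaluates to 6. The point is that cata is written once for every f, which needs both the higher kinded variable and a type class, neither of which SML provides in this form.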
Aside from this, SML doesn’t have guards, nor a lot of syntactic sugar that Haskell has. A nice exception to this is lambda cases, which are written
fn 0 => 1
 | 1 => 2
 | n => 0
Additionally, SML doesn’t have significant indentation which means that occasionally awkward parentheses are necessary. For example
case x of
   true  => (case !r of
              x => x + 1)
 | false => (r := 1; 2)
The parentheses are mandatory.
On the stranger side, SML has records (discussed later) but they don’t have a functional updating operation. This is a pain to be honest. Also related, SML has a somewhat nasty habit of allowing for ad-hoc overloading in the way most languages do: certain expressions are “blessed” with unutterable types that must be inferred from context. There are only a few of these, +, *, and record accessors being among them. I’m personally not a huge fan, but in practice this is almost never an issue.
Finally ML doesn’t have Haskell-style type classes. I don’t miss them, some people would.
What Is Haskell Missing (in Comparison)
Aside from the obvious things, like Haskell being lazy so it’s missing pervasive eager evaluation, SML does have a couple of interesting things.
Of course SML has actual modules. I’ve explained a bit about them earlier. This alone is reason enough to write some ML. Additionally, SML has a saner notion of records. Records are a type in and of themselves. This means we can have something like
    type coord = {x : int, y : int}
However, since this is just a type synonym we don’t actually need to declare it. Accessors are written #x to access the field x from a record. SML doesn’t have a very advanced record system so #x isn’t typeable. It’s overloaded to access a field from some record and the concrete record must be inferrable from context. This often means that while we can have free floating records, the inference woes make us want to wrap them in constructors like so
datatype coord = Coord of {x : int, y : int}
This has the nice upshot that record accessors aren’t horribly broken with multiple constructors. Let’s say we had
datatype user = Person of {firstName : string, lastName : string}
              | Robot  of {owner : string, guid : int}
We can’t apply #firstName to an expression of type user. It’s ill-typed since user isn’t a record, it has a constructor which contains a record. In order to apply #firstName we have to pattern match first.
Finally, SML has a real, honest to goodness specification. In fact, SML is so well specified it’s been completely mechanized. There is an actual mechanized proof that SML is typesafe. The practical upshot of this is that SML is rock solid. There’s a definitive right answer to what a program should do and that answer is not “whatever that one true implementation does”. In fact, there are actually a lot of SML compilers and they’re all reasonably compliant. Two noteworthy ones:

SML/NJ - An interactive system for SML. This provides a REPL and is what we use at CMU for our introduction to functional programming courses.
MLton - A whole program optimizing compiler. MLton produces stupidly fast code but is significantly slower for compilation.

Since SML is fully standardized, I generally develop with NJ and eventually feed the program into MLton if I intend the thing to run fast.
Also, modules are amazing, have I mentioned modules yet?
Wrap Up
So now that we’ve gone through most of the basic syntactic constructs of SML, most ML code should be readable. This is great because there’s some interesting pieces of ML code to read. In particular, these wonderful books are written with ML

Purely Functional Data Structures
Compiling With Continuations
Modern Compiler Implementation in ML

I recommend all three of these books heartily. If you’re looking to learn about compilers, the last one in particular is the best introduction I’m aware of. The second one is an in-depth look at a trick for compiling strict functional languages.
Other general books on ML if you decide you want to give SML a more serious look

ML for the Working Programmer
Elements of ML Programming
Programming in Standard ML

The course I’m TAing currently is based around the last one and it’s freely available online which is nice.
Cheers,

          
          



Value vs Monomorphism Restriction
Danny Gratzer — Fri, 27 Mar 2015 00:00:00 UT

    Posted on March 27, 2015
    


    
    Tags: sml, haskell
    


I’m taking the undergraduate course on programming languages at CMU. For the record, I still get really excited about the novelty of taking a class (at school!) on programming languages. I’m easily made happy.
We started talking about System F and before long we touched on the value restriction. Specifically, how most people think of the value restriction incorrectly. To understand why this is, let’s first define the value restriction since it’s probably new to you if you don’t use SML.
The Value Restriction
In SML there are value level declarations just like in Haskell. We can write things like
    val x = 1
    val y = x + 1
and we end up with x bound to 1 and y bound to 2. Note that SML is strict so these bindings are evaluated right as we reach them. Also like in Haskell, SML has polymorphism, so we can write map
   fun map f [] = []
     | map f (x :: xs) = f x :: map f xs
And it gets the type ('a -> 'b) -> ('a list -> 'b list). Aside from minor syntactic differences, this is pretty much identical to what we’d write in Haskell. The value restriction concerns the intersection of these two things. In SML, the following should not compile under the standard
    val x = rev []
This is because SML requires that all polymorphic val bindings be values! In practice all implementations will do something besides this but we’ll just focus on what the standard says. Now the reason for this value restriction is widely misunderstood. Most people believe that the value restriction is there to rule out code like
    val r  = ref NONE
    val () = r := SOME 1
    val _  = case !r of
                 SOME s => s
               | NONE   => ""
This seems to illustrate a pretty big issue for SML! We’re filling in a polymorphic reference with one type and unboxing it with a different one! Clearly this would segfault without the value restriction. However, there’s a catch.
SML is based on System F (did you really think I could get through a blog post without some theory?) which is sometimes called the “polymorphic lambda calculus”. It’s the minimal language with polymorphism and functions. In this language there’s a construct for making polymorphic things: Λ.
In this language we write polymorphism explicitly by saying Λ τ. e which has the type ∀ τ. T. So for example we write the identity function as
    id ≡ Λ τ. λ x : τ. x
    () = id[unit] ()
Now SML (and vanilla Haskell) have a limited subset of the power of Λ. Specifically all these lambdas have to appear at the start of a toplevel term. Meaning that they have to be of the form
    val x = Λ α. Λ β. ... e
This is called “prenex” form and is what makes type inference for SML possible. Now since we don’t show anyone the hidden Λs it doesn’t make sense to show them the type application that comes with them and SML infers and adds those for us too. What’s particularly interesting is that SML is often formalized as having this property: values start with Λ and are implicitly applied to the appropriate types where used. Even more interesting, how do you suppose we should evaluate a Λ? What for example, should this code do
    val x  = Λ τ. raise[τ] Fail (* Meaning raise an exception and say
                                  we have type τ *)
    val () = print "I made it here"
    val () = x[unit]
It seems clear that Λ should be evaluated just like how we evaluate λ: when we apply it. So I (and the formalization of SML) would expect this to print "I made it here" before throwing that exception. This might not surprise you, just by parallels with code like this
    val x  = fn () => raise[τ] Fail
    val () = print "I made it here"
    val () = x ()
However, what about when those lambdas are implicit? In the actual source language of ML our code snippet would be
    val x  = raise Fail
    val () = print "I made it here"
    val () = x[unit]
Uhoh, this really looks like it ought to throw an exception but it apparently doesn’t! More worryingly, what about when we have something like
    fun die ()  = raise Fail
    val x = die ()
    val () = print "Made it here"
Since x is never specialized, this doesn’t even throw an error! Yikes! Clearly this is a little confusing. It is however, type safe. Consider our original motivation for the value restriction. With explicit type application
    val r  = Λ τ. ref[τ] NONE
    val () = r[int] := SOME 1
    val _  = case !(r[string]) of
                 SOME s => s
               | NONE   => ""
Since the body of this function is run every time we do something with r, we’re just creating a whole bunch of new references in this code! There’s no type safety failure since !(r[string]) returns a fresh ref cell, completely different from the one we modified on the line above! This code always runs the NONE case. In fact, if this did the wrong thing it’s just a canary in the coal mine, a symptom of the fact that our system evaluates under (big) lambda binders.
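To make the "fresh cell at every use" reading concrete, here is a rough Haskell analogue. It is only a sketch and not SML: a polymorphic IO action stands in for Λ τ. ref[τ] NONE, and each use of it plays the role of a type application. The names r and demo are made up for this illustration.

```haskell
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import Data.Maybe (fromMaybe)

-- Stand-in for  Λ τ. ref[τ] NONE : the body runs again at every use.
r :: IO (IORef (Maybe a))
r = newIORef Nothing

demo :: IO String
demo = do
  intCell <- r                          -- like r[int]: allocates a cell
  writeIORef intCell (Just (1 :: Int))
  strCell <- r                          -- like r[string]: a different, fresh cell
  contents <- readIORef strCell
  return (fromMaybe "" contents)        -- the write above is never visible here
```

Running demo yields the empty string, since the cell we read from is not the cell we wrote to.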
So the value restriction is really not at all about type safety, it’s about comprehensibility, mostly because it is really strange for a polymorphic expression to be evaluated at each use rather than where it is defined. Most documentation seems to be wrong about this; everyone here seems to agree that this is unfortunate but such is life.
The Monomorphism Restriction
Now let’s talk about the monomorphism restriction. This is better understood but still worth recapping. In Haskell we have type classes. They let us overload functions to behave differently on different types. Everyone’s favorite example is the type class for numbers which lets us write
    fact :: (Eq a, Num a) => a -> a
    fact 0 = 1
    fact n = n * fact (n - 1)
And this works for all numbers, not just int or something. Under the hood, this works by passing a record of functions like *, fromInteger, and - to make the code work. That => is really just a sort of function arrow that happens to only take particular “implicit” records as an argument.
Now what do you suppose the most polymorphic type of this code is?
    x = fact 10000
It could potentially work on all numbers so it gets the type
    x :: (Num a, Eq a) => a
However this is really like a function! This means that x :: Integer and x :: Int evaluate that computation twice. In fact each time we use x we supply a new record and end up evaluating again. This is very costly and also very surprising to most folks. After all, why should something that looks like a normal number evaluate every time we use it! The monomorphism restriction is essentially

If you have a binding
Whose type is (C1, C2 ...) => t
And has no arguments to the left of the =
Don’t generalize it

This is intended to keep us from the surprise of evaluating a seemingly fully reduced term twice.
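Here is a small sketch of the "record of functions" reading, with the dictionary written out by hand. This is an illustration and not what GHC literally generates; NumDict, numDict, factWith and xWith are names invented for the example.

```haskell
-- A hand-written stand-in for the (Eq a, Num a) dictionaries.
data NumDict a = NumDict
  { dMul     :: a -> a -> a
  , dSub     :: a -> a -> a
  , dFromInt :: Integer -> a
  , dEq      :: a -> a -> Bool
  }

numDict :: (Eq a, Num a) => NumDict a
numDict = NumDict (*) (-) fromInteger (==)

-- fact with the dictionary passed explicitly.
factWith :: NumDict a -> a -> a
factWith d n
  | dEq d n (dFromInt d 0) = dFromInt d 1
  | otherwise              = dMul d n (factWith d (dSub d n (dFromInt d 1)))

-- x = fact 10 after elaboration: a function of the dictionary, not a number.
xWith :: NumDict a -> a
xWith d = factWith d (dFromInt d 10)

asInteger :: Integer
asInteger = xWith numDict   -- runs the whole computation

asInt :: Int
asInt = xWith numDict       -- runs it again with a different dictionary
```

Once the dictionary argument is visible, it is no surprise that using xWith at two types does the work twice.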
Sound familiar? Just like with the value restriction the whole point of the monomorphism restriction is to prevent a hidden function, either type abstraction or type constraints, from causing us to silently and dangerously duplicate work. Neither of them is essential to type safety, but without them some really simple looking pieces of code become exponential.
Wrap Up
That about covers things. It turns out that both of these restrictions are just patches to cover some surprising areas of the semantics but both are easily understood when you look at the elaborated version. I deliberately went a bit faster through the monomorphism restriction since quite a lot of ink has already been spilled on the subject and unlike the value restriction, most of it is correct :)
As one final note, the way that Haskell handles the monomorphism restriction is precisely how OCaml handles the value restriction: weak polymorphism. Both of them mark the type variables they refuse to generalize as weak type variables. Whenever we first instantiate them to something we go back and retroactively modify the definition to pretend we had used this type all along. In this way, we only evaluate things once but can handle a lot of simple cases of binding a value and using it once.
The more you know.

          
          



A Tiny Compiler For A Typed Higher Order Language
Danny Gratzer — Tue, 24 Mar 2015 00:00:00 UT

Posted on March 24, 2015

Tags: compilers, types, haskell
    


Hi folks, the last week or so I was a little tired of schoolwork so I decided to scratch out some fun code. The end result is an extremely small compiler from a typed, higher order functional language called PCF to C. In this post I’ll attempt to explain the whole thing, from front to back :)
What’s PCF
First things first, it’s important to define the language we’re compiling. The language, PCF, short for “programming computable functions”, is an extremely small language you generally find in a book on programming languages. It originates with Plotkin if I’m not mistaken.
PCF is based around 3 core elements: natural numbers, functions (closures), and general recursion. There are two constants for creating numbers, Zero and Suc. Zero is self explanatory and Suc e is the successor of the natural number e evaluates to. In most programming languages this just means Suc e = 1 + e but + isn’t a primitive in PCF (we can define it as a normal function).
For functions, we have lambdas like you’d find in any functional language. Since PCF includes no polymorphism it’s necessary to annotate the function’s argument with its type.
Finally, the weird bit: recursion. In PCF we write recursive things with fix x : τ in e. Here we get to use x in e and we should understand that x “stands for” the whole expression, fix .... As an example, here’s how we define +.
    plus =
          fix rec : nat -> nat -> nat in
            λ m : nat.
            λ n : nat.
              ifz m {
                  Zero  => n
                | Suc x => Suc (rec x n)
              }
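For readers more at home in Haskell than in PCF notation, the same definition can be sketched with fix from Data.Function. The Nat type here is a stand-in defined just for this example, and self plays the role of rec.

```haskell
import Data.Function (fix)

data Nat = Zero | Suc Nat
  deriving (Eq, Show)

-- `self` stands for the whole fixpoint expression, like `rec` in the PCF version.
plus :: Nat -> Nat -> Nat
plus = fix $ \self m n ->
  case m of
    Zero  -> n
    Suc x -> Suc (self x n)
```

For instance, plus (Suc (Suc Zero)) (Suc Zero) evaluates to Suc (Suc (Suc Zero)).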
Now Let’s Compile It
Now compilation is broken up into a bunch of phases and intermediate languages. Even in this small of a compiler there are 3 (count ’em) intermediate languages, so along with the source and target language there are 5 different languages running around inside of this compiler. Each phase with the exception of typechecking is just translating one intermediate language (IL) into another and in the process making one small modification to the program as a whole.
The AST
This compiler starts with an AST. I have no desire to write a parser for this because parsers make me itchy. Here’s the AST
    data Ty = Arr Ty Ty
            | Nat
            deriving Eq

    data Exp a = V a
               | App (Exp a) (Exp a)
               | Ifz (Exp a) (Exp a) (Scope () Exp a)
               | Lam Ty (Scope () Exp a)
               | Fix Ty (Scope () Exp a)
               | Suc (Exp a)
               | Zero
               deriving (Eq, Functor, Foldable, Traversable)
What’s interesting here is that our AST uses bound to manage variables. Unfortunately there really isn’t time to write both a bound tutorial and a PCF compiler one. I’ve written about using bound before here; otherwise you can just check out the official docs. The important bits here are that Scope () ... binds one variable and that a stands for the free variables in an expression. 3 constructs bind variables here: Ifz for pattern matching, Fix for recursive bindings, and Lam for the argument. Note also that Fix and Lam both must be annotated with a type, otherwise stuff like fix x in x and fn x => x are ambiguous.
Type Checking
First up is type checking. This should be familiar to most people who’ve written a type checker before since PCF is simply typed. We simply have a Map of variables to types. Since we want to go under binders defined using Scope we’ll have to use instantiate. However this demands we be able to create fresh free variables so we don’t accidentally cause clashes. To prevent this we use monad-gen to generate fresh free variables.
To warm up, here’s a helper function to check that an expression has a particular type. This uses the more general typeCheck function which actually produces the type of an expression.
    type TyM a = MaybeT (Gen a)

    assertTy :: (Enum a, Ord a) => M.Map a Ty -> Exp a -> Ty -> TyM a ()
    assertTy env e t = (== t) <$> typeCheck env e >>= guard
This type checks the expression in an environment (something that stores the types of all of the free variables). Once it receives the resulting type it compares it to the type we expected and chucks the resulting boolean into guard. This code is used in places like Ifz where we happen to know that the first expression has the type Nat.
Now on to the main code, typeCheck
    typeCheck :: (Enum a, Ord a) => M.Map a Ty -> Exp a -> TyM a Ty
    typeCheck _   Zero = return Nat
    typeCheck env (Suc e) = assertTy env e Nat >> return Nat
The first two cases for typeCheck are nice and straightforward. If we get a Zero then it has type Nat. If we get a Suc e we assert that e is a Nat and then the whole thing has the type Nat.
    typeCheck env (V a) = MaybeT . return $ M.lookup a env
For variables we just look things up in the environment. Since this returns a Maybe it’s nice and easy to just jam it into our MaybeT.
    typeCheck env (App f a) = typeCheck env f >>= \case
      Arr fTy tTy -> assertTy env a fTy >> return tTy
      _ -> mzero
Application is a little more interesting. We recurse over the function and make sure it has an actual function type. If it does, we assert the argument has the argument type and return the result type. If it doesn’t have a function type, we just fail.
    typeCheck env (Lam t bind) = do
      v <- gen
      Arr t <$> typeCheck (M.insert v t env) (instantiate1 (V v) bind)
    typeCheck env (Fix t bind) = do
      v <- gen
      assertTy (M.insert v t env) (instantiate1 (V v) bind) t
      return t
Type checking lambdas and fixpoints is quite similar. In both cases we generate a fresh variable to unravel the binder with. We know what type this variable is supposed to have because we required explicit annotations so we add that to the map constituting our environment. Here’s where they diverge.
For a fixpoint we want to make sure that the body has the type we said it would so we use assertTy. For a lambda we infer the body type and return a function from the given argument type to the body type.
    typeCheck env (Ifz i t e) = do
      assertTy env i Nat
      ty <- typeCheck env t
      v <- gen
      assertTy (M.insert v Nat env) (instantiate1 (V v) e) ty
      return ty
For Ifz we want to ensure that we actually are casing on a Nat so we use assertTy. Next we figure out what type the zero branch returns and make sure that the else branch has the same type.
All in all this type checker is not particularly fascinating since all we have are simple types. Things get a bit more interesting with polymorphism. I’d suggest looking at that if you want to see a more interesting type checker.
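If you’d like to play with the checker without installing bound or monad-gen, here is a simplified sketch of the same rules. It is a deliberate departure from the code above: binders carry explicit String names, the environment is an association list, and failure is a plain Maybe. The Env type and plusExp are additions for this example.

```haskell
import Control.Monad (guard)

data Ty = Arr Ty Ty | Nat
  deriving (Eq, Show)

-- Binders carry an explicit variable name instead of a bound Scope.
data Exp
  = V String
  | App Exp Exp
  | Ifz Exp Exp String Exp   -- scrutinee, zero branch, predecessor name, successor branch
  | Lam String Ty Exp
  | Fix String Ty Exp
  | Suc Exp
  | Zero

type Env = [(String, Ty)]

assertTy :: Env -> Exp -> Ty -> Maybe ()
assertTy env e t = typeCheck env e >>= guard . (== t)

typeCheck :: Env -> Exp -> Maybe Ty
typeCheck _   Zero          = Just Nat
typeCheck env (Suc e)       = assertTy env e Nat >> Just Nat
typeCheck env (V x)         = lookup x env
typeCheck env (App f a)     = do
  fTy <- typeCheck env f
  case fTy of
    Arr argTy resTy -> assertTy env a argTy >> Just resTy
    _               -> Nothing
typeCheck env (Lam x t b)   = Arr t <$> typeCheck ((x, t) : env) b
typeCheck env (Fix x t b)   = assertTy ((x, t) : env) b t >> Just t
typeCheck env (Ifz i z x s) = do
  assertTy env i Nat
  ty <- typeCheck env z
  assertTy ((x, Nat) : env) s ty
  Just ty

-- plus from earlier in the post, written in this AST.
plusExp :: Exp
plusExp =
  Fix "rec" (Arr Nat (Arr Nat Nat)) $
    Lam "m" Nat $ Lam "n" Nat $
      Ifz (V "m") (V "n") "x" (Suc (App (App (V "rec") (V "x")) (V "n")))
```

Named variables mean shadowing is handled by putting the newest binding at the front of the list, which is exactly the bookkeeping that bound and fresh variable generation take care of in the real code.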
Closure Conversion
Now for our first interesting compilation phase, closure conversion. In this phase we make closures explicit by annotating lambdas and fixpoints with the variables that they close over. Those variables are then explicitly bound in the scope of the lambda. With these changes, our new syntax tree looks like this
    -- Invariant, Clos only contains VCs, can't be enforced statically due
    -- to annoying monad instance
    type Clos a = [ExpC a]

    data ExpC a = VC a
                | AppC (ExpC a) (ExpC a)
                | LamC Ty (Clos a) (Scope Int ExpC a)
                | FixC Ty (Clos a) (Scope Int ExpC a)
                | IfzC (ExpC a) (ExpC a) (Scope () ExpC a)
                | SucC (ExpC a)
                | ZeroC
                deriving (Eq, Functor, Foldable, Traversable)
The interesting parts are the additions of Clos and the fact that the Scope for a lambda and a fixpoint now binds an arbitrary number of variables instead of just one. Here if a lambda or fixpoint binds n variables, the first n - 1 are stored in the Clos and the last one is the “argument”. Closure conversion is thus just the process of converting an Exp to an ExpC.
    closConv :: Ord a => Exp a -> Gen a (ExpC a)
    closConv (V a) = return (VC a)
    closConv Zero = return ZeroC
    closConv (Suc e) = SucC <$> closConv e
    closConv (App f a) = AppC <$> closConv f <*> closConv a
    closConv (Ifz i t e) = do
      v <- gen
      e' <- abstract1 v <$> closConv (instantiate1 (V v) e)
      IfzC <$> closConv i <*> closConv t <*> return e'
Most of the cases here are just recursing and building things back up applicatively. There’s the moderately interesting case where we instantiate the else branch of an Ifz with a fresh variable and then recurse, but the interesting cases are for fixpoints and lambdas. Since they’re completely identical we only present the case for Fix.
    closConv (Fix t bind) = do
      v <- gen
      body <- closConv (instantiate1 (V v) bind)
      let freeVars = S.toList . S.delete v $ foldMap S.singleton body
          rebind v' = elemIndex v' freeVars <|>
                      (guard (v' == v) *> (Just $ length freeVars))
      return $ FixC t (map VC freeVars) (abstract rebind body)
There’s a lot going on here but it boils down into three parts.

Recurse under the binder
Gather all the free variables in the body
Rebind the body together so that all the free variables map to their position in the closure and the argument is n where n is the number of free variables.

The first is accomplished in much the same way as in the above cases. To gather the free variables all we need to do is use the readily available notion of a monoid on sets. The whole process is just foldMap S.singleton! There’s one small catch: we don’t want to put the argument into the list of variables we close over so we carefully delete it from the set. We then convert it to a list which gives us an actual Clos a. Now for the third step we have rebind.
rebind maps a free variable to Maybe Int. It maps a free variable to its binding occurrence if it has one here. This boils down to using elemIndex to look up something’s position in the Clos we just built up. We also have a special case for when the variable we’re looking at is the “argument” of the function we’re fixing. In this case we want to map it to the last thing we’re binding, which is just length freeVars. To capture the “try this and then that” semantics we use the Alternative instance for Maybe which works wonderfully.
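To see the numbering rebind produces, here it is pulled out as a standalone function. This is a sketch: in the compiler it is a local definition that captures freeVars and v, whereas here they are passed as the explicit parameters freeVars and arg, and the examples list is made up for illustration.

```haskell
import Control.Applicative ((<|>))
import Control.Monad (guard)
import Data.List (elemIndex)

-- Position of a variable in the new multi-variable binder: captured
-- variables come first, the argument comes last.
rebind :: Eq a => [a] -> a -> a -> Maybe Int
rebind freeVars arg v =
  elemIndex v freeVars <|> (guard (v == arg) *> Just (length freeVars))

-- A function with argument "f" that closes over "x" and "y".
examples :: [Maybe Int]
examples = map (rebind ["x", "y"] "f") ["x", "y", "f", "z"]
```

Here examples is [Just 0, Just 1, Just 2, Nothing]: the two captured variables get their closure positions, the argument gets the slot after them, and an unrelated variable is left free.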
With this, we’ve removed implicit closures from our language: one of the passes on our way to C.
Lambda Lifting
Next up we remove both fixpoints and lambdas from being expressions. We want them to have an explicit binding occurrence because we plan to completely remove them from expressions soon. In order to do this, we define a language with lambdas and fixpoints explicitly declared in let expressions. The process of converting from ExpC to this new language is called “lambda lifting” because we’re lifting things into let bindings.
Here’s our new language.
    data BindL a = RecL Ty [ExpL a] (Scope Int ExpL a)
                 | NRecL Ty [ExpL a] (Scope Int ExpL a)
                 deriving (Eq, Functor, Foldable, Traversable)
    data ExpL a = VL a
                | AppL (ExpL a) (ExpL a)
                | LetL [BindL a] (Scope Int ExpL a)
                | IfzL (ExpL a) (ExpL a) (Scope () ExpL a)
                | SucL (ExpL a)
                | ZeroL
                deriving (Eq, Functor, Foldable, Traversable)
Much here is the same except we’ve removed both lambdas and fixpoints and replaced them with LetL. LetL works over bindings which are either recursive (Fix) or nonrecursive (Lam). Lambda lifting in this compiler is rather simplistic in how it lifts lambdas: we just boost everything one level up and turn
    λ (x : τ). ...
into
    let foo = λ (x : τ). ...
    in foo
Just like before, this procedure is captured by transforming an ExpC into an ExpL.
    llift :: Eq a => ExpC a -> Gen a (ExpL a)
    llift (VC a) = return (VL a)
    llift ZeroC = return ZeroL
    llift (SucC e) = SucL <$> llift e
    llift (AppC f a) = AppL <$> llift f <*> llift a
    llift (IfzC i t e) = do
      v <- gen
      e' <- abstract1 v <$> llift (instantiate1 (VC v) e)
      IfzL <$> llift i <*> llift t <*> return e'
Just like in closConv we start with a lot of very boring and trivial “recurse and build back up” cases. These handle everything but the cases where we actually convert constructs into a LetL.
Once again, the interesting cases are pretty much identical. Let’s look at the case for LamC for variety.
    llift (LamC t clos bind) = do
      vs <- replicateM (length clos + 1) gen
      body <- llift $ instantiate (VC . (!!) vs) bind
      clos' <- mapM llift clos
      let bind' = abstract (flip elemIndex vs) body
      return (LetL [NRecL t clos' bind'] trivLetBody)
Here we first generate a bunch of fresh variables and unbind the body of our lambda. We then recurse on it. We also have to recurse across all of our closed over arguments but since those are variables we know that should be pretty trivial (why do we know this?). Once we’ve straightened out the body and the closure all we do is transform the lambda into a trivial let expression as shown above. Here’s trivLetBody
    trivLetBody :: Scope Int ExpL a
    trivLetBody = fromJust . closed . abstract (const $ Just 0) $ VL ()
Which is just a body that returns the first thing bound in the let. With this done, we’ve pretty much transformed our expression language to C. In order to get rid of the nesting, we want to make one more simplification before we actually generate C.
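The "boost everything one level up" step can be sketched without bound, too. The version below is a simplification under several assumptions: variables are Strings, fixpoints and Ifz are left out, a let holds a single binding, and fresh names come from a counter threaded by hand and rendered as "lam0", "lam1", and so on (which assumes the source program doesn’t already use such names).

```haskell
data ExpC
  = VC String
  | AppC ExpC ExpC
  | LamC [String] String ExpC   -- captured variables, argument, body
  | SucC ExpC
  | ZeroC

data ExpL
  = VL String
  | AppL ExpL ExpL
  | LetL String BindL ExpL      -- let name = binding in body
  | SucL ExpL
  | ZeroL
  deriving (Eq, Show)

data BindL = NRecL [String] String ExpL
  deriving (Eq, Show)

-- The Int is the next unused number for naming lifted lambdas.
llift :: Int -> ExpC -> (ExpL, Int)
llift n (VC x)     = (VL x, n)
llift n ZeroC      = (ZeroL, n)
llift n (SucC e)   = let (e', n') = llift n e in (SucL e', n')
llift n (AppC f a) =
  let (f', n1) = llift n f
      (a', n2) = llift n1 a
  in (AppL f' a', n2)
llift n (LamC clos arg body) =
  let name        = "lam" ++ show n
      (body', n') = llift (n + 1) body
  in (LetL name (NRecL clos arg body') (VL name), n')  -- let name = λ ... in name
```

Every lambda becomes a let whose body immediately returns the thing it just bound, which is the role trivLetBody plays in the real code.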
C-With-Expressions
C-With-Expressions is our next intermediate language. It has no notion of nested functions or of fixpoints. I suppose now I should finally fess up to why I keep talking about fixpoints and functions as if they’re the same and why this compiler is handling them identically. The long and short of it is that fixpoints are really a combination of a “fixed point combinator” and a function. Really when we say
    fix x : τ in ...
It’s as if we had said
    F (λ x : τ. ...)
Where F is a magical constant with the type
    F :: (a -> a) -> a
F calculates the fixpoint of a function. This means that f (F f) = F f. This formula underlies all recursive bindings (in Haskell too!). In the compiler we basically compile a Fix to a closure (the runtime representation of a function) and pass it to a C function fixedPoint which actually calculates the fixed point. Now it might seem dubious that a function has a fixed point. After all, it would seem that there’s no x so that (λ (x : nat). suc x) x = x right? Well the key is to think of these functions as not ranging over just values in our language, but a domain where infinite loops (bottom values) are also represented. In the above equation, the solution is that x should be bottom, an infinite loop. That’s why
    fix x : nat in suc x
should loop! There’s actually some wonderful math going on here about how computable functions are continuous functions over a domain and that we can always calculate the least fixed point of them in this manner. The curious reader is encouraged to check out domain theory.
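Haskell ships this F as fix in Data.Function, so the equation f (fix f) = fix f can be poked at directly. The three definitions below are illustrative examples written for this post, not part of the compiler.

```haskell
import Data.Function (fix)

-- Recursion through fix: `self` refers to the function being defined.
factorial :: Integer -> Integer
factorial = fix $ \self n -> if n == 0 then 1 else n * self (n - 1)

-- A fixed point that is a lazily built infinite value rather than a loop.
ones :: [Int]
ones = fix (1 :)

-- The analogue of  fix x : nat in suc x . Forcing it never produces a
-- number, so it is only defined here and never evaluated.
loops :: Integer
loops = fix (+ 1)
```

factorial 5 is 120 and take 3 ones is [1,1,1], while loops is the bottom value from the discussion above.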
Anyways, so that’s why I keep handling fixpoints and lambdas in the same way, because to me a fixpoint is a lambda + some magic. This is going to become very clear in C-With-Expressions (FauxC from now on) because we’re going to promote both sorts of let bindings to the same thing, a FauxC toplevel function. Without further ado, here’s the next IL.
    -- Invariant: the Integer part of a FauxCTop is a globally unique
    -- identifier that will be used as a name for that binding.
    type NumArgs = Int
    data BindTy = Int | Clos deriving Eq

    data FauxCTop a = FauxCTop Integer NumArgs (Scope Int FauxC a)
                    deriving (Eq, Functor, Foldable, Traversable)
    data BindFC a = NRecFC Integer [FauxC a]
                  | RecFC BindTy Integer [FauxC a]
                  deriving (Eq, Functor, Foldable, Traversable)
    data FauxC a = VFC a
                 | AppFC (FauxC a) (FauxC a)
                 | IfzFC (FauxC a) (FauxC a) (Scope () FauxC a)
                 | LetFC [BindFC a] (Scope Int FauxC a)
                 | SucFC (FauxC a)
                 | ZeroFC
                 deriving (Eq, Functor, Foldable, Traversable)
The big difference is that we’ve lifted things out of let bindings. They now contain references to some global function instead of actually having the value right there. We also tag fixpoints as either fixing an Int or a Clos. The reasons for this will be apparent in a bit.
Now for the conversion. We don’t just have a function from ExpL to FauxC because we also want to make note of all the nested lets we’re lifting out of the program. Thus we use WriterT to gather a list of toplevel functions as we traverse the program. Other than that this is much like what we’ve seen before.
    type FauxCM a = WriterT [FauxCTop a] (Gen a)

    fauxc :: ExpL Integer -> FauxCM Integer (FauxC Integer)
    fauxc (VL a) = return (VFC a)
    fauxc (AppL f a) = AppFC <$> fauxc f <*> fauxc a
    fauxc ZeroL = return ZeroFC
    fauxc (SucL e) = SucFC <$> fauxc e
    fauxc (IfzL i t e) = do
      v <- gen
      e' <- abstract1 v <$> fauxc (instantiate1 (VL v) e)
      IfzFC <$> fauxc i <*> fauxc t <*> return e'
In the first couple of cases we just recurse, as we’ve seen before. Things only get interesting once we get to LetL
    fauxc (LetL binds e) = do
      binds' <- mapM liftBinds binds
      vs <- replicateM (length binds) gen
      body <- fauxc $ instantiate (VL . (!!) vs) e
      let e' = abstract (flip elemIndex vs) body
      return (LetFC binds' e')
In this case we recurse with the function liftBinds across all the bindings, then do what we’ve done before and unwrap the body of the let and recurse in it. So the meat of this transformation is in liftBinds.
      where liftBinds (NRecL t clos bind) = lifter NRecFC clos bind
            liftBinds (RecL t clos bind) = lifter (RecFC $ bindTy t) clos bind
            lifter bindingConstr clos bind = do
              guid <- gen
              vs <- replicateM (length clos + 1) gen
              body <- fauxc $ instantiate (VL . (!!) vs) bind
              let bind' = abstract (flip elemIndex vs) body
              tell [FauxCTop guid (length clos + 1) bind']
              bindingConstr guid <$> mapM fauxc clos
            bindTy (Arr _ _) = Clos
            bindTy Nat = Int
To lift a binding all we do is generate a globally unique identifier for the toplevel. Once we have that we can unwrap the particular binding we’re looking at. This is going to comprise the body of the FauxCTop function we’re building. Since we need it to be FauxC code as well we recurse on it. Now we have a bunch of faux-C code for the body of the toplevel function. We then just repackage the body up into a binding (a FauxCTop needs one) and use tell to make a note of it. Once we’ve done that we return the stripped down let binding that just remembers the guid that we created for the toplevel function.
As an example, this code transforms
    let x = λ (x : τ). ... in
      ... x ...
into
    TOP = λ (x : τ). ...
    let x = TOP in
      ... x ...
With this done our language is now 80% of the way to C!
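Here is a stripped-down sketch of that hoisting step. It makes a number of simplifying assumptions that the real pass does not: variables are Strings, closures and the recursive/nonrecursive distinction are dropped, a let binds exactly one function, and the writer and fresh-name monads are replaced by returning the hoisted list and the next unused id by hand. The Top type and this fauxc are hypothetical stand-ins.

```haskell
data ExpL
  = VL String
  | AppL ExpL ExpL
  | LetL String String ExpL ExpL   -- let f = λ arg. body in rest

data FauxC
  = VFC String
  | AppFC FauxC FauxC
  | LetFC String Int FauxC         -- let f = <toplevel function #id> in rest
  deriving (Eq, Show)

-- A hoisted toplevel function: its id, its argument name, and its body.
data Top = Top Int String FauxC
  deriving (Eq, Show)

-- Returns the converted expression, the functions hoisted out of it,
-- and the next unused id.
fauxc :: Int -> ExpL -> (FauxC, [Top], Int)
fauxc n (VL x)     = (VFC x, [], n)
fauxc n (AppL f a) =
  let (f', topsF, n1) = fauxc n f
      (a', topsA, n2) = fauxc n1 a
  in (AppFC f' a', topsF ++ topsA, n2)
fauxc n (LetL name arg body rest) =
  let guid               = n
      (body', topsB, n1) = fauxc (n + 1) body
      (rest', topsR, n2) = fauxc n1 rest
  in (LetFC name guid rest', topsB ++ [Top guid arg body'] ++ topsR, n2)
```

After the pass the let only remembers the id of the function it refers to, and the body lives in the list of Tops.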
Converting To SSA-ish C
Converting our faux-C language to actual C has one complication: C doesn’t have let expressions. Given this, we have to flatten out a faux-C expression so we can turn a let expression into a normal C declaration. This conversion is almost a conversion to single static assignment form, SSA. I say almost because there’s precisely one place where we break the single assignment discipline. This is just because it seemed rather pointless to me to introduce an SSA IL with φ just so I could compile it to C. YMMV.
SSA is what LLVM uses for its intermediate language and because of this I strongly suspect regearing this compiler to target LLVM should be pretty trivial.
Now we’re using a library called c-dsl to make generating the C less painful, but there are still a couple of things we’d like to add. First of all, all our names are integers so we have i2d and i2e for converting an integer into a C declaration or an expression.
    i2d :: Integer -> CDeclr
    i2d = fromString . ('_':) . show

    i2e :: Integer -> CExpr
    i2e = var . fromString . ('_':) . show
We also have a shorthand for the type of all expressions in our generated C code.
    taggedTy :: CDeclSpec
    taggedTy = CTypeSpec "tagged_ptr"
Finally, we have our writer monad and helper function for implementing the SSA conversion. We write C99 block items and use tellDecl to bind an expression to a fresh variable and then return this variable.
    type RealCM = WriterT [CBlockItem] (Gen Integer)

    tellDecl :: CExpr -> RealCM CExpr
    tellDecl e = do
      i <- gen
      tell [CBlockDecl $ decl taggedTy (i2d i) $ Just e]
      return (i2e i)
Next we have the conversion procedure. Most of this is pretty straightforward because we shell out to calls in the runtime system for all the hard work. We have the following RTS functions

mkZero, create a zero value
inc, increment an integer value
dec, decrement an integer value
apply, apply a closure to an argument
mkClos, make a closure closing over some values
EMPTY, an empty pointer, useful for default values
isZero, check if something is zero
fixedPoint, find the fixed point of function
INT_SIZE, the size of the runtime representation of an integer
CLOS_SIZE, the size of the runtime representation of a closure

Most of this code is therefore just converting the expression to SSA form and using the RTS functions to do the appropriate computation at each step. Note that c-dsl provides a few overloaded string instances and so to generate the C code to apply a function we just use "foo"#[1, "these", "are", "arguments"].
The first few cases for conversion are nice and straightforward.
    realc :: FauxC CExpr -> RealCM CExpr
    realc (VFC e) = return e
    realc (AppFC f a) = ("apply" #) <$> mapM realc [f, a] >>= tellDecl
    realc ZeroFC = tellDecl $ "mkZero" # []
    realc (SucFC e) = realc e >>= tellDecl . ("inc"#) . (:[])
We take advantage of the fact that realc returns its result and we can almost make this look like the applicative cases we had before. One particularly slick case is how Suc works. We compute the value of e and apply inc to the result. We then feed this expression into tellDecl which binds it to a fresh variable and returns the variable. Haskell is pretty slick.
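To get a feel for what this flattening produces, here is a toy version that builds the declarations as plain strings instead of going through c-dsl. Everything in it is a simplification made up for illustration: the E type, the hand-threaded counter, and the exact text of the emitted lines (the real output is whatever c-dsl pretty prints).

```haskell
data E = Zero | Suc E | App E E

-- Flatten an expression into C-like declarations. The Int is the next
-- unused variable number; the result is the declarations, the variable
-- holding the final value, and the next unused number.
flatten :: Int -> E -> ([String], String, Int)
flatten n Zero = declare n [] "mkZero()"
flatten n (Suc e) =
  let (ds, v, n') = flatten n e
  in declare n' ds ("inc(" ++ v ++ ")")
flatten n (App f a) =
  let (dsF, vF, n1) = flatten n f
      (dsA, vA, n2) = flatten n1 a
  in declare n2 (dsF ++ dsA) ("apply(" ++ vF ++ ", " ++ vA ++ ")")

-- Bind an expression to a fresh variable, in the spirit of tellDecl.
declare :: Int -> [String] -> String -> ([String], String, Int)
declare n ds rhs =
  let v = "_" ++ show n
  in (ds ++ ["tagged_ptr " ++ v ++ " = " ++ rhs ++ ";"], v, n + 1)
```

Flattening Suc (Suc Zero) from 0 gives three declarations, for _0, _1 and _2, with each one only mentioning variables declared before it.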
    realc (IfzFC i t e) = do
      outi <- realc i
      deci <- tellDecl ("dec" # [outi])
      let e' = instantiate1 (VFC deci) e
      (outt, blockt) <- lift . runWriterT $ (realc t)
      (oute, blocke) <- lift . runWriterT $ (realc e')
      out <- tellDecl "EMPTY"
      let branch b tempOut =
            CCompound [] (b ++ [CBlockStmt . liftE $ out <-- tempOut]) undefNode
          ifStat =
            cifElse ("isZero"#[outi]) (branch blockt outt) (branch blocke oute)
      tell [CBlockStmt ifStat]
      return out
In this next case we’re translating Ifz. For this we obviously need to compute the value of i. We do that by recursing and storing the result in outi. Now we want to be able to use 1 less than the value of i in case we go into the successor branch. This is done by calling dec on outi and storing it for later.
Next we do something a little odd. We recurse on the branches of Ifz but we definitely don’t want to compute both of them! So we can’t just use a normal recursive call. If we did they’d be added to the block we’re building up in the writer monad. So we use lift . runWriterT to give us back the blocks without adding them to the current one we’re building. Now it’s just a matter of generating the appropriate if statement.
To do this we add one instruction to the end of both branches, to assign to some output variable. This ensures that no matter which branch we go down we’ll end up with the result in one place. This is also the one place where we are no longer doing SSA. Properly speaking we should write this with a φ but who has time for that? :)
Finally we add the if statement and the handful of declarations that precede it to our block. Now for the last case.
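The shape of the statement we end up with can be sketched as strings as well. This is only an approximation of what the c-dsl code emits: Branch and ifzStmts are hypothetical names, the exact C text is assumed, and the dec declaration for the predecessor is taken to have been emitted by the caller already.

```haskell
-- A branch is the declarations it needs plus the variable holding its result.
type Branch = ([String], String)

-- Emit the statements for an Ifz whose scrutinee already lives in `scrut`.
-- `out` is the shared output variable; it is the one name assigned twice.
ifzStmts :: String -> String -> Branch -> Branch -> [String]
ifzStmts scrut out (zDecls, zOut) (sDecls, sOut) =
  [ "tagged_ptr " ++ out ++ " = EMPTY;"
  , "if (isZero(" ++ scrut ++ ")) {" ]
    ++ map indent (zDecls ++ [out ++ " = " ++ zOut ++ ";"])
    ++ [ "} else {" ]
    ++ map indent (sDecls ++ [out ++ " = " ++ sOut ++ ";"])
    ++ [ "}" ]
  where
    indent = ("  " ++)
```

Each branch keeps its own declarations inside its own braces, which is why the real code runs the branches in a separate writer instead of the current one.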
    realc (LetFC binds bind) = do
      bindings <- mapM goBind binds
      realc $ instantiate (VFC . (bindings !!)) bind
      where sizeOf Int = "INT_SIZE"
            sizeOf Clos = "CLOS_SIZE"
            goBind (NRecFC i cs) =
              ("mkClos" #) <$> (i2e i :) . (fromIntegral (length cs) :)
                           <$> mapM realc cs
                           >>= tellDecl
            goBind (RecFC t i cs) = do
              f <- ("mkClos" #) <$> (i2e i :) . (fromIntegral (length cs) :)
                                <$> mapM realc cs
                                >>= tellDecl
              tellDecl ("fixedPoint"#[f, sizeOf t])
For our last case we have to deal with lets. For this we simply traverse all the bindings which are now flat and then flatten the expression under the binder. When we mapM over the bindings we actually get back a list of all the expressions each binding evaluated to. This is perfect for use with instantiate making the actual toplevel function quite pleasant. goBind is slightly less so.
In the nonrecursive case all we have to do is create a closure. So goBind of a nonrecursive binding shells out to mkClos. This mkClos is applied to the toplevel function, the number of closed over expressions, and all the closed over expressions. The count is needed because mkClos is variadic. Finally we shove the result into tellDecl as usual. For a recursive binding there’s a slight difference, namely after doing all of that we apply fixedPoint to the output and to the size of the type of the thing we’re fixing. This is why we kept types around for these bindings! With them we can avoid dragging the size with every value since we know it statically.
Next, we have a function for converting a faux C function into an actual function definition. This is the function that we use realc in.
    topc :: FauxCTop CExpr -> Gen Integer CFunDef
    topc (FauxCTop i numArgs body) = do
      binds <- gen
      let getArg = (!!) (args (i2e binds) numArgs)
      (out, block) <- runWriterT . realc $ instantiate getArg body
      return $
        fun [taggedTy] ('_' : show i) [decl taggedTy $ ptr (i2d binds)] $
          CCompound [] (block ++ [CBlockStmt . creturn $ out]) undefNode
      where indexArg binds i = binds ! fromIntegral i
            args binds na = map (VFC . indexArg binds) [0..na - 1]
This isn’t the most interesting function. We have one array of arguments to our C function, and then we unbind the body of the FauxC function by indexing into this array. It’s not explicitly stated in the code but the array contains the closed over expressions for the first n - 1 entries and the nth is the actual argument to the function. This is in line with how the variables are actually bound in the body of the function which makes unwrapping the body to index into the argument array very simple. We then call realc which transforms our faux-C expression into a block of actual C code. We add one last statement to the end of the block that returns the final outputted variable. All that’s left to do is bind it up into a C function and call it a day.
Putting It All Together
Finally, at the end of it all we have a function from expression to Maybe CTranslUnit, a C program.
    compile :: Exp Integer -> Maybe CTranslUnit
    compile e = runGen . runMaybeT $ do
      assertTy M.empty e Nat
      funs <- lift $ pipe e
      return . transUnit . map export $ funs
      where pipe e = do
              simplified <- closConv e >>= llift
              (main, funs) <- runWriterT $ fauxc simplified
              i <- gen
              let topMain = FauxCTop i 1 (abstract (const Nothing) main)
                  funs' = map (i2e <$>) (funs ++ [topMain])
              (++ [makeCMain i]) <$> mapM topc funs'
            makeCMain entry =
              fun [intTy] "main"[] $ hBlock ["call"#[i2e entry]]
This combines all the previous compilation passes together. First we typecheck and ensure that the program is a Nat. Then we closure convert it and immediately lambda lift. This simplified program is then fed into fauxc giving a fauxc expression for main and a bunch of functions called by main. We wrap up the main expression in a function that ignores all its arguments. We then map topc, which calls realc internally, over all of these fauxc functions. This gives us actual C code. Finally, we tack on a trivial C main to call the generated code and return the whole thing.
And that’s our PCF compiler.
Wrap Up
Well if you’ve made it this far congratulations. We just went through a full compiler from a typed higher order language to C. Along the way we ended up implementing

A Type Checker
Closure Conversion
Lambda Lifting
Conversion to Faux-C
SSA Conversion

If you’d like to fiddle a bit more, some fun projects might be

Writing type checkers for all the intermediate languages. They’re all typeable except perhaps Faux-C
Implementing compilation to LLVM instead of C. As I said before, this shouldn’t be awful

Cheers,



Worlds in Twelf
Danny Gratzer — Sat, 07 Mar 2015 00:00:00 UT

Posted on March 7, 2015

Tags: twelf, types


In this post I wanted to focus on one particular thing in Twelf: %worlds declarations. They seem to be the most mysterious. I’ve had a couple of people tell me that they just blindly stick %worlds () (x _ _ _) before every %total and pray, which is a little concerning.
In this post hopefully we’ll remove some of the “compile-n-pray” from using Twelf code.
What is %worlds
In Twelf we’re interested in proving theorems. These theorems are basically proven by some chunk of code that looks like this.
    my-cool-tyfam : with -> some -> cool -> args -> type.
    %mode my-cool-tyfam +A +B +C -D.

    some         : ... -> my-cool-tyfam A B C D.
    constructors : ... -> my-cool-tyfam A B C D.

    %worlds (...) (my-cool-tyfam _ _ _ _).
    %total (T) (my-cool-tyfam T _ _ _).
What’s interesting here are the 3 directives we needed

%mode to specify which arguments of the type family are universally quantified and which are existentially quantified in our theorem. This specifies the “direction” of the type family, + arguments are inputs and - arguments are outputs.
%total which actually goes and proves the theorem by induction on the canonical forms of the term in the parens.
%worlds which specifies the set of contexts to check the totality in. Note that a world is simply a set of contexts.

The one we’re interested in talking about here is %worlds. Everything we want to call %total on has to have one of these and, as mentioned above, it specifies the contexts to check the theorem in. Remember that %total is proven by induction over the canonical forms. One of the canonical forms for every type is of the form

For some x : ty ∈ Γ, then x is a canonical form of ty.

This is a little different than in other languages. We could usually just invert upon something in the context. That’s not the case in Twelf, we have to handle variables parametrically (this is critical to admitting HOAS and similar). This means we have to be extremely careful about what’s in Γ lest we accidentally introduce some canonical form of ty without any additional information about it. The worlds specification tells us about the forms Γ can take. Twelf allows us to specify sets of contexts that are “regular”.
So for example remember how plus might be defined.
    plus : nat -> nat -> nat -> type.
    %mode plus +N +M -P.

    plus/z : plus z N N.
    plus/s : plus N M P -> plus (s N) M (s P).
This is total in the empty context. If we added some b : nat to our context then we have no way of showing it is either a s or a z! This means that there’s a missing case for variables of type nat in our code. In order to exclude this impossible case we just assert that we only care about plus’s totality in the empty context. This is what the %worlds specification for plus stipulates
    %worlds () (plus _ _ _).
It should be read as “plus should only be considered in the empty context” so the only canonical forms of plus are those specified as constants in our signature. This sort of specification is what we want for most vanilla uses of Twelf.
For most cases we want to be proving theorems in the empty context because we do nothing to extend the context in our constructors. That’s not to say that we can’t specify some nonempty world. We can specify a world where there is a b : nat, provided that whenever such a b appears it comes with a derivation of {a} plus b a z. This way when Twelf goes to check the canonical forms case for something in our context, b : nat, it knows that there’s a derivation that precisely matches what we need. I’ll circle back to this in a second, but first we have to talk about how to specify fancier worlds.
%block and Fancier Worlds
In Twelf there’s some special syntax for specifying worlds. Basically we can specify a template for some part of the world, called a block. A world declaration is just a conglomeration of blocks and Twelf will interpret this as a world of contexts in which each block may appear zero or more times.
In Twelf code we specify a block with the following syntax
    %block block_name : block {a : ty} ... {b : ty'}.
This specifies that if there is an a : ty in the context, it’s going to be accompanied by a bunch of other stuff including a b : ty'. Some blocks are pretty trivial. For example, if we wanted to allow plus to be defined in a context with some b : nat in it we might say
    %block random_nat : block {b : nat}.
    %worlds (random_nat) (plus _ _ _).
This doesn’t work though. If we ask Twelf to check totality it’ll get angry and say
    Coverage error --- missing cases:
    {#random_nat:{b:nat}} {X1:nat} {X2:nat} |- plus #random_nat_b X1 X2.
In human,

You awful person Danny! You’re missing the case where you have two random natural numbers and the random natural number b from the random_nat block and we want to compute plus b X X'.

Now there are a few things to do here. The saner person would probably just say “Oh, I clearly don’t want to try to prove this theorem in a nonempty context”. Or we can wildly add things to our context in order to patch this hole. In this case, we need some proof about adding b to other stuff. Let’s supplement our block
    %block random_nat : block {b : nat}{_:{a} plus b a z}.
Such a context is pretty idiotic though since there isn’t a natural number that can satisfy it. It is however enough to sate the totality checker.
    %total (T) (plus T _ _).
For a non-contrived example let’s discuss where interesting worlds come into play: with higher order abstract syntax. When we use HOAS we end up embedding the LF function space in our terms. This is important because it means as we go to prove theorems about it we end up recursing on a term under an honest to goodness LF lambda. This means we extend the context at some points in our proof and we can’t just prove theorems in the empty context!
To see this in action here’s an embedding of the untyped lambda calculus in LF
    term : type.
    lam  : (term -> term) -> term.
    app  : term -> term -> term.
Now let’s say we want to determine how many binders are in a lambda term. We start by defining our relation
    nbinds : term -> nat -> type.
    %mode nbinds +T -N.
We set this type family up so that it has one input (the term) and one output (a nat representing the number of binders). We have two cases to deal with here
    nbinds/lam : nbinds (lam F) (s N)
                  <- ({x : term} nbinds (F x) N).
    nbinds/app : nbinds (app F A) O
                  <- nbinds F N1
                  <- nbinds A N2
                  <- plus N1 N2 O.
In the lam case we recurse under the binder. This is the interesting thing here, we stick the recursive call under a pi binder. This gives us access to some term x which we apply the LF function to. This code in effect says “if for all terms x, F x has N binders, then lam F has N + 1 binders”. The app case just sums the binder counts of the two subterms.
We can try to world check this in only the empty context but this fails with
    Error:
    While checking constant nbinds/lam:
    World violation for family nbinds: {x:term}

This says that even though we promised never to extend the LF context we did just that! To fix this we must have a fancier world. We create a block which just talks about adding a term to the context.
    %block nbinds_block : block {x : term}.
    %worlds (nbinds_block) (nbinds _ _).
This world checks but there’s another issue lurking about. Let’s try to ask Twelf to prove totality.
    %total (T) (nbinds T _).
This spits out the error message
    Coverage error --- missing cases:
    {#nbinds_block:{x:term}} {X1:nat} |- nbinds #nbinds_block_x X1.
This is the same error as before! Now that we’ve extended our context with a term we need to somehow be able to tell Twelf how many binders that term has. This smacks of the slightly fishy type of nbinds/lam: its meaning is that F x has N binders for any term x. This seems a little odd, why doesn’t the binder count of a function’s body depend on its argument? We really ought to be specifying that whatever this x is, we know its binder count is z. This makes our new code
    nbinds/lam : nbinds (lam F) (s N)
                 <- ({x : term}{_ : nbinds x z} nbinds (F x) N).
Now we specify that x itself contributes zero binders. This means we have to change our block to
    %block nbinds_block : block {x : term}{_ : nbinds x z}.
With this modification everything else goes through unmodified. For fun, we can ask Twelf to actually compute some toy examples.
    %solve deriv : nbinds (lam ([x] (lam [y] x))) N.
This gives back that deriv : nbinds (lam ([x] lam ([y] x))) (s (s z)) as we’d hope. It’s always fun to run our proofs.
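For readers more at home in Haskell, here is a rough analogue of nbinds. It is only a sketch under an assumption: Haskell has no LF-style parametric variables, so the constructor Hole is a hypothetical stand-in for the variable x that the block introduces, and the Hole equation plays the part of the hypothesis nbinds x z.

```haskell
-- Untyped lambda terms in higher order abstract syntax.
-- Hole is not part of the Twelf encoding: it is a hypothetical
-- placeholder we feed to a binder so that we can inspect its body.
data Term
  = Lam (Term -> Term)
  | App Term Term
  | Hole

-- Count the binders in a term.
nbinds :: Term -> Int
nbinds (Lam f)   = 1 + nbinds (f Hole)  -- recurse under the binder
nbinds (App f a) = nbinds f + nbinds a  -- sum both subterms
nbinds Hole      = 0                    -- mirrors {_ : nbinds x z}

-- The term from the %solve query: lam ([x] lam ([y] x)).
example :: Int
example = nbinds (Lam (\x -> Lam (\_ -> x)))
```

Deleting the Hole equation leaves nbinds without a case for the thing we put into scope, which is roughly the Haskell shadow of the coverage error Twelf reported.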
Conclusion
Hopefully that clears up some of the mystery of worlds in Twelf. Happily this doesn’t come up for a lot of simple uses of Twelf. As far as I know the entire constructive logic course at CMU sidesteps the issue with a quick “Stick %worlds () (...) before each totality check”.
It is completely invaluable if you’re doing anything under binders which turns out to be necessary for most interesting proofs about languages with binders. If nothing else, the more you know..
Those who enjoyed this post might profit from Dan Licata and Bob Harper’s paper on mechanizing metatheory.
Cheers,



An Explanation of Type Inference for ML/Haskell
Danny Gratzer — Sat, 28 Feb 2015 00:00:00 UT

Posted on February 28, 2015

Tags: sml, haskell, types


A couple of days ago I wrote a small implementation of a type inferencer for a mini ML language. It turns out there are very few explanations of how to do this properly and the ones that exist tend to present the really naive, super-exponential algorithm. I wrote the algorithm in SML but nothing should be unfamiliar to the average Haskeller.
Type inference breaks down into essentially 2 components

Constraint Generation
Unification

We inspect the program we’re trying to infer a type for and generate a bunch of statements (constraints) which are of the form

This type is equal to this type

These types have “unification variables” in them. These aren’t normal ML/Haskell type variables. They’re generated by the compiler, for the compiler, and will eventually be filled in with either

A rigid polymorphic variable
A normal concrete type

They should be thought of as holes in an otherwise normal type. For example, if we’re looking at the expression
   f a
We first just say that f : 'f where 'f is one of those unification variables I mentioned. Next we say that a : 'a. Since we’re applying f to a we can generate the constraints that
'f ~ 'x -> 'y
'a ~ 'x
since we can only apply things of the form _ -> _. We then unify these constraints to produce f : 'a -> 'y and a : 'a. We’d then use the surrounding constraints to produce more information about what exactly 'a and 'y might be. If these were all the constraints we had, we’d then “generalize” 'a and 'y to be normal type variables, making our expression have the type y where f : a -> y and a : a.
Now onto some specifics
Set Up
In order to actually talk about type inference we first have to define our language. We have the abstract syntax tree:
    type tvar = int
    local val freshSource = ref 0 in
    fun fresh () : tvar =
        !freshSource before freshSource := !freshSource + 1
    end


    datatype monotype = TBool
                      | TArr of monotype * monotype
                      | TVar of tvar
    datatype polytype = PolyType of int list * monotype

    datatype exp = True
                 | False
                 | Var of int
                 | App of exp * exp
                 | Let of exp * exp
                 | Fn of exp
                 | If of exp * exp * exp
First we have type variables which are globally unique integers. To give us a method for actually producing them we have fresh which uses a ref-cell to never return the same result twice. This is probably surprising to Haskellers: SML isn’t purely functional and frankly this is less noisy than using something like monad-gen.
From there we have mono-types. These are normal ML types without any polymorphism. There are type/unification variables, booleans, and functions. Polytypes are just monotypes with an extra forall at the front. This is where we get polymorphism from. A polytype binds a number of type variables, stored in this representation as an int list. There is one ambiguity here: when looking at a variable it’s not clear whether it’s supposed to be a type variable (bound in a forall) or a unification variable. The idea is that we never ever inspect a type bound under a forall except when we’re converting it to a monotype with fresh unification variables in place of all of the bound variables. Thus, when inferring a type, every variable we come across is a unification variable.
Finally, we have expressions. Aside from the normal constants, we have variables, lambdas, applications, let, and if. The way we represent variables here is with DeBruijn variables. A variable is a number that tells you how many binders are between it and where it was bound. For example, const would be written Fn (Fn (Var 1)) in this representation.
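To make the DeBruijn representation concrete, here is a small Haskell sketch that converts named lambda terms into indices. The Named type and toDeBruijn are hypothetical helpers for illustration only; the Exp constructors mirror the Var, Fn, and App cases of the SML datatype.

```haskell
import Data.List (elemIndex)

-- A hypothetical named surface syntax (not part of the post's code).
data Named
  = NVar String
  | NLam String Named
  | NApp Named Named

-- DeBruijn terms, mirroring Var / Fn / App from the SML datatype.
data Exp
  = Var Int
  | Fn Exp
  | App Exp Exp
  deriving (Eq, Show)

-- Track the binders in scope, innermost first. A variable's index is
-- its position in that list, i.e. the number of binders between the
-- variable and the lambda that bound it. Unbound names give Nothing.
toDeBruijn :: Named -> Maybe Exp
toDeBruijn = go []
  where
    go env (NVar x)   = Var <$> elemIndex x env
    go env (NLam x b) = Fn <$> go (x : env) b
    go env (NApp f a) = App <$> go env f <*> go env a
```

For instance, converting NLam "x" (NLam "y" (NVar "x")), which is const, yields Just (Fn (Fn (Var 1))).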
With this in mind we define some helpful utility functions. When type checking, we have a context full of information. The two facts we know are
    datatype info = PolyTypeVar of polytype
                  | MonoTypeVar of monotype

    type context = info list
Where the ith element of a context indicates the piece of information we know about the ith DeBruijn variable. We’ll also need to substitute a type variable for a type. We also want to be able to find out all the free variables in a type.
    fun subst ty' var ty =
        case ty of
            TVar var' => if var = var' then ty' else TVar var'
          | TArr (l, r) => TArr (subst ty' var l, subst ty' var r)
          | TBool => TBool

    fun freeVars t =
        case t of
            TVar v => [v]
          | TArr (l, r) => freeVars l @ freeVars r
          | TBool => []
Both of these functions just recurse over types and do some work at the variable case. Note that the result of freeVars can contain duplicates. This turns out to matter in only one place: generalizeMonoType. The basic idea is that given a monotype with a bunch of unification variables and a surrounding context, we figure out which variables can be bound up in a polymorphic type. If they don’t appear in the surrounding context, we generalize them by binding them in a new poly type’s forall spot.
    fun dedup [] = []
      | dedup (x :: xs) =
        if List.exists (fn y => x = y) xs
        then dedup xs
        else x :: dedup xs

    fun generalizeMonoType ctx ty =
        let fun notMem xs x = List.all (fn y => x <> y) xs
            fun free (MonoTypeVar m) = freeVars m
              | free (PolyTypeVar (PolyType (bs, m))) =
                List.filter (notMem bs) (freeVars m)

            val ctxVars = List.concat (List.map free ctx)
            val polyVars = List.filter (notMem ctxVars) (freeVars ty)
        in PolyType (dedup polyVars, ty) end
Here the bulk of the code is deciding whether or not a variable is free in the surrounding context using free. It looks at a piece of info to determine what variables occur in it. We then accumulate all of these variables into ctxVars and use this list to decide what to generalize.
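For the Haskellers following along, the same idea can be sketched in a few lines. This is a simplified rendering, not a translation of the SML: it assumes the free variables of the context have already been collected into a plain list (the hypothetical ctxVars parameter), and it uses nub where the SML uses dedup, so the order of the bound variables may differ.

```haskell
import Data.List (nub)

data Mono = TBool | TArr Mono Mono | TVar Int
  deriving (Eq, Show)

-- A polytype: bound variables plus the underlying monotype.
data Poly = Poly [Int] Mono
  deriving (Eq, Show)

freeVars :: Mono -> [Int]
freeVars (TVar v)   = [v]
freeVars (TArr l r) = freeVars l ++ freeVars r
freeVars TBool      = []

-- Bind every variable of the type that the context does not mention.
-- ctxVars is assumed to hold the free variables of the context.
generalize :: [Int] -> Mono -> Poly
generalize ctxVars ty =
  Poly (nub [v | v <- freeVars ty, v `notElem` ctxVars]) ty
```

With an empty context, the type of the identity function TArr (TVar 0) (TVar 0) generalizes to Poly [0] (TArr (TVar 0) (TVar 0)); a variable that the context mentions is left unbound.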
Next we need to take a polytype to a monotype. This is the specialization of a polymorphic type that we love and use when we use map on a function from int -> double. This works by taking each bound variable and replacing it with a fresh unification variable. This is nicely handled by folds!
    fun mintNewMonoType (PolyType (ls, ty)) =
        foldl (fn (v, t) => subst (TVar (fresh ())) v t) ty ls
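In Haskell there is no ref cell to reach for, so a sketch of the same fold has to thread the supply of fresh variables by hand. The instantiate function below is a hypothetical pure counterpart of mintNewMonoType. It assumes the counter passed in is larger than every variable already in the type, which the global counter in the SML guarantees for free.

```haskell
data Mono = TBool | TArr Mono Mono | TVar Int
  deriving (Eq, Show)

data Poly = Poly [Int] Mono

-- subst new var ty replaces every occurrence of var in ty with new.
subst :: Mono -> Int -> Mono -> Mono
subst new var (TVar v)
  | v == var  = new
  | otherwise = TVar v
subst new var (TArr l r) = TArr (subst new var l) (subst new var r)
subst _   _   TBool      = TBool

-- Replace each bound variable with a fresh one. Takes the next unused
-- variable id and returns the new monotype plus the next unused id.
-- Assumption: the starting id exceeds every variable in the polytype.
instantiate :: Int -> Poly -> (Mono, Int)
instantiate next (Poly bound ty) = foldl step (ty, next) bound
  where
    step (t, n) v = (subst (TVar n) v t, n + 1)
```

Starting the counter at 10, the polytype Poly [0, 1] (TArr (TVar 0) (TVar 1)) instantiates to TArr (TVar 10) (TVar 11), and the next unused id is 12.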
Last but not least, we have a function to take a context and a variable and give us a monotype which corresponds to it. This may produce a new monotype if we think the variable has a polytype.
    exception UnboundVar of int
    fun lookupVar var ctx =
        case List.nth (ctx, var) handle Subscript => raise UnboundVar var of
            PolyTypeVar pty => mintNewMonoType pty
          | MonoTypeVar mty => mty
For the sake of nice error messages, we also throw UnboundVar instead of just Subscript in the error case. Now that we’ve gone through all of the utility functions, on to unification!
Unification
A large part of this program is basically “I’ll give you a list of constraints and you give me the solution”. The program to solve these proceeds by pattern matching on the constraints.
In the empty case, we have no constraints so we give back the empty solution.
    fun unify [] = []
In the next case we actually have to look at what constraint we’re trying to solve.
      | unify (c :: constrs) =
        case c of
If we’re lucky, we’re just trying to unify TBool with TBool, this does nothing since these types have no variables and are equal. In this case we just recurse.
       (TBool, TBool) => unify constrs
If we’ve got two function types, we just constrain their domains and ranges to be the same and continue on unifying things.
     | (TArr (l, r), TArr (l', r')) => unify ((l, l') :: (r, r') :: constrs)
Now we have to deal with finding a variable. We definitely want to avoid adding (TVar v, TVar v) to our solution, so we’ll have a special case for trying to unify two variables.
     | (TVar i, TVar j) =>
       if i = j
       then unify constrs
       else addSol i (TVar j) (unify (substConstrs (TVar j) i constrs))
This is our first time actually adding something to our solution so there’s several new elements here. The first is this function addSol. It’s defined as
    fun addSol v ty sol = (v, applySol sol ty) :: sol
So in order to make sure our solution is internally consistent it’s important that whenever we add a type to our solution we first apply the solution to it. This ensures that we can substitute a variable in our solution for its corresponding type and not worry about whether we need to do something further. Additionally, whenever we add a new binding we substitute for it in the constraints we have left to ensure we never have a solution which is just inconsistent. This prevents us from unifying v ~ TBool and v ~ TArr(TBool, TBool) in the same solution! The actual code for doing this is that substConstrs (TVar j) i constrs bit.
The next case is the general case for unifying a variable with some type. It looks very similar to this one. (The or-pattern it uses is an SML/NJ extension rather than standard SML.)
     | ((TVar i, ty) | (ty, TVar i)) =>
       if occursIn i ty
       then raise UnificationError c
       else addSol i ty (unify (substConstrs ty i constrs))
Here we have the critical occursIn check. This checks to see if a variable appears in a type and prevents us from making erroneous unifications like TVar a ~ TArr (TVar a, TVar a). This occurs check is actually very easy to implement
    fun occursIn v ty = List.exists (fn v' => v = v') (freeVars ty)
Finally we have one last case: the failure case. This is the catch-all case for if we try to unify two things that are obviously incompatible.
     | _ => raise UnificationError c
All together, that code was
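    (* Definitions the snippets above rely on but never show. These are
       the obvious candidates, assumed here rather than quoted. *)
    type constr = monotype * monotype
    type sol = (tvar * monotype) list
    exception UnificationError of constr

    fun substConstrs ty var (constrs : constr list) : constr list =
        map (fn (l, r) => (subst ty var l, subst ty var r)) constrs

    fun applySol sol ty =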
        foldl (fn ((v, ty), ty') => subst ty v ty') ty sol
    fun applySolCxt sol cxt =
        let fun applyInfo i =
                case i of
                    PolyTypeVar (PolyType (bs, m)) =>
                    PolyTypeVar (PolyType (bs, (applySol sol m)))
                  | MonoTypeVar m => MonoTypeVar (applySol sol m)
        in map applyInfo cxt end

    fun addSol v ty sol = (v, applySol sol ty) :: sol

    fun occursIn v ty = List.exists (fn v' => v = v') (freeVars ty)

    fun unify ([] : constr list) : sol = []
      | unify (c :: constrs) =
        case c of
            (TBool, TBool) => unify constrs
          | (TVar i, TVar j) =>
            if i = j
            then unify constrs
            else addSol i (TVar j) (unify (substConstrs (TVar j) i constrs))
          | ((TVar i, ty) | (ty, TVar i)) =>
            if occursIn i ty
            then raise UnificationError c
            else addSol i ty (unify (substConstrs ty i constrs))
          | (TArr (l, r), TArr (l', r')) =>
            unify ((l, l') :: (r, r') :: constrs)
          | _ => raise UnificationError c
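For comparison, here is a rough Haskell rendering of the same unifier. It is a sketch rather than a faithful port: it reports failure with Either instead of raising an exception, it folds the two variable cases into a helper named bind (a name of my choosing), and the occurs check recurses directly instead of going through freeVars.

```haskell
data Mono = TBool | TArr Mono Mono | TVar Int
  deriving (Eq, Show)

type Constr = (Mono, Mono)
type Sol    = [(Int, Mono)]

subst :: Mono -> Int -> Mono -> Mono
subst new var (TVar v)
  | v == var  = new
  | otherwise = TVar v
subst new var (TArr l r) = TArr (subst new var l) (subst new var r)
subst _   _   TBool      = TBool

occursIn :: Int -> Mono -> Bool
occursIn v (TVar v')  = v == v'
occursIn v (TArr l r) = occursIn v l || occursIn v r
occursIn _ TBool      = False

applySol :: Sol -> Mono -> Mono
applySol sol ty = foldl (\t (v, r) -> subst r v t) ty sol

-- Left carries the constraint that could not be solved.
unify :: [Constr] -> Either Constr Sol
unify [] = Right []
unify (c : cs) = go c
  where
    go (TBool, TBool)            = unify cs
    go (TArr l r, TArr l' r')    = unify ((l, l') : (r, r') : cs)
    go (TVar i, TVar j) | i == j = unify cs
    go (TVar i, ty)              = bind i ty
    go (ty, TVar i)              = bind i ty
    go _                         = Left c
    -- Solve i := ty, after substituting it into the remaining constraints.
    bind i ty
      | occursIn i ty = Left c
      | otherwise = do
          sol <- unify [(subst ty i l, subst ty i r) | (l, r) <- cs]
          Right ((i, applySol sol ty) : sol)
```

Numbering 'f, 'a, 'x, 'y as 0 through 3, the constraints for f a from the start of the post are [(TVar 0, TArr (TVar 2) (TVar 3)), (TVar 1, TVar 2)], and unify returns Right [(0, TArr (TVar 2) (TVar 3)), (1, TVar 2)]. A cyclic constraint such as (TVar 0, TArr (TVar 0) TBool) comes back as a Left.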
Constraint Generation
The other half of this algorithm is the constraint generation part. We generate constraints and use unify to turn them into solutions. This boils down to two functions. The first is to glue together solutions.
    fun <+> (sol1, sol2) =
        let fun notInSol2 v = List.all (fn (v', _) => v <> v') sol2
            val sol1' = List.filter (fn (v, _) => notInSol2 v) sol1
        in
            map (fn (v, ty) => (v, applySol sol1 ty)) sol2 @ sol1'
        end
    infixr 3 <+>
Given two solutions we figure out which things don’t occur in the second solution. Next, we apply solution 1 everywhere in the second solution, giving a consistent solution which contains everything in sol2. Finally we add in all the stuff not in sol2 but in sol1. This doesn’t check to make sure that the solutions are actually consistent, this is done elsewhere.
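The gluing step is small enough to restate as a Haskell sketch. The function name compose is a stand-in for the <+> operator, and solutions are assumed to be association lists from variables to monotypes, as in the SML.

```haskell
data Mono = TBool | TArr Mono Mono | TVar Int
  deriving (Eq, Show)

type Sol = [(Int, Mono)]

subst :: Mono -> Int -> Mono -> Mono
subst new var (TVar v)
  | v == var  = new
  | otherwise = TVar v
subst new var (TArr l r) = TArr (subst new var l) (subst new var r)
subst _   _   TBool      = TBool

applySol :: Sol -> Mono -> Mono
applySol sol ty = foldl (\t (v, r) -> subst r v t) ty sol

-- compose sol1 sol2 mirrors sol1 <+> sol2: apply sol1 throughout
-- sol2, then append the bindings of sol1 that sol2 does not mention.
compose :: Sol -> Sol -> Sol
compose sol1 sol2 =
  [(v, applySol sol1 t) | (v, t) <- sol2]
    ++ [b | b@(v, _) <- sol1, v `notElem` map fst sol2]
```

For example, compose [(0, TBool)] [(1, TArr (TVar 0) (TVar 0))] gives [(1, TArr TBool TBool), (0, TBool)]. When both solutions bind the same variable, the binding from the second one wins.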
Next is the main function here, constrain. This actually generates a solution and a type given a context and an expression. The first few cases are nice and simple
    fun constrain ctx True = (TBool, [])
      | constrain ctx False = (TBool, [])
      | constrain ctx (Var i) = (lookupVar i ctx, [])
In these cases we don’t infer any constraints, we just figure out types based on information we know previously. Next for Fn we generate a fresh variable to represent the argument’s type and just constrain the body.
      | constrain ctx (Fn body) =
        let val argTy = TVar (fresh ())
            val (rTy, sol) = constrain (MonoTypeVar argTy :: ctx) body
        in (TArr (applySol sol argTy, rTy), sol) end
Once we have the solution for the body, we apply it to the argument type which might replace it with a concrete type if the constraints we inferred for the body demand it. For If we do something similar except we add a few constraints of our own to solve.
      | constrain ctx (If (i, t, e)) =
        let val (iTy, sol1) = constrain ctx i
            val (tTy, sol2) = constrain (applySolCxt sol1 ctx) t
            val (eTy, sol3) = constrain (applySolCxt (sol1 <+> sol2) ctx) e
            val sol = sol1 <+> sol2 <+> sol3
            val sol = sol <+> unify [ (applySol sol iTy, TBool)
                                    , (applySol sol tTy, applySol sol eTy)]
        in
            (tTy, sol)
        end
Notice how we apply each solution to the context for the next thing we’re constraining. This is how we ensure that each solution will be consistent. Once we’ve generated solutions to the constraints in each of the subterms, we smash them together to produce the first solution. Next, we ensure that the subcomponents have the right type by generating a few constraints to ensure that iTy is a bool and that tTy and eTy (the types of the branches) are both the same. We have to carefully apply the sol to each of these prior to unifying them to make sure our solution stays consistent.
The App case is practically the same
      | constrain ctx (App (l, r)) =
        let val (domTy, ranTy) = (TVar (fresh ()), TVar (fresh ()))
            val (funTy, sol1) = constrain ctx l
            val (argTy, sol2) = constrain (applySolCxt sol1 ctx) r
            val sol = sol1 <+> sol2
            val sol = sol <+> unify [(applySol sol funTy,
                                      applySol sol (TArr (domTy, ranTy)))
                                    , (applySol sol argTy, applySol sol domTy)]
        in (ranTy, sol) end
The only real difference here is that we generate different constraints: we make sure we’re applying a function whose domain is the same as the argument type.
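To see the constraints on their own, here is a Haskell sketch of a simpler two-phase variant: it only gathers constraints and leaves solving them to a separate unification pass, where the SML above solves as it goes. It also skips Let, uses a single fresh variable per application, and threads the fresh-variable counter by hand. The name gen and the constructors T and F are assumptions of this sketch.

```haskell
data Mono = TBool | TArr Mono Mono | TVar Int
  deriving (Eq, Show)

-- Expressions with DeBruijn variables; Let is omitted in this sketch.
data Exp = T | F | Var Int | App Exp Exp | Fn Exp | If Exp Exp Exp

type Constr = (Mono, Mono)

-- gen ctx e n returns the type of e, the constraints it must satisfy,
-- and the next unused variable id. The context holds one monotype per
-- bound variable; an unbound variable makes (!!) fail.
gen :: [Mono] -> Exp -> Int -> (Mono, [Constr], Int)
gen _   T       n = (TBool, [], n)
gen _   F       n = (TBool, [], n)
gen ctx (Var i) n = (ctx !! i, [], n)
gen ctx (Fn b)  n =
  let arg         = TVar n
      (r, cs, n') = gen (arg : ctx) b (n + 1)
  in (TArr arg r, cs, n')
gen ctx (App f a) n =
  let (tf, cs1, n1) = gen ctx f n
      (ta, cs2, n2) = gen ctx a n1
      res           = TVar n2
  -- The function's type must be an arrow from the argument's type.
  in (res, (tf, TArr ta res) : cs1 ++ cs2, n2 + 1)
gen ctx (If c t e) n =
  let (tc, cs1, n1) = gen ctx c n
      (tt, cs2, n2) = gen ctx t n1
      (te, cs3, n3) = gen ctx e n2
  -- The condition is a bool and both branches agree.
  in (tt, (tc, TBool) : (tt, te) : cs1 ++ cs2 ++ cs3, n3)
```

Running gen [] (App (Fn (Var 0)) T) 0 gives the type TVar 1 together with the single constraint (TArr (TVar 0) (TVar 0), TArr TBool (TVar 1)); solving that constraint forces TVar 1 to be TBool.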
The most interesting case here is Let. This implements let generalization which is how we actually get polymorphism. After inferring the type of the thing we’re binding we generalize it, giving us a poly type to use in the body of let. The key to generalizing it is that generalizeMonoType we had before.
      | constrain ctx (Let (e, body)) =
        let val (eTy, sol1) = constrain ctx e
            val ctx' = applySolCxt sol1 ctx
            val eTy' = generalizeMonoType ctx' (applySol sol1 eTy)
            val (rTy, sol2) = constrain (PolyTypeVar eTy' :: ctx') body
        in (rTy, sol1 <+> sol2) end
We do pretty much everything we had before, except now we carefully apply the solution we get for the bound expression to the context and then generalize its type with respect to that new context. This is how we actually get polymorphism: the let-bound variable is assigned a proper polymorphic type.
That wraps up constraint generation. Now all that’s left to see is the overall driver for type inference.
    fun infer e =
        let val (ty, sol) = constrain [] e
        in generalizeMonoType [] (applySol sol ty) end
So all we do is infer and generalize a type! And there you have it, that’s how ML and Haskell do type inference.
Wrap Up
Hopefully that clears up a little of the magic of how type inference works. The next challenge is to figure out how to do type inference on a language with patterns and ADTs! This is actually quite fun, pattern checking involves synthesizing a type from a pattern which needs something like linear logic to handle pattern variables correctly.
With this we’re actually a solid 70% of the way to building a type checker for SML. Until I have more free time though, I leave this as an exercise to the curious reader.
Cheers,



A Twelf Introduction
Danny Gratzer — Sat, 28 Feb 2015 00:00:00 UT

Posted on February 28, 2015

Tags: twelf, types


For the last 3 or so weeks I’ve been writing a bunch of Twelf code for my research (hence my flat-lined github punch card). Since it’s actually a lot of fun I thought I’d share a bit about Twelf.
What Is Twelf
Since Twelf isn’t a terribly well known language it’s worth stating what exactly it is we’re talking about. Twelf is a proof assistant. It’s based on a logic called LF (similarly to how Coq is based on CiC).
Twelf is less powerful than some other proof assistants but by limiting some of its power it’s wonderfully suited to proving certain types of theorems. In particular, Twelf admits true “higher order abstract syntax” (don’t worry if you don’t know what this means) this makes it great for formalizing programming languages with variable bindings.
In short, Twelf is a proof assistant which is very well suited for defining and proving things about programming languages.
Getting Twelf
It’s much more fun to follow along a tutorial if you actually have a Twelf installation to try out the code. You can download and compile the sources to Twelf with SML/NJ or MLton. You could also use smackage to get the compiler.
Once you’ve compiled the thing you should be left with a binary twelf-server. This is your primary way of interacting with the Twelf system. There’s quite a slick Emacs interface to smooth over this process. If you’ve installed twelf into a directory ~/twelf/ all you need is the incantation
    (setq twelf-root "~/twelf/")
    (load (concat twelf-root "emacs/twelf-init.el"))
Without further ado, let’s look at some Twelf code.
Some Code
When writing Twelf code we encode the thing that we’re studying, the object language, as a bunch of type families and constructors in Twelf. This means that when we edit a Twelf file we’re just writing signatures.
For example, if we want to encode natural numbers we’d write something like
    nat : type.
    z   : nat.
    s   : nat -> nat.
This is an LF signature, we declare a series of constants with NAME : TYPE.. Note the period at the end of each declaration. First we start by declaring a type for natural numbers called nat with nat : type. Here type is the base kind of all types in Twelf. Next we go on to declare what the values of type nat are.
In this case there are two constructors for nat. We either have zero, z, or the successor of another value of type nat, s. This gives us a canonical forms lemma for natural numbers: All values of type nat are either

z
s N for some value N : nat

Later on, we’ll justify the proofs we write with this lemma.
Anyways, now that we’ve encoded the natural numbers I wanted to point out a common point of confusion about Twelf. We’re not writing programs to be run. We’re writing programs exclusively for the purpose of typechecking. Heck, we’re not even writing programs at the term level! We’re just writing a bunch of constants out with their types! More than this even, Twelf is defined so that you can only write canonical forms. This means that if you write something in your program, it has to be in normal form, fully applied! In PL speak it has to be β-normal and η-long. This precludes actually writing programs for the sake of reducing them. You’re never going to write a web server in Twelf, you won’t even be writing “Hello World”. You might use it to verify the language you’re writing them in though.
Now that we’ve gotten the awkward bit out the way, let’s now define a Twelf encoding of a judgment. We want to encode the judgment + which is given by the following rules
—————————
z + n = n

   m + n = p
———————————————
s(m) + n = s(p)
In the rest of the world we have this idea that propositions are types. In Twelf, we’re worried about defining logics and systems, so we have the metatheoretic equivalent: judgments are types.
So we define a type family plus.
    plus : nat -> nat -> nat -> type.
So plus is a type indexed over 3 natural numbers. This is our first example of dependent types: plus is a type which depends on 3 terms. Now we can list out how to construct a derivation of plus. This means that inference rules in a meta theory correspond to constants in Twelf as well.
    plus/z : {n : nat} plus z n n.
This is some new syntax, in Twelf {NAME : TYPE} TYPE is a dependent function type, a pi type. This notation is awfully similar to Agda and Idris if you’re familiar with them. This means that this constructor takes a natural number, n and returns a derivation that plus z n n. The fact that the return type depends on what nat we supply is why this needs a dependent type.
In fact, this is such a common pattern that Twelf has sugar for it. If we write an unbound capital variable name Twelf will automagically introduce a binder {N : ...} at the front of our type. We can thus write our inference rules as
    plus/z : plus z N N
    plus/s : plus N M P -> plus (s N) M (s P)
These rules, together with our declaration of plus, make up our encoding of the judgment. In fact, there’s something kinda special about these two rules. We know that for any term n : nat which is in canonical form, there should be an applicable rule. In Twelf speak, we say that this type family is total.
We can ask Twelf to check this fact for us by saying
    plus : nat -> nat -> nat -> type.
    %mode plus +N +M -P.

    plus/z : plus z N N.
    plus/s : plus N M P -> plus (s N) M (s P).

    %worlds () (plus _ _ _).
    %total (N) (plus N _ _).
We want to show that for all terms n, m : nat in canonical form, there is a term p in canonical form so that plus n m p. This sort of theorem is what we’d call a ∀∃-theorem. This is literally because it’s a theorem of the form “∀ something. ∃ something. so that something”. These are the sort of thing that Twelf can help us prove.
Here’s the workflow for writing one of these proofs in Twelf

Write out the type family
Write out a %mode specification to say what is bound in the ∀ and what is bound in the ∃.
Write out the different constants in our type family
Specify the context to check our proof in with %worlds, usually we want to say the empty context, ()
Ask Twelf to check that we’ve created a proof according to the mode with %total where the N specifies what to induct on.

In our case we have a case for each canonical form of nat so our type family is total. This means that our theorem passes. Hurray!
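To make the ∀∃ reading a bit more concrete, here is a rough Haskell sketch of the same idea. It is only an analogy under my own assumptions: the names Nat, PlusDeriv, conclusion and derive are made up for illustration, and unlike Twelf the indices of the judgment are computed rather than tracked in the types.

```haskell
-- Natural numbers, mirroring the Twelf constants z and s.
data Nat = Z | S Nat
  deriving (Eq, Show)

-- Derivation trees for the plus judgment, one constructor per rule.
data PlusDeriv
  = PlusZ Nat          -- plus/z : plus z N N
  | PlusS PlusDeriv    -- plus/s : plus N M P -> plus (s N) M (s P)
  deriving (Eq, Show)

-- Read off the judgment (n, m, p) that a derivation concludes.
conclusion :: PlusDeriv -> (Nat, Nat, Nat)
conclusion (PlusZ n) = (Z, n, n)
conclusion (PlusS d) =
  let (n, m, p) = conclusion d
  in (S n, m, S p)

-- The forall-exists reading: for every n and m (the + inputs) we produce
-- some p (the - output) together with a derivation of plus n m p, by
-- induction on the first argument.
derive :: Nat -> Nat -> (Nat, PlusDeriv)
derive Z m = (m, PlusZ m)
derive (S n) m =
  let (p, d) = derive n m
  in (S p, PlusS d)
```

The two clauses of derive line up with the two canonical forms of nat, which is the same case coverage that %total checks for us.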
Believe it or not this is what life is like in Twelf land. All the code I’ve written these last couple of weeks is literally type signatures and 5 occurrences of %total. What’s kind of fun is how unreasonably effective a system this is for proving things.
Let’s wrap things up by proving one last theorem, if plus A B N and plus A B M both have derivations, then we should be able to show that M and N are the same. Let’s start by defining what it means for two natural numbers to be the same.
    nat-eq : nat -> nat -> type.
    nat-eq/r : nat-eq N N.
    nat-eq/s : nat-eq N M -> nat-eq (s N) (s M).
I’ve purposefully defined this so it’s amenable to our proof, but it’s still a believable formulation of equality. It’s reflexive and if N is equal to M, then s N is equal to s M. Now we can actually state our theorem.
    plus-fun : plus N M P -> plus N M P' -> nat-eq P P' -> type.
    %mode plus-fun +A +B -C.
Our theorem says if you give us two derivations of plus with the same arguments, we can prove that the outputs are equal. There are two cases we have to cover for our induction so there are two constructors for this type family.
    plus-fun/z : plus-fun plus/z plus/z nat-eq/r.
    plus-fun/s : plus-fun (plus/s L) (plus/s R) (nat-eq/s E)
                  <- plus-fun L R E.
A bit of syntactic sugar here, I used the backwards arrow which is identical to the normal -> except its arguments are flipped. Finally, we ask Twelf to check that we’ve actually proven something here.
    %worlds () (plus-fun _ _ _).
    %total (P) (plus-fun P _ _).
And there you have it, some actual theorem we’ve mechanically checked using Twelf.
Wrap Up
I wanted to keep this short, so now that we’ve covered Twelf basics I’ll just refer you to one of the more extensive tutorials. You may be interested in

Proving Metatheorems with Twelf
The OPLSS course on Twelf

If you’re interested in learning a bit more about the nice mathematical foundations for LF you should check out “The LF Paper”.

          
          
          comments powered by Disqus



Notes on Proof Theory: Part 1
Danny Gratzer — Wed, 11 Feb 2015 00:00:00 UT

    Posted on February 11, 2015
    


    
    Tags: types
    


I write a lot about types. Up until now however, I’ve only made passing references to the thing I’ve actually been studying in most of my free time lately: proof theory. Now I have a good reason for this: the proof theory I’m interested in is undeniably intertwined with type theory and computer science as a whole. In fact, you occasionally see someone draw the triangle
           Type Theory
          /           \
         /             \
 Proof Theory ---- Category Theory
Which nicely summarizes the lay of the land in the world I’m interested in. People will often pick up something well understood on one corner of the triangle and drag it off to another, producing a flurry of new ideas and papers. It’s all very exciting and leads to really cool stuff. I think the most talked about example lately is homotopy type theory which takes a mathematical structure (weak infinite groupoids) and hoists it off to type theory!
If you read the [unprofessional, mostly incorrect, and entirely more fun to read] blog posts on these subjects you’ll find most of the lip service is paid to category theory and type theory with poor proof theory shunted off to the side.
In this post, I’d like to jot down my notes on Frank Pfenning’s introduction to proof theory materials to change that in some small way.
What is Proof Theory
The obvious question is just “What is proof theory?”. The answer is that proof theory is the study of proofs. In this world we study proofs as first class mathematical objects which we prove interesting things about. This is the branch of math that formalizes our handwavy notion of a proof into a precise object governed by rules.
We can then prove things like “Given a proof that Γ ⊢ A and another derivation of Γ, A ⊢ B, then we can produce a derivation of Γ ⊢ B”. Such a theorem is utterly crazy unless we can formalize what it means to derive something.
From this we grow beautiful little sets of rules and construct derivations with them. Later, we can drag these derivations off to type theory and use them to model all sorts of wonderful phenomena. My most recent personal example was when folks noticed that the rules for modal logic perfectly capture what the semantics of static pointers ought to be.
So in short, proof theory is devoted to answering that question that every single one of your math classes dodged

Professor, what exactly is a proof?

Basic Building Blocks
In every logic that we’ll study we’ll keep circling back to two core objects: judgments and propositions. The best explanation of judgments I’ve read comes from Frank Pfenning

A judgment is something we may know, that is, an object of knowledge. A judgment is evident if we in fact know it.

So judgments are the things we’ll structure our logic around. You’ve definitely heard of one judgment: A true. This judgment signifies whether or not some proposition A is true. Judgments can be much fancier though: we might have a whole bunch of judgments like n even, A possible or A resource.
These judgments act across various syntactic objects. In particular, from our point of view we’ll understand the meaning of a proposition by the ways we can prove it, that is the proofs that A true is evident.
We prove a judgment J through inference rules. An inference rule takes the form
J₁ J₂ .... Jₓ
—————————————
     J
Which should be read as “When J₁, J₂ … and Jₓ hold, then so does J”. Here the things above the line are premises and the ones below are conclusions. What we’ll do is define a bunch of these inference rules and use them to construct proofs of judgments. For example, we might have the inference rules
             n even
 ——————    ————————————
 0 even    S(S(n)) even
for the judgment n even. We can then form proofs to show that n even holds for some particular n.
       ——————
       0 even
    ————————————
    S(S(0)) even
 ——————————————————
 S(S(S(S(0)))) even
This tree for example is evidence that 4 even holds. We apply the second inference rule to S(S(S(S(0)))) first. This leaves us with one premise to show, S(S(0)) even. For this we repeat the process and end up with the new premise that 0 even. For this we can apply the first inference rule, which has no premises, completing our proof.
One judgment we’ll often see is A prop. It simply says that A is a well formed proposition, not necessarily true but syntactically well formed. This judgment is defined inductively over the structure of A. An example rule would be
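If you like seeing this sort of thing as code, here is a minimal sketch of the n even judgment as a Haskell GADT, where a value of type Even n plays the role of a derivation. The names Nat, Even, EvenZ, EvenSS, fourEven and rulesUsed are my own choices for illustration, not anything standard.

```haskell
{-# LANGUAGE GADTs, DataKinds, KindSignatures #-}

-- Natural numbers, promoted to the type level by DataKinds.
data Nat = Z | S Nat

-- A value of type `Even n` is a derivation of the judgment "n even".
data Even (n :: Nat) where
  EvenZ  :: Even 'Z                       -- the rule with no premises
  EvenSS :: Even n -> Even ('S ('S n))    -- from n even conclude S(S(n)) even

-- The derivation tree for "4 even" from the text.
fourEven :: Even ('S ('S ('S ('S 'Z))))
fourEven = EvenSS (EvenSS EvenZ)

-- Count how many rule applications a derivation uses.
rulesUsed :: Even n -> Int
rulesUsed EvenZ      = 1
rulesUsed (EvenSS d) = 1 + rulesUsed d
```

The type checker plays the part of the proof checker here: fourEven only compiles because the two uses of EvenSS and the one use of EvenZ line up with the tree above.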
A prop  B prop
——————————————
  A ∧ B prop
Which says that A ∧ B (A and B) is a well formed proposition if and only if A and B are! We can imagine a whole bunch of these rules
                A prop B prop
——————  ——————  ————————————— ...
⊤ prop  ⊥ prop    A ∨ B prop
that lay out the propositions of our logic. This doesn’t yet tell us how to prove any of these propositions to be true, but it’s a start. After we formally specify what sentences are propositions in our logic we need to discuss how to prove that one is true. We do this with a different judgment A true which is once again defined inductively.
For example, we might want to give meaning to the proposition A ∧ B. To do this we define its meaning through the inference rules for proving that A ∧ B true. In this case, we have the rule
A true  B true
—————————————— (∧ I)
  A ∧ B true
I claim that this defines the meaning of ∧: to prove a conjunction to be true we must prove its left and right halves. The rather proof-theoretic thing we’ve done here is said that the meaning of something is what we use to prove it. This is sometimes called the “verificationist perspective”. Finally, note that I annotated this rule with the name ∧ I simply so that we can refer to it conveniently.
Now that we know what A ∧ B means, what does having a proof of it imply? Well we should be able to “get out what we put in” which would mean we’d have two inference rules
A ∧ B true    A ∧ B true
——————————    ——————————
  A true        B true
We’ll refer to these rules as ∧ E1 and ∧ E2 respectively.
Now for a bit of terminology, rules that let us “introduce” new proofs of propositions are introduction rules. Once we have a proof, we can use it to construct other proofs. The rules for how we do that are called elimination rules. That’s why I’ve been adding I’s and E’s to the ends of our rule names.
How do we convince ourselves that these rules are correct with respect to our understanding of ∧? This question leads us to our first sort of proofs-about-proofs we’ll make.
Local Soundness and Completeness
What we want to say is that the introduction and elimination rules match up. This should mean that anytime we prove something by an introduction rule followed by an elimination rule, we should be able to rewrite the proof to avoid this detour. This also hints that the rules aren’t too powerful: we can’t prove anything with the elimination rules that we didn’t have a proof for at some point already.
For ∧ this proof looks like this
  D  E
  –  –
  A  B            D
 —————— ∧I   ⇒  ————
  A ∧ B           A
 —————— ∧E 1
    A
So whenever we introduce a ∧ and then eliminate it with ∧ E1 we can always rewrite our proof to not use the elimination rules. Here notice that D and E range over derivations in this proof. They represent a chain of rule applications that let us produce an A or B in the end. Note I got a bit lazy and started omitting the true judgments, this is something I’ll do a lot since it’s mostly unambiguous.
The proof for ∧E2 is similar.
  D  E
  –  –
  A  B            E
  ————— ∧I   ⇒  ————
  A ∧ B           B
  ————— ∧E 2
    B
Given this we say that the elimination rules for ∧ are “locally sound”. That is, when used immediately after an introduction rule they don’t let us produce anything truly new.
Next we want to show that if we have a proof of A ∧ B, the elimination rules give us enough information that we can pick the proof apart and produce a reassembled A ∧ B.
           D           D
         ————–       ————–
  D      A ∧ B       A ∧ B
————— ⇒ —————∧E1   ——————∧E2
A ∧ B      A           B
         ———————————————— ∧I
               A ∧ B
This somewhat confusing derivation takes our original proof of A ∧ B and pulls it apart into proofs of A and B and uses these to assemble a new proof of A ∧ B. This means that our elimination rules give us all the information we put in, so we say they’re locally complete.
The two of these properties combined, local soundness and completeness are how we show that an elimination rule is balanced with its introduction rule.
If you’re more comfortable with programming languages (I am) our local soundness property is equivalent to stating that
fst (a, b) ≡ a
snd (a, b) ≡ b
And local completeness is that
a ≡ (fst a, snd a)
The first equations are reductions and the second is an expansion. These actually correspond to the beta and eta rules we expect a programming language to have! This is a nice little example of why proof theory is useful, it gives a systematic way to define some parts of the behavior of a program. Given the logic a programming language gives rise to we can double check that all rules are locally sound and complete which gives us confidence our language isn’t horribly broken.
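As a small sanity check, the equations for pairs can be spelled out in Haskell. This is just a sketch: the names andI, andE1, andE2, soundness1, soundness2 and expand are mine, and comparing a few sample values is of course not a proof of the equations.

```haskell
-- Introduction: build a proof of A ∧ B from proofs of A and B.
andI :: a -> b -> (a, b)
andI x y = (x, y)

-- Eliminations: ∧E1 and ∧E2.
andE1 :: (a, b) -> a
andE1 = fst

andE2 :: (a, b) -> b
andE2 = snd

-- Local soundness: eliminating right after introducing is a detour.
soundness1 :: Bool
soundness1 = andE1 (andI 'd' "e") == 'd'

soundness2 :: Bool
soundness2 = andE2 (andI 'd' "e") == "e"

-- Local completeness: pull a pair apart and reassemble it.
expand :: (a, b) -> (a, b)
expand p = andI (andE1 p) (andE2 p)
```

Here expand is the program-level shadow of the local completeness derivation: it uses both eliminations and then one introduction to rebuild what it started with.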
Hypothetical Judgments
Before I wrap up this post I wanted to talk about one last important concept in proof theory: judgments with hypotheses. This is best illustrated by trying to write the introduction and elimination rules for “implies” or “entailment”, written A ⊃ B.
Clearly A ⊃ B is supposed to mean we can prove B true assuming A true is provable. In other words, we can construct a derivation of the form
 A true
 ——————
   .
   .
   .
 ——————
 B true
We can notate our rules then as
 —————— u
 A true
 ——————
   .
   .
   .
 ——————
 B true           A ⊃ B    A
 —————————— u     ——————————
 A ⊃ B true         B true
This notation is a bit clunky, so we’ll opt for a new one: Γ ⊢ J. In this notation Γ is some list of judgments we assume to hold and J is the thing we want to show holds. Generally we’ll end up with the rule
J ∈ Γ
—————
Γ ⊢ J
Which captures the fact that Γ contains assumptions we may or may not use to prove our goal. This specific rule may vary depending on how we want to express how assumptions work in our logic (substructural logics spring to mind here). For our purposes, this is the most straightforward characterization of how this ought to work.
Our hypothetical judgments come with a few rules which we call “structural rules”. They modify the structure of a judgment, rather than any particular proposition we’re trying to prove.
Weakening
  Γ ⊢ J
—————————
Γ, Γ' ⊢ J

Contraction
Γ, A, A, Γ' ⊢ J
———————————————
 Γ, A, Γ' ⊢ J

Exchange
Γ' = permute(Γ)   Γ' ⊢ A
————————————————————————
        Γ ⊢ A
Finally, we get a substitution principle. This allows us to eliminate some of the assumptions we made to prove a theorem.
Γ ⊢ A   Γ, A ⊢ B
————————————————
     Γ ⊢ B
These 5 rules give meaning to our hypothetical judgments. We can then restate our formulation of entailment with less clunky notation as
A prop  B prop
——————————————
  A ⊃ B prop

Γ, A ⊢ B      Γ ⊢ A ⊃ B    Γ ⊢ A
—————————     ——————————————————
Γ ⊢ A ⊃ B           Γ ⊢ B
One thing in particular to note here is that entailment actually internalizes the notion of hypothetical judgments into our logic. This is the aspect of it that made it behave so differently than the other connectives we looked at.
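Reading proofs as programs, the rules for ⊃ and the substitution principle look roughly like the Haskell below. Treat it as a sketch under my own assumptions: the names impI, impE and subst are invented for illustration, and the context Γ is modelled crudely as a single extra argument.

```haskell
-- ⊃I: a derivation of B that uses a hypothesis u of A becomes a function.
impI :: (a -> b) -> (a -> b)
impI body = \u -> body u

-- ⊃E: apply a proof of A ⊃ B to a proof of A.
impE :: (a -> b) -> a -> b
impE f x = f x

-- Substitution principle: from Γ ⊢ A and Γ, A ⊢ B conclude Γ ⊢ B.
-- The context Γ is modelled here as one extra argument ctx.
subst :: (g -> a) -> (g -> a -> b) -> (g -> b)
subst proofA proofB = \ctx -> proofB ctx (proofA ctx)
```

The hypothesis label u from the clunky notation shows up as the bound variable of the lambda, which is one way to see how entailment internalizes hypothetical judgments.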
As an exercise to the reader: prove the local soundness and completeness of these rules.
Conclusion
In this post we’ve laid out a bunch of rules and I’ve hinted that a bunch more are possible. When put together these rules define a logic using “natural deduction”, a particular way of specifying proofs that uses inference rules rather than axioms or something entirely different.
Hopefully I’ve inspired you to poke a bit further into proof theory. If so, I heartily recommend Frank Pfenning’s lectures at the Oregon Summer School for Programming Languages.
Cheers,

          
          
          comments powered by Disqus



Observations about -XStaticPointers
Danny Gratzer — Tue, 27 Jan 2015 00:00:00 UT

    Posted on January 27, 2015
    


    
    Tags: haskell, types
    


For those who haven’t heard, GHC 7.10 is making a brave foray into the exciting world of distributed computing. To this end, it’s made a new language extension called -XStaticPointers to support Cloud Haskell in a pleasant, first class manner.
If you haven’t heard of static pointers before now, it’s worth glancing through the nice tutorial from ocharles’ 24 days of $(Haskell Related Thing).
The long and short of it is that -XStaticPointers gives us this new keyword static. We apply static to an expression and if there are no closured variables (to be formalized momentarily) then we get back a StaticPtr a. This gives us a piece of data that we can serialize and ship over the wire because it has no dependencies.
Now to expand upon this “no closured variables”. A thing can only be fed to static if the free variables in the expression are all top level variables. This forbids us from writing something like
    -- rejected: a is a local variable, not a top level one
    foo :: a -> StaticPtr a
    foo a = static a
Now in all honesty, I’m not super interested in Cloud Haskell. It’s not my area of expertise and I’m already terrified of trying to do things on one machine. What does interest me a lot though is this notion of having “I have no free variables” in the type of an expression. It’s an invariant we didn’t really have before in Haskell.
In fact, as I looked more closely it reminded me of something called box from modal logic.
A Quick Summary of Modal Logic
I’m not trying to give you a full understanding of modal logic, just a brief taste.
Modal logic extends our vanilla logic (in Haskell land this is constructive logic) with modalities. Modalities are little prefixes we tack on the front of a proposition to qualify its meaning slightly.
For example we might say something like

If it is possible that it is raining, then I will need an umbrella.

Here we used the modality possible to indicate we’re not assuming that it is raining, only that it’s conceivable that it is. Because I’m a witch and will melt in the rain even the possibility of death raining from the sky will force me to pack my umbrella.
To formalize this a bit, we have our inductive definition of propositions
P = ⊥
  | ⊤
  | P ∧ P
  | P ∨ P
  | P ⇒ P
  | □ P
This is the syntax of a particular modal logic with one modality. Everything looks quite normal up until the last proposition form, which is the “box” modality applied to some proposition.
The box modality (the one we really care about for later) means “necessarily”. I almost think of it as a truthier truth if you can buy that. □ forbids us from using any hypotheses saying something like A is true inside of it. Since it represents a higher standard of proof we can’t use the weaker notion that A is true! The rule for creating a box looks like this to a first approximation
 • ⊢ A
———————
Γ ⊢ □ A
So in order to prove □ A under a set of assumptions Γ, we have to prove A using none of those assumptions. In fact, this turns out to be a slightly too restrictive form for this judgment. If we have a □ A, we know it was proved without assumptions. So when we have to introduce a □ B, we should be able to use the assumption that A is true in that proof: we know we can construct a proof of A without any assumptions and could just copy and paste it in.
This causes us to create a second context, one of the hypotheses that A is valid, usually notated with a Δ. We then get the rules
   Δ; • ⊢ A          Δ; Γ ⊢ A valid     A valid ∈ Δ
———————————————      ——————————————   ———————————————
Δ; Γ ⊢ □ A true       Δ; Γ ⊢ A true    Δ; Γ ⊢ A valid


Δ; Γ ⊢ □ A   Δ, A valid; Γ ⊢ B
——————————————————————————————
         Δ; Γ ⊢ B
What you should take away from these scary looking symbols is

A valid is much stronger than A true
Anything inside a □ can depend on valid stuff, but not true stuff
□ A true is the same as A valid.

This presentation glosses over a fair amount, if you’re so inclined I’d suggest looking at Frank Pfenning’s lecture notes from his class entitled “Modal Logic”. These actually go at a reasonable pace and introduce the groundwork for someone who isn’t super familiar with logic.
Now that we’ve established that there is an interesting theoretical backing for modal logic, I’m going to drop it on the floor and look at what Haskell actually gives us.
That “Who Cares” Bit
Okay, so how does this pertain to StaticPtr? Well I noticed that just like how box drops hypotheses that are “merely true”, static drops variables that are introduced by our local context!
This made me think that perhaps StaticPtrs are a useful equivalent to the □ modality! This shouldn’t be terribly surprising for PL people, indeed the course I linked to above expressly mentions □ to notate “movable code”. What’s really exciting about this is that there are a lot more applications of □ than just movable code! We can use it to notate staged computation for example.
Alas however, it was not to be. Static pointers are missing one essential component that makes them unsuitable for being □, we can’t eliminate them properly. In modal logic, we have a rule that lets other boxes depend on the contents of some box. The elimination rule is much stronger than just “If you give me a □ A, I’ll give you an A” because it’s much harder to construct a □ A in the first place! It’s this asymmetry that makes static pointers not quite kosher. With static pointers there isn’t a notion that we can take one static pointer and use it in another.
For example, we can’t write
    applyS :: StaticPtr a -> StaticPtr (a -> b) -> StaticPtr b
    applyS sa sf = static (deRefStaticPtr sf (deRefStaticPtr sa))
My initial reaction was that -XStaticPointers is missing something, perhaps a notion of a “static pattern”. This would let us say something like
    applyS :: StaticPtr a -> StaticPtr (a -> b) -> StaticPtr b
    applyS sa sf =
      let static f = sf
          static a = sa
      in static (f a)
So this static pattern keyword would allow us to hoist a variable into the realm of things we’re allowed to leave free in a static pointer.
This makes sense from my point of view, but less so from that of Cloud Haskell. The whole point of static pointers is to show a computation is dependency free after all, static patterns introduce a (limited) set of dependencies on the thunk that make our lives complicated. It’s not obvious to me how to desugar things so that static patterns can be compiled how we want them to be, it looks like it would require some runtime code generation which is a no-no for Haskell.
My next thought was that maybe Closure was the answer, but that doesn’t actually work either! We can introduce a closure from an arbitrary serializable term which is exactly what we don’t want from a model of □! Remember, we want to model closed terms so allowing us to accept an arbitrary term defeats the point.
It’s painfully clear that StaticPtrs are very nearly □s, but not quite! Whatever Box ends up being, we’d want the following interface
    data Box a

    intoBox :: StaticPtr a -> Box a
    closeBox :: Box (a -> b) -> Box a -> Box b
    outBox :: Box a -> a
The key difference from StaticPtr is closeBox. Basically this gives us a way to say “I have something that’s closed except for one dependency” and we can fill that dependency with some other closed term.
This turns something like
    let static x = sx in static y
into
    intoBox (static (\x -> y)) `closeBox` intoBox sx
If you read the tutorial, you’ll notice that this is most of the implementation of Closure! Following our noses we define
    data Box a where
      Pure  :: StaticPtr a -> Box a
      Close :: Box (a -> b) -> Box a -> Box b
This is literally the dumbest implementation of Box I think is possible, but it actually works just fine.
    intoBox = Pure
    closeBox = Close

    outBox :: Box a -> a
    outBox (Pure a) = deRefStaticPtr a
    outBox (Close f a) = outBox f (outBox a)
which would seem to be modal logic in Haskell.
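To see the interface in action, here is a small self-contained sketch that repeats the Box definitions and applies a boxed function to a boxed argument. It assumes a GHC with the StaticPointers extension, and the names boxedNot, boxedTrue and applied are mine, purely for illustration.

```haskell
{-# LANGUAGE GADTs, StaticPointers #-}

import GHC.StaticPtr (StaticPtr, deRefStaticPtr)

-- The Box type from above, repeated so this sketch stands alone.
data Box a where
  Pure  :: StaticPtr a -> Box a
  Close :: Box (a -> b) -> Box a -> Box b

intoBox :: StaticPtr a -> Box a
intoBox = Pure

closeBox :: Box (a -> b) -> Box a -> Box b
closeBox = Close

outBox :: Box a -> a
outBox (Pure a)    = deRefStaticPtr a
outBox (Close f a) = outBox f (outBox a)

-- Both arguments to static are closed: they mention only top level names.
boxedNot :: Box (Bool -> Bool)
boxedNot = intoBox (static not)

boxedTrue :: Box Bool
boxedTrue = intoBox (static True)

-- Fill the one dependency of boxedNot with another closed term.
applied :: Box Bool
applied = boxedNot `closeBox` boxedTrue
```

Nothing is evaluated until outBox walks the tree, so applied is still just a description built out of static pointers, which is the property we wanted from □.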
Wrap Up
To be honest, I’m not sure yet how this is useful. I’m kinda swamped with coursework at the moment (new semester at CMU) but it seems like a new and fun thing to play with.
I’ve stuck the code at jozefg/modal if you want to play with it. Fair warning that it only compiles with GHC >= 7.10 because we need static pointers.
Finally, since the idea of modalities for sendable code is not a new one, I should leave these links

Tom Murphy VII’s PhD on the subject
A much shorter paper on the same
Using modalities to indicate staging

Cheers.

          
          
          comments powered by Disqus



Why Constructive Logic
Danny Gratzer — Fri, 09 Jan 2015 00:00:00 UT

    Posted on January  9, 2015
    


    
    Tags: types
    


Continuing on my quest of writing about my poorly thought out comments, let’s talk about constructive logic. A lot of people in and around the Haskell/FP community will make statements like

The Curry-Howard isomorphism means that you’re proving things in constructive logic.

Usually absent from these remarks is a nice explanation of why constructive logic matches up with the programming we know and love.
In this post I’d like to highlight what constructive logic is intended to capture and why this corresponds so nicely with programming.
A Bit of History
First things first, let’s discuss the actual origin of constructive logic. It starts with a mathematician and philosopher named Brouwer. He was concerned with trying to give an answer to the question “What does it mean to know something to be true?” where something is defined as a mathematical proposition.
He settled on the idea of proof being a sort of subjective and personal thing. I know something is true if and only if I can formulate some intuitive proof of it. When viewed this way, the proof I scribble down on paper doesn’t actually validate something’s truthfulness. It’s merely a serialization of my thought process for validating its truthfulness.
Notice that this line of reasoning doesn’t actually specify a precise definition of what verifying something intuitively means. I interpret this idea as something slightly more meta than any single formal system. Rather, when looking at a formal system, you ought to verify that its axioms are admissible by your own intuition and then you may go on to accept proofs built off of these axioms.
Now after Brouwer started talking about these ideas Arend Heyting decided to try to write down a logic that captured this notion of “proof is intuition”. The result was this thing called intuitionistic logic. This logic is part of a broader family of logics called “constructive logics”.
Constructive Logic
The core idea of constructive logic is replacing the notion of truth found in classical logic with an intuitionist version. In a classical logic each proposition is either true or false, regardless of what we know about it.
In our new constructive system, a formula cannot be assigned either until we have direct evidence of it. It’s not that there’s a magical new boolean value, {true, false, i-don’t-know}, it’s just not a meaningful question to ask. It doesn’t make sense in these logics to say “A is true” without having a proof of A. There isn’t necessarily this Platonic notion of truthfulness, just things we as logicians can prove. This is sometimes why constructive logic is called “logic for humans”.
The consequences of dealing with things in this way boil down to a few things. For example, we now know that

If ∃x. A(x) can be proven, then there is some term which we can readily produce t so that A(t) is provable
If A ∨ B can be proven then either A or B is provable and we know which. (note that ∨ is the symbol for OR)

These make sense when you realize that ∃x. A(x) can only be proven if we have a direct example of it. We can’t indirectly reason that it really ought to exist or merely claim that it must be true in one of a set of cases. We actually need to introduce it by proving an example of it. When our logic enforces this of course we can produce that example!
The same goes for A ∨ B, in our logic the only way to prove A ∨ B is to either provide a proof of A or provide a proof of B. If this is the only way to build a ∨ we can always just point to how it was introduced!
If we extend this to and, ∧: The only way to prove A ∧ B is to prove both A and B. If this is the only way to get to a proof of A ∧ B then of course we can get a proof of A from A ∧ B. ∧ is just behaving like a pair of proofs.
All of this points at one thing: our logic is structured so that we can only prove something when we directly prove it, that’s the spirit of Brouwer’s intuitionism that we’re trying to capture.
There are a lot of different incarnations of constructive logic, in fact pretty much every logic has a constructive cousin. They all share this notion of “We need a direct proof to be true” however. One thing to note is that some constructive logics conflict a bit with intuitionism. While intuitionism might have provided some of the basis for constructive logics, gradually people have poked and pushed the boundaries away from just Brouwer’s intuitionism. For example both Markov’s principle and Church’s thesis state something about all computable functions. While they may be reasonable statements we can’t give a satisfactory proof for them. This is a little confusing I know and I’m only going to talk about constructive logics that Brouwer would approve of.
I encourage the curious reader to poke further at this, it’s rather cool math.
Who on Earth Cares?
Now while constructive logic probably sounds reasonable, if weird, it doesn’t immediately strike me as particularly useful! Indeed, the main reason why computer science cares about constructivism is because we all use it already.
To better understand this, let’s talk about the Curry-Howard isomorphism. It’s that thing that wasn’t really invented by either Curry or Howard and some claim isn’t best seen as an isomorphism, naming is hard. The Curry-Howard isomorphism states that there’s a mapping from a type to a logical proposition and from a program to a proof.
To show some of the mappings for types
    CH(Either a b) = CH(a) ∨ CH(b)
    CH((a, b))     = CH(a) ∧ CH(b)
    CH( () )       = ⊤ -- True
    CH(Void)       = ⊥ -- False
    CH(a -> b)     = CH(a) → CH(b)
So a program with the type (a, b) is really a proof that a ∧ b is true. Here the truthfulness of a proposition really means that the corresponding type can be occupied by a program.
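A few tiny programs make the mapping tangible. These are only illustrative sketches, and the names andComm, orIntroL and modusPonens are my own.

```haskell
-- A ∧ B → B ∧ A: swap the two halves of the pair.
andComm :: (a, b) -> (b, a)
andComm (x, y) = (y, x)

-- A → A ∨ B: inject on the left.
orIntroL :: a -> Either a b
orIntroL = Left

-- (A → B) ∧ A → B: modus ponens is just function application.
modusPonens :: (a -> b, a) -> b
modusPonens (f, x) = f x
```

Each type is a proposition under CH, and the fact that we could write a total program of that type is the proof.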
Now, onto why this logic we get is constructive. Recall our two conditions for a logic being constructive, first is that if ∃x. A(x) is provable then there’s a specific t where A(t) is provable.
Under the Curry Howard isomorphism, ∃ is mapped to existential types (I wonder how that got its name :). That means that a proof of ∃x. A(x) is something like
    -- Haskell ex. syntax is a bit gnarly :/
    -- (needs the ExistentialQuantification extension)
    data Exists f = forall x. Exists (f x)

    ourProof :: Exists F
    ourProof = ...
Now we know the only way to construct an Exists F is to use the constructor Exists. This constructor means that there is at least one specific type for which we could prove f x. We can also easily produce this term as well!
    -- Needs RankNTypes: the continuation must work for any witness x
    isProof :: Exists f -> (forall x. f x -> c) -> c
    isProof (Exists x) cont = cont x
We can always access the specific “witness” we used to construct this Exists type with pattern matching.
The next law is similar. If we have a proof of a ∨ b we’re supposed to immediately be able to produce a proof of a or a proof of b.
In programming terms, if we have a program of type Either a b we’re supposed to be able to immediately tell whether this returns Right or Left! We can’t just argue that one of these must be possible to construct without being sure which, since we have to be able to actually run this program! If we evaluate a program with the type Either a b we’re guaranteed to get either Left a or Right b.
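Concretely, “pointing to how it was introduced” is just a case analysis. A minimal sketch, with the names whichSide and example made up for illustration:

```haskell
-- Inspect a proof of A ∨ B and report which side was proven.
whichSide :: Either a b -> String
whichSide (Left _)  = "left"
whichSide (Right _) = "right"

-- A hypothetical proof of Int ∨ Bool, built from a proof of the left side.
example :: Either Int Bool
example = Left 3
```

Evaluating whichSide example has to pick one of the two branches, so there is no room for a proof of a disjunction that doesn’t commit to a side.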
The Self-Sacrificing Definition of Constructive Logic
There are a few explanations of constructive logic that basically describe it as “Classical logic - the law of excluded middle”. More verbosely, a constructive logic is just one that forbids

∀ A. A ∨ ¬ A being provable (the law of excluded middle, LEM)
∀ A. ¬ (¬ A) → A being provable (the law of double negation)

I carefully chose the words “being provable” because we can easily introduce these as a hypothesis to a proof and still have a sound system. Indeed this is not uncommon when working in Coq or Agda. They’re just not a readily available tool. Looking at them, this should be apparent as they both let us prove something without directly proving it.
This isn’t really a defining aspect of constructivism, just a natural consequence. If we need a proof of A to show A to be true, then admitting A ∨ ¬ A by default defeats the point. Similarly, double negation would let us introduce A merely by showing ¬ (¬ A), which isn’t a proof of A! It’s just a proof that it really ought to be true.
In programming terms this is saying we can’t write these two functions.
    data Void

    doubleNeg :: ((a -> Void) -> Void) -> a
    doubleNeg = ...

    lem :: Either a (a -> Void)
    lem = ...
For the first one we have two choices: either we use this (a -> Void) -> Void term we’re given or we construct an a without it. Constructing an arbitrary a without the function is just equivalent to forall a. a, which we know to be unoccupied. That means we have to use (a -> Void) -> Void, which means we have to build an a -> Void. We have no way of doing something interesting with that supplied a, however, so we’re completely stuck! The story is similar with lem.
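For contrast, here is a sketch of some close relatives of those two functions that can be written. The names are invented for this example; the only library assumption is Data.Void from base. Since nothing here can be run in an interesting way, the evidence is simply that the module type-checks.

```haskell
import Data.Void (Void)

-- Double negation introduction, a → ¬¬a, is fine constructively.
doubleNegIntro :: a -> ((a -> Void) -> Void)
doubleNegIntro x k = k x

-- LEM itself is out of reach, but its double negation ¬¬(a ∨ ¬a) is
-- provable: we may use the refutation k twice.
notNotLem :: (Either a (a -> Void) -> Void) -> Void
notNotLem k = k (Right (\x -> k (Left x)))

-- Triple negation collapses to single negation: ¬¬¬a → ¬a.
tripleNegElim :: (((a -> Void) -> Void) -> Void) -> (a -> Void)
tripleNegElim k x = k (\notA -> notA x)

main :: IO ()
main = putStrLn "all three proofs type-check"
```

In each case we are handed a continuation that returns Void, so we never have to conjure an a out of thin air.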
In a lot of ways this definition strikes me in the same way that describing functional programming as

Oh it’s just programming where you don’t have variables or objects.

Or static typing as

It’s just dynamic typed programming where you can’t write certain correct programs

I have a strong urge to say “Well.. yes but no!”.
Wrap Up
Hopefully this helps clarify what exactly people mean when they say Haskell corresponds to a constructive logic or programs are proofs. Indeed this constructivism gives rise to a really cool thing called “proof relevant mathematics”. This is mathematics done purely with constructive proofs. One of the latest ideas to trickle from mathematics to computers is homotopy type theory where we take a proof relevant look at identity types.
Before I wrap up I wanted to share one funny little thought I heard. Constructive mathematics has found a home in automated proof systems. Imagine Brouwer’s horror at hearing we do “intuitionist” proofs that no one will ever look at or try to understand beyond some random mechanical proof assistant!
Thanks to Jon Sterling and Darryl McAdams for the advice and insight

          
          



A Crash Course on ML Modules
Danny Gratzer — Thu, 08 Jan 2015 00:00:00 UT

Posted on January 8, 2015

Tags: sml, haskell

I was having lunch with a couple of Haskell programmers the other day and the subject of the ML family came up. I’ve been writing a lot of ML lately and mentioned that I thought *ML was well worth learning for the average Haskeller. When pressed why, the best answer I could come up with was “Well.. clean language, Oh! And an awesome module system” which wasn’t exactly my most compelling response.
I’d like to outline a bit of the SML module system here to help substantiate why looking at an ML is A Good Thing. All the code here should be translatable to OCaml if that’s more your taste.
Concepts
In ML languages modules are a well thought out portion of the language. They aren’t just “Oh we need to separate these names… modules should work”. Like any good language they have methods for abstraction and composition. Additionally, like any good part of an ML language, modules have an expressive type language for mediating how composition and abstraction works.
So to explain how this module system functions as a whole, we’ll cover 3 subjects

Structures
Signatures
Functors

Giving a cursory overview of what each thing is and how it might be used.
Structures
Structures are the values in the module language. They are how we actually create a module. The syntax for them is
    struct
      fun flip f x y = f y x
      datatype 'a list = Cons of ('a * 'a list) | Nil
      ...
    end
A quick note to Haskellers, in ML types are lower case and type variables are written with ’s. Type constructors are applied “backwards” so List a is 'a list.
So they’re just a bunch of declarations stuffed in between a struct and end. This is a bit useless if we can’t bind it to a name. For that there’s
    structure M = struct val x = 1 end
And now we have a new module M with a single member, x : int. This is just like binding a variable in the term language except a “level up” if you like. We can use this just like you would use modules in any other language.
    val x' = M.x + 1
Since struct ... end can contain any list of declarations we can nest module bindings.
    structure M' =
      struct
        structure NestedM = M
      end
And access this using the . operator.
    val sum = M'.NestedM.x + M.x
As you can imagine, it would get a bit tedious if we needed to . our way to every single module access. For that we have open which just dumps a module’s exposed contents into our namespace. What’s particularly interesting about open is that it is a “normal” declaration and can be nested with let.
    fun f y =
      let open M in
        x + y
      end
OCaml has gone a step further and added special syntax for small opens. The “local opens” would turn our code into
    let f y = M.(x + y)
This already gives us a lot more power than your average module system. Structures basically encapsulate what we’d expect in a module system, but

Structures =/= files
Structures can be bound to names
Structures can be nested

Up next is a look at what sort of type system we can impose on our language of structures.
Signatures
Now for the same reason we love types in the term language (safety, readability, insert-semireligious-rant) we’d like them in the module language. Happily ML comes equipped with a feature called signatures. Signature values look a lot like structures
    sig
      val x : int
      datatype 'a list = Cons of ('a * 'a list) | Nil
    end
So a signature is a list of declarations without any implementations. We can list algebraic data types, other modules, and even functions and values but we won’t provide any actual code to run them. I like to think of signatures as what most documentation rendering tools show for a module.
As we had with structures, signatures can be given names.
    signature MSIG = sig val x : int end
On their own signatures are quite useless, the whole point is that we can apply them to modules after all! To do this we use : just like in the term language.
    structure M : MSIG = struct val x = 1 end
When compiled, this will check that M has at least the field x : int inside its structure. We can apply signatures retroactively to both module variables and structure values themselves.
    structure M : MSIG = struct val x = 1 end : MSIG
One interesting feature of signatures is the ability to leave certain types abstract. For example, when implementing a map the actual implementation of the core data type doesn’t belong in the signature.
    signature MAP =
      sig
        type key
        type 'a table

        val empty : 'a table
        val insert : key -> 'a -> 'a table -> 'a table
        val lookup : key -> 'a table -> 'a option
      end
Notice that the types of keys and tables are left abstract. When someone applies a signature they can do so in two ways, weak or strong ascription (also known as transparent and opaque ascription). Weak ascription (:) means that the constructors of abstract types are still accessible, but the signature does hide all unrelated declarations in the module. Strong ascription (:>) makes the abstract types actually abstract.
Every once in a while we need to modify a signature. We can do this with the keywords where type. For example, we might implement a specialization of MAP for integer keys and want our signature to express this
    structure IntMap :> MAP where type key = int =
      struct ... end
This incantation leaves the type of the table abstract but specializes the keys to an int.
Last but not least, let’s talk about abstraction in module land.
Functors
Now for the “killer feature” of ML module systems: functors. Functors are the “lifting” of functions into the module language. A functor is a function that maps modules with a certain signature to modules of a different signature.
Jumping back to our earlier example of maps, the equivalent in Haskell land is Data.Map. The big difference is that Haskell gives us maps for all keys that implement Ord. Our signature doesn’t give us a clear way to associate all these different modules, one for each Orderable key, that are really the same thing. We can represent this relationship in SML with
    signature ORD =
      sig
        type t
        val compare : t * t -> order
      end

    functor RBTree (O : ORD) : MAP where type key = O.t =
      struct
        open O
        ....
      end
Which reads as “For any module implementing ORD, I can give you a module implementing MAP with keys of type O.t”. We can then instantiate these
    structure IntOrd =
      struct
        type t = int
        val compare = Int.compare
      end
    structure IntMap = RBTree(IntOrd)
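For Haskellers who want something to hold onto, here is a rough analogy in Haskell with records standing in for modules and an ordinary function standing in for the functor. Everything here is an assumption made for illustration: OrdSig, MapSig, Assoc, and assocMap are invented names, and the “implementation” is a naive association list, not a red-black tree. It also misses the abstraction that :> would give us.

```haskell
{-# LANGUAGE RankNTypes #-}

-- A record of operations playing the role of the ORD signature.
newtype OrdSig t = OrdSig { compareKeys :: t -> t -> Ordering }

-- A record of operations playing the role of the MAP signature.
data MapSig k table = MapSig
  { emptyTable :: forall a. table a
  , insertKey  :: forall a. k -> a -> table a -> table a
  , lookupKey  :: forall a. k -> table a -> Maybe a
  }

-- A deliberately naive table: an association list.
newtype Assoc k a = Assoc [(k, a)]

-- The "functor": a function from one module (record) to another.
assocMap :: OrdSig k -> MapSig k (Assoc k)
assocMap ord = MapSig
  { emptyTable = Assoc []
  , insertKey  = \k v (Assoc kvs) ->
      Assoc ((k, v) : filter (\(k', _) -> compareKeys ord k k' /= EQ) kvs)
  , lookupKey  = \k (Assoc kvs) ->
      case filter (\(k', _) -> compareKeys ord k k' == EQ) kvs of
        ((_, v) : _) -> Just v
        []           -> Nothing
  }

intOrd :: OrdSig Int
intOrd = OrdSig compare

-- The analogue of `structure IntMap = RBTree(IntOrd)`.
intMap :: MapSig Int (Assoc Int)
intMap = assocMap intOrd

table :: Assoc Int String
table = insertKey intMap 1 "one" (insertKey intMap 2 "two" (emptyTable intMap))

main :: IO ()
main = do
  print (lookupKey intMap 1 table)
  print (lookupKey intMap 3 table)
```

The ML version buys us more than this sketch does: the functor’s result can carry types of its own, and the signature can hide their representation.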
Sadly SML’s module language isn’t higher order. This means we can’t assign functors a type (there isn’t an equivalent of ->) and we can’t pass functors to functors. Even with this restriction functors are tremendously useful.
One interesting difference between SML and OCaml is how functors handle abstract types. Specifically, is it the case that
F(M).t = F(M).t
In SML the answer is (surprisingly) no! Applying a functor generates brand new abstract types. This is actually beneficial when you remember SML and OCaml aren’t pure. For example you might write a functor for handling symbol tables and internally use a mutable symbol table. One nifty trick would be to keep the type of symbols abstract. If you only give back a symbol upon registering something in the table, this would mean that all symbols a user can supply are guaranteed to correspond to some entry.
This falls apart however if functors are extensional. Consider the following REPL session
    > structure S1 = SymbolTable(WhateverParameters)
    > structure S2 = SymbolTable(WhateverParameters)
    > val key = S1.register "I'm an entry"
    > S2.lookup key
    Error: no such key!
This session is rejected by the type checker if S1 and S2 have separate key types, which is exactly what generative functors give us.
To my knowledge, the general conclusion is that generative functors (à la SML) are good for impure code, but applicative functors (à la OCaml and Backpack) really shine with pure code.
Wrap Up
We’ve covered a lot of ground in this post. This wasn’t an exhaustive tour of every feature of ML module systems, but hopefully I got the gist across.
If there’s one point to take home: In a lot of languages modules are clearly a bolted on construction. They’re something added on later to fix “that library problem” and generally consist of the same “module <-> file” and “A module imports others to bring them into scope”. In ML that’s simply not the case. The module language is a rich, well thought out thing with its own methods of abstraction, composition, and even a notion of types!
I wholeheartedly recommend messing around a bit with OCaml or SML to see how having these things impacts your thought process. I think you’ll be pleasantly surprised.

          
          
          comments powered by Disqus



Examining Hackage: folds
Danny Gratzer — Sat, 27 Dec 2014 00:00:00 UT

    Posted on December 27, 2014
    


    
    Tags: haskell
    


In keeping with the rest of the “Examining Hackage” series I’d like to go through the source of the folds package today. We’ll try to go through most of the code in an attempt to understand what exactly folds does and how it does it. To be honest, I hadn’t actually heard of this one until someone mentioned it to me on /r/haskell but it looks pretty cool. It also has the word “comonadic” in the description, how can I resist?
It’s similar to Gabriel’s foldl library, but it also seems to provide a wider suite of types of folds. In retrospect, folds has a general framework for talking about types of folds and composing them, whereas foldl defines only 2 types of folds but a whole heap of prebuilt (left) folds.
Poking Around
After grabbing the source and looking at the files we see that folds is actually reasonably large
    ~$ cabal get folds && cd folds-0.6.2 && ag -g "hs$"
    src/Data/Fold.hs
    src/Data/Fold/L.hs
    src/Data/Fold/L'.hs
    src/Data/Fold/Class.hs
    src/Data/Fold/M1.hs
    src/Data/Fold/L1.hs
    src/Data/Fold/R.hs
    src/Data/Fold/Internal.hs
    src/Data/Fold/L1'.hs
    src/Data/Fold/R1.hs
    src/Data/Fold/M.hs
    Setup.lhs
    tests/hlint.hs
One that jumps out at me is Internal since it likely doesn’t depend on anything. We’ll start there.
Internal
Looking at the top gives a hint for what we’re in for
    {-# LANGUAGE FlexibleContexts #-}
    {-# LANGUAGE UndecidableInstances #-}
    {-# LANGUAGE ScopedTypeVariables #-}
    {-# LANGUAGE DeriveDataTypeable #-}
    module Data.Fold.Internal
      ( SnocList(..)
      , SnocList1(..)
      , List1(..)
      , Maybe'(..), maybe'
      , Pair'(..)
      , N(..)
      , Tree(..)
      , Tree1(..)
      , An(..)
      , Box(..)
      ) where
This module seems to be mostly a bunch of (presumably useful) data types + their instances for Foldable, Functor, and Traversable. Since all 3 of these are simple enough you can actually just derive them I’ll elide them in most cases.
First up is SnocList, if the name didn’t give it away it is a backwards list (snoc is cons backwards)
    data SnocList a = Snoc (SnocList a) a | Nil
      deriving (Eq,Ord,Show,Read,Typeable,Data)
Then we have the boilerplatey instances for Functor and Foldable. What’s a bit odd is that both foldl and foldMap are implemented when foldMap alone would be enough for a complete Foldable instance. Presumably this is because deriving foldl from just foldMap gives worse performance, but that’s a little disappointing.
Next is SnocList1 and List1 which are quite similar.
    data SnocList1 a = Snoc1 (SnocList1 a) a | First a
      deriving (Eq,Ord,Show,Read,Typeable,Data)

    data List1 a = Cons1 a (List1 a) | Last a
If you’ve never seen this before, notice how instead of Nil we have a constructor which requires an element. This means that no matter how we construct a list we need to supply at least one element. Among other things this means that head would be safe.
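A minimal sketch of that safety claim, assuming nothing beyond the List1 declaration above. head1 and fromList are hypothetical helpers written for this example and are not claimed to be part of folds.

```haskell
data List1 a = Cons1 a (List1 a) | Last a

-- Total: both constructors carry an element, so there is no empty case
-- to worry about.
head1 :: List1 a -> a
head1 (Cons1 x _) = x
head1 (Last x)    = x

-- Converting from an ordinary list is where the possible failure lives.
fromList :: [a] -> Maybe (List1 a)
fromList []       = Nothing
fromList [x]      = Just (Last x)
fromList (x : xs) = Cons1 x <$> fromList xs

main :: IO ()
main = do
  print (head1 <$> fromList [3, 1, 2 :: Int])
  print (head1 <$> fromList ([] :: [Int]))
```

The partiality hasn’t vanished, it has just moved to the boundary where we convert from a possibly empty list.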
We also have a couple strict structures. Notice that these cannot be functors since they break fmap f . fmap g = fmap (f . g) (why?). We have
    data Maybe' a = Nothing' | Just' !a

    data Pair' a b = Pair' !a !b
And we have the obvious instances for Foldable Maybe' and Monoid (Pair' a b). Now it may seem a little silly to define these types, but from experience I can say anything that makes strictness a bit more explicit is wonderfully helpful. Now we can just use seq on a Pair' and know that both components will be forced.
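If you’d like to check your answer to that “(why?)”, here is one possible demonstration (spoiler warning). It’s a sketch: mapMaybe' is a hypothetical would-be fmap, and f and g are chosen so that g’s result is bottom but f ignores its argument.

```haskell
import Control.Exception (SomeException, evaluate, try)

data Maybe' a = Nothing' | Just' !a

-- The only sensible candidate for fmap on Maybe'.
mapMaybe' :: (a -> b) -> Maybe' a -> Maybe' b
mapMaybe' _ Nothing'  = Nothing'
mapMaybe' f (Just' a) = Just' (f a)

isJust' :: Maybe' a -> Bool
isJust' (Just' _) = True
isJust' Nothing'  = False

f :: Int -> Int
f = const 1

g :: () -> Int
g = const undefined

-- fmap (f . g): the bottom produced by g is discarded by f.
fused :: Maybe' Int
fused = mapMaybe' (f . g) (Just' ())

-- fmap f . fmap g: the intermediate Just' forces g's result.
composed :: Maybe' Int
composed = (mapMaybe' f . mapMaybe' g) (Just' ())

main :: IO ()
main = do
  print (isJust' fused)
  result <- try (evaluate (isJust' composed)) :: IO (Either SomeException Bool)
  putStrLn (either (const "composed version hit bottom") show result)
```

The strict field forces the intermediate value, so the two sides of the law disagree whenever g produces a bottom that f would have ignored.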
Next we define a type for trees. One thing I noticed was the docs mentioned that this type reflects the structure of a foldMap
    data Tree a
      = Zero
      | One a
      | Two (Tree a) (Tree a)
      deriving (Eq,Ord,Show,Read,Typeable,Data)
When we foldMap each One should be an element of the original collection. From there we can fmap with the map part of foldMap, and we can imagine traversing the tree and replacing Two l r with l <> r, each Zero with mempty, and each One a with a.
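That replacement procedure can be sketched directly. foldTree and example are names invented for this illustration, and the deriving clause is trimmed to keep the example small.

```haskell
import Data.Monoid (Sum (..))

data Tree a
  = Zero
  | One a
  | Two (Tree a) (Tree a)

-- Replace Zero with mempty, One a with f a, and Two l r with l <> r.
foldTree :: Monoid m => (a -> m) -> Tree a -> m
foldTree _ Zero      = mempty
foldTree f (One a)   = f a
foldTree f (Two l r) = foldTree f l <> foldTree f r

example :: Tree Int
example = Two (One 1) (Two (One 2) (Two Zero (One 3)))

main :: IO ()
main = do
  print (getSum (foldTree Sum example))
  print (foldTree (\x -> [x]) example)
```

The same tree can be collapsed with any monoid, which is the sense in which it records the shape of a foldMap without committing to one.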
So that’s rather nifty. On top of this we have Foldable, Traversable, and Functor instances.
We also have Tree1 which is similar but elides the Zero
    data Tree1 a = Bin1 (Tree1 a) (Tree1 a) | Tip1 a
As you’d expect, this implements the same type classes as Tree.
Now is where things get a bit weird. First up is a type for reifying monoids using reflection. I actually was thinking about doing a post on it and then I discovered Austin Seipp has done an outstanding one. So we have this N type with the definition
    newtype N a s = N { runN :: a }
      deriving (Eq,Ord,Show,Read,Typeable,Data)
Now with reflection there are two key components, there’s the type class instance floating around and a fresh type s that keys it. If we have s then we can easily demand a specific instance with reflect (Proxy :: Proxy s). That’s exactly what we do here. We can create a monoid instance using this trick with
    instance Reifies s (a -> a -> a, a) => Monoid (N a s) where
      mempty = N $ snd $ reflect (Proxy :: Proxy s)
      mappend (N a) (N b) = N $ fst (reflect (Proxy :: Proxy s)) a b
So at each point we use our s to grab the tuple of monoid operations we expect to be around and use them in the obvious manner. The only reason I could imagine doing this is if we had a structure which we want to use as a monoid in a number of different ways. I suppose we also could have just passed the dictionary around but maybe this was extremely ugly. We shall see later I suppose.
Last come two data types I do not understand at all. There’s An and Box. They look extremely boring.
    data Box a = Box a
    newtype An a = An a
Their instances are the same everywhere as well.. I have no clue what these are for. Grepping shows they are used though so hopefully this mystery will become clearer as we go.
Class
Going in order of the module DAG gives us Data/Fold/Class.hs. This exports two type classes and one function
    module Data.Fold.Class
      ( Scan(..)
      , Folding(..)
      , beneath
      ) where
One thing that worries me a little is that this imports Control.Lens which I don’t understand nearly as well as I’d like to.. We’ll see how this turns out.
Our first class is
    class Choice p => Scan p where
      prefix1 :: a -> p a b -> p a b
      postfix1 :: p a b -> a -> p a b
      run1 :: a -> p a b -> b
      interspersing :: a -> p a b -> p a b
So right away we notice this is a subclass of Choice which is in turn a subclass of Profunctor. Choice captures the ability to pull an Either through our profunctor.
    left' :: p a b -> p (Either a c) (Either b c)
    right' :: p a b -> p (Either c a) (Either c b)
Note that we can’t do this with ordinary profunctors since we’d need a function Either a c -> a, which can’t be total.
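To get a feel for left', here is a sketch of what it does for plain functions. The two classes are cut-down local stand-ins so the example is self-contained; they are not the real definitions from the profunctors package, which have more methods.

```haskell
-- Local, simplified stand-ins for the classes in the profunctors package.
class Profunctor p where
  dimap :: (a' -> a) -> (b -> b') -> p a b -> p a' b'

class Profunctor p => Choice p where
  left' :: p a b -> p (Either a c) (Either b c)

instance Profunctor (->) where
  dimap f g h = g . h . f

-- For functions: transform the Left case, pass the Right case through.
instance Choice (->) where
  left' f = either (Left . f) Right

main :: IO ()
main = do
  print (left' (+ 1) (Left 1 :: Either Int String))
  print (left' (+ 1) (Right "skip" :: Either Int String))
```

The Choice instance gets to decide what happens when the input is on the “wrong” side, which is the freedom a bare Profunctor doesn’t have.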
Back to Scan p. Scan p takes a profunctor which apparently represents our folds. We then can prefix the input we supply, postfix the input we supply, and run our fold on a single element of input. This is a bit weird to me, I’m not sure if the intention is to write something like
    foldList :: Scan p => [a] -> p a b -> b
    foldList [x] = run1 x
    foldList (x : xs) = foldList xs . prefix1 x
or something else entirely. Additionally this doesn’t really conform to my intuition of what a scan is. I’d expect a scan to produce all of the intermediate output involved in folding. At this point, with no instances in scope, it’s a little tricky to see what’s supposed to be happening here.
There are a bunch of default-signature based implementations of these methods if your type implements Foldable. Since this is the next type class in the module let’s look at that and then skip back to the defaults.
    class Scan p => Folding p where
      prefix :: Foldable t => t a -> p a b -> p a b
      prefixOf :: Fold s a -> s -> p a b -> p a b
      postfix :: Foldable t => p a b -> t a -> p a b
      postfixOf :: Fold s a -> p a b -> s -> p a b
      run :: Foldable t => t a -> p a b -> b
      runOf :: Fold s a -> s -> p a b -> b
      filtering :: (a -> Bool) -> p a b -> p a b
At this point I looked at a few of the types and my first thought was “Oh dammit lens..” but it’s actually not so bad! The first thing to do is ignore the *Of functions which work across lens’s Fold type. There seems to be a nice pair for each “running” function where it can work across a Foldable container or lens’s notion of a fold.
      prefix :: Foldable t => t a -> p a b -> p a b
      postfix :: Foldable t => p a b -> t a -> p a b
      run :: Foldable t => t a -> p a b -> b
The first two functions let us create a new fold that will accept some input and supplement it with a bunch of other inputs. prefix gives the supplemental input followed by the new input and postfix does the reverse. We can actually supply input and run the whole thing with run.
All of these are defined with folded from lens which reifies a foldable container into a Fold. So foo = fooOf folded is the default implementation for all of these. Now for the corresponding fold functions I’m reading them as “If you give me a lens to treat s as a container that I can get elements from and a fold, I’ll feed the elements of s into the fold.”
The types are tricky, but this type class seems to capture what it means to run a fold across some type of structure.
Now we can see how An comes in handy. It’s used as a single object Foldable container. Since it’s newtyped, this should basically run the same as just passing a single element in.
    prefix1 = prefix . An
    run1 = run . An
    postfix1 p = postfix p . An
So a Scan here apparently means a fold over a single element at a time. Still not sure why this is deserving of the name Scan but there you are.
Last but not least we have a notion of dragging a fold through an optic with beneath.
    beneath :: Profunctor p => Optic p Identity s t a b -> p a b -> p s t
    beneath l f = runIdentity #. l (Identity #. f)
Those #.’s are like rmaps but only work when the function we apply is a “runtime identity”. Basically this means we shouldn’t be able to tell whether we applied the function or just used unsafeCoerce when running the code. Otherwise all we do is set up our fold f to work across Identity and feed it into the optic.
Concrete Implementations
Now a lot of the rest of the code is implementing those two type classes we went over. To figure out where all these implementations are I just ran
    ~$ cabal repl
      > :info Scan
      ....
      instance Scan R1 -- Defined at src/Data/Fold/R1.hs:25:10
      instance Scan R -- Defined at src/Data/Fold/R.hs:27:10
      instance Scan M1 -- Defined at src/Data/Fold/M1.hs:25:10
      instance Scan M -- Defined at src/Data/Fold/M.hs:33:10
      instance Scan L1' -- Defined at src/Data/Fold/L1'.hs:24:10
      instance Scan L1 -- Defined at src/Data/Fold/L1.hs:25:10
      instance Scan L' -- Defined at src/Data/Fold/L'.hs:33:10
      instance Scan L -- Defined at src/Data/Fold/L.hs:33:10
Looking at the names, I really don’t want to go through each of these with this much detail. Instead I’ll skip all the *1’s and go over R, L', and M to get a nice sampling of the sort of folds we get.
R.hs
Up first is R.hs. This defines the first type for a fold we’ve seen.
    data R a b = forall r. R (r -> b) (a -> r -> r) r
Reading this as “a right fold from a to b” we notice a few parts here. It looks like that existential r encodes our fold’s inner state and r -> b maps the current state into the result of the fold. That leaves a -> r -> r as the stepping function. All in all this doesn’t look too different from
    foldAndPresent :: (a -> r -> r) -> r -> (r -> b) -> [a] -> b
    foldAndPresent f z p = p . foldr f z
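Here is a small sketch of using R that way. runR and mean are names made up for this example; runR just mirrors foldAndPresent for any Foldable. The hidden state is a running total paired with a count, and the presentation function turns that into an average.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

data R a b = forall r. R (r -> b) (a -> r -> r) r

-- Step through the container from the right, then present the final state.
runR :: Foldable t => t a -> R a b -> b
runR t (R k h z) = k (foldr h z t)

-- The state type (Double, Int) is invisible to users of `mean`.
mean :: R Double Double
mean = R present step (0, 0 :: Int)
  where
    step x (total, n) = (total + x, n + 1)
    present (total, n)
      | n == 0    = 0
      | otherwise = total / fromIntegral n

main :: IO ()
main = print (runR [1, 2, 3, 4] mean)
```

Because the state type is existential, callers can only ever feed in more input or ask for the presented result.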
The rest of this module is devoted to making a lot of instances for R. Some of these are really uninteresting like Bind, but quite a few are enlightening. To start with, Profunctor.
    instance Profunctor R where
      dimap f g (R k h z) = R (g . k) (h . f) z
      rmap g (R k h z) = R (g . k) h z
      lmap f (R k h z) = R k (h . f) z
This should more or less be what you expect since it’s really the only way to get the types to fit together. We fit the map from b -> d onto the presentation piece of the fold and stick the map from a -> c onto the stepper so it can take the new pieces of input.
Next we have the instance for Choice.
    instance Choice R where
      left' (R k h z) = R (_Left %~ k) step (Left z) where
        step (Left x) (Left y) = Left (h x y)
        step (Right c) _ = Right c
        step _ (Right c) = Right c

      right' (R k h z) = R (_Right %~ k) step (Right z) where
        step (Right x) (Right y) = Right (h x y)
        step (Left c) _ = Left c
        step _ (Left c) = Left c
This was slightly harder for me to read, but it helps to remember that here _Left %~ and _Right %~ are just mapping over the left and right sides of an Either. That clears up the presentation bit. For the initial state, when we’re pulling our computation through the left side we wrap it in a Left, when we’re pulling it through the right, we wrap it in Right.
The interesting bit is the new step function. It short circuits if either our state or our new value is the wrong side of an Either, otherwise it just applies our stepping function and wraps it back up as an Either.
In addition to being a profunctor, R is also a monad and comonad as well as a whole bunch of more finely grained classes built around those two. I’ll just show the Monad, Applicative, and Comonad instances here.
    instance Applicative (R a) where
      pure b = R (\() -> b) (\_ () -> ()) ()
      R xf bxx xz <*> R ya byy yz = R
        (\(Pair' x y) -> xf x $ ya y)
        (\b ~(Pair' x y) -> Pair' (bxx b x) (byy b y))
        (Pair' xz yz)

    instance Comonad (R a) where
      extract (R k _ z) = k z
      duplicate (R k h z) = R (R k h) h z

    instance Monad (R a) where
      return b = R (\() -> b) (\_ () -> ()) ()
      m >>= f = R (\xs a -> run xs (f a)) (:) [] <*> m
Looking at the Comonad instance, nesting a fold within a fold doesn’t change the accumulator, only the presentation. A nested fold is one that runs and returns a new fold which is identical except the starting state is the result of the old fold.
The <*> operator here is kind of nifty. First off it zips both folds together using the strict Pair'. Finally when we get to the presentation stage we map the final state for the left which gives us a function, and the final state for the right maps to its argument. Applying these two gives us our final result.
Notice that there’s some craziness happening with irrefutable patterns. When we call this function we won’t attempt to force the second argument until bxx forces x or byy forces y. This is important because it makes sure that <*> preserves short circuiting.
The monad instance has a suitably boring return and >>= is a bit odd. We have one machine which accumulates all the elements it’s given in a list, this is an “identity fold” of sorts. From there our presentation function returns a lambda which expects an a and runs f a with all the input we’ve saved. We combine this with m by running it in parallel with <*> and feeding the result of m back into the lambda generated by the left.
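To see the Applicative instance earn its keep, here is a sketch that zips two one-pass folds into an average. The instances are copied from above with a Functor instance added; runR, total, count, and average are names invented for this example.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

data R a b = forall r. R (r -> b) (a -> r -> r) r

data Pair' a b = Pair' !a !b

instance Functor (R a) where
  fmap f (R k h z) = R (f . k) h z

instance Applicative (R a) where
  pure b = R (\() -> b) (\_ () -> ()) ()
  R xf bxx xz <*> R ya byy yz = R
    (\(Pair' x y) -> xf x $ ya y)
    (\b ~(Pair' x y) -> Pair' (bxx b x) (byy b y))
    (Pair' xz yz)

runR :: Foldable t => t a -> R a b -> b
runR t (R k h z) = k (foldr h z t)

total :: R Double Double
total = R id (+) 0

count :: R a Int
count = R id (\_ n -> n + 1) 0

-- Both folds advance in lockstep over a single traversal of the input.
average :: R Double Double
average = (\s n -> s / fromIntegral n) <$> total <*> count

main :: IO ()
main = print (runR [2, 4, 6, 9] average)
```

The input is traversed once, with both states carried along in the Pair'.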
Now we’re finally in a position to define our Scan and Folding instances. Since the Scan instance can be determined from the Folding one I’ll show Folding.
    instance Folding R where
      run t (R k h z)     = k (foldr h z t)
      prefix s            = extend (run s)
      postfix t s         = run s (duplicate t)

      runOf l s (R k h z) = k (foldrOf l h z s)
      prefixOf l s        = extend (runOf l s)
      postfixOf l t s     = runOf l s (duplicate t)
      filtering p (R k h z) = R k (\a r -> if p a then h a r else r) z
It took some time, but I understand how this works! The first thing to notice is that actually running a fold just relies on the foldr we have from Foldable. Postfixing a fold is particularly slick with right folds. Remember that z represents the accumulated state for the remainder of the items in our sequence.
Therefore, to postfix a number of elements all we need do is run the fold on the container we’re given and store the results as the new initial state. This is precisely what happens with run s (duplicate t).
Now prefix is the inefficient one here. To prefix an element we want to change how presentation works. Instead of just using the default presentation function, we actually want to take the final state we get and run the fold again using this prefixing sequence and then presenting the result. For this we have another helpful comonadic function, extend. This leaks because it holds on to the sequence a lot longer than it needs to.
The rest of these functions are basically the same thing except maybe postfixing (ha) a function with Of here and there.
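A sketch to convince ourselves that prefix and postfix really do what they say for right folds. To stay self-contained it uses duplicateR and extendR, which are the Comonad methods specialised to R by hand (stand-ins for the comonad package), and toList' is a hypothetical fold that just collects its input.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

data R a b = forall r. R (r -> b) (a -> r -> r) r

run :: Foldable t => t a -> R a b -> b
run t (R k h z) = k (foldr h z t)

-- duplicate and extend from Comonad, written out for R.
duplicateR :: R a b -> R a (R a b)
duplicateR (R k h z) = R (R k h) h z

extendR :: (R a b -> c) -> R a b -> R a c
extendR f (R k h z) = R (f . R k h) h z

-- Prefixing changes the presentation: keep folding s from the final state.
prefix :: Foldable t => t a -> R a b -> R a b
prefix s = extendR (run s)

-- Postfixing changes the initial state: fold s first and start from there.
postfix :: Foldable t => R a b -> t a -> R a b
postfix t s = run s (duplicateR t)

-- A fold that collects its input in order.
toList' :: R a [a]
toList' = R id (:) []

main :: IO ()
main = do
  print (run [3, 4 :: Int] (prefix [1, 2] toList'))
  print (run [3, 4 :: Int] (postfix toList' [5, 6]))
```

Unfolding the definitions, prefix s gives k (foldr h (foldr h z xs) s), which is the same as folding over s ++ xs, while postfix gives k (foldr h (foldr h z s) xs), which is folding over xs ++ s.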
L'.hs
Next up is (strict) left folds. As with right folds this module is just a data type and a bunch of instances for it.
    data L' a b = forall r. L' (r -> b) (r -> a -> r) r
One thing that surprised me here was that our state r isn’t stored strictly! That’s a bit odd but presumably there’s a good reason for this. Now all the instances for L' are the same as those for R up to isomorphism because the types are well.. isomorphic.
The real difference comes in the instances for Scan and Folding. Remember how Folding R used foldr? Well, here we just use foldl'. This has the upshot that all the strictness and whatnot is handled entirely by the foldable instance!
    instance Folding L' where
      run t (L' k h z)     = k $! foldl' h z t
      prefix s             = run s . duplicate
      postfix t s          = extend (run s) t

      runOf l s (L' k h z) = k $! foldlOf' l h z s
      prefixOf l s         = runOf l s . duplicate
      postfixOf l t s      = extend (runOf l s) t
      filtering p (L' k h z) = L' k (\r a -> if p a then h r a else r) z
So everywhere we had foldr we have foldl'. The other interesting switch is that our definitions of prefix and postfix are almost perfectly swapped! This actually makes perfect sense when you think about it. In a left fold the state is propagating from the beginning to the end versus a right fold where it propagates from the end to the beginning! So to prefix something when folding to the left we add it to the initial state and when postfixing we use the presentation function to take our final state and continue to fold with it.
If you check above, you’ll find this to be precisely the opposite of what we had for right folds and since they both have the same comonad instance, we can swap the two implementations.
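The swap can be checked with a small sketch for left folds. As with the right fold, duplicateL and extendL are hand-written specialisations of the Comonad methods, and collect is a hypothetical fold used only to make the order of inputs visible.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

import Data.Foldable (foldl')

data L' a b = forall r. L' (r -> b) (r -> a -> r) r

run :: Foldable t => t a -> L' a b -> b
run t (L' k h z) = k $! foldl' h z t

-- duplicate and extend from Comonad, written out for L'.
duplicateL :: L' a b -> L' a (L' a b)
duplicateL (L' k h z) = L' (L' k h) h z

extendL :: (L' a b -> c) -> L' a b -> L' a c
extendL f (L' k h z) = L' (f . L' k h) h z

-- Now prefixing touches the initial state...
prefix :: Foldable t => t a -> L' a b -> L' a b
prefix s = run s . duplicateL

-- ...and postfixing touches the presentation.
postfix :: Foldable t => L' a b -> t a -> L' a b
postfix t s = extendL (run s) t

-- Collects input in order (accumulates reversed, fixes it up at the end).
collect :: L' a [a]
collect = L' reverse (flip (:)) []

main :: IO ()
main = do
  print (run [3, 4 :: Int] (prefix [1, 2] collect))
  print (run [3, 4 :: Int] (postfix collect [5, 6]))
```

Compared with the right fold, the bodies of prefix and postfix have traded places while the observable behaviour stays the same.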
In fact, having read the implementation for right folds I’m noticing that almost everything in this file is so close to what we had before. It really seems like there is a clever abstraction just waiting to break out.
M.hs
Now that we’ve seen how left and right folds are more or less the same, let’s try something completely different! M.hs captures the notion of a foldMap and looks pretty different than what we’ve seen before.
First things first, here’s the type in question.
    data M a b = forall m. M (m -> b) (a -> m) (m -> m -> m) m
We still have a presentation function m -> b, and we still have an internal state m. However, we also have a conversion function to map our inputted values onto the values we know how to fold together and we have a tensor operation m -> m -> m.
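Here is a sketch of an M fold in action. Note the assumption: runM below combines elements with a plain right-nested foldr rather than the reflection-based foldMap that folds itself uses, so it only illustrates what the four fields mean. range and runM are names made up for this example.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

data M a b = forall m. M (m -> b) (a -> m) (m -> m -> m) m

-- Simplified runner: convert each element with h, merge with m, start at z.
-- (The real library reifies (m, z) as a Monoid and calls foldMap.)
runM :: Foldable t => t a -> M a b -> b
runM t (M k h m z) = k (foldr (\a acc -> m (h a) acc) z t)

-- Track the smallest and largest element seen so far.
range :: M Int (Maybe (Int, Int))
range = M id (\x -> Just (x, x)) merge Nothing
  where
    merge Nothing r = r
    merge l Nothing = l
    merge (Just (lo1, hi1)) (Just (lo2, hi2)) =
      Just (min lo1 lo2, max hi1 hi2)

main :: IO ()
main = do
  print (runM [3, 1, 4, 1, 5] range)
  print (runM ([] :: [Int]) range)
```

Since merge is associative and Nothing is its identity, the grouping of the merges doesn’t affect the answer, which is what lets a foldMap-style fold split the work however it likes.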
Now as before we have a profunctor instance
    instance Profunctor M where
      dimap f g (M k h m e) = M (g.k) (h.f) m e
      rmap g (M k h m e) = M (g.k) h m e
      lmap f (M k h m e) = M k (h.f) m e
Which might start to look familiar from what we’ve seen so far. Next we have a Choice instance which is still a little intimidating.
    instance Choice M where
      left' (M k h m z) = M (_Left %~ k) (_Left %~ h) step (Left z) where
        step (Left x) (Left y) = Left (m x y)
        step (Right c) _ = Right c
        step _ (Right c) = Right c

      right' (M k h m z) = M (_Right %~ k) (_Right %~ h) step (Right z) where
        step (Right x) (Right y) = Right (m x y)
        step (Left c) _ = Left c
        step _ (Left c) = Left c
As before we use prisms and %~ to drag our presentation and conversion functions into Either, similarly our starting state is wrapped in the appropriate constructor and we define a new stepping function with similar characteristics to what we’ve seen before.
As before, we’ve got a wonderful world of monads and comonads to dive into now. We’ll start with monads here to mix it up.
    instance Applicative (M a) where
      pure b = M (\() -> b) (\_ -> ()) (\() () -> ()) ()
      M xf bx xx xz <*> M ya by yy yz = M
        (\(Pair' x y) -> xf x $ ya y)
        (\b -> Pair' (bx b) (by b))
        (\(Pair' x1 y1) (Pair' x2 y2) -> Pair' (xx x1 x2) (yy y1 y2))
        (Pair' xz yz)

    instance Monad (M a) where
      return = pure
      m >>= f = M (\xs a -> run xs (f a)) One Two Zero <*> m
Our return/pure just instantiates a trivial fold that consumes ()s and outputs the value we gave it. For <*> we run both machines strictly next to each other and apply the final result of one to the final result of the other.
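To make the "run both machines next to each other" idea concrete, here is a minimal sketch. It assumes a cut-down copy of M that uses ordinary lazy tuples instead of the library's strict Pair', and the names runList, sumM, lengthM and meanM are hypothetical helpers invented for this illustration, not part of the package.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

-- Same shape as the post's M: presentation, conversion, merge, unit.
data M a b = forall m. M (m -> b) (a -> m) (m -> m -> m) m

instance Functor (M a) where
  fmap g (M k h m z) = M (g . k) h m z

-- Mirrors the instance above, with plain tuples standing in for Pair'.
instance Applicative (M a) where
  pure b = M (\() -> b) (\_ -> ()) (\() () -> ()) ()
  M xf bx xx xz <*> M ya by yy yz = M
    (\(x, y) -> xf x (ya y))
    (\b -> (bx b, by b))
    (\(x1, y1) (x2, y2) -> (xx x1 x2, yy y1 y2))
    (xz, yz)

-- Hypothetical list-only stand-in for the library's run.
runList :: M a b -> [a] -> b
runList (M k h m z) = k . foldr (m . h) z

sumM :: M Double Double
sumM = M id id (+) 0

lengthM :: M a Double
lengthM = M id (const 1) (+) 0

-- Both folds see every element in a single traversal of the input.
meanM :: M Double Double
meanM = (/) <$> sumM <*> lengthM
```

With these definitions, runList meanM [1, 2, 3, 6] walks the list once and divides the two accumulated states at presentation time.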
Bind creates a new fold that builds a tree. This tree contains every input fed to it as it’s folding and stores each merge as a node in the tree. While we run this, we also run the original m we were given. Finally, when we reach the end, we apply f to the result of m and run this over the tree we’ve created, which is foldable. If you remember back to the comment of Tree a capturing foldMap, this is what was meant by it: we’re using a tree to suspend a foldMap until we’re in a position to run it.
Now for comonad.
    instance Comonad (M a) where
      extract (M k _ _ z) = k z
      duplicate (M k h m z) = M (\n -> M (k . m n) h m z) h m z
We can be pleasantly surprised that most of this code is the same. Extraction grabs our current state and presents it. Duplication creates a fold which will run and return a new fold. This new fold has the same initial state as the original fold, but when it goes to present its results it will merge it with the final state of the outer fold. This is very different from before and I suspect it will significantly impact our Folding instance.
    instance Folding M where
      run s (M k h m (z :: m)) = reify (m, z) $
        \ (_ :: Proxy s) -> k $ runN (foldMap (N #. h) s :: N m s)
      prefix s (M k h m (z :: m)) = reify (m, z) $
        \ (_ :: Proxy s) -> case runN (foldMap (N #. h) s :: N m s) of
          x -> M (\y -> k (m x y)) h m z
      postfix (M k h m (z :: m)) s = reify (m, z) $
        \ (_ :: Proxy s) -> case runN (foldMap (N #. h) s :: N m s) of
          y -> M (\x -> k (m x y)) h m z
      filtering p (M k h m z) = M k (\a -> if p a then h a else z) m z
This was a little intimidating so I took the liberty of ignoring *Of functions which are pretty much the same as what we have here.
To run a fold we use foldMap, but foldMap wants to work over monoids and we only have z and m. To promote this to a type class we use reify and N. Remember N from way back when? It’s the data type that uses reflection to yank a tuple out of our context and treat it as a monoid instance. In all of this code we use reify to introduce a tuple to our environment and N as a pseudo-monoid that uses m and z.
With this in mind, this code uses N #. h, which uses the normal conversion function to introduce something into the N monoid. Then foldMap takes care of the rest and all we need to do is call runN to extract the results.
prefix and postfix are actually markedly similar. They both start by running the fold over the supplied structure which reduces it to an m. From there, we create a new fold which is identical in all respects except the presentation function. The new presentation function uses m to combine the pre/post-fixed result with the new result. If we’re postfixing, the postfixed result is on the right, if we’re prefixing, the left.
What’s particularly stunning is that neither of these leak! We don’t need to hold onto the structure in our new fold so we can prefix and postfix in constant memory.
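Here is a rough sketch of the prefix/postfix trick without the reflection machinery. It assumes inputs are plain lists, and runList, prefixList, postfixList and toListM are hypothetical names made up for this example. The point is only that the supplied structure is reduced to a single m and merged in at presentation time.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

data M a b = forall m. M (m -> b) (a -> m) (m -> m -> m) m

-- Hypothetical list-only stand-in for the library's run.
runList :: M a b -> [a] -> b
runList (M k h m z) = k . foldr (m . h) z

-- Reduce the prefix to one m, then merge it on the left when presenting.
prefixList :: [a] -> M a b -> M a b
prefixList s (M k h m z) =
  let x = foldr (m . h) z s
  in M (\y -> k (m x y)) h m z

-- Reduce the postfix to one m, then merge it on the right when presenting.
postfixList :: M a b -> [a] -> M a b
postfixList (M k h m z) s =
  let y = foldr (m . h) z s
  in M (\x -> k (m x y)) h m z

-- A fold whose merge is not commutative, so left and right are observable.
toListM :: M a [a]
toListM = M id (: []) (++) []
```

The new fold only closes over the reduced value, not over the list it came from, which is the "no leak" property described above.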
Fold.hs
Now that we’ve gone through a bunch of instances of Folding and Scanning, we’re in a position to actually look at what Data.Fold exports.
    module Data.Fold
      ( Scan(..)
      , Folding(..)
      , beneath
      , L1(..)  -- lazy Mealy machine
      , L1'(..) -- strict Mealy machine
      , M1(..) -- semigroup reducer
      , R1(..) -- reversed lazy Mealy machine
      , L(..) -- lazy Moore machine
      , L'(..) -- strict Moore machine
      , M(..) -- monoidal reducer
      , R(..) -- reversed lazy Moore machine
      , AsRM1(..)
      , AsL1'(..)
      , AsRM(..)
      , AsL'(..)
      ) where
So aside from the folds we’ve examined before, there are 4 new classes, AsRM[1], and AsL[1]'. We’ll look at the non-1 versions.
    class AsRM1 p => AsRM p where
      asM :: p a b -> M a b
      asR :: p a b -> R a b
So this class covers the class of p’s that know how to convert themselves to middle and right folds. Most of these instances are what you’d expect if you’ve ever done the “write foldl as foldr” trick or similar shenanigans.
For M
    instance AsRM M where
      asR (M k h m z) = R k (m.h) z
      asM = id
asM is trivially identity and since m is expected to be associative we don’t really care that R is going to associate it strictly to the right. We just glue h onto the front to map the next piece of input into something we know how to merge.
Next is R
    instance AsRM R where
      asM (R k h z) = M (\f -> k (f z)) h (.) id
      asR = id
For right folds we do something a bit different. We transform each value into a function of type m -> m which is the back half of a folding function. We can compose these associatively with . since they are just functions. Finally, when we need to present this, we apply this giant pipeline to the initial state and present the result. Notice here how we took a nonassociative function and bludgeoned it into associativity by partially applying it.
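As a sanity check on the "pipeline of functions" idea, here is a small self-contained sketch. It assumes simplified copies of R and M, list-only runners named runR and runM, and an example fold subR, all of which are hypothetical names for this illustration. Subtraction is deliberately not associative, so the test is meaningful.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

data R a b = forall r. R (r -> b) (a -> r -> r) r
data M a b = forall m. M (m -> b) (a -> m) (m -> m -> m) m

-- Hypothetical list-only runners standing in for the library's run.
runR :: R a b -> [a] -> b
runR (R k h z) = k . foldr h z

runM :: M a b -> [a] -> b
runM (M k h m z) = k . foldr (m . h) z

-- Same body as the instance above: each input becomes a function r -> r,
-- functions are merged with (.), and the pipeline is applied to z at the end.
asM :: R a b -> M a b
asM (R k h z) = M (\f -> k (f z)) h (.) id

-- A right fold with a nonassociative step: 1 - (2 - (3 - 0)).
subR :: R Int Int
subR = R id (-) 0
```

Running subR directly and running asM subR over the same list should agree, even though (-) itself could never serve as the merge of an M.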
For L' we do something similar
    instance AsRM L' where
      asR (L' k h z) = R (\f -> k (f z)) (\b g x -> g $! h x b) id
      asM = asM . asR
We once again build up a pipeline of functions to make everything associative and apply it at the end. We can’t just use . though for composition because we need to force intermediate results. That’s why you see \b g x -> g $! h x b, it’s just strict composition.
It makes sense that we’d bundle right and monoidal folds together because every right fold can be converted to a monoidal and every monoidal fold to a right. That means that every time we can satisfy one of these functions we can build the second.
This isn’t the case for left folds because we can’t convert a monoidal or right fold to a left one. For the people who are dubious of this, foldl doesn’t let us capture the same amount of laziness we need. I forgot about this too and subsequently hung my machine trying to prove Edward Kmett wrong.
This means that AsL' is a fairly boring class,
    class (AsRM p, AsL1' p) => AsL' p where
      asL' :: p a b -> L' a b

    instance AsL' L where
      asL' (L k h z) = L' (\(Box r) -> k r) (\(Box r) a -> Box (h r a)) (Box z)
Now we finally see the point of Box, it’s designed to stubbornly block attempts at making its contents strict. You can see this because all the instance for L does is wrap everything in Boxes! Since L' is the same as L with some extra seqs, we can use Box to nullify those attempts at strictness and give us a normal left fold.
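A tiny sketch of why a lazy wrapper defeats seq-style strictness. It assumes Box is an ordinary lazy single-constructor data type (the library's actual definition may differ), and lastStrict and lastBoxed are hypothetical names for this example.

```haskell
import Data.List (foldl')

-- Assumed definition: a lazy box. Forcing a Box to weak head normal form
-- only exposes the constructor, never the contents.
data Box a = Box a

-- A strict left fold that keeps the most recent element.
-- foldl' forces every intermediate accumulator.
lastStrict :: a -> [a] -> a
lastStrict = foldl' (\_ a -> a)

-- The same fold with the accumulator hidden inside a Box, so the forcing
-- done by foldl' stops at the constructor.
lastBoxed :: a -> [a] -> a
lastBoxed z xs = case foldl' (\(Box _) a -> Box a) (Box z) xs of
  Box r -> r
```

Here lastBoxed 0 [undefined, 2] can skip past the undefined element, while lastStrict 0 [undefined, 2] would force it and fail.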
That’s it! We’re done!
Wrap Up
Now that we’ve gone through a few concrete implementations and the overall structures in this package hopefully this has come together for you. I must say, I’m really quite surprised at how effectively comonadic operations can capture compositional folds. I’m certainly going to make an effort to use this package or Gabriel’s foldl a bit more in my random “tiny Haskell utility programs”.
If you’re as entranced by these nice little folding libraries as I am, I’d recommend

Gabriel’s post
Ed Kmett’s post
Max Rabkin’s post

Trivia fact: this is the longest article out of all 52 posts on Code & Co.
Update: I decided it might be helpful to write some utility folds for folds. I figured this might be interesting to some.

          
          



Examining Hackage: operational
Danny Gratzer — Thu, 25 Dec 2014 00:00:00 UT

Posted on December 25, 2014

Tags: haskell
    


In this installment of “jozefg is confused by other people’s code” we turn to operational. This is a package that’s a little less known than I’d like. It provides a way to transform an ADT of instructions into a monad that can be used with do notation, and it separates out interpretation.
Most people familiar with free monads are wondering what the difference is between operational’s approach and using free monads. Going into this, I have no clue. Hopefully this will become clear later on.
Diving Into The Source
Let’s get started, shall we?
    ~$ cabal get operational
Happily enough, there’s just one (small) file so we’ll go through that.
To start with Control.Monad.Operational exports
    module Control.Monad.Operational (
        Program, singleton, ProgramView, view,
        interpretWithMonad,
        ProgramT, ProgramViewT(..), viewT,
        liftProgram,
        ) where
Like with most “provides a single monad” packages, I’m most interested in how Program works. Looking at this, we see that it’s just a synonym
    type Program instr = ProgramT instr Identity
Just like the mtl, this is defined in terms of a transformer. So what’s this transformer?
    data ProgramT instr m a where
        Lift   :: m a -> ProgramT instr m a
        Bind   :: ProgramT instr m b -> (b -> ProgramT instr m a)
               -> ProgramT instr m a
        Instr  :: instr a -> ProgramT instr m a
So ProgramT is a GADT, this is actually important because Bind has an existential type variable: b. Otherwise this is really just a plain tree, I assume (>>=) = Bind and return = Lift . return in the monad instance for this. And finally we can see that instructions are also explicitly supported with Instr.
We can confirm that the Monad instance is as boring as we’d expect with
    instance Monad m => Monad (ProgramT instr m) where
        return = Lift . return
        (>>=)  = Bind

    instance MonadTrans (ProgramT instr) where
        lift   = Lift

    instance Monad m => Functor (ProgramT instr m) where
        fmap   = liftM

    instance Monad m => Applicative (ProgramT instr m) where
        pure   = return
        (<*>)  = ap
So clearly there’s no interesting computation happening here. Looking at the export list again, we see that there’s a helpful combinator singleton for building up these Program[T]s since they’re kept abstract.
    singleton :: instr a -> ProgramT instr m a
    singleton = Instr
Which once again is very boring.
So this is a lot like free monads it seems since neither one of these actually does much in its monad instance. Indeed the equivalent with free monads would be
    data Free f a = Pure a | Free (f (Free f a))
    instance Functor f => Monad (Free f) where
      return = Pure
      Pure a >>= f = f a
      (Free a) >>= f = Free (fmap (>>= f) a)

    singleton :: Functor f => f a -> Free f a
    singleton = Free . fmap Pure
The obvious differences are that

1. Free requires a functor while Program doesn’t
2. Free’s monad instance automatically guarantees the laws

2 is the bigger one for me. Free has a tighter set of constraints on its f so it can guarantee the monad laws. This is clearly false with Program since return a >>= f introduces an extra Bind instead of just giving f a.
This would explain why ProgramT is kept abstract, it’s hopelessly broken just to expose it in its raw form. Instead what we have to do is somehow partially normalize it before we present it to the user.
Indeed that’s exactly what ProgramViewT is representing. It’s a simpler data type
    data ProgramViewT instr m a where
        Return :: a -> ProgramViewT instr m a
        (:>>=) :: instr b
               -> (b -> ProgramT instr m a)
               -> ProgramViewT instr m a
This apparently “compiles” a Program so that everything is either binding an instruction or a pure value. What’s interesting is that this seems to get rid of all Lift’s as well.
How do we produce one of these? Well that seems to be viewT’s job.
    viewT :: Monad m => ProgramT instr m a -> m (ProgramViewT instr m a)
    viewT (Lift m)                = m >>= return . Return
    viewT ((Lift m)     `Bind` g) = m >>= viewT . g
    viewT ((m `Bind` g) `Bind` h) = viewT (m `Bind` (\x -> g x `Bind` h))
    viewT ((Instr i)    `Bind` g) = return (i :>>= g)
    viewT (Instr i)               = return (i :>>= return)
Note that this function returns an m (ProgramViewT instr m a), not just a plain ProgramViewT. This makes sense because we have to get rid of the lifts. What I think is particularly interesting here is that the 2nd and 3rd cases are just the monad laws!
The second one says binding to a computation is just applying the function to it in the obvious manner. The third re-associates bind in a way guaranteed by the monad laws.
This means that while ProgramT isn’t going to satisfy the monad laws, we can’t tell because all the things said to be equal by the monad laws will compile to the same view. Terribly clever stuff.
The rest of the module is mostly boring stuff like Monad* instances. The last interesting function is interpretWithMonad
    interpretWithMonad :: forall instr m b.
        Monad m => (forall a. instr a -> m a) -> (Program instr b -> m b)
    interpretWithMonad f = eval . view
        where
        eval :: forall a. ProgramView instr a -> m a
        eval (Return a) = return a
        eval (m :>>= k) = f m >>= interpretWithMonad f . k
This nicely highlights how you’re supposed to write an interpreter for a Program. eval handles the two cases of the view using the mapping to a monad we provided and view handles actually compiling the program into these two cases. All in all, not too shabby.
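To see the whole pipeline in one place, here is a self-contained sketch. It assumes a non-transformer version of Program (Lift holds a pure value rather than an m a), and the StackI instruction set, runStack and example are hypothetical names invented for this illustration. It is not the package's actual source, just the same idea in miniature.

```haskell
{-# LANGUAGE GADTs, RankNTypes, ScopedTypeVariables #-}

import Control.Monad (ap, liftM)
import Data.IORef

-- Simplified Program: the same tree as ProgramT, minus the base monad.
data Program instr a where
  Lift  :: a -> Program instr a
  Bind  :: Program instr b -> (b -> Program instr a) -> Program instr a
  Instr :: instr a -> Program instr a

instance Functor (Program instr) where fmap = liftM
instance Applicative (Program instr) where
  pure  = Lift
  (<*>) = ap
instance Monad (Program instr) where (>>=) = Bind

singleton :: instr a -> Program instr a
singleton = Instr

data ProgramView instr a where
  Return :: a -> ProgramView instr a
  (:>>=) :: instr b -> (b -> Program instr a) -> ProgramView instr a

-- Normalize using the monad laws, as viewT does.
view :: Program instr a -> ProgramView instr a
view (Lift a)                = Return a
view (Lift a `Bind` g)       = view (g a)
view ((m `Bind` g) `Bind` h) = view (m `Bind` (\x -> g x `Bind` h))
view (Instr i `Bind` g)      = i :>>= g
view (Instr i)               = i :>>= return

interpretWithMonad :: forall instr m b. Monad m
                   => (forall a. instr a -> m a) -> Program instr b -> m b
interpretWithMonad f = eval . view
  where
    eval :: forall a. ProgramView instr a -> m a
    eval (Return a) = return a
    eval (i :>>= k) = f i >>= interpretWithMonad f . k

-- Hypothetical instruction set: a stack of Ints.
data StackI a where
  Push :: Int -> StackI ()
  Pop  :: StackI (Maybe Int)

-- One possible interpretation: mutate an IORef holding the stack.
runStack :: IORef [Int] -> StackI a -> IO a
runStack ref (Push n) = modifyIORef ref (n :)
runStack ref Pop = do
  st <- readIORef ref
  case st of
    []     -> return Nothing
    x : xs -> writeIORef ref xs >> return (Just x)

example :: Program StackI (Maybe Int)
example = do
  singleton (Push 1)
  singleton (Push 2)
  _ <- singleton Pop
  singleton Pop
```

Running interpretWithMonad (runStack ref) example against a fresh IORef executes the instructions in order. Swapping in a different function of type forall a. StackI a -> m a gives a different interpretation of the same program.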
Surprise, There Were Docs The Whole Time!
Now I assume that most people didn’t actually download the source to operational, but you really should! Inside you’ll find a whole directory, doc. It contains a few markdown files with explanations and references to the appropriate papers as well as a couple examples of actually building things with operational.
Now that you understand how the current implementation works, you should be able to understand most of what is being said there.
Wrap Up
So operational illustrates a neat trick I rather like: using modularity to provide an O(1) implementation of >>= and hide its rule breaking with a view.
This package also drops the positivity requirement that Free implies with its functor constraint. Which I suppose means you could have
    data Foo a where
      Bar :: (a -> ...) -> Foo a
Which is potentially useful.
Last but not least, operational really exemplifies having a decent amount of documentation even though there’s only ~100 lines of code. I think the ratio of documentation : code is something like 3 : 1 which I really appreciate.

          
          



What Are Impredicative Types?
Danny Gratzer — Tue, 23 Dec 2014 00:00:00 UT

Posted on December 23, 2014

Tags: haskell, types
    


So the results from Stephen’s poll are in! Surprisingly, impredicative types topped the list of type system extensions people want to talk about, so I figured I can get the ball rolling.
First things first, all the Haskell code will need the magical incantation
    {-# LANGUAGE ImpredicativeTypes #-}
What Is Impredicative Polymorphism
We have a lot of extensions that make polymorphism more flexible in Haskell, RankNTypes and Rank2Types spring to mind. However, one important feature lacking is “first class polymorphism”.
With impredicative polymorphism forall’s become a normal type like any other. We can embed them in structures, toss them into polymorphic functions, and generally treat them like any other type.
Readers with a mathematical background will wonder why these are called “impredicative” types then. The idea is that since we can have polymorphic types embedded in other structures, we could have something like
    type T = (Int, forall a. a -> Int)
That a could assume any type, including T. So each type definition can quantify over itself, which nicely corresponds to the mathematical notion of impredicativity.
One simple example where this might come up is when dealing with lenses. Remember lenses have the type
    type Lens {- viciously -} s t a b = forall f. Functor f => (a -> f b) -> s -> f t
If we were to embed lenses in let’s say a tuple,
    type TLens a b = (Lens (a, b) (a, b) a a, Lens (a, b) (a, b) b b)

    foo :: TLens Int Bool
    foo = (_1, _2)
We’d need impredicative types because suddenly a polymorphic type has appeared within a structure.
Why No One Uses It
Now that we’ve seen how amazing impredicative polymorphism is, let’s talk about how no one uses it. There are two main reasons

1. GHC’s support for impredicative types is fragile at best and broken at worst
2. Avoiding the need for impredicative types is very straightforward

Reason 1 isn’t exactly a secret. In fact, SPJ has stated a number of times that he’d like to deprecate the extension since it’s very hard to maintain with everything else going on.
As it stands right now, our only choice is more or less to type check a program and add type signatures when GHC decides to instantiate our beautiful polymorphic type with fresh monomorphic type variables.
For this reason alone, impredicative types aren’t really the most useful thing. The final nail in the coffin is that we can easily make things more reliable by using newtypes. In lens for example we avoid impredicativity with
    newtype ScopedLens s t a b =
      ScopedLens {getScopedLens :: Lens s t a b}
This means that instead of impredicative types we just need rank N types, which are much more polished.
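Here is a small sketch of the newtype workaround applied to the tuple-of-lenses example, needing only RankNTypes. It assumes hand-rolled _1 and _2 rather than the ones from lens, and viewL and overL are hypothetical helpers written just for this example.

```haskell
{-# LANGUAGE RankNTypes #-}

import Data.Functor.Const (Const (..))
import Data.Functor.Identity (Identity (..))

type Lens s t a b = forall f. Functor f => (a -> f b) -> s -> f t

-- The wrapper hides the forall, so the tuple below holds ordinary types.
newtype ScopedLens s t a b =
  ScopedLens {getScopedLens :: Lens s t a b}

-- Hand-rolled tuple lenses, standing in for the ones from lens.
_1 :: Lens (a, c) (b, c) a b
_1 f (a, c) = fmap (\b -> (b, c)) (f a)

_2 :: Lens (c, a) (c, b) a b
_2 f (c, a) = fmap (\b -> (c, b)) (f a)

type TLens a b = (ScopedLens (a, b) (a, b) a a, ScopedLens (a, b) (a, b) b b)

foo :: TLens Int Bool
foo = (ScopedLens _1, ScopedLens _2)

-- Hypothetical helpers for using a lens.
viewL :: Lens s t a b -> s -> a
viewL l = getConst . l Const

overL :: Lens s t a b -> (a -> b) -> s -> t
overL l g = runIdentity . l (Identity . g)
```

Unwrapping with getScopedLens recovers the polymorphic lens, so viewL (getScopedLens (fst foo)) (3, True) reads the first component and overL (getScopedLens (snd foo)) not (3, True) flips the second.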
Wrap Up
Well, I’m sorry to be the bearer of bad news for those who filled out -XImpredicativeTypes on the poll, but there you are.
To end on a positive note however, I do know of two examples where impredicative types did save the day. I’ve used impredicative types exactly once to handle Church lists properly. Lennart Augustsson’s Python DSL makes heavy use of them to present a unified face for variables.

          
          



Notes on Parametricity
Danny Gratzer — Mon, 22 Dec 2014 00:00:00 UT

Posted on December 22, 2014

Tags: types, notes
    


I like types. If you haven’t figured this out from my blog I really don’t know where you’ve been looking :) If you’ve ever talked to me in real life about why I like types, chances are I mentioned ease of reasoning and correctness.
Instead of showing how to prove parametricity I’d like to show how to rigorously apply parametricity. So we’ll be a step above handwaving and a step below actually proving everything correct.
What is Parametricity
At a high level parametricity is about the behavior of well typed terms. It basically says that when we have more polymorphic types, there are fewer programs that type check. For example, the type
    const :: a -> b -> a
Tells us everything we need to know about const. It returns its first argument. In fact, if it returns anything (non-bottom) at all, it simply must be its first argument!
Parametricity isn’t limited to simple cases like this however, it can be used to prove that the type
    forall c. c -> (a -> c -> c) -> c
Is completely isomorphic to [a]!
We can use parametricity to prove free theorems, like if map id = id then map f . map g = map (f . g).
These are non-obvious properties and yet parametricity gives us the power to prove all of them without even looking at the implementation of these functions. That’s pretty cool!
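The two directions of the claimed isomorphism with [a] can at least be written down, even if parametricity is what shows they are inverses. This is a sketch under the assumption that we wrap the polymorphic type in a newtype called Church; toChurch and fromChurch are hypothetical names.

```haskell
{-# LANGUAGE RankNTypes #-}

-- The type from above, wrapped so it can be passed around easily.
newtype Church a = Church {runChurch :: forall c. c -> (a -> c -> c) -> c}

-- A list is turned into its own right fold.
toChurch :: [a] -> Church a
toChurch xs = Church (\nil cons -> foldr cons nil xs)

-- Instantiate c to [a] and hand over the real constructors.
fromChurch :: Church a -> [a]
fromChurch (Church f) = f [] (:)
```

That fromChurch . toChurch is the identity follows from ordinary reasoning about foldr. The other direction, toChurch . fromChurch, is where parametricity is actually needed.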
Handwavy Parametricity
In order to get an idea of how to use parametricity, let’s do some handwavy proofs to get some intuition for how parametricity works.
Start with id.
    id :: a -> a
We know right away that id takes some value of type a and returns another value a. Most people would safely guess that the returned value is the one we fed it.
In fact, we can kinda see that this is the only thing it could do. If it didn’t, then somehow it’d have to create a value of type a, but we know that that’s impossible! (Yeah, yeah, I know, bottom. Look the other way for now)
Similarly, if map id is just id, then we know that map isn’t randomly dropping some elements of our list. Since map isn’t removing elements, in order to take an a to a b, map has to be applying f to each element! Since that’s true, we can clearly see that
    map f . map g = map (f . g)
because we know that applying g and then applying f to each element is the same as applying f . g to each element in a single pass!
Now these handwavy statements are all based on one critical point. No matter how we instantiate a type variable, the behaviour we get is related. Instantiating something to Bool or Int doesn’t change the fundamental behaviour of what we’re instantiating.
Background
Before we can formally define parametricity we need to flesh out a few things. First things first, we need to actually specify the language we’re working in. For our purposes, we’ll just deal with pure System F.
ty ::= v                [Type Variables]
     | ty -> ty         [Function Types]
     | forall v. ty     [Universal Quantification]
     | Bool             [Booleans]

exp ::= v               [Variables]
      | exp exp         [Application]
      | λv : ty -> exp  [Abstraction]
      | Λv -> exp       [Type Abstraction]
      | exp[ty]         [Type Application]
      | true            [Boolean]
      | false           [Boolean]
The only real notable feature of our language is that all polymorphism is explicit. In order to have a full polymorphic type we have to use a “big lambda” Λ. This acts just like a normal lambda except instead of abstracting over a term this abstracts over a type.
For example the full term for the identity function is
id = Λ A -> λx : A -> x
From here we can explicitly specialize a polymorphic type with type application.
id[Bool] true
Aside from this, the typing rules for this language are pretty much identical to Haskell’s. In the interest of brevity I’ll elide them.
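For readers who want to poke at explicit type application, GHC offers a rough analogue through the TypeApplications extension. This is only a sketch of the correspondence, and identity and atBool are hypothetical names. In Haskell the big lambda stays implicit, and only the application is written out.

```haskell
{-# LANGUAGE ScopedTypeVariables, TypeApplications #-}

-- The forall plays the role of the big lambda Λ A.
identity :: forall a. a -> a
identity x = x

-- identity @Bool corresponds to id[Bool] in the System F notation above.
atBool :: Bool
atBool = identity @Bool True
```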
Actual Parametricity
Now that we have our language, let’s talk about what we’re interested in proving. Our basic goal is to show that two expressions e1 and e2 are equal. However, we don’t want to use a == sort of equality. We really mean that they can’t be distinguished by our programs. That for all programs with a “hole”, filling that hole with e1 or e2 will produce identical results. This is called “observational equivalence” usually and notated with ≅.
This is a bit more general than just ==, for example it lets us say that flip const () ≅ id. Now let’s define another notion of equality, logical equivalence.
This logical equivalence is an attempt to define equality without just saying “running everything produces the same result”. It turns out it’s really really hard to prove things that aren’t syntactically equivalent will always produce the same result!
Our logical equivalence ~ is defined in a context η : δ ↔ δ'. The reason for this is that our terms may have free type variables and we need to know how to deal with them. Each δ maps the free type variables in the types of our terms to concrete types and η is a relationship for comparing δ(v) with δ'(v).
Put less scarily, η is a set of rules that say how to compare two terms when both are of type v. This is an important part of our logical relation: it deals with open terms, terms with free variables.
Now η isn’t composed of just any relationship between terms, it has to be “admissible”. Admissibility means that for some relation R, two conditions hold

If e R e' and d ⇒ e and d' ⇒ e', then d R d'
If e R e' and d ≅ e and d' ≅ e', then d R d'

The first rule means that R is closed under evaluation and the second says that R respects observational equivalence.
Now we define our logical equivalence in some context η to be

When e, e' : v for a type variable v, e ~ e' [η] if e η(v) e'
When e, e' : Bool, e ~ e' [η] if e ⇓ v and e' ⇓ v
When f, g : a → b, f ~ g [η] if whenever x ~ y [η] at type a, f x ~ g y [η] at type b
When e, e' : ∀ v. t, e ~ e' [η] if for every admissible R : p ↔ p', e[p] ~ e'[p'] [η[v ↦ R]]

Now this definition has 4 cases, one for each form a type can take. That’s the first critical bit of this relation, we’re talking about things by the structure of the type, not the value itself.
Now with this in mind we can state the full parametricity theorem.

For all expressions e and mappings η, e ~ e [η]

That’s it! Now this is only really useful when we’re talking about polymorphic types, then parametricity states that for any admissible relation R, two different instantiations are related.
While I won’t go into how to prove it, another important result we’ll use for proofs with parametricity is that (∀η. e ~ e' [η]) ⇔ e ≅ e'.
Applying Parametricity
Now that I’ve said exactly what parametricity is, I’d like to step through a few proofs. The goal here is to illustrate how we can use this to prove some interesting properties.
First we just have to prove the classic result that any f : forall a. a -> a is equivalent to id = Λa. λx : a. x.
To prove this we need to show f ~ id [η]. For this we need to show that for any admissible relation R between τ and τ', f[τ] ~ λx : τ'. x [η[a ↦ R]]. Stepping this one more time we end up with the goal that if e R e' then f[τ] e ~ e' [η[a ↦ R]], which is to say f[τ] e R e'.
Now this is where things get tricky and where we can apply parametricity. We know by definition that f ~ f [η]. We then choose a new relation S : τ ↔ τ where d S d' if and only if d ≅ e and d' ≅ e. Exercise to the reader: show admissibility.
From here we know that f[τ] ~ f[τ] [η[a ↦ S]] and since e S e, f[τ] e ~ f[τ] e [η[a ↦ S]], which implies f[τ] e S f[τ] e. This means that f[τ] e ≅ e. Since e R e' and R respects observational equivalence, we have f[τ] e R e'.
Now we can prove something similar, that (f : a → b → a) ≅ const. The proof is very similar,
 f ~ const [η]
 f[τ][ν] ~ const[τ'][ν'] [η[a ↦ R][b ↦ S]]
 f[τ][ν] a b ~ a' [η[a ↦ R][b ↦ S]] where a R a'
Now we need to show that f a b ≅ a. For this we define T to be an admissible relationship where d T d' if and only if d ≅ a ≅ d'. From here we also define U to be an admissible relation where a U b if and only if a ~ b.
Now we know that f ~ f [η] and so
f[τ][ν] ~ f[τ'][ν'] [η[a ↦ T][b ↦ U]]
And since a T a and b U b, we know that
f[τ][ν] a b ~ f[τ'][ν'] a b [η[a ↦ T][b ↦ U]]
This means that f a b ≅ a and completes our proofs. Hopefully this reinforces the idea of using parametricity and admissible relationships to produce our properties.
Now for something a bit trickier. Church numerals are a classic idea from lambda calculus where
 0 ≡ λs. λz. z
 1 ≡ λs. λz. s z
 2 ≡ λs. λz. s (s z)
And so on. In terms of types,
    type Nat = forall c. (c → c) → c → c
Now intuitively from this type it seems obvious that this only allows us to apply the first argument n times to the second, like a Church numeral. Because of this we want to claim that we can compose the first argument with itself n times before applying it to the second, or for all c : Nat, there exists an n so that compose n ≡ c.
To prove this we proceed as before and we end up with
 compose[τ] s z ~ c[τ'] s' z' [η[c ↦ R]]
Now we define a new relation S where

a S b if a ≅ z' ≅ b
a S b if n S n' and a ≅ s' n and b ≅ s' n'

Now we know that c[τ'] s' z' S c[τ'] s' z', so by inversion on this we can determine that c[τ'] s' z' is equivalent to n applications of s' to z' for some n.
Set the n for compose to this new n. From here our result follows by induction on n.
This proof means there’s a mapping from c to n. The curious reader is encouraged to show this is an invertible mapping and complete the proof of isomorphism.
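The two mappings are easy to write down in Haskell, which may help when attempting the exercise. This is a sketch that assumes a newtype wrapper around the Nat type above, a compose that takes a non-negative Int, and hypothetical names toInt and two.

```haskell
{-# LANGUAGE RankNTypes #-}

-- The Nat type from above, wrapped in a newtype for convenience.
newtype Nat = Nat {runNat :: forall c. (c -> c) -> c -> c}

-- compose n applies s to z exactly n times (assumes n >= 0).
compose :: Int -> Nat
compose n = Nat (\s z -> iterate s z !! n)

-- The mapping from a Church numeral back to n: count the applications.
toInt :: Nat -> Int
toInt c = runNat c (+ 1) 0

two :: Nat
two = Nat (\s z -> s (s z))
```

Checking that toInt (compose n) gives back n is straightforward induction. Showing that compose (toInt c) behaves like c for every c is the part that leans on the parametricity argument.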
A Note on Free Theorems
Now most people in the Haskell community have heard the term “free theorem”, what they may not realize is that free theorems are a direct result of parametricity.
In fact, if you read Wadler’s original paper, sections 5 and onwards establish parametricity. What’s interesting here is that Wadler opts to establish it in a similar way to how Reynolds did. He first defines a mathematical structure called a “type frame”.
This structure lets us map a program in something like System F or Haskell into pure math functions. From there it defines relationships in a similar way to our logical relation and shows it’s reflexive.
I didn’t opt for this route because

Denotational semantics scare me a bit
Type frames need more math to make sense

It’s still definitely worth reading for the curious though.
Wrap Up
Now that we’ve defined parametricity and established a few theorems for it, I hope you can start to see the advantage of types to guide our programs. General enough types can give us assurances without ever even looking at the code in question.
Aren’t types cool?

          
          



Treating Programs like Vending Machines
Danny Gratzer — Fri, 19 Dec 2014 00:00:00 UT

Posted on December 19, 2014

Tags: haskell, types
    


Proving things about programs is quite hard. In order to make it simpler, we often lie a bit. We do this quite a lot in Haskell when we say things like “assuming everything terminates” or “for all sane values”. Most of the time, this is alright. We sometimes need to leave the universe of terminating things, though, whenever we want to prove things about streams or other infinite structures.
In fact, once we step into the wondrous world of the infinite we can’t rely on structural induction anymore. Remember that induction relies on the “well foundedness” of the thing we’re inducting upon, meaning that there can only be finitely many things smaller than any object. However there is an infinite descending chain of things smaller than an infinite structure! For some intuition here, something like foldr (which behaves just like structural induction) may not terminate on an infinite list.
This is quite a serious issue since induction was one of our few solid tools for proof. We can replace it though with a nifty trick called coinduction which gives rise to a useful notion of equality with bisimulation.
Vending Machines
Before we get to proving programs correct, let’s start with proving something simpler. The equivalence of two simple machines. These machines (A and B) have 3 buttons. Each time we push a button the machine reconfigures itself. A nice real world example of such machines would be vending machines. We push a button for coke and out pops a (very dusty) can of coke and the machine is now slightly different.
Intuitively, we might say that two vending machines are equivalent if and only if our interactions with them can’t distinguish one from the other. That is to say, pushing the same buttons on one gives the same output and leaves both machines in equivalent states.
To formalize this, we first need to formalize our notion of a vending machine. A vending machine is comprised of a set of states. These states are connected by arrows labeled with a transition. We’ll refer to the start of a transition as its domain and its target as the codomain. This group of transitions and states is properly called a labeled transition system (LTS).
To recap how this all relates back to vending machines

A state is a particular vending machine at a moment in time
A transition between A and B would mean we could push a button on A and wind up with B
The label of such a transition is the delicious sugary drink produced by pushing the button

Notice that this view pays no attention to all the mechanics going on behind the scenes of pushing a button, only the end result of the button push. We refer to the irrelevant stuff as the “internal state” of the vending machine.
Let’s consider a relation R with A R B if and only if

1. There exists a function f from transitions from A to transitions from B so that x and f(x) have the same label.
2. Further, if A R B and A has a transition x, then the codomain of x is related to the codomain of f(x).
3. There is a g satisfying 1. and 2., but from transitions from B to transitions from A.

This definition sets out to capture the notion that two states are related if we can’t distinguish between them. The fancy term for such a relation is a bisimulation. Now our notion of equivalence is called bisimilarity and denoted ~, it is the union of all bisimulations.
Now how could we prove that A ~ B? Since ~ is the union of all bisimulations, all we need to do is construct a bisimulation R so that A R B and hey presto, they’re bisimilar.
To circle back to vending machine terms, if for every button on machine A there’s a button on B that produces the same drink and leaves us with related machines then A and B are the same.
From Vending Machines to Programs
It’s all very well and good that we can talk about the equality of labeled transition systems, but we really want to talk about programs and pieces of data. How can we map our ideas about LTSs into programs?
Let’s start with everyone’s favorite example, finite and infinite lists. We define our domain of states to be
L(A) = {nil} ∪ {cons(n, xs) | n ∈ ℕ ∧ xs ∈ A}
We have to define this as a function over A which represents the tail of the list which means this definition isn’t recursive! It’s equivalent to
    data ListF a = Cons Int a | Nil
What we want here is a fixed point of L, an element X so that L(X) = X. This is important because it means
    cons :: ℕ → X → L(X)
    cons :: ℕ → L(X) → L(X)
Which is just the type we’d expect cons to have. There’s still a snag here, what fixed point do we want? How do we know one even exists? I’d prefer to not delve into the math behind this (see TAPL’s chapter on infinite types) but the gist of it is, if for any function F

F is monotone so that x ⊆ y ⇒ F(x) ⊆ F(y)
F is cocontinuous so that ∩ₓF(x) = F(∩ₓ x)

Then there exists an X = F(X) which is greater than or equal to all other fixpoints. The proof of this isn’t too hard, I encourage the curious reader to go and have a look. Furthermore, poking around why we need cocontinuity is enlightening, it captures the notion of “nice” lazy functions. If you’ve looked at any domain theory, it’s similar to why we need continuity for least fixed points of (inductive) functions.
This greatest fixed point is what we get with Haskell’s recursive types and that’s what we want to model. What’s particularly interesting is that the greatest fixed point includes infinite data, which is very different from the least fixed point, which is what we usually prefer to think about when dealing with things like F-algebras and proofs by induction.
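To make this a little more concrete, here is a small sketch of tying the knot on ListF in Haskell. The names Fix, ones, short and takeL are made up for this illustration. Because Haskell’s data is lazy, the resulting type contains infinite lists as well as finite ones, and observing a finite prefix of either always terminates.

```haskell
-- The non-recursive "one layer" functor from above.
data ListF a = Cons Int a | Nil

-- Tying the knot: Fix f is a solution of X = f X.
newtype Fix f = Fix (f (Fix f))

type List = Fix ListF

-- An infinite inhabitant: only possible because constructors are lazy.
ones :: List
ones = Fix (Cons 1 ones)

-- A finite inhabitant.
short :: List
short = Fix (Cons 1 (Fix (Cons 2 (Fix Nil))))

-- Observe at most n elements; this terminates even on infinite lists.
takeL :: Int -> List -> [Int]
takeL n (Fix (Cons h t)) | n > 0 = h : takeL (n - 1) t
takeL _ _ = []

main :: IO ()
main = do
  print (takeL 3 ones)   -- [1,1,1]
  print (takeL 3 short)  -- [1,2]
```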
Now anyways, to show L has a fixed point we have to show it’s monotone. If X ⊆ Y then L(X) ⊆ L(Y) because x ∈ L(X) means x = nil ∈ L(Y) or x = cons(h, t), but since t ∈ X ⊆ Y then cons(h, t) ∈ L(Y). Cocontinuity is left as an exercise to the reader.
So L has a greatest fixed point: X. Let’s define an LTS with states being L(X) and with the transitions cons(a, x) → x labeled by a. What does bisimilarity mean in this context? Well nil ~ nil since neither have any transitions. cons(h, t) ~ cons(h', t') if and only if h = h' and t ~ t'. That sounds a lot like how equality works!
To demonstrate this, let’s define two lists
    foo = cons(1, foo)
    bar = cons(1, cons(1, bar))
Let’s prove that foo ~ bar. Start by defining a relation R with foo R bar. Now we must show that each transition from foo can be matched with one from bar, since there’s only one from each this is easy. There’s a transition from foo → foo labeled by 1 and a transition from bar → cons(1, bar) also labeled by one. Here lies some trouble though, since we don’t know that foo R cons(1, bar), only that foo R bar. We can easily extend R with foo R cons(1, bar) though and now things are smooth sailing. The mapping of transitions for this new pair is identical to what we had before and since we know that foo R bar, our proof is finished.
To see the portion of the LTS our proof was about
 foo                  bar
  | 1                  | 1
 foo             cons(1, bar)
  | 1                  | 1
 foo                  bar
and our bisimulation R is just given by {(foo, bar), (foo, cons(1, bar))}.
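If you’d like to check a candidate relation mechanically, the definition of a bisimulation translates fairly directly into code for a finite LTS. This is only a sketch: the state names Foo, Bar and ConsBar and the function isBisimulation are invented for this example, and the LTS is just the three states that appear in the proof above.

```haskell
-- A labeled transition system: each state maps to its outgoing
-- (label, target) transitions.
type LTS s l = s -> [(l, s)]

-- Hypothetical names for the three states used in the proof.
data St = Foo | Bar | ConsBar deriving (Eq, Show)

step :: LTS St Int
step Foo     = [(1, Foo)]      -- foo = cons(1, foo)
step Bar     = [(1, ConsBar)]  -- bar = cons(1, cons(1, bar))
step ConsBar = [(1, Bar)]      -- cons(1, bar)

-- r is a bisimulation if, for every pair (a, b) in r, each transition of a
-- is matched by an equally labeled transition of b whose targets are again
-- related by r, and the same holds with the roles of a and b swapped.
isBisimulation :: (Eq s, Eq l) => LTS s l -> [(s, s)] -> Bool
isBisimulation lts r = all ok r
  where
    ok (a, b) = all (matched b (\x y -> (x, y) `elem` r)) (lts a)
             && all (matched a (\y x -> (x, y) `elem` r)) (lts b)
    matched other related (l, t) =
      any (\(l', t') -> l == l' && related t t') (lts other)

main :: IO ()
main = do
  print (isBisimulation step [(Foo, Bar), (Foo, ConsBar)])  -- True
  print (isBisimulation step [(Foo, Bar)])                  -- False
```

The second call fails for exactly the reason given in the proof: with only foo R bar we can’t relate the targets foo and cons(1, bar).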
Now that we’ve seen that we can map our programs into LTSs and apply our usual tricks there, let’s formalize this a bit.
A More Precise Formulation of Coinduction
First, what exactly is [co]induction? Coinduction is a proof principle for proving something about elements of the greatest fixed point of a function, F. We can prove that the greatest fixed point, X, is the union of all the sets Y so that Y ⊆ F(Y).
If we can prove that there exists a Y ⊆ F(Y) that captures our desired proposition then we know that Y ⊆ gfp(F). That is the principle of coinduction. Unlike the principle of induction we don’t get proofs about all members of a set, rather we get proofs that there exist members which satisfy this property. It also should look very similar to how we proved things about ~.
So now that we’ve defined coinduction across a function, what functions do we want to actually plop into this? We already know what we want for lists,
List(A) = {nil} ∪ {cons(h, t) | h ∈ H ∧ t ∈ A}
But what about everything else? Well, we do know that each value is introduced by a rule. These rules are always of the form
Some Premises Here
——————————————————
 conclusion here
So for example, for lists we have
——————————————
 nil ∈ List(A)

    h ∈ H   t ∈ A
—————————————————————
 cons(h, t) ∈ List(A)
Now our rules can be converted into a function with a form like
 F(A) = ∪ᵣ {conclusion | premises}
So for lists this gives
F(A) = {nil} ∪ {cons(h, t) | h ∈ H ∧ t ∈ A}
as expected. We can imagine generalizing this to other things, like trees for example
——————————————
leaf ∈ Tree(A)

h ∈ H    l ∈ A    r ∈ A
————————————————————————
 node(h, l, r) ∈ Tree(A)

Tree(A) = {leaf} ∪ {node(h, l, r) | h ∈ H ∧ l ∈ A ∧ r ∈ A}
Now the most common thing we want to prove is some notion of equality. This is harder than it seems because the usual notions of equality don’t work.
Instead we can apply bisimulation. Our approach is the same, we define a criterion for what it means to be a bisimulation across a certain type and define ~ as the union of all bisimulations. On lists we wanted the heads to be equal and the tails to be bisimilar, but what about on trees? We can take the same systematic approach we did before by considering what an LTS on trees would look like. leaf has no information contained in it and therefore no transitions. node(a, l, r) should have two transitions, left or right. Both of these give you a subtree contained by this node. What should they be labeled with? We can follow our intuitions from lists and label them both with a.
This leaves us with the following definition, R is a bisimulation on trees if and only if

leaf R leaf
node(a, l, r) R node(a', l', r') if and only if a = a' ∧ l R l' and r R r'

So to prove an equality between trees, all we must do is provide such an R and then we know R ⊆ ~!
This describes how we deal with coinductive things in general really. We define what it means for a relation to partially capture what we’re talking about (like a bisimulation) and then the full thing is the union of all of these different views! Sortedness could be expressed as a unary relation S where

nil ∈ S
cons(a, xs) ∈ S → xs ∈ S ∧ Just a ≤ head xs

the sorted predicate is the union of all such relations!
Dithering about Duality
I’d like to reel off some pithy dualities between induction and coinduction

Coinduction is about observation, induction is about construction
Coinduction proves existentials, induction proves universals
Coinduction is proving your property is a subset of a group, induction is about proving a group is a subset of your property

Wrap Up
So that about wraps up this post.
We’ve seen how infinite structures demand a fundamentally different approach to proofs than finite ones. It’s not all puppies and rainbows though, considering how we managed to spend nearly 300 lines talking about it, coinduction is a lot less intuitive. It is however, our only choice if we want to have “real proofs” in Haskell (terms and conditions may apply).

          
          



Cooking λΠ 3 ways
Danny Gratzer — Wed, 17 Dec 2014 00:00:00 UT

    Posted on December 17, 2014
    


    
    Tags: haskell, compilers
    


After my last post, I didn’t quite feel like ending there. I was a little dissatisfied with how binding was handled in the type checker, the odd blend of HOAS, GUIDs, and DeBruijn variables was… unique.
In this post I explore 3 versions of the same code

The original method
Using bound to handle all binding
Full HOAS

There’s a lot of code in this post, enough that I think it’s worth hosting the code on its own. You can find it on github and bitbucket.
The Original
I’ve already described most of the original method here. To recap

Values were HOAS
Terms were DeBruijn
To bridge the gap, we had “free constants” randomly generated

The issue I had with this is we almost got the worst of all 3 worlds! We were constantly bumping a counter to keep up with the free constants we needed to generate. We had to muddy up the types of values with another notion of free constants so we could actually inspect variables under HOAS binders! And finally, we had to do the painful and tedious substitutions on DeBruijn terms.
On the other hand, if you’d never used any of those binding schemes together, you too can go triple or nothing and try to understand that code :)
What I really wanted was to unify how I represented values and terms. I still wanted a clearly correct notion of equality, but in this way I could probably dodge at least two of the above.
The obvious thing to do would be to stick with DeBruijn variables and just instantiate free variables with constants. This is ugly, but it’s moderately less horrible if we use a library to help us with the process.
bound
So my first stab at this approach was with Edward Kmett’s bound. For those who aren’t familiar with this library, it centers around the data type Scope. Scope b f a binds variables of type b in the structure f with free variables of type a. The assumption is that f will be a monad which represents our AST.
Further, f is parameterized over variables, it doesn’t attempt to distinguish between bound and free ones however. This means that >>= corresponds to substitution. Then what Scope does is instantiate these variables to Var b a, which is precisely equivalent to Either b a.
What this results in is that each free variable is a different type from bound ones. Scope provides various functions for instantiating bound variables and abstracting over free ones. That’s bound in a nutshell.
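The claim that >>= is substitution can be seen without bound at all. Below is a minimal sketch on a toy syntax tree with no binders; Term, its constructors and subst are made up for this illustration and are not part of bound’s API. Binding a term with a function from variables to terms replaces every variable occurrence with the term the function returns.

```haskell
import Control.Monad (ap, liftM)

-- A toy AST parameterized over its variables (no binders here).
data Term a = V a | Ap (Term a) (Term a) | K String
  deriving (Eq, Show)

instance Functor Term where
  fmap = liftM

instance Applicative Term where
  pure = V
  (<*>) = ap

-- (>>=) walks the tree and replaces each variable with a whole term.
instance Monad Term where
  V a    >>= f = f a
  Ap l r >>= f = Ap (l >>= f) (r >>= f)
  K s    >>= _ = K s

-- Substitute `new` for the variable x, leaving other variables alone.
subst :: Eq a => a -> Term a -> Term a -> Term a
subst x new t = t >>= \y -> if y == x then new else V y

main :: IO ()
main = print (subst "x" (K "c") (Ap (V "x") (V "y")))
-- prints: Ap (K "c") (V "y")
```

Scope layers binders on top of this by making the bound variables a separate alternative in the variable type, so instantiating a binder is again just a >>=.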
It’s a bit easier to grok this by example, here’s our calculus ported to use Scope
    data Expr a = Var a
                | App (Expr a) (Expr a)
                | Annot (Expr a) (Expr a)
                | ETrue
                | EFalse
                | Bool
                | Star
                | Pi (Expr a) (Scope () Expr a)
                | Lam (Scope () Expr a)
                | C String
                deriving(Functor, Eq)
So the first major difference is that our polarization between inferrable and checkable terms is gone! This wasn’t something I was happy about, but in order to use Scope we need a monad instance and we can’t define two mutually dependent monad instances without a function from CExpr -> IExpr, something that clearly doesn’t exist.
Since each binder can only bind one variable at a time, we represent the newly bound variable as just (). This would be more complicated if we supported patterns or something similar.
Now in addition to just this, we also need a bunch of boilerplate to define some type class instances for Scope’s benefit.
    instance Eq1 Expr where (==#) = (==)
    instance Applicative Expr where
      pure = return
      (<*>) = ap
    instance Monad Expr where
      return = Var
      Var a >>= f = f a
      (App l r) >>= f = App (l >>= f) (r >>= f)
      ETrue >>= _ = ETrue
      EFalse >>= _ = EFalse
      Bool >>= _ = Bool
      Star >>= _ = Star
      C s >>= _ = C s
      Annot l r >>= f = Annot (l >>= f) (r >>= f)
      Pi l s >>= f = Pi (l >>= f) (s >>>= f)
      Lam e >>= f = Lam (e >>>= f)
That weird >>>= is just >>= that works through Scopes. It’s a little bit frustrating that we need this somewhat boilerplate-y monad instance, but I think the results might be worth it.
From here we completely forgo an explicit Val type. We’re completely scrapping that whole HOAS and VConst ordeal. Instead we’ll just trust Scope’s clever Eq instance to handle alpha conversion. We do need to implement normalization though
    type Val = Expr

    nf :: Expr a -> Val a
    nf = \case
      (Annot e t) -> nf e -- Important, nf'd data throws away annotations
      (Lam e) -> Lam (toScope . nf . fromScope $ e)
      (Pi l r) -> Pi (nf l) (toScope . nf . fromScope $ r)
      (App l r) ->
        case nf l of -- normalize the head first so nested redexes are exposed
         Lam f -> nf (instantiate1 r f)
         l' -> App l' (nf r)
      e -> e
What’s interestingly different is actual work is shifted from within the higher order binders we had before into the case expression in App.
It’s also worth mentioning the few bound specifics here. toScope and fromScope expose the underlying f (Var b a) that a Scope is hiding. We can then polymorphically recur (eat your heart out sml) over the now unbound variables and continue on our way.
Again, notice that I’ve defined nothing to do with substitution or scoping, this is all being handled by bound.
Now our actual type checker is still essentially identical. We’re still using monad-gen to generate unique variable names, it’s just that now bound handles the messy substitution. The lack of distinction between inferrable, checkable, and normalized terms did trip me up once or twice though.
    data Env = Env { localVars :: M.Map Int (Val Int)
                   , constants  :: M.Map String (Val Int) }
    type TyM = ReaderT Env (GenT Int Maybe)

    unbind :: (MonadGen a m, Functor m, Monad f) => Scope () f a -> m (a, f a)
    unbind scope = ((,) <*> flip instantiate1 scope . return) <$> gen

    unbindWith :: Monad f => a -> Scope () f a -> f a
    unbindWith = instantiate1 . return

    inferType :: Expr Int -> TyM (Val Int)
    inferType (Var i) = asks (M.lookup i . localVars) >>= maybe mzero return
    inferType (C s) = asks (M.lookup s . constants) >>= maybe mzero return
    inferType ETrue = return Bool
    inferType EFalse = return Bool
    inferType Bool = return Star
    inferType Star = return Star
    inferType (Lam _) = mzero -- We can only check lambdas
    inferType (Annot e ty) = do
      checkType ty Star
      let v = nf ty
      v <$ checkType e v
    inferType (App f a) = do
      ty <- inferType f
      case ty of
       Pi aTy body -> nf (App (Lam body) a) <$ checkType a aTy
       _ -> mzero
    inferType (Pi t s) = do
      checkType t Star
      (newVar, s') <- unbind s
      local (\e -> e{localVars = M.insert newVar (nf t) $ localVars e}) $
        Star <$ checkType s' Star

    checkType :: Expr Int -> Val Int -> TyM ()
    checkType (Lam s) (Pi t ts) = do
      (newVar, s') <- unbind s
      local (\e -> e{localVars = M.insert newVar (nf t) $ localVars e}) $
        checkType s' (nf $ unbindWith newVar ts)
    checkType e t = inferType e >>= guard . (== t)
I defined two helper functions unbind and unbindWith which both ease the process of opening a scope and introducing a new free variable. I actually split these off into a tiny library, but I haven’t uploaded it to hackage yet.

1. Code size decreased by ~50 lines
2. No more explicit substitution
3. All the annoying plumbing is in the monad instance which is pretty mechanical
4. We did lose the really nice separation of terms we had before though :(

I suppose that 4. would be a nonissue for a lot of people who don’t care about bidirectional type checkers.
HOAS
Higher order abstract syntax is a really nifty trick. The idea is that Haskell already has a perfectly good notion of variables and substitution lying around! Let’s just use that. We represent our functions with actual ->s and we don’t have a constructor for variables anymore.
The only issue is that Haskell doesn’t let us inspect the bodies of functions. We need to do this, however, for a type checker! To deal with this we dirty our AST a bit and add in IGen’s, placeholders for where normal Haskell variables would normally go. Our new AST looks like this
    data Expr = App Expr Expr
              | Annot Expr Expr
              | ETrue
              | EFalse
              | Bool
              | Star
              | Pi Expr (Expr -> Expr)
              | Lam (Expr -> Expr)
              | C String
              | IGen Int

    type NF = Expr
Notice how both Pi and Lam have functions embedded in them. Now normalization is actually quite slick because functions are easy to work with in Haskell
    nf :: Expr -> NF
    nf ETrue = ETrue
    nf EFalse = EFalse
    nf Bool = Bool
    nf Star = Star
    nf (C s) = C s
    nf (IGen i) = IGen i
    nf (Annot l _) = nf l
    nf (Pi t f) = Pi (nf t) (nf . f)
    nf (Lam f) = Lam (nf . f)
    nf (App l r) = case nf l of
      Lam f -> nf . f $ r -- apply the function to the argument r
      l' -> App l' (nf r)
This is actually quite similar to the Val type we started with. That also used HOAS and we end up with a similarly structured normalization.
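To watch HOAS normalization actually run, here is a trimmed-down, self-contained sketch. It keeps only a few of the constructors, and the render function is something I’ve added purely so we can print results (it feeds IGen placeholders to binders, the same trick the type checker uses). In the App case the function is applied to the argument r.

```haskell
-- A cut-down version of the HOAS AST: binders are real Haskell functions.
data Expr = App Expr Expr
          | ETrue
          | EFalse
          | Lam (Expr -> Expr)
          | IGen Int

-- Normalize by leaning on Haskell's own function application.
nf :: Expr -> Expr
nf (Lam f) = Lam (nf . f)
nf (App l r) = case nf l of
  Lam f -> nf (f r)        -- beta reduction: substitution is just application
  l'    -> App l' (nf r)
nf e = e

-- Hypothetical pretty printer: opens each binder with a fresh IGen.
render :: Expr -> String
render = go 0
  where
    go :: Int -> Expr -> String
    go _ ETrue     = "true"
    go _ EFalse    = "false"
    go _ (IGen i)  = "x" ++ show i
    go n (App l r) = "(" ++ go n l ++ " " ++ go n r ++ ")"
    go n (Lam f)   = "(\\x" ++ show n ++ ". " ++ go (n + 1) (f (IGen n)) ++ ")"

main :: IO ()
main = do
  -- (\x. x) true
  putStrLn (render (nf (App (Lam (\x -> x)) ETrue)))
  -- ((\f. \x. f x) (\y. y)) false
  putStrLn (render (nf (App (App (Lam (\f -> Lam (\x -> App f x))) (Lam id)) EFalse)))
```

Both terms reduce all the way down, printing true and false respectively.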
For the same reason, the equivalence checking procedure is pretty much the same thing
    eqTerm :: NF -> NF -> Bool
    eqTerm l r = runGenWith (successor s) (IGen 0) $ go l r
      where s (IGen i) = IGen (i + 1)
            s _ = error "Impossible!"
            go Star Star = return True
            go Bool Bool = return True
            go ETrue ETrue = return True
            go EFalse EFalse = return True
            go (Annot l r) (Annot l' r') = (&&) <$> go l l' <*> go r r'
            go (App l r) (App l' r') = (&&) <$> go l l' <*> go r r'
            go (Pi t f) (Pi t' g) =
              (&&) <$> go t t' <*> (gen >>= \v -> go (f v) (g v))
            go (IGen i) (IGen j) = return (i == j)
            go _ _ = return False
In fact, the only differences are that

1. There are a few more cases, even though they won’t ever be called
2. We don’t need that horrible top level Enum instance

The only reason for 2. is that the amazing maintainer of monad-gen (hi!) rejiggered some of the library to not be so Enum dependent.
Now from here our type checker is basically what we had before. In the interest of saving time, I’ll highlight the interesting bits: the constructors that bind variables.
    data Env = Env { localVars :: M.Map Int NF
                   , constants :: M.Map String NF }
    type TyM = GenT Int (ReaderT Env Maybe)

    inferType :: Expr -> TyM NF
    inferType (Pi t f) = do
      checkType t Star
      let t' = nf t
      i <- gen
      local (\e -> e{localVars = M.insert i t' $ localVars e}) $
        Star <$ checkType (f $ IGen i) Star

    checkType :: Expr -> NF -> TyM ()
    checkType (Lam f) (Pi t g) = do
      i <- gen
      let t' = nf t
          rTy = nf (g $ IGen i)
      local (\e -> e{localVars = M.insert i t' $ localVars e}) $
        checkType (f $ IGen i) rTy
At this point you may have started to notice the pattern, the only real difference here is that substitution is completely free. Otherwise, I don’t really have much to say about HOAS.
Wrap Up
In conclusion, I think we can all agree that the original version of this type checker was unpleasant to say the least. It did considerably improve with bound mostly because the normalize-and-compare equivalence checking is really easy since bound handles alpha conversion. On the other hand, actually doing work beneath a binder is a bit of a pain since we have to take care to never unwrap a binder with a previously bound variable. We handled this with a hacky little trick with monad-gen, but a permanent and clean solution still seems hard.
We can avoid this fully by hitching a ride on Haskell’s variables and substitution using HOAS, this is wonderful until it’s not. The issue is that comparing functions for equality is still a pain so we ended up with an equivalence check much like what we had in the original version.
In the future it’d be interesting to try this with unbound, a library in the same domain as bound with a very different approach.

          
          



Examining Hackage: concurrent-supply
Danny Gratzer — Wed, 26 Nov 2014 00:00:00 UT

    Posted on November 26, 2014
    


    
    Tags: haskell
    


It’s been a while since I posted about some code I’ve been reading, but today I found a little gem: concurrent-supply. This package sets out to provide a fast way to generate unique identifiers in a way that’s splittable and supports concurrency.
What’s particularly cool about this package is that the code is only about 100 lines and a goodly chunk of that is pragmas to tell GHC to actually inline trivial functions.
The API is just 5 functions
    type Supply
    newSupply :: IO Supply
    freshId :: Supply -> (Int, Supply)
    splitSupply :: Supply -> (Supply, Supply)

    freshId# :: Supply -> (# Int#, Supply #)
    splitSupply# :: Supply -> (# Supply, Supply #)
Supply is the type for well.. supplies of fresh integers. We can grab an Int out of a supply producing a new supply as well. We can also split a supply so that we have two new supplies that will produce disjoint identifiers.
The idea here is that we can have supplies that are used from multiple concurrent threads and they won’t ever

Duplicate identifiers between supplies
Hammer on the same supply and destroy all our concurrency

It does go without saying that eventually we run out of ints, so I suppose if you sit and prod a supply for a very long time, something bad will happen.
With that in mind, let’s take a look at the imports for Control.Concurrent.Supply.
    import Data.Hashable
    import Data.IORef
    import Data.Functor ((<$>))
    import Data.Monoid
    import GHC.IO (unsafeDupablePerformIO, unsafePerformIO)
    import GHC.Types (Int(..))
    import GHC.Prim (Int#)
So you can see that some interesting stuff is going to happen, we have both unboxed ints, and unsafe*PerformIOs. As a quick review, unsafeDupablePerformIO is for IO actions which are okay being forced at the same time by different threads, while unsafePerformIO is a little bit more modest and ensures we only force things from one thread at a time.
With this in mind, the code starts with the classic definition of streams in Haskell.
    infixr 5 :-
    data Stream a = a :- Stream a
This is followed by a few definitions,
    instance Functor Stream where
      fmap f (a :- as) = f a :- fmap f as

    extract :: Stream a -> a
    extract (a :- _) = a

    units :: Stream ()
    units = () :- units
    {-# NOINLINE units #-}
Do note that units won’t be inlined, this is unfortunately important when we’re working with unsafe functions.
Now on top of streams we can define a rather important type, blocks.
    data Block = Block Int !(Stream Block)

    instance Eq Block where
      Block a (Block b _ :- _) == Block c (Block d _ :- _) = a == c && b == d

    instance Ord Block where
      Block a (Block b _ :- _) `compare` Block c (Block d _ :- _) = compare a c `mappend` compare b d

    instance Show Block where
      showsPrec d (Block a (Block b _ :- _)) = showParen (d >= 10) $
        showString "Block " . showsPrec 10 a . showString " (Block "
                            . showsPrec 10 b . showString " ... :- ...)"

    instance Hashable Block where
      hashWithSalt s (Block a (Block b _ :- _)) = s `hashWithSalt` a `hashWithSalt` b
So a block is an integer and an infinite number of other blocks. Notice that block identity is purely determined by the first two ints. This is contingent on the fact that all blocks are made with
    blockSize :: Int
    blockSize = 1024
    {-# INLINE blockSize #-}

    -- Minimum size to be worth splitting a supply rather than
    -- just CAS'ing twice to avoid multiple subsequent biased splits
    blockCounter :: IORef Int
    blockCounter = unsafePerformIO (newIORef 0)
    {-# NOINLINE blockCounter #-}

    modifyBlock :: a -> IO Int
    modifyBlock _ =
      atomicModifyIORef blockCounter $ \ i ->
        let i' = i + blockSize in i' `seq` (i', i)
    {-# NOINLINE modifyBlock #-}

    gen :: a -> Block
    gen x = Block (unsafeDupablePerformIO (modifyBlock x)) (gen <$> units)
    {-# NOINLINE gen #-}

    newBlock :: IO Block
    newBlock = return $! gen ()
    {-# NOINLINE newBlock #-}
This is the first bit of unsafe code, so let’s look at what’s going on. We have a normal constant blockSize which represents something, it’s not immediately clear what yet. There’s a global mutable variable blockCounter starting from zero. From there, we have gen which creates a block by making a thunk which unsafely bumps the block counter by 1024, returning its previous value. To get the stream of blocks we fmap gen over units.
It’s worth wondering why we need this polymorphic argument. I’m reasonably certain it’s to prevent GHC from being clever and sharing that (unsafeDupablePerformIO ...) between blocks. That would be very bad. It might not do that if we were to use () instead of a but there’s no reason a future optimization (if it doesn’t exist already) wouldn’t figure out that there’s only one possible result type and reduce the whole thing to a CAF.
Now a newBlock wraps all this unsafe updating in IO and returns the application of gen ().
So what does all of this mean? Well each block thunk is going to have its own unique ID, separated by 1024 and only claimed whenever we actually force its first component. We have this gnarly chunk of mutable shared memory that we only ever modify with atomicModifyIORef, we actually touch it whenever we inspect the first thunk in a Block. What’s particularly interesting is that this can happen in pure code! By putting off this costly operation as long as possible we amortize the cost of all that contention.
Now we also have to support split, luckily it’s easy to split blocks since we have an infinite number of them nested!
    splitBlock# :: Block -> (# Block, Block #)
    splitBlock# (Block i (x :- xs)) = (# x, Block i xs #)
It becomes a bit clearer now why we can completely determine blocks by their “first two” elements. The head is completely unique to each sequence so we know at minimum that if i == j in Block i xs and Block j ys then either xs or ys is the tail of the other. This is an invariant we maintain throughout the code by not exposing Block and by ensuring we never :- any new ones onto its internal stream. If these streams have the same head (also unique) then they must be the same sequence so the original blocks are equivalent. Nifty.
Now this still isn’t quite enough, we need one final data type: Supply
    data Supply = Supply {-# UNPACK #-} !Int {-# UNPACK #-} !Int Block
        deriving (Eq,Ord,Show)

    blockSupply :: Block -> Supply
    blockSupply (Block i bs) = Supply i (i + blockSize - 1) (extract bs)
    {-# INLINE blockSupply #-}
A supply should be seen almost as an iterator over a chunk of the number line. We know that each block is 1024 away from the next, and a supply is almost an iterator from the block’s starting value over the next 1023 elements. We know that Supplys can’t intersect because the blocks are spaced this far apart.
Once we run out of those elements though, we need to get more. For this we have another block hidden in the back of the supply. It’s kept lazily so that it won’t fire off its first thunk to go bump our global store. When we run out of things to enumerate we call blockSupply, which will force i which will go bother the global counter for another chunk of 1024 unique values.
With this understanding, splitSupply and freshId are quite easy.
    -- | An unboxed version of freshId
    freshId# :: Supply -> (# Int#, Supply #)
    freshId# (Supply i@(I# i#) j b)
      | i /= j = (# i#, Supply (i + 1) j b #)
      | otherwise = (# i#, blockSupply b #)
    {-# INLINE freshId# #-}

    -- | An unboxed version of splitSupply
    splitSupply# :: Supply -> (# Supply, Supply #)
    splitSupply# (Supply i k b) = case splitBlock# b of
        (# bl, br #)
          | k - i >= minSplitSupplySize
          , j <- i + div (k - i) 2 ->
            (# Supply i j bl, Supply (j + 1) k br #)
          | Block x (l :- r :- _) <- bl
          , y <- x + div blockSize 2
          , z <- x + blockSize - 1 ->
            (# Supply x (y - 1) l, Supply y z r #)
    {-# INLINE splitSupply# #-}
freshId# is more or less what we’d expect for an iterator. It returns the lower bound along with a new supply whose lower bound is bumped by one. Notice how cheap this is. In particular, since we haven’t forced b anywhere we’ve just copied a couple of words. The expensive bit is when we actually run out of values in our range, in this case we return our final value and call blockSupply to produce a new supply. This goes off and hammers on blockCounter. Happily we only end up doing this 1/1024th of the time.
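The iterate-then-refill logic is easier to see in a pure model. The sketch below is not the library’s code: it replaces the global counter and the lazy blocks with a plain infinite stream of block starting points, shrinks blockSize to 4 so the refill is visible, and ignores concurrency entirely. The names blockStarts and takeIds are made up for the example.

```haskell
infixr 5 :-
data Stream a = a :- Stream a

-- Tiny on purpose, so we can see a refill after only a few ids.
blockSize :: Int
blockSize = 4

-- Next id, last id of the current range, and the starts of unclaimed blocks.
data Supply = Supply Int Int (Stream Int)

-- Pure stand-in for the global counter: 0, 4, 8, ...
blockStarts :: Int -> Stream Int
blockStarts n = n :- blockStarts (n + blockSize)

newSupply :: Supply
newSupply = case blockStarts 0 of
  b :- bs -> Supply b (b + blockSize - 1) bs

-- Same shape as freshId#: bump the lower bound until the range is
-- exhausted, then hand out the last id and move on to the next block.
freshId :: Supply -> (Int, Supply)
freshId (Supply i j (b :- bs))
  | i /= j    = (i, Supply (i + 1) j (b :- bs))
  | otherwise = (i, Supply b (b + blockSize - 1) bs)

-- Draw n ids in a row from a supply.
takeIds :: Int -> Supply -> [Int]
takeIds n s
  | n <= 0    = []
  | otherwise = let (i, s') = freshId s in i : takeIds (n - 1) s'

main :: IO ()
main = print (takeIds 6 newSupply)  -- [0,1,2,3,4,5]
```

The id 3 comes from the otherwise branch: it is the last value of the first range, and the supply returned with it already covers 4 through 7.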
splitSupply# is a bit more complicated. When we go to split a supply we’re going to partition its range of values into two separate ranges. However, we want to watch out for splitting extremely small ranges. In this case, it’s slightly more efficient to just bite the bullet and incur the cost of hitting the blockCounter.
The way we determine this is to split the block b, giving us two new blocks. If the current range still holds at least minSplitSupplySize ids, we give the two blocks to two new supplies, each with one half of the original range.
If the range is indeed too small, we poke the first block in the pair. This causes it to go hammer blockCounter and from there we divide the range we got back into two and return these smaller supplies over the new range. Notice that we’ve completely tossed the remaining elements in the supply on the floor since there weren’t that many. More interestingly, we completely ignored the second result of our split! The idea is that the most expensive operation we can do here is force that first thunk in a block. However, as long as we don’t force their first components blocks are dirt cheap! Hence it’s cheaper to accept that we only get half of blockSize on each Supply but we only had to perform one CAS to get them.
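The arithmetic in the two branches is worth working through once. Here is a small sketch that models a supply as nothing more than an inclusive range of ids; Range, splitRange and splitFreshBlock are names invented for this example and leave out the blocks entirely.

```haskell
-- Hypothetical pure model: an inclusive range of ids.
type Range = (Int, Int)

-- Mirrors the first branch of splitSupply#: cut the range at its midpoint.
splitRange :: Range -> (Range, Range)
splitRange (i, k) = ((i, j), (j + 1, k))
  where j = i + div (k - i) 2

-- Mirrors the fallback branch: given the block size and the start x of a
-- freshly claimed block, each side gets half of that block.
splitFreshBlock :: Int -> Int -> (Range, Range)
splitFreshBlock blockSize x = ((x, y - 1), (y, z))
  where y = x + div blockSize 2
        z = x + blockSize - 1

main :: IO ()
main = do
  print (splitRange (0, 1023))       -- ((0,511),(512,1023))
  print (splitFreshBlock 1024 2048)  -- ((2048,2559),(2560,3071))
```

In both cases the two halves are disjoint and together cover the whole range, which is what keeps the identifiers handed out by the two supplies from colliding.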
So now that we’ve done all of that, all that’s left in the module is the paper-thin wrappers over these functions so we don’t always have to use unboxed tuples
    -- | Obtain a fresh Id from a Supply.
    freshId :: Supply -> (Int, Supply)
    freshId s = case freshId# s of
      (# i, s' #) -> (I# i, s')
    {-# INLINE freshId #-}

    -- | Split a supply into two supplies that will return disjoint identifiers
    splitSupply :: Supply -> (Supply, Supply)
    splitSupply s = case splitSupply# s of
      (# l, r #) -> (l, r)
    {-# INLINE splitSupply #-}
And that’s all. I hope this illustrated a fairly unique mix of laziness and side effects to help reduce contention for a difficult concurrent problem.
Cheers

          
          



Bidirectional Type Checkers for λ→ and λΠ
Danny Gratzer — Sat, 22 Nov 2014 00:00:00 UT

    Posted on November 22, 2014
    


    
    Tags: haskell, types, compilers
    


This week I learned that my clever trick for writing a type checker actually has a proper name: bidirectional type checking. In this post I’ll explain what exactly that is and we’ll use it to write a few fun type checkers.
First of all, let’s talk about one of the fundamental conflicts when designing a statically typed language: how much information need we demand from the user? Clearly we can go too far in either direction. Even people who are supposedly against type inference support at least some inference. I’m not aware of a language that requires you to write something like
my_function((my_var : int) + (1 : int) : int) : string
Clearly inferring the types of some expressions is necessary. On the other hand, if we leave out all type annotations then it becomes a lot harder for a human reader to figure out what’s going on! I at least, need to see signatures for top level functions or I become grumpy.
So inside a type checker we always have two sorts of processes

I know this must have the type T, I’ll check to make sure this is the case
I have no idea what the type of this expression is, I’ll examine the expression to figure it out

In a bidirectional type checker, we acknowledge these two phases by explicitly separating the type checker into two functions
    inferType :: Expr -> Maybe Type
    checkType :: Type -> Expr -> Maybe ()
Our type checker thus has two directions: one where we use the type to validate the expression (the type flows in) and one where we synthesize the type from the expression (the type flows out). That’s all that this is!
It turns out that a technique like this is surprisingly robust. It handles everything from subtyping to simple dependent types! To see how this actually plays out I think it’d be best to just dive in and do something with it.
Laying Out Our Language
Now when we’re building a bidirectional type checker we really want our AST to explicitly indicate inferrable vs checkable types. Clearly the parser might not care so much about this distinction, but prior to type checking it’s helpful to create this polarized tree.
For a simple language you can imagine
    data Ty = Bool
            | Arr Ty Ty
            deriving(Eq, Show)

    data IExpr = Var Int
               | App IExpr CExpr
               | Annot CExpr Ty
               | If CExpr IExpr IExpr
               | ETrue
               | EFalse

    data CExpr = Lam CExpr
               | CI IExpr
This is just simply typed lambda calculus with booleans. We’re using DeBruijn indices so we need not specify a variable for Lam. The IExpr type is for expressions we can infer types for, while CExpr is for expressions we can check.
Much of this isn’t shocking: we can always infer the types of variables, inferring the types of lambdas is hard, etc. Something worth noting is CI. For any inferrable expression, we can make it checkable by inferring a type and checking that it’s equal to what we expected. This is actually how Haskell works, GHC is just inferring a type without bothering with your signature and then checks you were right in the first place!
Now that we’ve separated out our expressions, we can easily define our type checker.
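    -- Needs `import Control.Monad (guard)` at the top of the module.
    type Env = [Ty]

    -- Safe lookup: Nothing for an out-of-range (or negative) index.
    (?!) :: [a] -> Int -> Maybe a
    xs ?! i = if i >= 0 && i < length xs then Just (xs !! i) else Nothing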

    inferType :: Env -> IExpr -> Maybe Ty
    inferType env (Var i) = env ?! i
    inferType env (App l r) =
      case inferType env l of
       Just (Arr lTy rTy) -> checkType env r lTy >> return rTy
       _ -> Nothing
    inferType env (Annot e an) = checkType env e an >> return an
    inferType _ ETrue = return Bool
    inferType _ EFalse = return Bool
    inferType env (If i t e) = do
      checkType env i Bool
      lTy <- inferType env t
      rTy <- inferType env e
      guard (lTy == rTy)
      return lTy

    checkType :: Env -> CExpr -> Ty -> Maybe ()
    checkType env (Lam ce) (Arr l r) = checkType (l : env) ce r
    checkType env (CI e) t = inferType env e >>= guard . (t ==)
    checkType _ _ _ = Nothing
So our type checker doesn’t have many surprises in it. The environment is easy to maintain since DeBruijn indices are easily stored in a list.
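To see the two directions cooperate, here is a self-contained sketch that repeats the definitions above and runs them on two small terms. The terms notFn, wellTyped, and illTyped are hypothetical examples and not part of the original code; everything else mirrors the checker as written.

```haskell
import Control.Monad (guard)

-- Types and polarized expressions, mirroring the definitions above.
data Ty = Bool
        | Arr Ty Ty
        deriving (Eq, Show)

data IExpr = Var Int
           | App IExpr CExpr
           | Annot CExpr Ty
           | If CExpr IExpr IExpr
           | ETrue
           | EFalse

data CExpr = Lam CExpr
           | CI IExpr

type Env = [Ty]

-- Safe list indexing for DeBruijn lookups.
(?!) :: [a] -> Int -> Maybe a
xs ?! i = if i >= 0 && i < length xs then Just (xs !! i) else Nothing

-- The type flows out.
inferType :: Env -> IExpr -> Maybe Ty
inferType env (Var i) = env ?! i
inferType env (App l r) =
  case inferType env l of
    Just (Arr lTy rTy) -> checkType env r lTy >> return rTy
    _ -> Nothing
inferType env (Annot e an) = checkType env e an >> return an
inferType _ ETrue = return Bool
inferType _ EFalse = return Bool
inferType env (If i t e) = do
  checkType env i Bool
  lTy <- inferType env t
  rTy <- inferType env e
  guard (lTy == rTy)
  return lTy

-- The type flows in.
checkType :: Env -> CExpr -> Ty -> Maybe ()
checkType env (Lam ce) (Arr l r) = checkType (l : env) ce r
checkType env (CI e) t = inferType env e >>= guard . (t ==)
checkType _ _ _ = Nothing

-- Hypothetical example: (\x -> if x then False else True) : Bool -> Bool.
-- The lambda is only checkable, so the annotation is what makes it inferrable.
notFn :: IExpr
notFn = Annot (Lam (CI (If (CI (Var 0)) EFalse ETrue))) (Arr Bool Bool)

-- Applying notFn to True should infer Bool.
wellTyped :: Maybe Ty
wellTyped = inferType [] (App notFn (CI ETrue))

-- Applying True as if it were a function should fail.
illTyped :: Maybe Ty
illTyped = inferType [] (App ETrue (CI ETrue))
```

Loading this in GHCi and evaluating wellTyped and illTyped shows one successful inference and one rejection. Note that the only annotation needed is on the lambda itself.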
Now that we’ve seen how a bidirectional type checker more or less works, let’s kick it up a notch.
Type Checking Dependent Types
Type checking a simple dependently typed language is actually not nearly as bad as you’d expect. The first thing to realize is that with dependent types there’s only one syntactic category: types are just expressions, so they live in the same AST as terms.
We maintain the distinction between inferrable and checkable values, resulting in
    data IExpr = Var Int
               | App IExpr CExpr
               | Annot CExpr CExpr
               | ETrue
               | EFalse
               | Bool
               | Star -- New stuff starts here
               | Pi CExpr CExpr
               | Const String
               | Free Int
               deriving (Eq, Show, Ord)

    data CExpr = Lam CExpr
               | CI IExpr
               deriving (Eq, Show, Ord)
So you can see we’ve added 4 new expressions, all inferrable. Star is just the kind of types as it is in Haskell. Pi is the dependent function type, it’s like Arr, except the return type can depend on the supplied value.
For example, you can imagine a type like
    replicate :: (n : Int) -> a -> List n a
Which says something like “give me an integer n and a value and I’ll give you back a list of length n”.
Interestingly, we’ve introduced constants. These are necessary simply because without them this language is unbelievably boring. Constants would be defined in the environment and they represent constant, irreducible terms. You should think of them almost like constructors in Haskell. For example, one can imagine 3 constants
    Nat :: Star
    Zero :: Nat
    Succ :: (_ : Nat) -> Nat
Which serve to define the natural numbers.
Last but not least, we’ve added “free variables” as an explicit constructor, Free.
Now an important piece of a type checker is comparing types for equality, in STLC, equivalent types are syntactically equal so that was solved with deriving Eq. Here we need a bit more subtlety. Indeed, now we need to check arbitrary expressions for equality! This is hard. We’ll reduce things as much as possible and then just check syntactic equality. This means that if True then a else b would equal a as we’d hope, but \x -> if x then a else a wouldn’t.
Now in order to facilitate this check we’ll define a type for fully reduced expressions. Since we’re only interested in checking equality on these terms we can toss the inferrable vs checkable division out the window.
    data VConst = CAp VConst Val
                | CVar String
                | CFree Int

    data Val = VStar
             | VBool
             | VTrue
             | VFalse
             | VConst VConst
             | VArr Val Val
             | VPi Val (Val -> Val)
             | VLam (Val -> Val)
             | VGen Int
Now since we have constants we can have chains of application that we can’t reduce, that’s what VConst is. Notice that this handles the case of just having a constant nicely.
The value dichotomy uses a nice trick from the “Simple Easy!” paper, we use HOAS to have functions that reduce themselves when applied. The downside of this is that we need VGen to peek inside the now opaque VLam and VPi. The idea is we’ll generate a unique Int and apply the functions to VGen i.
Now in order to conveniently generate these fresh integers I used monad-gen (it’s not self promotion if it’s useful :). Equality checking comes to
    -- *Whistle and fidget with hands*
    instance Enum Val where
      toEnum = VGen
      fromEnum _ = error "You're a bad person."

    eqTerm :: Val -> Val -> Bool
    eqTerm l r = runGen $ go l r
      where go VStar VStar = return True
            go VBool VBool = return True
            go VTrue VTrue = return True
            go VFalse VFalse = return True
            go (VArr f a) (VArr f' a') = (&&) <$> go f f' <*> go a a'
            go (VLam f) (VLam g) = gen >>= \v -> go (f v) (g v)
            go (VPi t f) (VPi t' g) = -- compare the argument types, then the bodies
              (&&) <$> go t t' <*> (gen >>= \v -> go (f v) (g v))
            go (VGen i) (VGen j) = return (i == j)
            go (VConst c) (VConst c') = case (c, c') of
              (CVar v, CVar v') -> return (v == v')
              (CFree i, CFree j) -> return (i == j)
              (CAp f a, CAp f' a') ->
                (&&) <$> go (VConst f) (VConst f') <*> go a a'
              _ -> return False
            go _ _ = return False
Basically we just recurse and return true or false at the leaves.
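To look at the fresh-variable trick in isolation, here is a minimal sketch on a cut-down value type. It swaps monad-gen for an explicit counter, which is an assumption made to keep the example dependency-free; eqVal and the three example values are hypothetical names.

```haskell
-- A cut-down value type: just enough to show the trick.
data Val = VTrue
         | VFalse
         | VLam (Val -> Val)
         | VGen Int

-- The Int argument is the next unused VGen index. We assume the values
-- being compared contain no VGen at or above the starting index.
eqVal :: Int -> Val -> Val -> Bool
eqVal _ VTrue VTrue = True
eqVal _ VFalse VFalse = True
eqVal _ (VGen i) (VGen j) = i == j
eqVal n (VLam f) (VLam g) =
  -- Peek inside both opaque functions by feeding them the same fresh value.
  let v = VGen n in eqVal (n + 1) (f v) (g v)
eqVal _ _ _ = False

-- \x -> \_ -> x, written twice with different Haskell binder names.
constFn, constFn' :: Val
constFn = VLam (\x -> VLam (\_ -> x))
constFn' = VLam (\a -> VLam (\_ -> a))

-- \_ -> \y -> y, which should not be equal to the two above.
flipConst :: Val
flipConst = VLam (\_ -> VLam (\y -> y))
```

Here eqVal 0 constFn constFn' holds while eqVal 0 constFn flipConst does not, because the two bodies end up returning different generated variables.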
Now that we know how to check equality of values, we actually need to map terms into those values. This involves basically writing a little interpreter.
    inf :: [Val] -> IExpr -> Val
    inf _ ETrue = VTrue
    inf _ EFalse = VFalse
    inf _ Bool = VBool
    inf _ Star = VStar
    inf _ (Free i) = VConst (CFree i)
    inf _ (Const s) = VConst (CVar s)
    inf env (Annot e _) = cnf env e
    inf env (Var i) = env !! i
    inf env (Pi l r) = VPi (cnf env l) (\v -> cnf (v : env) r)
    inf env (App l r) =
      case inf env l of
       VLam f -> f (cnf env r)
       VConst c -> VConst . CAp c $ cnf env r
       _ -> error "Impossible: evaluated ill-typed expression"

    cnf :: [Val] -> CExpr -> Val
    cnf env (CI e) = inf env e
    cnf env (Lam c) = VLam $ \v -> cnf (v : env) c
The interesting cases are for Lam, Pi, and App. For App we actually do reductions wherever we can, otherwise we know that we’ve just got a constant so we slap that on the front. Lam and Pi are basically the same, they wrap the evaluation of the body in a function and evaluate it based on whatever is fed in. This is critical, otherwise App’s reductions get much more complicated.
We need one final thing. You may have noticed that all Vals are closed, there are no free DeBruijn variables. This means that when we go under a binder we can’t type check open terms since we’re representing types as values and the term we’re checking shares variables with its type.
This means that our type checker, when it goes under a binder, is going to replace the now free variable with a fresh Free i. Frankly, this kinda sucks. I poked about for a better solution but this is what “Simple Easy!” does too.
To do these substitutions we have
    ibind :: Int -> IExpr -> IExpr -> IExpr
    ibind i e (Var j) | i == j = e
    ibind i e (App l r) = App (ibind i e l) (cbind i e r)
    ibind i e (Annot l r) = Annot (cbind i e l) (cbind i e r)
    ibind i e (Pi l r) = Pi (cbind i e l) (cbind (i + 1) e r) -- r sits under the Pi binder
    ibind _ _ e  = e -- Non recursive cases

    cbind :: Int -> IExpr -> CExpr -> CExpr
    cbind i e (Lam b) = Lam (cbind (i + 1) e b)
    cbind i e (CI c) = CI (ibind i e c)
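As a quick sanity check of the index bookkeeping, here is a small sketch on a trimmed-down copy of the AST (only Var, App, Free, Lam, and CI, which is a simplification for this example). The terms body and opened are hypothetical.

```haskell
data IExpr = Var Int
           | App IExpr CExpr
           | Free Int
           deriving (Eq, Show)

data CExpr = Lam CExpr
           | CI IExpr
           deriving (Eq, Show)

-- Replace DeBruijn variable i with e in an inferrable expression.
ibind :: Int -> IExpr -> IExpr -> IExpr
ibind i e (Var j) | i == j = e
ibind i e (App l r) = App (ibind i e l) (cbind i e r)
ibind _ _ e = e -- Non recursive cases

-- Same for checkable expressions; going under a Lam bumps the index.
cbind :: Int -> IExpr -> CExpr -> CExpr
cbind i e (Lam b) = Lam (cbind (i + 1) e b)
cbind i e (CI c) = CI (ibind i e c)

-- The body of some binder: \y -> x y, where x (index 1 under the lambda)
-- refers to the enclosing binder we are about to open.
body :: CExpr
body = Lam (CI (App (Var 1) (CI (Var 0))))

-- Open the enclosing binder with the free variable 42.
opened :: CExpr
opened = cbind 0 (Free 42) body
```

Evaluating opened gives Lam (CI (App (Free 42) (CI (Var 0)))): the outer variable became Free 42 while the lambda’s own variable was left alone.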
This was a bit more work than I anticipated, but now we’re ready to actually write the type checker!
Since we’re doing bidirectional type checking, we’re once again going to have two functions, inferType and checkType. Our environment is now a record
    -- M is Data.Map, imported qualified
    data Env = Env { locals   :: M.Map Int Val
                   , constant :: M.Map String Val }
The inferring stage is mostly the same
    inferType :: Env -> IExpr -> GenT Int Maybe Val
    inferType _ (Var _) = lift Nothing -- The term is open
    inferType (Env _ m) (Const s) = lift $ M.lookup s m
    inferType (Env m _) (Free i) = lift $ M.lookup i m
    inferType _ ETrue = return VBool
    inferType _ EFalse = return VBool
    inferType _ Bool = return VStar
    inferType _ Star = return VStar
    inferType env (Annot e ty) = do
      checkType env ty VStar
      let v = cnf [] ty
      checkType env e v >> return v
    inferType env (App f a) = do
      ty <- inferType env f
      case ty of
       VPi aTy body -> do
         checkType env a aTy
         return (body $ cnf [] a)
       _ -> lift Nothing
    inferType env (Pi ty body) = do
      checkType env ty VStar
      i <- gen
      let v = cnf [] ty
          env' = env{locals = M.insert i v (locals env)}
      checkType env' (cbind 0 (Free i) body) VStar
      return VStar
The biggest difference is that now we have to compute some types on the fly. For example in Annot we check that we are in fact annotating with a type, then we reduce it to a value. This order is critical! Remember that cnf requires well typed terms.
Beyond this there are two interesting cases, there’s App which nicely illustrates what a pi type means and Pi which demonstrates how to deal with a binder.
For App we start in the same way, we grab the (function) type of the function. We can then check that the argument has the right type. To produce the output type however, we have to normalize the argument as far as we can and then feed it to body which computes the return type. Remember that if there’s some free variable in a then it’ll just be represented as VConst (CFree ...).
Pi checks that we’re quantifying over a type first off. From there it generates a fresh free variable and updates the environment before recursing. We use cbind to replace all occurrences of the now unbound variable with an explicit Free.
checkType is pretty trivial after this. Lam is almost identical to Pi and CI is just eqTerm.
    checkType :: Env -> CExpr -> Val -> GenT Int Maybe ()
    checkType env (CI e) v = inferType env e >>= guard . eqTerm v
    checkType env (Lam ce) (VPi argTy body) = do
      i <- gen
      let ce' = cbind 0 (Free i) ce
          env' = env{locals = M.insert i argTy (locals env)}
      checkType env' ce' (body $ VConst (CFree i))
    checkType _ _ _ = lift Nothing
And that’s it!
Wrap Up
So let’s circle back to where we started: bidirectional type checking! Hopefully we’ve seen how structuring a type checker around these two core functions yields something quite pleasant.
What makes this really interesting though is how well it scales. You can use this style type checker to handle subtyping, [dependent] pattern matching, heaps and tons of interesting features.
At 400 lines though, I think I’ll stop here :)

          
          



Functors and Recursion
Danny Gratzer — Wed, 19 Nov 2014 00:00:00 UT

    Posted on November 19, 2014
    


    
    Tags: haskell
    


One of the common pieces of folklore in the functional programming community is how one can cleanly formulate recursive types with category theory. Indeed, using a few simple notions we can build a coherent enough explanation to derive some concrete benefits.
In this post I’ll outline how one thinks of recursive types and then we’ll discuss some of the practical ramifications of such thoughts.
Precursor
I’m assuming the reader is familiar with some basic notions from category theory. Specifically familiarity with the definitions of categories and functors.
Let’s talk about endofunctors, which are functors whose domain and codomain are the same. Spoiler: these are the ones we care about in Haskell. An interesting notion that comes from endofunctors is that of algebras. An algebra in this sense is a pair of an object C, and a map F C → C. Here F is called the “signature” and C is called the carrier.
If you’re curious about why we use these funny terms: in abstract algebra we deal with algebras which are comprised of a set of distinguished elements, functions, and axioms called the signature. From there we look at sets (called carriers) which satisfy the specification. We can actually cleverly rearrange the specification for something like a group into an endofunctor! It’s out of scope for this post, but interesting if algebras are your thing.
Now we can in fact define a category for F-algebras. In such a category an object is α : F A → A and each arrow is a triplet.

normal arrow f : A → B
An F-algebra α : F A → A
Another F-algebra β : F B → B

So that f ∘ α = β ∘ F f. In picture form
         F f
F A ———————————————–→ F B
 |                    |
 |                    |
 | α                  | β
 ↓                    ↓
 A —————————————————→ B
           f
commutes. I generally elide the fact that we’re dealing with triplets and instead focus on the arrow, since that’s the interesting bit.
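To make the square concrete, here is a small sketch with one particular functor and two algebras on it. It borrows the IntList functor that shows up later in this post; sumAlg, parityAlg, and commutesAt are hypothetical names chosen for the example. The arrow f is even, going from the carrier Int to the carrier Bool.

```haskell
data IntList a = Cons Int a | Nil

instance Functor IntList where
  fmap f (Cons n a) = Cons n (f a)
  fmap _ Nil = Nil

-- alpha : F Int -> Int, adds the head to the already-summed tail.
sumAlg :: IntList Int -> Int
sumAlg Nil = 0
sumAlg (Cons n acc) = n + acc

-- beta : F Bool -> Bool, tracks whether the running sum is even.
parityAlg :: IntList Bool -> Bool
parityAlg Nil = True
parityAlg (Cons n b) = if even n then b else not b

-- The homomorphism condition f . alpha = beta . fmap f, tested at one point
-- of F Int with f = even.
commutesAt :: IntList Int -> Bool
commutesAt x = even (sumAlg x) == parityAlg (fmap even x)
```

Checking commutesAt on a handful of inputs is not a proof, of course, but it shows what “the diagram commutes” asks of f: going down then across agrees with going across then down.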
Now that we’ve established F-algebras, there’s one more concept we need: the notion of initial objects. An initial object is an… object, I, in a category so that for any object C there is an arrow
          f
 I - - - - - - - - → C
So that f is unique.
Now what we’re interested in investigating is the initial object in the category of F-algebras. That’d mean that
           α
F I ————————————————–→ I
 |                     |
 |
 | F λ                 | λ
 |
 ↓                     ↓
F C —————————————————→ C
Commutes only for a unique λ.
A List is just an Initial Object in the Category of F-Algebras.
What’s the problem?
Now, remembering that we’re actually trying to understand recursive types, how can we fit the two together? We can think of recursive types as solutions to certain equations. In fact, our types are what are called the least fixed point solutions. Let’s say we’re looking at IntList. We can imagine it defined as
    data IntList = Cons Int IntList | Nil
We can in fact, factor out the recursive call in Cons and get
    data IntList a = Cons Int a | Nil
                   deriving Functor
Now we can represent a list of length 3 as something like
    type ThreeList = IntList (IntList (IntList Void))
Which is all well and good, but we really want arbitrary length lists. We want a solution to the equation that
X = IntList X
We can view such a type as a set {EmptyList, OneList, TwoList, ThreeList ... }. Now how can we actually go about saying this? Well we need to take a fixed point of the equation! This is easy enough in Haskell since Haskell’s type system is unsound.
    -- Somewhere, somehow, a domain theorist is crying.
    data FixedPoint f = Fix {unfix :: f (FixedPoint f)}
Now we can regain our normal representation of lists with
    type List = FixedPoint IntList
To see how this works
    out :: FixedPoint IntList -> [Int]
    out (Fix f) = case fmap out f of
                    Nil -> []
                    Cons a b -> a : b

    -- `in` is a reserved word in Haskell, so the inverse of out is called into
    into :: [Int] -> FixedPoint IntList
    into [] = Fix Nil
    into (x : xs) = Fix (Cons x (into xs))
Now this transformation is interesting for one reason in particular, IntList is a functor. Because of this, we can formulate an F-algebra for IntList.
    type ListAlg a = IntList a -> a
Now we consider what the initial object in this category would be. It’d be something I so that we have a function
    cata :: ListAlg a -> (I -> a) -- Remember that I -> a is an arrow in F-Alg
    cata :: (IntList a -> a) -> I -> a
    cata :: (Either () (Int, a) -> a) -> I -> a
    cata :: (() -> a) -> ((Int, a) -> a) -> I -> a
    cata :: a -> (Int -> a -> a) -> I -> a
    cata :: (Int -> a -> a) -> a -> I -> a
Now that looks sort of familiar, what’s the type of foldr again?
    foldr :: (a -> b -> b) -> b -> [a] -> b
    foldr :: (Int -> a -> a) -> a -> [Int] -> a
So the arrow we get from the initiality of I is precisely the same as foldr! This leads us to believe that maybe the initial object for F-algebras in Haskell is just the least fixed point, just as [Int] is the least fixed point for IntList.
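The correspondence can be run in the other direction too. As a rough sketch, any IntList-algebra can be handed to foldr by splitting it into its Cons and Nil cases; cataList, sumAlg, and lengthAlg are hypothetical names used only here.

```haskell
data IntList a = Cons Int a | Nil

-- Fold an ordinary list with an IntList-algebra by splitting the algebra
-- into the two arguments foldr expects.
cataList :: (IntList a -> a) -> [Int] -> a
cataList alg = foldr (\x acc -> alg (Cons x acc)) (alg Nil)

-- Two example algebras with carrier Int.
sumAlg :: IntList Int -> Int
sumAlg Nil = 0
sumAlg (Cons n acc) = n + acc

lengthAlg :: IntList Int -> Int
lengthAlg Nil = 0
lengthAlg (Cons _ acc) = 1 + acc
```

So cataList sumAlg behaves like sum and cataList lengthAlg like length, with the shape of the recursion handled once in cataList.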
To confirm this, let’s generalize a few of our definitions from before
    type Alg f a = f a -> a
    data Fix f = Fix {unfix :: f (Fix f)}

    type Init f = Alg f (Fix f)

    cata :: Functor f => Alg f a -> Fix f -> a
    cata f = f . fmap (cata f) . unfix
Exercise, draw out the reduction tree for cata on lists
Our suspicion is confirmed, the fixed point of a functor is indeed the initial object. Furthermore, we can easily show that initial objects are unique up to isomorphism (exercise!) so anything that can implement cata is isomorphic to the original, recursive definition we were interested in.
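Here is a small self-contained sketch of the generic cata at work on IntList. The helper fromList and the algebras sumAlg and toListAlg are hypothetical additions for the example; Fix, Alg, and cata are as defined above.

```haskell
data Fix f = Fix { unfix :: f (Fix f) }

type Alg f a = f a -> a

cata :: Functor f => Alg f a -> Fix f -> a
cata f = f . fmap (cata f) . unfix

data IntList a = Cons Int a | Nil

instance Functor IntList where
  fmap f (Cons n a) = Cons n (f a)
  fmap _ Nil = Nil

-- Build the fixed-point representation from an ordinary list.
fromList :: [Int] -> Fix IntList
fromList = foldr (\x rest -> Fix (Cons x rest)) (Fix Nil)

-- Collapse one layer into a running sum.
sumAlg :: Alg IntList Int
sumAlg Nil = 0
sumAlg (Cons n acc) = n + acc

-- Collapse one layer back into an ordinary list.
toListAlg :: Alg IntList [Int]
toListAlg Nil = []
toListAlg (Cons n ns) = n : ns
```

Both cata sumAlg and cata toListAlg reuse the same traversal; only the one-layer collapsing function changes.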
When The Dust Settles
Now that we’ve gone and determined a potentially interesting fact about recursive types, how can we use this knowledge? Well let’s start with a few things, first is that we can define a truly generic fold function now:
    fold :: Functor f => (f a -> a) -> Fix f -> a
This delegates all the messy details of handling the “shape” of the container we’re folding across to the collapsing function f a -> a.
While this may seem like a small accomplishment, it does mean that we can build off it to create data type generic programs that can be fitted into our existing world.
For example, what about mutual recursion? Fold captures the notion of recurring across one list in a rather slick way, however, recurring over two in lockstep involves a call to zip and other fun and games. How can we capture this with cata?
We’d imagine that the folding functions for such a scenario would have the type
    f (a, b) -> a
    f (a, b) -> b
From here we can build
    muto :: Functor f => (f (a, b) -> a) -> (f (a, b) -> b) -> Fix f -> (a, b)
    muto f g = cata ((,) <$> f <*> g)
Similarly we can build up oodles of combinators for dealing with folding all built on top of cata!
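As a sketch of muto in action, here is the classic mutually recursive pair “has even length” and “has odd length”, computed in a single pass. The algebras evenAlg and oddAlg and the helper fromList are hypothetical names for this example; each algebra only looks at the other one’s result for the tail.

```haskell
data Fix f = Fix { unfix :: f (Fix f) }

cata :: Functor f => (f a -> a) -> Fix f -> a
cata f = f . fmap (cata f) . unfix

-- (,) <$> f <*> g uses the function applicative: \x -> (f x, g x).
muto :: Functor f => (f (a, b) -> a) -> (f (a, b) -> b) -> Fix f -> (a, b)
muto f g = cata ((,) <$> f <*> g)

data IntList a = Cons Int a | Nil

instance Functor IntList where
  fmap f (Cons n a) = Cons n (f a)
  fmap _ Nil = Nil

fromList :: [Int] -> Fix IntList
fromList = foldr (\x rest -> Fix (Cons x rest)) (Fix Nil)

-- A list has even length if it is empty or its tail has odd length.
evenAlg :: IntList (Bool, Bool) -> Bool
evenAlg Nil = True
evenAlg (Cons _ (_, tailIsOdd)) = tailIsOdd

-- A list has odd length if it is non-empty and its tail has even length.
oddAlg :: IntList (Bool, Bool) -> Bool
oddAlg Nil = False
oddAlg (Cons _ (tailIsEven, _)) = tailIsEven
```

Running muto evenAlg oddAlg over a three-element list yields (False, True), and over the empty list (True, False).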
That unfortunately sounds like a lot of work! We can shamelessly free-load off the hard work of others thanks to Hackage though. In particular, the package recursion-schemes has built up a nice little library for dealing with initial algebras. There’s only one big twist between what we’ve laid out and what it does.
One of the bigger stumbling blocks for our library was changing the nice recursive definition of a type into the functorfied version. Really it’s not realistic to write all your types this way. To help simplify the process recursion-schemes provides a type family called Base which takes a type and returns its functorfied version. We can imagine something like
    data instance Base [a] b = Cons a b | Nil
This simplifies the process of actually using all these combinators we’re building. To use recursion-schemes, all you need to do is define such an instance and write project :: t -> Base t t. After that it’s all kittens and recursion.
Wrap Up
So dear reader, where are we left? We’ve got a new interesting formulation of recursive types that yields some interesting results and power. There’s one interesting chunk we’ve neglected though: what does unfolding look like?
It turns out there’s a good story for this as well, unfolding is the operation (anamorphism) defined by a terminal object in a category. A terminal object is the precise dual of an initial one. You can notice this all in recursion-schemes which features ana as well as cata.
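For a taste of the dual, here is a hand-rolled sketch of ana over the Fix type from earlier. This is not the recursion-schemes definition (the library states its version in terms of Base, so the types differ); countdown and toListAlg are hypothetical names for the example.

```haskell
data Fix f = Fix { unfix :: f (Fix f) }

data IntList a = Cons Int a | Nil

instance Functor IntList where
  fmap f (Cons n a) = Cons n (f a)
  fmap _ Nil = Nil

-- cata tears a structure down one layer at a time...
cata :: Functor f => (f a -> a) -> Fix f -> a
cata f = f . fmap (cata f) . unfix

-- ...and ana builds one up: every arrow is flipped.
ana :: Functor f => (a -> f a) -> a -> Fix f
ana g = Fix . fmap (ana g) . g

-- A coalgebra: from a seed, produce one layer and the next seed.
countdown :: Int -> IntList Int
countdown n
  | n <= 0 = Nil
  | otherwise = Cons n (n - 1)

-- An algebra for getting an ordinary list back out.
toListAlg :: IntList [Int] -> [Int]
toListAlg Nil = []
toListAlg (Cons n ns) = n : ns
```

Unfolding with ana countdown 3 and folding the result with cata toListAlg gives the list 3, 2, 1.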

          
          



The Guts of a Spineless Machine
Danny Gratzer — Tue, 28 Oct 2014 00:00:00 UT

    Posted on October 28, 2014
    


    
    Tags: haskell, c
    


It’s fairly well known that Haskell is a bit um.. different from how stock hardware sees the world. I’m not aware of too many processors that have decided that immutability and higher order functions are the right way to go.
Compiling Haskell and its ilk, however, does have one interesting wrinkle on top of the normal problem: laziness. Laziness stands completely at odds with how most everything else works. Moreover, whether or not you think it’s the right default, it’s an interesting question of how to efficiently compile some evaluation strategy other than call by value or name.
To this end, people have built a lot of abstract machines that lazy languages could target. These machines can be mapped easily to what the hardware wants and transitively, we can get our compiler. Most of these work by “graph reduction” (that’s the G in STG) and the latest incarnation of these graph machines is the spineless tagless graph machine which lies at the heart of GHC and a few other compilers.
In this post, I’d like to go over how exactly the STG machine actually works. Turns out it’s pretty cool!
Core Concepts
The basic idea behind a compiler intent on going the STG route is something like

.. front end stuff ..
Translate IL to STG language
Compile STG language to C/ASM/LLVM/Javascript

In GHC’s case I understand the pipeline is something like

1. Parsing
2. Type checking
3. Desugaring + a few bobs and bits
4. Translation to Core
5. Lion’s share of optimization
6. Translation to STG language
7. STG language to C--
8. C-- to assembly or LLVM

We’re really concerned with parts 6 and 7 here. First things first, let’s lay out what’s exactly in the STG language. It’s a tiny functional language that looks a bit like Haskell or Core, with a few restrictions. A program is simply a series of bindings, much like Haskell. The top levels look something like
f = {x y z} flag {a b c} -> ...
You should read this for now as f = \a b c -> .... The first set of variables and the flag correspond to some stuff we’ll discuss later.
Inside the ... we can write most of what you would expect from Haskell. We have let[rec] bindings, case expressions, application, constructors, literals, and primitives. There is a caveat though: first of all, constructor applications must be fully saturated. This isn’t unlike OCaml or something where you can’t just treat a constructor as a function with an arbitrary name. We would write
\a -> Just a
instead of just Just. Another bit of trickiness: our language has no lambdas! So we can’t even write the above. Instead if we had something like
 map Just [1, 2, 3]
We’d have to write
 let f   = \a -> Just a
     l'' = 3 : nil
     l'  = 2 : l''
     l   = 1 : l'
 in map f l
The reason for the awkward l'' series is that we’re only allowed to apply constructors and functions to atoms (literals and variables).
One other noteworthy feature of STG is that we have primitive operations. They need to be fully saturated, just like constructors, but they work across unboxed things. For example there would probably be something like +# which adds two unboxed integers. To work with these we also have unboxed literals, 1#, 2#, so on and so on.
Now, despite all these limitations placed on STG, it’s still a pretty stinking high level language. There’s letrec, higher order functions, a lot of the normal stuff we’d expect in a functional language. This means it’s not actually too hard to compile something like Haskell or Core to STG (I didn’t say “compile efficiently”).
As an example, let’s look at translating factorial into STG language. We start with
f :: Int -> Int
f i = case i of
  0 -> 1
  i -> i * (f (i - 1))
Now the first step is we change the binding form
f = {} n {i} -> ...
The case expression’s scrutinee can remain the same, we’re already casing on an atom
case i of
  MkInt i# -> ...
Now comes the first big change, our boxed integers are going to get in the way here, so the case expression strips away the constructor leaving us with an unboxed integer. We can similarly refactor the body to make evaluation order explicit
 case i of
   MkInt i# ->
     case i# -# 1# of
       dec# ->
         let dec = \{dec#} u {} -> MkInt dec#
         in case f dec of
              MkInt rest# ->
                case i# *# rest# of
                  result# -> MkInt result#
Notice how the case expressions here are used to make the evaluation of various expressions explicit and let was used to create a new thing to evaluate.
Now we can see what those extra {}’s were for. They notate the free variables for a thunk. Remember how we can have all sorts of closures and it can make for some really nice code? Well the machine doesn’t exactly support those natively. What we need to do is note the variables that we close over explicitly and then generate code that will store these free variables with the value that closes over them. This pair is more or less what is called a “closure” for the rest of this post.
Actually, I’ll sometimes use “thunk” as another descriptor for this pair. This is because closures in STG land do quite a lot! In particular, they are used to represent the fundamental unit of lazy code, not just closing over variables. We’ll have closures that actually don’t close over anything! This would be a bit strange, but each “thunk” in Haskell land is going to become a closure in STG-ville. The notion of forcing a thunk in Haskell is analogous to evaluating an STG closure and creating a thunk is creating a new closure. This is helpful to keep in mind as we examine the rest of the machine.
dec for example has a free variable dec# and it exists to box that result for the recursive call to factorial. We use case expressions to get evaluation. Most programs thus become chains of case’s and let alternating between creating thunks and actually doing work.
That u in between the {}’s in dec was also important. It’s the update flag. Remember how in Haskell we don’t want to force the same thunk twice. If I say
let i = 1 + 1 in i + i
We should only evaluate 1 + 1 once. That means that the thunk i will have to be mutated to not evaluate 1 + 1 twice. The update flag signifies the difference between thunks that we want to update and thunks that we don’t. For example, if we replaced the thunk for + with the first result it returned, we’d be mighty surprised. Suddenly 1 + 1 + 1 is just 2!
The u flag says “yes, I’m just a normal expression that should be updated” and the n flag says the opposite.
That about wraps up our discussion of the STG language, let’s talk about how to implement it now.
Semantics
This language wouldn’t be much good if it didn’t lend itself to an easy implementation, indeed we find that the restrictions we placed upon the language prove to be invaluable for its compilation (almost like they were designed that way!).
In order to decide how best to implement it, we first define the formal semantics for our language, which operates on a tuple of 6 things:

The code - the instruction we’re currently executing
The argument stack - A stack of integers or pointers to closures
The return stack - A stack of continuations
The update stack - A stack of update frames
The heap - A map from addresses to closures
The environment - A map from names to addresses of toplevel closures

A code is more or less the current thing we’re attempting to do. It’s either

Eval e p - evaluate an expression in an environment (p)
Enter a - Enter a closure
ReturnCon c ws - Return a constructor applied to some arguments
ReturnInt - Return an integer

Now the idea is we’re going to “unroll” our computations into pushing things onto the continuation stack and entering closures. We start with the code Eval main {}. That is to say, we start by running main. Then if we’re looking at a case we do something really clever
 EVAL(case expr of {pat1 -> expr1; ...}, p) as rs us h o
becomes
EVAL (expr, p) as ({pat1 -> expr1; ...} : rs) us h o
That is to say, we just push the pattern matching on to the continuation stack and evaluate the expression.
At some point we’ll get to a “leaf” in our expression. That is, some literal (a number) or constructor. At this point we make use of our continuation stack
EVAL (C ws, p) as ((...; c vs -> expr; ...) : rs) us h o
ReturnCon (C ws) as ((...; c vs -> expr; ...) : rs) us h o
EVAL (expr, p[vs -> ws]) as rs us h o
So our pattern matching is rolled into ReturnCon. ReturnCon will just look on top of the return stack for a continuation which wants its constructor and evaluate its expression, binding the pattern’s variables to the constructor’s arguments.
The story is similar for literals
EVAL (Int i, p) as ((...; i -> expr; ...) : rs) us h o
ReturnInt i as ((...; i -> expr; ...) : rs) us h o
EVAL (expr, p) as rs us h o
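The case and literal rules can be played with in a drastically simplified model. The sketch below keeps only the code and the return stack, with integer literals and case as the only expressions; it drops the environment, argument stack, update stack, and heap entirely. All names here (Expr, Code, step, run, example) are hypothetical and this is not how the real machine is implemented.

```haskell
-- Expressions: integer literals and case over literal patterns.
data Expr = Lit Int
          | Case Expr [(Int, Expr)]

-- The "code" component of the machine state.
data Code = Eval Expr
          | ReturnInt Int

-- Each continuation is the list of alternatives of a pending case.
type ReturnStack = [[(Int, Expr)]]

-- One transition of the machine; Nothing means no rule applies.
step :: (Code, ReturnStack) -> Maybe (Code, ReturnStack)
-- case: push the alternatives and evaluate the scrutinee
step (Eval (Case scrut alts), rs) = Just (Eval scrut, alts : rs)
-- literal: switch to returning it
step (Eval (Lit i), rs) = Just (ReturnInt i, rs)
-- return: pop a continuation and pick the matching alternative
step (ReturnInt i, alts : rs) = do
  e <- lookup i alts
  Just (Eval e, rs)
step (ReturnInt _, []) = Nothing

-- Run until we return an integer to an empty return stack.
run :: Expr -> Maybe Int
run e = go (Eval e, [])
  where
    go (ReturnInt i, []) = Just i
    go st = step st >>= go

-- case (case 1 of {1 -> 2}) of {2 -> 3; 0 -> 9}
example :: Expr
example = Case (Case (Lit 1) [(1, Lit 2)]) [(2, Lit 3), (0, Lit 9)]
```

Running run example pushes two continuations, returns 1 to the inner one, then 2 to the outer one, and finishes with Just 3. An expression with no matching alternative gets stuck and run gives Nothing.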
Another phase is how we handle let’s and letrec’s. In this phase instead of dealing with continuations, we allocate more thunks onto the heap.
EVAL ((let x = {fs} f {xs} -> e; ... in expr), p) as rs us h o
EVAL (expr, p') as rs us h' o
So as we’d expect, evaluating a let expression does indeed go and evaluate the body of the let expression, but changes up the environment in which we evaluate it. We have
p' = p[x -> Addr a, ...]
h' = h[a -> ({fs} f {xs} -> e) p fs, ...]
In words “the new environment contains a binding for x to some address a. The heap is extended with an address a with a closure {fs} f {xs} -> ... where the free variables come from p”. The definition for letrec is identical except the free variables come from p' allowing for recursion.
So the STG machine allocates things in lets, adds continuations with case, and jumps to continuations on values.
Now we also have to figure out applications.
EVAL (f xs, p) as rs us h o
ENTER a (values of xs ++ as) rs us h o
where the value of f is Addr a. So we push all the arguments (remember they’re atoms and therefore trivial to evaluate) on to the argument stack and enter the closure of the function.
How do we actually enter a closure? Well we know that our closures are of the form
({fs} f {vs} -> expr) frees
If we have enough arguments to run the closure (length vs <= length of the argument stack), then we can just EVAL expr [vs -> take (length vs) as, fs -> frees]. This might not be the case in something like Haskell though, since we have partial application. So what do we do in this case?
What we want is to somehow get something that’s our closure but also knows about however many arguments we actually supplied it. Something like
({fs ++ supplied} f {notSupplied} -> expr) frees ++ as
where supplied ++ notSupplied = vs. This updating of a closure is half of what our update stack us is for. The other case is when we do actually enter the closure, but f = u so we’re going to want to update it. If this is the case we add an update frame (as, rs, a) to the stack, where as is the argument stack, rs is the return stack, and a is the closure which should be updated. Once we’ve pushed this frame, we promptly empty the argument stack and return stack.
We then add the following rules to the definition of ReturnCon
ReturnCon c ws {} {} (as, rs, a) : us h o
ReturnCon c ws as rs us h' o
where h' is the new heap that’s replaced our old closure at a with our new, spiffy, updated closure
h' = h[a -> ({vs} n {} -> c vs) ws]
So that’s what happens when we go to update a closure. But what about partial application?
Enter a as {} (asU, rs, aU) : us h o
Enter a (as ++ asU) rs us h' o
where
h a = ({vs} n {xs} -> expr) frees
h' = h [aU -> ((vs ++ bound) n xs -> e) (frees ++ as)]
This is a simplified rule from what’s actually used, but gives some intuition to what’s happening: we’re minting a new closure in which we use the arguments we’ve just bound and that’s what the result of our update is.
Compiling This
Now that we have some idea of how this is going to work, what does this actually become on the machine?
The original paper by SPJ suggests an “interpreter” approach to compilation. In other words, we actually almost directly map the semantics to C and call it compiled. There’s a catch though, we’d like to represent the body of closures as C functions since they’re well.. functions. However, since all we do is enter closures and jump around to things till the cows come home, it had damn well better be fast. C function calls aren’t built to be that fast. Instead the paper advocates a tiny trampolining-esque approach.
When something wants to enter a closure, it merely returns it and our main loop becomes
 while(1){cont = (*cont)();}
Which won’t overflow the stack. In reality, more underhanded tricks are applied to make the performance suck less, but for now we’ll ignore such things.
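The same trampolining idea can be sketched in Haskell, purely as an illustration (the paper's version is C, and `Step`, `run`, and `sumTo` are hypothetical names). Each piece of code either finishes or hands back the next piece of code, and a single driver loop keeps running whatever it is handed.

```haskell
-- A piece of code either finishes or returns the next piece of code to run.
data Step a = Done a | Jump (Step a)

-- The driver loop, playing the role of while(1){cont = (*cont)();}
run :: Step a -> a
run (Done x)    = x
run (Jump next) = run next

-- Sum n + (n - 1) + ... + 1 by repeatedly handing control back to run.
-- The seq keeps the accumulator evaluated instead of building up thunks.
sumTo :: Int -> Int -> Step Int
sumTo acc 0 = Done acc
sumTo acc n = acc `seq` Jump (sumTo (acc + n) (n - 1))

main :: IO ()
main = print (run (sumTo 0 100))  -- prints 5050
```

No step ever calls the next one directly; it just returns it, so the depth of the call stack stays constant no matter how many jumps happen.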
In our compiled results there will be 2 stacks, not the 3 found in our abstract machine. The first stack (the A-stack) holds pointers and the second (the B-stack) holds non-pointers. These are monitored by two variables/registers, SpA and SpB, which keep track of the heights of the two stacks. Then compilation becomes reasonably straightforward.
An application pushes the arguments onto appropriate stacks, adjusts Sp*, and enters the function. A let block allocates each of the bound variables, then the body. Entering a closure simply jumps to the closure’s code pointer. This is actually quite nifty. All the work of figuring out exactly what Enter will do (updates, continuation jiggering) is left to the closure itself.
A case expression is a bit more complicated since a continuation’s representation involves boxing up the local environment for each branch. Once that’s bundled away, we represent a continuation as a simple code pointer. It is in charge of scrutinizing the argument stack and selecting an alternative and then running the appropriate code. This is a lot of work, and, unless I’m crazy, we’ll need two types of bound variables for each branch (really just ptr/non-ptr). The selection of an alternative would be represented as a C switch, letting all sorts of trickery with jump tables be done by the C compiler.
In order to return a value, we do something clever. We take a constructor and point a global variable at its constructor closure, containing its values and jump to the continuation. The continuation can then peek and poke at this global variable to bind things as needed for the alternatives. There is potentially a massive speedup by returning through registers, but this is dangerously close to work.
From here, primitive operations can be compiled to statements/instructions in whatever environment we’re targeting. In C for example we’d just use the normal + to add our unboxed integers.
The last beast to slay is updates. We represent update frames by pointers to argument stacks and a pointer to a closure. That means that the act of updating is merely saving Sp* in an update frame, clobbering them, and then jumping into the appropriate closure. We push the update frame onto stack B and keep on going.
I realize that this is a glancing overview and I’m eliding a lot of the tricky details, but hopefully this is sufficient to understand a bit about what’s going on at an intuitive level.
Wrap Up
So now that you’ve put all the effort to get through this post, I get to tell you it’s all lies! In reality GHC has applied all manner of tricks and hacks to get fast performance out of the STG model. To be honest I’m not sure where I should point to that explains these tricks because well… I have no idea what they are.
I can point to

SPJ’s original paper
The Relevant GHC Wiki Page

If you have any suggestions for other links I’d love to add them!
Thanks Chris Ganas for proof reading

          
          



Notes on Focusing
Danny Gratzer — Mon, 27 Oct 2014 00:00:00 UT

    Posted on October 27, 2014
    


    
    Tags: types, notes
    


I’ve been spending a lot of time whacking my head on focusing literature. I’d like to jot down some intuition around what a focused system is and how it relates to the rest of the world. I’m going to steer clear of actually proving things but I will point out where a proof would be needed.
What Is Focusing
In a nutshell, focusing is a strategy to create proofs that minimizes the amount of choices available at each step. Focusing is thus amenable to mechanization since a computer is very good at applying a lot of deterministic procedures but incredibly bad at nondeterministic choice.
Now when we set out to define a focused system we usually do something like

Formalize our logical framework with natural deduction
Translate our framework into a sequent calculus
Transform our sequent calculus into a focused variant

At each of these steps there’s a proof that says something like “System 2 is sound and complete with respect to System 1”. We can then chain these proofs together to get that we can transform any nonfocused proof into a focused one (focalization) and the reverse (de-focalization).
In order to actually carry out these proofs there’s a fair amount of work and pain. Usually we’ll need something like cut elimination and/or identity expansion.
Groundwork
Now before we go on to define an example logic, let’s notice a few things. First off, in sequent calculus there are left and right rules. Left rules decompose known facts into other known facts while right rules transform our goal. There’s also an identity sequent which more or less just states
 A is an atom
 —————————————
   Γ, A → A
This is a bit boring though.
Now certain rules are invertible: their conclusion implies their premise in addition to the reverse. For example if I said you must prove A ∧ B clearly we’ll have to prove both A and B in order to prove A ∧ B; there’s no alternative set of rule applications that let us circumvent proving A and B.
This means that if we were mechanically trying to prove something of the form A ∧ B we can immediately apply the right rule that decomposes ∧ into 2 goals.
We call these sorts of rules invertible or asynchronous. Dually, there are rules that when applied transform our goal into something impossible to prove. Consider ⊥ ∨ ⊤: clearly applying the rule that transforms this into ⊥ would be a bad idea!
Now if we begin classifying all the left and right rules we’ll notice that they tend to fall into 2 categories

Things with invertible left rules and noninvertible right rules
Things with noninvertible left rules and invertible right rules

We dub the first group “positive” things and the second “negative” things. This is called polarization and isn’t strictly necessary but greatly simplifies a lot of our system.
Now there are a few things that could be considered both positive and negative. For example we can consider ∧ as positive with
  Γ → A⁺  Γ → B⁺
 ———————————————
   Γ → A⁺ ∧ B⁺

   Γ, A⁺, B⁺ → C
 —————————————————
   Γ, A⁺ ∧ B⁺ → C
In this case, the key determiner for the polarity of ∧ comes from its subcomponents. We can just treat ∧ as positive along with its subcomponents and with an appropriate dual ∧⁻, our proof system will still be complete.
As a quick example, implication ⊃ is negative. Its right rule is invertible
 Γ, A → B
——————————
Γ → A ⊃ B
While its left rule isn’t
 Γ, A ⊃ B → A  Γ, B, A ⊃ B → C
 ——————————————————————————————
         Γ, A ⊃ B → C
Since we could easily have something like ⊥ ⊃ ⊤ but using this rule would entail (heh) proving ⊥! Urk. If our system applied this rule remorselessly, we’d quickly end up in a divergent proof search.
An Actual Focused System
Do note that these typing rules are straight out of Rob Simmons’ paper, linked below
Now that we’ve actually seen some examples of invertible rules and polarized connectives, let’s see how this all fits into a coherent system. There is one critical change we must make to the structure of our judgments: an addition to the form _ → _. Instead of just an unordered multiset on the left, in order to properly do inversion we change this to Γ; Ω ⊢ A where Ω is an ordered list of propositions we intend to focus on.
Furthermore, since we’re dealing with a polarized calculus, we occasionally want to view positive things as negative and vice versa. For this we have shifts, ↓ and ↑. When we’re focusing on some proposition and we reach a shift, we pop out of the focused portion of our judgment.
Our system is broken up into 3 essentially separate judgments. In the first judgment we basically apply as many invertible rules in as many places as we can.
 Γ, A⁻; Ω ⊢ U
——————————————
Γ; ↓A⁻, Ω ⊢ U

Γ; A⁺, Ω ⊢ U  Γ; B⁺, Ω ⊢ U
———————————————————————————
    Γ; A⁺ ∨ B⁺, Ω ⊢ U

  Γ; A⁺, B⁺, Ω ⊢ U
————————————————————
  Γ; A⁺ ∧ B⁺, Ω ⊢ U

——————————————
 Γ; ⊥, Ω ⊢ U
We first look at how to break down Ω into simpler forms. The idea is that we’re going to keep going till there’s nothing left in Ω. Ω can only contain positive propositions so eventually we’ll decompose everything to shifts (which we move into Γ), ⊤⁺ (which we just drop on the floor), or ⊥ (which means we’re done). These are all invertible rules so we can safely apply them eagerly and we won’t change the provability of our goal.
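Here is a rough Haskell sketch of this inversion phase, assuming a small polarized syntax. The types `Pos`, `Neg`, `Hyp` and the function `invertLeft` are hypothetical names, and moving positive atoms into Γ is my own assumption about how to treat them. The function empties Ω eagerly and returns one context per branch that is still open.

```haskell
-- Polarized propositions; Down and Up are the shifts.
data Pos = PAtom String | TopP | Bot | Or Pos Pos | AndP Pos Pos | Down Neg
  deriving (Eq, Show)

data Neg = NAtom String | TopN | Imp Pos Neg | AndN Neg Neg | Up Pos
  deriving (Eq, Show)

-- What may live in Γ: negative propositions and (by assumption) positive atoms.
data Hyp = HNeg Neg | HAtom String
  deriving (Eq, Show)

-- Decompose Ω until it is empty; each result is the Γ of one open branch.
invertLeft :: [Hyp] -> [Pos] -> [[Hyp]]
invertLeft gamma []          = [gamma]
invertLeft gamma (p : omega) = case p of
  Down n   -> invertLeft (HNeg n : gamma) omega   -- shift: move into Γ
  PAtom x  -> invertLeft (HAtom x : gamma) omega  -- assumption: atoms go to Γ
  TopP     -> invertLeft gamma omega              -- drop on the floor
  Bot      -> []                                  -- branch is done
  Or a b   -> invertLeft gamma (a : omega) ++ invertLeft gamma (b : omega)
  AndP a b -> invertLeft gamma (a : b : omega)

main :: IO ()
main = print (invertLeft [] [AndP (Or (Down (NAtom "p")) Bot) TopP])
```

In the example the ⊥ branch of the disjunction closes immediately, so only one open branch remains, with the atom p sitting in Γ. There is no choice anywhere in `invertLeft`, which is the point of an inversion phase.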
Once we’ve moved everything out of Ω we can make a choice. If U is “stable”, meaning that we can’t easily break it down further, we can pick something negative out of our context and focus on it
   Γ; [A⁻] ⊢ U
  ————————————–
  Γ, A⁻; • ⊢ U
This pops us into the next judgment in our calculus. However, if U is not stable, then we have to decompose it further as well.
  Γ; • ⊢ A⁺
——————————————
  Γ; • ⊢ ↑ A⁺

———————————
 Γ; • ⊢ ⊤⁻

  Γ; A⁺ ⊢ B⁻
—————————————
Γ; • ⊢ A⁺ ⊃ B⁻

Γ; • ⊢ A⁻   Γ; • ⊢ B⁻
—————————————————————
   Γ; • ⊢ A⁻ ∧ B⁻
If we have a negative connective at the top level we can decompose that further, leaving us with a strictly smaller goal. Finally, we may reach a positive proposition with nothing in Ω. In this case we focus on the right.
  Γ ⊢ [A⁺]
———————————
 Γ; • ⊢ A⁺
Now we’re in a position to discuss these two focused judgments. If we focus on the right we decompose positive connectives
——————————
 Γ ⊢ [⊤⁺]

Γ; • ⊢ A⁻
—————————
Γ ⊢ [↓ A⁻]

   Γ ⊢ [A⁺]
—————————————
 Γ ⊢ [A⁺ ∨ B⁺]

   Γ ⊢ [B⁺]
—————————————
 Γ ⊢ [A⁺ ∨ B⁺]

Γ ⊢ [A⁺]   Γ ⊢ [B⁺]
———————————————————
   Γ ⊢ [A⁺ ∧ B⁺]
These judgments follow the ones we’ve already seen. If we encounter a shift, we stop focusing. Otherwise we decompose the topmost positive connective. Now looking at these, you should see that sometimes these rules will lead us to a “mistake”. Imagine if we applied the 4th rule to ⊤ ∨ ⊥! This is why these rules are segregated into a separate judgment.
In this judgment’s dual we essentially apply the exact same rules to the left of the turnstile and on negative connectives.
  Γ; A⁺ ⊢ U
————————————
Γ; [↑A⁺] ⊢ U

Γ ⊢ [A⁺]   Γ; [B⁻] ⊢ U
——————————————————————
  Γ; [A⁺ ⊃ B⁻] ⊢ U

   Γ; [A⁻] ⊢ U
—————————————————
 Γ; [A⁻ ∧ B⁻] ⊢ U

   Γ; [B⁻] ⊢ U
—————————————————
 Γ; [A⁻ ∧ B⁻] ⊢ U
That wraps up our focused system. The idea is now we have this much more limited system which can express the same things our original, unfocused system could. A computer can be easily programmed to do a focused search since fewer rules are applicable at each step, which means much less backtracking. I think Pfenning has referred to this as removing most of the “don’t-care” nondeterminism from our rules.
Wrap Up
I’m going to wrap up the post here. Proving focalization or even something like cut elimination is quite fiddly and I have no desire at all to try to transcribe it (painfully) into markdown and get it wrong in the process.
Instead, now that you have some idea of what focusing is about, go read Rob Simmons’ paper. It provides a clear account of proving everything necessary to show that a focused system is complete and sound with respect to its unfocused counterpart.
Cheers

          
          



Update on Old Projects
Danny Gratzer — Fri, 24 Oct 2014 00:00:00 UT

    Posted on October 24, 2014
    


    
    Tags: haskell, personal
    


Although most people I talk to know me for my blog, I do occasionally actually write software instead of just talking about it :)
Sadly, as a mercurial user most of my stuff has languished on bitbucket. I’ve had a few people tell me that this is annoying for various reasons. Yesterday, I finally got around to fixing that!
As of yesterday, all of my interesting projects are mirrored on github. I’m still using mercurial but thanks to the lovely git-hg tool this is not an issue. You can fork, pull-request, or generally peek and poke as you please. From my end all of these actions look like nice mercurial changesets so I can continue to live under my rock where I don’t need to understand Git.
As a quick list of what haskell code is up there now

c-dsl
c_of_scheme
ds-kanren
generic-church
hasquito
hlf
monad-gen
reified-records
this blog

Which I think includes every project I’ve blogged about here as well as a few others. Sorry it took so long!

          
          



Notes on Quotients Types
Danny Gratzer — Fri, 17 Oct 2014 00:00:00 UT

    Posted on October 17, 2014
    


    
    Tags: types, notes
    


Lately I’ve been reading a lot of type theory literature. In an effort to help my future self, I’m going to jot down a few thoughts on quotient types, the subject of some recent google-fu.
But Why!
The problem quotient types are aimed at solving is actually a very common one. I’m sure at some point or another you’ve used a piece of data you’ve wanted to compare for equality. Additionally, that data probably needed some work to determine whether it was equal to another piece.
A simple example would be representing rational numbers. A rational number is a fraction of two integers, so let’s just say
    type Rational = (Integer, Integer)
Now all is well, we can define a Num instance and what not. But what about equality? Clearly we want equivalent fractions to be equal. That should mean that (2, 4) = (1, 2) since they both represent the same number.
Now our implementation has a sticky point, clearly this isn’t the case on its own! What we really want to say is “(2, 4) = (1, 2) up to trivial rejiggering”.
Haskell’s own Rational type solves this by not exposing a raw tuple. It still exists under the hood, but we only expose smart constructors that will reduce our fractions as far as possible.
This is displeasing in a dependently typed setting however, since we want to be able to formally prove the equality of some things. This “equality modulo normalization” leaves us in a bind. We could provide a function which is essentially
    foo : (a b : Rational)
        -> Either (reduce a = reduce b) (reduce a /= reduce b)
This doesn’t really help us though, there’s no way to express that a should be observationally equivalent to b. This is a problem seemingly as old as dependent types: How can we have a simple representation of equality that captures all the structure we want and none that we don’t.
Hiding away the representation of rationals certainly buys us something, we can use a smart constructor to ensure things are normalized. From there we could potentially prove a (difficult) theorem which essentially states that
    =-with-norm : (a b c d : Integer)
                -> a * d = b * c -> mkRat a b = mkRat c d
This still leaves us with some woes however, now a lot of computations become difficult to talk about since we’ve lost the helpful notion that denominator o mkRat a = id and similar. The lack of transparency shifts a lot of the burden of proof onto the code privy to the internal representation of the type, the only place where we know enough to prove such things.
Really what we want to say is “Hey, just forget about a bit of the structure of this type and just consider things to be identical up to R”. Where R is some equivalence relation, eg

a R a
a R b implies b R a
a R b and b R c implies a R c

If you’re a mathematician, this should sound familiar. It’s a lot like how we can take a set and partition it into equivalence classes. This operation is sometimes called “quotienting a set”.
For our example above, we really mean that our rational is a type quotiented by the relation (a, b) R (c, d) iff a * d = b * c.
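In plain Haskell we can only approximate this by overriding equality, but it is enough to see the relation at work. This is a small sketch using the same cross-multiplication condition as =-with-norm above. The name `Rat` is hypothetical, and it assumes denominators are nonzero, which nothing here enforces.

```haskell
-- A raw pair of numerator and denominator, with no normalization at all.
newtype Rat = Rat (Integer, Integer)
  deriving Show

-- Equality "up to R": (a, b) R (c, d) iff a * d = b * c.
-- Assumes b and d are nonzero; with a zero denominator R stops being transitive.
instance Eq Rat where
  Rat (a, b) == Rat (c, d) = a * d == b * c

main :: IO ()
main = do
  print ((2, 4) == ((1, 2) :: (Integer, Integer)))  -- False: the raw tuples differ
  print (Rat (2, 4) == Rat (1, 2))                  -- True: identified by R
```

Note what this does not give us: nothing stops a function from pattern matching on `Rat` and telling (2, 4) and (1, 2) apart. Ruling that out is the job of a real quotient type.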
Some other things that could potentially use quotienting

Sets
Maps
Integers
Lots of Abstract Types

Basically anything where we want to hide some of the implementation details that are irrelevant for their behavior.
More than Handwaving
Now that I’ve spent some time essentially waving my hand about quotient types what are they? Clearly we need a rule that goes something like
 Γ ⊢ A type, E is an equivalence relation on A
———————————————–———————————————————————————————
        Γ ⊢ A // E type
Along with the typing rule
    Γ ⊢ a : A
——————————————————
  Γ ⊢ a : A // E
So all members of the original type belong to the quotiented type, and finally
  Γ ⊢ a : A, Γ ⊢ b : A, Γ ⊢ a E b
–——————————————–——————————————————
         Γ ⊢ a ≡ b : A // E
Notice something important here, that ≡ is the fancy shmancy judgmental equality baked right into the language. This calls into question decidability. It seems that a E b could involve some non-trivial proof terms.
More than that, in a constructive, proof relevant setting things can be a bit trickier than they seem. We can’t just define a quotient to be the same type with a different equivalence relation, since that would imply some icky things.
To illustrate this problem, imagine we have a predicate P on a type A where a E b implies P a ⇔ P b. If we just redefine the equivalence relation on quotients, P would not be a wellformed predicate on A // E, since a ≡ b : A // E doesn’t mean that P a ≡ P b. This would be unfortunate.
Clearly some subtler treatment of this is needed. To that end I found this paper discussing some of the handling of NuPRL’s quotients enlightening.
How NuPRL Does It
The paper I linked to is a discussion on how to think about quotients in terms of other type theory constructs. In order to do this we need a few things first.
The first thing to realize is that NuPRL’s type theory is different than what you are probably used to. We don’t have this single magical global equality. Instead, we define equality inductively across the type. This notion means that our equality judgment doesn’t have to be natural in the type it works across. It can do specific things at each case. Perhaps the most frequent is that we can have functional extensionality.
f = g ⇔ ∀ a. f a = g a
Okay, so now that we’ve tossed aside the notion of a single global equality, what else is new? Well something new is the lens through which many people look at NuPRL’s type theory: PER semantics. Remember that a PER (partial equivalence relation) is a relation satisfying

a R b → b R a
a R b ∧ b R c → a R c

In other words, a PER is an equivalence relationship that isn’t necessarily reflexive at all points.
The idea is to view types not as some opaque “thingy” but instead to be partial equivalence relations across the set of untyped lambda calculus terms. Inductively defined equality falls right out of this idea since we can just define a ≡ b : A to be equivalent to (a, b) ∈ A.
Now another problem rears its head: what does a : A mean? Well even though we’re dealing with PERs, it’s quite reasonable to say something is a member of a type if it’s related to itself. That is to say each relation is a full equivalence relation for the things we call members of that type. So we can therefore define a : A to be (a, a) ∈ A.
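A familiar PER to play with is (==) on Double: it is symmetric and transitive, but NaN is not related to itself. The sketch below is only an analogy and has nothing to do with NuPRL's actual implementation. The names `related` and `member` are hypothetical, and `member` is the "a : A means (a, a) ∈ A" reading.

```haskell
-- Treat (==) on Double as the relation. IEEE 754 makes it symmetric and
-- transitive, but NaN == NaN is False, so it is only a PER.
related :: Double -> Double -> Bool
related = (==)

-- a : A is read as (a, a) ∈ A.
member :: Double -> Bool
member a = related a a

nan :: Double
nan = 0 / 0

main :: IO ()
main = do
  print (member 1.5)  -- True
  print (member nan)  -- False
```

Under this reading NaN is a perfectly good term that simply fails to be a member of the "type", which is the role non-reflexive points play in PER semantics.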
Another important constraint: in order for a type family to be well formed, it needs to respect the equality of the type it maps across. In other words, for all B : A → Type, we have (a, a') ∈ A ⇒ (B a = B a') ∈ U. This should seem on par with how we defined function equality and we call this “type functionality”.
Let’s also touch on another concept: squashed types. The idea is to take a type and throw away all information other than whether or not it’s occupied. There are two basic types of squashing, extensional or intensional. In the intensional we consider two squashed things equal if and only if the types they’re squashing are equal
     A = B
  ————————————
   [A] = [B]
Now we can also consider only the behavior of the squashed type, the extensional view. Since the only behavior of a squashed type is simply existing, our extensional squash type has the equivalence
   ∥A∥ ⇔ ∥B∥
   ————————–
    ∥A∥ = ∥B∥
Now aside from this, the introduction rules for these types are basically the same: if we can prove that a type is occupied, we can grab a squashed type. Similarly, when we eliminate a squashed type all we get is the trivial occupant of the squashed type, called •.
    Γ ⊢ A
   ———————
   Γ ⊢ [A]

    Γ, x : |A|, Δ[•] ⊢ C[•]
  ——————————————————————————
    Γ, x : |A|, Δ[x] ⊢ C[x]
What’s interesting is that when proving an equality judgment, we can unsquash both of these types. This is only because NuPRL’s equality proofs are computationally trivial.
Now with all of that out of the way, I’d like to present two typing rules. First
  Γ ⊢ A ≡ A';  Γ, x : A, y : A ⊢ E[x; y] = E'[x; y]; E and E' are PERs
  ————————————————————————————————————————————————————————————————————
                      Γ ⊢ A // E ≡ A' // E'
In English, two quotients are equal when the types and their quotienting relations are equal.
 Γ, u : x ≡ y ∈ (A // E), v :  ∥x E y∥, Δ[u] ⊢ C [u]
 ———————————————————————————————————————————————————–
       Γ, u : x ≡ y ∈ (A // E), Δ[u] ⊢ C [u]
There are a few new things here. The first is that we have a new Δ[u] thing. This is a result of dependent types: we can have things in our context that depend on u, and so to indicate this we “split” the context into Γ, u, Δ and apply the dependent part of the context Δ to the variable it depends on, u.
Now the long and short of this is that when we’re trying to use an equivalence between two terms in a quotient, we only get the squashed term. This does mean that we only need to provide a squash to get equality in the first place though
Γ ⊢ ∥ x E y  ∥; Γ ⊢ x : A; Γ ⊢ y : A
——————————————————————————————————–
      Γ ⊢ x ≡ y : A // E
Remember that we can trivially form an ∥ A ∥ from A.
Now there’s just one thing left to talk about, using our quotiented types. To do this the paper outlines one primitive elimination rule and defines several others.
Γ, x : A, y : A, e : x E y, a : ND, Δ[ndₐ{x;y}] ⊢ |C[ndₐ{x;y}]|
——————————————————————————————————————————————————————————————–
               Γ, x : A // E, Δ[x] ⊢ |C[x]|
ND is an admittedly odd type that’s supposed to represent nondeterministic choice. It has two terms, tt and ff, and they’re considered “equal” under ND. However, nd returns its first argument if it’s fed tt and the second if it is fed ff. Hence, nondeterminism.
Now in our rule we use this to indicate that if we’re eliminating some quotiented type we can get any value that’s considered equal under E. We can only be assured that when we eliminate a quotiented type, it will be related by the equivalence relation to x. This rule captures this notion by allowing us to randomly choose some y : A so that x E y.
Overall, this rule simply states that if C is occupied for any term related to x, then C[x] is occupied.
Wrap up
As with my last post, here’s some questions for the curious reader to pursue

What elimination rules can we derive from the above?
If we’re proving equality can we get more expressive rules?
What would an extensional quotient type look like?
Why would we want intensional or extensional?
How can we express quotient types with higher inductive types from HoTT?

The last one is particularly interesting.
Thanks to Jon Sterling for proof reading

          
          



Notes on Abstract and Existential Types
Danny Gratzer — Mon, 29 Sep 2014 00:00:00 UT

    Posted on September 29, 2014
    


    
    Tags: haskell, types, notes
    


I’m part of a paper reading club at CMU. Last week we talked about a classic paper, Abstract Types have Existential Type. The concept described in this paper is interesting and straightforward. Sadly some of the notions and comparisons made in the paper are starting to show their age. I thought it might be fun to give a tldr using Haskell.
The basic idea is that when we have a type with an abstract implementation and some functions upon it, it’s really an existential type.
Some Haskell Code
To exemplify this let’s define an abstract type (in Haskell)
    module Stack (Stack, empty, push, pop, shift) where
    newtype Stack a = Stack [a]

    empty :: Stack a
    empty = Stack []

    push :: a -> Stack a -> Stack a
    push a (Stack xs) = Stack (a : xs)

    pop :: Stack a -> Maybe a
    pop (Stack []) = Nothing
    pop (Stack (x : xs)) = Just x

    shift :: Stack a -> Maybe (Stack a)
    shift (Stack []) = Nothing
    shift (Stack (x : xs)) = Just (Stack xs)
Now we could import this module and use its operations:
    import Stack

    main = do
      let s = push 1 . push 2 . push 3 $ empty
      print (pop s)
What we couldn’t do however, is pattern match on stacks to take advantage of their internal structure. We can only build new operations out of combinations of the exposed API. The classy terminology would be to say that Stack is abstract.
This is all well and good, but what does it mean type theoretically? If we want to represent Haskell as a typed calculus it’d be a shame to have to include Haskell’s (underpowered) module system to talk about abstract types.
After all, we’re not really thinking about modules so much as hiding some details. That sounds like something our type system should be able to handle without having to rope in modules. By isolating the concept of abstraction in our type system, we might be able to more deeply understand and reason about code that uses abstract types.
This is in fact quite possible, let’s rephrase our definition of Stack
    module Stack (Stack, StackOps(..), ops) where

    newtype Stack a = Stack [a]

    data StackOps a = StackOps { empty :: Stack a
                               , push  :: a -> Stack a -> Stack a
                               , pop   :: Stack a -> Maybe a
                               , shift :: Stack a -> Maybe (Stack a) }
    ops :: StackOps a
    ops = ...
Now that we’ve lumped all of our operations into one record, our module really only exports a type name and a record of data. We could take a step further still,
    module Stack (Stack, StackOps(..), ops) where

    newtype Stack a = Stack [a]

    data StackOps s a = StackOps { empty :: s a
                                 , push  :: a -> s a -> s a
                                 , pop   :: s a -> Maybe a
                                 , shift :: s a -> Maybe (s a) }
    ops :: StackOps Stack a
    ops = ...
Now the only thing that needs to know the internals of Stack is ops. It seems like we could really just smush the definition into ops; why should the rest of the file see our private definition?
    module Stack (StackOps(..), ops) where

    data StackOps s a = StackOps { empty :: s a
                                 , push  :: a -> s a -> s a
                                 , pop   :: s a -> Maybe a
                                 , shift :: s a -> Maybe (s a) }
    ops :: StackOps ??? a
    ops = ...
Now what should we fill in ??? with? It’s some type, but it’s meant to be chosen by the callee, not the caller. Does that sound familiar? Existential types to the rescue!
    {-# LANGUAGE PolyKinds, KindSignatures, ExistentialQuantification #-}
    module Stack where

    data Packed (f :: k -> k' -> *) a = forall s. Pack (f s a)

    data StackOps s a = StackOps { empty :: s a
                                 , push  :: a -> s a -> s a
                                 , pop   :: s a -> Maybe a
                                 , shift :: s a -> Maybe (s a) }
    ops :: Packed StackOps a
    ops = Pack ...
The key difference here is Packed. It lets us take a type function and instantiate it with some type variable and hide our choice from the user. This means that we can even drop the whole newtype from the implementation of ops
    ops :: Packed StackOps a
    ops = Pack $ StackOps { empty = []
                          , push  = (:)
                          , pop   = fmap fst . uncons
                          , shift = fmap snd . uncons }
      where uncons [] = Nothing
            uncons (x : xs) = Just (x, xs)
Now that we’ve eliminated the Stack definition from the top level, we can actually just drop the notion that this is in a separate module.
One thing that strikes me as unpleasant is how Packed is defined, we must jump through some hoops to support StackOps being polymorphic in two arguments, not just one.
We could get around this with higher rank polymorphism and making the fields more polymorphic while making the type less so. We could also just wish for type level lambdas or something. Even some of the recent type level lens stuff could be aimed at making a general case definition of Packed.
From the client side this definition isn’t actually so unpleasant to use either.
    {-# LANGUAGE RecordWildCards #-}

    someAdds :: Packed StackOps Int -> Maybe Int
    someAdds (Pack StackOps{..}) = pop (push 1 empty)
With record wild cards, there’s very little boilerplate to introduce our record into scope. Now we might wonder about using a specific instance rather than abstracting over all possible instantiations.
    someAdds :: Maybe Int
    someAdds =
      let (Pack StackOps{..}) = ops in
        pop (push 1 empty)
The resulting error message is amusing :)
Now we might wonder if we gain anything concrete from this. Did all those language extensions actually do something useful?
Well one mechanical transformation we can make is that we can change our existential type into a CPS-ed higher rank type.
    -- Needs RankNTypes for the forall in the argument position.
    unpackPacked :: (forall s. f s a -> r) -> Packed f a -> r
    unpackPacked cont (Pack f) = cont f

    someAdds' :: StackOps s Int -> Maybe Int
    someAdds' StackOps{..} = pop (push 1 empty)

    someAdds :: Packed StackOps Int -> Maybe Int
    someAdds = unpackPacked someAdds'
Now we’ve factored out the unpacking of existentials into a function called unpackPacked. This takes a continuation which is parametric in the existential variable, s.
Now the body of someAdds becomes someAdds', but notice something very interesting here: now s is a normal universally quantified type variable. This means we can apply some nice properties we already have used, e.g. parametricity.
This is a nice effect of translating things to core constructs, all the tools we already have figured out can suddenly be applied.
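For anyone who wants to poke at this, here is one way the pieces might be assembled into a single file that compiles. It is a sketch rather than the definitive version: `listOps`, `Sized`, and `sizedOps` are hypothetical names I’ve introduced, and I’ve pinned the kind of `Packed` down with an explicit signature instead of PolyKinds to keep the extension list short.

```haskell
{-# LANGUAGE ExistentialQuantification, KindSignatures, RankNTypes, RecordWildCards #-}
module Main where

import Data.Kind (Type)

-- Existential package: the representation s is hidden from clients.
data Packed (f :: (Type -> Type) -> Type -> Type) a = forall s. Pack (f s a)

data StackOps s a = StackOps
  { empty :: s a
  , push  :: a -> s a -> s a
  , pop   :: s a -> Maybe a
  , shift :: s a -> Maybe (s a)
  }

-- One implementation: plain lists.
listOps :: Packed StackOps a
listOps = Pack StackOps
  { empty = []
  , push  = (:)
  , pop   = \xs -> case xs of { [] -> Nothing; (x : _) -> Just x }
  , shift = \xs -> case xs of { [] -> Nothing; (_ : ys) -> Just ys }
  }

-- A second, hypothetical implementation that also caches the size.
data Sized a = Sized Int [a]

sizedOps :: Packed StackOps a
sizedOps = Pack StackOps
  { empty = Sized 0 []
  , push  = \x (Sized n xs) -> Sized (n + 1) (x : xs)
  , pop   = \(Sized _ xs) -> case xs of { [] -> Nothing; (x : _) -> Just x }
  , shift = \(Sized n xs) -> case xs of
      []       -> Nothing
      (_ : ys) -> Just (Sized (n - 1) ys)
  }

-- The continuation must work for every s, so it cannot inspect the representation.
unpackPacked :: (forall s. f s a -> r) -> Packed f a -> r
unpackPacked cont (Pack f) = cont f

someAdds' :: StackOps s Int -> Maybe Int
someAdds' StackOps{..} = pop (push 1 empty)

someAdds :: Packed StackOps Int -> Maybe Int
someAdds = unpackPacked someAdds'

main :: IO ()
main = do
  print (someAdds listOps)   -- Just 1
  print (someAdds sizedOps)  -- Just 1
```

Since `someAdds'` is parametric in `s`, it has no way to tell the two implementations apart except through the operations it is handed, which is the abstraction guarantee we were after.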
Wrap Up
Now that we’ve gone through transforming our abstract types into existential ones you can finally appreciate at least one more thing: the subtitle on Bob Harper’s blog. You can’t say you didn’t learn something useful :)
I wanted to keep this post short and sweet. In doing this I’m going to skip some of the more interesting questions we could ask. For the curious reader, I leave you with these

How can we use type classes to prettify our examples?
What can we do to generalize Packed?
How does this pertain to modules? Higher order modules?
How would you implement “sharing constraints” in this model?
What happens when we translate existentials to dependent products?

Cheers.

          
          



Introduction to Dependent Types: Off, Off to Agda Land
Danny Gratzer — Sun, 21 Sep 2014 00:00:00 UT

    Posted on September 21, 2014
    


    
    Tags: agda, types
    


First, an apology. Sorry this has taken so long to push out. I’ve just started my first semester at Carnegie Mellon. I fully intend to keep blogging, but it’s taken a little while to get my feet under me. Happy readings :)
In this second post of my “intro to dependent types” series we’re going on a whirlwind tour of Agda. Specifically we’re going to look at translating our faux-Haskell from the last post into honest to goodness typecheckable Agda.
There are 2 main reasons to go through the extra work of using a real language rather than pseudo-code

This is typecheckable. I can make sure that all the i’s are dotted and t’s crossed.
It’s a lot cleaner. We’re only using the core of Agda so it’s more or less a very stripped down Haskell with a much more expressive but simpler type system.

With that in mind let’s dive in!
What’s the Same
There’s quite a bit of shared syntax between Agda and Haskell, so a Haskeller can usually guess what’s going on.
In Agda we still give definitions in much the same way (single : though)
    thingy : Bool
    thingy = true
whereas in Haskell we’d say
    name :: Type
    name = val
In fact, we even get Haskell’s nice syntactic sugar for functions.
    function : A -> B -> ... -> C
    function a b ... = c
Will desugar to a lambda.
    function : A -> B -> ... -> C
    function = \a b ... -> c
One big difference between Haskell and Agda is that, due to Agda’s more expressive type system, type inference is woefully undecidable. Those top level signatures are not optional sadly. Some DT languages work a little harder than Agda when it comes to inference, but for a beginner this is a bit of a feature: you learn what the actual (somewhat scary) types are.
And of course, you always give type signatures in Haskell I’m sure :)
Like Haskell, function application is whitespace and functions are curried
    -- We could explicitly add parens
    -- foo : A -> (B -> C)
    foo : A -> B -> C
    foo = ...

    a : A
    a = ...

    bar : B -> C
    bar = foo a
Even the data type declarations should look familiar, they’re just like GADTs syntactically.
    data Bool : Set where
      true  : Bool
      false : Bool
Notice that we have this new Set thing lurking in our code. Set is just the kind of normal types, like * in Haskell. In Agda there’s actually an infinite tower of these Bool : Set : Set1 : Set2 ..., but we won’t concern ourselves with anything beyond Set. It’s also worth noting that Agda doesn’t require any particular casing for constructors; traditionally they’re lower case.
Pattern matching in Agda is pretty much identical to Haskell. We can define something like
    not : Bool -> Bool
    not true  = false
    not false = true
One big difference between Haskell and Agda is that pattern matching must be exhaustive. Nonexhaustiveness is a compiler error in Agda.
This brings me to another point worth mentioning. Remember that structural induction I mentioned the other day? Agda only allows recursion when the terms we recurse on are “smaller”.
In other words, all Agda functions are defined by structural induction. This together with the exhaustiveness restriction means that Agda programs are “total”. In other words all Agda programs reduce to a single value, they never crash or loop forever.
This can occasionally cause pain though since not all recursive functions are modelled nicely by structural induction! A classic example is merge sort. The issue is that in merge sort we want to say something like
    mergeSort : List Nat -> List Nat
    mergeSort [] = []
    mergeSort (x :: []) = x :: []
    mergeSort xs = let (l, r) = split xs in
                     merge (mergeSort l, mergeSort r)
But wait, how would the typechecker know that l and r are strictly smaller than xs? In fact, they might not be! We know that length xs > 1, but convincing the typechecker of that fact is a pain! In fact, without elaborate trickery, Agda will reject this definition.
So, apart from these restrictions for totality, Agda has pretty much been a stripped down Haskell. Let’s start seeing what Agda offers over Haskell.
Dependent Types
There wouldn’t be much point in writing Agda if it didn’t have dependent types. In fact the two mechanisms that comprise our dependent types translate wonderfully into Agda.
First we had pi types, remember those?
    foo :: (a :: A) -> B
    foo a = ...
Those translate almost precisely into Agda, where we’d write
    foo : (a : A) -> B
The only difference is the colons! In fact, Agda’s pi types are far more general than what we’d discussed previously. The extra generality comes from what we allow A to be. In our previous post, A was always some normal type with the kind * (Set in Agda). In Agda though, we allow A to be Set itself. In Haskell syntax that would be something like
    foo :: (a :: *) -> B
What could a be then? Well anything with the kind * is a type, like Bool, (), or Nat. So that a is like a normal type variable in Haskell
    foo :: forall a. B
In fact, when we generalize pi types like this, they generalize parametric polymorphism. This is kind of like how we use “big lambdas” in System F to write out polymorphism explicitly.
Here’s a definition for the identity function in Agda.
    id : (A : Set) -> A -> A
    id A a = a
This is how we actually do all parametric polymorphism in Agda, as a specific use of pi types. This comes from the idea that types are also “first class”. We can pass them around and use them as arguments to functions, even dependent arguments :)
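For comparison, GHC can be asked to make the type argument visible too. This is only a rough Haskell analogy, assuming the TypeApplications extension; the names identity, explicit and inferred are made up for illustration.

```haskell
{-# LANGUAGE ExplicitForAll #-}
{-# LANGUAGE TypeApplications #-}

-- The same identity function; `forall a.` plays the role of (A : Set).
identity :: forall a. a -> a
identity x = x

-- Passing the type explicitly, as the Agda definition above always does.
explicit :: Int
explicit = identity @Int 3

-- Letting GHC fill the type in, as Haskell normally does.
inferred :: Bool
inferred = identity True
```

The difference is that in Haskell the type argument is a special form of application, while in Agda it is just another argument to a pi type.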
Now our other dependently typed mechanism was our generalized generalized algebraic data types. These also translate nicely to Agda.
    data Foo : Bool -> Set where
      Bar : Foo true
We indicate that we’re going to index our data on something the same way we would in Haskell++, by adding it to the type signature on the top of the data declaration.
Agda’s GGADTs also allow us to add “parameters” instead of indices. These are things which the data type may use, but each constructor handles uniformly without inspecting it.
For example a list type depends on the type of its elements, but it doesn’t poke further at the type or value of those elements. They’re handled “parametrically”.
In Agda a list would be defined as
    data List (A : Set) : Set where
      nil  : List A
      cons : A -> List A -> List A
If you’re wondering what on earth the difference is, don’t worry! You’ve already in fact used parametric/non-parametric type arguments in Haskell. In Haskell a normal algebraic type can just take several type variables and can’t try to do clever things depending on what the argument is. For example, our definition of lists
    data List a = Cons a (List a) | Nil
can’t do something different if a is Int instead of Bool or something like that. That’s not the case with GADTs though, there we can do clever things like
    data List :: * -> * where
      IntOnlyCons :: Int -> List Int -> List Int
      ...
Now we’re not treating our type argument opaquely, we can figure things out about it depending on what constructor our value uses! That’s the core of the difference between parameters and indices in Agda.
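Here’s a small compilable sketch of that difference in plain GHC Haskell. The Expr type and its constructors are hypothetical, chosen only to show an index at work; I’ve written Type (from Data.Kind) where this post writes *.

```haskell
{-# LANGUAGE GADTs #-}
{-# LANGUAGE KindSignatures #-}

import Data.Kind (Type)

-- A parameter: every constructor treats `a` uniformly.
data List a = Nil | Cons a (List a)

len :: List a -> Int
len Nil         = 0
len (Cons _ xs) = 1 + len xs

-- An index: each constructor decides which type it lands in.
data Expr :: Type -> Type where
  IntLit  :: Int  -> Expr Int
  BoolLit :: Bool -> Expr Bool
  If      :: Expr Bool -> Expr a -> Expr a -> Expr a

-- Matching on a constructor tells the typechecker what `a` is,
-- so each branch may return a different concrete type.
eval :: Expr a -> a
eval (IntLit n)  = n
eval (BoolLit b) = b
eval (If c t e)  = if eval c then eval t else eval e
```

len can never learn anything about a, while eval learns a little more about a in every clause.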
Next let’s talk about modules. Agda’s prelude is absolutely tiny. By tiny I mean essentially nonexistent. Because of this I’m using the Agda standard library heavily and to import something in Agda we’d write
    import Foo.Bar.Baz
This isn’t the same as a Haskell import though. By default, imports in Agda import a qualified name to use. To get a Haskell style import we’ll use the special shortcut
    open import Foo.Bar
which is short for
    import Foo.Bar
    open Foo.Bar
Because Agda’s prelude is so tiny we’ll have to import things like booleans, numbers, and unit. These are all things defined in the standard library, not even the core language. Expect any Agda code we write to make heavy use of the standard library and begin with a lot of imports.
Finally, Agda’s names are somewhat... unique. Agda and its standard library are unicode heavy, meaning that instead of unit we’d type ⊤ and instead of Void we’d use ⊥. Which is pretty nifty, but it does take some getting used to. If you’re familiar with LaTeX, the Emacs mode for Agda allows LaTeX style entry. For example ⊥ can be entered as \bot.
The most common unicode name we’ll use is ℕ. This is just the type of natural numbers as they’re defined in Data.Nat.
A Few Examples
Now that we’ve seen what dependent types look like in Agda, let’s go over a few examples of their use.
First let’s import a few things
    open import Data.Nat
    open import Data.Bool
Now we can define a few simple Agda functions just to get a feel for how that looks.
    not : Bool -> Bool
    not true  = false
    not false = true

    and : Bool -> Bool -> Bool
    and true b  = b
    and false _ = false

    or : Bool -> Bool -> Bool
    or false b = b
    or true  _ = true
As you can see, defining functions is mostly identical to Haskell: we just pattern match at the top level and go from there.
We can define recursive functions just like in Haskell
    plus : ℕ -> ℕ -> ℕ
    plus (suc n) m = suc (plus n m)
    plus zero    m = m
Now with Agda we can use our data types to encode “proofs” of sorts.
For example
    data IsEven : ℕ -> Set where
      even-z : IsEven zero
      even-s : (n : ℕ) -> IsEven n -> IsEven (suc (suc n))
Now this inductively defines what it means for a natural number to be even so that if IsEven n exists then n must be even. We can also state oddness
    data IsOdd : ℕ -> Set where
      odd-o : IsOdd (suc zero)
      odd-s : (n : ℕ) -> IsOdd n -> IsOdd (suc (suc n))
Now we can construct a decision procedure which produces either a proof of evenness or oddness for all natural numbers.
    open import Data.Sum -- The same thing as Either in Haskell; ⊎ is just Either

    evenOrOdd : (n : ℕ) -> IsEven n ⊎ IsOdd n
So we’re setting out to construct a function that, given any n, builds up an appropriate term showing it is either even or odd.
The first two cases of this function are kinda the base cases of this recurrence.
    evenOrOdd zero = inj₁ even-z
    evenOrOdd (suc zero) = inj₂ odd-o
So if we’re given zero or one, return the base case of IsEven or IsOdd as appropriate. Notice that instead of Left or Right as constructors we have inj₁ and inj₂. They serve exactly the same purpose, just with a shinier unicode name.
Now our next step would be to handle the case where we have
    evenOrOdd (suc (suc n)) = ?
Our code is going to be like the Haskell code
    case evenOrOdd n of
      Left evenProof -> Left (EvenS evenProof)
      Right oddProof -> Right (OddS  oddProof)
In words, we’ll recurse and inspect the result: if we get an even proof we’ll build a bigger even proof and if we get an odd proof we’ll build a bigger odd proof.
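If you’d like to see that Haskell shape actually typecheck, here is a sketch using the DataKinds trick from the previous post. Everything here is an approximation: SNat is a hypothetical singleton standing in for the number n, and the constructor names are my Haskell spellings of even-z, even-s, odd-o and odd-s.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE KindSignatures #-}

import Data.Kind (Type)

data Nat = Z | S Nat

-- Singleton: a runtime value whose type records which number it is.
data SNat :: Nat -> Type where
  SZ :: SNat 'Z
  SS :: SNat n -> SNat ('S n)

data IsEven :: Nat -> Type where
  EvenZ :: IsEven 'Z
  EvenS :: IsEven n -> IsEven ('S ('S n))

data IsOdd :: Nat -> Type where
  OddO :: IsOdd ('S 'Z)
  OddS :: IsOdd n -> IsOdd ('S ('S n))

-- Recurse, inspect the result, and build the bigger proof.
evenOrOdd :: SNat n -> Either (IsEven n) (IsOdd n)
evenOrOdd SZ          = Left EvenZ
evenOrOdd (SS SZ)     = Right OddO
evenOrOdd (SS (SS n)) = case evenOrOdd n of
  Left evenProof -> Left (EvenS evenProof)
  Right oddProof -> Right (OddS oddProof)

-- Forget the proof and keep only which side we landed on.
isEvenNat :: SNat n -> Bool
isEvenNat = either (const True) (const False) . evenOrOdd
```

Note that the Haskell constructors don’t take n as an explicit argument the way even-s and odd-s do; GHC infers the index from the proof being wrapped.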
In Agda we’ll use the with keyword. This allows us to “extend” the current pattern matching by adding an expression to the list of expressions we’re pattern matching on.
    evenOrOdd (suc (suc n)) with evenOrOdd n
    evenOrOdd (suc (suc n)) | inj₁ x = ?
    evenOrOdd (suc (suc n)) | inj₂ y = ?
Now we add our new expression to use for matching by saying ... with evenOrOdd n. Then we list out the next set of possible patterns.
From here the rest of the function is quite straightforward.
    evenOrOdd (suc (suc n)) | inj₁ x = inj₁ (even-s n x)
    evenOrOdd (suc (suc n)) | inj₂ y = inj₂ (odd-s n y)
Notice that we had to duplicate the whole evenOrOdd (suc (suc n)) bit of the match? It’s a bit tedious so Agda provides some sugar. If we replace that portion of the match with ... Agda will just automatically reuse the pattern we had when we wrote with.
Now our whole function looks like
    evenOrOdd : (n : ℕ) -> IsEven n ⊎ IsOdd n
    evenOrOdd zero = inj₁ even-z
    evenOrOdd (suc zero) = inj₂ odd-o
    evenOrOdd (suc (suc n)) with evenOrOdd n
    ... | inj₁ x = inj₁ (even-s n x)
    ... | inj₂ y = inj₂ (odd-s n y)
How can we improve this? Well notice that the suc (suc n) case involved unpacking our Either and then immediately repacking it; this looks like something we can abstract over.
    bimap : (A B C D : Set) -> (A -> C) -> (B -> D) -> A ⊎ B -> C ⊎ D
    bimap A B C D f g (inj₁ x) = inj₁ (f x)
    bimap A B C D f g (inj₂ y) = inj₂ (g y)
If we gave bimap a more Haskellish signature, it would be
    bimap :: forall a b c d. (a -> c) -> (b -> d) -> Either a b -> Either c d
One interesting point to notice is that the type arguments in the Agda function (A and B) also appeared in the normal argument pattern! This is because we’re using the normal pi type mechanism for parametric polymorphism, so we’ll actually end up explicitly passing and receiving the types we quantify over. This messed with me quite a bit when I first started learning DT languages; take a moment and convince yourself that this makes sense.
Now that we have bimap, we can use it to simplify our evenOrOdd function.
    evenOrOdd : (n : ℕ) -> IsEven n ⊎ IsOdd n
    evenOrOdd zero = inj₁ even-z
    evenOrOdd (suc zero) = inj₂ odd-o
    evenOrOdd (suc (suc n)) =
      bimap (IsEven n) (IsOdd n)
            (IsEven (suc (suc n))) (IsOdd (suc (suc n)))
            (even-s n) (odd-s n) (evenOrOdd n)
We’ve gotten rid of the explicit with, but at the cost of all those explicit type arguments! Those are both gross and obvious. Agda can clearly deduce what A, B, C and D should be from the arguments and what the return type must be. In fact, Agda provides a convenient mechanism for avoiding this boilerplate. If we simply insert _ in place of an argument, Agda will try to guess it from the information it has about the other arguments and contexts. Since these type arguments are so clear from context, Agda can guess them all
    evenOrOdd : (n : ℕ) -> IsEven n ⊎ IsOdd n
    evenOrOdd zero = inj₁ even-z
    evenOrOdd (suc zero) = inj₂ odd-o
    evenOrOdd (suc (suc n)) =
      bimap _ _ _ _ (even-s n) (odd-s n) (evenOrOdd n)
Now at least the code fits on one line! This also raises something interesting: the types are so strict that Agda can actually figure out parts of our programs for us! I’m not sure about you but at this point in time my brain mostly melted :) Because of this I’ll try to avoid using _ and other mechanisms for Agda writing programs for us where I can. The exception of course being situations like the above where it’s necessary for readability’s sake.
One important exception to that rule is for parametric polymorphism. It’s a royal pain to pass around types explicitly everywhere. We’re going to use an Agda feature called “implicit arguments”. You should think of these as arguments for which the _ is inserted automatically. So instead of writing
    foo _ zero zero
We could write
    foo zero zero
This more closely mimics what Haskell does for its parametric polymorphism. To indicate we want something to be an implicit argument, we just wrap it in {} instead of (). So for example, we could rewrite bimap as
    bimap : {A B C D : Set} -> (A -> C) -> (B -> D) -> A ⊎ B -> C ⊎ D
    bimap f g (inj₁ x) = inj₁ (f x)
    bimap f g (inj₂ y) = inj₂ (g y)
To avoid all those underscores.
Another simple function we’ll write is that if we can construct an IsOdd n, we can build an IsEven (suc n).
    oddSuc : (n : ℕ) -> IsOdd n -> IsEven (suc n)
Now this function has two arguments, a number and a term showing that that number is odd. To write this function we’ll actually recurse on the IsOdd term.
    oddSuc .1 odd-o = even-s zero even-z
    oddSuc .(suc (suc n)) (odd-s n p) = even-s (suc n) (oddSuc n p)
Now if we squint hard and ignore those . terms, this looks much like we’d expect. We build the IsEven starting from even-s zero even-z. From there we just recurse and tack on an even-s constructor to scale the IsEven term up by two.
There’s a weird thing going on here though, those . patterns. Those are a nifty little idea in Agda: pattern matching on one thing might force another term to be some value. If we know that our IsOdd n is odd-o, then n must be suc zero. Anything else would just be completely incorrect. To notate these patterns Agda forces you to prefix them with a dot. You should read .Y as “because of X, this must be Y”.
This isn’t an optional choice though, as . patterns may do several wonky things. The most notable is that they often use pattern variables nonlinearly, notice that n appeared twice in our second pattern clause. Without the . this would be very illegal.
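One way to get a feel for why those forced patterns carry no information of their own is to look at a rough Haskell rendering, sketched below under the DataKinds encoding from the previous post. There the number isn’t an argument at all: the proof alone determines the index. The constructor names and the evenSteps helper are hypothetical.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE KindSignatures #-}

import Data.Kind (Type)

data Nat = Z | S Nat

data IsEven :: Nat -> Type where
  EvenZ :: IsEven 'Z
  EvenS :: IsEven n -> IsEven ('S ('S n))

data IsOdd :: Nat -> Type where
  OddO :: IsOdd ('S 'Z)
  OddS :: IsOdd n -> IsOdd ('S ('S n))

-- Matching on OddO forces n to be 1; matching on OddS forces n to be
-- two more than the index of p. GHC works this out from the proof.
oddSuc :: IsOdd n -> IsEven ('S n)
oddSuc OddO     = EvenS EvenZ
oddSuc (OddS p) = EvenS (oddSuc p)

-- Count the EvenS layers so the result can be inspected at runtime.
evenSteps :: IsEven n -> Int
evenSteps EvenZ     = 0
evenSteps (EvenS p) = 1 + evenSteps p
```

Agda makes you write the number down anyway, and the dot is how you admit that you had no choice about what it was.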
As an exercise to the reader, try to write
    evenSuc : (n : ℕ) -> IsEven n -> IsOdd (suc n)
Wrap Up
That wraps up this post which came out much longer than I expected. We’ve now covered enough basics to actually discuss meaningful dependently typed programs. That’s right, we can finally kiss natural numbers good bye in the next post!
Next time we’ll cover writing a small but interesting program and use dependent types to assure ourselves of its correctness.
As always, please comment with any questions :)

          
          



Introduction to Dependent Types: Haskell on Steroids
Danny Gratzer — Mon, 25 Aug 2014 00:00:00 UT

    Posted on August 25, 2014
    


    
    Tags: haskell, types
    


I’d like to start another series of blog posts. This time on something that I’ve wanted to write about for a while, dependent types.
There’s a noticeable lack of accessible materials introducing dependent types at a high level aimed at functional programmers. That’s what this series sets out to help fill. Therefore, if you’re a Haskell programmer and don’t understand something, it’s a bug! Please comment so I can help make this a more useful resource for you :)
There are four parts to this series, each answering one question

What are dependent types?
What does a dependently typed language look like?
What does it feel like to write programs with dependent types?
What does it mean to “prove” something?

So first things first, what are dependent types? Most people by now have heard the unhelpful quick answer

A dependent type is a type that depends on a value, not just other types.

But that’s not helpful! What does this actually look like? To try to understand this we’re going to write some Haskell code that pushes us as close as we can get to dependent types in Haskell.
Kicking GHC in the Teeth
Let’s start with the flurry of extensions we need
    {-# LANGUAGE DataKinds            #-}
    {-# LANGUAGE KindSignatures       #-}
    {-# LANGUAGE GADTs                #-}
    {-# LANGUAGE TypeFamilies         #-}
    {-# LANGUAGE UndecidableInstances #-}
Now our first definition is a standard formulation of natural numbers
    data Nat = Z | S Nat
Here Z represents 0 and S means + 1. So you should read S Z as 1, S (S Z) as 2 and so on and so on.
If you’re having some trouble, this function to convert an Int to a Nat might help
    -- Naively assume n >= 0
    toNat :: Int -> Nat
    toNat 0 = Z
    toNat n = S (toNat $ n - 1)
We can use this definition to formulate addition
    plus :: Nat -> Nat -> Nat
    plus Z n     = n
    plus (S n) m = S (plus n m)
This definition proceeds by “structural induction”. That’s a scary word that pops up around dependent types. It’s not all that complicated, all that it means is that we use recursion only on strictly smaller terms.
There is a way to formally define smaller: if a term is a constructor applied to several (recursive) arguments, then any argument to the constructor is strictly smaller than the original term. In a strict language, if we restrict ourselves to only structural recursion we’re guaranteed that our function will terminate. This isn’t quite the case in Haskell since we have infinite structures.
    toInt :: Nat -> Int
    toInt (S n) = 1 + toInt n
    toInt Z     = 0

    bigNumber = S bigNumber

    main = print (toInt bigNumber) -- Uh oh!
Often people will cheerfully ignore this part of Haskell when talking about reasoning with Haskell and I’ll stick to that tradition (for now).
Now back to the matter at hand. Since our definition of Nat is quite straightforward, it gets promoted to the kind level by DataKinds.
Now we can “reflect” values back up to this new kind with a second GADTed definition of natural numbers.
    data RNat :: Nat -> * where
      RZ :: RNat Z
      RS :: RNat n -> RNat (S n)
Now, let’s precisely specify the somewhat handwavy term “reflection”. I’m using it in the imprecise sense, meaning that we’ve lifted a value into something isomorphic at the type level. Later, when we talk about reflection, we’ll precisely mean lifting a value into the type level. That’s currently not possible since we can’t have values in our types!
What on earth could that be useful for? Well with this we can do something fancy with the definition of addition.
    type family Plus n m :: Nat where
      Plus Z n     = n
      Plus (S n) m = S (Plus n m)
Now we’ve reflected our definition of addition to the type family. More than that, what we’ve written above is fairly obviously correct. We can now force our value level definition of addition to respect this type family
    plus' :: RNat n -> RNat m -> RNat (Plus n m)
    plus' RZ n     = n
    plus' (RS n) m = RS (plus' n m)
Now if we messed up this definition we’d get a type error!
    plus' :: RNat n -> RNat m -> RNat (Plus n m)
    plus' RZ n     = n
    plus' (RS n) m = plus' n m -- Unification error! n ~ S n
Super! We now have types that express strict guarantees about our program. But how usable is this?
To put it to the test, let’s try to write some code that reads two integers from standard input and prints their sum.
We can easily do this with our normal plus
    readNat :: IO Nat
    readNat = toNat <$> readLn

    main :: IO ()
    main = print . toInt =<< (plus <$> readNat <*> readNat)
Easy as pie! But what about RNat, how can we convert a Nat to an RNat? Well we could try something with type classes I guess
    class Reify a where
      type N
      reify :: a -> RNat N
But wait, that doesn’t work since we can only have one instance for all Nats. What if we did the opposite
    class Reify (n :: Nat) where
      nat :: RNat n -> Nat
This lets us go in the other direction... but that doesn’t help us! In fact there’s no obvious way to propagate runtime values back into the types. We’re stuck.
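As an aside, there is a less obvious workaround in today’s GHC: hide the index behind an existential, so a runtime Nat becomes “an RNat n for some n we can’t name”. This is not the route the rest of this post takes, and SomeRNat, toSomeRNat, plusSome and someToInt are all hypothetical names for this sketch.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE TypeFamilies #-}

import Data.Kind (Type)

data Nat = Z | S Nat

data RNat :: Nat -> Type where
  RZ :: RNat 'Z
  RS :: RNat n -> RNat ('S n)

type family Plus (n :: Nat) (m :: Nat) :: Nat where
  Plus 'Z n     = n
  Plus ('S n) m = 'S (Plus n m)

plus' :: RNat n -> RNat m -> RNat (Plus n m)
plus' RZ n     = n
plus' (RS n) m = RS (plus' n m)

-- An RNat whose index is known to exist but is hidden from the type.
data SomeRNat where
  SomeRNat :: RNat n -> SomeRNat

-- Runtime values go in; the index is chosen as we recurse.
toSomeRNat :: Nat -> SomeRNat
toSomeRNat Z     = SomeRNat RZ
toSomeRNat (S n) = case toSomeRNat n of
  SomeRNat r -> SomeRNat (RS r)

-- plus' still checks against Plus, but the result index is hidden again.
plusSome :: SomeRNat -> SomeRNat -> SomeRNat
plusSome (SomeRNat a) (SomeRNat b) = SomeRNat (plus' a b)

someToInt :: SomeRNat -> Int
someToInt (SomeRNat r) = go r
  where
    go :: RNat n -> Int
    go RZ     = 0
    go (RS n) = 1 + go n
```

The catch is visible in the type of toSomeRNat: nothing connects the Nat we passed in to the index that comes out, which is precisely the link we were hoping the types would express.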
GHC with Iron Dentures
Now, if we could add some magical extension to GHC could we write something like the above program? Yes of course! The key idea is to not reflect up our types with data kinds, but rather just allow the values to exist in the types on their own.
For these I propose two basic ideas

A special reflective function type
Lifting expressions into types

For our special function types, we allow the return type to use the supplied value. These are called pi types. We’ll give this the following syntax
    (x :: A) -> B x
Where A :: * and B :: A -> * are some sort of type. Notice that the A in B’s kind isn’t the data kind promoted version, but just the honest to goodness normal value.
Now in order to allow B to actually make use of its supplied value, our second idea lets normal types be indexed on values! Just like how GADTs can be indexed on types. We’ll call these GGADTs.
So let’s define a new version of RNat
    data RNat :: Nat -> * where
      RZ :: RNat Z
      RS :: RNat n -> RNat (S n)
This looks exactly like what we had before, but our semantics are different now. Those Z’s and S’s are meant to represent actual values, not members of some kind. There’s no promoting types to singleton kinds anymore, just plain old values being held in fancier types.
Because we can depend on normal values, we don’t even have to use our simple custom natural numbers.
    data RInt :: Int -> * where
      RZ :: RInt 0
      RS :: RInt n -> RInt (1 + n)
Notice that we allowed our types to call functions, like +. This can potentially be undecidable, something that we’ll address later.
Now we can write our function with a combination of these two ideas
    toRInt :: (n :: Int) -> RInt n
    toRInt 0 = RZ
    toRInt n = RS (toRInt $ n - 1)
Notice how we used pi types to change the return type dependent on the input value. Now we can feed this any old value, including ones we read from standard input.
    main = print . toInt $ plus' <$> fmap toRInt readLn <*> fmap toRInt readLn
Now, one might wonder how the typechecker could possibly know how to handle such things, after all how could it know what’ll be read from stdin!
The answer is that it doesn’t. When a value is reflected to the type level we can’t do anything with it. For example, if we had a type like
    (n :: Int) -> (if n == 0 then Bool else ())
Then we would have to pattern match on n at the value level to propagate information about n back to the type level.
If we did something like
    foo :: (n :: Int) -> (if n == 0 then Bool else ())
    foo n = case n of
      0 -> True
      _ -> ()
Then the typechecker would see that we’re matching on n, so if we get into the 0 -> ... branch then n must be 0. It can then reduce the return type to if 0 == 0 then Bool else () and finally Bool. A very important thing to note here is that the typechecker doesn’t evaluate the program. It’s examining the function in isolation of all other values. This means we sometimes have to hold its hand to ensure that it can figure out that all branches have the correct type.
This means that when we use pi types we often have to pattern match on our arguments in order to help the typechecker figure out what’s going on.
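Real GHC can already act out a version of this story if we go back to the DataKinds encoding from the first half of the post. The sketch below is only an approximation of the faux-Haskell foo above: Ret is a hypothetical type family standing in for the if in the return type, and the argument is an RNat rather than a plain Int.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE TypeFamilies #-}

import Data.Kind (Type)

data Nat = Z | S Nat

data RNat :: Nat -> Type where
  RZ :: RNat 'Z
  RS :: RNat n -> RNat ('S n)

-- The return type computed from the index: Bool at zero, () otherwise.
type family Ret (n :: Nat) :: Type where
  Ret 'Z     = Bool
  Ret ('S n) = ()

-- In the RZ branch GHC learns n ~ 'Z and reduces Ret n to Bool;
-- in the RS branch it learns n ~ 'S m and reduces Ret n to ().
foo :: RNat n -> Ret n
foo RZ     = True
foo (RS _) = ()
```

Without the pattern match, Ret n is stuck and neither True nor () would typecheck, which is the same hand-holding described above.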
To make this clear, let’s play the typechecker for the two functions below. I’m reverting to the Nat type since it’s nicer for pattern matching.
    toRNat :: (n :: Nat) -> RNat n
    toRNat Z = RZ -- We know that n is `Z` in this branch
    toRNat (S n) = RS (toRNat n {- This has the type RNat n' -})

    p :: (n :: Nat) -> (m :: Nat) -> RNat (plus n m)
    p Z m     = toRNat m
    p (S n) m = RS (p n m)
First the type checker goes through toRNat.
In the first branch we have n equals Z, so RZ trivially typechecks. Next we have the case S n.

We know that toRNat n has the type RNat n' by induction
We also know that S n' = n.
Therefore RS builds us a term of type RNat n.

Now for p. We start in much the same manner.
If we enter the p Z m case

we know that n is Z.
we can reduce plus n m since plus Z m is by definition equal to m (look at the definition of plus to confirm this).
We know how to produce RNat m easily since we have a function toRNat :: (n :: Nat) -> RNat n.
We can apply this to m and the resulting term has the type RNat m.

In the p (S n) m case we know that we’re trying to produce a term of type RNat (plus (S n) m).

Now since we know the constructor of the first argument of plus, we can reduce plus (S n) m to S (plus n m) by the definition of plus.
We’re looking to build a term of type RNat (plus n m) and that’s as simple as a recursive call.
From here we just need to apply RS to give us RNat (S (plus n m))
As we previously noted S (plus n m) is equal to plus (S n) m

Notice how as we stepped through this as the typechecker we never needed to do any arbitrary reductions. We only ever reduce definitions when we have the outer constructor (WHNF) of one of the arguments.
While I’m not actually proposing adding {-# LANGUAGE PiTypes #-} to GHC, it’s clear that with only a few orthogonal additions to System F we can get some seriously cool types.
Wrap Up
Believe it or not we’ve just gone through two of the most central concepts in dependent types

Indexed type families (GGADTs)
Dependent function types (Pi types)

Not so bad was it? :) From here we’ll look in the next post how to translate our faux Haskell into actual Agda code. From there we’ll go through a few more detailed examples of pi types and GGADTs by poking through some of the Agda standard library.
Thanks for reading, I must run since I’m late for class. It’s an FP class ironically enough.

          
          



Equality is Hard
Danny Gratzer — Wed, 06 Aug 2014 00:00:00 UT

    Posted on August  6, 2014
    


    
    Tags: types
    


Equality seems like one of the simplest things to talk about in a theorem prover. After all, the notion of equality is something any small child can intuitively grasp. The sad bit is, while it’s quite easy to hand-wave about, how equality is formalized seems to be a rather complex topic.
In this post I’m going to attempt to cover a few of the main different means of “equality proofs” or identity types and the surrounding concepts. I’m opting for a slightly more informal approach in the hopes of covering more ground.
Definitional Equality
This is not really an equality type per se, but it’s worth stating explicitly what definitional equality is since I must refer to it several times throughout this post.
That two terms A and B are definitionally equal is a judgment notated
Γ ⊢ A ≡ B
This is not a user level proof but rather a primitive, untyped judgment in the meta-theory of the language itself. The typing rules of the language will likely include a rule along the lines of
Γ ⊢ A ≡ B, Γ ⊢ x : A
————————————————————–
     Γ ⊢ x : B
So this isn’t an identity type you would prove something with, but a much more magical notion that two things are completely the same to the typechecker.
Now in most type theories we have a slightly more powerful notion of definitional equality, where x ≡ y holds not only when x is y by definition but also when the two agree by computation.
So in Coq for example
(2 + 2) ≡ 4
Even though “definitionally” these are entirely separate entities. In most theories, definitionally equal means “inlining all definitions and with normalization”, but not all.
In type theories that distinguish between the two, the judgment that when normalized x is y is called judgmental equality. I won’t distinguish between the two further because most don’t, but it’s worth noting that they can be seen as separate concepts.
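GHC has a small-scale version of “equal by computation” that you can poke at. In the sketch below, which assumes the type-level naturals from GHC.TypeLits and the (:~:) type from Data.Type.Equality, the typechecker accepts Refl only because it normalizes 2 + 2 to 4 on its own; twoPlusTwo is a made-up name.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE TypeOperators #-}

import Data.Type.Equality ((:~:) (Refl))
import GHC.TypeLits

-- Refl demands that both sides be the same type. We never tell GHC
-- how to add; it computes 2 + 2 while checking this definition.
twoPlusTwo :: (2 + 2) :~: 4
twoPlusTwo = Refl

-- Changing the right-hand side to 5 should make this definition a
-- type error, since 2 + 2 does not compute to 5.
```

Keep in mind this is only an analogy: GHC’s type-level arithmetic is far weaker than the normalization a theorem prover like Coq performs.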
Propositional Equality
This is the sort of equality that we’ll spend the rest of our time discussing. Propositional equality is a particular type constructor with the type/kind
Id : (A : Set) → A → A → Type
We should be able to prove a number of definitions like
reflexivity  : (A : Set)(x     : A) → Id x x
symmetry     : (A : Set)(x y   : A) → Id x y → Id y x
transitivity : (A : Set)(x y z : A) → Id x y → Id y z → Id x z
This is an entirely separate issue from definitional equality since propositional equality is a concept that users can hypothesize about.
One very important difference is that we can make proofs like
sanity : Id 1 2 → ⊥
Since the identity proposition is a type family which can be used just like any other proposition. This is in stark contrast to definitional equality which a user can’t even normally utter!
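To make “a type family you can prove things with” a bit more concrete, here’s a sketch of the same shape in Haskell. It’s a loose analogy only: Haskell has no value-indexed types, so this Id relates two types rather than two values of a type A, and transport is a hypothetical helper showing what a proof lets you do.

```haskell
{-# LANGUAGE GADTs #-}
{-# LANGUAGE KindSignatures #-}

import Data.Kind (Type)

-- The only way to build an Id a b is Refl, which forces a and b to agree.
data Id :: Type -> Type -> Type where
  Refl :: Id a a

reflexivity :: Id a a
reflexivity = Refl

-- Matching on Refl teaches the typechecker that a and b coincide,
-- after which Refl itself has the required type.
symmetry :: Id a b -> Id b a
symmetry Refl = Refl

transitivity :: Id a b -> Id b c -> Id a c
transitivity Refl Refl = Refl

-- A proof is an ordinary value we can pass around and consume.
transport :: Id a b -> a -> b
transport Refl x = x
```

Each proof here is a perfectly normal program, which is the sense in which users can state and hypothesize about this kind of equality.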
Intensional
This is arguably the simplest form of equality. Identity types are just normal inductive types with normal induction principles. The most common is the equality given by Martin-Löf
data Id (A : Set) : A → A → Type where
   Refl : (x : A) → Id x x
This yields a simple induction principle
id-ind : (P : (x y : A) → Id x y → Type)
       → ((x : A) → P x x (Refl x))
       → (x y : A)(p : Id x y) → P x y p
In other words, if we can prove that P holds for the reflexivity case, then P holds for any x and y where Id x y.
We can actually phrase Id in a number of ways, including
data Id (A : Set)(x : A) : A → Set where
  Refl : Id x x
This really makes a difference in the resulting induction principle
j : (A : Set)(x : A)(P : (y : A) → Id x y → Set)
  → P x Refl
  → (y : A)(p : Id x y) → P y p
This clearly turned out a bit differently! In particular now P is only parametrized over one value of A, y. This particular elimination is traditionally named j.
These alternative phrasings can have serious impacts on proofs that use them. It also has even more subtle effects on things like heterogeneous equality which we’ll discuss later.
The fact that this only relies on simple inductive principles is also a win for typechecking. Equality/substitution fall straight out of how normal inductive types are handled! This also means that we can keep decidability within reason.
The price we pay of course is that this is much more painful to work with. An intensional identity type means the burden of constructing our equality proofs falls on users. Furthermore, we lose the ability to talk about observational equality.
Observational equality is the idea that two “thingies” are indistinguishable by any test.
It’s clear that we can prove that if Id x y, then f x = f y, but it’s less clear how to go the other way and prove something like
fun_ext : (A B : Set)(f g : A → B)
         → ((x : A) → Id (f x) (g x)) → Id f g
fun_ext f g p = ??
Even though this is clearly desirable. If we know that f and g behave exactly the same way, we’d like our equality to be able to state that. However, we don’t know that f and g are constructed the same way, making this impossible to prove.
This can be introduced as an axiom but to maintain our inductively defined equality type we have to sacrifice one of the following

Coherence
Inductive types
Extensionality
Decidability

Some of this has been avoided by regarding equality as an induction over the class of types as in Martin-Löf’s intuitionistic type theory.
In the type theory that we’ve outlined, this isn’t expressible sadly.
Definitional + Extensional
Some type theories go a different route to equality, giving us back the extensionality in the process. One of those type theories is extensional type theory.
In the simplest formulation, we have intensional type theory with a new rule, reflection
Γ ⊢ p : Id x y
——————————————
  Γ ⊢ x ≡ y
This means that our normal propositional equality can be shoved back into the more magical definitional equality. This gives us a lot more power, all the typechecker’s magic and support for definitional equality can be used with our equality types!
It isn’t all puppies and kittens though, arbitrary reflection can also make things undecidable in general. For example Martin-Löf’s system is undecidable with extensional equality.
It’s worth noting that no extensional type theory is implemented this way. Instead they’ve taken a different approach to defining types themselves!
In this model of ETT types are regarded as a partial equivalence relation (PER) over unityped (untyped if you want to get in a flamewar) lambda calculus terms.
These PERs precisely reflect the extensional equality at that “type” and we then check membership by reflexivity. So a : T is synonymous with (a, a) ∈ T. Notice that since we are dealing with a PER, we know that ∀ a. (a, a) ∈ T need not hold. This is reassuring, otherwise we’d be able to prove that every type was inhabited by every term!
The actual NuPRL & friends theory is a little more complicated than that. It’s not entirely dependent on PERs and allows a few different ways of introducing types, but I find that PERs are a helpful idea.
Propositional Extensionality
This is another flavor of extensional type theory which is really just intensional type theory plus some axioms.
We can arrive at this type theory in a number of ways, the simplest is to add axiom K
k : (A : Set)(x : A)(P : (x : A) → Id x x → Type)
  → P x (Refl x) → (p : Id x x) → P x p
This says that for any property P, if we can prove P x (Refl x), then P holds for any proof of Id x x. This is subtly different from straightforward induction on Id because here the property is parameterized over only one value of A, not two.
This is horribly inconsistent in something like homotopy type theory but lends a bit of convenience to theories where we don’t give Id as much meaning.
Using k we can prove that for any p q : Id x y, then Id p q. In Agda notation
    prop : (A : Set)(x y : A)(p q : x ≡ y)
         → p ≡ q
    prop A x .x refl q = k A P (λ _ → refl) x q
      where P : (x : A) → x ≡ x → Set
            P _ p = refl ≡ p
This can be further refined to show that all proofs of Id x x are Refl x, so we can eliminate them
    rec : (A : Set)(P : A → Set)(x y : A)(p : P x) → x ≡ y → P y
    rec A P x .x p refl = p

    rec-refl-is-useless : (A : Set)(P : A → Set)(x : A)
                        → (p : P x)(eq : x ≡ x) → p ≡ rec A P x x p eq
    rec-refl-is-useless A P x p eq with prop A x x eq refl
    rec-refl-is-useless A P x p .refl | refl = refl
This form of extensional type theory still leaves a clear distinction between propositional equality and definitional equality by avoiding a reflection rule. However, with rec-refl-is-useless we can do many of the same things: whenever we have something that matches on an equality proof, we can just remove it.
We essentially have normal propositional equality, but with the knowledge that things can only be equal in 1 way, up to propositional equality!
Heterogeneous Equality
The next form of equality we’ll talk about is slightly different than previous ones. Heterogeneous equality is designed to co-exist in some other type theory and supplement the existing form of equality.
Heterogeneous equality is most commonly defined with John Major equality
    data JMeq : (A B : Set) → A → B → Set where
      JMrefl : (A : Set)(x : A) → JMeq A A x x
This is termed after a British politician since while it promises that any two terms can be equal regardless of their class (type), only two things from the same class can ever be equal.
Now remember how earlier I’d mentioned that how we phrase these inductive equality types can have a huge impact? Well, here we can see that because the above definition doesn’t typecheck in Agda!
That’s because Agda is predicative, meaning that a type constructor can’t quantify over the same universe it occupies. We can, however, cleverly phrase JMeq so as to avoid this
    data JMeq (A : Set) : (B : Set) → A → B → Set where
      JMrefl : (a : A) → JMeq A A a a
Now the constructor avoids quantifying over Set and therefore fits inside the same universe as A and B.
JMeq is usually paired with an axiom to reflect heterogeneous equality back into our normal equality proof.
reflect : (A : Set)(x y : A) → JMeq x y → Id x y
This reflection doesn’t look necessary, but arises for similar reasons that dictate that k is unprovable.
It looks like this heterogeneous equality is a lot more trouble than it’s worth at first. It really shines when we’re working with terms that we know must be the same, but require pattern matching or other jiggering to prove.
If you’re looking for a concrete example, look no further than Observational Equality Now!. This paper allows observational equality to be jammed into a principally intensional system!
Wrap Up
So this has been a whirlwind tour through a lot of different type theories. I partially wrote this to gather some of this information in one (free) place. If there’s something here missing that you’d like to see added, feel free to comment or email me.
Thanks to Jon Sterling for proof reading and many subtle corrections :)

          
          



Many Shades of Halting Oracles
Danny Gratzer — Wed, 30 Jul 2014 00:00:00 UT

    Posted on July 30, 2014
    


    
    Tags: types
    


I’m going to take a quick break from arguing with people on the internet to talk about a common point of confusion with theorem provers.
People will often state things like “A program in Coq never diverges” or that “we must prove that X halts”. To an outsider, that sounds impossible! After all, isn’t the halting problem undecidable?
Now the thing to realize is that while yes the halting problem is undecidable, we’re not solving it. The halting problem essentially states

For an arbitrary Turing machine P, there is no algorithm guaranteed to terminate that will return true if P halts and false if it diverges.

In theorem provers, we cleverly avoid this road block with two simple tricks. I’m going to discuss these in the context of Coq but these ideas generalize between most theorem provers.
Being Negative
A program in Coq must halt. To do otherwise would introduce a logical inconsistency. So to enforce this we need to statically decide whether some program halts.
We just said that this is impossible though! To escape this paradox Coq opts for a simple idea: reject good programs.
Rather than guaranteeing to return true for every good program, we state that we’ll definitely reject all bad programs and then some.
For example, this termination checker would be logically consistent
    terminates :: CoqProgram -> Bool
    terminates _ = False
It’d be useless of course, but consistent. Coq therefore accepts a certain set of programs which are known to terminate. For example, ones that limit themselves only to guarded coinduction or structural induction.
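To make the "reject all bad programs and then some" idea a bit more concrete, here is a slightly less useless checker for a toy language. Everything here is invented for illustration (the Prog type, its constructors, the rules); it has nothing to do with how Coq's checker is actually implemented.

```haskell
-- A toy imperative language, made up for this sketch.
data Prog
  = Skip              -- do nothing
  | Seq Prog Prog     -- run one program, then another
  | Repeat Int Prog   -- run the body a fixed number of times
  | While Prog        -- loop with an exit condition we know nothing about

-- A conservative check: True means "definitely halts",
-- False only means "I couldn't tell".
terminates :: Prog -> Bool
terminates Skip         = True
terminates (Seq p q)    = terminates p && terminates q
terminates (Repeat _ p) = terminates p
terminates (While _)    = False  -- may well halt, but we refuse to guess
```

Plenty of While loops halt just fine, so this rejects good programs, but it never accepts one that diverges.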
Getting Our Hands Dirty
While it may be impossible to decide the termination of an arbitrary program, it’s certainly possible to prove the termination of a specific program.
When Coq’s heuristics fail, we can always resort to manually proving that our code will terminate. This may not be pleasant, but it’s certainly doable. By lifting the burden off Coq, we go from “constructing an arbitrary proof of termination” to “checking an arbitrary proof of termination”, which is decidable.
In Coq we can do this with well founded recursion. Simply put, well founded recursion means that we shift from using only term “size” to decide what’s a smaller recursive call to any nice binary relation. If you’re not interested in Coq specifically, you can check out your preferred proof assistant’s formalization of well founded recursion.
To this end, we define a relation for some type A : Set, R : A -> A -> Prop. Read R x y as x is smaller than y.
Now we must show that this relation preserves some definition of “sanity”. This should mean that when a function receives x, for any y such that R y x, we should be able to recurse on y. This should also mean that there’s no infinite stack of terms such that R x y, R z x, R w z, … because this would mean we could recurse infinitely. To capture this idea, we must prove well_founded A R. What’s this “well founded” thing you say?
Well it’s just
Definition well_founded (A : Type) (R : A -> A -> Prop) := forall a : A, Acc R a.
This Acc thing means “accessible”,
Inductive Acc (A : Type) (R : A -> A -> Prop) (x : A) : Prop :=
    Acc_intro : (forall y : A, R y x -> Acc R y) -> Acc R x.
So something is accessible in R if everything less than it is also accessible.
We can easily prove that if R is well_founded there is no infinite chain that could lead us to infinite recursion.
Section founded.
  Variable A : Set.
  Variable R : A -> A -> Prop.

  Variable well_founded : well_founded R.

  CoInductive stream :=
  | Cons : A -> stream -> stream.

  CoInductive tower_of_bad : stream -> Prop :=
    OnTop : forall x y rest,
            R y x ->
            tower_of_bad (Cons y rest) ->
            tower_of_bad (Cons x (Cons y rest)).

  Lemma never_on_top :
    forall x, forall rest, ~ tower_of_bad (Cons x rest).
    intro; induction (well_founded x); inversion 1; try subst;
    match goal with
        [H : context[~ _] |- _ ] => eapply H; eauto
    end.
  Qed.

  Theorem no_chains :
    forall xs, ~ tower_of_bad xs.
    destruct 1; eapply never_on_top; eauto.
  Qed.
End founded.
We’re using a powerful trick in never_on_top, we’re inducting upon Acc! This is the key to using well founded recursion. By inducting upon the Acc instead of one of the terms of our function, we can easily recurse on any subterm y, if R y x.
This is handed to us by the lovely Fix (uppercase).
Fix : well_founded R ->
    forall P : A -> Type,
      (forall x : A, (forall y : A, R y x -> P y) -> P x) ->
      forall x : A, P x
So Fix is the better, cooler version of structural recursion that we were after. It lets us recurse on any y where R y x.
So in some sense, you can view Coq’s Fixpoint as just a specialization of Fix where R x y means that x is a subterm of y.
Wrap Up
So in conclusion, theorem provers don’t do the impossible. Rather they have a small battery of tricks to cheat the impossible general case and simplify common cases.
Back to the internet I go.

          
          



A Tutorial on Church Representations
Danny Gratzer — Sat, 19 Jul 2014 00:00:00 UT

    Posted on July 19, 2014
    


    
    Tags: haskell, types
    


I’ve written a few times about church representations, but never aimed at someone who’d never heard of what a church representation is. In fact, it doesn’t really seem like too many people have!
In this post I’d like to fix that :)
What is a Church Representation
Simply put, a church representation (CR) is a way of representing a piece of concrete data with a function. The CR can be used in a way identical to the concrete data, but it’s comprised entirely of functions.
They were originally described by Alonzo Church as a way of modeling all data in lambda calculus, where all we have is functions.
Tuples
The simplest CR I’ve found is that of tuples.
Let’s first look at our basic tuple API
    type Tuple a b = ...
    mkTuple :: a -> b -> Tuple a b
    fst     :: Tuple a b -> a
    snd     :: Tuple a b -> b
Now this is trivially implemented with (,)
    type Tuple a b = (a, b)
    mkTuple = (,)
    fst     = Prelude.fst
    snd     = Prelude.snd
The church representation preserves the interface, but changes all the underlying implementations.
    type Tuple a b = forall c. (a -> b -> c) -> c
There’s our church pair, notice that it’s only comprised of ->. It also makes use of higher rank types. This means that a Tuple a b can be applied to a function producing any c and it must return something of that type.
Let’s look at how the rest of our API is implemented
    mkTuple a b = \f -> f a b
    fst tup     = tup (\a _ -> a)
    snd tup     = tup (\_ b -> b)
And that’s it!
It’s helpful to step through some reductions here
    fst (mkTuple 1 2)
    fst (\f -> f 1 2)
    (\f -> f 1 2) (\a _ -> a)
    (\a _ -> a) 1 2
    1
And for snd
    snd (mkTuple True False)
    snd (\f -> f True False)
    (\f -> f True False) (\_ b -> b)
    (\_ b -> b) True False
    False
So we can see that these are clearly morally equivalent. The only real question here is whether, for each CR tuple, there exists a normal tuple. This isn’t immediately apparent since the function type for the CR looks a lot more general. In fact, the key to this proof lies in the forall c part, this extra polymorphism lets us use a powerful technique called “parametricity” to prove that they’re equivalent.
I won’t actually go into such a proof now since it’s not entirely relevant, but it’s worth noting that both (,) and Tuple are completely isomorphic.
To convert between them is pretty straightforward
    isoL :: Tuple a b -> (a, b)
    isoL tup = tup (,)

    isoR :: (a, b) -> Tuple a b
    isoR (a, b) = \f -> f a b
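If you'd like to load the tuple code into GHCi in one piece, here's a sketch that should do it. Two things are my own choices rather than part of the post: I've wrapped the function in a newtype (it keeps GHC's higher rank type inference out of the way), and I've renamed fst and snd to first and second so they don't clash with the Prelude.

```haskell
{-# LANGUAGE RankNTypes #-}

-- The church pair, wrapped in a newtype (an assumption of this sketch,
-- the post itself uses a plain type synonym).
newtype Tuple a b = Tuple { runTuple :: forall c. (a -> b -> c) -> c }

mkTuple :: a -> b -> Tuple a b
mkTuple a b = Tuple (\f -> f a b)

-- Hypothetical names standing in for fst and snd.
first :: Tuple a b -> a
first tup = runTuple tup (\a _ -> a)

second :: Tuple a b -> b
second tup = runTuple tup (\_ b -> b)

-- The two directions of the isomorphism with (,).
toPair :: Tuple a b -> (a, b)
toPair tup = runTuple tup (,)

fromPair :: (a, b) -> Tuple a b
fromPair (a, b) = mkTuple a b
```

Round-tripping a pair through fromPair and toPair should hand back the pair you started with.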
Now that we have an idea of how church representations “work” let’s go through a few more examples to start to see a pattern.
Booleans
Booleans have the simplest API of all
    type Boolean = ...
    true  :: Boolean
    false :: Boolean
    test  :: Boolean -> a -> a -> a
We can build all other boolean operations on test
    a && b = test a b false
    a || b = test a true b
    when t e = test t e (return ())
This API is quite simple to implement with Bool,
    type Boolean = Bool

    true  = True
    false = False
    test b t e = if b then t else e
But how could we represent this with functions? The answer stems from test,
    type Boolean = forall a. a -> a -> a
Clever readers will notice this is almost identical to test, a boolean gets two arguments and returns one or the other.
    true  = \a _ -> a
    false = \_ b -> b
    test b t e = b t e
We can write an isomorphism between Bool and Boolean as well
    isoL :: Bool -> Boolean
    isoL b = if b then true else false

    isoR :: Boolean -> Bool
    isoR b = test b True False
Lists
Now let’s talk about lists. One of the interesting things is lists are the first recursive data type we’ve dealt with so far.
Defining the API for lists isn’t entirely clear either. We want a small set of functions that can easily cover any conceivable operations for a list.
The simplest way to do this is to realize that we can do exactly 3 things with lists.

Make an empty list
Add a new element to the front of an existing list
Pattern match on them

We can represent this with 3 functions
    type List a = ...

    nil   :: List a
    cons  :: a -> List a -> List a
    match :: List a -> b -> (a -> List a -> b) -> b
If match looks confusing just remember that
    f list = match list g h
Is really the same as
    f []       = g
    f (x : xs) = h x xs
In this way match is just the pure functional version of pattern matching. We can actually simplify the API by realizing that rather than this awkward match construct, we can use something cleaner.
foldr forms a much more pleasant API to work with since it’s really the most primitive form of “recursing” on a list.
    match :: List a -> b -> (a -> List a -> b) -> b
    match list b f = fst $ foldr list worker (b, nil)
      where worker x (_, xs) = (f x xs, cons x xs)
The especially nice thing about foldr is that it doesn’t mention List a in its two “destruction” functions, all the recursion is handled in the implementation.
We can implement CR lists trivially using foldr
    type List a = forall b. (a -> b -> b) -> b -> b

    nil = \ _ nil -> nil
    cons x xs = \ cons nil -> x `cons` xs cons nil
    foldr list cons nil = list cons nil
Notice that we handle the recursion in the list type by having a b as an argument? This is similar to how the accumulator to foldr gets the processed tail of the list. This is a common technique for handling recursion in our church representations.
Last but not least, the isomorphism arises from foldr (:) [],
    isoL :: List a -> [a]
    isoL l = l (:) []

    isoR :: [a] -> List a
    isoR l f z = foldr f z l
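Here's the list code gathered into one loadable sketch, including match recovered from the fold. As assumptions of mine: the representation is wrapped in a newtype, the fold is called foldList to avoid the Prelude's foldr, and tailOr is a made-up client that exists only to exercise match.

```haskell
{-# LANGUAGE RankNTypes #-}

-- The church list: a list is its own right fold.
newtype List a = List { foldList :: forall b. (a -> b -> b) -> b -> b }

nil :: List a
nil = List (\_ z -> z)

cons :: a -> List a -> List a
cons x xs = List (\c z -> c x (foldList xs c z))

-- match built from the fold: carry the rebuilt tail alongside the result.
match :: List a -> b -> (a -> List a -> b) -> b
match list b f = fst (foldList list worker (b, nil))
  where worker x (_, xs) = (f x xs, cons x xs)

toList :: List a -> [a]
toList l = foldList l (:) []

fromList :: [a] -> List a
fromList = foldr cons nil

-- A hypothetical client of match: drop the first element, if there is one.
tailOr :: List a -> List a
tailOr l = match l nil (\_ xs -> xs)
```

Note that match has to rebuild the whole tail as it goes, which is a decent hint as to why the fold is the more natural primitive here.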
Either
The last case that we’ll look at is Either. Like Tuple, Either has 3 different operations.
    type Or a b = ...
    inl :: a -> Or a b
    inr :: b -> Or a b

    or :: Or a b -> (a -> c) -> (b -> c)  -> c
This is pretty easy to implement with Either
    type Or a b = Either a b
    inl = Left
    inr = Right

    or (Left a)  f g = f a
    or (Right b) f g = g b
Once again, the trick to encoding this as a function falls right out of the API. In this case we use the type of or
    type Or a b = forall c. (a -> c) -> (b -> c) -> c

    inl a = \f g -> f a
    inr b = \f g -> g b

    or x = x
Last but not least, let’s quickly rattle off our isomorphism.
    isoL :: Or a b -> Either a b
    isoL o = o Left Right

    isoR :: Either a b -> Or a b
    isoR (Left a)  = inl a
    isoR (Right b) = inr b
The Pattern
So now we can talk about the underlying pattern in CRs. First remember that for any type T, we have a list of n distinct constructors T1 T2 T3…Tn. Each of the constructors has m fields T11, T12, T13…
Now the church representation of such a type T is
    forall c.  (T11 -> T12 -> T13 -> .. -> c)
            -> (T21 -> T22 -> T23 -> .. -> c)
            ...
            -> (Tn1 -> Tn2 -> Tn3 -> .. -> c)
            -> c
This pattern doesn’t map quite as nicely to recursive types. Here we have to take the extra step of substituting all occurrences of T for c in our resulting church representation.
This is actually such a pleasant pattern to work with that I’ve written a library for automatically reifying a type between its church representation and concrete form.
Wrap Up
Hopefully you now understand what a church representation is. It’s worth noting that a lot of stuff Haskellers stumble upon daily are really church representations in disguise.
My favorite example is maybe, this function takes a success and failure continuation with a Maybe and produces a value. With a little bit of imagination, one can realize that this is really just a function mapping a Maybe to a church representation!
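To spell that out, here's a sketch of the church representation the pattern gives for Maybe, with maybe doing the conversion. The names Option, none, some, toOption and fromOption are mine, and the newtype wrapper is again just there to keep GHC happy.

```haskell
{-# LANGUAGE RankNTypes #-}

-- One argument per constructor: a plain c for Nothing, an a -> c for Just.
newtype Option a = Option { runOption :: forall c. c -> (a -> c) -> c }

none :: Option a
none = Option (\n _ -> n)

some :: a -> Option a
some x = Option (\_ s -> s x)

-- maybe, with its arguments shuffled, is exactly this conversion.
toOption :: Maybe a -> Option a
toOption m = Option (\n s -> maybe n s m)

fromOption :: Option a -> Maybe a
fromOption o = runOption o Nothing Just
```

Applying runOption to a default and a function behaves just like maybe does on the corresponding Maybe value.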
If you’re thinking that CRs are pretty cool, now might be a good time to take a look at one of my previous posts on deriving them automagically.

          
          



Examining Hackage: extensible-effects
Danny Gratzer — Tue, 15 Jul 2014 00:00:00 UT

    Posted on July 15, 2014
    


    
    Tags: haskell
    


I had a few people tell me after my last post that they would enjoy a write up on reading extensible-effects so here goes.
I’m going to document my process of reading through and understanding how extensible-effects is implemented. Since this is a fairly large library (about 1k lines of code), we’re not going over all of it. Rather we’re just reviewing the core modules and enough of the extra ones to get a sense for how everything is implemented.
If you’re curious or still have questions, the modules that we don’t cover should serve as a nice place for further exploration.
Which Modules
extensible-effects comes with quite a few modules, my find query reveals
$ find src -name "*.hs"
  src/Data/OpenUnion1.hs
  src/Control/Eff/Reader/Strict.hs
  src/Control/Eff/Reader/Lazy.hs
  src/Control/Eff/Fresh.hs
  src/Control/Eff/Cut.hs
  src/Control/Eff/Exception.hs
  src/Control/Eff/State/Strict.hs
  src/Control/Eff/State/Lazy.hs
  src/Control/Eff/Writer/Strict.hs
  src/Control/Eff/Writer/Lazy.hs
  src/Control/Eff/Coroutine.hs
  src/Control/Eff/Trace.hs
  src/Control/Eff/Choose.hs
  src/Control/Eff/Lift.hs
  src/Control/Eff.hs
  src/Control/Eff/Reader/Strict.hs
Whew! Well I’m going to take a leap and assume that extensible-effects is similar to the mtl in the sense that there are a few core modules, and then a bunch of “utility” modules. So there’s Control.Monad.Trans and then Control.Monad.State and a bunch of other implementations of MonadTrans.
If we assume extensible-effects is formatted like this, then we need to look at

Data.OpenUnion1
Control.Eff

And maybe a few other modules to get a feel for how to use these two. I’ve added Data.OpenUnion1 because it’s imported by Control.Eff so is presumably important.
Since Data.OpenUnion1 is at the top of our dependency DAG, we’ll start with it.
Data.OpenUnion1
So we’re starting with Data.OpenUnion1. If the authors of this code have stuck to normal Haskell naming conventions, that’s an open union of type constructors, stuff with the kind * -> *.
Happily, this module has an export list so we can at least see what’s public.
    module Data.OpenUnion1( Union (..)
                          , SetMember
                          , Member
                          , (:>)
                          , inj
                          , prj
                          , prjForce
                          , decomp
                          , unsafeReUnion
                          ) where
So we’re looking at a data type Union, which we export everything for. Two type classes SetMember and Member, a type operator :>, and a handful of functions, most likely to work with Union.
So let’s figure out exactly what this union thing is
data Union r v = forall t. (Functor t, Typeable1 t) => Union (t v)
So Union r v is just a wrapper around some functor applied to v. This seems a little odd, what’s this r thing? The docs hint that Member t r should always hold.
Member is a type class of two parameters with no members. In fact, grepping the entire source reveals that the entire definition and instances for Member in this code base are
    infixr 1 :>
    data ((a :: * -> *) :> b)

    class Member t r
    instance Member t (t :> r)
    instance Member t r => Member t (t' :> r)
So this makes it a bit clearer, :> acts like a type level cons and Member just checks for membership!
Now Union makes a bit more sense, especially in light of the inj function
    inj :: (Functor t, Typeable1 t, Member t r) => t v -> Union r v
    inj = Union
So Union takes some t in r and hides it away in an existential applied to v. Now this is kinda like having a great nested bunch of Eithers with every t applied to v.
Dual to inj, we can define a projection from a Union to some t in r. This will need to return something wrapped in Maybe since we don’t know which member of r our Union is wrapping.
    prj :: (Typeable1 t, Member t r) => Union r v -> Maybe (t v)
    prj (Union v) = runId <$> gcast1 (Id v)
prj does some evil Typeable casts, but this is necessary since we’re throwing away all our type information with that existential. That Id runId pair is needed since gcast1 has the type
    -- In our case, `c ~ Id`
    gcast1 :: (Typeable t', Typeable t) => c (t a) -> Maybe (c (t' a))
They’re just defined as
    newtype Id a = Id { runId :: a }
      deriving Typeable
so just like Control.Monad.Identity.
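To convince myself that the cast really does what I think it does, here's a cut down sketch that should load on a recent GHC. It is not the library's code: I've dropped the phantom r and the Member constraint entirely, and used the modern poly-kinded Typeable in place of Typeable1.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

import Data.Typeable (Typeable, gcast1)

-- A simplified union: some functor t applied to v, with t hidden away.
-- The type level list r from the real library is left out here.
data Union v = forall t. (Functor t, Typeable t) => Union (t v)

newtype Id a = Id { runId :: a }

inj :: (Functor t, Typeable t) => t v -> Union v
inj = Union

-- Ask for a particular t back. gcast1 compares the hidden type
-- constructor with the requested one and gives Nothing on a mismatch.
prj :: Typeable t => Union v -> Maybe (t v)
prj (Union v) = runId <$> gcast1 (Id v)
```

Projecting at the type constructor that was injected succeeds, and projecting at any other one gives Nothing. In the real library the Member constraint additionally limits which t you may inject or ask for.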
Now let’s try to figure out what this SetMember thing is.
    class Member t r => SetMember set (t :: * -> *) r | r set -> t
    instance SetMember set t r => SetMember set t (t' :> r)
This is unhelpful, all we have is the recursive step with no base case! Resorting to grep reveals that our base case is defined in Control.Eff.Lift so we’ll temporarily put this class off until then.
Now the rest of the file is defining a few functions to operate over Unions.
First up is an unsafe “forced” version of prj.
    infixl 4 <?>

    (<?>) :: Maybe a -> a -> a
    Just a <?> _ = a
    _ <?> a = a

    prjForce :: (Typeable1 t, Member t r) => Union r v -> (t v -> a) -> a
    prjForce u f = f <$> prj u <?> error "prjForce with an invalid type"
prjForce is really exactly what it says on the label, it’s a version of prj that throws an exception if we’re in the wrong state of Union.
Next is a way of unsafely rejiggering the type level list that Union is indexed over.
    unsafeReUnion :: Union r w -> Union t w
    unsafeReUnion (Union v) = Union v
We need this for our last function, decomp. This function partially unfolds our Union into an Either
    decomp :: Typeable1 t => Union (t :> r) v -> Either (Union r v) (t v)
    decomp u = Right <$> prj u <?> Left (unsafeReUnion u)
This provides a way to actually do some sort of induction on r by breaking out each type piece by piece with some absurd case for when we don’t have a :> b.
That about wraps up this little Union library, let’s move on to see how this is actually used.
Control.Eff
Now let’s talk about the core of extensible-effects, Control.Eff. As always we’ll start by taking a look at the export list
    module Control.Eff(
                        Eff (..)
                      , VE (..)
                      , Member
                      , SetMember
                      , Union
                      , (:>)
                      , inj
                      , prj
                      , prjForce
                      , decomp
                      , send
                      , admin
                      , run
                      , interpose
                      , handleRelay
                      , unsafeReUnion
                      ) where
So right away we can see that we’re re-exporting stuff from Data.OpenUnion1 as well as several new things, including the infamous Eff.
The first definition we come across in this module is VE. VE is either a simple value or a Union applied to a VE!
    data VE r w = Val w | E !(Union r (VE r w))
Right away we notice that “pure value or X” pattern we see with free monads and other abstractions over effects.
We also include a quick function to try to extract a pure value from Vals
    fromVal :: VE r w -> w
    fromVal (Val w) = w
    fromVal _ = error "extensible-effects: fromVal was called on a non-terminal effect."
Now we’ve finally reached the definition of Eff!
    newtype Eff r a = Eff { runEff :: forall w. (a -> VE r w) -> VE r w }
So Eff bears a striking resemblance to Cont. There are two critical differences though, first is that we specialize our return type to something constructed with VE r. The second crucial difference is that by universally quantifying over w we sacrifice a lot of the power of Cont, including callCC!
Next in Control.Eff is the instances for Eff
    instance Functor (Eff r) where
        fmap f m = Eff $ \k -> runEff m (k . f)
        {-# INLINE fmap #-}

    instance Applicative (Eff r) where
        pure = return
        (<*>) = ap

    instance Monad (Eff r) where
        return x = Eff $ \k -> k x
        {-# INLINE return #-}

        m >>= f = Eff $ \k -> runEff m (\v -> runEff (f v) k)
        {-# INLINE (>>=) #-}
Notice that these are all really identical to Cont’s instances. Functor adds a function to the head of the continuation. Monad dereferences m and feeds the result into f. Exactly as with Cont.
Next we can look at our primitive function for handling effects
    send :: (forall w. (a -> VE r w) -> Union r (VE r w)) -> Eff r a
    send f = Eff (E . f)
I must admit, this tripped me up for a while. Here’s how I read it, “provide a function, which when given a continuation for the rest of the program expecting an a, produces a side effecting VE r w and we’ll map that into Eff”.
Remember how Union holds functors? Well each of our effects must act as a functor and wrap itself in that union. By being open, we get the “extensible” in extensible-effects.
Next we look at how to remove effects once they’ve been added to our set of effects. In mtl-land, this is similar to the collection of runFooT functions that are used to gradually strip a layer of transformers away.
The first step towards this is to transform the CPS-ed effectful computation Eff, into a more manageable form, VE
    admin :: Eff r w -> VE r w
    admin (Eff m) = m Val
This is a setup step so that we can traverse the “tree” of effects that our Eff monad built up for us.
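What helped me here was shrinking everything down to a toy where the pieces fit on one screen. In the sketch below a single fixed functor f stands in for Union r, and the Log effect, logMsg and runLog are all invented for illustration. Only the shapes of VE, Eff, send and admin follow the library.

```haskell
{-# LANGUAGE RankNTypes #-}

-- One fixed effect functor f stands in for Union r (a simplification).
data VE f w = Val w | E (f (VE f w))

newtype Eff f a = Eff { runEff :: forall w. (a -> VE f w) -> VE f w }

instance Functor (Eff f) where
  fmap g m = Eff (\k -> runEff m (k . g))

instance Applicative (Eff f) where
  pure x = Eff (\k -> k x)
  mf <*> mx = Eff (\k -> runEff mf (\g -> runEff mx (k . g)))

instance Monad (Eff f) where
  m >>= g = Eff (\k -> runEff m (\v -> runEff (g v) k))

send :: (forall w. (a -> VE f w) -> f (VE f w)) -> Eff f a
send g = Eff (\k -> E (g k))

admin :: Eff f w -> VE f w
admin (Eff m) = m Val

-- A made-up logging effect: a message plus the rest of the computation.
data Log w = Log String w

logMsg :: String -> Eff Log ()
logMsg s = send (\k -> Log s (k ()))

-- Walk the tree that admin exposes, collecting messages along the way.
runLog :: Eff Log a -> ([String], a)
runLog = go . admin
  where
    go (Val x)          = ([], x)
    go (E (Log s rest)) = let (ms, x) = go rest in (s : ms, x)

example :: Eff Log Int
example = do
  logMsg "start"
  logMsg "done"
  return 42
```

Here admin example is the tree E (Log "start" (E (Log "done" (Val 42)))), and runLog just folds over it.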
Next, we know that we can take an Eff with no effects and unwrap it into a pure value. This is the “base case” for running an effectful computation.
    run :: Eff () w -> w
    run = fromVal . admin
Concerned readers may notice that we’re using a partial function, this is OK since the E case is “morally impossible” since there is no t so that Member t () holds.
Next is the function to remove just one effect from an Eff
    handleRelay :: Typeable1 t
                => Union (t :> r) v -- ^ Request
                -> (v -> Eff r a)   -- ^ Relay the request
                -> (t v -> Eff r a) -- ^ Handle the request of type t
                -> Eff r a
    handleRelay u loop h = either passOn h $ decomp u
      where passOn u' = send (<$> u') >>= loop
Next to send, this function gave me the most trouble. The trick was to realize that decomp will leave us in two cases.

Some effect producing a v, Union r v
A t producing a v, t v

If we have a t v, then we’re all set since we know exactly how to map that to an Eff r a with h.
Otherwise we need to take this effect and add it back into our computation. send (<$> u') takes the rest of the computation, that continuation, and feeds it the v that we know our effects produce. This gives us the type Eff r v, where that outer Eff r contains our most recent effect as well as everything else. Now to convert this to an Eff r a we need to transform that v to an a. The only way to do that is to use the supplied loop function so we just bind to that.
Last but not least is a function to modify an effect somewhere in our effectful computation. A grep reveals that we’ll see this later with things like local from Control.Eff.Reader, for example.
To do this we want something like handleRelay but without removing t from r. We also need to generalize the type so that t can be anywhere in our set of effects. Otherwise we’ll have to prematurely solidify our stack of effects to use something like modify.
    interpose :: (Typeable1 t, Functor t, Member t r)
              => Union r v
              -> (v -> Eff r a)
              -> (t v -> Eff r a)
              -> Eff r a
    interpose u loop h = maybe (send (<$> u) >>= loop) h $ prj u
Now this is almost identical to handleRelay except instead of using decomp which will split off t and only works when r ~ t :> r', we use prj! This gives us a Maybe and since the type of u doesn’t need to change we just recycle that for the send (<$> u) >>= loop sequence.
That wraps up the core of extensible-effects, and I must admit that when writing this I was still quite confused as to how to actually use Eff to implement new effects. Reading a few examples really helped clear things up for me.
Control.Eff.State
The State monad has always been the sort of classic monad example so I suppose we’ll start here.
    module Control.Eff.State.Lazy( State (..)
                                 , get
                                 , put
                                 , modify
                                 , runState
                                 , evalState
                                 , execState
                                 ) where
So we’re not reusing the State from Control.Monad.State but providing our own. It looks like
    data State s w = State (s -> s) (s -> w)
So what is this supposed to do? Well that s -> w looks like a continuation of sorts: it takes the state s and produces the resulting value. The s -> s looks like something that modify should use.
Indeed this is the case
    modify :: (Typeable s, Member (State s) r) => (s -> s) -> Eff r ()
    modify f = send $ \k -> inj $ State f $ \_ -> k ()

    put :: (Typeable e, Member (State e) r) => e -> Eff r ()
    put = modify . const
We grab the continuation from send and add a State effect on top which uses our modification function f. The continuation that State takes ignores the value it’s passed, the current state, and instead feeds the rest of the program the () it’s expecting.
get is defined in a similar manner, but instead of modifying the state, we use State’s continuation to feed the program the current state.
    get :: (Typeable e, Member (State e) r) => Eff r e
    get = send (inj . State id)
So we grab the continuation, feed it to a State id which won’t modify the state, and then inject that into our open union of effects.
Now that we have the API for working with states, let’s look at how to remove that effect.
    runState :: Typeable s
             => s                     -- ^ Initial state
             -> Eff (State s :> r) w  -- ^ Effect incorporating State
             -> Eff r (s, w)          -- ^ Effect containing final state and a return value
    runState s0 = loop s0 . admin where
     loop s (Val x) = return (s, x)
     loop s (E u)   = handleRelay u (loop s) $
                           \(State t k) -> let s' = t s
                                           in loop s' (k s')
runState first preps our effect to be pattern matched on with admin. We then start loop with the initial state.
loop has two components, if we have run into a value, then we don’t interpret any effects, just stick the state and value together and return them.
If we do have an effect, we use handleRelay to split out the State s from our effects. To handle the case where we get a VE w, we just loop with the current state. However, if we get a State t k, we update the state with t and pass the new state to the continuation k.
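To see the shape of runState without the open-union machinery, here is a minimal sketch. It assumes a single State effect and uses a plain tree in place of the CPS-ed Eff, so there is nothing to relay. The names StateProg, modifyP, putP, getP, runStateP and counter are made up for this illustration and are not part of extensible-effects.

```haskell
-- Simplified stand-in for Eff with exactly one effect (hypothetical types).
data State s w = State (s -> s) (s -> w)

-- Either a finished value or one State request whose continuation
-- produces the rest of the program.
data StateProg s a = Val a | E (State s (StateProg s a))

instance Functor (StateProg s) where
  fmap f (Val a)         = Val (f a)
  fmap f (E (State t k)) = E (State t (fmap f . k))

instance Applicative (StateProg s) where
  pure = Val
  pf <*> pa = pf >>= \f -> fmap f pa

instance Monad (StateProg s) where
  Val a         >>= f = f a
  E (State t k) >>= f = E (State t (\s -> k s >>= f))

-- Same idea as the library's modify: ignore the state, resume with ().
modifyP :: (s -> s) -> StateProg s ()
modifyP f = E (State f (\_ -> Val ()))

putP :: s -> StateProg s ()
putP = modifyP . const

-- Same idea as get: leave the state alone, resume with the state itself.
getP :: StateProg s s
getP = E (State id Val)

-- The loop from runState, minus handleRelay since there is nothing to relay.
runStateP :: s -> StateProg s a -> (s, a)
runStateP s (Val x)         = (s, x)
runStateP s (E (State t k)) = let s' = t s in runStateP s' (k s')

counter :: StateProg Int Int
counter = do
  modifyP (+ 1)
  x <- getP
  putP (x * 10)
  return x
```

Running runStateP 0 counter walks the tree one request at a time and ends with the pair (10, 1): the final state first, then the returned value, just like the (s, w) in runState’s type.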
From runState we get evalState and execState.
    evalState :: Typeable s => s -> Eff (State s :> r) w -> Eff r w
    evalState s = fmap snd . runState s

    execState :: Typeable s => s -> Eff (State s :> r) w -> Eff r s
    execState s = fmap fst . runState s
That wraps up the interface for Control.Eff.State. The nice bit is this makes it a lot clearer how to use send, handleRelay and a few other functions from the core.
Control.Eff.Reader
Now we’re on to Reader. The interesting thing here is that local highlights how to use interpose properly.
As always, we start by looking at what exactly this module provides
    module Control.Eff.Reader.Lazy( Reader (..)
                                  , ask
                                  , local
                                  , reader
                                  , runReader
                                  ) where
The definition of Reader is refreshingly simple
    newtype Reader e v = Reader (e -> v)
Keen readers will note that this is just half of the State definition which makes sense; Reader is half of State.
ask is defined almost identically to get
    ask :: (Typeable e, Member (Reader e) r) => Eff r e
    ask = send (inj . Reader)
We just feed the continuation for the program into Reader. A simple wrapper over this gives our equivalent of asks
    reader :: (Typeable e, Member (Reader e) r) => (e -> a) -> Eff r a
    reader f = f <$> ask
Next up is local, which is the most interesting bit of this module.
    local :: (Typeable e, Member (Reader e) r)
          => (e -> e)
          -> Eff r a
          -> Eff r a
    local f m = do
      e <- f <$> ask
      let loop (Val x) = return x
          loop (E u) = interpose u loop (\(Reader k) -> loop (k e))
      loop (admin m)
So local starts by grabbing the view of the environment we’re interested in, e. From there we define our worker function which looks a lot like runState. The key difference is that instead of using handleRelay we use interpose to replace each Reader effect with the appropriate environment. Remember that interpose is not going to remove Reader from the set of effects, just update each Reader effect in the current computation.
Finally, we simply rejigger the computation with admin and feed it to loop.
In fact, this is very similar to how runReader works!
    runReader :: Typeable e => Eff (Reader e :> r) w -> e -> Eff r w
    runReader m e = loop (admin m)
      where
        loop (Val x) = return x
        loop (E u) = handleRelay u loop (\(Reader k) -> loop (k e))
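The difference between local and runReader is easier to see with the union stripped away. Below is a small sketch assuming Reader is the only effect, so both interpose and handleRelay collapse into a plain pattern match. ReaderProg, askP, localP, runReaderP and demo are hypothetical names for this illustration, not the library’s API.

```haskell
newtype Reader e v = Reader (e -> v)

-- Simplified stand-in for Eff with Reader as the only effect.
data ReaderProg e a = Val a | E (Reader e (ReaderProg e a))

instance Functor (ReaderProg e) where
  fmap f (Val a)        = Val (f a)
  fmap f (E (Reader k)) = E (Reader (fmap f . k))

instance Applicative (ReaderProg e) where
  pure = Val
  pf <*> pa = pf >>= \f -> fmap f pa

instance Monad (ReaderProg e) where
  Val a        >>= f = f a
  E (Reader k) >>= f = E (Reader (\e -> k e >>= f))

askP :: ReaderProg e e
askP = E (Reader Val)

-- Like local: answer every Reader request inside m with the modified
-- environment, but stay inside ReaderProg (the effect is not removed).
localP :: (e -> e) -> ReaderProg e a -> ReaderProg e a
localP f m = do
  e <- f <$> askP
  let loop (Val x)        = return x
      loop (E (Reader k)) = loop (k e)
  loop m

-- Like runReader: answer every request and leave the effect behind.
runReaderP :: ReaderProg e a -> e -> a
runReaderP (Val x)        _ = x
runReaderP (E (Reader k)) e = runReaderP (k e) e

demo :: ReaderProg Int (Int, Int, Int)
demo = do
  a <- askP
  b <- localP (+ 1) askP
  c <- askP
  return (a, b, c)
```

With this, runReaderP demo 10 gives (10, 11, 10): only the ask wrapped in localP sees the modified environment.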
Control.Eff.Lift
Now between Control.Eff.Reader and Control.Eff.State I felt I had a pretty good handle on most of what I’d read in extensible-effects. There was just one remaining loose end: SetMember. Don’t remember what that was? It was a class in Data.OpenUnion1 that was conspicuously absent of detail or use.
I finally found where it seemed to be used! In Control.Eff.Lift.
First let’s poke at the exports of this module
    module Control.Eff.Lift( Lift (..)
                           , lift
                           , runLift
                           ) where
This module is designed to lift an arbitrary monad into the world of effects. There’s a caveat though: since monads aren’t necessarily commutative, the order in which we run them is very important. Imagine for example the difference between IO (m a) and m (IO a).
So to ensure that Eff can support lifted monads we have to do some evil things. First, we must require that we never have two lifted monads; second, we must always run the lifted monad last. This is a little icky but its usefulness outweighs such ugliness.
To ensure condition 1, we need SetMember.
    instance SetMember Lift (Lift m) (Lift m :> ())
So we define a new instance of SetMember. Basically this says that any Lift is a SetMember ... r iff Lift m is the last item in r.
To ensure condition number two we define runLift with the more restrictive type
    runLift :: (Monad m, Typeable1 m) => Eff (Lift m :> ()) w -> m w
We can now look into exactly how Lift is defined.
    data Lift m v = forall a. Lift (m a) (a -> v)
So this Lift acts sort of like a “suspended bind”. We postpone actually binding the monad and simulate doing so with a continuation a -> v.
We can define our one operation with Lift, lift.
    lift :: (Typeable1 m, SetMember Lift (Lift m) r) => m a -> Eff r a
    lift m = send (inj . Lift m)
This works by suspending the rest of the program in our faux binding, to be unwrapped later in runLift.
    runLift :: (Monad m, Typeable1 m) => Eff (Lift m :> ()) w -> m w
    runLift m = loop (admin m) where
     loop (Val x) = return x
     loop (E u) = prjForce u $ \(Lift m' k) -> m' >>= loop . k
The one interesting difference between this function and the rest of the run functions we’ve seen is that here we use prjForce. The reason for this is that we know that r is just Lift m :> (). This drastically simplifies the process and means all we’re essentially doing is transforming each Lift into >>=.
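The “suspended bind” idea can be tried out on its own. The sketch below assumes Lift m is the only effect, which is exactly the situation runLift insists on, and uses a plain tree instead of Eff. LiftProg, liftP, runLiftP and demo are names invented for this example.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

-- A monadic action paired with the continuation that consumes its result.
data Lift m v = forall a. Lift (m a) (a -> v)

-- Simplified stand-in for Eff (Lift m :> ()).
data LiftProg m a = Val a | E (Lift m (LiftProg m a))

instance Functor (LiftProg m) where
  fmap f (Val a)        = Val (f a)
  fmap f (E (Lift m k)) = E (Lift m (fmap f . k))

instance Applicative (LiftProg m) where
  pure = Val
  pf <*> pa = pf >>= \f -> fmap f pa

instance Monad (LiftProg m) where
  Val a        >>= f = f a
  E (Lift m k) >>= f = E (Lift m (\a -> k a >>= f))

-- Suspend the action; nothing runs until runLiftP.
liftP :: m a -> LiftProg m a
liftP m = E (Lift m Val)

-- Turn every suspended bind into a real >>= in the underlying monad.
runLiftP :: Monad m => LiftProg m a -> m a
runLiftP (Val x)        = return x
runLiftP (E (Lift m k)) = m >>= runLiftP . k

demo :: LiftProg Maybe Int
demo = do
  x <- liftP (Just 1)
  y <- liftP (Just 2)
  return (x + y)
```

Here runLiftP demo evaluates to Just 3, and swapping either Just for Nothing makes the whole thing Nothing, since each Lift node really does become a >>= in Maybe.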
That wraps up our tour of the module and with it, extensible-effects.
Wrap Up
This post turned out a lot longer than I’d expected, but I think it was worth it. We’ve gone through the coroutine/continuation based core of extensible-effects and walked through a few different examples of how to actually use them.
If you’re still having some trouble putting the pieces together, the rest of extensible effects is a great collection of useful examples of building effects.
I hope you had as much fun as I did with this one!
Thanks to Erik Rantapaa a much longer post than I led him to believe

          
          



Examining Hackage: logict
Danny Gratzer — Thu, 10 Jul 2014 00:00:00 UT

Posted on July 10, 2014

Tags: haskell

One of my oldest habits with programming is reading other people’s code. I’ve been doing it almost since I started programming. For the last two years that habit has been focused on Hackage. Today I was reading the source code to the “logic programming monad” provided by logict and wanted to blog about how I go about reading new Haskell code.
This time the code was pretty tiny, find . -name *.hs | xargs wc -l reveals two files with just under 400 lines of code! logict also only has two dependencies, base and the mtl, so there’s not a big worry of unfamiliar libraries.
Setting Up
It’s a lot easier to read this post if you have the source for logict on hand. To grab it, use cabal get. My setup is something like
~ $ cabal get logict
~ $ cd logict-0.6.0.2
~/logict-0.6.0.2 $ cabal sandbox init
~/logict-0.6.0.2 $ cabal install --only-dependencies
Poking Around
I’m somewhat ashamed to admit that I use pretty primitive tooling for exploring a new codebase, it’s grep and find all the way! If you use a fancy IDE, perhaps you can just skip this section and take a moment to sit back and feel high-tech.
First things first is to figure out what Haskell files are here. This can differ from what’s listed on Hackage, since libraries often don’t export every module.
~/logict-0.6.0.2 $ find . -name *.hs
  ./dist/build/autogen/Paths_logict.hs
  ./Control/Monad/Logic.hs
  ./Control/Monad/Logic/Class.hs
Alright, there are two source files and one file sitting in dist. The dist one is almost certainly just cabal auto-gened stuff that we don’t care about.
It also appears that there’s no src directory and every module is publicly exported! This means that we only have two modules to worry about.
The next thing to figure out is which to read first. In this case the choice is simple: grepping for imports with
grep "import" -r Control
reveals that Control.Monad.Logic imports Control.Monad.Logic.Class so we start with *.Class.
Reading Control.Monad.Logic.Class
Alright! Now it’s actually time to start reading code.
The first thing that jumps out is the export list
    module Control.Monad.Logic.Class (MonadLogic(..), reflect, lnot) where
Alright, so we’re exporting everything from a class MonadLogic, as well as two functions reflect and lnot. Let’s go figure out what MonadLogic is.
    class (MonadPlus m) => MonadLogic m where
      msplit     :: m a -> m (Maybe (a, m a))
      interleave :: m a -> m a -> m a
      (>>-)      :: m a -> (a -> m b) -> m b
      ifte       :: m a -> (a -> m b) -> m b -> m b
      once       :: m a -> m a
The fact that this depends on MonadPlus is pretty significant. Since most classes don’t require this, I’m going to assume that it’s fairly key to either the implementation of some of these methods or to using them, similar to how Monoid is critical to Writer.
The docs make it pretty clear what each member of this class does

msplit
Take a logical computation and split it into its first result and another computation that computes the rest.
interleave
This is the key difference between MonadLogic and []. interleave gives fair choice between two computations. This means that every result that appears in finitely many applications of msplit for some a and b will appear in finitely many applications of msplit to interleave a b.
>>-
>>- is similar to interleave. Consider some code like
  (a >>= k) `mplus` (b >>= k)
This is equivalent to mplus a b >>= k, but has different characteristics since >>= might never terminate. >>- is described as “considering both sides of the disjunction”.
I have absolutely no idea what that means.. hopefully it’ll be clearer once we look at some implementations.
ifte
This is the equivalent of Prolog’s soft cut. We poke a logical computation and if it can succeed at all, then we feed it into the success computation; otherwise we return the failure case.
once
once is a clever combinator to prevent backtracking. It will grab the first result from a computation, wrap it up and return it. This prevents backtracking further on the original computation.

Now the docs also state that everything is derivable from msplit. These implementations look like
    interleave m1 m2 = msplit m1 >>=
                        maybe m2 (\(a, m1') -> return a `mplus` interleave m2 m1')

    m >>- f = do (a, m') <- maybe mzero return =<< msplit m
                 interleave (f a) (m' >>- f)

    ifte t th el = msplit t >>= maybe el (\(a,m) -> th a `mplus` (m >>= th))

    once m = do (a, _) <- maybe mzero return =<< msplit m
                return a
The first thing I notice looking at interleave is that it kinda looks like
    interleave' :: [a] -> [a] -> [a]
    interleave' (x:xs) ys = x : interleave' ys xs
    interleave' _ ys      = ys
This makes sense, since this will fairly split between xs and ys just like interleave is supposed to. Here msplit is like pattern matching, mplus is :, and we have to sprinkle some return in there for kicks and giggles.
Now about this mysterious >>-: the biggest difference is that each f a is interleaved, rather than mplus-ed. This should mean that results are fairly split between those of our first result, f a, and the rest of them, m' >>- f. So something like
    (m >>- f) `interleave` (m' >>- f)
should have nice and fair behavior.
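To convince myself about the fairness claim, it helps to specialize the default definitions to plain lists. This is only a sketch: msplitL, interleaveL, fairBind and pairsFrom are names made up here, and msplitL is my guess at what msplit should mean for [], not code copied from logict.

```haskell
import Control.Monad (mplus, mzero)

-- msplit for lists: an empty list has nothing to split off.
msplitL :: [a] -> [Maybe (a, [a])]
msplitL []       = return Nothing
msplitL (x : xs) = return (Just (x, xs))

-- The default interleave, with msplit replaced by msplitL.
interleaveL :: [a] -> [a] -> [a]
interleaveL m1 m2 =
  msplitL m1 >>= maybe m2 (\(a, m1') -> return a `mplus` interleaveL m2 m1')

-- The default >>-, again specialized to lists.
fairBind :: [a] -> (a -> [b]) -> [b]
fairBind m f = do
  (a, m') <- maybe mzero return =<< msplitL m
  interleaveL (f a) (m' `fairBind` f)

-- An infinite stream of results for every input.
pairsFrom :: Int -> [(Int, Int)]
pairsFrom x = [(x, n) | n <- [0 ..]]

unfair, fair :: [(Int, Int)]
unfair = take 4 ([1, 2] >>= pairsFrom)        -- [(1,0),(1,1),(1,2),(1,3)]
fair   = take 4 ([1, 2] `fairBind` pairsFrom) -- [(1,0),(2,0),(1,1),(2,1)]
```

With ordinary >>= the results for 2 are never reached because pairsFrom 1 is infinite, while fairBind alternates between the two streams.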
The next two are fairly clear. ifte splits its computation, and if it can, it feeds the whole stinking thing, return a `mplus` m', to the success computation; otherwise it just returns the failure computation. Nothing stunning.
once is my favorite function. To prevent backtracking all we do is grab the first result and return it.
So that takes care of MonadLogic. The next thing to worry about are these two functions reflect and lnot.
reflect confirms my suspicion that the dual of msplit is mplus (return a) m'.
    reflect :: MonadLogic m => Maybe (a, m a) -> m a
    reflect Nothing = mzero
    reflect (Just (a, m)) = return a `mplus` m
The next function lnot negates a logical computation. Now, this is a little misleading because the negated computation either produces one value, (), or is mzero and produces nothing. This is easily accomplished with ifte and once
    lnot :: MonadLogic m => m a -> m ()
    lnot m = ifte (once m) (const mzero) (return ())
That takes care of most of this file. What’s left is a bunch of MonadLogic instances for monad transformers. There’s nothing too interesting in them so I won’t talk about them here. It might be worth glancing at the code if you’re interested.
One slightly odd thing I’m noticing is that each instance implements all the methods, rather than just msplit. This seems a bit odd.. I guess the default implementations are significantly slower? Perhaps some benchmarking is in order.
Control.Monad.Logic
Now that we’ve finished with Control.Monad.Logic.Class, let’s move on to the main file.
Now we finally see the definition of LogicT
    newtype LogicT m a =
        LogicT { unLogicT :: forall r. (a -> m r -> m r) -> m r -> m r }
I have no idea how this works, but I’m guessing that this is a church version of [a] specialized to some m. Remember that the church version of [a] is
    type CList a = forall r. (a -> r -> r) -> r -> r
Now what’s interesting here is that the church version is strongly connected to how CPSed code works. We could then imagine that mplus works like cons for church lists and yields more and more results. But again, this is just speculation.
This suspicion is confirmed by the functions to extract values out of a LogicT computation
    observeT :: Monad m => LogicT m a -> m a
    observeT lt = unLogicT lt (const . return) (fail "No answer.")

    observeAllT :: Monad m => LogicT m a -> m [a]
    observeAllT m = unLogicT m (liftM . (:)) (return [])

    observeManyT :: Monad m => Int -> LogicT m a -> m [a]
    observeManyT n m
        | n <= 0 = return []
        | n == 1 = unLogicT m (\a _ -> return [a]) (return [])
        | otherwise = unLogicT (msplit m) sk (return [])
     where
     sk Nothing _ = return []
     sk (Just (a, m')) _ = (a:) `liftM` observeManyT (n-1) m'
observeT grabs the a from the success continuation, and if no result is returned then it will evaluate fail "No answer.", which looks like the failure continuation! Looks like our suspicion is confirmed: we’re dealing with monadic church lists or some other permutation of those buzzwords.
Somehow in a package partially designed by Oleg I’m not surprised to find continuations :)
observeAllT is quite similar. Notice that we take advantage of the fact that r is universally quantified to instantiate it to [a]. This quantification is also used in observeManyT. It also prevents any LogicT from taking advantage of the return type to do evil things with returning random values that happen to match the return type. This is what’s possible with ContT for example.
Now we have the standard specialization and smart constructor for the non-transformer version.
    type Logic = LogicT Identity

    logic :: (forall r. (a -> r -> r) -> r -> r) -> Logic a
    logic f = LogicT $ \k -> Identity .
                             f (\a -> runIdentity . k a . Identity) .
                             runIdentity
Look familiar? Now we can inject real church lists into a Logic computation. I suppose this shouldn’t be surprising since [a] functions like a slightly broken Logic a, without any sharing or soft cut.
Now we repeat all the observe* functions for Logic. I’ll omit these since their implementations are exactly as you’d expect and not interesting.
Next we have a few type class instances
    instance Functor (LogicT f) where
        fmap f lt = LogicT $ \sk fk -> unLogicT lt (sk . f) fk

    instance Applicative (LogicT f) where
        pure a = LogicT $ \sk fk -> sk a fk
        f <*> a = LogicT $ \sk fk -> unLogicT f (\g fk' -> unLogicT a (sk . g) fk') fk

    instance Alternative (LogicT f) where
        empty = LogicT $ \_ fk -> fk
        f1 <|> f2 = LogicT $ \sk fk -> unLogicT f1 sk (unLogicT f2 sk fk)

    instance Monad (LogicT m) where
        return a = LogicT $ \sk fk -> sk a fk
        m >>= f = LogicT $ \sk fk -> unLogicT m (\a fk' -> unLogicT (f a) sk fk') fk
        fail _ = LogicT $ \_ fk -> fk
It helps for reading this if you expand sk to “success continuation” and fk to “failure continuation”. Since we’re dealing with church lists I suppose you could also use cons and nil.
What’s particularly interesting to me here is that there are no constraints on m for these type class declarations! Let’s go through them one at a time.
Functor is usually pretty mechanical, and this is no exception. Here we just have to change a -> m r -> m r to b -> m r -> m r. This is trivial just by composing the success computation with f.
Applicative is similar. pure just lifts a value into the church equivalent of a singleton list, [a]. <*> is a little bit more meaty: we first unwrap f to its underlying function g, and compose it with our success continuation for a. Notice that this is very similar to how Cont works; continuation passing style is necessary with church representations.
Now return and fail are pretty straightforward. Though this is interesting because since pattern matching calls fail, we can just do something like
    do
      Just a <- m
      Just b <- n
      return $ a + b
And we’ll run n and m until we get a Just value.
As for >>=, its implementation is very similar to <*>. We unwrap m and then feed the unwrapped a into f and run that with our success computations.
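To play with these instances without the transformer, here is a cut-down sketch with the m dropped entirely, so r stands directly for the answer type. CL, observeAll, sums and justs are names invented for this example. I’m also assuming a modern GHC where fail lives in MonadFail rather than Monad, so that instance is written separately, and <*> is defined via ap to keep things short.

```haskell
{-# LANGUAGE RankNTypes #-}

import Control.Applicative (Alternative (..))
import Control.Monad (ap)

-- LogicT with the underlying monad removed: a bare church list.
newtype CL a = CL { unCL :: forall r. (a -> r -> r) -> r -> r }

instance Functor CL where
  fmap f m = CL (\sk fk -> unCL m (sk . f) fk)

instance Applicative CL where
  pure a = CL (\sk fk -> sk a fk)
  (<*>) = ap

instance Monad CL where
  m >>= f = CL (\sk fk -> unCL m (\a fk' -> unCL (f a) sk fk') fk)

instance Alternative CL where
  empty = CL (\_ fk -> fk)
  m1 <|> m2 = CL (\sk fk -> unCL m1 sk (unCL m2 sk fk))

-- A failed pattern match just invokes the failure continuation.
instance MonadFail CL where
  fail _ = CL (\_ fk -> fk)

-- Instantiate r to [a] and use the real cons and nil.
observeAll :: CL a -> [a]
observeAll m = unCL m (:) []

sums :: CL Int
sums = do
  x <- pure 1 <|> pure 2
  y <- pure 10 <|> pure 20
  return (x + y)

justs :: CL Int
justs = do
  Just a <- pure (Just 1) <|> pure Nothing <|> pure (Just 3)
  return (a * 2)
```

observeAll sums gives [11,21,12,22], the same order the list monad would produce, and observeAll justs gives [2,6] because the Nothing branch falls through to fail and backtracks.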
We’re only going to talk about one more instance for LogicT, MonadLogic, there are a few others but they’re mostly for MTL use and not too interesting.
    instance (Monad m) => MonadLogic (LogicT m) where
        msplit m = lift $ unLogicT m ssk (return Nothing)
         where ssk a fk = return $ Just (a, (lift fk >>= reflect))
We’re only implementing msplit here, which strikes me as a bit odd since we implemented everything before. We also actually need Monad m here so that we can use LogicT’s MonadTrans instance.
To split a LogicT, we run a special success computation and return Nothing if failure is ever called. Now there’s one more clever trick here, since we can choose what the r is in m r, we choose it to be Maybe (a, LogicT m a)! That way we can take the failure case, which essentially is just the tail of the list, and push it into reflect.
This confused me a bit so I wrote the equivalent version for church lists, where msplit is just uncons.
    {-# LANGUAGE RankNTypes #-}

    newtype CList a = CList {runCList :: forall r. (a -> r -> r) -> r -> r}

    cons :: a -> CList a -> CList a
    cons a (CList list) = CList $ \cs nil -> cs a (list cs nil)

    nil :: CList a
    nil = CList $ \cons nil -> nil

    head :: CList a -> Maybe a
    head list = runCList list (const . Just) Nothing

    uncons :: CList a -> Maybe (a, CList a)
    uncons (CList list) = list skk Nothing
      where skk a rest = Just (a, maybe nil (uncurry cons) rest)
Now it’s a bit clearer what’s going on, skk just pairs up the head of the list with the rest. However, since the tail of the list has the type m (Maybe (a, LogicT m a)), we lift it back into the LogicT monad and use reflect to smush it back into a good church list.
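As a sanity check, the church list version can be exercised with a pair of conversion functions. This is a self-contained sketch: fromList, toList and headAndTail are helpers I’m adding for the example, and the local function is called step here rather than skk.

```haskell
{-# LANGUAGE RankNTypes #-}

newtype CList a = CList { runCList :: forall r. (a -> r -> r) -> r -> r }

nil :: CList a
nil = CList (\_ n -> n)

cons :: a -> CList a -> CList a
cons a (CList list) = CList (\c n -> c a (list c n))

-- Hypothetical helpers for moving between real lists and church lists.
fromList :: [a] -> CList a
fromList = foldr cons nil

toList :: CList a -> [a]
toList (CList list) = list (:) []

-- r is instantiated to Maybe (a, CList a); each step rebuilds the tail
-- from the already-split remainder.
uncons :: CList a -> Maybe (a, CList a)
uncons (CList list) = list step Nothing
  where step a rest = Just (a, maybe nil (uncurry cons) rest)

headAndTail :: Maybe (Int, [Int])
headAndTail = fmap (\(x, rest) -> (x, toList rest)) (uncons (fromList [1, 2, 3]))
```

headAndTail comes out as Just (1,[2,3]), and uncons nil is Nothing. Note that the tail is rebuilt element by element, which is the list analogue of pushing the failure case back through reflect.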
That about covers Control.Monad.Logic
Wrap Up
I’ve never tried sharing these readings before so I hope you enjoyed it. If this receives some positive feedback I’ll do something similar with another package, I’m leaning towards extensible-effects.
If you’re interested in doing this yourself, I highly recommend it! I’ve learned a lot about practical engineering with Haskell, as well as really clever and elegant Haskell code.
One thing I’ve always enjoyed about the Haskell ecosystem is that some of the most interesting code is often quite easy to read given some time.

          
          



Dissecting crush
Danny Gratzer — Wed, 09 Jul 2014 00:00:00 UT

Posted on July 9, 2014

Tags: coq, types

For almost a year and a half now I’ve been referencing one particular book on Coq, Certified Programming with Dependent Types. CPDT is a literate program on building practical things with Coq.
One of the main ideas of CPDT is that proofs ought to be fully automated. This means that a proof should be primarily a logic program (Ltac) which constructs some boring and large proof term. To this end, CPDT has a bunch of Ltac “tactics” for constructing such logic programs.
Since CPDT is a program, there’s actual working source for each of these tactics. It occurred to me today that in my 18 months of blinking uncomprehendingly at CPDT, I’ve never read its source for these tactics.
In this post, we’ll dissect how CPDT’s main tactic for automation, crush, actually works. In the process, we’ll get the chance to explore some nice, compositional, ltac engineering as well as a whole host of useful tricks.
The Code
The first step to figuring out how crush works is actually finding where it’s defined.
After downloading the source to CPDT I ran
grep "Ltac crush :=" -r .
And found in src/CpdtTactics, line 205
Ltac crush := crush' false fail.
Glancing at crush', I’ve noticed that it pulls in almost every tactic in CpdtTactics. Therefore, we’ll start at the top of this file and work our way down, dissecting each tactic as we go.
Incidentally, since CpdtTactics is an independent file, if you’re confused about something, firing up your coq dev environment of choice and trying things out with Goal inline works nicely.
Starting from the top, our first tactic is inject.
Ltac inject H := injection H; clear H; intros; try subst.
This is just a quick wrapper around injection, which also does the normal operations one wants after calling injection. It clears the original hypothesis and brings our new equalities into our environment so future tactics can use them. It also tries to swap out any variables with our new equalities using subst. Notice the try wrapper since subst is one of those few tactics that will fail if it can’t do anything useful.
Next up is
Ltac appHyps f :=
  match goal with
    | [ H : _ |- _ ] => f H
  end.
appHyps makes use of the backtracking nature of match goal with. It’ll try f on each hypothesis in the current environment and stop once it finds a hypothesis f works with.
Now we get to some combinators for working with hypotheses.
Ltac inList x ls :=
  match ls with
    | x => idtac
    | (_, x) => idtac
    | (?LS, _) => inList x LS
  end.
inList takes a faux-list of hypotheses and looks for an occurrence of a particular lemma x. When it finds it we just run idtac, which does nothing. In the case where we can’t match x anywhere, inList will just fail with the standard “No matching clause” message.
Next we have the equivalent of appHyps for tupled lists
Ltac app f ls :=
  match ls with
    | (?LS, ?X) => f X || app f LS || fail 1
    | _ => f ls
  end.
This works exactly like appHyps but instead of looking through the proof’s environment, we’re looking through ls. It has the same “keep the first result that works” semantics too. One thing that confused me was the _ => f ls clause of this tactic. Remember that with our tupled lists we don’t have a “nil” member. Rather, the equivalent of
A :: B :: C :: Nil
is
((A, B), C)
So when we don’t have a pair, ls itself is the last hypothesis in our list. As a corollary of this, there is no obvious “empty” tupled list, only one with a useless last hypothesis.
Next we have all, which runs f on every member of ls.
Ltac all f ls :=
  match ls with
    | (?LS, ?X) => f X; all f LS
    | (_, _) => fail 1
    | _ => f ls
  end.
Careful readers will notice that instead of f X || ... we use ;. Additionally, if the first clause fails and the second clause matches, that means that either f X or all f LS failed. In this case we backtrack all the way back out of this clause. This should mean that this is an “all or nothing” tactic. It will either succeed on all members of ls or nothing at all will happen.
Now we get to the first big tactic
Ltac simplHyp invOne :=
  let invert H F :=
    inList F invOne;
      (inversion H; fail)
      || (inversion H; [idtac]; clear H; try subst) in

  match goal with
    | [ H : ex _ |- _ ] => destruct H
    | [ H : ?F ?X = ?F ?Y |- ?G ] =>
      (assert (X = Y); [ assumption | fail 1 ])
      || (injection H;
        match goal with
          | [ |- X = Y -> G ] =>
            try clear H; intros; try subst
        end)
    | [ H : ?F ?X ?U = ?F ?Y ?V |- ?G ] =>
      (assert (X = Y); [ assumption
        | assert (U = V); [ assumption | fail 1 ] ])
      || (injection H;
        match goal with
          | [ |- U = V -> X = Y -> G ] =>
            try clear H; intros; try subst
        end)

    | [ H : ?F _ |- _ ] => invert H F
    | [ H : ?F _ _ |- _ ] => invert H F
    | [ H : ?F _ _ _ |- _ ] => invert H F
    | [ H : ?F _ _ _ _ |- _ ] => invert H F
    | [ H : ?F _ _ _ _ _ |- _ ] => invert H F

    | [ H : existT _ ?T _ = existT _ ?T _ |- _ ] => generalize (inj_pair2 _ _ _ _ _ H); clear H
    | [ H : existT _ _ _ = existT _ _ _ |- _ ] => inversion H; clear H
    | [ H : Some _ = Some _ |- _ ] => injection H; clear H
  end.
Wow, just a little bit bigger than what we’ve been working with so far.
The first small chunk of simplHyp is a tactic for doing clever inversion using the tuple list invOne.
 invert H F :=
   inList F invOne;
   (inversion H; fail)
     || (inversion H; [idtac]; clear H; try subst)
Here H is a hypothesis that we’re thinking about inverting on and F is the head symbol of H. First we run the inList predicate, meaning that we don’t invert upon anything that we don’t want to. If the head symbol of H is something worth inverting upon we try two different types of inversion.
In the first case, inversion H; fail, we’re just looking for an “easy proof” where inverting H immediately dispatches the current goal. In the second case, inversion H; [idtac]; clear H; try subst, we invert upon H iff it only generates 1 subgoal. Remember that [t | t' | t''] is a tactic that runs t on the first subgoal, t' on the second, and so on. If the number of goals doesn’t match, [] will fail. So [idtac] is just a clever way of saying “there’s only one new subgoal”. Next we get rid of the hypothesis we just inverted on (it’s not useful now, and we don’t want to try inverting it again) and see if any substitutions are applicable.
Alright! Now let’s talk about the massive match goal with going on in simplHyp.
The first branch is
    | [ H : ex _ |- _ ] => destruct H
This just looks for a hypothesis with an existential (remember that ex is what exists desugars to). If we find one, we introduce a new variable to our environment and instantiate H with it. The fact that this doesn’t recursively call simplHyp probably means that we want to do something like repeat simplHyp to ensure this is applied everywhere.
Next we look at simplifying hypotheses where injection applies. There are two almost identical branches, one for constructors of two parameters, one for one. Let’s look at the latter since it’s slightly simpler.
    | [ H : ?F ?X = ?F ?Y |- ?G ] =>
      (assert (X = Y); [ assumption | fail 1 ])
      || (injection H;
        match goal with
          | [ |- X = Y -> G ] =>
            try clear H; intros; try subst
        end)
This looks for an equality over a constructor F. This branch is looking to prove that X = Y, a fact deducible from the injectiveness of F.
The way that we go about doing this is actually quite a clever ltac trick though. First we assert X = Y, which will generate two subgoals: the first is that X = Y (shocker) and the second is the current goal G, with the new hypothesis that X = Y. We attempt to prove that X = Y by assumption. If this works, then we can already trivially deduce X = Y, so there’s no point in doing all that injection stuff; we fail 1 and bomb out of the whole branch.
If assumption fails we’ll jump to the other side of the ||s and actually use injection. We only run injection if it generates a proof that X = Y in which case we do the normal cleanup with trying to clear our original fact and do some substitution.
The next part is fairly straightforward, we make use of that invert tactic and run it over facts we have floating around in our environment
    | [ H : ?F _ |- _ ] => invert H F
    | [ H : ?F _ _ |- _ ] => invert H F
    | [ H : ?F _ _ _ |- _ ] => invert H F
    | [ H : ?F _ _ _ _ |- _ ] => invert H F
    | [ H : ?F _ _ _ _ _ |- _ ] => invert H F
Notice that we can now use the match to grab the leading symbol for H so we only invert upon hypotheses that we think will be useful.
Next comes a bit of axiom-fu
    | [ H : existT _ ?T _ = existT _ ?T _ |- _ ] =>
        generalize (inj_pair2 _ _ _ _ _ H); clear H
inj_pair2 is a function that lives in the Coq standard library and has the type
forall (U : Type) (P : U -> Type) (p : U) (x y : P p),
       existT P p x = existT P p y -> x = y
This relies on eq_rect_eq so it’s just a little bit dodgy for something like HoTT where we give more rope to = than just refl.
This particular branch of the match is quite straightforward though. Once we see an equality between two witnesses for the same existential type, we just generalize the equality between their proofs into our goal.
If this fails however, we’ll fall back to standard inversion with
    | [ H : existT _ _ _ = existT _ _ _ |- _ ] => inversion H; clear H
Finally, we have one last special case branch for Some. This is because the branches above will fail when faced with a polymorphic constructor
    | [ H : Some _ = Some _ |- _ ] => injection H; clear H
Nothing exciting going on there.
So that wraps up simplHyp. It’s just a conglomeration of useful stuff to do to constructors in our hypotheses.
Onwards we go! Next is a simple tactic for automatically rewriting with a hypothesis
Ltac rewriteHyp :=
  match goal with
    | [ H : _ |- _ ] => rewrite H by solve [ auto ]
  end.
Like most of the other tactics we saw earlier, this will hunt for an H where this works and then stop. The by solve [ auto ] clause runs solve [ auto ] against all the side conditions that the rewrite generates and ensures that auto solves every one of them. This prevents a rewrite from going and introducing obviously false facts as goals for a rewrite that made no sense.
We can combine this with autorewrite with two simple tactics
Ltac rewriterP := repeat (rewriteHyp; autorewrite with core in *).
Ltac rewriter := autorewrite with core in *; rewriterP.
This just repeatedly rewrites with autorewrite and rewriteHyp for as long as they make progress. It’s worth noticing here how we can use repeat to make these smaller tactics modify all applicable hypotheses rather than just one.
Next up is an innocent looking definition that frightens me a little bit
Definition done (T : Type) (x : T) := True.
What frightens me about this is that Adam calls this “devious”.. and when he calls something clever or devious I’m fairly certain I’d never be able to come up with it :)
What this actually appears to do is provide a simple way to “stick” something into an environment. We can trivially prove done T x for any T and x but having this in an environment also gives us a proposition T and a ready made proof of it x! This is useful for tactics since we can do something like
assert (done SomethingUseful usefulPrf) by constructor
and voilà! Global state without hurting anything.
We use these in the next tactic, inster.
Ltac inster e trace :=
  match type of e with
    | forall x : _, _ =>
      match goal with
        | [ H : _ |- _ ] =>
          inster (e H) (trace, H)
        | _ => fail 2
      end
    | _ =>
      match trace with
        | (_, _) =>
          match goal with
            | [ H : done (trace, _) |- _ ] =>
              fail 1
            | _ =>
              let T := type of e in
                match type of T with
                  | Prop =>
                    generalize e; intro;
                      assert (done (trace, tt)) by constructor
                  | _ =>
                    all ltac:(fun X =>
                      match goal with
                        | [ H : done (_, X) |- _ ] => fail 1
                        | _ => idtac
                      end) trace;
                    let i := fresh "i" in (pose (i := e);
                      assert (done (trace, i)) by constructor)
                end
          end
      end
  end.
Another big one!
This match is a little different than the previous ones. It’s not a match goal but a match type of ... with. This is used to examine one particular hypothesis’ type and match over that.
This particular match has two branches. The first deals with the case where we have uninstantiated universally quantified variables.
 | forall x : _, _ =>
    match goal with
      | [ H : _ |- _ ] =>
        inster (e H) (trace, H)
      | _ => fail 2
    end
If the type of e is still universally quantified, we grab some hypothesis H, instantiate e with it, add H to the trace list, and then recurse.
If there isn’t a hypothesis, then we fail out of the toplevel match and exit the tactic.
Now the next branch is where the real work happens
  | _ =>
    match trace with
      | (_, _) =>
        match goal with
          | [ H : done (trace, _) |- _ ] =>
            fail 1
          | _ =>
            let T := type of e in
              match type of T with
                | Prop =>
                  generalize e; intro;
                    assert (done (trace, tt)) by constructor
                | _ =>
                  all ltac:(fun X =>
                    match goal with
                      | [ H : done (_, X) |- _ ] => fail 1
                      | _ => idtac
                    end) trace;
                  let i := fresh "i" in (pose (i := e);
                    assert (done (trace, i)) by constructor)
              end
         end
      end
We first check to make sure that trace isn’t empty. If it isn’t, then we know that we instantiated e with at least something. In that case, we snoop around to see if there’s a done in our environment with the same trace. If there is, we know that we’ve done an identical instantiation of e beforehand, so we backtrack to try another one.
Otherwise, we look to see what e was instantiated to. If it was a simple Prop, we just stick a done record of this instantiation into our environment and add our new instantiated e back in with generalize. If e isn’t a proof, we do the same thing. In this case, however, we must also double check that the things we used to instantiate e with aren’t results of inster as well; otherwise our combination of backtracking/instantiating can lead to an infinite loop.
Since this tactic generates a bunch of done’s that are otherwise useless, a tactic to clear them is helpful.
Ltac un_done :=
  repeat match goal with
           | [ H : done _ |- _ ] => clear H
         end.
Hopefully by this point this isn’t too confusing. All this tactic does is loop through the environment and clear all dones.
Now, finally, we’ve reached crush'.
Ltac crush' lemmas invOne :=
  let sintuition := simpl in *; intuition; try subst;
    repeat (simplHyp invOne; intuition; try subst); try congruence in

  let rewriter := autorewrite with core in *;
    repeat (match goal with
              | [ H : ?P |- _ ] =>
                match P with
                  | context[JMeq] => fail 1
                  | _ => rewrite H by crush' lemmas invOne
                end
            end; autorewrite with core in *) in

    (sintuition; rewriter;
      match lemmas with
        | false => idtac
        | _ =>
          (** Try a loop of instantiating lemmas... *)
          repeat ((app ltac:(fun L => inster L L) lemmas
          (** ...or instantiating hypotheses... *)
            || appHyps ltac:(fun L => inster L L));
          (** ...and then simplifying hypotheses. *)
          repeat (simplHyp invOne; intuition)); un_done
      end;
      sintuition; rewriter; sintuition;
      try omega; try (elimtype False; omega)).
crush' is really broken into 3 main components.
First is a simple tactic sintuition
sintuition := simpl in *; intuition; try subst;
    repeat (simplHyp invOne; intuition; try subst); try congruence
So this first runs the normal set of “generally useful tactics” and then breaks out some of our first custom tactics. This essentially will act like a souped-up version of intuition and solve goals that are trivially solvable with straightforward inversions and reductions.
Next there’s a more powerful version of rewriter
rewriter := autorewrite with core in *;
    repeat (match goal with
              | [ H : ?P |- _ ] =>
                match P with
                  | context[JMeq] => fail 1
                  | _ => rewrite H by crush' lemmas invOne
                end
            end; autorewrite with core in *)
This is almost identical to what we have above but instead of solving side conditions with solve [auto], we use crush' to hopefully deal with a larger number of possible rewrites.
Finally, we have the main loop of crush'.
(sintuition; rewriter;
  match lemmas with
    | false => idtac
    | _ =>
      repeat ((app ltac:(fun L => inster L L) lemmas
        || appHyps ltac:(fun L => inster L L));
      repeat (simplHyp invOne; intuition)); un_done
  end;
  sintuition; rewriter; sintuition;
try omega; try (elimtype False; omega)).
Here we run the sintuition and rewriter and then get to work with the lemmas we supplied in lemmas.
The first branch is just a match on false, which we use like a nil. Since we have no lemmas we don’t do anything new.
If we do have lemmas, we try instantiating both them and our hypotheses as many times as necessary and then repeatedly simplify the results. This loop will ensure that we make full use of both our supplied lemmas and the surrounding environment.
Finally, we make another few passes with rewriter and sintuition attempting to dispatch our goal using our new, instantiated and simplified environment.
As a final bonus, if we still haven’t dispatched our goal, we’ll run omega to attempt to solve any Presburger arithmetic. On the off chance that our hypotheses are contradictory in a way omega can detect, we also try elimtype False; omega to try to exploit such a contradiction.
So all crush does is call this tactic with no lemmas (false) and no suggestions to invert upon (fail). There you have it, and it only took 500 lines to get here.
Wrap Up
So that’s it, hopefully you got a few useful Ltac tricks out of reading this. I certainly did writing it :)
If you enjoyed these tactics, there’s a more open-source version of these tactics, on the CPDT website. It might also interest you to read the rest of CpdtTactics.v since it has some useful gems like dep_destruct.
Last but not least, if you haven’t read CPDT itself and you’ve made it this far, go read it! It’s available as either dead-tree or online. I still reference it regularly so I at least find it useful. It’s certainly better written than this post :)
Note, all the code I’ve shown in this post is from CPDT and is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs license. I’ve removed some comments from the code where they wouldn’t render nicely.

          
          



Some Useful Agda
Danny Gratzer — Sat, 28 Jun 2014 00:00:00 UT

    Posted on June 28, 2014
    


    
    Tags: agda, types
    


I’ve been using Agda for a few months now. I’ve always meant to figure out how it handles IO but never have.
Today I decided to change that! So off I went to the related Agda wiki page. So hello world in Agda apparently looks like this
    open import IO

    main = run (putStrLn "test")
The first time I tried running this I got an error about IO.FFI. If you get this, you need to go into your standard library and run cabal install in the ffi folder.
Now, on to what this actually does. Like Haskell, Agda has an IO monad. In fact, near as I can tell this isn’t a coincidence at all, Agda’s primitive IO seems to be a direct call to Haskell’s IO.
Unlike Haskell, Agda has two IO monads, a “raw” primitive one and a higher level pure one found in IO.agda. What few docs there are make it clear that you are not intended to write the “primitive IO”.
Instead, one writes in this higher level IO monad and then uses a function called run which converts everything to the primitive IO.
So one might ask: what exactly is this strange IO monad and how does it actually provide return and >>=? Well the docs don’t actually seem to exist so poking about the source reveals
    data IO {a} (A : Set a) : Set (suc a) where
      lift   : (m : Prim.IO A) → IO A
      return : (x : A) → IO A
      _>>=_  : {B : Set a} (m : ∞ (IO B)) (f : (x : B) → ∞ (IO A)) → IO A
      _>>_   : {B : Set a} (m₁ : ∞ (IO B)) (m₂ : ∞ (IO A)) → IO A
Wow.. I don’t know about you, but this was a bit different than I was expecting.
So this actually just forms a syntax tree! There’s something quite special about this tree though, those ∞ annotations mean that it’s a “coinductive” tree. So we can construct infinite IO trees. Otherwise it’s just a normal tree.
Right below that in the source is the definition of run
    {-# NO_TERMINATION_CHECK #-}
    run : ∀ {a} {A : Set a} → IO A → Prim.IO A
    run (lift m)   = m
    run (return x) = Prim.return x
    run (m  >>= f) = Prim._>>=_ (run (♭ m )) λ x → run (♭ (f x))
    run (m₁ >> m₂) = Prim._>>=_ (run (♭ m₁)) λ _ → run (♭ m₂)
So here’s where the evilness comes in! We can loop forever transforming our IO into a Prim.IO.
Now I had never used Agda’s coinductive features before, and if you haven’t either, then don’t worry: they’re not terribly complicated.
∞ is a prefix operator that stands for a “coinductive computation” which is roughly a thunk. ♯ is a prefix operator that delays a computation and ♭ forces it.
There are reasonably complex rules that govern what qualifies as a “safe” way to force things. Guarded recursion seems to always work though. So we can write something like
    open import Coinduction
    open import Data.Unit

    data Cothingy (A : Set) : Set where
      conil  : Cothingy A
      coCons : A → ∞ (Cothingy A) → Cothingy A

    lotsa-units : Cothingy ⊤
    lotsa-units = coCons tt (♯ lotsa-units)
Now using ♯ we can actually construct programs with infinite output.
    forever : IO ⊤
    forever = ♯ putStrLn "Hi" >> ♯ forever

    main = run forever
This when run will output “Hi” forever. This is actually quite pleasant when you think about it! You can view your resulting computation as a normal, first class data structure and then reify it to actual computations with run.
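To make the “IO as a data structure” idea a bit more concrete, here is a rough model of it in Python rather than Agda. This is only a sketch under my own assumptions: the names Lift, Return, Bind, put and countdown are invented for illustration, plain lambdas stand in for the ∞ thunks, and none of this is how the Agda standard library is actually implemented.

```python
# A rough Python model of an IO tree plus an interpreter for it.
# All names here are hypothetical; this is not Agda's implementation.

class Lift:
    """Wrap a primitive action: a zero-argument function with side effects."""
    def __init__(self, action):
        self.action = action

class Return:
    """A computation that just produces a value."""
    def __init__(self, value):
        self.value = value

class Bind:
    """Sequence two delayed computations.

    `first` is a thunk producing a tree, and `then` maps the first result
    to another thunk, loosely mirroring the delayed arguments of _>>=_.
    """
    def __init__(self, first, then):
        self.first = first
        self.then = then

def run(io):
    """Interpret a tree, forcing each thunk only when it is needed."""
    while True:
        if isinstance(io, Lift):
            return io.action()
        if isinstance(io, Return):
            return io.value
        # Bind: run the first half, then keep going with the rest.
        result = run(io.first())
        io = io.then(result)()

log = []

def put(item):
    """Build (but do not perform) an action that records item in log."""
    return Lift(lambda: log.append(item))

def countdown(n):
    """Describe a program that records n, n - 1, ..., 1 and then stops."""
    if n == 0:
        return Return("done")
    return Bind(lambda: put(n), lambda _: (lambda: countdown(n - 1)))

result = run(countdown(3))
print(log, result)
```

Nothing touches log until run walks the tree, which is the same separation between describing a computation and performing it that the IO type above gives us.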
So with all of this figured out, I wanted to write a simple program in Agda just to make sure that I got it all.
FizzBuzz
I decided to write the fizz-buzz program. For those unfamiliar, the specification of the program is

For each of the numbers 0 to 100, if the number is divisible by 3 print fizz, if it’s divisible by 5 print buzz, if it’s divisible by both print fizzbuzz. Otherwise just print the number.

This program is pretty straightforward. First, the laundry list of imports
    module fizzbuzz where

    import Data.Nat        as N
    import Data.Nat.DivMod as N
    import Data.Nat.Show   as N
    import Data.Bool       as B
    import Data.Fin        as F
    import Data.Unit       as U
    import Data.String     as S
    open import Data.Product using (_,_ ; _×_)
    open import IO
    open import Coinduction
    open import Relation.Nullary
    open import Function
This seems to be the downside of finely grained modules.. Tons and tons of imports.
Now we need a function which takes two ℕs and returns true if the first mod the second is zero.
    congruent : N.ℕ → N.ℕ → B.Bool
    congruent n N.zero    = B.false
    congruent n (N.suc m) with N._≟_ 0 $ F.toℕ (N._mod_ n (N.suc m) {U.tt})
    ... | yes _ = B.true
    ... | no  _ = B.false
Now from here we can combine this into the actual worker for the program

    _and_ : {A B : Set} → A → B → A × B
    _and_ = _,_

    fizzbuzz : N.ℕ → S.String
    fizzbuzz N.zero    = "fizzbuzz"
    fizzbuzz n with congruent n 3 and congruent n 5
    ... | B.true  , B.true   = "fizzbuzz"
    ... | B.true  , B.false  = "fizz"
    ... | B.false , B.true   = "buzz"
    ... | B.false , B.false  = N.show n
Now all that’s left is the IO glue
    worker : N.ℕ → IO U.⊤
    worker N.zero    = putStrLn $ fizzbuzz N.zero
    worker (N.suc n) = ♯ worker n >> ♯ putStrLn (fizzbuzz $ N.suc n)

    main = run $ worker 100
There. A somewhat real, IO based program written in Agda. It only took me 8 months to figure out how to write it :)

          
          



Teaching Python with a Raspberry Pi
Danny Gratzer — Mon, 23 Jun 2014 00:00:00 UT

    Posted on June 23, 2014
    


    
    Tags: teaching
    


This last week I’ve been volunteering at a summer camp. This camp is aimed at kids ages 8 to 12 and teaches the basics of Python!
I wanted to write down some of my thoughts and experiences on the whole process.
The Curriculum
The curriculum for the camp was based around 3 key components

Python
Raspberry Pis
Minecraft

The camp was spread over 4 days, each 3 hours. Each day introduced more of Python with more sophisticated programs. Each program actually interacted with minecraft, building structures, modifying worlds, and doing cool visible things. We’ll talk later about how this was possible.
Going into the camp, the expected schedule was something like

Introduce the Pi, show how to run Python scripts from terminal
Introduce the basics of Python, mostly variables and conditionals
Apply these basics with the minecraft API
Introduce loops, apply this with a few more advanced programs

In hindsight, this curriculum was a tad bit unrealistic, but what curriculum isn’t.
The Staff
This was the first time the camp was run, so the staff was a little inexperienced.
I was the only person familiar with programming but had never taught young children before, and the two paid staff members were used to teaching basic science camps but had never taught anything CS-ish. This meant that a lot of this was a learning experience for us as much as the kids.
The Children
The camp was over-capacity with 14 children. None of them had ever programmed before per se, but two had done some basic HTML layout and one 10 year old was quite familiar with Unix after 2 years of running various Linux distributions (I was impressed).
The unfortunate fact was that since the camp was marketed as teaching with Minecraft, a lot of the kids just showed up to play Minecraft. This was anticipated but still a little saddening.
Day 1
On day 1, we get everyone set up with their own Pi, we also included

A cheap monitor
A very cheap mouse
A keyboard

Getting this all set up for 14 kids was a lot smoother than anticipated. The only hitch was the SD cards we’d purchased were a lot cheaper than anticipated so we burned through maybe 5 cards that we just couldn’t get a Pi to boot with.
We got everyone successfully to a desktop in about 30 minutes.
The Pis were running a custom operating system called Raspbian. This OS is very verbose during boot time and shows the entire log from booting up rather than just displaying an innocent little loading graphic.
Quite a few of the kids were curious about what was going on so we explained a little about how OSes work. It was pretty awesome to see kids being interested in what steps a kernel went through.
Sadly I’m not a super knowledgeable person when it comes to OS’s. In light of this I’ve ordered a book or two on the subject, something that’s been on my todo list for a while now. I should be better prepared for questions next time.
Now once we got everyone up and running we had people open 2 programs, LXTerminal and Minecraft. This is when we had some fun trying to explain what exactly a terminal is.
I eventually started simply saying

LXTerminal is a program that lets you run other programs. It’s like a text interface so that you can do what you normally do by clicking with typing.
Almost all Unix computers, like OS X and Raspbian, have the same way of entering stuff into terminals.

From here we had everyone run cd play. Luckily a group of volunteer engineers had sat down and written a bunch of programs to do various things in Minecraft. The first one everyone started with just built a grid of stone blocks.
We then started explaining how to run things with the python program. This turned out to be a bit more of a struggle than anticipated since typing and spelling turned out to be harder than we’d expected.
We had a lot of people doing things like
 $ pyton grid.py
 $ pythongrid.py
 $ grid.py
 $ python grid.py # Finally
I really wish we had an overhead projector to show everyone written examples on a teacher machine. This was a big problem as time went on; simply saying things out loud is not a sufficient method for communicating about programs.
Now, once this ran there was a satisfying “Whoooaaaa” when everyone saw that this command had modified the game right before their eyes!
Some people quickly started trying to use this to speed up their building by automatically creating walls for themselves rather than doing it by hand. This was exactly the response we were looking for and it was clear this was starting to spark some interest in programming.
Finally we had everyone open up IDLE. We used IDLE for all our editing purposes for exactly two reasons

It’s dead simple to use
It’s preinstalled

Everyone opened up grid.py and had a look at the source code. The code for grid.py was roughly
    import minecraft
    import block

    mc = minecraft.Minecraft.create() # Our connection to Minecraft

    def buildWall(size, type = block.STONE):
        pos = mc.player.getPos()
        pos.x += 3

        for x in range(size):
            for y in range(size):
                # Set block at these coordinates to type
                mc.setBlock(pos.x + x, pos.y + y, pos.z, type)

    if __name__ == "__main__":
        buildWall(5)
We get a pretty nice high level API to minecraft, and the code is quite simple. Keep in mind, we have taught exactly 0 python at this point.
Next we explained that we could change buildWall(5) to buildWall(6) and our program would make a bigger wall! Again an overhead was sorely missed at this point since it was very hard to explain exactly where we were talking about, even in such small code.
Most people then started modifying the code trying to build as big a wall as possible. This was also the point at which our first syntax errors started up.
Since I was the only person in the room who understood what they meant there was a fair bit of running around. I have to give a lot of credit to the two staff members who essentially learned the basics of Python syntax by me yelling it to them across the room!
grid.py also included some code to generate a grid with different blocks. This was another huge success since kids could try to spell different words in their grid of blocks. I’ve omitted it from the above snippet since frankly I don’t remember it.
This took up most of the first day, since everyone also got a 30 minute snack break (don’t you miss snack breaks?).
Day 2
The next day we were actually aiming to teach some programming! This had a script written already by the engineers who’d written the code we’d used yesterday, but upon consulting the script I found

Teach variables
Explain what a value is
Explain if’s
Questions

Uh oh. So I ended up writing a few notes down the night before, we didn’t have access to any sort of projector so a lot of my explanations consisted of scribbling on a giant (2’ by 3’) post it note.
This had distinctly mixed results. As I’d expected most kids couldn’t pick up the fundamentals of programming in an hour! This was OK though since the rest of the day was spent messing around with a simple program
    # chat.py
    import minecraft

    mc = minecraft.Minecraft.create()

    message = "Hello Chat"

    mc.putToChat(message)
And we used this to introduce the fundamentals of Python. For kids that were progressing faster, we challenged them to write more complicated programs like
    import minecraft

    mc = minecraft.Minecraft.create()

    message = ""

    if 1 + 1 < 2:
        message = "Hello"
    else:
        message = "Goodbye"

    mc.putToChat(message)
Not surprisingly, this was really hard to grasp for our kids. This was when the class started to fragment a bit, some kids were getting this and really doing awesome while some were having a harder time with all the new information.
If I had a chance to do this again, I’d definitely split the class into two groups, one for people who were up and running with basic concepts to build some programs together with one instructor. The other two could then stay and give one to one help slowly but surely. This would prevent us from leaving anyone behind.
In reality I’d say we had about 5 kids who were understanding what was going on and 8 who were lost. No one had yet given up on programming luckily, so we were still more or less OK.
Day 3
Going into this day I knew it wasn’t going to be easy

We were starting to lose a bit of interest since it’s getting later in the week
Some kids were falling behind others

With this in mind, we went about introducing a few new prewritten programs that built cubes! I’ll leave it to your imagination how this worked, it’s pretty similar to grid.py.
For the kids who were really clicking, I challenged some of them to explain parts of the code to me. In this context I taught a few kids about for loops. It’s a bit tricky to explain how they work since I didn’t want to explain what an iterable was. Remember, we hadn’t talked about any OO aspects of Python.
I introduced them to loops as something to the effect of
    for VAR in range(NUMBER):
        STMT
        STMT
        ...
With the explanation that

A loop means we run that list of statements once for each number between 0 and NUMBER - 1 with VAR first being 0, then 1, then 2 and so on.

This seemed to click with most of them so quite a few got the hang of how loops worked.
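As a small sketch of that explanation, here is the pattern with NUMBER set to 4. The names seen and var are made up for this illustration and weren’t part of the camp material.

```python
# Run the loop body once for each number from 0 up to NUMBER - 1.
NUMBER = 4
seen = []  # records every value var takes, in order

for var in range(NUMBER):
    seen.append(var)      # var is 0 first, then 1, then 2, then 3
    print("var is", var)
```

Afterwards seen holds every value the loop variable took, which is an easy thing to point at when showing that the last value is NUMBER - 1 and not NUMBER.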
I’d actually prefer I’d built some sort of abstraction like
    def allPairs(*dims):
        ...
which returned an iterable (generator?) that had a list of all pairs possible within the given set of numbers. This would eliminate the need to talk about nested loops, which were a confusing subject for most people.
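I never actually wrote this helper for the camp, so the following is only a sketch of what allPairs might look like, assuming that each argument is a size and that coordinates start at 0. It leans on itertools.product from the standard library.

```python
import itertools

def allPairs(*dims):
    """Yield every coordinate tuple (x, y, ...) with 0 <= x < dims[0], and so on.

    Hypothetical helper: one argument per dimension, each one a size.
    """
    return itertools.product(*(range(d) for d in dims))

# One flat loop instead of two nested ones.
for x, y in allPairs(2, 3):
    print(x, y)
```

With something like this, the body of buildWall could become a single for x, y in allPairs(size, size): loop around the mc.setBlock call.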
The tricky bit is that while I was hopping from person to person, the slower moving campers were playing with cube.py all on their own and not trying to understand the whole thing but still use it.
This worked surprisingly well actually, we were challenging kids to think about how to combine grid.py and cube.py to build things without ever laying a block by hand. Sadly a few kids just abandoned the effort and started playing Minecraft. This was not unexpected but still a little sad.
To keep things going, I wrote a little program which built a cube where the inside was filled with one thing and the outside was another. This meant that kids could build an upside down volcano or a waterfall.
Unfortunately, to get this to all the kids we had to hand write it on giant post-it notes and they had to manually type it. This is another case where we desperately needed a projector.
So the third day wasn’t nearly as structured as day 2, it was really a day when kids experimented and we tried to push kids individually. This actually seemed to be a great help since a few more kids had some breakthroughs on day 2 materials.
Day 4
Now, on the final day we opted to try something a little different.
We first tried networking the Raspberry Pis since kids had been asking to do this since day 1. Despite being able to get this working in prep time, we had some technical issues that prevented us from getting it working during the actual camp, very frustrating.
After the kids’ snack break, we went into a different room with no computers and put up a post it with the title “Steps for Writing Code”

Define Our Problem
Brainstorm Solutions
Compare Solutions and Choose One
Implement Solution
Test Implementation

Now experts will notice the missing step 5.5, “swear profusely while implementation doesn’t work”. We will of course include this in a second level camp for teaching programming :)
Now I told them that their goal was to create a program which built a “sphere”. I put the quotes there since minecraft is built from blocks and doesn’t have a smooth sphere but you can get pretty close with bigger and bigger spheres.
So we went on to step 1. and everyone struggled to define what exactly a sphere was and how one ought to decide what “build it” meant.
We eventually settled on our problem being to build a sphere where

A sphere is a collection of all blocks within a certain distance, D, from the center
To “build” a sphere meant we’d place the center 3 + D blocks in front of us and we’d color all blocks in our sphere to stone.

Next came the lively discussion on how to actually go about doing this.
After about 5 minutes, we had a lot of hand-wavy solutions but no actual concrete procedure for doing this so I tossed out a hint.
I stated that if someone needed a procedure for finding the space between two blocks, I would implement a function dist so that
dist(x, y)
would return the distance between the x block and y block in three dimensions.
Now the solutions got a lot closer, people started listing steps of what to do. I encouraged them to treat me like the computer and give me directions. I would then walk around and “color” carpet squares. This seemed to demonstrate which solutions weren’t quite precise enough.
Eventually, we ended on a simple solution

Figure out the center by adding D + 3 to the current position. For each square in the grid S, if dist(S, Center) < D, color S

Very simple, very inefficient but correct. I then started talking about pseudo-code and turning this into a more executable form.
The kids who understood loops jumped in and we ended with something like
    pos = mc.player.getPos()
    pos.x += 3
    for square in fullWorld():
        if dist(square, pos) <= D:
            mc.setBlock(square.x, square.y, square.z, block.STONE)
I let them off the hook here and wrote the rest of the code for them while they took a brief break.
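I don’t have the exact code I wrote that day, so here is only a sketch of how the missing pieces could be filled in. The Point type, the bounded fullWorld and the sphereBlocks wrapper are all assumptions made for this example; it also assumes whole-number coordinates and collects the blocks in a list instead of talking to Minecraft, so it runs without a Pi.

```python
import math
from collections import namedtuple

# Stand-in for the position objects from the Minecraft API (assumption:
# all we need from them are x, y and z attributes).
Point = namedtuple("Point", ["x", "y", "z"])

def dist(a, b):
    """Straight-line distance between two blocks in three dimensions."""
    return math.sqrt((a.x - b.x) ** 2 + (a.y - b.y) ** 2 + (a.z - b.z) ** 2)

def fullWorld(center, D):
    """Hypothetical helper: yield every block in the cube around center.

    Scanning the real whole world would take far too long, so this only
    visits the bounding cube that can possibly contain the sphere.
    """
    for x in range(center.x - D, center.x + D + 1):
        for y in range(center.y - D, center.y + D + 1):
            for z in range(center.z - D, center.z + D + 1):
                yield Point(x, y, z)

def sphereBlocks(center, D):
    """Return the blocks that are within distance D of center."""
    return [sq for sq in fullWorld(center, D) if dist(sq, center) <= D]

blocks = sphereBlocks(Point(0, 0, 0), 2)
print(len(blocks))
# With a live connection, each block would then be placed with something like
#   mc.setBlock(sq.x, sq.y, sq.z, block.STONE)
```

Restricting the scan to the bounding cube keeps the kids’ “check every square” idea intact while making it finish in a reasonable time.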
We then adjourned into the computer room and got started testing! We had just enough time to have everyone gather round while we built a Death Star on the teacher’s machine (I was the fastest typist).
Quite a few of the kids were interested in buying their own Pis and continuing on their own so we gave everyone their SD cards and directions on how to acquire a Raspberry Pi. I also gave out my email to a few of the kids who wanted to make sure they had someone to answer questions when they were setting up their Pis.
Recap
So dear reader, where are we left?
Well the place that ran this camp is running more. I’m not sure if they’re full, but if you’re a parent or interested kid, please email me at [jozefg AT cmu.edu].
If you’re thinking that you want to run one of these camps yourself, do it! I only have 4 pieces of advice

Err on the side of being concise and simple rather than comprehensive
You’re not going to teach someone to program in 4 days. You can, however, make someone hate programming forever in 4 days! If the kids want more information, they’ll ask.
I guarantee that you’ll end up flooding the kids with too much information if you try to be comprehensive.
Always run this with more than one adult present
Otherwise you’ll end up spending the whole camp chasing after kids to fix issues and everyone else will be bored.
It’s always good to have more than one adult who knows Python too! You can do it with just one I’ve discovered. It is less than ideal however.
Have a good space, with a projector!
Projectors are great. So great that I’m very seriously considering buying one for the next 2 iterations of this camp.
Inspire kids to want to learn more!
That’s the whole point! You’ll never teach anything if you’re fighting the kids. Make this fun and don’t sweat it if you feel like you’re not covering as much material as you’d like. This isn’t a class, there’s no exam at the end, it’s supposed to be fun!

If anyone has any more specific questions on this camp, please comment below and I’ll respond as soon as I can.

          
          



Grokking recursion-schemes: Part 2
Danny Gratzer — Sat, 14 Jun 2014 00:00:00 UT

    Posted on June 14, 2014
    


    
    Tags: haskell
    


In this post I’d like to talk about the second half of recursion-schemes. Previously we’d talked about catamorphisms and friends. These all focused on “destroying” a datastructure by collapsing it layer by layer.
We’re now going to talk about the opposite: anamorphisms. Anamorphisms are just like generalized versions of unfoldr.
Getting Anamorphisms
To demonstrate how to start anamorphisms, we’ll create our custom list again.
    {-# LANGUAGE TypeFamilies, DeriveFunctor #-}
    data MyList a  = MyCons a (MyList a) | MyNil
    data BList a b = BCons a b           | BNil
                   deriving Functor
Now we create an instance of the type class Unfoldable (shocker I know)
    type instance Base (MyList a) = BList a

    instance Unfoldable (MyList a) where
      embed (BCons a b) = MyCons a b
      embed BNil        = MyNil
That’s it! We define the dual to Foldable’s project, embed. This just defines how to take the datastructure that we’ve built up and stick it back into our list.
Using Anamorphisms
Now, let’s actually start writing some anamorphisms. The simplest example of an unfolding I can think of is between. between takes two boundaries and creates a list of the values after the low boundary, up to and including the high one: (low, high].
    > between 1 5
      [2, 3, 4, 5]
    > between 'a' 'c'
      "bc"
    > between False False
      []
To make this more fun, we’ll return MyList a instead of just [a] since it’ll make it easier to show off recursion-schemes. I’ll explain how to generate [a]’s momentarily.
Now it’s pretty obvious the type of between should be something like
    between :: (Eq a, Enum a) => a -> a -> MyList a
We could write this with simple, boring recursion
    between a b | a == b    = MyNil
                | otherwise = (succ a) `MyCons` between (succ a) b
But this is exactly what we were avoiding! Let’s rewrite this to use an anamorphism. The type of ana (our anamorphism implementation) is
    ana :: (a -> Base t a) -> a -> t
This is almost the exact opposite of cata :: (Base t a -> a) -> t -> a. So instead of tearing the structure down layer by layer, we build it up layer by layer.
    between low high = ana builder low
      where builder a = ???
where builder takes an a and returns either BCons (succ a) (succ a), or BNil if a == high. This is trivial to implement
    between low high = ana builder low
      where builder a | a == high = BNil
                      | otherwise = join BCons (succ a) -- from Control.Monad
That’s it! builder captures the essence of how we build up the list, one cons at a time.
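If you’d like to run this without installing the library, here’s a minimal sketch that defines its own embed and ana, specialised to MyList. These local definitions and the toList helper are my own stand-ins for illustration, not the recursion-schemes API itself.

```haskell
{-# LANGUAGE DeriveFunctor #-}
import Control.Monad (join)

data MyList a  = MyCons a (MyList a) | MyNil
data BList a b = BCons a b | BNil deriving Functor

-- Local stand-in for the library's embed: put one layer back onto a list.
embed :: BList a (MyList a) -> MyList a
embed (BCons a rest) = MyCons a rest
embed BNil           = MyNil

-- Local stand-in for the library's ana, specialised to MyList:
-- run the builder on the seed, recurse into the new seed, then embed.
ana :: (s -> BList a s) -> s -> MyList a
ana builder = embed . fmap (ana builder) . builder

between :: (Eq a, Enum a) => a -> a -> MyList a
between low high = ana builder low
  where builder a | a == high = BNil
                  | otherwise = join BCons (succ a) -- BCons (succ a) (succ a)

-- Hypothetical helper, only here so the result is easy to inspect.
toList :: MyList a -> [a]
toList (MyCons a rest) = a : toList rest
toList MyNil           = []
```

Here toList (between 1 5) walks the seeds 1, 2, 3, 4 and stops once the seed reaches 5.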
Now, as promised here’s how to actually implement it so it returns [a]’s.
    between low high = ana builder low
      where builder a | a == high = Nil
                      | otherwise = join Cons (succ a)
recursion-schemes defines the type instance for [a] with two constructors, Cons and Nil, that behave precisely like BCons and BNil. However, Cons and Nil are defined using some type families magic that makes them invisible in the documentation (I found them by reading the source). They exist I promise :)
Now, I said before this was just a generalized version of unfoldr, let’s look at the type of unfoldr.
    Data.List.unfoldr :: (b -> Maybe (a, b)) -> b -> [a]
So unfoldr takes our seed value, b, and splits it into either a value and another seed, or nothing. Sound familiar? Look again at Cons, Cons is a value a, and the next seed b! Furthermore Nil is completely isomorphic to Nothing here.
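To make the correspondence concrete, here’s the same unfolding written with plain unfoldr from base. The name betweenList is just something I made up for this sketch.

```haskell
import Data.List (unfoldr)

-- Just (value, nextSeed) plays the role of Cons, Nothing the role of Nil.
betweenList :: (Eq a, Enum a) => a -> a -> [a]
betweenList low high = unfoldr step low
  where step a | a == high = Nothing
               | otherwise = Just (succ a, succ a)
```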
Now ana generalizes upon unfoldr since we don’t need to represent everything as either 1 terminator, Nothing, or one builder, (a, b).
We could imagine something like
    data RedBlack a = Red   a (RedBlack a) (RedBlack a)
                    | Black a (RedBlack a) (RedBlack a)
                    | Leaf
Now ana could handle the fact that we can now “build” new seeds in two ways, with RedB or BlackB!
Building Stuff Up to Tear It Down
One of the most common patterns in Haskell is to create some intermediate data structure and immediately use it.
This is kinda like smashing an anamorphism and a catamorphism together into one. This has a name: a hylomorphism, hylo in recursion-schemes.
It turns out that this is one of the most useful applications of anamorphisms!
As a fun example, Daniel Wagner blogged about how we can generate an infinite list of all rational numbers. The key to this is an infinite binary tree where each node is a rational number p/q and its two children are (p + q) / q and p / (p + q).
We can build this binary tree with ana.
    import GHC.Real

    data Bin a    = Node a (Bin a) (Bin a)
    data BBin a b = NodeB a b b deriving Functor

    type instance Base (Bin a) = BBin a

    instance Unfoldable (Bin a) where
      embed (NodeB a l r) = Node a l r
    instance Foldable (Bin a) where
      project (Node a l r) = NodeB a l r

    rats :: Bin Rational
    rats = ana builder (1 % 1)
      where builder r@(p :% q) = NodeB r ((p + q) % q) (p % (p + q))
We can collapse it into a list with cata
    collapse :: Bin a -> [a]
    collapse = cata folder
     where folder (NodeB a l r)         = a : interleave l r
           interleave (x : xs) (y : ys) = x : y : interleave xs ys
The workhorse here is interleave which just describes how to safely combine two infinite lists.
Now we can combine the process of building up our binary tree and generating a list into one cool transformation
    allRats :: [Rational]
    allRats = hylo folder builder (1 % 1)
      where folder (NodeB a l r)         = a : interleave l r
            interleave (x : xs) (y : ys) = x : y : interleave xs ys
            builder r@(p :% q)           = NodeB r ((p + q) % q) (p % (p + q))
There you are! As a challenge to the reader, figure out what index a number p/q will appear in this list (it will only occur once).
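For anyone who wants to poke at this in GHCi with nothing but base, here’s a rough sketch with a hand-written hylo. Two things differ from the code above and are my own choices: hylo is defined locally rather than imported, and I use numerator and denominator from Data.Ratio instead of matching on the internal :% constructor.

```haskell
{-# LANGUAGE DeriveFunctor #-}
import Data.Ratio (numerator, denominator, (%))

data BBin a b = NodeB a b b deriving Functor

-- Local stand-in for the library's hylo: unfold one layer with the builder,
-- recurse into the children, then collapse the layer with the folder.
hylo :: Functor f => (f b -> b) -> (a -> f a) -> a -> b
hylo folder builder = folder . fmap (hylo folder builder) . builder

allRats :: [Rational]
allRats = hylo folder builder (1 % 1)
  where
    folder (NodeB a l r) = a : interleave l r
    interleave (x : xs) (y : ys) = x : y : interleave xs ys
    interleave _ _               = [] -- never reached: both lists are infinite
    builder r =
      let p = numerator r
          q = denominator r
      in NodeB r ((p + q) % q) (p % (p + q))
```

Since everything is lazy, take 7 allRats only ever builds the few nodes it needs; no intermediate tree type is defined at all.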
If you found this math interesting, check out this paper.
A few other people have shown off this pattern, one of my favorites being merge sort as a hylomorphism.
A Recap
We’ve now covered the core elements of the recursion-schemes library, but I’m not quite done with this blog series. I’m planning on one more post detailing my attempt to actually use recursion-schemes in a real project: a scheme compiler.
I think it would make the post more interesting though if the next post didn’t just include an example of “stuff I find cool”, so, if you have any particular example of cleaning up some code using recursion-schemes, please let me know! I’d love to share any and all examples I can find since that’s been the best way I’ve found to actually grok recursion-schemes.
If you’re interested in sharing, either comment or email me at jozefg [at] cmu.edu.
Thanks to tel for proof reading

          
          



Overview of A Scheme Compiler
Danny Gratzer — Sat, 07 Jun 2014 00:00:00 UT

    Posted on June  7, 2014
    


    
    Tags: haskell, compilers
    


For the last few months I’ve been spending a fair amount of time on a fun little Scheme to C compiler, c_of_scheme.
In this post I’ll outline the high level overview of c_of_scheme and in future posts detail the specifics of each component.
Modules
c_of_scheme is divided into 11 modules: 2 utility modules, 6 modules which each handle one step of compilation, a module with definitions of ASTs, a driver, and of course Main.
First, let’s discuss the utility modules, Utils.Gen and Utils.Error. Gen defines a monad, Gen. This is used to generate unique integers to be used as identifiers. For example:
    data Var = Name String | Gen Integer

    genVar :: Gen Var
    genVar = Gen <$> gen
Other stages of the compiler (continuation passing style, closure conversion, and lambda lifting) need lots of temporaries so this is used throughout the compiler.
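To give a feel for what such a monad looks like, here’s a small hand-rolled sketch. This is not the actual Utils.Gen source: the MkGen constructor, unGen, runGen, and twoTemps are all names I’m assuming for illustration, and the real module may well be built on a state transformer instead.

```haskell
-- A counter threaded through a computation; each call to gen bumps it.
newtype Gen a = MkGen { unGen :: Integer -> (a, Integer) }

instance Functor Gen where
  fmap f (MkGen g) = MkGen $ \n -> let (a, n') = g n in (f a, n')

instance Applicative Gen where
  pure a = MkGen $ \n -> (a, n)
  MkGen f <*> MkGen g = MkGen $ \n ->
    let (h, n')  = f n
        (a, n'') = g n'
    in (h a, n'')

instance Monad Gen where
  MkGen g >>= k = MkGen $ \n -> let (a, n') = g n in unGen (k a) n'

-- Hand out the current counter value and increment it.
gen :: Gen Integer
gen = MkGen $ \n -> (n, n + 1)

-- Run a computation starting from 0 (an assumed starting point).
runGen :: Gen a -> a
runGen m = fst (unGen m 0)

data Var = Name String | Gen Integer deriving (Eq, Show)

genVar :: Gen Var
genVar = Gen <$> gen

-- Two temporaries generated in sequence never collide.
twoTemps :: (Var, Var)
twoTemps = runGen ((,) <$> genVar <*> genVar)
```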
Gen also comes with a monad transformer that implements a handful of useful MTL type classes. Overall, nothing too stunning.
The other uninteresting utility module is Error, this is just a wrapper around Either with a few functions for throwing errors and good pretty printing of errors. This is used internally to signal a major internal error.
The precise interface is given by a set of functions failRW, failCPS, failClos, etc., which correspond to each stage of compilation. These generate lovely pretty printed error messages for each stage. This will become clearer as we go over each phase individually and it’s clear what needs to signal failure.
A module that’s worth mentioning that’s not a compilation stage but not quite a utility module is AST. This defines the various abstract syntax trees and primops for our representation of Scheme. This also defines the compiler monad, which combines our error monad with Gen and some other bits and bobs useful for our compiler. More on AST in future posts.
Stages of Compilation
Now let’s actually go over the individual phases of compilation.
Parsing (Parser.hs)
This is the least interesting phase of compilation. I personally just dislike parsing so I don’t have much to say about this.
A legal Scheme program is a list of definitions, we don’t currently allow top level expressions. We also don’t currently support the usual define sugar for functions.
The parser uses Parsec because I just happen to know the Parsec API, ironically because of this. If anyone cares enough to write a proper lexer and/or parser or something, I’m more than happy to help!
Rewrite Top Levels (RewriteTopLevel.hs)
This phase is a little peculiar. It exists because we’re targeting C and C has a fairly annoying restriction on what it allows top levels to be initialized to.
In C, a top level initializer has to be a constant expression, so we can’t write something like
    int c = f(1);
but in our dialect of Scheme, this is the only way to write interesting computations! This phase of compilation rewrites top levels (Shocking!) to match the C definition of top levels.
This is done by changing each definition to an Init, which will later turn into a C declaration without initialization. Next we create a new function, our main function, that is a series of assignments which pair each top level definition with its initializer.
For example
    (define foo 1)
    (define bar 2)
    (define quux (+ foo bar))

    (define _ (display quux))
will become
    (init foo)
    (init bar)
    (init quux)
    (init _)

    (define magical-main
       (lambda ()
          (set! foo 1)
          (set! bar 2)
          (set! quux (+ foo bar))
          (set! _ (display quux))))
where magical-main will be the first thing called in the generated code.
A caveat, we turn (define foo (lambda (..) ...)) into something different since it’s more efficient to directly convert these to functions.
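As a rough sketch of the idea (not the real RewriteTopLevel.hs, and ignoring the lambda caveat), the pass can be pictured over a toy AST. The Exp and Top types and the rewriteTopLevels name below are all made up for this illustration.

```haskell
-- A toy expression language, just big enough for the example above.
data Exp = Lit Int | Ref String | Call String [Exp]
  deriving (Eq, Show)

-- Top levels before the pass are Defines; afterwards only Init and Main remain.
data Top = Define String Exp
         | Init String
         | Main [(String, Exp)] -- a sequence of (set! name exp)
  deriving (Eq, Show)

rewriteTopLevels :: [Top] -> [Top]
rewriteTopLevels decs = map Init names ++ [Main assigns]
  where assigns = [(name, body) | Define name body <- decs]
        names   = map fst assigns

-- The program from the text: foo, bar, quux and a display call.
program :: [Top]
program =
  [ Define "foo" (Lit 1)
  , Define "bar" (Lit 2)
  , Define "quux" (Call "+" [Ref "foo", Ref "bar"])
  , Define "_" (Call "display" [Ref "quux"])
  ]
```

Running rewriteTopLevels program gives four Inits followed by a single Main that performs the assignments in their original order.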
Continuations Passing Style Conversion (CPS.hs)
This is the first interesting bit of compilation, CPS is a style where each function call is a tail call. Here’s an example of some non-CPS code and its CPS conversion.
    (define foo
       (lambda (y)
          (+ 1 y)))

    (define foo-cps
       (lambda (cont y)
         ((lambda (+')
            ((lambda (one)
               ((lambda (y')
                  ((lambda (result)
                     (cont result))
                   (+' one y')))
                y))
             1))
          +)))
Notice how with the CPS’ed version we’ve actually made evaluation order explicit and have removed non-primitive expressions.
CPS.hs converts the AST to use CPS. We’ll detail this process later but for now I’ll mention one more interesting tidbit.
CPS.hs is also where we implement call/cc! In fact it’s trivial to do. All we do is add the declaration for
    (define call/cc
       (lambda (c f)
          (f c
             (lambda (ignored x) (c x)))))
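The same trick reads nicely in Haskell, where the types make it obvious that the escape function throws away whatever continuation it’s handed. This is only a sketch of the idea: the CPS synonym and the two examples are names I’ve invented, and the argument order differs from the Scheme version (here the continuation comes last).

```haskell
-- A computation in CPS: hand it a continuation and it produces the final answer.
type CPS r a = (a -> r) -> r

-- The escape function ignores its own continuation and resumes the captured one.
callCC :: ((a -> CPS r b) -> CPS r a) -> CPS r a
callCC f k = f (\x _ignored -> k x) k

-- Escaping with 1 skips the pending "add 100" step entirely.
earlyExit :: CPS r Int
earlyExit = callCC $ \escape k ->
  escape 1 (\n -> k (n + 100))

-- If escape is never called, callCC behaves like an ordinary computation.
noExit :: CPS r Int
noExit = callCC $ \_escape k -> k 2
```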
Optimizations (OptimizeCPS.hs)
This module implements the simple optimizations we perform. For now this is limited to simple inlining and constant folding, but this should improve in the future.
These optimizations are implemented quite pleasantly with recursion schemes.
Closure Conversion + Lambda Lifting (ClosureConvert.hs)
This is the most difficult phase of compilation, for me anyways. In concept it’s quite simple though.
The idea is that we take the implicit closure “argument” that all scheme procedures take and make it explicit. To this end we add three new primops, NewClos, ReadClos, and WriteClos. These do much what you would expect and let us treat closures opaquely as first class values.
Next we change each procedure to take an extra argument, its closure, and change closed over variables to be selected from this closure. Finally we change each lambda to be paired with its closure when constructed.
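Here’s a tiny Haskell picture of what “pairing a lambda with its closure” means. It has nothing to do with the actual ClosureConvert.hs or the NewClos, ReadClos, and WriteClos primops; Closure, applyClosure, addCode, and makeAdder are names I’m assuming purely to illustrate the shape of the transformation.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

-- A closure is an environment plus code that takes that environment explicitly.
data Closure a b = forall env. Closure env (env -> a -> b)

applyClosure :: Closure a b -> a -> b
applyClosure (Closure env code) x = code env x

-- The lifted body of (lambda (x) (+ x y)): y is now read from the closure
-- argument instead of being captured implicitly.
addCode :: Int -> Int -> Int
addCode env x = x + env

-- Constructing the lambda pairs the top level code with the captured variable.
makeAdder :: Int -> Closure Int Int
makeAdder y = Closure y addCode
```

Note that addCode mentions no free variables at all, which is exactly what lets it live at the top level of a C file.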
This sounded pretty feasible to me on paper, but in practice it seems to be the greatest source of bugs in c_of_scheme. It finally seems to work nicely now so I’ll be sure to blog about it soon.
Code Generation (CodeGen.hs)
This is the final stop in our compilation pipeline - we generate C code.
To do this we use one of my libraries. This is actually quite a simple step in the compiler since closure-converted, CPS-ed code is quite close to C.
Some of the details that code generation handles:

Interfacing to the runtime system
Generating the main method
Generating declarations for all the variables used in our intermediate language
Mapping the Scheme variables to appropriate C names

While this might sound daunting, this isn’t actually so bad.
Driver (Driver.hs)
While I might not write a post on it, Driver is my personal favorite module. It glues together all of the previous compilation phases and provides a bunch of nice high level functions like compileScheme.
The reason I like it so much is that all the code in it is a very nice, clean example of composing components as good old functions.
If you’re looking to understand c_of_scheme’s particular implementation, I’d urge you to start with Driver. It’ll provide a bit of an intuition for what goes where.
The Runtime System
Currently c_of_scheme has an incredibly naive runtime system. Mostly because it’s being written by an incredibly naive C programmer (hi!).
I already wrote about the most interesting bit of the RTS: tail calls.
I plan on talking a bit about the RTS in the context of code generation (since it’d be impossible not to), and perhaps a post on c_of_scheme’s simple little mark and sweep GC.
Wrap Up
So that’s the high level overview of c_of_scheme, I think the compiler is best exemplified by one particular function in Driver.hs:
    compileScheme :: [SDec UserPrim] -> Compiler [CExtDecl]
    compileScheme = addPrimops >=> makeMain >=> cpsify >=> optimizeCPS >=> closConvert >=> codegen
      where addPrimops = return . (++prims)
This chains together all the phases of compilation into one big old function from the Scheme AST to the C one.
Now, if you’re really interested in c_of_scheme, go ahead and grab the source with
hg clone ssh://hg@bitbucket.org/jozefg/c_of_scheme
I do use mercurial so you can also grab a zip from bitbucket if you’re unwilling to use mercurial for one command :)
I should have posts about each specific phase of compilation up Real Soon Now. I’ll edit with a list of links to posts below as they are written.
Thanks to @tylerholien for proofreading

          
          



Grokking recursion-scheme: Part 1
Danny Gratzer — Mon, 19 May 2014 00:00:00 UT

    Posted on May 19, 2014
    


    
    Tags: haskell
    


This post is a little different than the rest of my blog, I’m not nearly as competent with recursion-schemes as I want to be and I don’t understand them fully (yet). This isn’t entirely complete, but I hope it will provide a useful intuition for how to work with some of the lower ends of recursion-schemes and some idea of how to get into the higher end. I’ll be reading this again in two weeks once I’ve forgotten all of this (again). You’ve been warned…
Why Bother?
First, let’s talk about why anyone would care about using a library like recursion-schemes.
Remember back in the good old days when all a programmer had was goto and guts? And everyone hated it? We’re at a not dissimilar place in Haskell. Well, it’s not nearly so bad nowadays, however, our principal form of control flow is recursion and really we mostly use recursion in a raw, unprincipled way.
However, we’re starting to move away from it. Do these look familiar?
    foldr :: (a -> b -> b) -> b -> [a] -> b
    foldr f nil (x : xs) = x `f` foldr f nil xs
    foldr f nil []       = nil
foldr is all about abstracting away raw recursion! foldr is great in this way since it covers a surprisingly large number of cases
    map :: (a -> b) -> [a] -> [b]
    map f = foldr ((:) . f) []

    filter :: (a -> Bool) -> [a] -> [a]
    filter p = foldr (\x rest -> if p x then x : rest else rest) []
Turns out you can implement quite a lot of Data.List with foldr and poor judgment.
However, this isn’t good enough. For example, I do a lot of work with compilers and therefore spend a lot of time doing transformations on trees. I want something like foldr to deal with this.
recursion-schemes is one such option. It’s a way of generalizing these uniform transformations on structures and it’s expanded to cover a lot of transformations.
On to recursion-schemes
Now that we know that recursion-schemes is solving a useful problem, let’s get into actually using it. First, we can install it off of hackage
cabal install recursion-schemes
And import everything with
    {-# LANGUAGE TypeFamilies, DeriveFunctor #-}
    import Data.Functor.Foldable
Let’s get started by seeing how recursion-schemes covers foldr
First, we define our own custom list
    data MyList a = MyCons a (MyList a) | MyNil
Next, we define another type of list, with the recursion factored out
    data BList a b = BCons a b | BNil
         deriving Functor
Here b is the recursive bit of BList factored out into an explicit parameter. So
    MyList a ~ BList a (BList a (BList a ....))
The fancy term for this would be to say that MyList a is the “fixed point” for BList a.
Now we can actually use recursion-schemes
    type instance Base (MyList a) = BList a

    instance Foldable (MyList a) where
      project (MyCons a b) = BCons a b
      project MyNil        = BNil
And we’re done. So to understand what’s going on we need to talk about another data type and a little math.
    newtype Fix f = Fix {unFix :: f (Fix f)}
Remember before how I mentioned how MyList is the fixed point of BList? Well Fix lets us exploit this fact. In particular
    out :: Fix (BList a) -> MyList a
    out (Fix (BCons a rest)) = MyCons a (out rest)
    out (Fix BNil)           = MyNil

    into :: MyList a -> Fix (BList a)
    into (MyCons a rest) = Fix (BCons a $ into rest)
    into MyNil           = Fix BNil
So we could write either BList or MyList for all our data types, but the BList version is really a pain to write since everything is wrapped in Fix. Unfortunately though, it’s much easier to write generic code for stuff of the form Fix (f a).
To solve this recursion-schemes has the type family Base, where we map the recursive data type to its non-recursive friend. Then, in project we define how to.. well.. project the recursive type into a partially unfolded equivalent.
With just those two steps, we get a large chunk of recursion-schemes operations for our data type!
Just What Did We Get?
Now this was the part I really had trouble with in recursion-schemes: the names of the functions for Foldable are… opaque if you’re not familiar with the terminology.
The most basic one is cata, which is the “catamorphism” across our data type. I’m not going to trouble you with why we call it a catamorphism, but just remember that it’s the souped-up version of foldr.
    foldr :: (a -> b -> b)            -> b -> [a]      -> b
    foldr :: ((a, b) -> b)            -> b -> [a]      -> b
    cata  :: (BList a b -> b)         -> MyList a      -> b
    cata  :: (Base (MyList a) b -> b) -> MyList a      -> b
    cata  :: (Base t b -> b)          -> t             -> b
And we can use it the same way!
    mymap :: (a -> b) -> MyList a -> MyList b
    mymap f = cata mapper
      where mapper (BCons a b) = f a `MyCons` b
            mapper BNil        = MyNil


    myfilter :: (a -> Bool) -> MyList a -> MyList a
    myfilter p = cata filterer
      where filterer (BCons a b) = if p a then a `MyCons` b else b
            filterer BNil        = MyNil
Now we can all tell people that we’ve written map using a catamorphism.
Careful readers will notice one big difference between foldr and cata: cata doesn’t take a seed! Indeed with foldr we replace all the constructors of our list with the function f, so
    1 : 2 : 3 : 4 : []
    1 `f` 2 `f` 3 `f` 4 `f` seed
This doesn’t generalize well though, what if we have a type with a constructor of 3 arguments? Or 5? To avoid this problem, recursion-schemes takes a clever approach.
Remember that BList factors out recursion? cata works by collapsing a sublist recursively and sticking the result back into the slot of the original list. So we actually have something like
    BCons 1 (BCons 2 (BCons 3 (BCons 4 BNil)))
    f (BCons 1 (f (BCons 2 (f (BCons 3 (f (BCons 4 (f BNil))))))))
Now f has to handle all possible cases of our constructor, so it handles both the seed value and the collapsing case! And this generalizes beautifully: by just delegating all the constructor specific work to f, it’s possible to derive cata practically for free.
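That description is short enough to write down directly. Below is a minimal sketch of cata specialised to MyList, defined locally so it runs with only base; the real library version is more general, and sumList and lengthList are just examples I picked.

```haskell
{-# LANGUAGE DeriveFunctor #-}

data MyList a  = MyCons a (MyList a) | MyNil
data BList a b = BCons a b | BNil deriving Functor

-- Peel off one layer, mirroring project from the Foldable instance.
project :: MyList a -> BList a (MyList a)
project (MyCons a rest) = BCons a rest
project MyNil           = BNil

-- Local stand-in for the library's cata: project one layer, collapse the
-- sublist recursively via fmap, then let f deal with the layer itself.
cata :: (BList a b -> b) -> MyList a -> b
cata f = f . fmap (cata f) . project

sumList :: Num a => MyList a -> a
sumList = cata alg
  where alg (BCons a total) = a + total
        alg BNil            = 0 -- the "seed" is just another case of f

lengthList :: MyList a -> Int
lengthList = cata alg
  where alg (BCons _ n) = 1 + n
        alg BNil        = 0
```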
Now, since recursion-schemes already has an instance for [a], I’ll dispense with MyList since it’s a bit clunky.
Our foldable instance gives us quite a bit more than just foldr however! We also get this function para, short for “paramorphisms”. A paramorphism is like a fold, but also gives a “snapshot” of the structure at the point we’re folding. So if we wanted to sum each tail of a list, we could do something like
    sumTails :: Num a => [a] -> [a]
    sumTails = para summer
      where summer (Cons a (list, rest)) = a + sum list : rest
            summer Nil                   = []
This could be useful for example, if you’re doing any context dependent operations on a structure. Later, I’ll try to include some more practical examples of a paramorphism (I never thought I’d say those words).
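If you want to see the “snapshot” in action without the library, here’s a sketch of para written by hand for ordinary lists. The ListF type stands in for the library’s hidden Cons and Nil constructors and is an assumption of this example, as is this list-only version of para.

```haskell
-- One layer of a list, with the recursive position factored out.
data ListF a b = Cons a b | Nil

-- Hand-written para for lists: each step sees the untouched tail
-- alongside the result already computed for that tail.
para :: (ListF a ([a], b) -> b) -> [a] -> b
para f []       = f Nil
para f (x : xs) = f (Cons x (xs, para f xs))

sumTails :: Num a => [a] -> [a]
sumTails = para summer
  where summer (Cons a (list, rest)) = a + sum list : rest
        summer Nil                   = []
```

For [1, 2, 3] the tails are [1, 2, 3], [2, 3], and [3], so the sums come out as 6, 5, and 3.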
Now recursion-schemes includes generalized versions of all of these but I’m not brave enough to try to explain them right now.
A Real Example
Before we wrap this post up, let’s demonstrate an actual useful example of recursion-schemes.
We’re going to implement trivial constant folding in a made up language I’ll call Foo.
The AST for Foo is something like
    data Op = Plus | Sub | Mult | Div

    data Foo = Num Int           -- Numeric literals
             | String String     -- String literals
             | Binop Op Foo Foo  -- Primitive operation
             | Fun String Foo    -- Lambda/Abstraction over terms
             | App Foo Foo       -- Application
             | Var String        -- Variables
             deriving Show
Now we want our trivial constant folding to reduce something like Binop Plus (Num 1) (Num 2) to just Num 3. Let’s first formalize this by writing a quick little reducer
    compute :: Op -> Int -> Int -> Int
    compute Plus = (+)
    compute Sub  = (-)
    compute Mult = (*)
    compute Div  = div

    reduce :: Foo -> Foo
    reduce (Binop op (Num a) (Num b)) = Num $ compute op a b -- The reduction
    reduce a                          = a
So we compute all constant expressions and leave everything else alone. This is pretty simple, but how can we apply it to every element in our AST? Well, time to break out recursion-schemes
    data FooB a = NumB Int
                | StringB String
                | BinopB Op a a
                | FunB String a
                | AppB a a
                | VarB String
                deriving Functor
    type instance Base Foo = FooB

    instance Foldable Foo where
      project (Num a)        = NumB a
      project (String a)     = StringB a
      project (Binop op a b) = BinopB op a b
      project (Fun v a)      = FunB v a
      project (App a b)      = AppB a b
      project (Var a)        = VarB a

    -- reverse of project
    rProject :: Base Foo Foo -> Foo
    rProject (NumB a)        = Num a
    rProject (StringB a)     = String a
    rProject (BinopB op a b) = Binop op a b
    rProject (FunB v a)      = Fun v a
    rProject (AppB a b)      = App a b
    rProject (VarB a)        = Var a
And let’s rewrite reduce to use FooB instead of Foo
    reduce :: Base Foo Foo -> Foo
    reduce (BinopB op (Num a) (Num b)) = Num $ compute op a b -- The reduction
    reduce a                           = rProject a
So this entire traversal now just becomes
    constFold :: Foo -> Foo
    constFold = cata reduce
Now we can test our simple optimization
    test = Binop Plus (Num 1) (Binop Mult (Num 2) (Num 3))
    optimized = constFold test
    main = print optimized
As we’d hope, this prints out Num 7!
This seems like a lot of work but don’t forget, now that we’ve taught recursion-schemes how to do traversals, we get all of this for free. For example, let’s now write a function to grab all the free variables of an expression.
As before, let’s start by writing the simple worker function for this traversal.
    freeVar :: Base Foo [String] -> [String]
    freeVar (NumB _)         = []
    freeVar (StringB _)      = []
    freeVar (VarB s)         = [s]
    freeVar (BinopB _ v1 v2) = v1 ++ v2
    freeVar (AppB v1 v2)     = v1 ++ v2
    freeVar (FunB v vs)      = filter (/= v) vs -- drop every occurrence of the bound variable
Now the full traversal is trivial!
    freeIn :: Foo -> [String]
    freeIn = cata freeVar
As we’d hope, this traversal is much easier to write than the first one. You can imagine that the boilerplate of writing FooB and project is amortized over each traversal, making it much easier to write subsequent traversals once we’ve gone through the trouble of actually laying down the foundation.
What’s Next?
So far I’ve discussed part of the Foldable half of the recursion-schemes library. In my next post I’ll cover anamorphisms and Unfoldable, the dual of what we’ve talked about here.

          
          



Getting Proper Tail Calls Out of C
Danny Gratzer — Mon, 05 May 2014 00:00:00 UT

    Posted on May  5, 2014
    


    
    Tags: c, compilers
    


While I don’t exactly love writing C, it has a lot to offer as a compilation target. It’s got lots of smart compilers that can target just about every platform I’ve ever heard of and tons of others, it’s got the ability to mess with low level aspects of itself, and C’s got some nice high level abstractions like functions.
One big issue I have with it as a target is that its function calls suck. I’m usually compiling a functional language where tail call optimization is imperative (heh) and C makes this a lot harder than it should.
This post illustrates how I currently beat C into actually generating proper tail calls. Most of the code from this post is straight from c_of_scheme. If you’re having trouble understanding some function then there may actually be documentation for it in the source :)
The first step involves something called continuation passing style. The idea here is that we reify the implicit “continuation” for each expression into an explicit function.
So we’ll turn something like
    (+ 1 (* 2 2))
Into something like
    (lambda (k)
       ((lambda (mult-result)
          (k (+ 1 mult-result)))
         (* 2 2)))
Notice how now the order of evaluation is completely determined by how we pass things around? We pass each result along the chain of continuations and every non-primitive function becomes a tail call.
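Since I think in Haskell more easily than in Scheme, here’s the same idea as a small CPS evaluator for arithmetic. The Expr type and evalK are invented for this sketch and aren’t part of c_of_scheme.

```haskell
data Expr = Lit Int | Add Expr Expr | Mul Expr Expr

-- Every recursive call is in tail position: the "rest of the computation"
-- is packed into the continuation k rather than left on the stack.
evalK :: Expr -> (Int -> r) -> r
evalK (Lit n)   k = k n
evalK (Add a b) k = evalK a (\x -> evalK b (\y -> k (x + y)))
evalK (Mul a b) k = evalK a (\x -> evalK b (\y -> k (x * y)))

-- (+ 1 (* 2 2)), with id as the final continuation.
example :: Int
example = evalK (Add (Lit 1) (Mul (Lit 2) (Lit 2))) id
```

The nesting of the lambdas fixes the evaluation order: left operand, then right operand, then the primitive.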
This has one more very important effect, none of these function calls will return. We’re going to pass control off to each continuation and the very last function will exit the program. This means that as soon as we call a continuation, we can nuke the stack and every function call has become identical to calling a continuation.
C actually has a similar notion to this and when we run this code through closure conversion and lambda lifting (a subject that’s worthy of its own rant) we’ll end up with functions that look something like
    void _gen1(scm_t arg, scm_t cont, ...){
       scm_apply(1, cont, scm_add(arg, 1));
    }
It’s worth a mention that scm_apply will unwrap the continuation and actually apply it since it’s just a normal function. We know that the call to scm_apply will never return. We can tell C this with __attribute__((noreturn)). Theoretically this also enables the use of something much like TCO: once the last function is called, we can reuse the stack frame of _gen1 and if the function actually returns despite our promises simply segfault (hooray for C).
Unfortunately, GCC doesn’t seem to do this on its own in my case. So I cried for a little bit and offered it many flags in the hopes that it would be merciful and just do it for me but it didn’t. And now I can actually illustrate how I did this manually.
It turns out this is possible to do with only a tiny impact on the generated code from the compiler and a bit of monkeying with scm_apply. First, I’ll explain how scm_apply looks normally.
    void scm_apply(int i, scm_t f, ...) {
      int x;
      va_list va;
      scm_t *arg_list = malloc(sizeof(scm_t) * (i + 1));
      va_start(va, f);
      for(x = 1; x < i+1; ++x){
        arg_list[x] = va_arg(va, scm_t);
      }
      if(f->state != 4){
        printf("Attempted to apply nonfunction\n");
        exit(1);
      } else {
        arg_list[0] = f->val.scm_lam.clos;
        f->val.scm_lam.fun(arg_list);
      }
    }
Note that scm_t is a pointer to a discriminated union in C to fake the dynamic types found in Scheme.
So the first bit is just the varargs goo to extract the arguments given to scm_apply. Once we have all of those in an array, we look at the state field of f, our function. If it’s not 4, then we don’t really have a function so we complain loudly and exit. Otherwise we just get the actual function pointer out of f and call it.
This is a little tricky to read if you’re not familiar with the DU’s in C, but there’s nothing exactly earth shattering in there.
Now, since every function call is going through scm_apply, we add a global ticker to count how many function calls have gone through there
    static int stack_frames;
    ...
    void scm_apply(int i, scm_t f, ...) {
        ...
        else {
            ++stack_frames;
            ....
        }
    }
Now we know just how quickly we’re burning through the available stack space.
Next we need to add a special case of scm_apply which we’ll call scm_init. It looks like this
    void scm_init(lam_t f){
       stack_frames = 0;
       scm_apply(0, mkLam(scm_top_clos, f)); // Call main
    }
All this does is initialize stack_frames and call scm_apply. We can modify the codegen so that the main function is passed to scm_init. We know that this main function will take no arguments in c_of_scheme for reasons that aren’t entirely relevant to this post.
OK, so now is the magic and like all good C magic, it starts by including setjmp.
    #include <setjmp.h>
Now we add 3 more global variables (please don’t hate me)
    static scm_t  current_fun;
    static scm_t* current_args;
    static jmp_buf env;
Now we modify scm_apply so that if we’re at a depth of 100 function calls or more we stick the current function and arguments into these global variables and longjmp with env!
Now we need a good place to longjmp to, the place where env points to. This is what scm_init is for: we know that it’s called almost immediately so it’s relatively “low” on the stack. So scm_init now becomes
    void scm_init(lam_t f){
      stack_frames = 0;

      if(setjmp(env)){
         stack_frames = 0;
         current_fun->val.scm_lam.fun(current_args);
      }
      scm_apply(0, mkLam(scm_top_clos, f)); // Call main
    }
Notice that we do no error checking and just go straight into calling the next function after a longjmp. In order to set up current_fun and current_args correctly scm_apply must be modified
    void scm_apply(int i, scm_t f, ...) {
      int x;
      va_list va;
      scm_t *arg_list = malloc(sizeof(scm_t) * (i + 1));
      va_start(va, f);
      for(x = 1; x < i+1; ++x){
        arg_list[x] = va_arg(va, scm_t);
      }
      if(f->state != 4){
        printf("Attempted to apply nonfunction\n");
        exit(1);
      } else {
        arg_list[0] = f->val.scm_lam.clos;

        if(stack_frames >= 100){
          // Transfer continuation up
          current_fun     = f;
          current_args    = arg_list;
          longjmp(env, 1);
        }
        ++stack_frames;
        f->val.scm_lam.fun(arg_list);
      }
    }
This means that once we’ve applied 100 functions, we jump back to scm_init, demolishing all those unused stack frames, and keep going.
There it is, that’s my minimally invasive technique for tail calls in C. From what I’ve heard this is also used by Chicken Scheme.

          
          



You Could Have Invented GHC.Generics
Danny Gratzer — Fri, 25 Apr 2014 00:00:00 UT

    Posted on April 25, 2014
    


    
    Tags: haskell
    


In Haskell right now there seem to be two main approaches to data-generic programming. There’s the whole Typeable/Data approach, which is a bit magic. Lately, however, there has been a new kid on the block: GHC.Generics.
In this post we’ll step through the intuition for the library and (hopefully) help shed some light on why it exists and how to use it.
Boilerplate
Let’s imagine you, our young and brilliant Haskell hacker, cranking out some code. You’ve probably gone the typesafe route and have lots and lots of types to encode invariants.
However, this proliferation of types is cramping your style a bit: you’re forced to create a new function over each type, and each one seems to do exactly the same thing!
    mapFoo  :: (a -> b) -> Foo a  -> Foo b
    mapBar  :: (a -> b) -> Bar a  -> Bar b
    mapQuux :: (a -> b) -> Quux a -> Quux b
But, we’re clever enough to notice that this is obviously just fmap! So we can scrap all of this with fmap and -XDeriveFunctor.
But what about other functions? There are a lot of things that are basically mechanical to define over each type: serialization, field selection, and so on. Each of these operations has something in common: they deal with the structure of the types rather than the actual representation of them.
Selecting the first fields from
    data Foo a = Foo a a a
    data Bar a = Bar a a a
is almost identical! The only difference is in the name. So, let’s figure out a way to talk about the structure of our types.
Dissecting an Algebraic Type
Now, when we go to dissect some type data Foo = ... we have two things to consider

1. A list of constructors
2. A list of fields for each constructor

Let’s start with (2) since it’s simpler. For types that are of the form
    data SomeType = OneConstructor field1 field2 field3 ...
we can almost think of them as really, really big tuples.
    type SomeType' = (field1, field2, field3, ...)
But, since we want to encode different numbers of fields in just one type, let’s transform this further into
    type SomeType'' = (field1, (field2, (field3, ...)))
There we have it, we can encode lists of fields as a deeply nested group of tuples.
We can now imagine something like
    {-# LANGUAGE TypeFamilies #-}
    type family TupleForm a

    data Foo a = Foo a a
    type instance TupleForm (Foo a) = (a, a)

    data Bar a = Bar a a
    type instance TupleForm (Bar a) = (a, a)

    class Tuple a where
       toT :: a -> TupleForm a
       fromT :: TupleForm a -> a

    instance Tuple (Foo a) where
     ...
    instance Tuple (Bar a) where
     ...
Now we can write generic functions by only writing them for the TupleForm of Foo and Bar. For example,
    gfst :: (TupleForm a ~ (b, c), Tuple a) => a -> b
    gfst = fst . toT
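The Tuple instances above are left as `...`, so here is one way they could be filled in to get a version that loads and runs. The instance bodies and the `main` are my own guess at the obvious definitions, not something taken from a library.

```haskell
{-# LANGUAGE TypeFamilies #-}

-- Maps a single-constructor type to the tuple of its fields.
type family TupleForm a

data Foo a = Foo a a
data Bar a = Bar a a

type instance TupleForm (Foo a) = (a, a)
type instance TupleForm (Bar a) = (a, a)

class Tuple a where
  toT   :: a -> TupleForm a
  fromT :: TupleForm a -> a

-- Assumed instance bodies: just shuffle the fields in and out of a pair.
instance Tuple (Foo a) where
  toT (Foo x y) = (x, y)
  fromT (x, y)  = Foo x y

instance Tuple (Bar a) where
  toT (Bar x y) = (x, y)
  fromT (x, y)  = Bar x y

-- Works for any type whose tuple form is a pair.
gfst :: (TupleForm a ~ (b, c), Tuple a) => a -> b
gfst = fst . toT

main :: IO ()
main = do
  print (gfst (Foo 'x' 'y'))       -- prints 'x'
  print (gfst (Bar (1 :: Int) 2))  -- prints 1
```

The same gfst works on both Foo and Bar even though neither type knows about the other; all the sharing happens through TupleForm.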
Now that we understand fields, let’s move on to constructors!
A list of constructors is the dual of a list of fields, representing OR rather than AND. We can make a bit of a leap from this to thinking that our representations of the two should be dual. So what would be the dual of (a, b)? Why, that would be Either a b!
This means for a type
    data SomeType = Bar Int | Baz Char | Quux ()
    type SomeType' = Either Int (Either Char ())
This covers almost every case; we just need to make sure we represent no-argument constructors as constructors of one argument: (). Take a moment to think why.
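To make the Either encoding concrete, here is a small sketch that converts a three-constructor type to its nested Either form and back. The helper names toRep and fromRep are made up for this example, and Quux is given no argument on purpose so that the () case shows up.

```haskell
data SomeType = Bar Int | Baz Char | Quux
  deriving (Eq, Show)

-- One Either per choice; the argument-free constructor becomes ().
type SomeType' = Either Int (Either Char ())

toRep :: SomeType -> SomeType'
toRep (Bar n) = Left n
toRep (Baz c) = Right (Left c)
toRep Quux    = Right (Right ())

fromRep :: SomeType' -> SomeType
fromRep (Left n)           = Bar n
fromRep (Right (Left c))   = Baz c
fromRep (Right (Right ())) = Quux

main :: IO ()
main = print (map (fromRep . toRep) [Bar 1, Baz 'c', Quux])
```

Without the () there would be nothing to put in the last Right, which is one answer to the question above.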
A Procedure for Reifying
Let’s now outline an algorithm for turning some arbitrary type to the corresponding generic version.
For a type C, with constructors C1, C2, C3.. and fields C1^1, C1^2, C2^1…

Change each set Cx^* to the TupleForm, call this TupleForm Tx
Nest the Tx’s in Either’s, Either T1 (Either T2 (Either T3 ...))

And that’s it, let’s practice on some data types to check that it works.
    data Test = Foo Int Char | Bar Int Bool Char | Quux
    type Test' = Either (Int, Char) (Either (Int, (Bool, Char)) ())

    data Maybe a  = Just a | Nothing
    type Maybe' a = Either a ()
So we can see that this transformation is pretty mechanical! There’s one hiccup though: what do we do with recursive types?
We’ll handle it the same way that GHC.Generics does: we just don’t transform the recursive arguments into the generic representation, lest we end up with an infinite tree.
So [a] should look like
    type List a = Either (a, [a]) ()
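As a quick sanity check, here is a sketch of the one-layer conversion for lists. The names unrollList, rollList and safeHead are invented for illustration; the point is only that the tail stays an ordinary [a].

```haskell
type List a = Either (a, [a]) ()

-- Peel off exactly one constructor; the tail is left untouched.
unrollList :: [a] -> List a
unrollList (x : xs) = Left (x, xs)
unrollList []       = Right ()

rollList :: List a -> [a]
rollList (Left (x, xs)) = x : xs
rollList (Right ())     = []

-- A function written purely against the representation.
safeHead :: [a] -> Maybe a
safeHead = either (Just . fst) (const Nothing) . unrollList

main :: IO ()
main = do
  print (safeHead "abc")
  print (rollList (unrollList [1, 2, 3 :: Int]))
```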
Building a Library
Now if we want to build this into a library, we’d like to provide a few of our own data types rather than hijacking Either and (,).
    {-# LANGUAGE TypeOperators #-}

    data (:*:) a b = a :*: b       -- Like (,) a b
    data (:+:) a b = InL a | InR b -- Like Either a b
    data U         = U             -- Like (), U is for Unit
Now all our transformations are the same, but the results are prettier thanks to the type level operators
    type List   a = (a :*: [a]) :+: U
    type Maybe' a = a :+: U
Now to facilitate generic programming, we’ll lug one more parameter through each of these constructors and add another two types to wrap meta information and constants respectively
    data (:*:) a b p = a p :*: b p           -- Like (,) a b
    data (:+:) a b p = L1 (a p) | R1 (b p) -- Like Either a b
    data U         p = U                     -- Like (), U is for Unit

    newtype M1 i c f p = M1 (f p) -- i and c are meta info
    newtype K1 i c   p = K1 c
Now because we’re expecting all our arguments to (:*:) and (:+:) to be of kind * -> * we use K1 to wrap a normal type like Int so that it can take an argument.
M1 is a bit odd, it’s used to store information about our data entirely in phantom types. We can imagine having a bunch of types that represent different things, like whether the tree of constructors represents such and such data type or what constructor we’re dealing with. It’s not terribly relevant to the rest of this post, but useful in some odd cases.
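For a taste of what that meta information buys us, here is a sketch that reads a value’s constructor name out of the M1 wrappers in the real GHC.Generics. The class GConName and the function constructorName are names I made up; D1, C1, Constructor and conName come from GHC.Generics. The instances are written on the unapplied representation types, which is the more common style.

```haskell
{-# LANGUAGE DeriveGeneric, FlexibleContexts, FlexibleInstances, TypeOperators #-}
import GHC.Generics

class GConName f where
  gconName :: f p -> String

-- D1 wraps the whole datatype; nothing to do but look inside.
instance GConName f => GConName (D1 d f) where
  gconName (M1 x) = gconName x

-- Follow whichever branch of the sum the value actually lives in.
instance (GConName f, GConName g) => GConName (f :+: g) where
  gconName (L1 x) = gconName x
  gconName (R1 x) = gconName x

-- C1 wraps a single constructor; its phantom type carries the name.
instance Constructor c => GConName (C1 c f) where
  gconName = conName

constructorName :: (Generic a, GConName (Rep a)) => a -> String
constructorName = gconName . from

data Test = Foo Int Char | Bar Int Bool Char | Quux
  deriving (Generic)

main :: IO ()
main = mapM_ (putStrLn . constructorName) [Foo 1 'a', Bar 2 True 'b', Quux]
```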
Now we can repeat our transformation we’d discussed earlier just using the new constructors instead. We can imagine wrapping up this whole class like this
    class Generic a where
      type family Rep a :: * -> *
      from :: a       -> Rep a p
      to   :: Rep a p -> a
This is very much in the spirit of our Tuple type class, but now our type family returns something of kind * -> * to leave room for our extra p parameter.
The Real Deal
As clever readers will have noticed, the above type class is precisely what GHC.Generics exports! We have successfully reached full circle and now have arrived at GHC.Generics' API.
The only difference between us and GHC.Generics is that their Generic class can be derived automatically, using a procedure almost identical to our algorithm. The only slight difference is that rather than a “list” of :*:’s or :+:’s they make a tree; this makes little difference to most programs, however.
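Here is a small sketch of why the tree shape rarely matters: a generic field counter whose :*: instance simply adds up both sides, however the products happen to be nested. GCount, countFields and the Shape type are invented for this example; everything else is from GHC.Generics.

```haskell
{-# LANGUAGE DeriveGeneric, FlexibleContexts, TypeOperators #-}
import GHC.Generics

class GCount f where
  gcount :: f p -> Int

instance GCount U1 where
  gcount U1 = 0                      -- no fields at all

instance GCount (K1 i c) where
  gcount _ = 1                       -- exactly one field

instance (GCount f, GCount g) => GCount (f :*: g) where
  gcount (a :*: b) = gcount a + gcount b

instance (GCount f, GCount g) => GCount (f :+: g) where
  gcount (L1 a) = gcount a
  gcount (R1 b) = gcount b

instance GCount f => GCount (M1 i c f) where
  gcount (M1 x) = gcount x           -- meta information is skipped

-- Number of fields of the constructor that built the value.
countFields :: (Generic a, GCount (Rep a)) => a -> Int
countFields = gcount . from

data Shape = Circle Double | Rect Double Double | Dot
  deriving (Generic)

main :: IO ()
main = mapM_ (print . countFields) [Circle 1, Rect 2 3, Dot]
```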
To wrap things up, let’s finish by showcasing making a simple generic debugging dumper.
To begin with, we’ll define a class GDump and will make instances for the GHC.Generics types
    class GDump a where
       gdump :: a -> String

    instance GDump (U1 p) where
      gdump U1 = "()"

    instance Show c => GDump (K1 i c p) where
      gdump (K1 c) = show c

    instance (GDump (f p), GDump (g p)) => GDump ((:*:) f g p) where
      gdump (a :*: b) = "(" ++ gdump a ++ " :*: " ++ gdump b ++ ")"

    instance (GDump (f p), GDump (g p)) => GDump ((:+:) f g p) where
      gdump (L1 a) = "(Left  " ++ gdump a ++ ")"
      gdump (R1 a) = "(Right " ++ gdump a ++ ")"

    instance (GDump (f p)) => GDump (M1 a b f p) where
      gdump (M1 f) = gdump f
And now we can create a class for “normal” values and use -XDefaultSignatures to give the default implementation a Generic constraint
    class Dump a where
      dump :: a -> String

      default dump :: (Generic a, GDump (Rep a ())) => a -> String
      dump a = gdump (from' a)
        where from' :: Generic a => a -> Rep a ()
              from' = from  -- A hack to stop the type checker from whining about p
And now we can just use this default implementation.
    instance Show a => Dump (Maybe a)
Using this we can print suitably boring representations of generic types, for free!

          
          



Continuations and Exceptions
Danny Gratzer — Mon, 14 Apr 2014 00:00:00 UT

    Posted on April 14, 2014
    


    
    Tags: haskell
    


Continuations are useful things. They provide a nice way to manually handle control flow. This makes them very useful for compilers: an often-used alternative to SSA is an intermediate language in which every function call is a tail call and every expression has been converted to continuation passing style.
Often, however, this isn’t enough. In a language with exceptions, we don’t just have a single continuation, since every expression can do one of two things.

Continue the rest of the program normally
Throw an exception and run an alternative program, the exception handler

To represent this, we can imagine having two continuations. Instead of
    newtype Cont r a = Cont {runCont :: (a -> r) -> r}
We have
    {-# LANGUAGE DeriveFunctor #-}
    import Control.Monad

    newtype Throws r e a = Throws {runThrows :: (e -> r) -> (a -> r) -> r}
      deriving (Functor)
Now we have two continuations, where e -> r represents the composition of exception handlers.
We can write a trivial monad instance similar to Cont
    instance Monad (Throws r e) where
      return a = Throws $ \ex cont -> cont a
      (Throws c) >>= f = Throws $ \ ex cont ->
        c ex $ \a -> runThrows (f a) ex cont
So >>= maintains the exception handler between computations and otherwise acts exactly like Cont.
To actually take advantage of our exception handlers, we need two things: a throw- and catch-like pair of functions. Let’s start with throw since it’s easiest.
    throw :: e -> Throws r e a
    throw e = Throws $ \ex cont -> ex e
This is pretty straightforward: when we’re given an exception to throw, we simply feed it to our exception handler continuation. Since we don’t care what value cont needs, we can universally quantify over a.
Next up is handle. We’ll represent an exception handler as a function from e -> Maybe a. If it returns Nothing, we can’t handle the exception at this level and we’ll just pass it to the existing exception handler.
So our handle is
    handle :: Throws r e a -> (e -> Maybe a) -> Throws r e a
    handle (Throws rest) handler = Throws $ \ex cont ->
      rest (\e -> maybe (ex e) cont (handler e)) cont
Notice the clever bit here: each handler actually contains both the success and failure continuations! If we can’t handle the exception we fail; otherwise we can resume exactly where we were before.
No post would be complete without a demonstration!
    data Ex = Boom | Kaboom | Splat String
            deriving Show

    throwEx1 = throw Boom
    throwEx2 = throw Kaboom
    throwEx3 = throw (Splat "splat")
    test = do
      result <- handle throwEx1 $ \e -> case e of
        Boom -> Just "foo"
        _    -> Nothing
      result2 <- handle throwEx2 $ \e -> case e of
        Boom   -> Just "what"
        Kaboom -> Just "huh?"
        _      -> Nothing
      result3 <- handle throwEx3 $ \e -> case e of
        Splat s -> Just s
        _       -> Nothing
      return (unwords [result, result2, result3])
We can run this with
    runThrows test (error . ("Toplevel fail " ++) . show) id
which returns
    "foo huh? splat"
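One thing the demo doesn’t exercise is a handler that declines an exception. Here is a self-contained sketch where the inner handler returns Nothing and the exception falls through to the outer one. I’m assuming a current GHC, which wants an Applicative instance before it will accept the Monad instance, so I’ve added one via ap; the nested example itself is made up.

```haskell
import Control.Monad (ap, liftM)

newtype Throws r e a = Throws { runThrows :: (e -> r) -> (a -> r) -> r }

instance Functor (Throws r e) where
  fmap = liftM

-- Assumed addition: newer GHCs require this superclass instance.
instance Applicative (Throws r e) where
  pure a = Throws $ \_ cont -> cont a
  (<*>)  = ap

instance Monad (Throws r e) where
  Throws c >>= f = Throws $ \ex cont ->
    c ex $ \a -> runThrows (f a) ex cont

throw :: e -> Throws r e a
throw e = Throws $ \ex _ -> ex e

handle :: Throws r e a -> (e -> Maybe a) -> Throws r e a
handle (Throws rest) handler = Throws $ \ex cont ->
  rest (\e -> maybe (ex e) cont (handler e)) cont

data Ex = Boom | Kaboom
  deriving (Show)

-- The inner handler only knows Boom, so Kaboom reaches the outer handler.
nested :: Throws r Ex String
nested = inner `handle` \e -> case e of
    Kaboom -> Just "outer caught Kaboom"
    _      -> Nothing
  where
    inner = throw Kaboom `handle` \e -> case e of
      Boom -> Just "inner caught Boom"
      _    -> Nothing

main :: IO ()
main = putStrLn (runThrows nested (\e -> "uncaught " ++ show e) id)
-- prints: outer caught Kaboom
```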
So our exceptions do, in fact, work :)
A Note on Either
Now we already have a perfectly good system of monadic exception-like things in the form of Either.
It might be interesting to note that what we’ve written is in fact isomorphic to Either. (e -> r) -> (a -> r) -> r is just the church representation of Either e a.
We can even go all the way and change Throws to
    newtype Throws e a = Throws {runThrows :: forall r. (e -> r) -> (a -> r) -> r}
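With the quantified version, the two directions of the isomorphism can be written out directly. This is a sketch; toEither and fromEither are names I picked, and RankNTypes is needed for the forall inside the newtype.

```haskell
{-# LANGUAGE RankNTypes #-}

newtype Throws e a = Throws
  { runThrows :: forall r. (e -> r) -> (a -> r) -> r }

-- Feed the two Either constructors in as the two continuations.
toEither :: Throws e a -> Either e a
toEither t = runThrows t Left Right

-- Pick which continuation to call based on the constructor.
fromEither :: Either e a -> Throws e a
fromEither (Left e)  = Throws (\ex _   -> ex e)
fromEither (Right a) = Throws (\_ cont -> cont a)

main :: IO ()
main = do
  print (toEither (fromEither (Right 3 :: Either String Int)))
  print (toEither (fromEither (Left "boom" :: Either String Int)))
```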
So there you are, an interesting realization: one of the classic representations of exceptions in a language like SML is in fact a really complicated version of Either :)

          
          



Bargain Priced Coroutines
Danny Gratzer — Tue, 08 Apr 2014 00:00:00 UT

    Posted on April  8, 2014
    


    
    Tags: haskell
    


The other day I was reading the 19th issue of the Monad.Reader and there was a fascinating post on coroutines.
While reading some of the code I noticed that it, like most things in Haskell, can be reduced to 5 lines with a library that Edward Kmett has written.
Consider the type of a trampoline as described in this article
    newtype Trampoline m a = Tramp {runTramp :: m (Either (Trampoline m a) a)}
So a trampoline is a monadic computation of some sort returning either a result, a, or another computation to run to get the rest.
Now this looks strikingly familiar. A computation returning Trampoline m a is really a computation returning a tree of Trampoline m a’s terminating in a pure value.
This sounds like a free monad!
    import Control.Monad.Trans.Free
    import Control.Monad.Identity

    type Trampoline = FreeT Identity
Recall that FreeT is defined as
    data FreeF f a b = Pure a | Free (f b)
    newtype FreeT f m a = FreeT { runFreeT :: m (FreeF f a (FreeT f m a)) }
This is isomorphic to what we were looking at before. As an added bonus, we’ve saved the tedium of defining our own monad and applicative instances for Trampoline.
We can now implement bounce and pause to define our trampolines. bounce must take a computation and unwrap it by one level, leaving either a value or another computation.
This is just a matter of rejiggering the FreeF into an Either
    bounce :: Functor m => Trampoline m a -> m (Either (Trampoline m a) a)
    bounce = fmap toEither . runFreeT
      where toEither (Pure a) = Right a
            toEither (Free m) = Left $ runIdentity m
pause requires some thought, the trick is to realize that if we wrap a computation in one layer of Free when unwrapped by bounce we’ll get the rest of the computation.
Therefore,
    pause :: Monad m => Trampoline m ()
    pause = FreeT $ return (Free . Identity $ return ())
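To see bounce and pause working together, here is a sketch of a tiny driver that keeps bouncing a trampoline until it finishes and counts how many pauses it hit. The driver runCounting and the hello example are invented here; it assumes the free and transformers packages are installed.

```haskell
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.Free  (FreeF (..), FreeT (..))
import Data.Functor.Identity     (Identity (..))

type Trampoline = FreeT Identity

bounce :: Functor m => Trampoline m a -> m (Either (Trampoline m a) a)
bounce = fmap toEither . runFreeT
  where toEither (Pure a) = Right a
        toEither (Free m) = Left (runIdentity m)

pause :: Monad m => Trampoline m ()
pause = FreeT $ return (Free (Identity (return ())))

-- Hypothetical driver: bounce until done, counting the pauses on the way.
runCounting :: Monad m => Trampoline m a -> m (Int, a)
runCounting = go 0
  where
    go n t = bounce t >>= either (go (n + 1)) (\a -> return (n, a))

hello :: Trampoline IO ()
hello = do
  lift (putStr "Hello, ")
  pause
  lift (putStrLn "world!")

main :: IO ()
main = runCounting hello >>= print . fst
```

A real scheduler would do something more interesting than go straight back in, such as interleaving several trampolines, but the shape is the same.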
So that’s 6 lines of code for trampolines. Let’s move on to generators.
A generator doesn’t yield just another computation, it yields a pair of a computation and a freshly generated value. We can account for this by changing that Identity functor.
    type Generator c = FreeT ((,) c)
Again we get free functor, applicative and monad instances. We need two functions, yield and runGen. yield is going to take one value and stick it into the first element of the pair.
    yield :: Monad m => g -> Generator g m ()
    yield g = FreeT . return $ Free (g, return ())
This just sticks a good old boring m () in the second element of the pair.
Now runGen should take a generator and produce a m (Maybe c, Generator c m a). This can be done again by pattern matching on the underlying FreeF.
    runGen :: (Monad m, Functor m) => Generator g m a -> m (Maybe g, Generator g m a)
    runGen = fmap toTuple . runFreeT
      where toTuple (Pure a)         = (Nothing, return a)
            toTuple (Free (g, rest)) = (Just g, rest)
Now, last but not least, let’s build consumers. These wait for a value rather than generating one, so -> looks like the right functor.
    type Consumer c = FreeT ((->) (Maybe c))
Now we want await and runCon: await waits for a value and runCon supplies one. The consumer receives a Maybe c so that it can be told when there is nothing left to consume. These are both fairly mechanical.
    runConsumer :: Monad m => c -> Consumer c m a -> m a
    runConsumer c = (>>= go) . runFreeT
      where go (Pure a) = return a
            go (Free f) = runConsumer c $ f (Just c)

    runCon :: (Monad m, Functor m)
        => Maybe c
        -> Consumer c m a
        -> m (Either a (Consumer c m a))
    runCon food c = runFreeT c >>= go
      where go (Pure a) = return . Left $ a
            go (Free f) = do
              result <- runFreeT $ f food
              return $ case result of
                Pure a -> Left                   $ a
                free   -> Right . FreeT . return $ free
runCon is a bit more complex than I’d like. This is to essentially ensure that if we had some code like
    Just a <- await
    lift $ do
      foo
      bar
      baz
    Just b <- await
We want foo, bar, and baz to run with just one await. You’d expect that we’d run as much as possible with each call to runCon. Thus we unwrap not one, but two layers of our FreeT and run them, then rewrap the lower layer. The trick is that we make sure never to duplicate side effects by using good old return.
We can sleep easy that this is sound since return a >>= f is f a by the monad laws. Thus, our call to return can’t do anything detectable or too interesting.
While this is arguably more intuitive, I don’t particularly like it so we can instead write
    runCon :: (Monad m, Functor m)
        => Maybe c
        -> Consumer c m a
        -> m (Either a (Consumer c m a))
    runCon food = fmap go . runFreeT
      where go (Pure a) = Left a
            go (Free f) = Right (f food)
Much simpler, but now our above example wouldn’t run foo and friends until the second call of runCon.
Now we can join generators to consumers in a pretty naive way,
    (>~>) :: (Functor m, Monad m) => Generator c m () -> Consumer c m a -> m a
    gen >~> con = do
      (cMay, rest) <- runGen gen
      case cMay of
        Nothing -> starve con
        Just c  -> runCon (Just c) con >>= use rest
      where use _    (Left a)  = return a
            use rest (Right c) = rest >~> c
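The code above leans on await and starve, which never get defined in the post. Here is one self-consistent way to fill in the gaps. It is a sketch under several assumptions: await and starve are my guesses at the obvious definitions, the consumer’s functor is taken to be (->) (Maybe c) so that Nothing can signal an exhausted generator, and runCon is the simpler one-layer version. It needs the free and transformers packages.

```haskell
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.Free  (FreeF (..), FreeT (..))

type Generator c = FreeT ((,) c)
type Consumer  c = FreeT ((->) (Maybe c))   -- assumption: fed a Maybe c

yield :: Monad m => c -> Generator c m ()
yield c = FreeT . return $ Free (c, return ())

-- Assumed definition: suspend until somebody hands us a Maybe c.
await :: Monad m => Consumer c m (Maybe c)
await = FreeT . return $ Free return

runGen :: Monad m => Generator c m a -> m (Maybe c, Generator c m a)
runGen gen = do
  step <- runFreeT gen
  return $ case step of
    Pure a         -> (Nothing, return a)
    Free (c, rest) -> (Just c, rest)

runCon :: Monad m => Maybe c -> Consumer c m a -> m (Either a (Consumer c m a))
runCon food con = do
  step <- runFreeT con
  return $ case step of
    Pure a -> Left a
    Free f -> Right (f food)

-- Assumed definition: keep feeding Nothing until the consumer finishes.
starve :: Monad m => Consumer c m a -> m a
starve con = runCon Nothing con >>= either return starve

(>~>) :: Monad m => Generator c m () -> Consumer c m a -> m a
gen >~> con = do
  (cMay, rest) <- runGen gen
  case cMay of
    Nothing -> starve con
    Just c  -> runCon (Just c) con >>= either return (rest >~>)

counter :: Generator Int IO ()
counter = do
  lift (putStrLn "Yielding 1")
  yield 1
  lift (putStrLn "Yielding 2")
  yield 2

-- Sums two awaited values; Nothing if the generator ran dry early.
adder :: Monad m => Consumer Int m (Maybe Int)
adder = do
  a <- await
  b <- await
  return ((+) <$> a <*> b)

main :: IO ()
main = (counter >~> adder) >>= print
```

Binding the result of await to a plain variable instead of pattern matching on Just sidesteps any question of whether the monad supports pattern match failure.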
And now we can use it!
    addGen :: Generator Int IO ()
    addGen = do
      lift $ putStrLn "Yielding 1"
      yield 1
      lift $ putStrLn "Yielding 2"
      yield 2

    addCon :: Consumer Int IO ()
    addCon = do
      lift $ putStrLn "Waiting for a..."
      Just a <- await
      lift $ putStrLn "Waiting for b..."
      Just b <- await
      lift . print $ a + b

    main = addGen >~> addCon
When run this prints
    Yielding 1
    Waiting for a...
    Yielding 2
    Waiting for b...
    3
Now, this all falls out of playing with what functor we give to FreeT. So far, we’ve gotten trampolines out of Identity, generators out of (,) a, and consumers out of (->) a.

          
          



Church Representations: Part 3
Danny Gratzer — Mon, 10 Mar 2014 00:00:00 UT

    Posted on March 10, 2014
    


    
    Tags: haskell, types
    


To conclude my recent spate of posts on church representations I’d like to write one more.
My last two posts have focused on taking a random value and turning it into a nifty function that we can use in place of the value. In this post I’ll show how to invert the process and, given a function, return a value.
This requires a lot more intricate type level programming, so let’s start by turning on our slew of language extensions and importing a few libraries.
    {-# LANGUAGE TypeFamilies,          TypeOperators,     UndecidableInstances #-}
    {-# LANGUAGE MultiParamTypeClasses, FlexibleInstances, FlexibleContexts     #-}
    {-# LANGUAGE ScopedTypeVariables,   PolyKinds,         DataKinds            #-}
    import Data.Proxy
    import GHC.Generics
We’ll start by defining a few useful type families across type level lists (-XDataKinds has promoted [] for us).
    type family Head (xs :: [k]) :: k
    type instance Head (x ': xs) = x
    type family Tail (xs :: [k]) :: [k]
    type instance Tail (x ': xs) = xs

    pHead :: Proxy xs -> Proxy (Head xs)
    pHead = reproxy
    pTail :: Proxy xs -> Proxy (Tail xs)
    pTail = reproxy

    type family Append (xs :: [k]) (ys :: [k]) :: [k]
    type instance Append '[] ys = ys
    type instance Append (x ': xs) ys = x ': Append xs ys

    type family Reverse (xs :: [k]) :: [k]
    type instance Reverse '[] = '[]
    type instance Reverse (x ': xs) = Append (Reverse xs) (x ': '[])
There’s not too much of interest here, but it’s worth noting the syntax for type level lists '[] and ':. Additionally, I’ve made these functions polykinded for reasons that will become apparent momentarily.
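A cheap way to convince ourselves that these families compute what we expect is to write down a Proxy conversion that only type checks if both sides reduce to the same list. This is a sketch: reverseOk and appendOk are names I made up, and the second check uses promoted Bools to show off the poly-kindedness.

```haskell
{-# LANGUAGE DataKinds, PolyKinds, TypeFamilies, TypeOperators, UndecidableInstances #-}
import Data.Proxy (Proxy (..))

type family Append (xs :: [k]) (ys :: [k]) :: [k]
type instance Append '[]       ys = ys
type instance Append (x ': xs) ys = x ': Append xs ys

type family Reverse (xs :: [k]) :: [k]
type instance Reverse '[]       = '[]
type instance Reverse (x ': xs) = Append (Reverse xs) '[x]

-- id is only accepted if Reverse really reduces to the right-hand list.
reverseOk :: Proxy (Reverse '[Int, Bool, Char]) -> Proxy '[Char, Bool, Int]
reverseOk = id

-- The same families work on lists of promoted Bools, not just types.
appendOk :: Proxy (Append '[ 'True ] '[ 'False ]) -> Proxy '[ 'True, 'False ]
appendOk = id

main :: IO ()
main = reverseOk Proxy `seq` appendOk Proxy `seq` putStrLn "type-level checks passed"
```

If either family were wrong, this file would fail to compile rather than fail at run time.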
A High Level Outline
Since the actual procedure for this is much more complex than going to a Church, it’s useful to have a high level overview of how we plan on doing this.
The 10,000 foot idea is something like this

Take in the church representation which has the form (a -> r) -> (b -> c -> r) -> r -> r or similar
Apply the constructors of the type we’re interested in to the church representation
Let the representation do the actual work of constructing the value
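For a single known type, the three steps above are a one-liner, which is a useful picture to keep in mind before the generic machinery arrives. The sketch below does it by hand for Maybe and pairs; the type synonyms and function names are made up for illustration.

```haskell
{-# LANGUAGE RankNTypes #-}

type ChurchMaybe a  = forall r. r -> (a -> r) -> r
type ChurchPair a b = forall r. (a -> b -> r) -> r

-- Hand the real constructors to the representation and let it pick.
fromChurchMaybe :: ChurchMaybe a -> Maybe a
fromChurchMaybe c = c Nothing Just

fromChurchPair :: ChurchPair a b -> (a, b)
fromChurchPair c = c (,)

main :: IO ()
main = do
  print (fromChurchMaybe (\_nothing just -> just True))  -- prints Just True
  print (fromChurchPair (\f -> f 'a' "foo"))             -- prints ('a',"foo")
```

The rest of the post is about getting the compiler to write `c Nothing Just` for us when all we have is a Generic instance.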

Our first major hiccup is that we don’t actually have access to the constructors. Instead we’ll traverse the generic representation of our type, and produce a type level list of “breadcrumbs” (I’ll explain more shortly) that can be fed to a typeclass to yield a function which takes either a U1 p, K1 a t p, or some large combination of :*:’s and returns our value. This serves as our makeshift constructor.
However, this alone isn’t enough since we need to provide a function of type a -> b -> r not M1 foo bar (K1 baz a) :*: M1 foo bar (K1 quux a) -> r. We need something akin to the ultimate “uncurry” that’ll take in a large product type component by component and build it up. In order to do this, we’ll introduce a “shell” of a product type where each node is undefined, then as we get each new argument, we’ll fill in the corresponding undefined in the product type.
This isn’t the nicest solution, since we’re playing with fire by tossing around undefined, but it’s much simpler than the alternatives, which are to either somehow convert a tree of product types into a flat constructor, or make each field of the product type a Maybe and fill in each field step by step, changing the type of the product type as we go.
The Gory Details
Without further delay, let’s dive in! First let’s create our type level representation of the “bread-crumbs” we’ll use to represent the type’s structure
    data Traverse a = Meta a a a | InL a a | InR a a | Term a
In our use case, we’ll specialize a to be *, the kind of simple haskell types.
Now the way to think about this is as a series of directions, each element of the list being a specific instruction on how to navigate the type.

Meta represents a type wrapped in an M1 constructor so that M1 a b f p corresponds to Meta a b p
InL and InR represent going left or right in a :+: or :*:. Going left at (l :*: r) p is InL (r p) p
Term is the endpoint of our directions, it holds some leaf in the “tree” we’re considering

Now with this in mind, here’s how we can construct all possible paths in our types
    type family MakePaths v (m :: [Traverse *]) (r :: [ [Traverse *] ]) :: [[Traverse *] ]
    type instance MakePaths ((:+:) l r p) s all =
      Append (MakePaths (l p) (InL (r p) p ': s) '[])
              (Append (MakePaths (r p) (InR (l p) p ': s) '[]) all)
    type instance MakePaths (M1 a b f p) s all  =
      MakePaths (f p) (Meta a b p ': s) all
    type instance MakePaths (K1 a t p) s all    =  Reverse (Term (K1 a t p) ': s)    ': all
    type instance MakePaths (U1 p) s all        =  Reverse (Term (U1 p) ': s)        ': all
    type instance MakePaths ((:*:) l r p) s all =  Reverse (Term ((:*:) l r p) ': s) ': all
This traverses each path, maintaining a stack of everything seen so far as s and the previously completed paths as all. Similarly, we can also reconstruct the original type given a path
    type family ReconstructPath (t :: [Traverse *])
    type instance ReconstructPath (InL r p  ': rest) =
      (WithoutParam (ReconstructPath rest) :+: WithoutParam r) p
    type instance ReconstructPath (InR l p  ': rest) =
      (WithoutParam l :+: WithoutParam (ReconstructPath rest)) p
    type instance ReconstructPath (Meta a b p ': rest) =
      M1 a b (WithoutParam (ReconstructPath rest)) p
    type instance ReconstructPath (Term a     ': '[])  = a
Now it should be clear why we need to store the p and the other side of :*:s and :+:s: without them we couldn’t possibly reconstruct the type.
We need one final type family before we can start doing real work, this needs to take a path and return the type that the path leads to, in other words the Term a at the end of our list.
Extracting this is pretty mechanical
    type family PathArg (t :: [Traverse *])
    type instance PathArg (Term a     ': '[] ) = a
    type instance PathArg (Meta a b p ': rest) = PathArg rest
    type instance PathArg (InR l p    ': rest) = PathArg rest
    type instance PathArg (InL r p    ': rest) = PathArg rest
We only need to explicitly pattern match on Meta’s and others because type families don’t have the same convenient “fall through” semantics as normal Haskell functions.
Notice that we expect our list to be terminated by a Term a ': '[]. If it isn’t, that represents a bug in MakePaths and will fail at compile time.
Now we can actually start writing some transformations, first let’s define a function that takes a [Traverse *] and the corresponding PathArg and returns our type
    class GPath (p :: [Traverse *]) where
      path :: Proxy p -> PathArg p -> ReconstructPath p
    instance GPath (Term a ': '[]) where
      path _ = id
    instance ((WithoutParam (ReconstructPath rest)) p ~ ReconstructPath rest, GPath rest)
             => GPath (InR r p ': rest) where
      path p a = R1 $ path (pTail p) a
    instance ((WithoutParam (ReconstructPath rest)) p ~ ReconstructPath rest, GPath rest)
             => GPath (InL l p ': rest) where
      path p a = L1 $ path (pTail p) a
    instance ((WithoutParam (ReconstructPath rest)) p ~ ReconstructPath rest, GPath rest)
             => GPath (Meta a b p ': rest) where
      path p a = M1 $ path (pTail p) a
This code rolls out pretty similarly to how ReconstructPath works, the only difference is at the end of it all we stick the PathArg we’ve been lugging around with us into the appropriate “slot” in our type.
Now we just need to take a series of paths and somehow give them to the church representation with path to mimic constructors
    class GBuild (paths :: [[Traverse *] ]) f r where
      build :: Proxy paths -> f -> r

    -- | Unit case. This represents constructors with no arguments
    instance (ReconstructPath x ~ r, GPath x, GBuild xs f' r, PathArg x ~ U1 p)
             => GBuild (x ': xs) (r -> f') r where
      build p f = build (pTail p) $ f (path (pHead p) U1)
    instance (???) => GBuild (x ': xs) ((f -> g) -> f') r where
      build p f = build (pTail p) $ f (??? $ path (pHead p))
    instance GBuild '[] r r where
      build _ f = f
So build takes a list of paths by Proxy and the church representation f, and returns our result r. The last instance is the simplest: it represents the point where we’ve fully applied the church representation and can now just return it.
The next simplest one is the first one, where we have a constructor with no arguments so the PathArg is U1 p. Since we can trivially construct a U1 p, we just do so and give it to path ourselves, this creates an r that we can give to the church representation. Recall that for a Church a r, no argument constructors of a are simply represented with r.
Now the tricky bit, in the second instance we must fill in ??? with something that takes a product type and converts it to something that will swallow and fill in our product type step by step.
Our first step in doing this is to create a generic way of creating “empty” product types
    class GEmpty a where
      empty :: a
    instance GEmpty (U1 p) where
      empty = U1
    instance GEmpty (K1 a t p) where
      empty = K1 (error "Error! The impossible has happened.")
    instance GEmpty (f p) => GEmpty (M1 a b f p) where
      empty = M1 empty
    instance (GEmpty (l p), GEmpty (r p)) => GEmpty ((:*:) l r p) where
      empty = empty :*: empty
It’s pretty clear how this works. Everything except K1 is filled in with empty, and K1 is filled in with something equivalent to undefined.
Now we can write a type family to give us the paths to each K1 in a product type
    type family MakeProdPaths v (m :: [Traverse *]) (r :: [ [Traverse *] ]) :: [[Traverse *] ]
    type instance MakeProdPaths (K1 a t p) s all    = Reverse (Term (K1 a t p) ': s) ': all
    type instance MakeProdPaths (M1 a b f p) s all  = MakeProdPaths (f p) (Meta a b p ': s) all
    type instance MakeProdPaths ((:*:) l r p) s all =
      Append (MakeProdPaths (l p) (InL (r p) p ': s) '[])
              (Append (MakeProdPaths (r p) (InR (l p) p ': s) '[]) all)
This should strike the reader as very similar to MakePaths because it is. In fact the only real difference is that we only accept K1’s in nodes and that we’re branching on :*:’s.
Just like MakePaths, we’ll want a corresponding type class to take a path and a value and fill in the corresponding bit of our structure.
    class GUpdate (path :: [Traverse *]) a where
      update :: Proxy path -> a -> PathArg path -> a
    instance GUpdate (Term (K1 a t p) ': '[]) (K1 a t p) where
      update _ _ a = a
    instance GUpdate rest (l p) => GUpdate (InL (r p) p ': rest) ((:*:) l r p) where
      update p (l :*: r) a = update (pTail p) l a :*: r
    instance GUpdate rest (r p) => GUpdate (InR (l p) p ': rest) ((:*:) l r p) where
      update p (l :*: r) a = l :*: update (pTail p) r a
    instance GUpdate rest (f p) => GUpdate (Meta a b p ': rest) (M1 a b f p) where
      update p (M1 f) a = M1 (update (pTail p) f a)
Unlike GPath, we take in a structure and update it instead of creating an entirely new one. Notice that we’re careful to never be strict in any leaves of the structure since we intend to use this with GEmpty and all those leaves will blow up if poked.
Now for the really clever bit, we can use update to create a function which will take in an argument and the corresponding path to fill it in in the structure.
    type family Fill (paths :: [[Traverse *] ]) r
    type instance Fill (x ': xs) r = StripK (PathArg x) -> Fill xs r
    type instance Fill '[] r = r

    class GFill (paths :: [[Traverse *] ]) a where
      fill :: Proxy paths -> (a -> r) -> a -> Fill paths r
    instance GFill '[] a where
      fill _ f a = f a
    instance (PathArg x ~ K1 m t p, StripK (PathArg x) ~ t, GUpdate x a, GFill xs a) =>
             GFill (x ': xs) a where
      fill p f a = \x -> fill (pTail p) f $ update (pHead p) a (K1 x)
Fill represents the type of these functions. While most of this code is as one would expect, notice that we take a continuation a -> r, drag it through fill, and finally apply it once we’ve filled in all the leaves. There is a good reason for this: we intend to use this with path, but we can’t compose them since fill takes a varying number of arguments. Instead, we opt for a bit of continuation passing style, throw path into fill, and let fill call it where appropriate.
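The continuation trick is easier to see with the generics stripped away. Below is a simplified stand-in, not the post’s GFill: a class Collect (a name I made up) that takes as many arguments as a type level list says, accumulates them as strings, and only then calls the continuation. I’ve used a closed type family for Fill here purely for brevity.

```haskell
{-# LANGUAGE DataKinds, KindSignatures, ScopedTypeVariables, TypeFamilies, TypeOperators #-}
import Data.Kind  (Type)
import Data.Proxy (Proxy (..))

-- One function arrow per element of the list, ending in r.
type family Fill (args :: [Type]) r where
  Fill '[]       r = r
  Fill (x ': xs) r = x -> Fill xs r

class Collect (args :: [Type]) where
  collect :: Proxy args -> ([String] -> r) -> [String] -> Fill args r

-- No arguments left: finally call the continuation.
instance Collect '[] where
  collect _ k acc = k (reverse acc)

-- Swallow one argument, remember it, and keep carrying k along.
instance (Show x, Collect xs) => Collect (x ': xs) where
  collect _ k acc = \x -> collect (Proxy :: Proxy xs) k (show x : acc)

main :: IO ()
main = putStrLn (collect (Proxy :: Proxy '[Int, Bool, Char]) unwords [] 1 True 'x')
-- prints: 1 True 'x'
```

Because the number of arguments varies with the list, there is no single function we could compose `unwords` with; handing it in as a continuation is what lets every instance share one type.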
Now we can fill in that last bit of GBuild
    instance ((f -> g) ~ Fill (MakeProdPaths (PathArg x) '[] '[]) r,
              ReconstructPath x ~ r, GEmpty (PathArg x), GBuild xs f' r,
              (GFill (MakeProdPaths (PathArg x) '[] '[]) (PathArg x)),
              GPath x)
             => GBuild (x ': xs) ((f -> g) -> f') r where
      build p f = build (pTail p) $ f (fill (prod p) (path (pHead p)) empty)
        where prod :: forall xs. Proxy xs -> Proxy (MakeProdPaths (PathArg (Head xs)) '[] '[])
              prod _ = Proxy
And now we’re almost done! We can wrap all of this up into one function
    fromChurch :: forall a. (Generic a,
                       GBuild (MakePaths (Rep a ()) '[] '[])
                              (Church a (Rep a ()))
                              (Rep a ()))
                  => Church a (Rep a ()) -> a
    fromChurch c = to $ (build p c :: Rep a ())
      where p :: Proxy (MakePaths (Rep a ()) '[] '[])
            p = Proxy
And that’s it! Automatic reconstruction of types from church representations! To demonstrate
> fromChurch (\nothing just -> just True) :: Maybe Bool
   Just True
> fromChurch (\f -> f 'a' "foo") :: (Char, String)
   ('a', "foo")
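To see what fromChurch is automating, here is a hand-written sketch of the same conversion for the two types in the demo. The names fromChurchMaybe and fromChurchPair are made up for illustration and are not part of generic-church; each one just feeds the real constructors to the Church-encoded value.

```haskell
-- Hand-rolled analogues of fromChurch for two concrete types.
-- These names are hypothetical; the generic version derives this plumbing.

-- A Church-encoded Maybe takes one eliminator per constructor,
-- so we hand it Nothing and Just.
fromChurchMaybe :: (Maybe a -> (a -> Maybe a) -> Maybe a) -> Maybe a
fromChurchMaybe c = c Nothing Just

-- A Church-encoded pair takes a single two-argument eliminator,
-- so we hand it the tuple constructor.
fromChurchPair :: ((a -> b -> (a, b)) -> (a, b)) -> (a, b)
fromChurchPair c = c (,)
```

Loading this into GHCi and running fromChurchMaybe (\nothing just -> just True) should mirror the first demo line above.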
Thanks for reading through this (rather dense) series of posts! Most of the code can be found bundled into a package: generic-church.

          
          



Church Representations: Part 2
Danny Gratzer — Fri, 07 Mar 2014 00:00:00 UT

Posted on March 7, 2014

Tags: haskell, types

In the last post, we discussed some of the type families needed to transform a type equipped with a Generic instance into a church representation.
In this post, we’ll go over the type class prolog needed to actually mechanically transform values between these types.
To start with, we’ll need a bit of boilerplate. Here are our language extensions and imports.
    {-# LANGUAGE TypeFamilies,          TypeOperators,     UndecidableInstances #-}
    {-# LANGUAGE MultiParamTypeClasses, FlexibleInstances, FlexibleContexts     #-}
    {-# LANGUAGE ScopedTypeVariables,   RankNTypes                              #-}
    import GHC.Generics
    import Data.Proxy
Now our first order of business is to write a function to take a GHC.Generics value and transform it into the corresponding value occupying the type returned by StripMeta.
To do this we rely on type classes
    class GStripMeta a where
      stripMeta :: a -> StripMeta a
    instance GStripMeta (f p) => GStripMeta (M1 a b f p) where
      stripMeta (M1 f) = stripMeta f
    instance GStripMeta (K1 a t p) where
      stripMeta = id
    instance GStripMeta (U1 p) where
      stripMeta = id
    instance (GStripMeta (l p), GStripMeta (r p),
              (WithoutParam (StripMeta (l p))) p ~ StripMeta (l p),
              (WithoutParam (StripMeta (r p))) p ~ StripMeta (r p)) =>
             GStripMeta ((:*:) l r p) where
      stripMeta (l :*: r) = stripMeta l :*: stripMeta r
    instance (GStripMeta (l p), GStripMeta (r p),
              (WithoutParam (StripMeta (l p))) p ~ StripMeta (l p),
              (WithoutParam (StripMeta (r p))) p ~ StripMeta (r p)) =>
             GStripMeta ((:+:) l r p) where
      stripMeta (L1 l) = L1 $ stripMeta l
      stripMeta (R1 r) = R1 $ stripMeta r
This does some type class prolog to traverse a GHC.Generics value and systematically throw out the M1 annotations. An important technique that I make heavy use of is ~ equality constraints. These let us assert that two types are equal, and the type checker will attempt to verify this once it has selected the appropriate instance. Other than that there’s not too much of interest here so let’s move on to something with more substance.
Our ToList type family requires quite a bit of work to automagically implement with a type class. The base cases are pretty straightforward
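As a small standalone illustration of ~ constraints (separate from the GStripMeta code), here is a sketch with a toy type family. Elem and describe are hypothetical names invented for this example; the point is only that the equality is checked at each call site, once the concrete type is known.

```haskell
{-# LANGUAGE TypeFamilies #-}

-- A toy type family (hypothetical, standing in for something like StripMeta).
type family Elem c
type instance Elem [a]       = a
type instance Elem (Maybe a) = a

-- Only accepts containers whose element type reduces to Char.
-- The `Elem c ~ Char` constraint is verified wherever describe is used.
describe :: (Elem c ~ Char, Show c) => c -> String
describe c = "chars: " ++ show c
```

With this, describe "abc" and describe (Just 'x') type check, while describe [True] is rejected because Elem [Bool] reduces to Bool rather than Char.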
    class GList a r where
      toList :: Maybe a -> r -> ToList a r
    instance (WithoutParam r) p ~ r => GList (U1 p) r where
      toList Nothing  r = R1 r
      toList (Just a) _ = L1 a
    instance (WithoutParam r) p ~ r => GList (K1 a t p) r where
      toList Nothing  r = R1 r
      toList (Just a) _ = L1 a
    instance (WithoutParam r) p ~ r => GList ((l :*: r') p) r where
      toList Nothing  r = R1 r
      toList (Just a) _ = L1 a
The trick here is that if we get a Nothing, it means that somewhere in the process of building the list we’ve already found the state our sum type is in, and all we do is pass things off to R1. Otherwise, we shove the value we receive into L1.
Now the tricky bit is the :+: instance which must walk along the spine of our tree flattening things as it goes.
    instance (GList (l p) (ToList (r' p) r), GList (r' p) r) =>
             GList ((l :+: r') p) r where
      toList (Just sum@(L1 l)) r = toList (Just l) (toList (rNot sum) r)
        where rNot :: forall l r p. (l :+: r) p -> Maybe (r p)
              rNot _ = Nothing
      toList (Just sum@(R1 r')) r = toList (lNot sum) (toList (Just r') r)
        where lNot :: forall l r p. (l :+: r) p -> Maybe (l p)
              lNot _ = Nothing
      toList m r = toList (lNot m) (toList (rNot m) r)
        where lNot :: forall l r p. Maybe ((:+:) l r p) -> Maybe (l p)
              lNot _ = Nothing
              rNot :: forall l r p. Maybe ((:+:) l r p) -> Maybe (r p)
              rNot _ = Nothing
We’re just plugging the appropriate l and r into toList l (toList r rest). If we have Just (L1 l) then we put in Nothing for r, similarly for Just (R1 r), and if we have Nothing then both are filled in as Nothings.
Notice that we have to jump through a few hoops using lNot and rNot. Otherwise GHC will complain that type families aren’t injective and it’s not sure how to handle Nothing :: Maybe a. However, a bit of explicit hand holding takes care of this.
For our next bit of type class hacking we need to actually add one type family that I forgot in our last post.
    type family ToListProd v rest
    type instance ToListProd ((:*:) l r' p) r = ToListProd (l p) (ToListProd (r' p) r)
    type instance ToListProd (K1 a t p)     r = (K1 a t     :*: WithoutParam r) p
    type instance ToListProd (U1 p)         r = U1 p -- since U1 is never in `:*:`'s.
This is isomorphic to ToList but instead of restructuring :+:’s, it moves around :*:’s. The corresponding type class for this is almost identical to GList
    class GListProd a r where
      toListProd :: a -> r -> ToListProd a r
    instance (WithoutParam r) p ~ r => GListProd (U1 p) r where
      toListProd = const -- Throw away the rest which must be ListTerm
    instance (WithoutParam r) p ~ r => GListProd (K1 a t p) r where
      toListProd = (:*:)
    instance (GListProd (l p) (ToListProd (r' p) r), GListProd (r' p) r) =>
             GListProd ((:*:) l r' p) r where
      toListProd (l :*: r) rest = toListProd l (toListProd r rest)
The only notable difference here is that we don’t have a Maybe a, since with products both sides are present. This makes the whole thing much simpler.
Now we’re ready to proceed to the actual transformation type classes.
    class GChurchProd a where
      prod :: Proxy r -> a -> ChurchProd a r -> r -- Proxy needed for GChurchSum
    instance GChurchProd (U1 p) where
      prod _ _ f = f
    instance GChurchProd (K1 a t p) where
      prod _ (K1 r) f = f r
    instance GChurchProd (r p) => GChurchProd ((:*:) (K1 a t) r p) where
      prod p (K1 l :*: r) f = prod p r (f l)

    class Swallow a where
      swallow :: Proxy a -> c -> ChurchSum a c
    instance Swallow (ListTerm p) where
      swallow _ c = c
    instance Swallow (r p) => Swallow ((:+:) l r p) where
      swallow p c = \_ -> swallow (right p) c
        where right :: forall l r p. Proxy ((:+:) l r p) -> Proxy (r p)
              right _ = Proxy
In GChurchProd we take a value and the corresponding product eliminator and eliminate it. Believe it or not this is essentially the workhorse of this entire library. We’re threading a Proxy r through there which will come in handy when we start to use this in GChurchSum.
Swallow is a bit odd. It represents a situation where we’ve already used prod to produce our result and now need to eat the rest of the supplied arguments. Note again the use of Proxy to keep track of types, it’s a useful little library!
Now, finally, GChurchSum
    class GChurchSum a r where
      elim :: Proxy r -> a -> ChurchSum a r -- Proxy because type inference is stubborn

    instance (GListProd (l p) (ListTerm ()), GChurchProd (ToListProd (l p) (ListTerm ())),
              GChurchSum (r' p) r, Swallow (r' p)) =>
             GChurchSum ((:+:) l r' p) r where
      elim p sum@(L1 l) = \f ->
        swallow (right sum) (prod p (toListProd l (ListTerm :: ListTerm ())) f)
        where right :: forall l r p. (:+:) l r p -> Proxy (r p)
              right _ = Proxy
      elim p (R1 r) = \_ -> elim p r
    instance GChurchSum (ListTerm p) r where
      elim _ _ = error "Malformed generic instance"
Now the elim instance for ListTerm can never be called. This is guaranteed by the definition of toList: since a value must occupy one state prior to ToList, we’ll never end up with ListTerm.
Otherwise, if we get an L1 then we’re at the actual value our sum type is in, so we produce an r and swallow the rest of our arguments. Notice that this is where we actually transform a leaf into a ToListProd, and this is reflected in our constraints. If we get an R1 instead, we ignore the irrelevant eliminator and recurse!
To put it all together
    from' :: Generic a => a -> Rep a ()
    from' = from

    toChurch :: forall a r.
                (Generic a, GStripMeta (Rep a ()),
                 GList (StripMeta (Rep a ())) (ListTerm ()),
                 GChurchSum (ToList (StripMeta (Rep a ())) (ListTerm ())) r) =>
                a -> Church a r
    toChurch = elim p . flip toList (ListTerm :: ListTerm ()) . Just . stripMeta . from'
      where p = Proxy :: Proxy r
And we’re done! Using this we can reify a type to its church representation. We can mostly ignore those scary looking constraints on toChurch since they should be true by construction.
As a demo
> toChurch [1, 2, 3] True (\_ _ ->  False)
False
> toChurch [] True (\_ _ ->  False)
True
The True corresponds to the [] list case and the function represents (:). The current True (\_ _ -> False) actually computes null.
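For comparison, here is roughly what the derived eliminator amounts to for lists if we write it by hand. This is only a sketch: churchList and isNull are hypothetical names, and the real toChurch goes through the generic machinery rather than pattern matching directly.

```haskell
-- Hand-written Church eliminator for lists: one argument for the []
-- case and one for the (:) case. The name churchList is hypothetical.
churchList :: [a] -> r -> (a -> [a] -> r) -> r
churchList []       nil _    = nil
churchList (x : xs) _   cons = cons x xs

-- The same pair of eliminators as in the demo, which computes null.
isNull :: [a] -> Bool
isNull xs = churchList xs True (\_ _ -> False)
```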
Edit, added GListProd

          
          



Church Representations
Danny Gratzer — Thu, 06 Mar 2014 00:00:00 UT

Posted on March 6, 2014

Tags: haskell, types

A project I’ve been playing with lately is generalizing maybe. Now on the surface that sounds well.. boring. But there are actually some interesting concepts buried in here.
Lambda The Almighty
Let’s start by specifying what I mean when I say “generalize”. When we look at maybe, the type gives us a pretty strong clue on how to implement it
    maybe :: b -> (a -> b) -> Maybe a -> b
    maybe nothingCase justCase Nothing  = nothingCase
    maybe nothingCase justCase (Just a) = justCase a
Each of the first two arguments corresponds to a different case of the sum type Maybe.
There’s actually a name for this idea of encoding a data type within functions: Church encoding.
Let’s rattle off some examples:
Tuples
    {-# LANGUAGE RankNTypes #-}
    import Prelude hiding (fst, snd) -- we define our own versions below
    -- We'll call a tuple a Product because it
    -- looks like the Cartesian *product* of two types
    type Product a b = forall c. (a -> b -> c) -> c
    pair :: a -> b -> Product a b
    pair a b = \destruct -> destruct a b

    -- We can easily make fst and snd
    fst :: Product a b -> a
    fst p = p $ \a b -> a
    snd :: Product a b -> b
    snd p = p $ \a b -> b
We can do Either as well
    -- We'll call Either a "sum" because it
    -- looks like the disjoint union (sum) of two types
    type Sum a b = forall c. (a -> c) -> (b -> c) -> c
    inl :: a -> Sum a b
    inl a = \l r -> l a

    inr :: b -> Sum a b
    inr b = \l r -> r b
Look familiar? That’s just Data.Either.either! Now if we squint at these we can imagine building up more complex types from these building blocks
    data AFew = NoVal | OneVal Int | ThrVal Int Bool String

    type ChurchAFew =
      Sum () (Sum Int (Product Int (Product Bool String)))

    noVal :: ChurchAFew
    noVal = inl ()

    oneVal :: Int -> ChurchAFew
    oneVal i = inr (inl i)

    thrVal :: Int -> Bool -> String -> ChurchAFew
    thrVal i b s = inr . inr . pair i $ pair b s
And now pattern matching more or less falls out for free from ChurchAFew. Since it’s a function we transform
    case foo of
      NoVal        -> ...
      OneVal i     -> ...
      ThrVal i b s -> ...
    -- becomes
    foo (\() -> ...) (\p -> p (\i -> ...) (\p1 -> p1 (\i p2 -> ...)))
Such is the power of lambda! And there are all sorts of fun side effects of pattern matching being a function, most of it to do with nicer composition with point-free functions.
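To make the connection between the Sum encoding and Data.Either concrete, here is a sketch that converts in both directions. One assumption to flag: I wrap the encoding in a newtype here, which is not what the snippets above do, because nesting the bare type synonyms tends to require impredicative instantiation. The names runSum, toEither and fromEither are made up for this example.

```haskell
{-# LANGUAGE RankNTypes #-}

-- Church-encoded sum, wrapped in a newtype (an assumption of this
-- sketch) so that it can be nested and passed around freely.
newtype Sum a b = Sum { runSum :: forall c. (a -> c) -> (b -> c) -> c }

inl :: a -> Sum a b
inl a = Sum (\l _ -> l a)

inr :: b -> Sum a b
inr b = Sum (\_ r -> r b)

-- Eliminating with the real constructors recovers an Either...
toEither :: Sum a b -> Either a b
toEither s = runSum s Left Right

-- ...and `either` is exactly the eliminator going the other way.
fromEither :: Either a b -> Sum a b
fromEither e = Sum (\l r -> either l r e)
```

The round trip toEither . fromEither is the identity, which is one way of saying the two representations carry the same information.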
But It’s Boring!
There’s a drawback here, boilerplate! We have to essentially duplicate all our data declarations, extra boilerplate for generating accessors, and then two functions to map back and forth between our representations.
As a programmer I’m far too lazy to write all that!
Whenever we start to think in terms of sum and product types it’s time to turn to GHC.Generics. It’s a library that provides a type class to reify our normal types into explicit sums and products and back again.
The first thing we have to do is write a type level function to reify a GHC.Generics representation to the appropriate type.
For example Maybe Int has the following representation
    M1
        D
        GHC.Generics.D1Maybe
        (M1 C GHC.Generics.C1_0Maybe U1
         :+: M1 C GHC.Generics.C1_1Maybe (M1 S NoSelector (K1 R Int)))
We can strip out all the M1 meta information since we don’t really care leaving
    U1 :+: K1 R Int
Not so bad! Let’s start by writing a type level function (type family) to get rid of the M1 constructors
    {-# LANGUAGE TypeFamilies, TypeOperators, UndecidableInstances, RankNTypes #-}

    import GHC.Generics

    -- | Remove the extra `p` parameter that GHC.Generics
    -- lugs through every constructor
    type family WithoutParam v :: * -> *
    type instance WithoutParam ((:+:) l r p) = l :+: r
    type instance WithoutParam ((:*:) l r p) = l :*: r
    type instance WithoutParam (U1 p)        = U1
    type instance WithoutParam (K1 a t p)    = K1 a t

    -- | Strip out `M1` tags
    type family StripMeta v :: *
    type instance StripMeta (M1 a b f p)  = StripMeta (f p)
    type instance StripMeta (K1 a t p)    = K1 a t p
    type instance StripMeta ((:+:) l r p) =
      (:+:) (WithoutParam (StripMeta (l p))) (WithoutParam (StripMeta (r p))) p
    type instance StripMeta ((:*:) l r p) =
      (:*:) (WithoutParam (StripMeta (l p))) (WithoutParam (StripMeta (r p))) p
    type instance StripMeta (U1 p)        = U1 p
As we can see these type families are well.. pretty terrible. But they work! Next we can actually do some real work. We need to take a product type with one or more members and turn it into a function.
    type family ChurchProd v c :: *
    type instance ChurchProd (K1 a t p) c    = t -> c
    type instance ChurchProd (U1 p)     c    = () -> c
    type instance ChurchProd ((:*:) l r p) c =
      ChurchProd (l p) (ChurchProd (r p) c)
So here we have a type family with two parameters, the term and the “out” type. These take a :*: b :*: c to a -> b -> c -> .... This is important because GHC.Generics represents things like a list, where the (:) equivalent is :+: and each leaf is a product or unit type.
Now at least we can run
> :kind! ChurchProd (StripMeta (Rep (Int, Bool) ())) Char
ChurchProd (StripMeta (Rep (Int, Bool) ())) Char :: *
= Int -> Bool -> Char
As it happens, I told a small fib: GHC.Generics doesn’t quite make a list. In fact it makes a tree! We can rejigger things to a list though
    data ListTerm p -- The list terminator

    type family ToList v rest :: *
    type instance ToList ((:+:) l r' p) r = ToList (l p) (ToList (r' p) r)
    type instance ToList (K1 a t p)     r = (K1 a t     :+: WithoutParam r) p
    type instance ToList ((:*:) l r' p) r = ((l :*: r') :+: WithoutParam r) p
    type instance ToList (U1 p)         r = (U1         :+: WithoutParam r) p
Now the final piece, we need to write a function which “folds” over a tree of :+: and produces a function (a -> c) -> (b -> c) -> ... -> c
    type family ChurchSum v c :: *
    type instance ChurchSum ((:+:) l r p) c = ChurchProd (l p) c -> ChurchSum (r p) c
    type instance ChurchSum (ListTerm p) c = c


    -- A driver type for the whole thing
    type Church t = forall c. ChurchSum (ToList (StripMeta (Rep t ())) (ListTerm ())) c
And there we go! As a quick test
    {-# LANGUAGE DeriveGeneric #-}
    data AFew = S1 Int Int Int | S2 Bool Char | S3 String | S4
              deriving Generic
And now
> :kind! Church AFew
(Int -> Int -> Int -> c)
  -> (Bool -> Char -> c)
  -> ([Char] -> c)
  -> (() -> c)
  -> c
Tada! Now we’ve automated the generation of types for church representations. In the next post we’ll actually go about populating them.

          
          



Types and Kinds and Sorts, Oh My!
Danny Gratzer — Mon, 10 Feb 2014 00:00:00 UT

Posted on February 10, 2014

Tags: haskell, types

One subject that most introductory Haskell books fail to address is kinds. This means that when most intermediate Haskellers start looking at Haskell extensions they’re flummoxed by DataKinds.
This post aims to introduce intermediate haskellers to kinds and sorts as well as how they enter into the world of Haskell programming.
Think about an expression in Haskell, if it’s well formed, then we can assign it a type: 1 :: Int, "foo" :: String, and Just () :: Maybe () for example.
Now this has all sorts of lovely benefits like ensuring we can’t write insane expressions like "foo" + 2. However, none of those benefits seem to extend to the types themselves.
We have functions at the type level, consider
    data Cons a b = Pair a b
Cons looks like the type level equivalent of Pair. But how do we ensure that these actually work? What if we wrote
    foo :: Cons Maybe Either
    foo = ???
This makes no sense, however: there is no value whose type is Maybe.
This hints that we want something corresponding to a type system at the type level. Something to ensure that all the types we write make some sort of sense. For example, let’s write * for the “type” of the types that values occupy. So Int :: *, as well as String, [a], and many others. Now it seems that Cons takes in two types, a :: * and b :: *, and returns another type, Cons a b :: *. This is notated * -> * -> *.
Now let’s rattle off some other examples
    Maybe  :: * -> *      -- Maybe takes a type and returns another
    Either :: * -> * -> * -- Either takes two types and returns another
    StateT :: * -> (* -> *) -> * -> *
Notice that there’s something interesting about StateT: its second argument is itself a function from types to types. It’s a type level parallel of a higher order function!
This is what kinds are all about, kinds are the type of types! Indeed, Haskell even notates the kind of types that values occupy as * as well. We can read something like Int :: * as Int has the kind *.
StateT is what’s called a higher kinded type, it’s the type level version of higher order functions.
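Here is a small sketch of a higher kinded type with its kinds written out. It assumes a reasonably recent GHC, where the kind this post writes as * is also available as Type from Data.Kind; Wrap, unwrap and okay are hypothetical names for this example.

```haskell
{-# LANGUAGE KindSignatures #-}
import Data.Kind (Type)

-- Wrap takes a type constructor f and a type a, so its kind is
-- (Type -> Type) -> Type -> Type. Type is the same kind as *.
newtype Wrap (f :: Type -> Type) (a :: Type) = Wrap (f a)

unwrap :: Wrap f a -> f a
unwrap (Wrap x) = x

-- Well kinded: Maybe :: Type -> Type and Int :: Type.
okay :: Wrap Maybe Int
okay = Wrap (Just 3)

-- Uncommenting this gives a kind error, since Int :: Type
-- where something of kind Type -> Type is expected:
-- bad :: Wrap Int Maybe
-- bad = undefined
```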
Now the next step is to ask, what’s the “type” of a kind? * :: ??? The answer is, a sort. However, Haskell doesn’t talk of sorts and conceptually only has one, BOX. It is occasionally helpful to think as if BOX existed, but we can’t actually state this in Haskell. This means that while the sorts conceptually exist, we can’t really do much of anything with them. Perhaps in the future Haskell will grow a few more extensions to enable talking about sorts, until then though, we’ll focus on kinds.
Now back to kinds, what does Haskell have in terms of a kind system?

By default, we have two kind constructors, (->) and *.
With -XKindSignatures we can actually utter kinds, eg a :: *.
With -XDataKinds we can define our own kinds just like we can with types.
With -XTypeFamilies we can write type level functions.
With -XPolyKinds we have parametric polymorphism at the kind level.

Data Kinds
The motivation for DataKinds is that Haskell’s vanilla kind system is well… boring. It doesn’t really help us since we can’t write our own kinds and types to occupy these kinds.
Let’s take a simple example using GADTs and red-black binary trees. For the sake of brevity I’ll leave the reader to take a moment and learn about GADTs if necessary.
    {-# LANGUAGE KindSignatures, GADTs, EmptyDataDecls #-}
    {- No DataKinds -}

    data Black -- This is what EmptyDataDecls allows,
    data Red   -- types with no constructors

    data Tree :: * -> * -> * where
      Leaf  :: Tree a Black
      NodeR :: a -> Tree a Black -> Tree a Black -> Tree a Red
      NodeB :: a -> Tree a c     -> Tree a c'    -> Tree a Black
Here we’re attempting to model the fact that in a red-black binary tree, a red node has black children and a black node has either red or black children.
However this doesn’t model it correctly, we’d like to make it impossible to state nonsense like
    crazy :: Tree a Int
    crazy = undefined
The problem here is that the kind of Tree is * -> * -> *. We don’t really mean that a tree can be colored by any type of kind *! We really want to limit it so that we can only color a tree with Red and Black.
Enter DataKinds
    {-# LANGUAGE KindSignatures, DataKinds, GADTs #-}

    data Color = Red | Black

    data Tree :: * -> Color -> * where
      Leaf  :: Tree a Black
      NodeR :: a -> Tree a Black -> Tree a Black -> Tree a Red
      NodeB :: a -> Tree a c     -> Tree a c'    -> Tree a Black
Now if we attempted
    foo :: Tree a Int
    foo = undefined
We’ll get a kind error! This is a simple example of how we can leverage the kind system to rule out illegal programs.
Let’s attempt to encode a more complex property, that there are exactly the same number of black nodes below every node. To start we’ll need a type level encoding of numbers
    data Nat = Z | S Nat
These are called Peano numbers, Z is zero and S is equivalent to +1, so 2 is S (S Z). Now we can integrate these into our tree.
    {-# LANGUAGE KindSignatures, DataKinds, GADTs #-}
    data Tree :: * -> Color -> Nat -> * where
      Leaf  :: Tree a Black Z
      NodeR :: a -> Tree a Black n -> Tree a Black n -> Tree a Red   n
      NodeB :: a -> Tree a c n     -> Tree a c' n    -> Tree a Black (S n)
Taking a moment to examine this, we see that a Leaf has 0 black nodes below it, which makes perfect sense. Then each node takes two trees of identical black height and either adds one to that height if the node is black or leaves it the same if the node is red.
Now if we attempted to create an unbalanced tree
    unbalanced = NodeB () Leaf (NodeB () Leaf Leaf)
We get a type error!
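Gathering the pieces into one file gives something like the sketch below. A few things here are my own assumptions rather than part of the snippets above: I spell * as Type from Data.Kind, I tick the promoted constructors to keep newer GHCs quiet, and blackHeight and balanced are hypothetical additions to show the index in use.

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}
import Data.Kind (Type)

data Color = Red | Black
data Nat   = Z | S Nat

-- A tree indexed by its element type, root color, and black height.
data Tree :: Type -> Color -> Nat -> Type where
  Leaf  :: Tree a 'Black 'Z
  NodeR :: a -> Tree a 'Black n -> Tree a 'Black n -> Tree a 'Red n
  NodeB :: a -> Tree a c n      -> Tree a c' n     -> Tree a 'Black ('S n)

-- Value level black height. Looking only at the left subtree is
-- enough, because the types force both subtrees to agree.
blackHeight :: Tree a c n -> Int
blackHeight Leaf          = 0
blackHeight (NodeR _ l _) = blackHeight l
blackHeight (NodeB _ l _) = 1 + blackHeight l

-- Both children have black height one, so this type checks.
balanced :: Tree () 'Black ('S ('S 'Z))
balanced = NodeB () (NodeB () Leaf Leaf) (NodeB () Leaf Leaf)
```

Swapping one of the inner NodeB subtrees for a bare Leaf reproduces the unbalanced example, and GHC rejects it because Z and S Z don’t unify.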
Hopefully this clears up what kinds are and how we can leverage them to statically check some properties of our programs.
For the curious reader I encourage you to look at PolyKinds and TypeFamilies, these let you express some very sophisticated programs at the type level in Haskell. If this really tickles your fancy, perhaps make the leap to Agda, Idris, or Coq to enjoy full dependent types.
Thanks to GlenH7 and JimmyHoffa on thewhiteboard and byorgey on #haskell for proof reading

          
          



Optimizing a Trie
Danny Gratzer — Tue, 28 Jan 2014 00:00:00 UT

Posted on January 28, 2014

Tags: haskell

The other day I stumbled across some old code for a trie. Being the hardworking and responsible student that I am, I immediately dropped all my homework to try and squeeze some performance out of it.
The code I ended up with was something like
    {-# LANGUAGE ViewPatterns, OverloadedStrings #-}
    import Data.List
    import Data.Maybe
    import qualified Data.Map.Strict as M
    import Data.Map.Strict ((!))
    import qualified Data.ByteString.Char8 as B

    newtype Trie = Node {subs :: M.Map Char Trie}
      deriving (Eq, Show, Ord)

    add :: B.ByteString -> Trie -> Trie
    add (B.uncons -> Just (c, s)) (Node subs) = Node
                                               . with
                                               . maybe (Node M.empty) id
                                               $ M.lookup c subs
      where with next = M.insert c (add s next) subs

    add _ trie = trie

    build :: [B.ByteString] -> Trie
    build = foldl' (flip add) (Node M.empty)

    contains :: B.ByteString -> Trie -> Bool
    contains (B.uncons -> Just (c, s)) (Node subs) = M.member c subs && contains s next
      where next = subs ! c
    contains _ _ = True
and so I decided to test it with a simple main
    main = do
      trie <- (build . B.lines) `fmap` B.readFile "/usr/share/dict/words"
      print $ contains "zebra" trie
and compiled it with the standard optimization flags
> ghc -O2 -fllvm trie.hs
> time ./trie
True

real 0m0.964s
user 0m0.896s
sys  0m0.059s
So pretty reasonably fast for about 500k words. Then to see if I could kick this in the teeth by forcing the entire trie I ran
    main = do
      words <- B.lines `fmap` B.readFile "/usr/share/dict/words"
      print $ all (flip contains $ build words) words
And again I ran it
> ghc -O2 -fllvm trie.hs
> time ./trie
True

real 0m1.557s
user 0m1.427s
sys  0m0.119s
Now this is weird. It seems that building the trie is such a huge bottleneck that building it takes around .9 seconds while querying it 500k times takes only .6 seconds.
Now the next logical step was to see if maybe this was due to a build up of garbage that was dominating my time. A quick run with +RTS -s tells me that about 36% of my time is GC on both. This isn’t shocking since I’m building up a ton of data over a relatively short period of time.
So it’s not GC, and both operations are obviously O(n) (remembering that each map has 256 or fewer entries). So the only difference between them is that add allocates memory, a new map, and contains doesn’t. So let’s confirm our suspicions with a quick round of profiling.
 > ghc -O2 -fllvm -prof -auto-all trie.hs
 > ./trie
generates trie.prof with the profiling information for our runs.
They both inform us that about 97% of time and memory is spent in add. Now this seems odd, since our time difference between the runs was significant. I hypothesize that the reason for this is that when we make 500k queries we’re forcing the entire trie, which is lazy. Indeed, looking at the output from +RTS -s confirms that the 500k query run allocates 1.69 mb of memory while only one query allocates 80 mb of memory.
So the bottleneck here is that allocation is slow as a dog. More to follow on optimizing this.

          
          



Faking Existentials with Rank N Types
Danny Gratzer — Tue, 14 Jan 2014 00:00:00 UT

Posted on January 14, 2014

Tags: haskell, types

GHC has a language extension called ExistentialQuantification. This lets users write “existential” types. There are lots of good explanations of what an existential type is, but to briefly summarize: an existential type allows the callee to choose the type versus the caller.
As an example, if we have
    {-# LANGUAGE ExistentialQuantification #-}
    data NotExist a = NotExist a
    data Exists = forall a. Show a => Exists a

    normal :: Show a => NotExist a
    normal = NotExist undefined

    exists :: Exists
    exists = Exists "The callee chose this"
With normal the caller gets to choose what a is and so we have to use undefined, otherwise we’d have to construct an arbitrary Show a => a.
With exists, the callee gets to choose what is boxed up in Exists, the caller can’t control anything about it.
Now, in logic existentials correspond to propositions like ∃ x∈ℕ. x > 0 ∧ x < 2 or in English, “There exists an x such that x > 0 and x < 2”. Normal Haskell types like NotExist correspond more to propositions like ∀ x∈ℕ. x < x + 1.
Interestingly, we can actually define ∃ in terms of ∀.
∃ x∈A. P(x) = ∀ Q. (∀ c∈A. P(c) → Q) → Q
In English, the proposition that “There exists an x in A so that P(x)” is equivalent to “For all propositions Q, if for all c, P(c) implies Q, then Q.”
This can be translated to Haskell!
We’ll need to enable rank n types since our definition for ∃ uses nested ∀s. Additionally, we’ll use constraint kinds and impredicative polymorphism since we’ll want to pass around typeclass constraints and store polymorphic values in lists.
    {-# LANGUAGE RankNTypes, ConstraintKinds #-}
    {-# LANGUAGE KindSignatures, ImpredicativeTypes #-}
    import GHC.Prim (Constraint)

    type Exists c = forall x. (forall a. c a => a -> x) -> x
Here P(x) becomes the proposition c a => a. Proving this “proposition” is done by providing a value of type a, this is sometimes called “witnessing”.
If this jump has left you baffled, try doing a bit of research on the “Curry Howard Isomorphism” and remembering that the type c a => a is really the same as the pair (Dict, a) where Dict is the record of all the functions in the typeclass c.
With this intuition (terrible pun for constructivists) we can write a function to construct an Exists c given a c a => a.
    exists :: c a => a -> Exists c
    exists witness cont = cont witness
Now we can actually write some code using this
    -- This needs impredicative polymorphism
    showables :: [Exists Show]
    showables = [ exists "string"
                , exists 'c'
                , exists ()
                , exists True]
So now we’ve got a list of Exists Shows so let’s figure out how to use it. Let’s write a function that takes a function forall a. c a => a -> b and returns a function Exists c -> b.
    withExists :: (forall a. c a => a -> b) -> Exists c -> b
    withExists cont existential = existential cont
Now we can write
    main = mapM_ (withExists print) showables
Which outputs
"string"
'c'
()
True
Just as expected!
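Since support for ImpredicativeTypes has varied between GHC releases, here is a sketch of the same idea that avoids it by wrapping the encoding in a newtype. This is a variation of my own rather than the definition used above: the Exists constructor, the Data.Kind import, and describe are all assumptions of this example.

```haskell
{-# LANGUAGE RankNTypes, ConstraintKinds, KindSignatures #-}
import Data.Kind (Constraint, Type)

-- Same encoding as before, but behind a newtype so that a list of
-- these is an ordinary list of monomorphic values.
newtype Exists (c :: Type -> Constraint) =
  Exists (forall x. (forall a. c a => a -> x) -> x)

-- Package up a witness together with its class dictionary.
exists :: c a => a -> Exists c
exists witness = Exists (\cont -> cont witness)

-- Unpack by handing the stored value to a sufficiently polymorphic function.
withExists :: (forall a. c a => a -> b) -> Exists c -> b
withExists cont (Exists e) = e cont

showables :: [Exists Show]
showables = [exists "string", exists 'c', exists (), exists True]

describe :: Exists Show -> String
describe = withExists show
```

With this version, mapM_ (putStrLn . describe) showables prints the same four lines as the program above.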
And there you have it, existential types cobbled together from a few other extensions.

          
          



Some Fun Dependently Typed Programs
Danny Gratzer — Tue, 24 Dec 2013 00:00:00 UT

Posted on December 24, 2013

Tags: agda, types

I’ve been playing with Agda lately and decided to translate my two favorite dependently typed programs from Coq to Agda. Here are the results.
Variadic Functions
Variadic functions are hard to do right. Especially since with fun a b c d e it isn’t clear if fun a b c d is supposed to be a call to a variadic function which returns a function, or whether the whole thing is one function call, or just a type error.
Now it’d be nice if we could say something like
add 7 1 2 3 4 5 6 7
In other words, tell the variadic function how many arguments we’ll give it at runtime. This is marginally less flexible than just listing them off, but not by much.
Now how could we encode this? Our type would depend on the value we pass it! So we could write something like
    open import Data.Nat

    var_ty : ℕ -> Set -> Set -> Set
    var_ty 0 _  ret    = ret
    var_ty (suc a) t r = t -> var_ty a t r
So var_ty n t r is the type of a function that takes n arguments of type t and returns an r.
From there it’s simple to write our variadic sum,
    var_sum' : (n : ℕ) -> ℕ -> var_ty n ℕ ℕ
    var_sum' 0 cur       = cur
    var_sum' (suc a) cur = \x -> var_sum' a (cur + x)

    var_sum : (n : ℕ) -> var_ty n ℕ ℕ
    var_sum n = var_sum' n 0
And var_sum 3 1 2 3 evaluates to 6, kinda nifty.
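For readers more at home in Haskell, here is a rough approximation of the same trick. Haskell can’t compute a type from an ordinary runtime number, so this sketch assumes a singleton type SNat to carry the count at both the type and value level. All of the names here (SNat, VarTy, varSum) are my own and only mirror the Agda definitions loosely.

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures, TypeFamilies #-}

data Nat = Z | S Nat

-- Singleton: a runtime value whose type records which number it is.
data SNat (n :: Nat) where
  SZ :: SNat 'Z
  SS :: SNat n -> SNat ('S n)

-- The counterpart of var_ty: n arguments of type t, then a result r.
type family VarTy (n :: Nat) t r where
  VarTy 'Z     t r = r
  VarTy ('S n) t r = t -> VarTy n t r

-- The counterpart of var_sum', threading an accumulator.
varSum' :: SNat n -> Int -> VarTy n Int Int
varSum' SZ     cur = cur
varSum' (SS n) cur = \x -> varSum' n (cur + x)

varSum :: SNat n -> VarTy n Int Int
varSum n = varSum' n 0
```

Here varSum (SS (SS (SS SZ))) 1 2 3 plays the role of var_sum 3 1 2 3, at the cost of writing the count in unary.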
Heterogeneous lists
Lists are commonplace in functional programming, but they’re almost always homogeneous, meaning the list has only one type of element.
It’d be nice to store multiple types in the same list, and indeed we can do this with a bit of dependent-type-foo
    open import Data.List

    data HList : List Set -> Set1 where
      []  : HList []
      _∷_ : {A : Set}{xs : List Set} -> A -> HList xs -> HList (A ∷ xs)
So we attach a list of types to our HList and the element at i has the type of this list at i. The interesting bit is the definition of ∷. It takes an implicit type A, and a list of types xs. It then takes an A and an HList xs and returns the new updated HList.
This is quite similar to (a, (b, (c, ()))) in Haskell, but the dependent types make it much more pleasant to use. For example, we can now write
    foo : HList _
    foo = 1 ∷ true ∷ "foo" ∷ tt ∷ []
Not too shabby. This is much more pleasant to use than the Haskell equivalent, nested tuples. For example, to write a length function for tuples in Haskell, you’d have to say something like
    class HasLength a where
      ...
    instance HasLength b => HasLength (a, b) where
      ...
And rely on some typeclass prolog. Compare this to the equivalent
    hlength : {xs : List Set} -> HList xs -> ℕ
    hlength {xs} _ = length xs
Since we have a nice flat list of all the types in our structure, it’s much simpler to work with.
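For comparison, here is roughly what the elided Haskell class might look like once it is filled in. This is only a sketch: the method name hlen and the () base case are my assumptions, since the snippet above leaves the bodies out.

```haskell
-- A hypothetical completion of the HasLength class for nested tuples.
-- The method name `hlen` and the () instance are assumptions.
class HasLength a where
  hlen :: a -> Int

-- The empty "list" is the unit tuple.
instance HasLength () where
  hlen () = 0

-- A cons cell is a pair whose second component is the rest of the list.
instance HasLength b => HasLength (a, b) where
  hlen (_, rest) = 1 + hlen rest
```

Every extra element costs one more instance resolution here, whereas the Agda version just reads the length off the type-level list.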
That’s all for now, I’ll probably rant more about Agda in the future.

          
          



Sieves in Haskell: Part 2
Danny Gratzer — Tue, 17 Dec 2013 00:00:00 UT

    Posted on December 17, 2013
    


    
    Tags: haskell
    


So in my last post about sieves in Haskell, I’d mentioned that there is a purely functional approach to infinite, lazy sieves. In this post I’ll explain how to construct these.
The first step to implementing the lazy version is to realize we need 2 kinds of laziness,

Laziness in the actual list
Laziness in how we cross things off

This is important because in order to maintain an efficient sieve, we must cross off all multiples of a number when we see the number, otherwise we degrade to trial division.
However, this seems impossible since well, our list is infinitely long. We can work around this though by using a form of iterators.
Specifically, we’ll have a function like this
    import qualified Data.IntMap.Strict as M
    import Data.List
    sieve' :: [Int] -> M.IntMap [Int] -> [Int]
In that IntMap, each key is an upcoming composite number and its value is the list of primes known to divide it. We use these as “iterators”. So when we get to a number n, if n has a list of primes associated with it, we insert each prime p at p + n in the map and don’t add n to our list. If n has no entry, then it is clearly prime, so we add it to our list and record n as a prime factor of 2 * n.
In Haskell code
    sieve' (n:ns) m =
      case M.lookup n m of
        Nothing -> n : sieve' ns (M.insertWith (++) (2 * n) [n] m)
        Just ps -> sieve' ns $ foldl' insertPrime (M.delete n m) ps
      where insertPrime m p = M.insertWith (++) (n + p) [p] m
And then to drive this, we can just use
    sieve :: [Int]
    sieve = sieve' [2..] M.empty

    main = print . sum . takeWhile (<2000000) $ sieve
And there you have it. This code is certainly cleaner than the ST version and decently performant, clocking in at 1.5 seconds. This is a little unfair to this version though since the ST code takes advantage of unboxed types which isn’t possible here. Additionally, this version is strictly more general, creating an infinite list rather than a finite one.
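A quick way to convince yourself the iterator scheme is right is to compare it against plain trial division on a prefix. The sketch below restates the same algorithm; the names lazyPrimes and trialPrimes are hypothetical and exist only for this check.

```haskell
import qualified Data.IntMap.Strict as M
import Data.List (foldl')

-- Same scheme as sieve' above, renamed so it can sit next to a reference
-- implementation. lazyPrimes and trialPrimes are hypothetical names.
lazyPrimes :: [Int]
lazyPrimes = go [2 ..] M.empty
  where
    go [] _ = []
    go (n : ns) m =
      case M.lookup n m of
        -- No iterator is parked at n, so n is prime; park n at 2 * n.
        Nothing -> n : go ns (M.insertWith (++) (2 * n) [n] m)
        -- Each prime parked at n advances to its next multiple, n + p.
        Just ps -> go ns (foldl' advance (M.delete n m) ps)
      where
        advance acc p = M.insertWith (++) (n + p) [p] acc

-- Plain trial division, used only as a slow reference.
trialPrimes :: [Int]
trialPrimes = [n | n <- [2 ..], all (\d -> n `mod` d /= 0) [2 .. n - 1]]
```

Comparing take 50 lazyPrimes with take 50 trialPrimes should give equal lists, and the map never holds more than one entry per prime found so far.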
I’ll update this post with a fairer benchmark once I have time to redo the ST code.

          
          



Colleges and My Not-So-Distant Future
Danny Gratzer — Sun, 15 Dec 2013 00:00:00 UT

    Posted on December 15, 2013
    


    
    Tags: personal
    


This is a short but excited post.
I’m a senior in high school and that means colleges. This weekend has been a rather interesting one, I’ve been accepted to two schools I’ve always dreamed about, MIT and CMU.
I was absolutely shocked. I’ve been wanting the chance to go to a school like either of them since I was 8. These last 9 or so months have seen me wobbling back and forth between hopelessness, confidence, and excitement over the idea of attending. The fact that both want me and that I’m actually getting to go is humbling, terrifying, and making me a little teary-eyed all at the same time.
I’ve decided I’ll be attending Carnegie Mellon University next fall. It wasn’t an easy choice by any stretch; blood, sweat and tears were shed over it, but I’m happy with my decision.
I’d like to take a moment to extend my sincerest thanks to any and everyone who has helped me get through what’s proving to be a long 4 years. It means the world to me.
In particular, I’ll never be able to thank those who have given me the chance to work at labs at the U of M and MIT enough. They’ve taken me a few steps closer to my dreams and I know that without them I’d have never been accepted.
Now of course, I have to make it 4 years at CMU :)

          
          



Sieves in Haskell
Danny Gratzer — Tue, 03 Dec 2013 00:00:00 UT

    Posted on December  3, 2013
    


    
    Tags: haskell
    


The other day I was answering a question on StackOverflow and decided the solution was worth talking about.
It was particularly interesting because it illustrated something: Haskell makes a darn good imperative language.
What do I mean? This statement seems absurd, Haskell has no notion of state! How could it be used for imperative programming?
Well thanks to monads and do notation, we’re going to translate the following Python code to Haskell
    def sieve(n):
        nums = [True for _ in range(n)]
        nums[0] = False
        nums[1] = False
        for i in range(n):
            if nums[i]:
                for mul in range(i*2, n, i):
                    nums[mul] = False
        return [i for i, v in enumerate(nums) if v]
This is the sieve of Eratosthenes, it works like this

Write out the nums 0 - n
Cross off 0 and 1
For the next number that isn’t crossed off, cross off its multiples
Repeat 3 until the end of the list
The remaining numbers are primes

So in Python, we write this with a mutable list, nums. Then we just loop through each index and cross off multiples as we go along! It’s a pretty straightforward translation.
Now we could write this in pure Haskell,
    sieve n = go 2 $ False : False : replicate (n-2) True
      where go = ...
But this is incredibly awkward and inefficient, since updating an element takes both linear time and space. We could opt for a clever solution
    primes = go [2..]
      where go (p:rest) = p : go [r | r <- rest, r `mod` p /= 0]
    sieve = flip takeWhile primes . (>)
But this is still very, very inefficient. And in fact, it’s not even a sieve! It’s called trial division.
So, what’s a functional programmer to do? Well, let’s start by cheating!
The ST Monad
    import Data.Vector.Unboxed hiding (forM_)
    import Data.Vector.Unboxed.Mutable
    import Control.Monad.ST (runST)
    import Control.Monad (forM_, when)
    import Prelude hiding (read)
This laundry list of imports gives us access to something pretty cool: mutable state in Haskell. ST is short for State Thread and acts like a safe, sealed-off version of IO. It’ll let us write imperative code and mutate things, but then force us to present a pure interface!
The purity trick is actually quite clever, it’s done by providing an escape hatch out of ST with the type
    runST :: (forall s. ST s a) -> a
So we can only escape ST when this phantom s is universally quantified. This forces us to handle the s, the state, opaquely, which prevents us from doing anything unsafe.
Next we can use Data.Vector.Unboxed.Mutable to create unboxed mutable vectors.
    new :: Int -> ST s (MVector s a)
And now all of this leads to
    sieve :: Int -> Vector Bool
    sieve n = runST $ do
      vec <- new (n + 1) -- Create the mutable vector
      set vec True       -- Set all the elements to True
      write vec 1 False  -- One isn't a prime
      forM_ [2..n] $ \ i -> do -- Loop for i from 2 to n
        val <- read vec i -- read the value at i
        when val $ -- if the value is true, set all its multiples to false
          forM_ [2*i, 3*i .. n] $ \j -> write vec j False
      freeze vec -- return the immutable vector
Using this we can easily sum all the primes below two million (Project Euler 10)
    main = print . ifoldl' summer 0 $ sieve 2000000
      where summer s i b = if b then i + s else s
So there you have the “cheater” way. But darn it, it’s fast: that sums the primes below two million in around 0.2 seconds.
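If the vector package isn’t at hand, the same imperative loop can be sketched with the array package that ships with GHC. To be clear, this swaps in STUArray for the unboxed vector, so it is not the code timed above; sieveArr and primesUpTo are hypothetical names.

```haskell
import Control.Monad (forM_, when)
import Data.Array.ST (newArray, readArray, runSTUArray, writeArray)
import Data.Array.Unboxed (UArray, assocs)

-- Sieve of Eratosthenes over indices 0..n using an unboxed mutable array.
-- This uses STUArray in place of the post's unboxed vector.
sieveArr :: Int -> UArray Int Bool
sieveArr n = runSTUArray $ do
  arr <- newArray (0, n) True          -- start with everything marked prime
  forM_ [0, 1] $ \i ->                 -- 0 and 1 are not prime
    when (i <= n) $ writeArray arr i False
  forM_ [2 .. n] $ \i -> do
    isPrime <- readArray arr i
    when isPrime $                     -- cross off 2i, 3i, ... up to n
      forM_ [2 * i, 3 * i .. n] $ \j -> writeArray arr j False
  return arr

-- Collect the indices still marked True.
primesUpTo :: Int -> [Int]
primesUpTo n = [i | (i, True) <- assocs (sieveArr n)]
```

runSTUArray plays the role of runST plus freeze here: the mutable array can’t leak out, and the caller only ever sees the immutable result.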
In my next post, I’ll explain a lazy functional way to do this to produce an infinite sieve.

          
          



Sports for Geeks
Danny Gratzer — Fri, 29 Nov 2013 00:00:00 UT

    Posted on November 29, 2013
    


    
    Tags: opinions
    


Over the last few months, I have been acquiring various tidbits of knowledge to do with sports. This has been essential since my girlfriend likes football.
Here is a brief summary of what I’ve learned:
Danny’s Neanderthal-Level Guide to Sports

Football players don’t wear “tights”
It’s never OK to say “This is pointless”
Touchdowns are good
Flags are bad
Unless it’s against the other team
The ref is dumb. Always
Penalties are bad
Unless it’s against the other team
The other team is barely human
It is acceptable to dance upon completing a “touch-down”
Field goals are good
Smiling is bad
Upon fumbling, a player is dead to us
It is acceptable to shoot the quarterback following an interception
Make-up for fans is OK if it’s the team’s colors
Beer hats are ingenious precisely within a stadium
Shirts are always optional in a stadium. Even in November. Or when you’re overweight.
the-team-we-don’t-like (TTWDL) will never win. Ever.
TTWDL is synonymous with devil.
People who support TTWDL are the devil’s children
the-team-we-like (TTWL) are meant to win. Any other outcome is not possible
The quarterback is always hot
An injury on the TTWL is unspeakably horrible
If you yell at the TV, they can hear you
Some teams just need to be put out of their misery early in the season
It is never OK to say “it’s just a game”
Football players are not just big kids who never grew up
If someone is “first pick” then they are good and we want them on our team
“Our team” is an appropriate saying for a team that we like even though we are in no way associated with that team
Fantasy football isn’t silly or at all like Dungeons and Dragons and other pretend games

This is all I got so far.

          
          



Fixpoints and Iso-recursive Types
Danny Gratzer — Sat, 09 Nov 2013 00:00:00 UT

    Posted on November  9, 2013
    


    
    Tags: haskell, types
    


Let’s imagine a world where Haskell didn’t have recursive functions. The Haskell committee simply left them out by mistake. Could Haskell still be Turing complete?
Well we need some method of arbitrary recursion, let’s try what we’d do in lambda calculus: fixpoints. In particular, we want a function like this
    fix :: (a -> a) -> a
What are fixpoints?
So we pass in a function f, and it will return to us a value a so that f a = a. Why is this useful? Imagine we encode recursive functions like this
    factorial :: (Int -> Int)
              -> Int
              -> Int
    factorial self 0 = 1
    factorial self n = n * self (n-1)
So we pass along recursion through this extra function. Well here factorial is really a function of type
    factorial :: (Int -> Int) -> (Int -> Int)
And we want to fill it like this
    completeFactorial n = factorial (factorial (factorial (factorial ...))) n
Now when we take the fixpoint, we find an a where factorial a = a. This means that
factorial a = factorial (factorial a) = factorial (factorial (factorial ...))
So with fixpoints, we get that infinitely long chain that we wanted.
Fixpoints with recursion
If we allowed ourselves recursion for a moment then fix is easy
fix f = let x = f x in x
This looks silly, but in fact, if f isn’t strict in x, then x can be non-bottom. And if f is strict then by definition, f ⊥ = ⊥ so ⊥ is a fixpoint.
Fixpoints without recursion
Now we can actually still write a fixpoint-finding function without recursion. The most famous one is the Y combinator. In lambda calculus, we’d write this as
Y = λf . (λx . f (x x)) (λx . f (x x))
And we want to show that Y f = f (Y f)
Y f = f (Y f)
(λx . f (x x)) (λx . f (x x)) = f ((λx . f (x x)) (λx . f (x x)))
f((λx . f (x x)) (λx . f (x x))) = f ((λx . f (x x)) (λx . f (x x)))
With simple beta reduction, they’re equal.
Fixpoints in Haskell
Now in Haskell, this doesn’t work. We could try
    fix f = (\x -> f (x x)) (\x -> f (x x))
But what is x’s type? Well it’s a function so
x :: a -> b
And x’s first argument is x, so
    type T = T -> b
    type T = μR. R -> b -- This means the same as the above
    x :: T
But this isn’t legal! We can’t have infinite types like that. Don’t despair though, we’re going to use the magic of iso-recursive types.
What are iso-recursive types? Well they’re like (equi-)recursive types, but they provide two operations, fold and unfold.
    unfold :: μX. T -> [μX. T/X]T
    fold   :: [μX. T/X]T -> μX. T
Where [foo/bar]baz means “substitute all occurrences of bar with foo in baz”. When trying to unify iso-recursive types, we don’t consider a type equal to an unfolding of that type. This makes type inference considerably easier since we’re requiring the user to explicitly fold and unfold types.
We can write these in Haskell
    newtype Mu f = Mu {unMu :: f (Mu f)}
    unfold = unMu
    fold   = Mu
Now we can write the type of x
    newtype X' b a = X' {unX :: a -> b}
    type X a = Mu (X' a)
Take a moment to think about this, mentally unfolding we have
X a
Mu (X' a)
X' a (Mu (X' a))
Mu (X' a) -> a
(Mu (X' a) -> a) -> a
...
There we go! Now for that y combinator
    unfold' = unX  . unfold
    fold'   = fold . X'
    y f = (\x -> f (unfold' x x)) $ fold' (\x -> f (unfold' x x))
and finally
    fix = y
And to test it
    f = fix factorial -- factorial from way before
    main = mapM_ (print . f) [1..5]
prints
1
2
6
24
120
Which means we’re successful, we’ve added back recursion to Haskell!
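Putting the pieces together, a complete version might look like the sketch below. Two things here are my own additions rather than part of the derivation: the helper selfApply, and the NOINLINE pragma on it, which is a precaution because GHC’s inliner is known to struggle with this kind of self-application when optimizations are on.

```haskell
-- Iso-recursive types via an explicit newtype.
newtype Mu f = Mu { unMu :: f (Mu f) }

-- X' b a wraps a function a -> b; X a ties the knot so that X a ~ X a -> a.
newtype X' b a = X' { unX :: a -> b }
type X a = Mu (X' a)

unfold' :: X a -> (X a -> a)
unfold' = unX . unMu

fold' :: (X a -> a) -> X a
fold' = Mu . X'

-- Hypothetical helper: the "x x" from the lambda calculus version.
selfApply :: X a -> a
selfApply x = unfold' x x
{-# NOINLINE selfApply #-}

-- The Y combinator, with no recursive definitions anywhere.
y :: (a -> a) -> a
y f = selfApply (fold' (\x -> f (selfApply x)))

-- Open-recursive factorial from earlier in the post.
factorial :: (Int -> Int) -> Int -> Int
factorial _    0 = 1
factorial self n = n * self (n - 1)
```

Here y factorial behaves like an ordinary factorial function, even though nothing in the file refers to itself at the term level. All the recursion lives in the type Mu.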

          
          



Learn You Some Category Theory
Danny Gratzer — Tue, 22 Oct 2013 00:00:00 UT

    Posted on October 22, 2013
    


    
    Tags: math
    


Introduction
Hi!
This is the first in a series of posts on basic category theory in Haskell. They require no knowledge of category theory, but I expect the reader to be familiar with Haskell. I will make various parallels to set theory, but they can be skipped in favor of the corresponding Haskell code.
Categories
What is a category?
A category is a collection of 2 things: objects and arrows. We leave the idea of an object abstract, it’s just a “thing”. It varies from category to category, in some it will be sets, in some monoids, whatever, but it’s the core “building block” of that category.
Arrows are a different kind of “thing”: they go from object to object. We write f : A -> B to mean that f is an arrow from object A to object B.
Think of this like a directed graph, objects are nodes and arrows are lines. In fact as we go, we’ll talk about parts of categories just like this, as diagrams.
Let’s do some examples, let’s say objects are sets. So
A = {1, 2}
B = {3, 4}
Then arrows would be functions from set to set.
f 1 = 3
f 2 = 4
Another example: Haskell. Objects would be types and arrows would be functions
    type A = Int
    type B = Int

    f :: A -> B
    f x = x + 1
Now for some group of objects and arrows to be a category, a few conditions have to hold

There must be an identity arrow for each object
idA : A -> A
idB : B -> B
There must be an operation to compose arrows, I use . for this
f : A -> B
g : B -> C
g . f : A -> C
Identities and composition have to play nice,
f . id = f
id . f = f
Composition has to be associative,
h . (g . f) = (h . g) . f

And voilà! If we can show these few things, it’s a category!
Examples
First let’s do the category of sets, which we call Set,

Objects are sets
Arrows are functions from set to set
Identity arrows are just identity functions
Arrow composition is function composition
It’s trivial to see that our identity arrows and composition satisfy these conditions

Next is Haskell, this category is called Hask.

Objects are types
Arrows are functions
The identity arrows are all given by id
Arrow composition is just .
We know that id . f = f . id = f in Haskell
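To make the Hask example concrete, here is a small sketch that spot-checks the laws on a few sample arrows. The arrows f, g, h and the checker lawsHoldAt are made up for illustration, and testing at individual points is of course evidence rather than a proof.

```haskell
-- Three sample arrows in Hask (hypothetical, chosen only for illustration).
f :: Int -> Int
f = (+ 1)

g :: Int -> String
g = show

h :: String -> Int
h = length

-- Check the identity and associativity laws at a single input.
lawsHoldAt :: Int -> Bool
lawsHoldAt x =
  and
    [ (f . id) x == f x                      -- right identity
    , (id . f) x == f x                      -- left identity
    , (h . (g . f)) x == ((h . g) . f) x     -- associativity
    ]
```

Evaluating all lawsHoldAt over a range of inputs should come back True.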

There you have it, our start into the wide world of category theory.

          
          



Representable Functors
Danny Gratzer — Mon, 21 Oct 2013 00:00:00 UT

    Posted on October 21, 2013
    


    
    Tags: haskell, math
    


Representable functors are a powerful tool in category theory. As it turns out, they’re pretty useful in Haskell as well. Here are a few examples of what they are and how to use them.
First, some definitions. We’re interested in Hask, which is the category where objects are types and arrows are functions. A representable functor, for us, is a special functor from Hask -> Hask (an endofunctor). Now when we apply category theory to Haskell, we also pretend Hask is Set (the category of sets). There’s a type of functor called a hom-functor. It’s a functor that looks like this
    type Hom a = (->) a
    -- Hom a b = a -> b
Now, Hom a implements Functor like this (base already ships this instance for (->) a, so it’s shown here only for illustration)
    instance Functor ((->) a) where
        fmap f hom = f . hom
In other words, Hom a takes an object b to the set of all morphisms a -> b. It takes an arrow b -> c to the function Hom a b -> Hom a c using composition. Nothing stunning yet.
Now consider some arbitrary functor F. Suppose there exists an object a so that F is isomorphic to Hom a. What would this look like?
    type family Obj (f :: * -> *) :: *
    class Functor f => HomIso f where
      toHom :: f a -> Hom (Obj f) a
      toF   :: Hom (Obj f) a -> f a
And we have the laws that
    toHom . toF = id
    toF . toHom = id
Then f is a representable functor. From now on, I will refer to HomIso as Repr to emphasize this. The simplest representable functor is of course Hom a.
Let’s notice some useful properties of representable functors.
    lookup :: Repr f => f a -> Obj f -> a
    lookup = toHom
Our functor can look things up! Cool! Let’s use this idea to guide us to finding some simple representable functors. Let’s look at a trivial case
    newtype Identity a = Identity {runIdentity :: a}
                       deriving(Eq, Show, Functor)

    data Unit = Unit
    type instance Obj Identity = Unit

    instance Repr Identity where
      toHom (Identity a) = const a
      toF f = Identity $ f Unit
Since Identity holds exactly one value, Unit indexes it exactly. A more complicated example
    data Prod a = Prod a a
                deriving(Eq, Show, Functor)

    data Two = InL | InR
    type instance Obj Prod = Two

    instance Repr Prod where
      toHom (Prod a _) InL = a
      toHom (Prod _ b) InR = b
      toF hom = Prod (hom InL) (hom InR)
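For anyone who wants to load this into GHCi, here is a self-contained sketch of the class and the Prod instance. I’ve written plain functions Obj f -> a instead of the Hom synonym and spelled out the Functor instance by hand to avoid extra extensions; sides is a hypothetical example value.

```haskell
{-# LANGUAGE TypeFamilies #-}
import Data.Kind (Type)

-- The indexing object for a representable functor.
type family Obj (f :: Type -> Type) :: Type

-- f is representable when f a is isomorphic to (Obj f -> a).
class Functor f => Repr f where
  toHom :: f a -> (Obj f -> a)
  toF   :: (Obj f -> a) -> f a

data Prod a = Prod a a
  deriving (Eq, Show)

instance Functor Prod where
  fmap g (Prod x y) = Prod (g x) (g y)

-- A pair has exactly two positions, so Two indexes it.
data Two = InL | InR

type instance Obj Prod = Two

instance Repr Prod where
  toHom (Prod a _) InL = a
  toHom (Prod _ b) InR = b
  toF hom = Prod (hom InL) (hom InR)

-- Hypothetical example: build a Prod by saying what lives at each index.
sides :: Prod String
sides = toF (\i -> case i of InL -> "left"; InR -> "right")
```

The two laws say that toF (toHom p) gives back p and toHom (toF k) i gives back k i, which is easy to check by hand for Prod.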
This is all quite well, but what about infinite data structures? This is Haskell! We want those too.
    data Forever a = Cons a (Forever a)
                   deriving (Functor)

    data Nat = Z | S Nat
    type instance Obj Forever = Nat

    instance Repr Forever where
      toHom (Cons a as) Z = a
      toHom (Cons a as) (S n) = toHom as n

      toF f = cs Z
        where cs n = Cons (f n) (cs (S n))
Since Forever goes, well, forever, it can be keyed with natural numbers, which we represent here with Nat. Then toHom is classic recursion and toF is classic co-recursion.
There are tons more of these, but hopefully now you’re getting the idea. Here’s another cool thought
    switch :: (Repr f, Functor g) => g (f a) -> f (g a)
    switch g = toF $ \obj -> fmap ($ obj) hom
      where hom = fmap toHom g
Wait a moment, what if f and g were both Repr instances? Then
    switch . switch = id
Neat! We can use representable functors to switch around functors.
Now, what about applicatives, can we use a representable functor to build one?
    -- To keep type classes from getting confused
    newtype Wrap f a = Wrap {unWrap :: f a}

    instance (Repr f) => Applicative (Wrap f) where
      pure    = Wrap . toF . const
      Wrap f <*> Wrap a = Wrap . toF $ \obj -> toHom f obj $ toHom a obj
So we can actually build out applicatives from a representable functor. How about monads?
    instance (Repr f) => Monad (Wrap f) where
      return = Wrap . toF . const
      Wrap m >>= f = Wrap . toF $ \obj -> toHom (unWrap (f (toHom m obj))) obj
Notice how these are working? The applicative and monad are defined “pointwise”. Basically we’re applying each function at a “point” in our functor’s underlying structure and then peeking at the result at that point.
If we translate this to the Forever functor, our applicative instance corresponds to taking a stream of functions and zipping it with a stream of values. Our monad instance does the same, and then selects the element at the same position in the resulting stream. That’s why we often refer to these as zippy monads.
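To see the zipping behaviour concretely, here is a self-contained sketch with Forever. The helpers takeF, natToInt and nats are hypothetical names added for the demo, and the Functor instances are written by hand so no deriving extensions are needed.

```haskell
{-# LANGUAGE TypeFamilies #-}
import Data.Kind (Type)

type family Obj (f :: Type -> Type) :: Type

class Functor f => Repr f where
  toHom :: f a -> (Obj f -> a)
  toF   :: (Obj f -> a) -> f a

data Forever a = Cons a (Forever a)

instance Functor Forever where
  fmap g (Cons a as) = Cons (g a) (fmap g as)

data Nat = Z | S Nat

type instance Obj Forever = Nat

instance Repr Forever where
  toHom (Cons a _)  Z     = a
  toHom (Cons _ as) (S n) = toHom as n
  toF f = go Z
    where go n = Cons (f n) (go (S n))

newtype Wrap f a = Wrap { unWrap :: f a }

instance Functor f => Functor (Wrap f) where
  fmap g = Wrap . fmap g . unWrap

-- Pointwise application: position i of the result uses position i of both.
instance Repr f => Applicative (Wrap f) where
  pure = Wrap . toF . const
  Wrap fs <*> Wrap xs = Wrap (toF (\i -> toHom fs i (toHom xs i)))

-- Pointwise bind: run k on the value at i, then look at position i again.
instance Repr f => Monad (Wrap f) where
  Wrap m >>= k = Wrap (toF (\i -> toHom (unWrap (k (toHom m i))) i))

-- Hypothetical helpers for inspecting a finite prefix.
takeF :: Int -> Forever a -> [a]
takeF n (Cons a as)
  | n <= 0    = []
  | otherwise = a : takeF (n - 1) as

natToInt :: Nat -> Int
natToInt Z     = 0
natToInt (S n) = 1 + natToInt n

nats :: Forever Int
nats = toF natToInt
```

Zipping nats with itself using (+) starts 0, 2, 4, while binding nats to \n -> Wrap (fmap (* n) nats) picks out the diagonal, which starts 0, 1, 4.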
Well hopefully I’ve convinced you that representable functors are interesting, remember, we were able to build all of this from a simple isomorphism with Hom. Cool right?

          
          



Please Don't Learn Category Theory to Learn Haskell
Danny Gratzer — Mon, 14 Oct 2013 00:00:00 UT

    Posted on October 14, 2013
    


    
    Tags: haskell, opinions
    


I spend a lot of time talking about Haskell. Trying to get imperative programmers to learn Haskell is hard, people are stubborn and often have misconceptions about functional programming.
One of the most common excuses I hear is

I don’t know enough math to learn Haskell, don’t you need to know category theory?

No! In fact, please don’t go learn category theory to learn Haskell. Why not?
The Haskell standard library makes shallow use of quite deep and complex ideas. Yes it uses the words Monad, Functor, and Category. But you don’t need to have any idea what these words mean to use them. In fact, try substituting
 Monad    -> FuzzyWuzzy
 Functor  -> Banana
 Category -> Cheerios
And you’ll still be able to learn/use Haskell just fine. In fact, there’s a lovely series of problems that do just this.

But wait, why did they even bother with the names then?

Because that’s where the idea comes from. The abstractions in Haskell are sometimes inspired by math. There’s no argument there. And the people who designed Haskell decided they weren’t going to pretend they didn’t use math. But just like how Ruby was inspired by Lisp and Smalltalk, you don’t need to learn the source of some abstraction to enjoy it.
In fact, if you want to see a bunch of category theory inspired abstractions, check out some of Edward Kmett’s libraries. They’re just littered with big-n-scary words. However, you can still use lens (a super useful library) without understanding what it means to “downstar a functor into a profunctor”. And tons do.

Is there any point in learning category theory then?

I’d say so, it’s a cool piece of mathematics to start with. And who knows, plenty of people find it useful to look to category theory for reasoning about abstractions.
I decided to pick up a few books in June and have been thoroughly enjoying it. Not because it suddenly made me better at Haskell but because I like math and it provides a good language for talking about some concepts.
Who knows, maybe you’re a budding category theorist.

What do I need to know before Haskell then?

Well um, not much. In fact, the less you know the better! I came to Haskell from primarily imperative languages (C, Perl, Java) and the biggest problem I had was trying not to write Perl in Haskell.
The only thing I found terribly helpful was already understanding what a type was. And I think that can be picked up pretty quickly.
Good luck :)

          
          



forall Means All!
Danny Gratzer — Sun, 06 Oct 2013 00:00:00 UT

    Posted on October  6, 2013
    


    
    Tags: haskell, types
    


It seems like every week on Stack Overflow there are at least two questions about higher rank polymorphism (RankNTypes). So here’s a brief description of what higher rank types are and how to use them.
First, to turn them on
    {-# LANGUAGE RankNTypes #-}
Now to write one
     demo :: (forall a. a -> Int) -> Int
     demo f = f 'a' + f True
Now that forall means “this function is polymorphic and can be applied to any argument”. Notice we can’t do this without rank N types,
    uhoh :: (a -> Int) -> Int
    uhoh f = f 'a' + f True -- Error cannot unify 'a' with Char
Here’s how not to use it
    demo id
id here unifies with Int -> Int which isn’t the necessary forall a. a -> Int. To use it,
    demo (const 1)
Now this seems pretty clear right? It’s just to make it possible to pass polymorphic functions to other functions. Easy peasy :)
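Here is the same idea as a file you can compile. demo is copied from above; applyBoth is a hypothetical second example I added to show one polymorphic argument being used at two different types.

```haskell
{-# LANGUAGE RankNTypes #-}

-- The argument must work for every a, so the body may use it at
-- Char and at Bool.
demo :: (forall a. a -> Int) -> Int
demo f = f 'a' + f True

-- Hypothetical second example: the same function is applied to a list of
-- Ints and to a String (a list of Chars).
applyBoth :: (forall a. [a] -> Int) -> ([Int], String) -> (Int, Int)
applyBoth f (xs, s) = (f xs, f s)
```

Both demo (const 1) and applyBoth length ([1, 2, 3], "hi") typecheck, because const 1 and length make no assumptions about the element type.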
Now what does this mean?
    data Tricky = Tricky (forall x. x -> x)
Well you’d be right if you realized that the only sane instantiation of this is
    t = Tricky id
We need a function
    arg :: a -> a
It’s pretty clear that the only sane version of arg is id.
A harder one,
    type ReallyTricky a b = forall f. Functor f => (a -> b) -> f a -> f b
That’s right it just means that anything of type ReallyTricky knows how to take some arbitrary function, and lift it into any functor. And the caller gets to choose which one.
    t :: ReallyTricky a b
    t = fmap
That’s it! Just remember that forall is universal quantification. This means that you have to be able to support all possible instantiations of that variable and the caller will choose which one.
Now suppose you want it the other way, you choose the instantiation and the caller has to handle it generically. Then you want existential quantification. A subject for another post.

          
          



Naive Map Isn't So Naive
Danny Gratzer — Tue, 01 Oct 2013 00:00:00 UT

    Posted on October  1, 2013
    


    
    Tags: haskell
    


One of the most beloved functions in functional programming languages is map. It can be defined like this:
    map :: (a -> b) -> [a] -> [b]
    map f (x:xs) = f x : map f xs
    map _ []     = []
However in a lot of languages, writing map like this is a no-no. It’s not tail recursive! For example in OCaml
    # let rec my_map f = function
       | [] -> []
       | x :: xs -> f x :: my_map f xs

    # my_map ((+) 1) list_from_0_to_5000000
    ... Wait a bit ...
    Error: Stackoverflow
Urk! That’s annoying. The problem is that we have to make a recursive function call that can’t be compiled down to a loop (tail recursion). Well, let’s look at how map is defined in Haskell to avoid this problem
    -- In Base
    map :: (a -> b) -> [a] -> [b]
    map f (x:xs) = f x : map f xs
    map _ []     = []
Wait, isn’t this bad? We just saw how this isn’t tail recursive!
The thing is, in Haskell things are lazy. map (+1) [1..10000] returns a thunk. Inside that thunk is something like this
    (:) {thunk for 1 + 1} {thunk to get rest of list}
So this takes constant space! After all, none of those extra stack frames are used because : doesn’t evaluate its arguments. This means that a lot of not tail recursive functions in Haskell still take constant space, however, you have to be careful about consuming the results.
Take for example sum.
    sum :: [Integer] -> Integer
    sum (x:xs) = x + sum xs
    sum []     = 0
Now this also isn’t tail recursive, but there’s a problem: + is strict. By this I mean that to evaluate a+b you must first evaluate a and then b. This means that to evaluate x + sum xs we have to evaluate sum xs.
Urk, now we’re building up a big pile of expressions, something like
    a + sum (b:c:d:e:[])
    a + (b + sum (c:d:e:[]))
    a + (b + (c + sum (d:e:[])))
    a + (b + (c + (d + sum (e:[]))))
    a + (b + (c + (d + (e + 0))))
Now we can see why this will blow up, it’s building up a huge expression before we can evaluate anything. Hi stack overflow.
This is when we do want a tail recursive, strict function like foldl' to keep things in constant space.
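As a minimal sketch of the two styles side by side (naiveSum and strictSum are hypothetical names, chosen so they don’t clash with the Prelude):

```haskell
import Data.List (foldl')

-- The naive version from above: builds a + (b + (c + ...)) before adding.
naiveSum :: [Integer] -> Integer
naiveSum (x:xs) = x + naiveSum xs
naiveSum []     = 0

-- foldl' forces the accumulator at every step, so only one number is
-- ever held at a time.
strictSum :: [Integer] -> Integer
strictSum = foldl' (+) 0
```

Both compute the same result; the difference only shows up in how much memory they need on long lists.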
Conclusion
When you construct something with : for example, it’s possible to evaluate the head without evaluating the tail. Similarly with most constructors. With things like this, it’s possible to keep naive recursion in constant space.

          
          



Logic and Continuations
Danny Gratzer — Thu, 12 Sep 2013 00:00:00 UT

    Posted on September 12, 2013
    


    
    Tags: haskell
    


In Haskell there’s a monad known as Cont. Most people don’t use it, but it’s pretty cool. The basic idea is that
     newtype Cont r a = Cont {runCont :: (a -> r) -> r}
The intuition being that that (a -> r) term is the “rest of the program”. You feed it the type it is expecting and it will happily run the rest of your computation.
Deriving Monad (Cont r)
return is easy
    return a = Cont ($a)
or
    return a = Cont $ \c -> c a
Bind is a little trickier
    (Cont c) >>= f = Cont (\rest -> c $ \a -> runCont (f a) rest)
Think of this as feeding the continuation c :: (a -> r) -> r a function of type a -> r, which we build by combining runCont . f :: a -> (b -> r) -> r with rest :: (b -> r).
Yeah it hurts your head a little.
Now let’s talk about seeing into the future.
Back to the Future
First things first, to conform with how the MTL does stuff
     cont = Cont
Because in the real Control.Monad.Cont, there’s a monad transformer and a type synonym
     type Cont r a = ContT r Identity a
or something like that.
Let’s try a simple application of Cont. Suppose we have a function of type [a -> Bool] -> [a] -> [a] and we want to return the longest list of as that satisfies one of the predicates in our list. Yeah it’s contrived but I’ll show you a more realistic example in a second:
     longest preds as = flip runCont id $ do
       p <- selectPred preds
       return (filter p as)
Now we just need to define selectPred so that it can “know” which predicate will return the longest list
    -- maximumBy is from Data.List, comparing is from Data.Ord
    selectPred ps = cont $ \c -> maximumBy (comparing length) . map c $ ps
That’s it. It’s actually running the program with each possible value and then returns the best result. Cool right?
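Here is a self-contained sketch of that example. To avoid depending on mtl it defines its own Cont with the instances written out, and it assumes maximumBy (comparing length) is what maxBy length was meant to be. Note that maximumBy fails on an empty list, so preds must be non-empty.

```haskell
import Data.List (maximumBy)
import Data.Ord (comparing)

-- A local Cont, standing in for the one from Control.Monad.Cont.
newtype Cont r a = Cont { runCont :: (a -> r) -> r }

instance Functor (Cont r) where
  fmap f (Cont c) = Cont $ \k -> c (k . f)

instance Applicative (Cont r) where
  pure a = Cont ($ a)
  Cont cf <*> Cont ca = Cont $ \k -> cf (\f -> ca (k . f))

instance Monad (Cont r) where
  Cont c >>= f = Cont $ \k -> c (\a -> runCont (f a) k)

-- Run the rest of the program once per predicate, keep the longest result.
selectPred :: [a -> Bool] -> Cont [b] (a -> Bool)
selectPred ps = Cont $ \k -> maximumBy (comparing length) (map k ps)

longest :: [a -> Bool] -> [a] -> [a]
longest preds as = flip runCont id $ do
  p <- selectPred preds
  return (filter p as)
```

For instance, with the predicates even, (> 3) and odd over [1 .. 10], the (> 3) branch wins because it keeps seven elements.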
Note that it’s kind of important that things are pure here. If you have an unsafePerformIO and you start backtracking things get hairy. However, since you can toss around IOs without evaluating them, eg
     const 1 (print "foo")
Doesn’t print foo or anything, you can have ContT layered over IO.
Logic Framework
Now let’s use this to create a simple framework for non-deterministic logic programming. Some skeleton code:
    {-# LANGUAGE RankNTypes #-}
    newtype Logic a = Logic {runLogic :: forall r.  Cont (Maybe r) a}
    instance Monad Logic where
      return a = Logic $ return a
      (Logic c) >>= f  = Logic $ c >>= runLogic . f
The RankNTypes basically says a logical computation can’t make assumptions about the result, which is pretty reasonable. The monad instance is just relying on the underlying Cont instance. Now we want three functions:
    amb :: [a] -> Logic a
    disconj :: Logic a -> Logic a -> Logic a
    backtrack :: Logic ()
where backtrack backtracks to the nearest amb and tries the next element and disconj simply joins together two propositions and chooses an element from one that won’t fail (return Nothing).
    backtrack = Logic (cont $ const Nothing)
    disconj (Logic a) (Logic b) = Logic (cont $ \c -> runCont a c `mplus` runCont b c)
    amb as = Logic (cont $ \c -> join . find isJust . map (c$) $ as)
Note: with RankNTypes weird things can happen with . for example, using Logic . cont is ill typed presumably because GHC restricts the Cont being fed to Logic.
Now these actually map nicely to an existing typeclass.
     instance MonadPlus Logic where
       mzero = backtrack
       mplus = disconj
Also helpful is
    evaluate :: Logic a -> Maybe a
    evaluate = flip runCont Just . runLogic
So let’s try it out:
     main = print . evaluate $ do
       a <- amb [1, 2, 3] :: Logic Integer
       b <- amb [4, 5, 6]
       when (a + b /= 9) mzero
       return (a, b)
and perhaps a helpful combinator
     assert = flip unless mzero
makes
     main = print . evaluate $ do
       a <- amb "floor"
       b <- amb "bar"
       assert (a==b)
       return (a, b)
And there you have it, using continuations we have created a logic DSL. The interesting bit is that each amb is actually running the code multiple times and “seeing into the future”. Once it knows which element will actually return a desirable result, it pops it back. Nifty.
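For reference, here is one way the whole thing might be assembled into a single file. It is a sketch under a few assumptions of mine: the Cont (Maybe r) layer is inlined directly into Logic so nothing outside base is needed, the guard is called require to avoid confusion with Control.Exception’s assert, and sumsToNine is a hypothetical example program.

```haskell
{-# LANGUAGE RankNTypes #-}
import Control.Monad (join, unless)
import Data.List (find)
import Data.Maybe (isJust)

-- Logic a is Cont (Maybe r) a for every r, with the Cont layer inlined.
newtype Logic a = Logic { runLogic :: forall r. (a -> Maybe r) -> Maybe r }

instance Functor Logic where
  fmap f (Logic c) = Logic (\k -> c (k . f))

instance Applicative Logic where
  pure a = Logic (\k -> k a)
  Logic cf <*> Logic ca = Logic (\k -> cf (\f -> ca (k . f)))

instance Monad Logic where
  Logic c >>= f = Logic (\k -> c (\a -> runLogic (f a) k))

-- Fail: ignore the rest of the program and report Nothing.
backtrack :: Logic a
backtrack = Logic (const Nothing)

-- Try the rest of the program with each choice, keep the first success.
amb :: [a] -> Logic a
amb as = Logic (\k -> join (find isJust (map k as)))

evaluate :: Logic a -> Maybe a
evaluate l = runLogic l Just

-- Hypothetical name for the guard combinator: backtrack unless ok holds.
require :: Bool -> Logic ()
require ok = unless ok backtrack

sumsToNine :: Logic (Int, Int)
sumsToNine = do
  a <- amb [1, 2, 3]
  b <- amb [4, 5, 6]
  require (a + b == 9)
  return (a, b)
```

evaluate sumsToNine tries a = 1 and a = 2 with every b, fails each time, and finally succeeds with the pair (3, 6).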
An exercise to the reader
A useful exercise is to add 1 of 2 combinators
    cut :: Logic ()
    interleave :: Logic a -> Logic b -> Logic (a, b)
cut doesn’t backtrack. To implement this, you’ll have to use callCC and pass the escape continuation around and call it from cut if something tries to backtrack past it.
interleave is also cool, it’s fair disjunction. Our current setup can’t handle
     a <- amb [1, 2, 3, 4, 5]
     b <- amb [1..]
     assert ( a == 2 )
It’ll get stuck fiddling with the value of b! With interleave we’d type
     (a, b) <- interleave (amb [1, 2, 3]) (amb [1..])
and it will give us pairs “fairly” by returning pairs so that the probability of (n, m) being returned when n is a elements into the first computation and m is b elements into the second is k / (a + b).
Good luck!

          
          



Teens and Functional Programming
Danny Gratzer — Sun, 08 Sep 2013 00:00:00 UT

Posted on September 8, 2013

Tags: opinions



As a teenager who spends most of his time programming, I spend a lot of time interacting with and teaching other teens interested in programming.
One thing that’s always bothered me is the lack of fellow teenage functional programmers. I’ve been wondering whether this is simply chance, or whether there is something that makes functional programming bad for beginners, especially teenage ones.
Well, first let’s define what I mean by a functional language, since the definition is always a bit fuzzy. When I refer to a functional language, I mean a language that:

Has primarily immutable data (by convention or enforced)
Treats functions as first class values
Has functions that are primarily “pure functions”

Notice that I left out

Type Systems
Algebraic data types + Pattern matching
Purity

Immutability
Perhaps immutability is the problem. It’s certainly weird to experienced imperative programmers. But what about for beginners?
I’d argue immutability is actually pretty natural for a new programmer. It means variables are just names for data. No more of that confusing “a variable is like a box that you can put values in” explanation. Once you get into subtleties like indirection, mutation looks even less appealing.
In the first course for computer science students at the University of Minnesota, there is a whole quiz filled with problems like this in Python:
    # What is the output of
    a = [1, 2, 3]
    b = a[1:]
    b[1] = 4
    print a
Once you have a whole quiz devoted to a topic, it’s safe to say that it’s confusing.
This class also taught Scheme. When it came time to explain let, the professor simply said

This is let.
(let ((a 1))
    (+ 1 a))
Just substitute 1 for a within the parens.

and that was that.
I’d posit that the reason is that with immutability you can simply replace a name with its value and have unchanged semantics. With mutable variables, a variable is more than just its value.
Now that’s not to say immutability doesn’t get weird eventually, purely functional data structures do sometimes require some mental gymnastics, but there is certainly not more mental overhead than with imperative programming’s pervasive mutation.
First Class Functions
I find it hard to believe that first class functions are the problem since they’re not that unique anymore. Python, Ruby, and JavaScript all have them and they’re hugely popular with beginners.
The other thing is that you can largely ignore them until you need them. No one will make you use map, you could write the stupid repetitive recursion out every time. Additionally, many languages provide some sort of construct to let you avoid map, filter, or whatever. For example, in Haskell
    filter even . map (+1) $ [1..10]
    [x + 1 | x <- [1..10], even (x + 1)]
Perfect for a beginner. In fact, Python stole these for precisely this reason.
Finally, they actually alleviate a lot of complexity. Look at anonymous classes and the strategy pattern. The whole thing is a very large, ugly hack for dealing with the lack of first class functions.
Pure Functions
This one isn’t too hard to argue. If you have some function f and you call it twice
a = f(1);
b = f(1);
You’d really expect it to give you back the same thing twice. This is what people have come to expect from math. It’s much the same argument that I made for immutability: with a pure function there are all sorts of nice assumptions you can make about it. Once again, a function application just becomes a name for the resulting value. This is different from a function call in Python, where the actual computation is important because it has side effects.
I said “primarily pure” because some things are really impure, the classic example being readLine.
    print "Enter your age:";
    age = readLine();
    print "Enter your height:";
    height = readLine();
Now we’d hope that age and height contain different values. In Haskell we have monads for this, but those are notoriously hard to understand. Instead, pragmatic impurity is probably the best course for a beginner’s language, much like Scheme.
But Objects!
Now I know that someone is thinking, “But object orientation!!”. To them I say, you’re right: for some set of problems object orientation is a better model for a problem. But it’s a far smaller set of problems than you’d think.
It’s important for a beginner to be exposed to multiple paradigms, but I see no reason why the first paradigm shouldn’t be functional. In fact, I’ve outlined several reasons why it should be functional.

But Which One?
So if you’re some random teen about to start functional programming, which language would you choose?
Some of the more popular languages that fit my definition of functional:

Scheme/Racket
Clojure
Haskell
Erlang
Scala
SML
OCaml
F#

Notice that most of the common “pseudo-functional” languages (JavaScript, Ruby, etc) fail the first constraint of immutability. I also chose to leave off some of the more research oriented languages (Coq, Agda, etc) because we’re talking about beginners here.
Now for a language to be good for a beginner it has to

Not have an overly complicated type system
Have an implementation with good error messages
Have lots of libraries, particularly web/game frameworks
Have a batteries-included standard library to get up and running with
Have good community support for newcomers

Haskell
We certainly can agree that Haskell is not a beginner’s language. Now Haskell is the language I write 99% of my code in and I’m saying this.
The type system, laziness, and monadic IO concept are all very daunting to a beginner. It doesn’t help that some of the error messages GHC produces are well… opaque. It’s not a bad language, but it’s not one that I would suggest for starting with.
SML/OCaml
Now SML and OCaml are both fine languages. But they don’t have the infrastructure to support a lot of teenage programmers.
They’re missing the game/GUI/web frameworks. They’re missing the tools. They’re missing the libraries. It’s just not there.
Other than that though, I see no reason why OCaml in particular wouldn’t make quite a reasonable language to start with. Pragmatically functional, reasonable type system, and strict semantics.
But the core language isn’t enough to make it a good first choice.
Scala
Scala, like Haskell, is a very nice language that definitely fails the simplicity criterion. The type system is just as sophisticated as Haskell’s but, with worse inference, you have to be more explicit.
Again, it’s just not a language for a beginner from what I’ve seen.
Though with access to the Java ecosystem and a very active community, it nicely passes every other criterion.
Erlang
Erlang may be a nice language for a beginner. Once again I’ll admit ignorance and leave it to someone else to comment on its suitability.
I suspect that the focus on concurrency will make it a bit less intuitive to a beginner who doesn’t have any interest in those issues.
Racket/Scheme
Scheme on its own just fails on library support. It’s good for a classroom, but for the demanding teenage hacker the lack of game/GUI frameworks is just killer.
Racket is a different story. Racket actually has a good set of libraries for GUIs and games. It’s simple enough for a beginner. It has a good community for beginners, being made by educators.
In fact, the only problem I see with Racket vs Python is just the lack of hype. There isn’t the same marketing going on for Racket as Python, but there certainly could be.
Racket is what I recommend to people who are interested in starting with functional programming.
Clojure
I think Clojure is another strong choice. It’s just as simple as Racket and has a much better community behind it.
It’s got all of Java’s libraries for games and several web frameworks of its own. In particular I’d love to see someone write a really slick DSL for Minecraft in Clojure. It would be a great hook to say “Hey, come learn Clojure because it makes writing Minecraft mods trivial”.
Maybe in the future I’ll start recommending Clojure instead of Racket. It looks like it’s got a brighter future with the much stronger community drive behind it.

Conclusion
So where does this leave us? Well I think the answer to the original question is clear. Functional programming isn’t the problem, functional languages are the problem.
There isn’t a clear analog to something like Python or Ruby in the functional programming world. I think it’s a legitimate niche for a language to try to fill too.
Perhaps I’ve overlooked something, but right now I think functional programming has a ways to go in becoming more accessible to beginners. And I think it’s definitely worth the effort, since a lot of the marketing points of FP (easier reasoning, simplified concepts, etc.) are excellent for beginners.

          
          



Leaving Go
Danny Gratzer — Fri, 23 Aug 2013 00:00:00 UT

Posted on August 23, 2013

Tags: go, opinions
    


I’ve been using Go since November and I’ve decided that it’s time to give it up for my hobby projects. I’d still be happy to use it professionally, but I find that programming in Go isn’t “fun” in the same way that Python, Haskell, or Lisp is.
Go, The Good
The best part about Go isn’t actually Go. The community and infrastructure around it are excellent. The command line go tool really is nice.
By far my favorite part is go get. Package management is something that many a community has failed to address but Go seems to have handled it nicely.
This isn’t shocking I suppose. Go was definitely made by engineers to solve a very real world problem. I haven’t used Go for a project with 10 or 20 people but I suspect it would scale wonderfully.
On the squishier side, Go’s community is reasonably friendly. No newbies got their heads bitten off as far as I could see.
Go, The Not So Good
While the community for Go is great, the language is ehhh. Unfortunately, when I’m working on hobby projects, the language itself is 80% of my concern. VB has good support, but I’m not hacking in it.
The two main issues I have with Go are

The Type System
Extensibility

The Type System
Go’s type system is well… lacking as it stands right now. The main problem is that Go provides no safe system for polymorphism.
I’ll give you a trivial example: define a generic absolute value function in Go.
    func abs(x ???) ??? {
        ???
    }
Now what are those ??? supposed to be? Well, we have no notion of parametric polymorphism so our only choice is subtyping polymorphism.
    func abs(x interface{}) interface{} {
        ???
    }
So now that we’ve just taken all our lovely, optimization friendly type information and thrown it away, let’s manually get it back!
    type Top interface{}
    func abs(x Top) Top {
        switch x.(type) {
        case int32:
            if x.(int32) < 0 {
                return -x.(int32)
            } else {
                return x.(int32)
            }
        case int64:
            if x.(int64) < 0 {
                return -x.(int64)
            } else {
                return x.(int64)
            }
        case float32:
            if x.(float32) < 0 {
                return -x.(float32)
            } else {
                return x.(float32)
            }
        case float64:
            if x.(float64) < 0 {
                return -x.(float32)
            } else {
                return x.(float64)
            }
        }
        return nil
    }
Holy boilerplate batman! And using this means we are forced to stick a cast right in the middle of our perfectly safe code.
By the way, there’s an error in the above code. Did you catch it? It’s tricky because with all this code duplication you tend to just skim over the boilerplate and miss the nasty runtime errors.
A type system that regularly requires casts is just gross, it’s a sign that the type system isn’t expressive enough to describe a problem.
What would happen if we wrote this in Haskell?
    abs :: (Num a, Ord a) => a -> a
    abs a = if a < 0 then -a else a
See the difference? And the Haskell version is extensible and cast free, it’ll work for any user defined types.
Now let’s be fair to Go, we can try this
     type Abser interface {
         Negate() Abser
         LtZero() bool
     }
     func Abs2(x Abser) Abser {
         if x.LtZero() {
             return x.Negate()
         }
         return x
     }
But this still isn’t close to Haskell’s version for several reasons, the biggest one for me is that this version takes in some Abser and returns some Abser. Are those the same underlying implementations? Who knows!
So we still have an unsafe cast in there just to use it because we have no way of statically verifying that we’re getting the same underlying type back.
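To make the "same underlying implementation?" worry concrete, here is a minimal sketch. The types MyInt and MyFloat are hypothetical and exist only for illustration; the point is that the compiler happily accepts a Negate that hands back a different implementation of Abser than the one it was given.

```go
package main

import "fmt"

// Abser mirrors the interface from the post, with LtZero returning bool.
type Abser interface {
	Negate() Abser
	LtZero() bool
}

// MyInt is a hypothetical implementation used only for this sketch.
type MyInt int

func (i MyInt) Negate() Abser { return -i }
func (i MyInt) LtZero() bool  { return i < 0 }

// MyFloat is a second hypothetical implementation.
type MyFloat float64

// Negate deliberately returns a MyInt, not a MyFloat. Nothing in the
// interface stops this, so callers cannot rely on getting MyFloat back.
func (f MyFloat) Negate() Abser { return MyInt(-f) }
func (f MyFloat) LtZero() bool  { return f < 0 }

// Abs2 is the function from the post.
func Abs2(x Abser) Abser {
	if x.LtZero() {
		return x.Negate()
	}
	return x
}

func main() {
	r := Abs2(MyFloat(-2.5))
	_, isFloat := r.(MyFloat)
	fmt.Printf("%T %v\n", r, isFloat) // prints: main.MyInt false
}
```

The signature of Abs2 is satisfied either way, which is why the caller is left asserting the concrete type at run time.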
This kills any chance of safely composing functions that take in different interfaces, for example, if we had a function over int32s, we couldn’t do someFunc(abs(x)) because we’d have to stick our cast in there, someFunc(abs(x).(int32)). Now we’re just asking for trouble when there’s some error in the function that leads to a casting failure.
Doing this safely in Go looks like this,
    newX, ok := abs(x).(int32)
    if !ok {
        fmt.Println("Darn it!")
        // Handle errors
    }
    someFunc(newX)
Now if that doesn’t grind on you I really don’t know what would.
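For reference, here is a small runnable sketch of the two forms of type assertion in play. The abs below is a cut-down stand-in for the version in this post, and asInt32 and panics are hypothetical helpers added just for the demonstration. The single-value form panics on a mismatch, while the two-value form reports failure through a boolean.

```go
package main

import "fmt"

// abs is a cut-down stand-in for the post's type-switch version.
func abs(x interface{}) interface{} {
	switch v := x.(type) {
	case int32:
		if v < 0 {
			return -v
		}
		return v
	case float64:
		if v < 0 {
			return -v
		}
		return v
	}
	return nil
}

// asInt32 uses the two-value assertion, which never panics.
func asInt32(x interface{}) (int32, bool) {
	n, ok := abs(x).(int32)
	return n, ok
}

// panics reports whether calling f panics (hypothetical demo helper).
func panics(f func()) (p bool) {
	defer func() {
		if recover() != nil {
			p = true
		}
	}()
	f()
	return false
}

func main() {
	fmt.Println(asInt32(int32(-7)))   // 7 true
	fmt.Println(asInt32(float64(-7))) // 0 false

	// The single-value form panics when the dynamic type does not match.
	fmt.Println(panics(func() { _ = abs("oops").(int32) })) // true
}
```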
I don’t mind dynamic typing and the possibility of runtime errors, Python is fun to program in just like Haskell. But Go is imposing all the pain of static typing with pretty much none of the benefits.
The response of the Go community is “Abs is a 2 line function, just do it inline or per type” to which I respond: I want to define a generic algorithm, or data structure, or really anything reasonably complex!
When I started Go, I thought this was just me missing a few clever tricks for how to properly utilize Go. I’m not so sure anymore. The entire Go math library requires casts to float64 to use, and using a stack in Go requires casts from interface{}.
Coming from Haskell and Coq, this is not something I should have to put up with in 2013.
Extensibility
Consider the keyword range. It’s a deeply magical keyword that only works inside for loops on Go’s primitive data structures.
I like writing compilers so I end up dealing a lot with trees. Want to have range traverse my AST? Tough, ain’t gonna happen!
This is just one example of many:

Only Go’s primitive types may be parameterized over other types
Only magical primitive functions may return 1 or 2 values depending on context
Only magical primitives have real parametric polymorphism
Only Go’s primitives may have infix operators
and on and on and on!

These are all hitting the same problem, Go is not extensible. There simply isn’t a way to define a type and expect it to be as pleasant to use as a slice.
This apparently doesn’t bother Go’s maintainers, presumably because they designed Go and deal with problems which slices, maps, and chans model beautifully. For the rest of the world, it’s a pain in the butt.
Guy Steele gave a wonderful talk about “growing a language”. The core idea was to start with a small but very extensible language and allow users to determine which features are added.
The idea is that there’s simply no way that any group of designers could imagine how people will want to use their language so making it easy to extend solves the problem wonderfully.
In Lisp, CLOS (Common Lisp Object System) was originally a library. It was a user defined abstraction that was so popular it was ported into the standard.
Go is just the opposite. Any user defined abstractions are painfully, obnoxiously obvious. Go developers seem to consider this a “Good Thing”. On one hand it does aid code readability. On the other, it really limits what Go’s pleasant to use for.
As a trivial case study, imagine we wanted to use Go for some form of scientific computing. We’d need some sort of Bignum type because int64 ain’t gonna hack it. In Python or Haskell, here’s how you add 2 bignums,
a + b
Here’s how you do it in Go,
a.Add(b)
Ok, it’s only a few characters, big deal. Now what does this do?
b.Mul(b).Sub(big.NewInt(4).Mul(a).Mul(c))
Or in Haskell
b*b - 4 * a * c
Which would you rather write? More importantly, which would you rather read?
I can’t help but feel like Go was designed with only problems the designers were facing in mind. This is great for them, but calling Go a general purpose language should mean that it’s nice to use for other sorts of problems too.
The argument of “it makes the code hard to read” seems a bit odd to me. Bad devs write bad code, but that doesn’t mean you should make it hard for good ones to write clean, concise code.
Conclusion
I’m really sad to have written this actually. I wanted to like Go a lot. I wanted a fast, compiled replacement for stuff I write in C right now. But Go is not that language. Shame.
Thank you to the Go team for all the hard work on the project and best of luck.

          
          



Blog Re-Init
Danny Gratzer — Fri, 23 Aug 2013 00:00:00 UT

Posted on August 23, 2013

Tags: meta
    


After a lot of hair-pulling and teeth grinding, I’ve decided to move my blog from Blogger to Bitbucket + Hakyll. Blogger is an excellent platform, but I missed a few things:

Easy vcs (I like hg)
Nice, painless editing
Easy customization

Using Bitbucket, it’s trivial to use mercurial on my blog. Now I have a full revision history of every edit I make to my posts. No more losing my last 4 paragraphs because my laptop died.
It also means that I can type (and am typing!) this post in Emacs. Personally I like markdown, so I can use Markdown and Flyspell modes for writing blog posts!
Finally, I really don’t enjoy the process of trying to customize Blogger. For example adding syntax highlighting for Haskell was an uphill battle from the start and 9 months of blogging later, I’m still not happy with the results. Adding the equivalent in Hakyll took me 10 minutes.
My only problems now are

Setting up comments here
Routing all links to previous posts back to Blogger

Then I’m all set!

          
          

Code and Co

Runtime Tagging

The Basic Idea

The Implementation

Wrap Up

Two Different Flavors of Type Theory

Formal Type Theory and Props-as-Types #1

Behavioural/Computational Type Theory and Props-as-Types #2

Building Proof Assistants

Wrap Up

Type is not in Type

Background on JonPRL

The Main Result

Wrap Up

Solving Recursive Equations

Basic Domain Theory

Solving Recursive Equations in Cpo

Wrap Up

Learn Type Theory

Reading Advice

The Resources

Textbooks

Proof Assistants

Type Theory

Proof Theory

Category Theory

Other Goodness

Coinduction in JonPRL for Low Low Prices

Math Stuff

The Code

The Clincher

Wrap Up

A Basic Tutorial on JonPRL

Getting JonPRL

The Different Languages in JonPRL

The Term Language

Tactics

Commands

What on earth did we just do!?

Killer features

Wrap Up

Proving Cut Admissibility in Twelf

Background

The Twelf Stuff

The Theorem

Wrap Up

Examining Hackage: pipes

Getting The Code

Pipes.Internal

Pipes.Core

Pipes

Wrap Up

Compiling a Lazy Language in 1,000 words

Parsing

Type Checking

Optimizations/Simplifications

Spineless, Tagless, and Generally Wimpy IL

Code Generation

Conclusion

A Proof of Church Rosser in Twelf

Proving Church-Rosser

The Main Theorem

Wrap Up

Bracket Abstraction: The Smallest PL You've Ever Seen

What is SK Combinator Calculus?

Bracket Abstraction

Wrap Up

Compiling With CPS

What is CPS

STLC to CPS

Wrap Up

SML for Haskellers

What Do They Have in Common

What Is SML Missing

What Is Haskell Missing (in Comparison)

Wrap Up

Value vs Monomorphism Restriction

The Value Restriction

The Monomorphism Restriction

Wrap Up

Solving Recursive Equations in `Cpo`