Treating Programs like Vending Machines

Posted on December 19, 2014

Tags: haskell, types

Proving things about programs is quite hard. In order to make it simpler, we often lie a bit. We do this quite a lot in Haskell when we say things like “assuming everything terminates” or “for all sane values”. Most of the time, this is alright. We sometimes need to leave the universe of terminating things, though, whenever we want to prove things about streams or other infinite structures.

In fact, once we step into the wondrous world of the infinite we can’t rely on structural induction anymore. Remember that induction relies on the “well foundedness” of the thing we’re inducting upon, meaning that there can only be finitely many things smaller than any object. However there is an infinite descending chain of things smaller than an infinite structure! For some intuition here, something like foldr (which behaves just like structural induction) may not terminate on an infinite list.

This is quite a serious issue since induction was one of our few solid tools for proof. We can replace it though with a nifty trick called coinduction which gives rise to a useful notion of equality with bisimulation.

Vending Machines

Before we get to proving programs correct, let’s start with proving something simpler. The equivalence of two simple machines. These machines (A and B) have 3 buttons. Each time we push a button the machine reconfigures itself. A nice real world example of such machines would be vending machines. We push a button for coke and out pops a (very dusty) can of coke and the machine is now slightly different.

Intuitively, we might say that two vending machines are equivalent if and only if our interactions with them can’t distinguish one from the other. That is to say, pushing the same buttons on one gives the same output and leaves both machines in equivalent states.

To formalize this, we first need to formalize our notion of a vending machine. A vending machine is a comprised set of states. These states are connected by arrows labeled with a transition. We’ll refer to the start of a transition as its domain and its target as the codomain. This group of transitions and states is called a labeled transition system (LTS) properly.

To recap how this all relates back to vending machines

A state is a particular vending machine at a moment in time
A transition between A and B would mean we could push a button on A and wind up with B
The label of such a transition is the delicious sugary drink produced by pushing the button

Notice that this view pays no attention to all the mechanics going on behind the scenes of pushing a button, only the end result of the button push. We refer to the irrelevant stuff as the “internal state” of the vending machine.

Let’s consider a relation R with A R B if and only if

There exists a function f from transitions from A to transitions from B so that x and f(x) have the same label.
Further, if A R B and A has a transition x, then the codomain of x is related to the codomain of f(x).
There is a g satisfying 1. and 2., but from transitions from B to transitions from A.

This definition sets out to capture the notion that two states are related if we can’t distinguish between them. The fancy term for such a relation is a bisimulation. Now our notion of equivalence is called bisimilarity and denoted ~, it is the union of all bisimulations.

Now how could we prove that A ~ B? Since ~ is the union of all bisimulations, all we need to is construct a bisimulation so that A R B and hey presto, they’re bisimilar.

To circle back to vending machine terms, if for every button on machine A there’s a button on B that produces the same drink and leaves us with related machines then A and B are the same.

From Vending Machines to Programs

It’s all very well and good that we can talk about the equality of labeled transition systems, but we really want to talk about programs and pieces of data. How can we map our ideas about LTSs into programs?

Let’s start with everyone’s favorite example, finite and infinite lists. We define our domain of states to be

L(A) = {nil} ∪ {cons(n, xs) | n ∈ ℕ ∧ xs ∈ A}

We have to define this as a function over A which represents the tail of the list which means this definition isn’t recursive! It’s equivalent to

    data ListF a = Cons Int a | Nil

What we want here is a fixed point of L, an element X so that L(X) = X. This is important because it means

    cons :: ℕ → X → L(X)
    cons :: ℕ → L(X) → L(X)

Which is just the type we’d expect cons to have. There’s still a snag here, what fixed point do we want? How do we know one even exists? I’d prefer to not delve into the math behind this (see TAPL’s chapter on infinite types) but the gist of it is, if for any function F

F is monotone so that x ⊆ y ⇒ F(x) ⊆ F(y)
F is cocontinuous so that ∩ₓF(x) = F(∩ₓ x)

Then there exists an X = F(X) which is greater or equal to all other fixpoints. The proof of this isn’t too hard, I encourage the curious reader to go and have a look. Furthermore, poking around why we need cocontinuity is enlightening, it captures the notion of “nice” lazy functions. If you’ve looked at any domain theory, it’s similar to why we need continuity for least fixed pointed (inductive) functions.

This greatest fixed point what we get with Haskell’s recursive types and that’s what we want to model. What’s particularly interesting is that the greatest fixed point includes infinite data which is very different than the least fixed point which is what we usually prefer to think about when dealing with things like F-algebras and proofs by induction.

Now anyways, to show L has a fixed point we have to show it’s monotone. If X ⊆ Y then L(X) ⊆ L(Y) because x ∈ L(X) means x = nil ∈ L(Y) or x = cons(h, t), but since t ∈ X ⊆ Y then cons(h, t) ∈ L(Y). Cocontinuity is left as an exercise to the reader.

So L has a greatest fixed point: X. Let’s define an LTS with states being L(X) and with the transitions cons(a, x) → x labeled by a. What does bisimilarity mean in this context? Well nil ~ nil since neither have any transitions. cons(h, t) ~ cons(h', t') if and only if h = h' and t ~ t'. That sounds a lot like how equality works!

Demonstrate this let’s define two lists

    foo = cons(1, foo)
    bar = cons(1, cons(1, bar))

Let’s prove that foo ~ bar. Start by defining a relation R with foo R bar. Now we must show that each transition from foo can be matched with one from bar, since there’s only one from each this is easy. There’s a transition from foo → foo labeled by 1 and a transition from bar → cons(1, bar) also labeled by one. Here lies some trouble though, since we don’t know that foo R cons(1, bar), only that foo R bar. We can easily extend R with foo R cons(1, bar) though and now things are smooth sailing. The mapping of transitions for this new pair is identical to what we had before and since we know that foo R bar, our proof is finished.

To see the portion of the LTS our proof was about

 foo                  bar
  | 1                  | 1
 foo             cons(1, bar)
  | 1                  | 1
 foo                  bar

and our bisimulation R is just given by {(foo, bar), (foo, cons(1, bar))}.

Now that we’ve seen that we can map our programs into LTSs and apply our usual tricks there, let’s formalize this a bit.

A More Precise Formulation of Coinduction

First, what exactly is [co]induction? Coinduction is a proof principle for proving something about elements of the greatest fixed point of a function, F. We can prove that the greatest fixed point, X, is the union of all the sets Y so that Y ⊆ F(Y).

If we can prove that there exists an Y ⊆ F(Y) that captures our desired proposition then we know that Y ⊆ gfp(F). That is the principle of coinduction. Unlike the principle of induction we don’t get proofs about all members of a set, rather we get proves that there exists members which satisfy this property. It also should look very similar to how we proved things about ~.

So now that we’ve defined coinduction across a function, what functions do we want to actually plop into this? We already now what we want for lists,

List(A) = {nil} ∪ {cons(h, t) | h ∈ X ∧ t ∈ A}

But what about everything else? Well, we do know that each value is introduced by a rule. These rules are always of the form

Some Premises Here
——————————————————
 conclusion here

So for example, for lists we have

——————————————
 nil ∈ List(A)

    h ∈ H   t ∈ A
————————————–————————
 cons(h, t) ∈ List(A)

Now our rules can be converted into a function with a form like

 F(A) = ∪ᵣ {conclusion | premises}

So for lists this gives

F(A) = {nil} ∪ {cons(h, t) | h ∈ H ∧ t ∈ A}

as expected. We can imagine generalizing this to other things, like trees for example

————————–—————
leaf ∈ Tree(A)

x ∈ H    l ∈ A    r ∈ A
————————–—————————————–
 node(h, l, r) ∈ Tree(A)

Tree(A) = {leaf} ∪ {node(h, l, r) | x ∈ H ∧ l ∈ A ∧ r ∈ A}

Now the most common thing we want to prove is some notion of equability. This is harder then it seems because the usual notions of equality don’t work.

Instead we can apply bisimulation. Our approach is the same, we define a criteria for what it means to be a bisimulation across a certain type and define ~ as the union of all bisimulations. On lists we wanted the heads to be equal and the tails to be bisimilar, but what about on trees? We can take the same systematic approach we did before by considering what an LTS on trees would look like. leaf has no information contained in it and therefore no transitions. node(a, l, r) should have two transitions, left or right. Both of these give you a subtree contained by this node. What should they be labeled with? We can follow our intuitions from lists and label them both with a.

This leaves us with the following definition, R is a bisimulation on trees if and only if

leaf R leaf
node(a, l, r) R node(a', l', r') if and only if a = a' ∧ l R l' and r R r'

So to prove an equality between trees, all we must do is provide such an R and then we know R ⊆ ~!

This describes how we deal with coinductive things in general really. We define what it means for a relation to partially capture what we’re talking about (like a bisimulation) and then the full thing is the union of all of these different views! Sortedness could be expressed as a unitary relation S where

nil ∈ S
cons(a, xs) ∈ S → xs ∈ S ∧ Just a ≤ head xs

the sorted predicate is the union of all such relations!

Dithering about Duality

I’d like to reel off some pithy dualities between induction and coinduction

Coinduction is about observation, induction is about construction
Coinduction proves existentials, induction proves universals
Coinduction is proving your property is a subset of a group, induction is about proving a group is a subset of your property

Wrap Up

So that about wraps up this post.

We’ve seen how infinite structures demand a fundamentally different approach to proofs then finite ones. It’s not all puppies and rainbows though, considering how we managed to spend nearly 300 lines talking about it, coinduction is a lot less intuitive. It is however, our only choice if we want to have “real proofs” in Haskell (terms and conditions may apply).