This website is out of date, please refer to the site on github.

You Could Have Invented GHC.Generics

Posted on April 25, 2014
Tags: haskell

In Haskell right now there seem to be two main approaches to data-generic programming. There’s the whole Typeable/Data approaches which is a bit magic. Lately however, there has been a new kid on the block, GHC.Generics.

In this post we’ll step through the intuition for the library and (hopefully) help shed some light on why it exists and how to use it.

Boilerplate

Let’s imagine you, our young and brilliant Haskell hacker cranking out some code. You’ve probably gone the typesafe route and have lots and lots of types to encode invariants.

However, this proliferation of types is cramping your style a bit, you’re forced to create a new function over each type which seems to do exactly the same thing!

    mapFoo  :: (a -> b) -> Foo a  -> Foo b
    mapBar  :: (a -> b) -> Bar a  -> Bar b
    mapQuux :: (a -> b) -> Quux a -> Quux b

But, we’re clever enough to notice that this is obviously just fmap! So we can scrap all of this with fmap and -XDeriveFunctor.

But, what about other functions. There are a lot of things that are basically mechanical to define over each type. Serialization, field selection, and so on and so on. Each of these operations have something in common; they deal with the structure of the types rather than the actual representation of it.

Selecting the first fields from

    data Foo a = Foo a a a
    data Bar a = Bar a a a

is almost identical! The only difference is in the name. So, let’s figure out a way to talk about the structure of our types.

Dissecting an Algebraic Type

Now, when we go to dissect some type data Foo = ... we have two things to consider

  1. A list of constructors
  2. A list of fields for each constructor

Let’s start with (2) since it’s simpler. For types that are of the form

    data SomeType = OneConstructor field1 field2 field3 ...

we can almost think of them as really, really big tuples.

    type SomeType' = (field1, field2, field3, ...)

But, since we want to encode different numbers of fields in just one type, let’s transform this further into

    type SomeType'' = (field1, (field2, (field3, ...)))

There we have it, we can encode lists of fields as a deeply nested group of tuples.

We can now imagine something like

    {-# LANGUAGE TypeFamilies #-}
    type family TupleForm a

    data Foo a = Foo a a
    type instance TupleForm (Foo a) = (a, a)

    data Bar a = Bar a a
    type instance TupleForm (Bar a) = (a, a)

    class Tuple a where
       toT :: a -> TupleForm a
       fromT :: TupleForm a -> a

    instance Tuple (Foo a) where
     ...
    instance Tuple (Bar a) where
     ...

Now we can write generic functions by only writing them for the TupleForm of Foo and Bar. For example,

    gfst :: (TupleForm a ~ (b, c), Tuple a) => a -> b
    gfst = fst . toT

Now that we understand fields, let’s move on to constructors!

A list constructors is the dual to a list of fields, representing OR rather than AND. We can make a bit of a leap from this to thinking that our representations of the two should be dual. So what would be the dual of (a, b)? Why that would be Either a b!

This means for a type

    data SomeType = Bar Int | Baz Char | Quux ()
    type SomeType' = Either Int (Either Char ())

This covers almost every case, we just need to make sure we represent no argument constructors as constructors of one argument: (). Take a moment to think why.

A Procedure for Reifying

Let’s now outline an algorithm for turning some arbitrary type to the corresponding generic version.

For a type C, with constructors C1, C2, C3.. and fields C1^1, C1^2, C2^1…

  1. Change each set Cx^* to the TupleForm, call this TupleForm Tx
  2. Nest the Tx’s in Either’s, Either T1 (Either T2 (Either T3 ...))

And that’s it, let’s practice on some data types to check that it works.

    data Test = Foo Int Char | Bar Int Bool Char | Quux
    type Test' = Either (Int, Char) (Either (Int, (Bool, Char)) ())

    data Maybe a  = Just a | Nothing
    type Maybe' a = Either a ()

So we can see that this transformation is pretty mechanical! There’s one hiccup though: what do we do with recursive types?

We’ll handle it the same way that GHC.Generics does, we just don’t transform the recursive arguments into the generic representation lest we end up with an infinite tree.

So [a] should look like

    type List a = Either (a, [a]) ()

Building a Library

Now if we want to build this into a library, we’d like to provide a few of our own data types rather than hijacking Either and (,).

    {-# LANGUAGE TypeOperators #-}

    data (:*:) a b = a :*: b       -- Like (,) a b
    data (:+:) a b = InL a | InR b -- Like Either a b
    data U         = U             -- Like (), U is for Unit

Now all our transformation are the same, but the results are prettier thanks to the type level operators

    type List   a = (a :*: [a]) :+: U
    type Maybe' a = a :+: U

Now to facilitate generic programming, we’ll lug one more parameter through each of these constructors and add another two types to wrap meta information and constants respectively

    data (:*:) a b p = a p :*: b p           -- Like (,) a b
    data (:+:) a b p = L1 (a p) | R1 (b p) -- Like Either a b
    data U         p = U                     -- Like (), U is for Unit

    newtype M1 i c f p = M1 (f p) -- i and c are meta info
    newtype K1 i c   p = K1 c

Now because we’re expecting all our arguments to (:*:) and (:+:) to be of kind * -> * we use K1 to wrap a normal type like Int so that it can take an argument.

M1 is a bit odd, it’s used to store information about our data entirely in phantom types. We can imagine having a bunch of types that represent different things, like whether the tree of constructors represents such and such data type or what constructor we’re dealing with. It’s not terribly relevant to the rest of this post, but useful in some odd cases.

Now we can repeat our transformation we’d discussed earlier just using the new constructors instead. We can imagine wrapping up this whole class like this

    class Generic a where
      type family Rep a :: * -> *
      to   :: a       -> Rep a
      from :: Rep a p -> a

This is very much in the spirit of our Tuple type class, but now our type family returns something of type * -> * to leave room for our extra p parameter

The Real Deal

As clever readers will have noticed, the above type class is precisely what GHC.Generics exports! We have successfully reached full circle and now have arrived at GHC.Generics' API.

The only difference between us and GHC.Generics is their Generic class can be derived almost identically to our algorithm. The only slight difference is rather than a “list” of :*:’s or :+:’s they make a tree, this makes little difference to most programs however.

To wrap things up, let’s finish by showcasing making a simple generic debugging dumper.

To begin with, we’ll define a class GDump and will make instances for the GHC.Generics types

    class GDump a where
       gdump :: a -> String

    instance GDump (U1 p) where
      gdump U1 = "()"

    instance Show c => GDump (K1 i c p) where
      gdump (K1 c) = show c

    instance (GDump (f p), GDump (g p)) => GDump ((:*:) f g p) where
      gdump (a :*: b) = "(" ++ gdump a ++ " :*: " ++ gdump b ++ ")"

    instance (GDump (f p), GDump (g p)) => GDump ((:+:) f g p) where
      gdump (L1 a) = "(Left  " ++ gdump a ++ ")"
      gdump (R1 a) = "(Right " ++ gdump a ++ ")"

    instance (GDump (f p)) => GDump (M1 a b f p) where
      gdump (M1 f) = gdump f

And now we can create a class for “normal” values and use -XDefaultSignatures to give the default implementation a Generic constraint

    class Dump a where
      dump :: a -> String

      default dump :: (Generic a, GDump (Rep a ())) => a -> String
      dump a = gdump (from' a)
        where from' :: Generic a => a -> Rep a ()
              from' = from  -- A hack to stop the type checker from whining about p

And now we can just use this default implementation.

    instance Show a => Dump (Maybe a)

Using this we can print suitably boring representations of generic types, for free!

comments powered by Disqus