Generic programming is a powerful way to define a function that works in an analogous way for a class of types. In this article, I describe the latest approach to generic programming that is implemented in GHC. This approach goes back to the paper A Generic Deriving Mechanism for Haskell by José Pedro Magalhães, Atze Dijkstra, Johan Jeuring, and Andres Löh.

## Motivation

Parametric polymorphism allows you to write functions that deal with values of any type. An example of such a function is the `reverse` function, whose type is `[a] -> [a]`. You can apply `reverse` to any list, no matter what types the elements have.

However, parametric polymorphism does not allow your functions to depend on the structure of the concrete types that are used in place of type variables. So values of these types are always treated as black boxes. For example, the `reverse` function only reorders the elements of the given list. A function of type `[a] -> [a]` could also drop elements (like the `tail` function does) or duplicate elements (like the `cycle` function does), but it could never invent new elements (except for ⊥) or analyze elements.

Now there are situation where a function is suitable for a class of types that share certain properties. For example, the `sum` function works for all types that have a notion of binary addition. Haskell uses type classes to support such functions. For example, the `Num` class provides the method `(+)`, which is used in the definition of `sum`, whose type `Num a => [a] -> a` contains a respective class constraint.

The methods of a class have to be implemented separately for every type that is an instance of the class. This is reasonable for methods like `(+)`, where the implementations for the different instances differ fundamentally. However, it is unfortunate for methods that are implemented in an analogous way for most of the class instances. An example of such a method is `(==)`, since there is a canonical way of checking values of algebraic data types for equality. It works by first comparing the outermost data constructors of the two given values and if they match, the individual fields. Only when the data constructors and all the fields match, the two values are considered equal.

For several standard classes, including `Eq`, Haskell provides the deriving mechanism to generate instances with default method implementations whose precise functionality depends on the structure of the type. Unfortunately, there is no possibility in standard Haskell to extend this deriving mechanism to user-defined classes. Generic programming is a way out of this problem.

## Prerequisites

For generic programming, we need several language extensions. The good thing is that only one of them, `DeriveGeneric`, is specific to generic programming. The other ones have uses in other areas as well. Furthermore, `DeriveGeneric` is a very small extension. So the generic programming approach we describe here can be considered very lightweight.

We state the full set of necessary extensions with the following pragma:

``````{-# LANGUAGE DefaultSignatures,
DeriveGeneric,
FlexibleContexts,
TypeFamilies,
TypeOperators #-}``````

Apart from these language extensions, we need the module `GHC.Generics`:

``import GHC.Generics``

## Our running example

As our running example, we pick serialization and deserialization of values. Serialization means converting a value into a bit string, and deserialization means parsing a bit string in order to get back a value.

We introduce a type `Bit` for representing bits:

``data Bit = O | I deriving (Eq, Show)``

Furthermore, we define the class of all types that support serialization and deserialization as follows:

``````class Serializable a where

put :: a -> [Bit]

get :: [Bit] -> (a, [Bit])``````

There is a canonical way of serializing values of algebraic data types. It works by first encoding the data constructor of the given value as a sequence of bits and then serializing the individual fields. To show this approach in action, we define an algebraic data type `Tree`, which is a type of labeled binary trees:

``data Tree a = Leaf | Branch (Tree a) a (Tree a) deriving Show``

An instantiation of `Serializable` for `Tree` that follows the canonical serialization approach can be carried out as follows:

``````instance Serializable a => Serializable (Tree a) where

put Leaf                     = [O]
put (Branch left root right) = [I]       ++
put left  ++
put root  ++
put right

get (O : bits) = (Leaf, bits)
get (I : bits) = (Branch left root right, bits''') where

(left,  bits')   = get bits
(root,  bits'')  = get bits'
(right, bits''') = get bits''``````

Of course, it quickly becomes cumbersome to provide such an instance declaration for every algebraic data type that should use the canonical serialization approach. So we want to implement the canonical approach once and for all and make it easily usable for arbitrary types that are amenable to it. Generic programming makes this possible.

## Representations

An algebraic data type is essentially a sum of products where the terms “sum” and “product” are understood as follows:

• A sum is a variant type. In Haskell, `Either` is the canonical type constructor for binary sums, and the empty type `Void` from the `void` package is the nullary sum.

• A product is a tuple type. In Haskell, `(,)` is the canonical type constructor for binary products, and `()` is the nullary product.

The key idea of generic programming is to map types to representations that make the sum-of-products structure explicit and to implement canonical behavior based on these representations instead of the actual types.

The `GHC.Generics` module defines a number of type constructors for constructing representations:

``````data V1 p

infixr 5 :+:
data (:+:) f g p = L1 (f p) | R1 (g p)

data U1 p = U1

infixr 6 :*:
data (:*:) f g p = f p :*: g p

newtype K1 i a p = K1 { unK1 :: a }

newtype M1 i c f p = M1 { unM1 :: f p }``````

All of these type constructors take a final parameter `p`. This parameter is relevant only when dealing with higher-order classes. In this article, however, we only discuss generic programming with first-order classes. In this case, the parameter `p` is ignored. The different type constructors play the following roles:

• `V1` is for the nullary sum.

• `(:+:)` is for binary sums.

• `U1` is for the nullary product.

• `(:*:)` is for binary products.

• `K1` is a wrapper for fields of algebraic data types. Its parameter `i` used to provide some information about the field at the type level, but is now obsolete.

• `M1` is a wrapper for attaching meta information at the type level. Its parameter `i` denotes the kind of the language construct the meta information refers to, and its parameter `c` provides access to the meta information.

The `GHC.Generics` module furthermore introduces a class `Generic`, whose instances are the types for which a representation exists. Its definition is as follows:

``````class Generic a where

type Rep a :: * -> *

from :: a -> (Rep a) p

to :: (Rep a) p -> a``````

A type `Rep a` is the representation of the type `a`. The methods `from` and `to` convert from values of the actual type to values of the representation type and vice versa.

To see all this in action, we make `Tree a` an instance of `Generic`:

``````instance Generic (Tree a) where

type Rep (Tree a) =
M1 D D1_Tree (
M1 C C1_Tree_Leaf U1
:+:
M1 C C1_Tree_Branch (
M1 S NoSelector (K1 R (Tree a))
:*:
M1 S NoSelector (K1 R a)
:*:
M1 S NoSelector (K1 R (Tree a))
)
)

from Leaf                     = M1 (L1 (M1 U1))
from (Branch left root right) = M1 (
R1 (
M1 (
M1 (K1 left)
:*:
M1 (K1 root)
:*:
M1 (K1 right)
))
)

to (M1 (L1 (M1 U1)))      = Leaf
to (M1 (
R1 (
M1 (
M1 (K1 left)
:*:
M1 (K1 root)
:*:
M1 (K1 right)
))
))                    = Branch left root right``````

The types `D1_Tree`, `C1_Tree_Leaf`, and `C1_Tree_Branch` are type-level representations of the type constructor `Tree`, the data constructor `Leaf`, and the data constructor `Branch`, respectively. We declare them as empty types:

``````data D1_Tree
data C1_Tree_Leaf
data C1_Tree_Branch``````

We need to make these types instances of the classes `Datatype` and `Constructor`, which are part of `GHC.Generics` as well. These classes provide a link between the type-level representations of type and data constructors and the meta information related to them. This meta information particularly covers the identifiers of the type and data constructors, which are needed when implementing canonical implementations for methods like `show` and `read`. The instance declarations for the `Tree`-related types are as follows:

``````instance Datatype D1_Tree where

datatypeName _ = "Tree"

moduleName _ = "Main"

instance Constructor C1_Tree_Leaf where

conName _ = "Leaf"

instance Constructor C1_Tree_Branch where

conName _ = "Branch"``````

Instantiating the `Generic` class as shown above is obviously an extremely tedious task. However, it is possible to instantiate `Generic` completely automatically for any given algebraic data type, using the `deriving` syntax. This is what the `DeriveGeneric` language extension makes possible.

So instead of making `Tree a` an instance of `Generic` by hand, as we have done above, we could have declared the `Tree` type as follows in the first place:

``````data Tree a = Leaf | Branch (Tree a) a (Tree a)
deriving (Show, Generic)``````

## Implementing canonical behavior

As mentioned above, we implement canonical behavior based on representations. Let us see how this works in the case of the `Serializable` class.

We introduce a new class `Serializable'` whose methods provide serialization and deserialization for representation types:

``````class Serializable' f where

put' :: f p -> [Bit]

get' :: [Bit] -> (f p, [Bit])``````

We instantiate this class for all the representation types:

``````instance Serializable' U1 where

put' U1 = []

get' bits = (U1, bits)

instance (Serializable' r, Serializable' s) =>
Serializable' (r :*: s) where

put' (rep1 :*: rep2) = put' rep1 ++ put' rep2

get' bits = (rep1 :*: rep2, bits'') where

(rep1, bits')  = get' bits
(rep2, bits'') = get' bits'

instance Serializable' V1 where

put' _ = error "attempt to put a void value"

get' _ = error "attempt to get a void value"

instance (Serializable' r, Serializable' s) =>
Serializable' (r :+: s) where

put' (L1 rep) = O : put' rep
put' (R1 rep) = I : put' rep

get' (O : bits) = let (rep, bits') = get' bits in
(L1 rep, bits')
get' (I : bits) = let (rep, bits') = get' bits in
(R1 rep, bits')

instance Serializable' r => Serializable' (M1 i a r) where

put' (M1 rep) = put' rep

get' bits = (M1 rep, bits') where

(rep, bits') = get' bits

instance Serializable a => Serializable' (K1 i a) where

put' (K1 val) = put val

get' bits = (K1 val, bits') where

(val, bits') = get bits``````

Note that in the case of `K1`, the context mentions `Serializable`, not `Serializable'`, and the methods `put'` and `get` call `put` and `get`, not `put'` and `get'`. The reason is that the value wrapped in `K1` has an ordinary type, not a representation type.

## Accessing canonical behavior

We can now apply canonical behavior to ordinary types using the methods `from` and `to` from the `Generic` class. For example, we can implement functions `defaultPut` and `defaultGet` that provide canonical serialization and deserialization for all instances of `Generic`:

``````defaultPut :: (Generic a, Serializable' (Rep a)) =>
a -> [Bit]
defaultPut = put' . from

defaultGet :: (Generic a, Serializable' (Rep a)) =>
[Bit] -> (a, [Bit])
defaultGet bits = (to rep, bits') where

(rep, bits') = get' bits``````

We can use these functions in instance declarations for `Serializable`. For example, we can make `Tree a` an instance of `Serializable` in the following way:

``````instance Serializable a => Serializable (Tree a) where

put = defaultPut

get = defaultGet``````

Compared to the instance declaration we had initially, this one is a real improvement, since we do not have to implement the desired behavior of `put` and `get` by hand anymore. However, it still contains boilerplate code in the form of the trivial method declarations. It would be better to establish `defaultPut` and `defaultGet` as defaults in the class declaration:

``````class Serializable a where

put :: a -> [Bit]
put = defaultPut

get :: [Bit] -> (a, [Bit])
get = defaultGet``````

However, this is not possible, since the types of `defaultPut` and `defaultGet` are less general than the types of `put` and `get`, as they put additional constraints on the type `a`. Luckily, GHC supports the language extension `DefaultSignatures`, which allows us to give default implementations that have less general types than the actual methods (and consequently work only for those instances that are compatible with these less general types). Using `DefaultSignatures`, we can declare the `Serializable` class as follows:

``````class Serializable a where

put :: a -> [Bit]
default put :: (Generic a, Serializable' (Rep a)) =>
a -> [Bit]
put = defaultPut

get :: [Bit] -> (a, [Bit])
default get :: (Generic a, Serializable' (Rep a)) =>
[Bit] -> (a, [Bit])
get = defaultGet``````

With this class declaration in place, we can make `Tree a` an instance of `Serializable` as follows:

``instance Serializable a => Serializable (Tree a)``

With the minor extension `DeriveAnyClass`, which is provided by GHC starting from Version 7.10, we can even use the `deriving` keyword to instantiate `Serializable` for `Tree a`. As a result, we only have to write the following in order to define the `Tree` type and make it an instance of `Serializable`:

``````data Tree a = Leaf | Branch (Tree a) a (Tree a)
deriving (Show, Generic, Serializable)``````

So finally, we can use our own classes like the Haskell standard classes regarding the use of deriving clauses, except that we have to additionally derive an instance declaration for `Generic`.

## Specialized implementations

Usually, not all instances of a class should or even can be generated by means of generic programming, but some instances have to be crafted by hand. For example, making `Int` an instance of `Serializable` requires manual work, since `Int` is not an algebraic data type.

However, there is no problem with this, since we still have the opportunity to write explicit instance declarations, despite the presence of a generic solution. This is in line with the standard deriving mechanism: you can make use of it, but you are not forced to do so. So we can have the following instance declaration, for example:

``````instance Serializable Int where

put val = replicate val I ++ [O]

get bits = (length is, bits') where

(is, O : bits') = span (== I) bits``````

Of course, the serialization approach we use here is not very efficient, but the instance declaration illustrates the point we want to make.

## 5 thoughts on “Generic programming in Haskell”

1. Flip101

How would you go about making a function like uniplate transformBi with GHC.Generics?

Like

2. Robert Peszek

I have noticed some minor typos:

• The link to “the latest approach” appears to be broken now.
• `M1` (definition) as stated does not have `c` as indicated (“`c` provides access to the meta information”), but it has `a` instead.
• The definition of `get` for `Int` does not seem to be total, as the pattern match `0 : bits'` may fail.

Also, reading this post, I got the impression (more reading between lines, I guess) that the generic implementation of `Serializable` somehow gets implemented with the magic of `DeriveAnyClass`, which is clearly not the case. Maybe clarifying which boilerplate is automatable and which is not would make it a bit more clear.

Like

1. Wolfgang Jeltsch Post author

Glad that you like my blog. ☺

I fixed the link and the definition of `M1`.

The definition of `get` for `Int` is correct. The list of following bits cannot start with `I`, since `span` removes all leading `I`-bits. So the pattern match can only fail if the list of following bits is empty. However, this would mean an illegal encoding was given, and the function should fail in this case.

Like

3. Politron

All examples in the net are related with tree leafs and even worst mathematics reference. Is that hard and complex this subject. Can I see a simple example of how to process a data type `Male` and `Female` in a generic code where both of them share age and name but not sex?

Like

1. Wolfgang Jeltsch Post author

In simple cases generic programming is probably pointless to use.

Like