This article is a writeup of a Theory Lunch talk I gave on 4 February 2016. As usual, the source of this article is a literate Haskell file, which you can download, load into GHCi, and play with.
Parametric polymorphism allows you to write functions that deal with values of any type. An example of such a function is the reverse
function, whose type is [a] -> [a]
. You can apply reverse
to any list, no matter what types the elements have.
However, parametric polymorphism does not allow your functions to depend on the structure of the concrete types that are used in place of type variables. So values of these types are always treated as black boxes. For example, the reverse
function only reorders the elements of the given list. A function of type [a] -> [a]
could also drop elements (like the tail
function does) or duplicate elements (like the cycle
function does), but it could never invent new elements (except for ⊥) or analyze elements.
Now there are situation where a function is suitable for a class of types that share certain properties. For example, the sum
function works for all types that have a notion of binary addition. Haskell uses type classes to support such functions. For example, the Num
class provides the method (+)
, which is used in the definition of sum
, whose type Num a => [a] -> a
contains a respective class constraint.
The methods of a class have to be implemented separately for every type that is an instance of the class. This is reasonable for methods like (+)
, where the implementations for the different instances differ fundamentally. However, it is unfortunate for methods that are implemented in an analogous way for most of the class instances. An example of such a method is (==)
, since there is a canonical way of checking values of algebraic data types for equality. It works by first comparing the outermost data constructors of the two given values and if they match, the individual fields. Only when the data constructors and all the fields match, the two values are considered equal.
For several standard classes, including Eq
, Haskell provides the deriving mechanism to generate instances with default method implementations whose precise functionality depends on the structure of the type. Unfortunately, there is no possibility in standard Haskell to extend this deriving mechanism to user-defined classes. Generic programming is a way out of this problem.
For generic programming, we need several language extensions. The good thing is that only one of them, DeriveGeneric
, is specific to generic programming. The other ones have uses in other areas as well. Furthermore, DeriveGeneric
is a very small extension. So the generic programming approach we describe here can be considered very lightweight.
We state the full set of necessary extensions with the following pragma:
{-# LANGUAGE DefaultSignatures,
DeriveGeneric,
FlexibleContexts,
TypeFamilies,
TypeOperators #-}
Apart from these language extensions, we need the module GHC.Generics
:
import GHC.Generics
As our running example, we pick serialization and deserialization of values. Serialization means converting a value into a bit string, and deserialization means parsing a bit string in order to get back a value.
We introduce a type Bit
for representing bits:
data Bit = O | I deriving (Eq, Show)
Furthermore, we define the class of all types that support serialization and deserialization as follows:
class Serializable a where
put :: a -> [Bit]
get :: [Bit] -> (a, [Bit])
There is a canonical way of serializing values of algebraic data types. It works by first encoding the data constructor of the given value as a sequence of bits and then serializing the individual fields. To show this approach in action, we define an algebraic data type Tree
, which is a type of labeled binary trees:
data Tree a = Leaf | Branch (Tree a) a (Tree a) deriving Show
An instantiation of Serializable
for Tree
that follows the canonical serialization approach can be carried out as follows:
instance Serializable a => Serializable (Tree a) where
put Leaf = [O]
put (Branch left root right) = [I] ++
put left ++
put root ++
put right
get (O : bits) = (Leaf, bits)
get (I : bits) = (Branch left root right, bits''') where
(left, bits') = get bits
(root, bits'') = get bits'
(right, bits''') = get bits''
Of course, it quickly becomes cumbersome to provide such an instance declaration for every algebraic data type that should use the canonical serialization approach. So we want to implement the canonical approach once and for all and make it easily usable for arbitrary types that are amenable to it. Generic programming makes this possible.
An algebraic data type is essentially a sum of products where the terms “sum” and “product” are understood as follows:
A sum is a variant type. In Haskell, Either
is the canonical type constructor for binary sums, and the empty type Void
from the void
package is the nullary sum.
A product is a tuple type. In Haskell, (,)
is the canonical type constructor for binary products, and ()
is the nullary product.
The key idea of generic programming is to map types to representations that make the sum-of-products structure explicit and to implement canonical behavior based on these representations instead of the actual types.
The GHC.Generics
module defines a number of type constructors for constructing representations:
data V1 p
infixr 5 :+:
data (:+:) f g p = L1 (f p) | R1 (g p)
data U1 p = U1
infixr 6 :*:
data (:*:) f g p = f p :*: g p
newtype K1 i a p = K1 { unK1 :: a }
newtype M1 i a f p = M1 { unM1 :: f p }
All of these type constructors take a final parameter p
. This parameter is relevant only when dealing with higher-order classes. In this article, however, we only discuss generic programming with first-order classes. In this case, the parameter p
is ignored. The different type constructors play the following roles:
V1
is for the nullary sum.
(:+:)
is for binary sums.
U1
is for the nullary product.
(:*:)
is for binary products.
K1
is a wrapper for fields of algebraic data types. Its parameter i
used to provide some information about the field at the type level, but is now obsolete.
M1
is a wrapper for attaching meta information at the type level. Its parameter i
denotes the kind of the language construct the meta information refers to, and its parameter c
provides access to the meta information.
The GHC.Generics
module furthermore introduces a class Generic
, whose instances are the types for which a representation exists. Its definition is as follows:
class Generic a where
type Rep a :: * -> *
from :: a -> (Rep a) p
to :: (Rep a) p -> a
A type Rep a
is the representation of the type a
. The methods from
and to
convert from values of the actual type to values of the representation type and vice versa.
To see all this in action, we make Tree a
an instance of Generic
:
instance Generic (Tree a) where
type Rep (Tree a) =
M1 D D1_Tree (
M1 C C1_Tree_Leaf U1
:+:
M1 C C1_Tree_Branch (
M1 S NoSelector (K1 R (Tree a))
:*:
M1 S NoSelector (K1 R a)
:*:
M1 S NoSelector (K1 R (Tree a))
)
)
from Leaf = M1 (L1 (M1 U1))
from (Branch left root right) = M1 (
R1 (
M1 (
M1 (K1 left)
:*:
M1 (K1 root)
:*:
M1 (K1 right)
))
)
to (M1 (L1 (M1 U1))) = Leaf
to (M1 (
R1 (
M1 (
M1 (K1 left)
:*:
M1 (K1 root)
:*:
M1 (K1 right)
))
)) = Branch left root right
The types D1_Tree
, C1_Tree_Leaf
, and C1_Tree_Branch
are type-level representations of the type constructor Tree
, the data constructor Leaf
, and the data constructor Branch
, respectively. We declare them as empty types:
data D1_Tree
data C1_Tree_Leaf
data C1_Tree_Branch
We need to make these types instances of the classes Datatype
and Constructor
, which are part of GHC.Generics
as well. These classes provide a link between the type-level representations of type and data constructors and the meta information related to them. This meta information particularly covers the identifiers of the type and data constructors, which are needed when implementing canonical implementations for methods like show
and read
. The instance declarations for the Tree
-related types are as follows:
instance Datatype D1_Tree where
datatypeName _ = "Tree"
moduleName _ = "Main"
instance Constructor C1_Tree_Leaf where
conName _ = "Leaf"
instance Constructor C1_Tree_Branch where
conName _ = "Branch"
Instantiating the Generic
class as shown above is obviously an extremely tedious task. However, it is possible to instantiate Generic
completely automatically for any given algebraic data type, using the deriving
syntax. This is what the DeriveGeneric
language extension makes possible.
So instead of making Tree a
an instance of Generic
by hand, as we have done above, we could have declared the Tree
type as follows in the first place:
data Tree a = Leaf | Branch (Tree a) a (Tree a)
deriving (Show, Generic)
As mentioned above, we implement canonical behavior based on representations. Let us see how this works in the case of the Serializable
class.
We introduce a new class Serializable'
whose methods provide serialization and deserialization for representation types:
class Serializable' f where
put' :: f p -> [Bit]
get' :: [Bit] -> (f p, [Bit])
We instantiate this class for all the representation types:
instance Serializable' U1 where
put' U1 = []
get' bits = (U1, bits)
instance (Serializable' r, Serializable' s) =>
Serializable' (r :*: s) where
put' (rep1 :*: rep2) = put' rep1 ++ put' rep2
get' bits = (rep1 :*: rep2, bits'') where
(rep1, bits') = get' bits
(rep2, bits'') = get' bits'
instance Serializable' V1 where
put' _ = error "attempt to put a void value"
get' _ = error "attempt to get a void value"
instance (Serializable' r, Serializable' s) =>
Serializable' (r :+: s) where
put' (L1 rep) = O : put' rep
put' (R1 rep) = I : put' rep
get' (O : bits) = let (rep, bits') = get' bits in
(L1 rep, bits')
get' (I : bits) = let (rep, bits') = get' bits in
(R1 rep, bits')
instance Serializable' r => Serializable' (M1 i a r) where
put' (M1 rep) = put' rep
get' bits = (M1 rep, bits') where
(rep, bits') = get' bits
instance Serializable a => Serializable' (K1 i a) where
put' (K1 val) = put val
get' bits = (K1 val, bits') where
(val, bits') = get bits
Note that in the case of K1
, the context mentions Serializable
, not Serializable'
, and the methods put'
and get
call put
and get
, not put'
and get'
. The reason is that the value wrapped in K1
has an ordinary type, not a representation type.
We can now apply canonical behavior to ordinary types using the methods from
and to
from the Generic
class. For example, we can implement functions defaultPut
and defaultGet
that provide canonical serialization and deserialization for all instances of Generic
:
defaultPut :: (Generic a, Serializable' (Rep a)) =>
a -> [Bit]
defaultPut = put' . from
defaultGet :: (Generic a, Serializable' (Rep a)) =>
[Bit] -> (a, [Bit])
defaultGet bits = (to rep, bits') where
(rep, bits') = get' bits
We can use these functions in instance declarations for Serializable
. For example, we can make Tree a
an instance of Serializable
in the following way:
instance Serializable a => Serializable (Tree a) where
put = defaultPut
get = defaultGet
Compared to the instance declaration we had initially, this one is a real improvement, since we do not have to implement the desired behavior of put
and get
by hand anymore. However, it still contains boilerplate code in the form of the trivial method declarations. It would be better to establish defaultPut
and defaultGet
as defaults in the class declaration:
class Serializable a where
put :: a -> [Bit]
put = defaultPut
get :: [Bit] -> (a, [Bit])
get = defaultGet
However, this is not possible, since the types of defaultPut
and defaultGet
are less general than the types of put
and get
, as they put additional constraints on the type a
. Luckily, GHC supports the language extension DefaultSignatures
, which allows us to give default implementations that have less general types than the actual methods (and consequently work only for those instances that are compatible with these less general types). Using DefaultSignatures
, we can declare the Serializable
class as follows:
class Serializable a where
put :: a -> [Bit]
default put :: (Generic a, Serializable' (Rep a)) =>
a -> [Bit]
put = defaultPut
get :: [Bit] -> (a, [Bit])
default get :: (Generic a, Serializable' (Rep a)) =>
[Bit] -> (a, [Bit])
get = defaultGet
With this class declaration in place, we can make Tree a
an instance of Serializable
as follows:
instance Serializable a => Serializable (Tree a)
With the minor extension DeriveAnyClass
, which is provided by GHC starting from Version 7.10, we can even use the deriving
keyword to instantiate Serializable
for Tree a
. As a result, we only have to write the following in order to define the Tree
type and make it an instance of Serializable
:
data Tree a = Leaf | Branch (Tree a) a (Tree a)
deriving (Show, Generic, Serializable)
So finally, we can use our own classes like the Haskell standard classes regarding the use of deriving clauses, except that we have to additionally derive an instance declaration for Generic
.
Usually, not all instances of a class should or even can be generated by means of generic programming, but some instances have to be crafted by hand. For example, making Int
an instance of Serializable
requires manual work, since Int
is not an algebraic data type.
However, there is no problem with this, since we still have the opportunity to write explicit instance declarations, despite the presence of a generic solution. This is in line with the standard deriving mechanism: you can make use of it, but you are not forced to do so. So we can have the following instance declaration, for example:
instance Serializable Int where
put val = replicate val I ++ [O]
get bits = (length is, bits') where
(is, O : bits') = span (== I) bits
Of course, the serialization approach we use here is not very efficient, but the instance declaration illustrates the point we want to make.
Monad
class. The reason is typically that the return or the bind operation of such a type m
has a constraint on the type parameter of m
. As a result, all the nice library support for monads is unusable for such types. This problem is called the constrained-monad problem.
In my article The Constraint
kind, I described a solution to this problem, which involved changing the Monad
class. In this article, I present a solution that works with the standard Monad
class. This solution has been developed by Neil Sculthorpe, Jan Bracker, George Giorgidze, and Andy Gill. It is described in their paper The Constrained-Monad Problem and implemented in the constrained-normal package.
This article is a write-up of a Theory Lunch talk I gave quite some time ago. As usual, the source of this article is a literate Haskell file, which you can download, load into GHCi, and play with.
We have to enable a couple of language extensions:
{-# LANGUAGE ConstraintKinds,
ExistentialQuantification,
FlexibleInstances,
GADTSyntax,
Rank2Types #-}
Furthermore, we need to import some modules:
import Data.Set hiding (fold, map)
import Data.Natural hiding (fold)
These imports require the packages containers and natural-numbers to be installed.
The Set
type has an associated monad structure, consisting of a return and a bind operation:
returnSet :: a -> Set a
returnSet = singleton
bindSet :: Ord b => Set a -> (a -> Set b) -> Set b
bindSet sa g = unions (map g (toList sa))
We cannot make Set
an instance of Monad
though, since bindSet
has an Ord
constraint on the element type of the result set, which is caused by the use of unions
.
For a solution, let us first look at how monadic computations on sets would be expressed if Set
was an instance of Monad
. A monadic expression would be built from non-monadic expressions and applications of return
and (>>=)
. For every such expression, there would be a normal form of the shape
s_{1} >>= \
x_{1} ->
s_{2} >>= \
x_{2} ->
… ->
s_{n} >>= \
x_{n} -> return
r
where the s_{i} would be non-monadic expressions of type Set
. The existence of a normal form would follow from the monad laws.
We define a type UniSet
of such normal forms:
data UniSet a where
ReturnSet :: a -> UniSet a
AtmBindSet :: Set a -> (a -> UniSet b) -> UniSet b
We can make UniSet
an instance of Monad
where the monad operations build expressions and normalize them on the fly:
instance Monad UniSet where
return a = ReturnSet a
ReturnSet a >>= f = f a
AtmBindSet sa h >>= f = AtmBindSet sa h' where
h' a = h a >>= f
Note that these monad operations are analogous to operations on lists, with return
corresponding to singleton construction and (>>=)
corresponding to concatenation. Normalization happens in (>>=)
by applying the left-identity and the associativity law for monads.
We can use UniSet
as an alternative set type, representing a set by a normal form that evaluates to this set. This way, we get a set type that is an instance of Monad
. For this to be sane, we have to hide the data constructors of UniSet
, so that different normal forms that evaluate to the same set cannot be distinguished.
Now we need functions that convert between Set
and UniSet
. Conversion from Set
to UniSet
is simple:
toUniSet :: Set a -> UniSet a
toUniSet sa = AtmBindSet sa ReturnSet
Conversion from UniSet
to Set
is expression evaluation:
fromUniSet :: Ord a => UniSet a -> Set a
fromUniSet (ReturnSet a) = returnSet a
fromUniSet (AtmBindSet sa h) = bindSet sa g where
g a = fromUniSet (h a)
The type of fromUniSet
constrains the element type to be an instance of Ord
. This single constraint is enough to make all invocations of bindSet
throughout the conversion legal. The reason is our use of normal forms. Since normal forms are “right-leaning”, all applications of (>>=)
in them have the same result type as the whole expression.
Let us now look at a different monad, the multiset monad.
We represent a multiset as a function that maps each value of the element type to its multiplicity in the multiset, with a multiplicity of zero denoting absence of this value:
newtype MSet a = MSet { mult :: a -> Natural }
Now we define the return operation:
returnMSet :: Eq a => a -> MSet a
returnMSet a = MSet ma where
ma b | a == b = 1
| otherwise = 0
For defining the bind operation, we need to define a class Finite
of finite types whose sole method enumerates all the values of the respective type:
class Finite a where
values :: [a]
The implementation of the bind operation is as follows:
bindMSet :: Finite a => MSet a -> (a -> MSet b) -> MSet b
bindMSet msa g = MSet mb where
mb b = sum [mult msa a * mult (g a) b | a <- values]
Note that the multiset monad differs from the set monad in its use of constraints. The set monad imposes a constraint on the result element type of bind, while the multiset monad imposes a constraint on the first argument element type of bind and another constraint on the result element type of return.
Like in the case of sets, we define a type of monadic normal forms:
data UniMSet a where
ReturnMSet :: a -> UniMSet a
AtmBindMSet :: Finite a =>
MSet a -> (a -> UniMSet b) -> UniMSet b
The key difference to UniSet
is that UniMSet
involves the constraint of the bind operation, so that normal forms must respect this constraint. Without this restriction, it would not be possible to evaluate normal forms later.
The Monad
–UniMSet
instance declaration is analogous to the Monad
–UniSet
instance declaration:
instance Monad UniMSet where
return a = ReturnMSet a
ReturnMSet a >>= f = f a
AtmBindMSet msa h >>= f = AtmBindMSet msa h' where
h' a = h a >>= f
Now we define conversion from MSet
to UniMSet
:
toUniMSet :: Finite a => MSet a -> UniMSet a
toUniMSet msa = AtmBindMSet msa ReturnMSet
Note that we need to constrain the element type in order to fulfill the constraint incorporated into the UniMSet
type.
Finally, we define conversion from UniMSet
to MSet
:
fromUniMSet :: Eq a => UniMSet a -> MSet a
fromUniMSet (ReturnMSet a) = returnMSet a
fromUniMSet (AtmBindMSet msa h) = bindMSet msa g where
g a = fromUniMSet (h a)
Here we need to impose an Eq
constraint on the element type. Note that this single constraint is enough to make all invocations of returnMSet
throughout the conversion legal. The reason is again our use of normal forms.
The solutions to the constrained-monad problem for sets and multisets are very similar. It is certainly not good if we have to write almost the same code for every new constrained monad that we want to make accessible via the Monad
class. Therefore, we define a generic type that covers all such monads:
data UniMonad c t a where
Return :: a -> UniMonad c t a
AtmBind :: c a =>
t a -> (a -> UniMonad c t b) -> UniMonad c t b
The parameter t
of UniMonad
is the underlying data type, like Set
or MSet
, and the parameter c
is the constraint that has to be imposed on the type parameter of the first argument of the bind operation.
For every c
and t
, we make UniMonad c t
an instance of Monad
:
instance Monad (UniMonad c t) where
return a = Return a
Return a >>= f = f a
AtmBind ta h >>= f = AtmBind ta h' where
h' a = h a >>= f
We define a function lift
that converts from the underlying data type to UniMonad
and thus generalizes toUniSet
and toUniMSet
:
lift :: c a => t a -> UniMonad c t a
lift ta = AtmBind ta Return
Evaluation of normal forms is just folding with the return and bind operations of the underlying data type. Therefore, we implement a fold operator for UniMonad
:
fold :: (a -> r)
-> (forall a . c a => t a -> (a -> r) -> r)
-> UniMonad c t a
-> r
fold return _ (Return a) = return a
fold return atmBind (AtmBind ta h) = atmBind ta g where
g a = fold return atmBind (h a)
Note that fold
does not need to deal with constraints, neither with constraints on the result type parameter of return (like Eq
in the case of MSet
), nor with constraints on the result type parameter of bind (like Ord
in the case of Set
). This is because fold
works with any result type r
.
Now let us implement Monad
-compatible sets and multisets based on UniMonad
.
In the case of sets, we face the problem that UniMonad
takes a constraint for the type parameter of the first bind argument, but bindSet
does not have such a constraint. To solve this issue, we introduce a type class Unconstrained
of which every type is an instance:
class Unconstrained a
instance Unconstrained a
The implementation of Monad
-compatible sets is now straightforward:
type UniMonadSet = UniMonad Unconstrained Set
toUniMonadSet :: Set a -> UniMonadSet a
toUniMonadSet = lift
fromUniMonadSet :: Ord a => UniMonadSet a -> Set a
fromUniMonadSet = fold returnSet bindSet
The implementation of Monad
-compatible multisets does not need any utility definitions, but can be given right away:
type UniMonadMSet = UniMonad Finite MSet
toUniMonadMSet :: Finite a => MSet a -> UniMonadMSet a
toUniMonadMSet = lift
fromUniMonadMSet :: Eq a => UniMonadMSet a -> MSet a
fromUniMonadMSet = fold returnMSet bindMSet
As usual, this article is written using literate programming. The article source is a literate Curry file, which you can load into KiCS2 to play with the code.
I want to thank all the people from the Curry mailing list who have helped me improving the code in this article.
We import the module SearchTree
:
import SearchTree
We define the type Sym
of symbols and the type Str
of symbol strings:
data Sym = M | I | U
showSym :: Sym -> String
showSym M = "M"
showSym I = "I"
showSym U = "U"
type Str = [Sym]
showStr :: Str -> String
showStr str = concatMap showSym str
Next, we define the type Rule
of rules:
data Rule = R1 | R2 | R3 | R4
showRule :: Rule -> String
showRule R1 = "R1"
showRule R2 = "R2"
showRule R3 = "R3"
showRule R4 = "R4"
So far, the Curry code is basically the same as the Haskell code. However, this is going to change below.
Rule application becomes a lot simpler in Curry. In fact, we can code the rewriting rules almost directly to get a rule application function:
applyRule :: Rule -> Str -> Str
applyRule R1 (init ++ [I]) = init ++ [I, U]
applyRule R2 ([M] ++ tail) = [M] ++ tail ++ tail
applyRule R3 (pre ++ [I, I, I] ++ post) = pre ++ [U] ++ post
applyRule R4 (pre ++ [U, U] ++ post) = pre ++ post
Note that we do not return a list of derivable strings, as we did in the Haskell solution. Instead, we use the fact that functions in Curry are nondeterministic.
Furthermore, we do not need the helper functions splits
and replace
that we used in the Haskell implementation. Instead, we use the ++
-operator in conjunction with functional patterns to achieve the same functionality.
Now we implement a utility function applyRules
for repeated rule application. Our implementation uses a similar trick as the famous Haskell implementation of the Fibonacci sequence:
applyRules :: [Rule] -> Str -> [Str]
applyRules rules str = tail strs where
strs = str : zipWith applyRule rules strs
The Haskell implementation does not need the applyRules
function, but it needs a lot of code about derivation trees instead. In the Curry solution, derivation trees are implicit, thanks to nondeterminism.
A derivation is a sequence of strings with rules between them such that each rule takes the string before it to the string after it. We define types for representing derivations:
data Deriv = Deriv [DStep] Str
data DStep = DStep Str Rule
showDeriv :: Deriv -> String
showDeriv (Deriv steps goal) = " " ++
concatMap showDStep steps ++
showStr goal ++
"\n"
showDerivs :: [Deriv] -> String
showDerivs derivs = concatMap ((++ "\n") . showDeriv) derivs
showDStep :: DStep -> String
showDStep (DStep origin rule) = showStr origin ++
"\n-> (" ++
showRule rule ++
") "
Now we implement a function derivation
that takes two strings and returns the derivations that turn the first string into the second:
derivation :: Str -> Str -> Deriv
derivation start end
| start : applyRules rules start =:= init ++ [end]
= Deriv (zipWith DStep init rules) end where
rules :: [Rule]
rules free
init :: [Str]
init free
Finally, we define a function printDerivations
that explicitly invokes a breadth-first search to compute and ultimately print derivations:
printDerivations :: Str -> Str -> IO ()
printDerivations start end = do
searchTree <- getSearchTree (derivation start end)
putStr $ showDerivs (allValuesBFS searchTree)
You may want to enter
printDerivations [M, I] [M, I, U]
at the KiCS2 prompt to see the derivations
function in action.
cabal update
cabal install grapefruit-ui-gtk grapefruit-examples
Many thanks to Samuel Gélineau for providing several patches. (By the way, Samuel maintains an interesting page that compares different FRP libraries, the FRP Zoo.)
Grapefruit 0.1 is actually a phase-out model, which I only update to work with newer GHC versions. However, I am working on a new Grapefruit. This will be based on my research about FRP semantics and will be quite different from the old one. I expect that the sound theoretical foundation will lead to a more powerful library with a more sensible interface. One particular new feature will be integration of side effects into FRP, in a purely functional style.
The background is that the head of the Software Department of the Institute of Cybernetics, Ahto Kalja, recently received the Order of the White Star, 4th class from the President of Estonia. On this account, Estonian TV conducted an interview with him, during which they recorded also parts of my notes that were still present on the whiteboard in our coffee room.
You can watch the video online. The relevant part, which is about e-government, is from 18:14 to 21:18. I enjoyed it very much hearing Ahto Kalja’s colleague Arvo Ott talking about electronic tax returns and seeing some formula about limits immediately afterwards. At 20:38, there is also some Haskell-like pseudocode.
Like Haskell, Curry has support for literate programming. So I wrote this blog post as a literate Curry file, which is available for download. If you want to try out the code, you have to install the Curry system KiCS2. The code uses the functional patterns language extension, which is only supported by KiCS2, as far as I know.
The functional fragment of Curry is very similar to Haskell. The only fundamental difference is that Curry does not support type classes.
Let us do some functional programming in Curry. First, we define a type whose values denote me and some of my relatives.
data Person = Paul
| Joachim
| Rita
| Wolfgang
| Veronika
| Johanna
| Jonathan
| Jaromir
Now we define a function that yields the father of a given person if this father is covered by the Person
type.
father :: Person -> Person
father Joachim = Paul
father Rita = Joachim
father Wolfgang = Joachim
father Veronika = Joachim
father Johanna = Wolfgang
father Jonathan = Wolfgang
father Jaromir = Wolfgang
Based on father
, we define a function for computing grandfathers. To keep things simple, we only consider fathers of fathers to be grandfathers, not fathers of mothers.
grandfather :: Person -> Person
grandfather = father . father
Logic programming languages like Prolog are able to search for variable assignments that make a given proposition true. Curry, on the other hand, can search for variable assignments that make a certain expression defined.
For example, we can search for all persons that have a grandfather according to the above data. We just enter
grandfather person where person free
at the KiCS2 prompt. KiCS2 then outputs all assignments to the person
variable for which grandfather person
is defined. For each of these assignments, it additionally prints the result of the expression grandfather person
.
Functions in Curry can actually be non-deterministic, that is, they can return multiple results. For example, we can define a function element
that returns any element of a given list. To achieve this, we use overlapping patterns in our function definition. If several equations of a function definition match a particular function application, Curry takes all of them, not only the first one, as Haskell does.
element :: [el] -> el
element (el : _) = el
element (_ : els) = element els
Now we can enter
element "Hello!"
at the KiCS2 prompt, and the system outputs six different results.
We have already seen how to combine functional and logic programming with Curry. Now we want to do pure logic programming. This means that we only want to search for variable assignments, but are not interested in expression results. If you are not interested in results, you typically use a result type with only a single value. Curry provides the type Success
with the single value success
for doing logic programming.
Let us write some example code about routes between countries. We first introduce a type of some European and American countries.
data Country = Canada
| Estonia
| Germany
| Latvia
| Lithuania
| Mexico
| Poland
| Russia
| USA
Now we want to define a relation called borders
that tells us which country borders which other country. We implement this relation as a function of type
Country -> Country -> Success
that has the trivial result success
if the first country borders the second one, and has no result otherwise.
Note that this approach of implementing a relation is different from what we do in functional programming. In functional programming, we use Bool
as the result type and signal falsity by the result False
. In Curry, however, we signal falsity by the absence of a result.
Our borders
relation only relates countries with those neighbouring countries whose names come later in alphabetical order. We will soon compute the symmetric closure of borders
to also get the opposite relationships.
borders :: Country -> Country -> Success
Canada `borders` USA = success
Estonia `borders` Latvia = success
Estonia `borders` Russia = success
Germany `borders` Poland = success
Latvia `borders` Lithuania = success
Latvia `borders` Russia = success
Lithuania `borders` Poland = success
Mexico `borders` USA = success
Now we want to define a relation isConnected
that tells whether two countries can be reached from each other via a land route. Clearly, isConnected
is the equivalence relation that is generated by borders
. In Prolog, we would write clauses that directly express this relationship between borders
and isConnected
. In Curry, on the other hand, we can write a function that generates an equivalence relation from any given relation and therefore does not only work with borders
.
We first define a type alias Relation
for the sake of convenience.
type Relation val = val -> val -> Success
Now we define what reflexive, symmetric, and transitive closures are.
reflClosure :: Relation val -> Relation val
reflClosure rel val1 val2 = rel val1 val2
reflClosure rel val val = success
symClosure :: Relation val -> Relation val
symClosure rel val1 val2 = rel val1 val2
symClosure rel val2 val1 = rel val1 val2
transClosure :: Relation val -> Relation val
transClosure rel val1 val2 = rel val1 val2
transClosure rel val1 val3 = rel val1 val2 &
transClosure rel val2 val3
where val2 free
The operator &
used in the definition of transClosure
has type
Success -> Success -> Success
and denotes conjunction.
We define the function for generating equivalence relations as a composition of the above closure operators. Note that it is crucial that the transitive closure operator is applied after the symmetric closure operator, since the symmetric closure of a transitive relation is not necessarily transitive.
equivalence :: Relation val -> Relation val
equivalence = reflClosure . transClosure . symClosure
The implementation of isConnected
is now trivial.
isConnected :: Country -> Country -> Success
isConnected = equivalence borders
Now we let KiCS2 compute which countries I can reach from Estonia without a ship or plane. We do so by entering
Estonia `isConnected` country where country free
at the prompt.
We can also implement a nondeterministic function that turns a country into the countries connected to it. For this, we use a guard that is of type Success
. Such a guard succeeds if it has a result at all, which can only be success
, of course.
connected :: Country -> Country
connected country1
| country1 `isConnected` country2 = country2
where country2 free
Curry has a predefined operator
=:= :: val -> val -> Success
that stands for equality.
We can use this operator, for example, to define a nondeterministic function that yields the grandchildren of a given person. Again, we keep things simple by only considering relationships that solely go via fathers.
grandchild :: Person -> Person
grandchild person
| grandfather grandkid =:= person = grandkid
where grandkid free
Note that grandchild
is the inverse of grandfather
.
Functional patterns are a language extension that allows us to use ordinary functions in patterns, not just data constructors. Functional patterns are implemented by KiCS2.
Let us look at an example again. We want to define a function split
that nondeterministically splits a list into two parts.^{1} Without functional patterns, we can implement splitting as follows.
split' :: [el] -> ([el],[el])
split' list | front ++ rear =:= list = (front,rear)
where front, rear free
With functional patterns, we can implement splitting in a much simpler way.
split :: [el] -> ([el],[el])
split (front ++ rear) = (front,rear)
As a second example, let us define a function sublist
that yields the sublists of a given list.
sublist :: [el] -> [el]
sublist (_ ++ sub ++ _) = sub
In the grandchild
example, we showed how we can define the inverse of a particular function. We can go further and implement a generic function inversion operator.
inverse :: (val -> val') -> (val' -> val)
inverse fun val' | fun val =:= val' = val where val free
With this operator, we could also implement grandchild
as inverse grandfather
.
Inverting functions can make our lives a lot easier. Consider the example of parsing. A parser takes a string and returns a syntax tree. Writing a parser directly is a non-trivial task. However, generating a string from a syntax tree is just a simple functional programming exercise. So we can implement a parser in a simple way by writing a converter from syntax trees to strings and inverting it.
We show this for the language of all arithmetic expressions that can be built from addition, multiplication, and integer constants. We first define types for representing abstract syntax trees. These types resemble a grammar that takes precedence into account.
type Expr = Sum
data Sum = Sum Product [Product]
data Product = Product Atom [Atom]
data Atom = Num Int | Para Sum
Now we implement the conversion from abstract syntax trees to strings.
toString :: Expr -> String
toString = sumToString
sumToString :: Sum -> String
sumToString (Sum product products)
= productToString product ++
concatMap ((" + " ++) . productToString) products
productToString :: Product -> String
productToString (Product atom atoms)
= atomToString atom ++
concatMap ((" * " ++) . atomToString) atoms
atomToString :: Atom -> String
atomToString (Num num) = show num
atomToString (Para sum) = "(" ++ sumToString sum ++ ")"
Implementing the parser is now extremely simple.
parse :: String -> Expr
parse = inverse toString
KiCS2 uses a depth-first search strategy by default. However, our parser implementation does not work with depth-first search. So we switch to breadth-first search by entering
:set bfs
at the KiCS2 prompt. Now we can try out the parser by entering
parse "2 * (3 + 4)"
.
Note that our split
function is not the same as the split
function in Curry’s List
module.↩
Let me first describe the MU puzzle shortly. The puzzle deals with strings that may contain the characters , , and . We can derive new strings from old ones using the following rewriting system:
The question is whether it is possible to turn the string into the string using these rules.
You may want to try to solve this puzzle yourself, or you may want to look up the solution on the Wikipedia page.
The code is not only concerned with deriving from , but with derivations as such.
We import Data.List
:
import Data.List
We define the type Sym
of symbols and the type Str
of symbol strings:
data Sym = M | I | U deriving Eq
type Str = [Sym]
instance Show Sym where
show M = "M"
show I = "I"
show U = "U"
showList str = (concatMap show str ++)
Next, we define the type Rule
of rules as well as the list rules
that contains all rules:
data Rule = R1 | R2 | R3 | R4 deriving Show
rules :: [Rule]
rules = [R1,R2,R3,R4]
We first introduce a helper function that takes a string and returns the list of all splits of this string. Thereby, a split of a string str
is a pair of strings str1
and str2
such that str1 ++ str2 == str
. A straightforward implementation of splitting is as follows:
splits' :: Str -> [(Str,Str)]
splits' str = zip (inits str) (tails str)
The problem with this implementation is that walking through the result list takes quadratic time, even if the elements of the list are left unevaluated. The following implementation solves this problem:
splits :: Str -> [(Str,Str)]
splits str = zip (map (flip take str) [0 ..]) (tails str)
Next, we define a helper function replace
. An expression replace old new str
yields the list of all strings that can be constructed by replacing the string old
inside str
by new
.
replace :: Str -> Str -> Str -> [Str]
replace old new str = [front ++ new ++ rear |
(front,rest) <- splits str,
old `isPrefixOf` rest,
let rear = drop (length old) rest]
We are now ready to implement the function apply
, which performs rule application. This function takes a rule and a string and produces all strings that can be derived from the given string using the given rule exactly once.
apply :: Rule -> Str -> [Str]
apply R1 str | last str == I = [str ++ [U]]
apply R2 (M : tail) = [M : tail ++ tail]
apply R3 str = replace [I,I,I] [U] str
apply R4 str = replace [U,U] [] str
apply _ _ = []
Now we want to build derivation trees. A derivation tree for a string str
has the following properties:
str
.str
by a single rule application.We first define types for representing derivation trees:
data DTree = DTree Str [DSub]
data DSub = DSub Rule DTree
Now we define the function dTree
that turns a string into its derivation tree:
dTree :: Str -> DTree
dTree str = DTree str [DSub rule subtree |
rule <- rules,
subStr <- apply rule str,
let subtree = dTree subStr]
A derivation is a sequence of strings with rules between them such that each rule takes the string before it to the string after it. We define types for representing derivations:
data Deriv = Deriv [DStep] Str
data DStep = DStep Str Rule
instance Show Deriv where
show (Deriv steps goal) = " " ++
concatMap show steps ++
show goal ++
"\n"
showList derivs
= (concatMap ((++ "\n") . show) derivs ++)
instance Show DStep where
show (DStep origin rule) = show origin ++
"\n-> (" ++
show rule ++
") "
Now we implement a function derivs
that converts a derivation tree into the list of all derivations that start with the tree’s root label. The function derivs
traverses the tree in breadth-first order.
derivs :: DTree -> [Deriv]
derivs tree = worker [([],tree)] where
worker :: [([DStep],DTree)] -> [Deriv]
worker tasks = rootDerivs tasks ++
worker (subtasks tasks)
rootDerivs :: [([DStep],DTree)] -> [Deriv]
rootDerivs tasks = [Deriv (reverse revSteps) root |
(revSteps,DTree root _) <- tasks]
subtasks :: [([DStep],DTree)] -> [([DStep],DTree)]
subtasks tasks = [(DStep root rule : revSteps,subtree) |
(revSteps,DTree root subs) <- tasks,
DSub rule subtree <- subs]
Finally, we implement the function derivations
which takes two strings and returns the list of those derivations that turn the first string into the second:
derivations :: Str -> Str -> [Deriv]
derivations start end
= [deriv | deriv@(Deriv _ goal) <- derivs (dTree start),
goal == end]
You may want to enter
derivations [M,I] [M,U,I]
at the GHCi prompt to see the derivations
function in action. You can also enter
derivations [M,I] [M,U]
to get an idea about the solution to the MU puzzle.
Constraint
kind. In this blog post, I will show some examples of how this new feature can be used. This is a write-up of my Theory Lunch talk from 7 February 2013. The source of this article is a literate Haskell file, which you can download and load into GHCi.
The example code in this article needs support for the Constraint
kind, of course. So we have to enable the appropriate language extension (which is surprisingly called ConstraintKinds
instead of ConstraintKind
). Furthermore, we want to make use of type families. All in all, this leads to the following LANGUAGE
pragma:
{-# LANGUAGE ConstraintKinds, TypeFamilies #-}
We will define our own version of the Monad
class. Therefore, we have to hide the Monad
class from the Prelude:
import Prelude hiding (Monad (..))
We will need the module Data.Set
from the containers
package for some example code:
import Data.Set
Last, but not least, we have to import the kind Constraint
:
import GHC.Exts (Constraint)
Originally, classes and contexts were not first-class citizens in Haskell. The introduction of the Constraint
kind has changed this. Classes and contexts can now be used as parameters of types, for example. This is because they are now types themselves.
However, classes and contexts are still not types in the strict sense. There are still no values of type Eq
or Eq Integer
, for example. As I have explained in my previous post, Haskell’s notion of type is more general than the usual one. In particular, functions on types are types themselves. However, they are not types of kind *
. The same holds for classes and contexts. They are not types of kind *
, but they are types of some other kinds, so that they can generally be used in places where types can be used.
The new kind Constraint
, which is exported by GHC.Exts
, is the kind of all contexts. Classes and contexts are now handled as follows:
Each class with parameters of kinds k_1
through k_n
is a type of kind k_1 -> k_n -> Constraint
.
Each tuple type (t_1, ..., t_n)
where t_1
through t_n
are of kind Constraint
is also of kind Constraint
and denotes the conjunction of t_1
through t_n
. As a corner case, the nullary tuple type ()
is also of type Constraint
and denotes the constraint that is always true.
A context can be any type of kind Constraint
.
These rules guarantee that classes and contexts can be used as before. For example, (Read val, Show val)
is still a context, because Read
and Show
are types of kind * -> Constraint
, so Read val
and Show val
are types of kind Constraint
, and therefore (Read val, Show val)
is a type of kind Constraint
.
However, classes and constraints can be used in new ways now. Here are some examples:
Classes can be partially applied, and the results can be used like classes again.
Classes, partially applied classes, and contexts can be parameters of types and instances of classes.
Aliases of classes, partially applied classes, and contexts can be defined using type
declarations.
Families of classes, partially applied classes, and contexts can be defined using type synonym families.
In the remainder of this article, I will illustrate the last two of these points.
Sometimes, the same conjunction of several contexts appears in multiple types. In such cases, it can become cumbersome to always write these conjunctions explicitly. For example, there might be several functions in a library that deal with values that can be turned into strings and generated from strings. In this case, the types of these functions will typically have a context that contains constraints Show val
and Read val
. With the Constraint
kind, we can define context aliases Text val
as follows:
type Text val = (Show val, Read val)
Instead of Show val, Read val
, we can now simply write Text val
in contexts.
A few years ago, there was an attempt to implement support for context aliases (often called class aliases) in GHC. With the Constraint
kind, this is now obsolete, as context aliases are now just a special kind of type aliases.
We will illustrate the use of context families by defining a generalized version of the Monad
class.
The actual definition of a monad from category theory says that a monad on a category 𝒞 consists of an endofunctor on 𝒞 and some natural transformations. In Haskell, however, a monad is defined to be an instance of the Monad
class, which contains the two methods return
and (>>=)
. Haskell monads are monads on the category Hask, the category of kind-*
types and functions.
There are monads in the category theory sense that are almost monads in the Haskell sense, but not quite. One example is the monad behind Haskell’s Set
type. There are reasonable implementations of return
and (>>=)
for Set
:
setReturn :: el -> Set el
setReturn = singleton
setBind :: (Ord el, Ord el') =>
Set el -> (el -> Set el') -> Set el'
setBind set1 gen2 = unions (Prelude.map gen2 (toList set1))
The problem is that the type of setBind
is too restrictive, as it restricts the choice of element types by a context, whereas there is no such restriction in the type of (>>=)
. The reason for the restriction on element types is that the Set
monad is not a monad on the category Hask, but on the full subcategory of Hask whose objects are the instances of Ord
.
Using context families, we can generalize the Monad
class such that restrictions on the type parameters of Monad
instances become possible. We introduce a type synonym family Object
such that Object mon val
is the constraint that the parameter val
must fulfill when working with the Monad
instance mon
. We provide a default definition for Object
that does not restrict monad parameters. Finally, we change the types of return
and (>>=)
such that they restrict their Monad
instance parameters accordingly. The new declaration of the Monad
class is as follows:
class Monad mon where
type Object mon val :: Constraint
type Object mon val = ()
return :: Object mon val =>
val -> mon val
(>>=) :: (Object mon val, Object mon val') =>
mon val -> (val -> mon val') -> mon val'
We can now make Set
an instance of Monad
:
instance Monad Set where
type Object Set el = Ord el
return = setReturn
(>>=) = setBind
We can make every instance of the original Monad
class an instance of our new Monad
class. Because of the default definition of Object
, we do not need to define Object
in these cases. So the instance
declarations can look exactly like those for the original Monad
class. Here is an example for the []
type:
instance Monad [] where
return x = [x]
(>>=) = flip concatMap