Dependently typed programming and theorem proving in Haskell

Programming languages with dependent types allow us to specify powerful properties of values using the type system. By employing the Curry–Howard correspondence, we can also use these languages as proof languages for higher-order logics. In this blog post, I want to demonstrate that Haskell as supported by the Glasgow Haskell Compiler (GHC) can give us almost the same features.

The complete article can be seen as a kind of literate program. Extracting all code snippets and glueing them together yields a valid Haskell module. For your convenience, you can download the complete code as an ordinary Haskell source code file.

Prerequisites

Plain Haskell 2010 is not enough for emulating dependent types. We need support for generalized algebraic data types and for type synonym families. Furthermore, we want to use operator syntax on the type level for making our code easier to read. We declare the necessary language extensions:

{-# LANGUAGE GADTs, TypeFamilies, TypeOperators #-}

We want to define better typed versions of certain Prelude functions. For convenience, we want them to have the same names as their Prelude counterparts. Therefore, we hide the corresponding Prelude functions:

import Prelude hiding (head, tail, (++), (+), replicate)

Since Haskell does not come with a type for natural numbers, we make use of the natural-numbers package:

import Data.Natural

We are now ready to turn to the actual development.

Dependently typed programming

Dependent types are types that are parameterized by values. In Haskell, however, parameters of types have to be types again. So we represent values by types. Let us look how this works for natural numbers. The usual ADT declaration of natural numbers uses two data constructors Zero and Succ. These data constructors become type constructors now:

data Zero

data Succ nat

Note that it does not matter what values the types Zero and Succ cover. It is most reasonable to leave these types empty.

Now we define a list type with a length parameter almost like we would do in Agda:

data List el len where

    Nil  ::                      List el Zero

    Cons :: el -> List el len -> List el (Succ len)

If we let GHCi compute the type of

Cons 'H' $ Cons 'i' $ Cons '!' $ Nil ,

for example, it will correctly answer with

List Char (Succ (Succ (Succ Zero))) .

We can drop the static length information from lists by converting them to Haskell’s ordinary list type:

toList :: List el len -> [el]
toList Nil           = []
toList (Cons el els) = el : toList els

Using toList, we implement a Show instance for displaying values of type List in GHCi:

instance (Show el) => Show (List el len) where

    show = show . toList

Let us actually make use of the length parameter in list types. We first construct a function head, which can only be applied to non-empty lists and thus cannot fail:

head :: List el (Succ len) -> el
head (Cons el _) = el

The type of the corresponding function tail also shows that the length of a list’s tail is one less than the length of the original list:

tail :: List el (Succ len) -> List el len
tail (Cons _ els) = els

A more advanced example of using static length information is list concatenation. Let us construct a dependently typed version of the well-known ++-operator. Its type should specify that the length of the result is the sum of the lengths of the arguments. So we have to implement addition of type-level naturals. We introduce a type synonym family for this:

infixl 6 :+

type family nat1 :+ nat2 :: *

type instance Zero      :+ nat2 = nat2
type instance Succ nat1 :+ nat2 = Succ (nat1 :+ nat2)

We can now define list concatenation as follows:

infixr 5 ++

(++) :: List el len1 -> List el len2 -> List el (len1 :+ len2)
Nil         ++ list = list
Cons el els ++ list = Cons el (els ++ list)

Note that the type checker guarantees that our implementation of concatenation really fulfills the length constraint expressed in the type.

We want to implement a dependently typed version of the Prelude function replicate, which takes a natural number len and some value val and returns a list that contains len times the value val. Its desired type would be

(len : Nat) -> el -> List el len

in an Agda-like language. This type gives the first argument the name len. Since len is used in the result type, the type of a result depends on the concrete argument.

The problem for our Haskell emulation of dependent types is that the len argument of the function must be a value, while the len parameter of List must be a type. So we have to link value-level naturals and type-level naturals. We do this by defining a type Nat with one parameter such that for each n ∈ ℕ with a type-level representation nat, the type Nat nat contains exactly one value, which represents n:

data Nat nat where

    Zero ::            Nat Zero

    Succ :: Nat nat -> Nat (Succ nat)

If we ask GHCi for the type of the value

Succ (Succ (Succ Zero)) ,

we get the answer

Nat (Succ (Succ (Succ Zero))) ,

which demonstrates that the value is now reflected at the type level.

As in the case of List, we can define a conversion function for forgetting the type parameter of Nat, and a Show instance based on it:

toNatural :: Nat nat -> Natural
toNatural Zero       = 0
toNatural (Succ nat) = succ (toNatural nat)

instance Show (Nat nat) where

    show = show . toNatural

Analogously to the implementation of (++), we can implement addition for values of type Nat:

infixl 6 +

(+) :: Nat nat1 -> Nat nat2 -> Nat (nat1 :+ nat2)
Zero      + nat2 = nat2
Succ nat1 + nat2 = Succ (nat1 + nat2)

However, we actually wanted to implement a dependently typed replicate. We can do this as follows:

replicate :: Nat len -> el -> List el len
replicate Zero       _  = Nil
replicate (Succ len) el = Cons el (replicate len el)

The definition of Nat ensures that the first argument of replicate and the type variable len represent the same natural number. So the type of replicate guarantees that the first argument determines the length of the result list.

Theorem proving

Theorem proving in a dependently typed programming language works by exploiting the Curry–Howard correspondence. Propositions correspond to types and their proofs correspond to expressions of the corresponding types. In this blog post, we will only discuss how to prove equations. Proving other things works analogously.

Let us first define the equality predicate. We can define a predicate by introducing a corresponding GADT. The data constructors of this GADT serve as atomic proofs, their types correspond to axioms. A type constructor (:==) that implements equality is defined as follows:

infix 4 :==

data val1 :== val2 where

    Refl :: val :== val

We implement a Show instance for (:==) that enables us to show the only possible normal form of an equality proof, which is Refl:

instance Show (val1 :== val2) where

    show Refl = "Refl"

We will now prove that 0 is a right unit of addition, that is,

n ∈ ℕ . n + 0 = n .

It might seem reasonable to give the proof the polymorphic type

nat :+ Zero :== nat .

However, the proof has to be done by induction, so that we would have to do pattern matching on the natural number parameter. We cannot do pattern matching on a type, so we have to pass the natural number represented by nat as a value of type Nat nat. We implement the proof as follows:

rightUnit :: Nat nat -> nat :+ Zero :== nat
rightUnit Zero       = Refl
rightUnit (Succ nat) = case rightUnit nat of
                           Refl -> Refl

As a second and last example, we proof that addition is associative, that is,

n₁, n₂, n₃ ∈ ℕ . (n₁ + n₂) + n₃ = n₁ + (n₂ + n₃) .

The Haskell code is as follows:

assoc :: Nat nat1                                          ->
         Nat nat2                                          ->
         Nat nat3                                          ->
         (nat1 :+ nat2) :+ nat3 :== nat1 :+ (nat2 :+ nat3)
assoc Zero        nat2 nat3 = Refl
assoc (Succ nat1) nat2 nat3 = case assoc nat1 nat2 nat3 of
                                  Refl -> Refl

Note that we pass all three natural numbers involved as values of Nat, although we only do pattern matching on the first one. The reason is that we have to specify the second and third number in the recursive call. If we would not do this, they would have to be inferred, which turns out to be impossible.

So have we shown that Haskell can replace Agda if we are willing to write a bit of boilerplate code? Not really. A crucial property of Agda, which is important for proving theorems, is that all values are fully defined. In Haskell, however, values may be partially or completely undefined. In particular, the value ⊥, which denotes undefinedness, is an element of every type. So every proposition has at least one proof, which makes our logic inconsistent.

A solution is to accept a value as a proof only if it does not contain ⊥. We can check if ⊥ is present by evaluating the value completely. This succeeds if and only if there is no ⊥. There are two problems with this approach. First, it causes a runtime overhead, which is not necessary when using a language like Agda. Second, complete evaluation is not possible for arbitrary functions, since it would mean applying the functions in question to all possible arguments. Since function types correspond to implications, and implications occur often in general statements, this is a serious drawback.

So theorem proving is obviously something that should be left to languages that can handle it better than Haskell can. That said, I think that it is nevertheless fun to see how far we can go with present-day Haskell.

About these ads

5 comments on “Dependently typed programming and theorem proving in Haskell

  1. mmirman says:

    As it turns out, your Haskell theorems do not need to use values, thus solving your ⊥-problem. You can use logic programming with type classes to prove theorems. Also, your method of theorem proving also fails to ensure the type safety of the theorem statements themselves. :+ could still be extended invalidating the truth of the theorems in it’s previous uses. The solution to these two problems is posted here: http://kormacode.blogspot.com/2012/05/almost-lf-in-haskell.html

    I do really like this method however, the syntax seems much cleaner.

    • I could surely extend :+, like this, for example:

      type instance () :+ Zero = Suc Zero

      However, this would not really invalidate my theorems. The fact that 0 is a right unit of addition, for example, is encoded by the Haskell type

      Nat nat -> nat :+ Zero :== nat .

      If I pass a value of type Nat () to the proof of this theorem, I will receive ⊥ as the result, since the only value of type Nat () is ⊥, and thus the pattern matching done by the proof will fail. However, I have to check every alleged proof for bottoms anyhow.

      The approach you present in Logic Programming with GADTs and Classes seems to be very similar to mine. The List type you define there is completely analogous to my Nat type. The difference is that you use classes with functional dependencies where I use type synonym families. As a result, you implement the different cases of append via different instance declarations, while in my definition of (+), I use the fact that pattern matching on GADT values restricts the type parameters of the GADT.

      I cannot see how the definition of the Iso class in Almost LF in Haskell really ensures that the only nat with Iso Nat nat are the type-level natural numbers. I could very well do this:

      data Bogus
      
      instance Iso Nat Bogus

      Note that even the informal requirement that into and outof form an isomorphism is fulfilled, since both Nat Bogus and Bogus are empty.

      That said, it is possible to prevent a Haskell-encoded predicate from being extended. I describe a generic approach to this in Subsection 6.2 of Generic Record Combinators with Static Type Checking.

      Instead of using this generic implementation, you can also specialize it manually for each predicate. For example, you can define a natural number predicate as follows:

      data All_Nat item = All_Nat (item Z)
                                  (forall nat . (Nat nat) =>
                                                item (S nat))
      
      closed_Nat :: (forall nat . (Nat nat) => item nat) ->
                    All_Nat item
      closed_Nat item = All_Nat item item
      
      class Nat nat where
      
          specialize_Nat :: All_Nat item -> item nat
      
      instance Nat Z where
      
          specialize_Nat (All_Nat item _) = item
      
      instance (Nat nat) => Nat (S nat) where
      
          specialize_Nat (All_Nat _ item) = item

      Note that the type of specialize_Nat is isomorphic to

      All_Nat item -> forall nat . (Nat nat) => item nat .

      The informal requirement for instances of Nat is that closed_Nat and specialize_Nat are inverses of each other.

      If you try to make the Bogus type from above an instance of Nat, you will run into problems. There is no implementation of specialize_Nat that preserves the property that closed_Nat and specialize_Nat are each other’s inverses. Actually, there is no type-correct implementation of specialize_Nat except ⊥.

  2. mmirman says:

    Unless I am incorrect, in the ⊥ instance, the only difference between the class solution and the data family solution is that in the class solution, because the proofs are entirely on the level of types, they all get automatically evaluated at compile time, rather than having to add a check for ⊥ manually to run at runtime.

    For ensuring classes are closed:

    data Bogus
    
    instance Iso Nat Bogus where
    
        outof ? = ?

    If you have Iso Nat Bogus, outof :: Nat Bogus -> Bogus. So you could define it as, outof _ = ⊥ yes. What is (or should have been), what kept these closed was that the module did not export into and outof. Your solution is also cool, but I think still suffers the problem of requiring that every type have its own module (unless you want to perform bottom checks).

    • If you are doing proofs at the type level using classes, then you probably do not need to evaluate proof terms at runtime. When I said that your class-based and my type-family-based solution were similar, I only referred to how we implement functions like append and (+) on both the type- and the value-level. I think in your blog posts you did not show how to actually proof things, did you?

      If your module does not export into and outof, you can still give the instance declaration for Iso Nat Bogus I showed, since this declaration does not contain any method implementations.

      Unfortunately, my approach suffers from the need of bottom checks even if I put stuff related to a particular type in a separate module. Someone else can always write something like this:

      stupidity :: Zero :== Succ nat
      stupidity = undefined

      So let us use Agda for proofs.

      • mmirman says:

        No, In my post my focus was not on theorem proving, just that an embedding of “Sort of LF” was possible, considering that we don’t have termination or totality checking in either of our cases. I’m only using the runtime to show that we can use this technique to meta program Haskell, and not just prove properties of Haskell.

        I’m working under the assumption of “-Werror” being used, since the compiler can perform that check.

When replying to another comment, please press that comment’s “Reply” button first.

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s