Haskell Type Systems – Baking Dimension into a Type

haskelltype-safetytype-systems

Suppose I want to write a library that deals with vectors and matrices. Is it possible to bake the dimensions into the types, so that operations of incompatible dimensions generate an error at compile time?

For example I would like the signature of dot product to be something like

dotprod :: Num a, VecDim d => Vector a d -> Vector a d -> a

where the d type contains a single integer value (representing the dimension of these Vectors).

I suppose this could be done by defining (by hand) a separate type for each integer, and group them in a type class called VecDim. Is there some mechanism to "generate" such types?

Or perhaps some better/simpler way of achieving the same thing?

Best Answer

To expand on @KarlBielefeldt's answer, here's a full example of how to implement Vectors - lists with a statically-known number of elements - in Haskell. Hold on to your hat...

{-# LANGUAGE DataKinds #-}
{-# LANGUAGE ExistentialQuantification #-}
{-# LANGUAGE DeriveFoldable #-}
{-# LANGUAGE DeriveFunctor #-}
{-# LANGUAGE DeriveTraversable #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE StandaloneDeriving #-}
{-# LANGUAGE TypeOperators #-}
{-# LANGUAGE TypeFamilies #-}

import Prelude hiding (foldr, zipWith)
import qualified Prelude
import Data.Type.Equality
import Data.Foldable
import Data.Traversable

As you can see from the long list of LANGUAGE directives, this'll only work with a recent version of GHC.

We need a way of representing lengths within the type system. By definition, a natural number is either zero (Z) or it's the successor of some other natural number (S n). So, for example, the number 3 would be written S (S (S Z)).

data Nat = Z | S Nat

With the DataKinds extension, this data declaration introduces a kind called Nat and two type constructors called S and Z - in other words we have type-level natural numbers. Note that the types S and Z don't have any member values - only types of kind * are inhabited by values.

Now we introduce a GADT representing vectors with a known length. Note the kind signature: Vec requires a type of kind Nat (ie a Z or an S type) to represent its length.

data Vec :: Nat -> * -> * where
    VNil :: Vec Z a
    VCons :: a -> Vec n a -> Vec (S n) a
deriving instance (Show a) => Show (Vec n a)
deriving instance Functor (Vec n)
deriving instance Foldable (Vec n)
deriving instance Traversable (Vec n)

The definition of vectors is similar to that of linked lists, with some extra type-level information about its length. A vector is either VNil, in which case it has a length of Z(ero), or it's a VCons cell adding an item to another vector, in which case its length is one more than the other vector (S n). Note that there's no constructor argument of type n. It's just used at compile time to track lengths, and will be erased before the compiler generates machine code.

We've defined a vector type which carries around static knowledge of its length. Let's query the type of a few Vecs to get a feel for how they work:

ghci> :t (VCons 'a' (VCons 'b' VNil))
(VCons 'a' (VCons 'b' VNil)) :: Vec ('S ('S 'Z)) Char  -- (S (S Z)) means 2
ghci> :t (VCons 13 (VCons 11 (VCons 3 VNil)))
(VCons 13 (VCons 11 (VCons 3 VNil))) :: Num a => Vec ('S ('S ('S 'Z))) a  -- (S (S (S Z))) means 3

The dot product proceeds just as it would for a list:

-- note that the two Vec arguments are declared to have the same length
vap :: Vec n (a -> b) -> Vec n a -> Vec n b
vap VNil VNil = VNil
vap (VCons f fs) (VCons x xs) = VCons (f x) (vap fs xs)

zipWith :: (a -> b -> c) -> Vec n a -> Vec n b -> Vec n c
zipWith f xs ys = fmap f xs `vap` ys

dot :: Num a => Vec n a -> Vec n a -> a
dot xs ys = foldr (+) 0 $ zipWith (*) xs ys

vap, which 'zippily' applies a vector of functions to a vector of arguments, is Vec's applicative <*>; I didn't put it in an Applicative instance because it gets messy. Note also that I'm using the foldr from the compiler-generated instance of Foldable.

Let's try it out:

ghci> let v1 = VCons 2 (VCons 1 VNil)
ghci> let v2 = VCons 4 (VCons 5 VNil)
ghci> v1 `dot` v2
13
ghci> let v3 = VCons 8 (VCons 6 (VCons 1 VNil))
ghci> v1 `dot` v3
<interactive>:20:10:
    Couldn't match type ‘'S 'Z’ with ‘'Z’
    Expected type: Vec ('S ('S 'Z)) a
      Actual type: Vec ('S ('S ('S 'Z))) a
    In the second argument of ‘dot’, namely ‘v3’
    In the expression: v1 `dot` v3

Great! You get a compile-time error when you try to dot vectors whose lengths don't match.

Here's an attempt at a function to concatenate vectors together:

-- This won't compile because the type checker can't deduce the length of the returned vector
-- VNil +++ ys = ys
-- (VCons x xs) +++ ys = VCons x (concat xs ys)

The length of the output vector would be the sum of the lengths of the two input vectors. We need to teach the type checker how to add Nats together. For this we use a type-level function:

type family (n :: Nat) :+: (m :: Nat) :: Nat where
    Z :+: m = m
    (S n) :+: m = S (n :+: m)

This type family declaration introduces a function on types called :+: - in other words, it's a recipe for the type checker to calculate the sum of two natural numbers. It's defined recursively - whenever the left operand is greater than Zero we add one to the output and reduce it by one in the recursive call. (It's a good exercise to write a type function which multiplies two Nats.) Now we can make +++ compile:

infixr 5 +++
(+++) :: Vec n a -> Vec m a -> Vec (n :+: m) a
VNil +++ ys = ys
(VCons x xs) +++ ys = VCons x (concat xs ys)

Here's how you use it:

ghci> VCons 1 (VCons 2 VNil) +++ VCons 3 (VCons 4 VNil)
VCons 1 (VCons 2 (VCons 3 (VCons 4 VNil)))

So far so simple. What about when we want to do the opposite of concatenation and split a vector in two? The lengths of the output vectors depend on the runtime value of the arguments. We'd like to write something like this:

-- this won't work because there aren't any values of type `S` and `Z`
-- split :: (n :: Nat) -> Vec (n :+: m) a -> (Vec n a, Vec m a)

but unfortunately Haskell won't let us do that. Allowing the value of the n argument to appear in the return type (this is commonly called a dependent function or pi type) would require "full-spectrum" dependent types, whereas DataKinds only gives us promoted type constructors. To put it another way, the type constructors S and Z don't appear at the value level. We'll have to settle for singleton values for a run-time representation of a certain Nat.*

data Natty (n :: Nat) where
    Zy :: Natty Z  -- pronounced 'zed-y'
    Sy :: Natty n -> Natty (S n)  -- pronounced 'ess-y'
deriving instance Show (Natty n)

For a given type n (with kind Nat), there is precisely one term of type Natty n. We can use the singleton value as a run-time witness for n: learning about a Natty teaches us about its n and vice versa.

split :: Natty n ->
         Vec (n :+: m) a ->  -- the input Vec has to be at least as long as the input Natty
         (Vec n a, Vec m a)
split Zy xs = (Nil, xs)
split (Sy n) (Cons x xs) = let (ys, zs) = split n xs
                           in (Cons x ys, zs)

Let's take it for a spin:

ghci> split (Sy (Sy Zy)) (VCons 1 (VCons 2 (VCons 3 VNil)))
(VCons 1 (VCons 2 VNil), VCons 3 VNil)
ghci> split (Sy (Sy Zy)) (VCons 3 VNil)
<interactive>:116:21:
    Couldn't match type ‘'S ('Z :+: m)’ with ‘'Z’
    Expected type: Vec ('S ('S 'Z) :+: m) a
      Actual type: Vec ('S 'Z) a
    Relevant bindings include
      it :: (Vec ('S ('S 'Z)) a, Vec m a) (bound at <interactive>:116:1)
    In the second argument of ‘split’, namely ‘(VCons 3 VNil)’
    In the expression: split (Sy (Sy Zy)) (VCons 3 VNil)

In the first example, we successfully split a three-element vector at position 2; then we got a type error when we tried to split a vector at a position past the end. Singletons are the standard technique for making a type depend on a value in Haskell.

_{* The singletons library contains some Template Haskell helpers to generate singleton values like Natty for you.}

Last example. What about when you don't know the dimensionality of your vector statically? For example, what if we're trying to build a vector from run-time data in the form of a list? You need the type of the vector to depend on the length of the input list. To put it another way, we can't use foldr VCons VNil to build a vector because the type of the output vector changes with each iteration of the fold. We need to keep the length of the vector a secret from the compiler.

data AVec a = forall n. AVec (Natty n) (Vec n a)
deriving instance (Show a) => Show (AVec a)

fromList :: [a] -> AVec a
fromList = Prelude.foldr cons nil
    where cons x (AVec n xs) = AVec (Sy n) (VCons x xs)
          nil = AVec Zy VNil

AVec is an existential type: the type variable n does not appear in the return type of the AVec data constructor. We're using it to simulate a dependent pair: fromList can't tell you the length of the vector statically, but it can return something you can pattern-match on to learn the length of the vector - the Natty n in the first element of the tuple. As Conor McBride puts it in a related answer, "You look at one thing, and in doing so, learn about another".

This is a common technique for existentially quantified types. Because you can't actually do anything with data for which you don't know the type - try writing a function of data Something = forall a. Sth a - existentials often come bundled up with GADT evidence which allows you to recover the original type by performing pattern-matching tests. Other common patterns for existentials include packaging up functions to process your type (data AWayToGetTo b = forall a. HeresHow a (a -> b)) which is a neat way of doing first-class modules, or building-in a type class dictionary (data AnOrd = forall a. Ord a => AnOrd a) which can help emulate subtype polymorphism.

ghci> fromList [1,2,3]
AVec (Sy (Sy (Sy Zy))) (VCons 1 (VCons 2 (VCons 3 Nil)))

Dependent pairs are useful whenever the static properties of data depend on dynamic information not available at compile time. Here's filter for vectors:

filter :: (a -> Bool) -> Vec n a -> AVec a
filter f = foldr (\x (AVec n xs) -> if f x
                                    then AVec (Sy n) (VCons x xs)
                                    else AVec n xs) (AVec Zy VNil)

To dot two AVecs, we need to prove to GHC that their lengths are equal. Data.Type.Equality defines a GADT which can only be constructed when its type arguments are the same:

data (a :: k) :~: (b :: k) where
    Refl :: a :~: a  -- short for 'reflexivity'

When you pattern-match on Refl, GHC knows that a ~ b. There are also a few functions to help you work with this type: we'll be using gcastWith to convert between equivalent types, and TestEquality to determine whether two Nattys are equal.

To test the equality of two Nattys, we're going to need to make use of the fact that if two numbers are equal, then their successors are also equal (:~: is congruent over S):

congSuc :: (n :~: m) -> (S n :~: S m)
congSuc Refl = Refl

Pattern matching on Refl on the left-hand side lets GHC know that n ~ m. With that knowledge, it's trivial that S n ~ S m, so GHC lets us return a new Refl right away.

Now we can write an instance of TestEquality by straightforward recursion. If both numbers are zero, they are equal. If both numbers have predecessors, they are equal iff the predecessors are equal. (If they're not equal, just return Nothing.)

instance TestEquality Natty where
    -- testEquality :: Natty n -> Natty m -> Maybe (n :~: m)
    testEquality Zy Zy = Just Refl
    testEquality (Sy n) (Sy m) = fmap congSuc (testEquality n m)  -- check whether the predecessors are equal, then make use of congruence
    testEquality Zy _ = Nothing
    testEquality _ Zy = Nothing

Now we can put the pieces together to dot a pair of AVecs of unknown length.

dot' :: Num a => AVec a -> AVec a -> Maybe a
dot' (AVec n u) (AVec m v) = fmap (\proof -> gcastWith proof (dot u v)) (testEquality n m)

First, pattern match on the AVec constructor to pull out a runtime representation of the vectors' lengths. Now use testEquality to determine whether those lengths are equal. If they are, we'll have Just Refl; gcastWith will use that equality proof to ensure that dot u v is well-typed by discharging its implicit n ~ m assumption.

ghci> let v1 = fromList [1,2,3]
ghci> let v2 = fromList [4,5,6]
ghci> let v3 = fromList [7,8]
ghci> dot' v1 v2
Just 32
ghci> dot' v1 v3
Nothing  -- they weren't the same length

Note that, since a vector without static knowledge of its length is basically a list, we've effectively re-implemented the list version of dot :: Num a => [a] -> [a] -> Maybe a. The difference is that this version is implemented in terms of the vectors' dot. Here's the point: before the type checker will allow you to call dot, you must have tested whether the input lists have the same length using testEquality. I am prone to getting if-statements the wrong way round, but not in a dependently-typed setting!

You can't avoid using existential wrappers at the edges of your system, when you're dealing with runtime data, but you can use dependent types everywhere inside your system and keep the existential wrappers at the edges, when you perform input validation.

Since Nothingis not very informative, you could further refine the type of dot' to return a proof that the lengths aren't equal (in the form of evidence that their difference is not 0) in the failure case. This is pretty similar to the standard Haskell technique of using Either String a to possibly return an error message, although a proof term is far more computationally useful than a string!

Thus ends this whistle-stop tour of some of the techniques that are common in dependently-typed Haskell programming. Programming with types like this in Haskell is really cool, but really awkward at the same time. Breaking up all your dependent data into lots of representations which mean the same thing - Nat the type, Nat the kind, Natty n the singleton - is really quite cumbersome, despite the existence of code-generators to help with the boilerplate. There are also presently limitations on what can be promoted to the type level. It's tantalising though! The mind boggles at the possibilities - in the literature there are examples in Haskell of strongly typed printf, database interfaces, UI layout engines...

If you want some further reading, there's a growing body of literature about dependently typed Haskell, both published and on sites like Stack Overflow. A good starting point is the Hasochism paper - the paper goes through this very example (among others), discussing the painful parts in some detail. The Singletons paper demonstrates the technique of singleton values (such as Natty). For more information about dependent typing in general, the Agda tutorial is a good place to start; also, Idris is a language in development that's (roughly) designed to be "Haskell with dependent types".

Type systems prevent errors

Type systems eliminates illegal programs. Consider the following Python code.

 a = 'foo'
 b = True
 c = a / b

In Python, this program fails; it throws an exception. In a language like Java, C#, Haskell, whatever, this isn't even a legal program. You entirely avoid these errors because they simply aren't possible in the set of input programs.

Similarly, a better type system rules out more errors. If we jump up to super advanced type systems we can say things like this:

 Definition divide x (y : {x : integer | x /= 0}) = x / y

Now the type system guarantees that there aren't any divide-by-0 errors.

What sort of errors

Here's a brief list of what errors type systems can prevent

Out-of-range errors
SQL injection
Generalizing 2, many safety issues (what taint checking is for in Perl)
Out-of-sequence errors (forgetting to call init)
Forcing a subset of values to be used (for example, only integers greater than 0)
~~Nefarious kittens~~ (Yes, it was a joke)
Loss-of-precision errors
Software transactional memory (STM) errors (this needs purity, which also requires types)
Generalizing 8, controlling side effects
Invariants over data structures (is a binary tree balanced?)
Forgetting an exception or throwing the wrong one

And remember, this is also at compile time. No need to write tests with 100% code coverage to simply check for type errors, the compiler just does it for you :)

Case study: Typed lambda calculus

Alright, let's examine the simplest of all type systems, simply typed lambda calculus.

Basically there are two types,

Type = Unit | Type -> Type

And all terms are either variables, lambdas, or application. Based on this, we can prove that any well typed program terminates. There is never a situation where the program will get stuck or loop forever. This isn't provable in normal lambda calculus because well, it isn't true.

Think about this, we can use type systems to guarentee that our program doesn't loop forever, rather cool right?

Detour into dynamic types

Dynamic type systems can offer identical guarantees as static type systems, but at runtime rather than compile time. Actually, since it's runtime, you can actually offer more information. You lose some guarantees however, particularly about static properties like termination.

So dynamic types don't rule out certain programs, but rather route malformed programs to well-defined actions, like throwing exceptions.

TLDR

So the long and the short of it, is that type systems rule out certain programs. Many of the programs are broken in some way, therefore, with type systems we avoid these broken programs.

Best Answer

Related Solutions

Haskell – Why No Type-Level Lambda Abstractions?

Type Systems – What Are the Safety Benefits of a Type System

Type systems prevent errors

What sort of errors

Case study: Typed lambda calculus

Detour into dynamic types

TLDR

Related Topic