Type systems prevent errors
Type systems eliminates illegal programs. Consider the following Python code.
a = 'foo'
b = True
c = a / b
In Python, this program fails; it throws an exception. In a language like Java, C#, Haskell, whatever, this isn't even a legal program. You entirely avoid these errors because they simply aren't possible in the set of input programs.
Similarly, a better type system rules out more errors. If we jump up to super advanced type systems we can say things like this:
Definition divide x (y : {x : integer | x /= 0}) = x / y
Now the type system guarantees that there aren't any divide-by-0 errors.
What sort of errors
Here's a brief list of what errors type systems can prevent
- Out-of-range errors
- SQL injection
- Generalizing 2, many safety issues (what taint checking is for in Perl)
- Out-of-sequence errors (forgetting to call init)
- Forcing a subset of values to be used (for example, only integers greater than 0)
Nefarious kittens (Yes, it was a joke)
- Loss-of-precision errors
- Software transactional memory (STM) errors (this needs purity, which also requires types)
- Generalizing 8, controlling side effects
- Invariants over data structures (is a binary tree balanced?)
- Forgetting an exception or throwing the wrong one
And remember, this is also at compile time. No need to write tests with 100% code coverage to simply check for type errors, the compiler just does it for you :)
Case study: Typed lambda calculus
Alright, let's examine the simplest of all type systems, simply typed lambda calculus.
Basically there are two types,
Type = Unit | Type -> Type
And all terms are either variables, lambdas, or application. Based on this, we can prove that any well typed program terminates. There is never a situation where the program will get stuck or loop forever. This isn't provable in normal lambda calculus because well, it isn't true.
Think about this, we can use type systems to guarentee that our program doesn't loop forever, rather cool right?
Detour into dynamic types
Dynamic type systems can offer identical guarantees as static type systems, but at runtime rather than compile time. Actually, since it's runtime, you can actually offer more information. You lose some guarantees however, particularly about static properties like termination.
So dynamic types don't rule out certain programs, but rather route malformed programs to well-defined actions, like throwing exceptions.
TLDR
So the long and the short of it, is that type systems rule out certain programs. Many of the programs are broken in some way, therefore, with type systems we avoid these broken programs.
Best Answer
To expand on @KarlBielefeldt's answer, here's a full example of how to implement Vectors - lists with a statically-known number of elements - in Haskell. Hold on to your hat...
As you can see from the long list of
LANGUAGE
directives, this'll only work with a recent version of GHC.We need a way of representing lengths within the type system. By definition, a natural number is either zero (
Z
) or it's the successor of some other natural number (S n
). So, for example, the number 3 would be writtenS (S (S Z))
.With the DataKinds extension, this
data
declaration introduces a kind calledNat
and two type constructors calledS
andZ
- in other words we have type-level natural numbers. Note that the typesS
andZ
don't have any member values - only types of kind*
are inhabited by values.Now we introduce a GADT representing vectors with a known length. Note the kind signature:
Vec
requires a type of kindNat
(ie aZ
or anS
type) to represent its length.The definition of vectors is similar to that of linked lists, with some extra type-level information about its length. A vector is either
VNil
, in which case it has a length ofZ
(ero), or it's aVCons
cell adding an item to another vector, in which case its length is one more than the other vector (S n
). Note that there's no constructor argument of typen
. It's just used at compile time to track lengths, and will be erased before the compiler generates machine code.We've defined a vector type which carries around static knowledge of its length. Let's query the type of a few
Vec
s to get a feel for how they work:The dot product proceeds just as it would for a list:
vap
, which 'zippily' applies a vector of functions to a vector of arguments, isVec
's applicative<*>
; I didn't put it in anApplicative
instance because it gets messy. Note also that I'm using thefoldr
from the compiler-generated instance ofFoldable
.Let's try it out:
Great! You get a compile-time error when you try to
dot
vectors whose lengths don't match.Here's an attempt at a function to concatenate vectors together:
The length of the output vector would be the sum of the lengths of the two input vectors. We need to teach the type checker how to add
Nat
s together. For this we use a type-level function:This
type family
declaration introduces a function on types called:+:
- in other words, it's a recipe for the type checker to calculate the sum of two natural numbers. It's defined recursively - whenever the left operand is greater thanZ
ero we add one to the output and reduce it by one in the recursive call. (It's a good exercise to write a type function which multiplies twoNat
s.) Now we can make+++
compile:Here's how you use it:
So far so simple. What about when we want to do the opposite of concatenation and split a vector in two? The lengths of the output vectors depend on the runtime value of the arguments. We'd like to write something like this:
but unfortunately Haskell won't let us do that. Allowing the value of the
n
argument to appear in the return type (this is commonly called a dependent function or pi type) would require "full-spectrum" dependent types, whereasDataKinds
only gives us promoted type constructors. To put it another way, the type constructorsS
andZ
don't appear at the value level. We'll have to settle for singleton values for a run-time representation of a certainNat
.*For a given type
n
(with kindNat
), there is precisely one term of typeNatty n
. We can use the singleton value as a run-time witness forn
: learning about aNatty
teaches us about itsn
and vice versa.Let's take it for a spin:
In the first example, we successfully split a three-element vector at position 2; then we got a type error when we tried to split a vector at a position past the end. Singletons are the standard technique for making a type depend on a value in Haskell.
* The
singletons
library contains some Template Haskell helpers to generate singleton values likeNatty
for you.Last example. What about when you don't know the dimensionality of your vector statically? For example, what if we're trying to build a vector from run-time data in the form of a list? You need the type of the vector to depend on the length of the input list. To put it another way, we can't use
foldr VCons VNil
to build a vector because the type of the output vector changes with each iteration of the fold. We need to keep the length of the vector a secret from the compiler.AVec
is an existential type: the type variablen
does not appear in the return type of theAVec
data constructor. We're using it to simulate a dependent pair:fromList
can't tell you the length of the vector statically, but it can return something you can pattern-match on to learn the length of the vector - theNatty n
in the first element of the tuple. As Conor McBride puts it in a related answer, "You look at one thing, and in doing so, learn about another".This is a common technique for existentially quantified types. Because you can't actually do anything with data for which you don't know the type - try writing a function of
data Something = forall a. Sth a
- existentials often come bundled up with GADT evidence which allows you to recover the original type by performing pattern-matching tests. Other common patterns for existentials include packaging up functions to process your type (data AWayToGetTo b = forall a. HeresHow a (a -> b)
) which is a neat way of doing first-class modules, or building-in a type class dictionary (data AnOrd = forall a. Ord a => AnOrd a
) which can help emulate subtype polymorphism.Dependent pairs are useful whenever the static properties of data depend on dynamic information not available at compile time. Here's
filter
for vectors:To
dot
twoAVec
s, we need to prove to GHC that their lengths are equal.Data.Type.Equality
defines a GADT which can only be constructed when its type arguments are the same:When you pattern-match on
Refl
, GHC knows thata ~ b
. There are also a few functions to help you work with this type: we'll be usinggcastWith
to convert between equivalent types, andTestEquality
to determine whether twoNatty
s are equal.To test the equality of two
Natty
s, we're going to need to make use of the fact that if two numbers are equal, then their successors are also equal (:~:
is congruent overS
):Pattern matching on
Refl
on the left-hand side lets GHC know thatn ~ m
. With that knowledge, it's trivial thatS n ~ S m
, so GHC lets us return a newRefl
right away.Now we can write an instance of
TestEquality
by straightforward recursion. If both numbers are zero, they are equal. If both numbers have predecessors, they are equal iff the predecessors are equal. (If they're not equal, just returnNothing
.)Now we can put the pieces together to
dot
a pair ofAVec
s of unknown length.First, pattern match on the
AVec
constructor to pull out a runtime representation of the vectors' lengths. Now usetestEquality
to determine whether those lengths are equal. If they are, we'll haveJust Refl
;gcastWith
will use that equality proof to ensure thatdot u v
is well-typed by discharging its implicitn ~ m
assumption.Note that, since a vector without static knowledge of its length is basically a list, we've effectively re-implemented the list version of
dot :: Num a => [a] -> [a] -> Maybe a
. The difference is that this version is implemented in terms of the vectors'dot
. Here's the point: before the type checker will allow you to calldot
, you must have tested whether the input lists have the same length usingtestEquality
. I am prone to gettingif
-statements the wrong way round, but not in a dependently-typed setting!You can't avoid using existential wrappers at the edges of your system, when you're dealing with runtime data, but you can use dependent types everywhere inside your system and keep the existential wrappers at the edges, when you perform input validation.
Since
Nothing
is not very informative, you could further refine the type ofdot'
to return a proof that the lengths aren't equal (in the form of evidence that their difference is not 0) in the failure case. This is pretty similar to the standard Haskell technique of usingEither String a
to possibly return an error message, although a proof term is far more computationally useful than a string!Thus ends this whistle-stop tour of some of the techniques that are common in dependently-typed Haskell programming. Programming with types like this in Haskell is really cool, but really awkward at the same time. Breaking up all your dependent data into lots of representations which mean the same thing -
Nat
the type,Nat
the kind,Natty n
the singleton - is really quite cumbersome, despite the existence of code-generators to help with the boilerplate. There are also presently limitations on what can be promoted to the type level. It's tantalising though! The mind boggles at the possibilities - in the literature there are examples in Haskell of strongly typedprintf
, database interfaces, UI layout engines...If you want some further reading, there's a growing body of literature about dependently typed Haskell, both published and on sites like Stack Overflow. A good starting point is the Hasochism paper - the paper goes through this very example (among others), discussing the painful parts in some detail. The Singletons paper demonstrates the technique of singleton values (such as
Natty
). For more information about dependent typing in general, the Agda tutorial is a good place to start; also, Idris is a language in development that's (roughly) designed to be "Haskell with dependent types".