You have described an effect system. It's true that there are effect systems other than monads, but in practice monads give you a lot of expressive power that you would have to reinvent in any practical effect system you might devise.
For example:
- I start by tagging my I/O procedures with an `io` effect.
- My pure functions can still throw exceptions, but throwing exceptions isn't `io`, so I add an `exn` effect. Now I need to be able to combine effects.
- I want to have non-I/O functions that use mutation internally on a local heap `h`, and I want the compiler to make sure that my mutable references don't escape their scope, so I add an `st<h>` effect. Now I need parameterised effects.
- For higher-order functions, I need to write multiple versions for every combination of effects I want to support, for example `map` with a pure function versus `map` with an impure function. Now I need effect polymorphism.
- I notice that my user-defined type could be used to model effects (benign failure, nondeterminism) that the language doesn't have natively. Now I need user-defined effects.
Monads support all of these use cases:
- Functions that may return errors can use the `Either` monad.
- Effects can be combined with monad transformers: if I need exceptions and I/O, I can use the `EitherT IO` monad.
- Monads can be parameterised like any other type; the `ST h` monad lets you do local mutation on `STRef`s in the scope of a heap `h`.
- Because of the `Monad` typeclass, I can overload an operation to work on any kind of effect. For example, `mapM :: (Monad m) => (a -> m b) -> [a] -> m [b]` works in `IO`, but it also works in `Either` or `ST` or any other monad.
- All monads are implemented as user-defined types, except for things like `IO` and `ST`, which must be implemented in the Haskell runtime.
There are of course some significant warts in Haskell's use of monads for side effects.

For one thing, monad transformers are generally not commutative, so `StateT Maybe` is different from `MaybeT State`, and you have to be careful to choose the combination that you actually mean.

The bigger problem is that a function can be polymorphic in *which* monad is being used, but it cannot be polymorphic in *whether* a monad is being used. So we get loads of duplicated code: `map` vs. `mapM`, `zipWith` vs. `zipWithM`, etc.
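The non-commutativity is easy to demonstrate with the `transformers` package; in this sketch, whether a state update survives a failure depends entirely on the stacking order:

```haskell
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.Maybe (MaybeT (..), runMaybeT)
import Control.Monad.Trans.State (State, StateT, modify, runState, runStateT)

-- StateT s Maybe: a failure discards the state entirely.
bump1 :: StateT Int Maybe ()
bump1 = modify (+ 1) >> lift Nothing
-- runStateT bump1 0  ==>  Nothing

-- MaybeT (State s): the state update survives the failure.
bump2 :: MaybeT (State Int) ()
bump2 = lift (modify (+ 1)) >> MaybeT (pure Nothing)
-- runState (runMaybeT bump2) 0  ==>  (Nothing, 1)
```

Same two effects, opposite semantics, so you really do have to pick the order you mean.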
Now, in answer to your actual question…
Koka, from Microsoft Research, is a language with an effect system that does not suffer from these problems. For example, the type of Koka's `map` function includes a type parameter `e` for effects:

    map : (xs : list<a>, f : (a) -> e b) -> e list<b>

And this parameter can be instantiated to the null effect, making `map` work for pure and impure functions alike. In addition, the order in which effects are specified is immaterial: `<exn,io>` is the same as `<io,exn>`.
It’s also worth mentioning that Java’s checked exception mechanism is an example of an effect system that people have widely accepted, but I don’t feel it adequately answers your question because it’s not general.
Your problem is that you link method calls before the program is fully type-checked. In a complicated language with subtype polymorphism and parametric polymorphism, you cannot do any kind of linking before you know the type (or in the case of subtyping, the type bounds) of an expression. The solution is to introduce type variables for expressions of (initially) unknown type, and properly unify type variables with concrete types. If you cannot resolve a type variable, that's an error and not an invitation to deduce the most general type.
A proper type inferrer would look at the code

    var list2 = list.mapped { s => s.toUpperCase }

in the static environment

    list : List[String]
    List[E]#mapped[U] : (E => U) => List[U]
    String#toUpperCase : () => String

like this:
- infer `var list2 = list.mapped { s => s.toUpperCase }`
  - infer `list.mapped { s => s.toUpperCase }`
    - infer `list`
      - known to be `List[String]`
      - save `E = String` to the static environment for this expression.
    - resolves to `List[E]#mapped[U]`
      - introduce variable `$U` for `U`.
      - known to be `(E => $U) => List[$U]`
    - unify arguments: `(E => $U) >: typeof({ s => s.toUpperCase })`
      - infer `{ s => s.toUpperCase }`
        - introduce variables `$a, $b`
        - known to be `$a => $b`
        - unify `(E => $U) >: ($a => $b)`: `E <: $a` and `$b <: $U`. Technically, these are just type bounds, but let's consider them equal.
          - save `E = $a`, `$b = $U` to the static environment for this expression
          - save `s : $a` to the static environment for this expression
        - infer `s.toUpperCase`
          - infer `s`
          - infer `.toUpperCase`
            - matches `String#toUpperCase`
            - known to be `() => String`
          - is `String`
          - unify `String <: $b`, but let's consider them equal
            - save `$b = String` to the static environment for this expression
        - is `($a => $b) = (String => String)`
      - already unified `(E => $U) >: ($a => $b)`.
    - is `List[$U] = List[String]`
  - infer `var list2`
    - introduce variable `$c`
    - save `list2 : $c` into the static environment for this scope
    - is `$c`
  - unify `$c >: List[String]`, but let's consider them equal
    - save `$c = List[String]` into the static environment
After this round of type inference, asking the static environment for the type of `list2` gives us `List[String]`. The above steps roughly follow the type-inference steps for a Hindley-Milner type system, but note that you are probably dealing with subtyping, which makes type inference far more complicated: unifying a type only provides bounds on the type, and very rarely a concrete type. In the above example, I ignored these details and always unified the types as equals.
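The "introduce a variable, unify, save a binding" loop in the walkthrough can be condensed into a toy equality unifier. This is a sketch that, like the walkthrough, treats all bounds as equalities; the names (`Ty`, `Subst`, `demo`) are invented for illustration:

```haskell
import qualified Data.Map as M

-- Types are variables ($U, $a, ...) or constructors (String, List[...], A => B).
data Ty = TVar String | TCon String [Ty] deriving (Eq, Show)

type Subst = M.Map String Ty

-- Chase variable bindings so we always unify fully-resolved types.
walk :: Subst -> Ty -> Ty
walk s (TVar v)    = maybe (TVar v) (walk s) (M.lookup v s)
walk s (TCon c ts) = TCon c (map (walk s) ts)

occurs :: String -> Ty -> Bool
occurs v (TVar w)    = v == w
occurs v (TCon _ ts) = any (occurs v) ts

unify :: Ty -> Ty -> Subst -> Maybe Subst
unify a b s = case (walk s a, walk s b) of
  (TVar v, t)
    | t == TVar v -> Just s
    | occurs v t  -> Nothing                 -- would build an infinite type
    | otherwise   -> Just (M.insert v t s)   -- "save $v = t"
  (t, TVar v)     -> unify (TVar v) t s
  (TCon c ts, TCon d us)
    | c == d && length ts == length us ->
        foldl (\ms (x, y) -> ms >>= unify x y) (Just s) (zip ts us)
  _ -> Nothing

-- Unifying (E => $U) with (String => String) saves E = String and $U = String:
demo :: Maybe Subst
demo = unify (TCon "=>" [TVar "E", TVar "$U"])
             (TCon "=>" [TCon "String" [], TCon "String" []])
             M.empty
```

Here `demo` yields a substitution binding both `E` and `$U` to `String`, exactly the bindings saved in the trace above.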
The second major problem is inferring the type of the object on which to dispatch a method call. In this example, I always tried to infer the type of the object before resolving the method call. However, this is not necessary. A method `List[E]#mapped[U] : (E => U) => List[U]` could also be viewed as a free function `.mapped[E, U] : (List[E], E => U) => List[U]`. In other words, the implicit `this` argument is turned into an explicit argument. This allows us to use type variables for the invocant, and makes it easier to deal with a set of overloads. Consider the following example:
Static environment:

    .add[E]    : (List[E], E) => void
    .add[E]    : (List[E], List[E]) => void
    .add[K, V] : (Map[K, V], K, V) => void
    .add       : (Integer, Integer) => Integer
    a : $a
    b : $b

Expression:

    a.add(b)
When type-inferring that expression, we initially only know that it is some kind of `.add` call, but we do not know which one; we cannot resolve it at this point. We have to unify each of the possible types with the given argument types. If one unification fails, we remove that possibility from the set of overloads.
- infer `a.add(b)`
  - candidate `(List[E], E) => void`
    - variable `$c` for `E`
    - unify arguments: `$a = List[$c]`, `$b = $c`
    - is `void`
  - candidate `(List[E], List[E]) => void`
    - variable `$c` for `E`
    - unify arguments: `$a = List[$c]`, `$b = List[$c]`
    - is `void`
  - candidate `(Map[K, V], K, V) => void`
    - variable `$c` for `K`
    - variable `$d` for `V`
    - unify arguments: `$a = Map[$c, $d]`, `$b = $c`
    - ERROR: no argument for the `V` parameter
    - discard candidate
  - candidate `(Integer, Integer) => Integer`
    - unify arguments: `$a = Integer`, `$b = Integer`
    - is `Integer`
So after doing a round of type inference, we are still left with a number of choices. Some languages employ a ranking mechanism, e.g. methods on subtypes are preferred over methods on supertypes. In our example, there is no sensible ranking. If you want your type checker to perform reasonably well, you should issue a compile error here rather than searching for a resolution, e.g.:

    foo.sourcefile:42:3 error: could not resolve call to method `add`
        a.add(b)
          ^
    candidates are:
      List[E]#add(E elem)
      List[E]#add(List[E] elems)
      Integer#add(Integer rhs)
Likewise, you should display an error when the set of possible methods is empty after attempting unification.
Note that for a different static environment, the set of possible solutions would contain only one method, which you could then bind to. E.g.:

    a : List[$c]
    b : $c

Since the call can be resolved unambiguously in that case, we can continue doing type inference for the rest of the compilation unit in hopes of resolving the `$c` type variable.
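The candidate-filtering step can be sketched as follows: keep the overloads whose parameter lists unify with the argument types, and report an ambiguity error if more than one survives. The representation and names (`Ty`, `match`, `viable`) are made up for illustration, and the matcher deliberately skips the occurs check and substitution chasing that a real unifier needs:

```haskell
data Ty = Var String | Con String [Ty] deriving (Eq, Show)

-- One-shot matching: bind a variable on first sight, then require
-- consistency. (No occurs check or substitution chasing; sketch only.)
match :: [(String, Ty)] -> Ty -> Ty -> Maybe [(String, Ty)]
match bs (Var v) t = case lookup v bs of
  Just t' -> if t' == t then Just bs else Nothing
  Nothing -> Just ((v, t) : bs)
match bs t (Var v) = match bs (Var v) t
match bs (Con c ts) (Con d us)
  | c == d && length ts == length us =
      foldl (\mbs (p, a) -> mbs >>= \bs' -> match bs' p a) (Just bs) (zip ts us)
  | otherwise = Nothing

-- A candidate is its parameter list, invocant first.
viable :: [[Ty]] -> [Ty] -> [[Ty]]
viable candidates args =
  [ ps
  | ps <- candidates
  , length ps == length args   -- drops the Map overload: no argument for V
  , Just _ <- [foldl (\mbs (p, a) -> mbs >>= \bs -> match bs p a)
                     (Just []) (zip ps args)]
  ]

addOverloads :: [[Ty]]
addOverloads =
  [ [Con "List" [Var "E"], Var "E"]                  -- (List[E], E) => void
  , [Con "List" [Var "E"], Con "List" [Var "E"]]     -- (List[E], List[E]) => void
  , [Con "Map" [Var "K", Var "V"], Var "K", Var "V"] -- (Map[K,V], K, V) => void
  , [Con "Integer" [], Con "Integer" []]             -- (Integer, Integer) => Integer
  ]
```

With the arguments `[Var "$a", Var "$b"]`, three candidates survive, so the checker should stop and report the ambiguity with those three candidates, as in the error message above.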
Depending on how you implement generics, you could link the method call directly after successfully inferring the type of the call. If you need to know the value of the type variable `$c`, you need to wait until type inference for the compilation unit has completed, and do the linking in a second pass. This is easy at that point, since all expressions are already annotated with their inferred types.
Best Answer
First, the GHC error. GHC is attempting to unify a few constraints on `x`: first, we use it as a function, so its type must have the shape `t2 -> t1 -> t0`; next, we use it as the value passed to that function, so it must also have type `t2`. So `x x` becomes an attempt to unify `t2` with `t2 -> t1 -> t0`. We can't unify this, since it would require unifying `t2`, the first argument of `x`, with the type of `x` itself, producing an infinite type. Hence our error message.

Next, why not general recursive types? The first point worth noting is the difference between equi- and iso-recursive types. With equi-recursive types, `mu X . Type` is exactly equivalent to its expansion: you can fold and unfold it arbitrarily, implicitly. With iso-recursive types, you instead get a pair of explicit operations, `fold` and `unfold`, which fold and unfold the recursive definitions of types.

Now, equi-recursive types sound ideal, but they are absurdly hard to get right in complex type systems; they can actually make type checking undecidable. I'm not familiar with every detail of OCaml's type system, but fully equi-recursive types in Haskell could cause the typechecker to loop arbitrarily while trying to unify types, and by default, Haskell makes sure that type checking terminates. Furthermore, in Haskell, type synonyms are dumb: the most useful recursive types would be defined like `type T = T -> ()`, but synonyms are inlined almost immediately, and you can't inline a recursive type, because it's infinite! Recursive type synonyms in Haskell would therefore demand a huge overhaul of how synonyms are handled, probably not worth the effort even as a language extension.

Iso-recursive types are a bit of a pain to use: you more or less have to tell the type checker explicitly how to fold and unfold your types, making your programs more complex to read and write.
However, this is very similar to what you're doing with your `Mu` type: `Roll` is fold, and `unroll` is unfold. So actually, we do have iso-recursive types baked in. Equi-recursive types are just too complex, so systems like OCaml and Haskell force you to pass recursion through explicit type-level fixpoints.

Now, if this interests you, I'd recommend Types and Programming Languages. My copy is sitting open in my lap as I'm writing this, to make sure I've got the right terminology :)
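For reference, here is the `Mu` encoding under discussion, with `Roll` as fold and `unroll` as unfold, applied to natural numbers as an example fixpoint; `NatF` and `toInt` are illustrative names:

```haskell
newtype Mu f = Roll { unroll :: f (Mu f) }

-- Natural numbers as the fixpoint of a non-recursive base functor:
data NatF r = ZeroF | SuccF r

type Nat = Mu NatF

zero :: Nat
zero = Roll ZeroF

suc :: Nat -> Nat
suc = Roll . SuccF

-- unroll peels off one layer of the recursion at a time:
toInt :: Nat -> Int
toInt n = case unroll n of
  ZeroF   -> 0
  SuccF m -> 1 + toInt m
-- toInt (suc (suc zero))  ==>  2
```

Every place the recursion is entered or exited is marked by an explicit `Roll` or `unroll`, which is precisely the iso-recursive discipline.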