Haskell – Does Haskell require a garbage collector

garbage-collectionhaskell

I'm curious as to why Haskell implementations use a GC.

I can't think of a case where GC would be necessary in a pure language. Is it just an optimization to reduce copying, or is it actually necessary?

I'm looking for example code that would leak if a GC wasn't present.

Best Answer

As others have already pointed out, Haskell requires automatic, dynamic memory management: automatic memory management is necessary because manual memory management is unsafe; dynamic memory management is necessary because for some programs, the lifetime of an object can only be determined at runtime.

For example, consider the following program:

main = loop (Just [1..1000]) where
  loop :: Maybe [Int] -> IO ()
  loop obj = do
    print obj
    resp <- getLine
    if resp == "clear"
     then loop Nothing
     else loop obj

In this program, the list [1..1000] must be kept in memory until the user types "clear"; so the lifetime of this must be determined dynamically, and this is why dynamic memory management is necessary.

So in this sense, automated dynamic memory allocation is necessary, and in practice this means: yes, Haskell requires a garbage collector, since garbage collection is the highest-performance automatic dynamic memory manager.

However...

Although a garbage collector is necessary, we might try to find some special cases where the compiler can use a cheaper memory management scheme than garbage collection. For instance, given

f :: Integer -> Integer
f x = let x2 = x*x in x2*x2

we might hope for the compiler to detect that x2 can safely be deallocated when f returns (rather than waiting for the garbage collector to deallocate x2). Essentially, we are asking that the compiler perform escape analysis to convert allocations in to garbage-collected heap to allocations on the stack wherever possible.

This is not too unreasonable to ask for: the jhc haskell compiler does this, although GHC does not. Simon Marlow says that GHC's generational garbage collector makes escape analysis mostly unnecessary.

jhc actually uses a sophisticated form of escape analysis known as region inference. Consider

f :: Integer -> (Integer, Integer)
f x = let x2 = x * x in (x2, x2+1)

g :: Integer -> Integer
g x = case f x of (y, z) -> y + z

In this case, a simplistic escape analysis would conclude that x2 escapes from f (because it is returned in the tuple), and hence x2 must be allocated on the garbage-collected heap. Region inference, on the other hand, is able to detect that x2 can be deallocated when g returns; the idea here is that x2 should be allocated in g's region rather than f's region.

Beyond Haskell

While region inference is helpful in certain cases as discussed above, it appears to be difficult to reconcile effectively with lazy evaluation (see Edward Kmett's and Simon Peyton Jones' comments). For instance, consider

f :: Integer -> Integer
f n = product [1..n]

One might be tempted to allocate the list [1..n] on the stack and deallocate it after f returns, but this would be catastrophic: it would change f from using O(1) memory (under garbage collection) to O(n) memory.

Extensive work was done in the 1990s and early 2000s on region inference for the strict functional language ML. Mads Tofte, Lars Birkedal, Martin Elsman, Niels Hallenberg have written a quite readable retrospective on their work on region inference, much of which they integrated into the MLKit compiler. They experimented with purely region-based memory management (i.e. no garbage collector) as well as hybrid region-based/garbage-collected memory management, and reported that their test programs ran "between 10 times faster and 4 times slower" than pure garbage-collected versions.