How different is garbage collection in pure languages?

functional-programming, garbage-collection, language-agnostic

In a pure language like Haskell, all data is immutable and no existing data structures can be changed in any way. Additionally, many algorithms on immutable data and functional programming patterns generate large amounts of garbage by nature (chains of map creating intermediate lists for example).
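As a small illustration of the intermediate-list point, here is a minimal sketch. The function names `chained` and `singlePass` are made up for this example. Semantically, the chained version builds a list that is consumed once and then becomes garbage, although with optimization enabled GHC's list fusion may remove it.

```haskell
-- Two maps in a chain: the inner map produces a list that the outer
-- map consumes once; after that the inner list is garbage.
chained :: [Int] -> [Int]
chained xs = map (* 2) (map (+ 1) xs)

-- The same result computed in a single pass, with no intermediate list.
singlePass :: [Int] -> [Int]
singlePass xs = map ((* 2) . (+ 1)) xs
```

Both functions return the same list for any input; they differ only in how much short-lived data they may allocate along the way.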

What strategies and techniques do garbage collectors employ in the face of purity that they wouldn't otherwise? What works very well in an impure language's GC that doesn't in a pure context? What other new problems do pure languages create for GCs?

Best Answer

The current implementation of GHC uses a strategy that only works because the language is purely functional and data is immutable. Because no variable can ever be altered to refer to anything newer, objects only hold references to older objects, which suits a generational garbage collector. Since an object referred to by a higher generation cannot be deleted until that generation is collected, GHC promotes objects to higher generations eagerly. And since nothing is going to alter references while the GC is sweeping them, it can run in parallel.
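The "only references to older objects" argument can be sketched with a toy model. This is not GHC's collector: `Addr`, `Heap`, `olderOnly`, `minorLive` and `exampleHeap` are all names invented here, addresses stand in for allocation order, and the model ignores laziness and mutable references such as IORef, which a real runtime has to account for.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- Toy model: a lower address means the object was allocated earlier.
type Addr = Int

-- Each object is just the list of addresses it points to.
type Heap = Map.Map Addr [Addr]

-- The invariant from the answer: objects only reference older objects.
olderOnly :: Heap -> Bool
olderOnly heap =
  and [field < addr | (addr, fields) <- Map.toList heap, field <- fields]

-- Minor collection: find the live objects in the nursery (addresses at or
-- above nurseryStart). Under the invariant, old objects never point into
-- the nursery, so tracing can stop as soon as it reaches the old generation.
minorLive :: Addr -> Heap -> [Addr] -> Set.Set Addr
minorLive nurseryStart heap roots = go Set.empty roots
  where
    go seen [] = seen
    go seen (a : rest)
      | a < nurseryStart = go seen rest      -- old generation: not traced
      | a `Set.member` seen = go seen rest   -- already marked
      | otherwise =
          go (Set.insert a seen) (Map.findWithDefault [] a heap ++ rest)

-- Objects 0 to 2 are "old"; 3 to 5 are in the nursery.
exampleHeap :: Heap
exampleHeap =
  Map.fromList [(0, []), (1, [0]), (2, [1]), (3, [0]), (4, [3, 1]), (5, [2])]
```

With `nurseryStart = 3` and object 4 as the only root, `minorLive 3 exampleHeap [4]` marks objects 3 and 4, and object 5 is young garbage. The old generation is never scanned, which is the saving the invariant buys.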

Here’s a paper with more detail.

Related Solutions

Garbage Collection – Why Only Sweep the Heap?

The garbage collector does scan the stack -- to see what things in the heap are currently being used (pointed to) by things on the stack.
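A minimal sketch of that idea, as a toy mark-and-sweep where the stack only supplies roots. The types `Ref`, `Frame`, `Heap` and the functions `mark` and `sweep` are illustrative names, not any real runtime's API.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

type Ref = Int
type Frame = [Ref]             -- heap references held by one stack frame
type Heap = Map.Map Ref [Ref]  -- heap object -> objects it points to

-- Mark: everything reachable from the references found on the stack.
mark :: Heap -> [Frame] -> Set.Set Ref
mark heap stack = go Set.empty (concat stack)
  where
    go seen [] = seen
    go seen (r : rest)
      | r `Set.member` seen = go seen rest
      | otherwise =
          go (Set.insert r seen) (Map.findWithDefault [] r heap ++ rest)

-- Sweep: only heap objects are ever freed. Stack frames are read for
-- roots but are never themselves candidates for collection.
sweep :: Heap -> [Frame] -> Heap
sweep heap stack = Map.restrictKeys heap (mark heap stack)
```

For a heap `Map.fromList [(1, [2]), (2, []), (3, [1]), (4, [4])]` and a stack `[[1]]`, `sweep` keeps objects 1 and 2. Object 3 is dropped even though it points at a live object, and object 4 is dropped despite referring to itself.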

It makes no sense for the garbage collector to consider collecting stack memory because the stack is not managed that way: Everything on the stack is considered to be "in use." And memory used by the stack is automatically reclaimed when you return from method calls. Memory management of stack space is so simple, cheap and easy that you wouldn't want garbage collection to be involved.

(There are systems, such as Smalltalk, where stack frames are first-class objects stored in the heap and garbage collected like all other objects. But that's not the popular approach these days. Java's JVM and Microsoft's CLR use the hardware stack and contiguous memory.)

Do all functional languages use garbage collection?

Not that I know of, though I'm no functional programming expert.

It seems pretty difficult in principle, because values returned from functions may contain references to other values that were created (on the stack) within the same function, or might just as easily have been passed in as a parameter, or referenced by something passed in as a parameter. In C, this issue is dealt with by allowing that dangling pointers (or more precisely, undefined behaviour) may occur if the programmer doesn't get things right. That's not the kind of solution that functional language designers approve of.

There are potential solutions, though. One idea is to make the lifetime of the value a part of the type of the value, along with references to it, and define type-based rules that prevent stack-allocated values from being returned from, or referenced by something returned from, a function. I've not worked through the implications, but I suspect it would be horrible.
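One existing instance of tying a reference's scope to its type is Haskell's ST monad, where the rank-2 type of `runST` stops an `STRef` from leaving the computation that created it. It is offered here only as an illustration of the typing idea, not as the design this answer has in mind: the reference is still heap-allocated and garbage collected. The function name `sumST` is made up for the example.

```haskell
import Control.Monad.ST (runST)
import Data.STRef (modifySTRef', newSTRef, readSTRef)

-- A mutable accumulator that exists only inside runST. The phantom type
-- parameter s in STRef s Int ties the reference to this one computation.
sumST :: [Int] -> Int
sumST xs = runST (do
  acc <- newSTRef 0
  mapM_ (\x -> modifySTRef' acc (+ x)) xs
  readSTRef acc)

-- Rejected by the type checker, because the result type would mention s
-- and the reference would escape its scope:
-- leaked = runST (newSTRef (0 :: Int))
```

The commented-out `leaked` definition is the interesting part: the rule "a reference may not be returned from its scope" is enforced purely by types, with no runtime check.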

For monadic code, there's another solution which is (actually or almost) monadic too, and could give a kind of automatically, deterministically destructed IORef. The principle is to define "nesting" actions. When combined (using an associative operator), these define a nesting control flow - think "XML element", with the left-most of the values providing the outer begin-and-end-tag pair. These "XML tags" are just defining the ordering of monadic actions at another level of abstraction.

At some point (at the right hand side of the chain of associative composition) you need some kind of terminator to end the nesting - something to fill the hole in the middle. The need for a terminator is probably what means the nesting composition operator isn't monadic, though again, I'm not entirely sure as I haven't worked through the details. Then again, since all that applying the terminator does is convert a nesting action into what is effectively a composed normal monadic action, maybe it is monadic after all - the terminator doesn't necessarily affect the nesting composition operator.

Many of these special actions would have a null "end-tag" step, and would equate the "begin-tag" step with some simple monadic action. But some would represent variable declarations. These would represent the constructor with the begin-tag, and the destructor with the end-tag. So you get something like...

-- Pseudocode: terminate, >>>=, defVar, useVal and get are hypothetical.
act = terminate ((defVar "hello" ) >>>= \h ->
                 (defVar " world") >>>= \w ->
                 (useVal ((get h) ++ (get w)))
                )

Translating to a monadic composition with the following execution order, each tag (not element) becoming a normal monadic action...

<def-var val="hello">  --  construction
  <def-var val=" world">  --  construction
    <use-val ...>
      <terminator/>
    </use-val>  --  do nothing
  </def-var>  --  destruction
</def-var>  --  destruction

Rules like this could allow C++-style RAII to be implemented. The IORef-like references cannot escape their scope, for similar reasons to why normal IORefs can't escape the monad - the rules of the associative composition provide no way for the reference to escape.

EDIT - I nearly forgot to say - there's a definite area I'm unsure about here. It's important to ensure that an outer variable can't reference an inner one, basically, so there must be restrictions on what you can do with these IORef-like references. Again, I haven't worked through all the details.

Therefore, construction could e.g. open a file which destruction closes. Construction could open a socket which destruction closes. Basically, as in C++, the variables become resource managers. But unlike C++, there are no heap-allocated objects that cannot be automatically destructed.
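The construct-then-destruct nesting can be approximated in today's Haskell with `bracket` from `Control.Exception`. This is a rough sketch, not the nesting operator described above: `withVar` and `demo` are hypothetical names, the "resource" is just a string, and unlike the proposed scheme nothing here stops a reference from escaping its scope.

```haskell
import Control.Exception (bracket)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

-- Hypothetical helper: "construct" a variable, run the body, then
-- "destruct" it. Both steps are logged so the ordering can be inspected.
withVar :: IORef [String] -> String -> (String -> IO a) -> IO a
withVar logRef val body =
  bracket
    (modifyIORef' logRef (("construct " ++ show val) :) >> pure val)
    (\_ -> modifyIORef' logRef (("destruct " ++ show val) :))
    body

-- Nested scopes: inner variables are destructed before outer ones.
demo :: IO [String]
demo = do
  logRef <- newIORef []
  withVar logRef "hello" $ \h ->
    withVar logRef " world" $ \w ->
      modifyIORef' logRef (("use " ++ show (h ++ w)) :)
  reverse <$> readIORef logRef
```

Running `demo` yields the events in the same order as the tag trace: both constructions, the use, then destruction of the inner variable followed by the outer one. `bracket` also runs the destruction step if the body throws, which is the property that makes it usable for files and sockets.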

Although this structure supports RAII, you still need a garbage collector. Although a nesting action can allocate and free memory, treating it as a resource, there are still all the references to (potentially shared) functional values within that chunk of memory and elsewhere. Given that the memory could simply be allocated on the stack, avoiding the need for a heap free, the real significance (if there is any) is for other kinds of resource management.

So what this achieves is to separate RAII-style resource management from memory management, at least in the case where RAII is based on simple nesting scope. You still need a garbage collector for memory management, but you get safe and timely automatic deterministic cleanup of other resources.
