How to implement a garbage collector

garbage-collection

Could anyone point me to a good source on how to implement garbage collection? I am making a lisp-like interpreted language. It currently uses reference counting, but of course that fails at freeing circularly dependent objects.

I've been reading of mark and sweep, tricolor marking, moving and nonmoving, incremental and stop-the-world, but… I don't know what the best way to keep the objects neatly separated into sets while keeping per-object memory overhead at a minimum, or how to do things incrementally.

I've read some languages with reference counting use circular reference detection, which I could use. I am aware I could use freely available collectors like Boehm, but I would like to learn how to do it myself.

I would appreciate any online material with some sort of tutorial or help for people with no experience on the topic like myself.

Best Answer

Could anyone point me to a good source on how to implement garbage collection?

There's a lot of advanced material about garbage collection out there. The Garbage Collection Handbook is great. But I found there was precious little basic introductory information so I wrote some articles about it. Prototyping a mark-sweep garbage collector describes a minimal mark-sweep GC written in F#. The Very Concurrent Garbage Collector describes a more advanced concurrent collector. HLVM is a virtual machine I wrote that includes a stop-the-world collector that handles threading.

The simplest way to implement a garbage collector is:

  1. Make sure you can collate the global roots. These are the local and global variables that contain references into the heap. For local variables, push them on to a shadow stack for the duration of their scope.

  2. Make sure you can traverse the heap, e.g. every value in the heap is an object that implements a Visit method that returns all of the references from that object.

  3. Keep the set of all allocated values.

  4. Allocate by calling malloc and inserting the pointer into the set of all allocated values.

  5. When the total size of all allocated values exceeds a quota, kick off the mark and then sweep phases. This recursively traverses the heap accumulating the set of all reachable values.

  6. The set difference of the allocated values minus the reachable values is the set of unreachable values. Iterate over them calling free and removing them from the set of allocated values.

  7. Set the quota to twice the total size of all allocated values.

Related Topic