Garbage Collection – How a Garbage Collector Follows Pointers to Discover Live Objects

garbage-collection

I have read through A tour of V8: Garbage Collection and am stuck on this part:

Distinguishing pointers and data on the heap is the first problem any garbage collector needs to solve. The GC needs to follow pointers in order to discover live objects.

I understand the rest of the document for the most part, talking about how to actually perform the garbage collection once you have the live objects. But I don't understand how you identify the live objects.

After that quote they go on to describe 3 ways of distinguishing pointers from data, but the second one is glossed over a bit:

As long as we can identify what class an object comes from, we can find all of its pointers.

That sounds nice, I would like to know how to do that. I assume they mean when you construct a new class like new Foo(), under the hood it associates the instance of the class with a pointer to the actual class. But I don't see how the garbage collector then uses this information to determine which objects are live or not.

Likewise, this resource is a bit vague on the same detail:

The garbage collector works in two phases: the mark phase, and the sweep phase. In the mark phase, the garbage collector starts walks object references in order to find objects that are still reachable. The garbage collector starts at a few basic places where the object references are stored and given names (the stack, and global storage, and static storage), and then traverses references in the objects.

This is a bit more helpful, but still pretty vague:

Tracing garbage collection traverses the object graph forward,
starting with the roots, to find the live data. Reference counting
traverses the object graph forward, starting with the anti-roots (the
set of objects whose reference counts were decremented to 0), to
find dead data.

Intuitively, one can think of tracing as operating on “matter” and
reference counting as operating on “anti-matter”.

I don't see how you can traverse objects, and know that it is ready to be garbage collected or not. Any tips on understanding that part would be of help. Thank you!

Best Answer

I'm sure Erik's answer has lots of great info but I think you are stuck on the fundamentals. I'm more familiar with Java GC but the basic principles are pretty universal.

I understand the rest of the document for the most part, talking about how to actually perform the garbage collection once you have the live objects. But I don't understand how you identify the live objects.

The thing I think you are missing is that there is necessarily a set of root references. In Java that's your stack, and static values etc. In JS, I'd guess it's things like the window object. I think a picture helps here:

Objects in memory

The circles represent objects and the arrows are references to other objects. On the left I've labelled three objects as being part of the root references. What that is exactly isn't terribly important but what you need to know is that these are basically implicit/hardcoded. They are fundamental to the runtime environment i.e. there's no need to find them. We know what and where they are inherently.

I think the easiest algorithm to understand is the mark-sweep and it's guaranteed to find all the garbage unlike other approaches such as reference counting. The first step is to mark.

enter image description here

I've added a green checkmark to each object that is referenced from the roots. You should be able to verify whether I've gotten it right (please do!) Just start with each root and follow it's references. Then follow the references from those until there are no more paths to follow. This is the 'live set' of objects.

Now comes the sweep:

enter image description here

Everything that wasn't identified as live is considered part of the 'dead set'. These are what can be removed. Notice the cycle of references in the lower-right corner. These are all in the dead set even though there are reference to them. This is because there is no path from the roots to any of those objects.