Object-Oriented Programming – Memory Organization of Objects and Classes in Assembly

assembly, object-oriented

How are objects organized in memory?

For instance, I know that a function is a piece of code in memory that expects its parameters on the stack and/or in registers and manages its own stack frame.

But objects are a much more complicated structure. How are they organized?
Does each object hold "links" to its methods and pass its own address to them?

It would be great to see a good explanation of this topic.

UPD. I made the question more exact; I'm mainly interested in statically typed languages.

Best Answer

If there is no dynamic dispatch (polymorphism), "methods" are just sugary functions, perhaps with an implicit additional parameter. Accordingly, instances of classes with no polymorphic behavior are essentially C structs for the purpose of code generation.
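To make that concrete, here is a minimal sketch. The names `Counter`, `CounterData` and `Counter_advance` are made up for illustration, and the exact code a compiler emits depends on the compiler and the ABI. The point is only that a non-virtual method adds nothing to the object's memory and behaves like a free function taking the instance pointer as an explicit first argument.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical class with one non-virtual method.
struct Counter {
    int value;
    int step;
    // `this` is an implicit extra parameter; nothing is stored in the object for it.
    int advance(int times) { value += step * times; return value; }
};

// A plain struct with the same fields and no methods at all.
struct CounterData {
    int value;
    int step;
};

// Roughly what the method amounts to after compilation (an assumption about
// typical code generation, not a guarantee): a free function taking `self`.
int Counter_advance(CounterData *self, int times) {
    self->value += self->step * times;
    return self->value;
}

// The method does not change the memory layout of the instance.
static_assert(std::is_standard_layout<Counter>::value, "laid out like a C struct");
static_assert(sizeof(Counter) == sizeof(CounterData), "no hidden per-object data");
static_assert(offsetof(Counter, step) == offsetof(CounterData, step), "same field offsets");
```

Calling `c.advance(3)` on a `Counter` and `Counter_advance(&d, 3)` on a `CounterData` with the same field values produces the same result and leaves both objects in the same state.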

For classical dynamic dispatch in a static type system, there is basically one predominant strategy: vtables. Every instance gets one additional pointer that refers to (a limited representation of) its type, most importantly the vtable: an array of function pointers, one per method. Since the full set of methods for every type (in the inheritance chain) is known at compile time, one can assign consecutive indices (0..N-1 for N methods) to the methods and invoke a method by looking up its function pointer in the vtable at that index (again passing the instance reference as an additional parameter).
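A hand-written sketch of that layout, without using the `virtual` keyword, might look like the following. All names (`Shape`, `ShapeVTable`, `Square`, `Rect`, `call_area`) are hypothetical, and real compilers add more to the vtable (type information, offsets for multiple inheritance), so treat this as the bare idea rather than any particular ABI.

```cpp
struct Shape;  // forward declaration so the vtable can mention it

// One slot per virtual method; slot positions are fixed at compile time.
struct ShapeVTable {
    int (*area)(const Shape *self);   // slot 0
    int (*sides)(const Shape *self);  // slot 1
};

// Every instance carries one extra pointer to the vtable of its class.
struct Shape {
    const ShapeVTable *vptr;
};

// "Derived classes": the base part comes first, so a pointer to the base
// member is also a pointer to the whole object.
struct Square { Shape base; int side; };
struct Rect   { Shape base; int w; int h; };

int Square_area(const Shape *self) {
    const Square *s = reinterpret_cast<const Square *>(self);
    return s->side * s->side;
}
int Rect_area(const Shape *self) {
    const Rect *r = reinterpret_cast<const Rect *>(self);
    return r->w * r->h;
}
// A method both classes share: the same function pointer fills both vtables.
int four_sides(const Shape *) { return 4; }

// One vtable per class, shared by all instances of that class.
const ShapeVTable Square_vtable = { Square_area, four_sides };
const ShapeVTable Rect_vtable   = { Rect_area, four_sides };

Square make_square(int side) { return Square{ Shape{ &Square_vtable }, side }; }
Rect make_rect(int w, int h) { return Rect{ Shape{ &Rect_vtable }, w, h }; }

// A "virtual call": load the vptr, load the slot, call indirectly while
// passing the instance as the additional first argument.
int call_area(const Shape *obj) { return obj->vptr->area(obj); }
```

`call_area` never needs to know the concrete type: given `&square.base` or `&rect.base`, the vtable pointer stored in the object selects the right function.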

For more dynamic class-based languages, classes themselves are typically first-class objects, and each object instead holds a reference to its class object. The class object, in turn, owns the methods in some language-dependent manner (in Ruby, methods are a core part of the object model; in Python, they're just function objects with tiny wrappers around them). The classes typically store references to their superclass(es) as well and delegate the search for inherited methods to them, so that metaprogramming which adds or alters methods at run time is picked up by every subclass.
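The lookup scheme can be modeled in a few lines of C++. This is only a toy model under my own assumptions (single inheritance, methods keyed by name, every method taking one `int`); `Class`, `Object`, `Method` and `send` are invented names, and real Ruby or Python implementations are considerably more involved.

```cpp
#include <functional>
#include <string>
#include <unordered_map>

struct Object;

// Simplifying assumption: every method takes the receiver and one int.
using Method = std::function<int(Object &self, int arg)>;

// A class is itself a run-time object: a mutable method table plus a
// link to its superclass.
struct Class {
    const Class *superclass;
    std::unordered_map<std::string, Method> methods;

    // Walk up the superclass chain until some class defines the method.
    const Method *lookup(const std::string &name) const {
        for (const Class *c = this; c != nullptr; c = c->superclass) {
            auto it = c->methods.find(name);
            if (it != c->methods.end()) return &it->second;
        }
        return nullptr;
    }
};

// Each object only stores a reference to its class, plus its own state.
struct Object {
    Class *klass;
    int state;
};

// Dispatch by name at run time. Returns false when no class in the chain
// defines the method (a real language would raise an error here).
bool send(Object &obj, const std::string &name, int arg, int &result) {
    const Method *m = obj.klass->lookup(name);
    if (m == nullptr) return false;
    result = (*m)(obj, arg);
    return true;
}
```

Because subclasses search their superclass at call time instead of copying its methods, a method added to the base class later is immediately visible through every existing instance of a derived class.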

There are many other systems that aren't based on classes, but they differ significantly, so I'll only pick out one interesting design alternative: when you can add new (sets of) methods to all types at will anywhere in the program (e.g. type classes in Haskell and traits in Rust), the full set of methods isn't known when the type itself is compiled. To resolve this, one creates a vtable per trait implementation and passes it along wherever that trait is required. That is, code like this:

void needs_a_trait(SomeTrait &x) { x.method2(1); }
ConcreteType x = ...;
needs_a_trait(x);

is compiled down to this:

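// One vtable per (trait, type) implementation, stored outside the object.
functionpointer SomeTrait_ConcreteType_vtable[] = { &method1, &method2, ... };
// The trait parameter becomes an untyped instance pointer plus the matching vtable.
void needs_a_trait(void *x, functionpointer vtable[]) { vtable[1](x, 1); }  // index 1 = method2
ConcreteType x = ...;
needs_a_trait(&x, SomeTrait_ConcreteType_vtable);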

This also means the vtable information isn't embedded in the object. If one wants references to an "instance of a trait" that behave correctly when, for example, stored in data structures that contain many different types, one can create a fat pointer (instance_pointer, trait_vtable). This is actually a generalization of the above strategy.
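Here is a compilable sketch of the fat-pointer variant. `SomeTrait` and its two methods follow the pseudocode above; the concrete types `Meters` and `Pair`, the method bodies, and the `SomeTraitRef` struct are my own placeholders, and this is a manual emulation rather than what any particular Rust or Haskell compiler literally emits.

```cpp
#include <vector>

// Vtable for a hypothetical trait SomeTrait with two methods.
struct SomeTraitVTable {
    int (*method1)(const void *self);           // slot 0
    int (*method2)(const void *self, int arg);  // slot 1
};

// Two unrelated concrete types; neither stores a vtable pointer.
struct Meters { int value; };
struct Pair   { int a; int b; };

// Implementation of SomeTrait for Meters.
int Meters_method1(const void *self) {
    return static_cast<const Meters *>(self)->value;
}
int Meters_method2(const void *self, int arg) {
    return static_cast<const Meters *>(self)->value * arg;
}

// Implementation of SomeTrait for Pair.
int Pair_method1(const void *self) {
    const Pair *p = static_cast<const Pair *>(self);
    return p->a + p->b;
}
int Pair_method2(const void *self, int arg) {
    const Pair *p = static_cast<const Pair *>(self);
    return (p->a + p->b) * arg;
}

// One vtable per (trait, type) pair, living outside the objects.
const SomeTraitVTable SomeTrait_Meters_vtable = { Meters_method1, Meters_method2 };
const SomeTraitVTable SomeTrait_Pair_vtable   = { Pair_method1, Pair_method2 };

// The fat pointer: instance pointer and trait vtable travel together.
struct SomeTraitRef {
    const void *instance;
    const SomeTraitVTable *vtable;
};

// Same call as in the pseudocode above: invoke method2 with argument 1.
int needs_a_trait(SomeTraitRef x) { return x.vtable->method2(x.instance, 1); }

// A heterogeneous collection works because each element brings its own vtable.
int sum_method1(const std::vector<SomeTraitRef> &items) {
    int total = 0;
    for (const SomeTraitRef &r : items) total += r.vtable->method1(r.instance);
    return total;
}
```

The objects stay as small as their fields (`sizeof(Meters) == sizeof(int)`); the cost of dynamic dispatch moves into the reference, which is two pointers wide instead of one.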
