Compiler Design – How Are Generics Implemented in a Modern Compiler?

compilerdesign-patternsprogramming-languages

What I mean here is how do we go from some template T add(T a, T b) ... into the generated code? I've thought of a few ways to achieve this, we store the generic function in an AST as Function_Node and then each time we use it we store in the original function node a copy of itself with all of the types T substituted with the types that are being used. For example add<int>(5, 6) will store a copy of the generic function for add and substitute all types T in the copy with int.

So it would look something like:

struct Function_Node {
    std::string name; // etc.
    Type return_type;
    std::vector<std::pair<Type, std::string>> arguments;
    std::vector<Function_Node> copies;
};

Then you could generate code for these and when you visit a Function_Node where the list of copies copies.size() > 0, you invoke visitFunction on all of the copies.

visitFunction(Function_Node& node) {
    if (node.copies.size() > 0) {
        for (auto& node : nodes.copies) {
            visitFunction(node);
        }
        // it's a generic function so we don't want
        // to emit code for this.
        return;
    }
}

Would this work out well? How do modern compilers approach this problem? I think perhaps another way to do this would be you could inject the copies into the AST so that it runs through all of the semantic phases. I also thought perhaps you could generate them in an immediate form like Rust's MIR or Swifts SIL for example.

My code is written in Java, the examples here are C++ because it's a bit less verbose for examples – but the principle is basically the same thing. Though there may be a few errors because it's just written out by hand in the question box.

Note that I mean modern compiler as in what is the best way to approach this problem. And when I say generics I don't mean like Java generics where they use type erasure.

Best Answer

How are generics implemented in a modern compiler?

I invite you to read the source code of a modern compiler if you wish to know how a modern compiler works. I'd start with the Roslyn project, which implements C# and Visual Basic compilers.

In particular I draw your attention to the code in the C# compiler which implements type symbols:

https://github.com/dotnet/roslyn/tree/master/src/Compilers/CSharp/Portable/Symbols

and you might also want to look at the code for convertibility rules. There is much there that pertains to the algebraic manipulation of generic types.

https://github.com/dotnet/roslyn/tree/master/src/Compilers/CSharp/Portable/Binder/Semantics/Conversions

I tried hard to make the latter easy to read.

I've thought of a few ways to achieve this, we store the generic function in an AST as Function_Node and then each time we use it we store in the original function node a copy of itself with all of the types T substituted with the types that are being used.

You are describing templates, not generics. C# and Visual Basic have actual generics in their type systems.

Briefly, they work like this.

  • We begin by establishing rules for what formally constitutes a type at compile time. For example: int is a type, a type parameter T is a type, for any type X, the array type X[] is also a type, and so on.

  • The rules for generics involve substitution. For example, class C with one type parameter is not a type. It's a pattern for making types. class C with one type parameter called T, under substitution with int for T is a type.

  • Rules describing the relationships between types -- compatibility upon assignment, how to determine the type of an expression, and so on -- are designed and implemented in the compiler.

  • A bytecode language that supports generic types in its metadata system is designed and implemented.

  • At runtime the JIT compiler turns the bytecode into machine code; it is responsible for constructing the appropriate machine code given a generic specialization.

So for example, in C# when you say

class C<T> { public void X(T t) { Console.WriteLine(t); } }
...
var c = new C<int>(); 
c.X(123);

then the compiler verifies that in C<int>, the argument int is a valid substitution for T, and generates metadata and bytecode accordingly. At runtime, the jitter detects that a C<int> is being created for the first time and generates the appropriate machine code dynamically.

Related Topic