Compiler – Storing Individual Lines of Code and Functions in a Concrete Syntax Tree

compiler

I'm trying to write a simple compiler for learning purposes. I've been reading the Dragon Book and Modern Compiler Design and one part I don't understand is how the Concrete Syntax Tree is actually created and stored.

I understand that by looping through the Tokens produced by the Lexer it's a simple matter to collect all the pieces of an assignment operator; for example:

int i = 0;

is pretty straight-forward to collect the type, identifier and that we're assigning a value of const_number zero. And I understand how this concrete syntax tree looks like.

And if it's assigned like an expressions like:

int i = a * b;

I also understand what this concrete syntax tree would look like.

But then let's say I have:

int i = functionCall();

What does this look like in a concrete syntax tree?

And further, considering a language like C that's a bunch of functions, with one of them, the main function being denoted as the entry point; how does this all fit into a concrete syntax tree?

Does each one have its own tree?

The creation of a heirarchy of Node types for my tree, each with the specific components it needs makes sense to me; but not how this factors in function calls; unless every single function was inlined.

Additional Info from Comments

So, say I have some code that looks like:

int AddProc(int i, int j)
{
    return i + j;
}

void main()
{
    int x = 8;
    int y = 0;
    int z = x + y;
    x = AddProc(y,z);
}

The Token stream starts from the top to the bottom; simple; each token tells the Parser if it's a TYPE or ID or CONST or ADD_OP whatever. The first stage of the parser is to produce a Concrete Syntax Tree, that's then turned into an Abstract Syntax Tree.

My question is what does the Concrete Syntax Tree look like for the above; and further, the AST as well?

Best Answer

Concrete syntax trees follow directly from the grammatical production rules of the language (i.e. its grammar). They are complex, wordy and offer little benefit for the next phases of the compiler (analysis and code generation).

I don't think that most compilers represent or store concrete syntax (let alone concrete syntax trees), concrete syntax is at best manifest within the parsing algorithm itself (for example, sometimes using recursion); generally on successfully parsing something the parser generates some intermediate data structure, and if that is a tree, it is likely more reflective of an abstract syntax tree.

Look at the diagram for the "Parse Tree" in this answer and https://stackoverflow.com/a/10176731/471129 , and compare with the AST further on in @Guy's answer.

http://eli.thegreenplace.net/2009/02/16/abstract-vs-concrete-syntax-trees/

Related Topic