Java Parsing – AST Construction in LL1 Non-Recursive Parser

javaparsingsyntaxtrees

I have implemented a LL1 parser in a non recursive approach with a explicit stack.

The following algorithm is from the Dragon Book:

set zp to point to the first symbol of w;
set X to the top stack symbol;
while ( X != $ ) { /* stack is not empty */
    if ( X is a ) 
       pop the stack and advance zp;
    else if ( X is a terminal )
       error();
    else if ( M[X, a] is an error entry )
       error();
    else if ( M[X,a] = X -+ Y1Y2 Yk ) {
       output the production X -+ YlY2 - . Yk;
       pop the stack;
       push Yk, Yk-1,. . . , Yl onto the stack, with Yl on top;

    set X to the top stack symbol;
}

The book says:

The parser is controlled by a program that considers X, the symbol on
top of the stack, and a, the current input symbol. If X is a
nonterminal, the parser chooses an X-production by consulting entry
M[X, a] of the parsing table IM. (Additional code could be executed
here, for example, code to construct a node in a parse tree.)
Otherwise, it checks for a match between the terminal X and current
input symbol a.

However i need more info on how to construct the expression tree nodes under this approach. I have a node hierarchy of UnaryOperator, BinaryOperator, etc but dont know where to instanciate it.

Yet i havent found any simple example of this (with for example the arithmetic language).

Best Answer

It happens just as the cited text from the book explains: when you expand a nonterminal via its grammar rule (given by M[X, a]), then you can create a corresponding node.

Say you have rules of the following form:

Term -> Factor Term'
Term' -> * Term | / Term | ε

Factor -> x | y | ... (simplified for individual numbers, letters, what-have-you)

Then, once you expand Term -> Factor Term' you can create a Term node with two child nodes. When you successfully parse the first number via the Factor -> ... rule (this is the first if in your example code now) you can attribute this number to the already created Factor node.

Next, you expand for example Term' -> * Term via M[Term',*] and create a new Term node.

Continuing, you will parse the * and annotate it at your Term' node, expand Term -> Factor Term' once more, thus creating two more nodes, successfully parse a Factor and annotate its number to the second Factor node and finally, on end of input you will parse Term' via the epsilon production (M[Term',$] = ε), which tells you that you can remove that Term' node (though that may be optional).

What you end up with for an input string like 3*4 is then a tree like this:

Term ( Factor(3), Term' (*, Term ( Factor(4) ) ) )

In a post-processing step, you could simplify the resulting tree, as nonterminals like Term' stem from making the grammar non-left-recursive, but are otherwise unsuitable for the resulting AST, so you would want to reverse the grammar transformation on your resulting tree to get something like this:

Mult (Number(3), Number(4))

Best Answer

Related Solutions

Techniques for parsing XML

How to Get Lookahead Symbol When Constructing LR(1) NFA for Parser

Related Topic