I have implemented a LL1 parser in a non recursive approach with a explicit stack.
The following algorithm is from the Dragon Book:
set zp to point to the first symbol of w;
set X to the top stack symbol;
while ( X != $ ) { /* stack is not empty */
if ( X is a )
pop the stack and advance zp;
else if ( X is a terminal )
error();
else if ( M[X, a] is an error entry )
error();
else if ( M[X,a] = X -+ Y1Y2 Yk ) {
output the production X -+ YlY2 - . Yk;
pop the stack;
push Yk, Yk-1,. . . , Yl onto the stack, with Yl on top;
set X to the top stack symbol;
}
The book says:
The parser is controlled by a program that considers X, the symbol on
top of the stack, and a, the current input symbol. If X is a
nonterminal, the parser chooses an X-production by consulting entry
M[X, a] of the parsing table IM. (Additional code could be executed
here, for example, code to construct a node in a parse tree.)
Otherwise, it checks for a match between the terminal X and current
input symbol a.
However i need more info on how to construct the expression tree nodes under this approach. I have a node hierarchy of UnaryOperator, BinaryOperator, etc but dont know where to instanciate it.
Yet i havent found any simple example of this (with for example the arithmetic language).
Best Answer
It happens just as the cited text from the book explains: when you expand a nonterminal via its grammar rule (given by
M[X, a]
), then you can create a corresponding node.Say you have rules of the following form:
Then, once you expand
Term -> Factor Term'
you can create aTerm
node with two child nodes. When you successfully parse the first number via theFactor -> ...
rule (this is the firstif
in your example code now) you can attribute this number to the already createdFactor
node.Next, you expand for example
Term' -> * Term
viaM[Term',*]
and create a newTerm
node.Continuing, you will parse the
*
and annotate it at yourTerm'
node, expandTerm -> Factor Term'
once more, thus creating two more nodes, successfully parse aFactor
and annotate its number to the secondFactor
node and finally, on end of input you will parseTerm'
via the epsilon production (M[Term',$] = ε
), which tells you that you can remove thatTerm'
node (though that may be optional).What you end up with for an input string like 3*4 is then a tree like this:
Term ( Factor(3), Term' (*, Term ( Factor(4) ) ) )
In a post-processing step, you could simplify the resulting tree, as nonterminals like
Term'
stem from making the grammar non-left-recursive, but are otherwise unsuitable for the resulting AST, so you would want to reverse the grammar transformation on your resulting tree to get something like this:Mult (Number(3), Number(4))