Background
I'm currently designing my own programming language as a research project.
I have most of the grammar done and written down as context-free grammar, and it should be working as is. – Now I'm working on the actual compiler that should translate the language into x86 binary assembly code
, more specifically, I am working on the parser
(the front end).
The language's syntax is, for the most part, very similar to Java/C/C++.
The parser, which constructs an intermediate representation out of source code, works as follows:
The grammar is built as a big tree in which the actual source code only determines the leaves; Each syntactic variable (or nonterminal) has it's own class, and each of these classes has a static get(byte[] source, int offset)
method which returns a new leaf (node), or null
if the source code syntax does not fit this nonterminal structure.
I am using a variation of predictive parsing
, by the way.
For the nonterminal DataType
, I have chosen the following grammatical structure:
DataType:
PrimitiveDataType
| ArrayDataType
| ComplexDataType
ArrayDataType:
DataType []
Did I mention this language is object-oriented?
So the problem here is that when DataType
's get
method is called, it first checks whether the following is a primitive data type, PrimitiveDataType
's get
method is called. Assuming we have an array, this would return null
, so it continues on to check whether it's an ArrayDataType
, by calling it's get
method.
Problem
Arrays may be created of any data type, including arrays themselves (which would look like Type[][]
). So what ArrayDataType
's get
method would do is again call DataType
's get
method to figure out what type the array is of.
Unfortunately, this is where my parser implementation would fail because this behavior results in a loop!
Question
Would there be any good/better design alternatives to this?
Best Answer
Predictive parser you selected (LL(k)) means you will have to solve left-recursion problems. Algorithm for solving direct and indirect recursions is clearly described on wikipedia:
http://en.wikipedia.org/wiki/Left_recursion
Some info can be found in posts here on StackOverflow:
https://stackoverflow.com/questions/2652060/removing-left-recursion https://stackoverflow.com/questions/2999755/removing-left-recursion-in-antlr https://stackoverflow.com/questions/4994036/left-recursion-elimination
In human language (non-scientific :) "left-recursion problem" means you can't endlessly go into recursion with non-terminal (A -> Ab) again and again. At some time you HAVE TO feed parser algorithm with a terminal to breake a loop.
In BNF this could look like:
Recursion Problem:
One solution:
For your grammar this could look like:
If your parser generator doesn't allow empty productions and/or if you want to process array types separately, try something like this: