Abstract Syntax Tree – How It Executes Source Code

parsingprogramming-languages

After researching how a parser generates an AST, I believe that I can know attempt to create one. Before I started on this project, I began to ponder what I should be done next after creating a AST that represented my language grammar. Despite my research into this topic, I have not surfaced any quality resources explaining what should be done with the AST to execute the source code.

Take this example for instance:

var = 10 + 2.

A parser might create an AST that is similar to this:

     =
    / \
  var  +
      / \
     10  2

What would be done next with the example AST above. Would the parser record the variable and its value? Or does the parser simply generate the AST, and its up to some other program to evaluate the AST.

It seems to me that creating a AST is making more work for the rest of the program. Whatever reads over the AST has to keep track of all kinds of statements and scopes. Would it not make more sense to just group your tokens into statements, and execute each statement individually without a AST?

Note: My question is not a duplicate of Is an AST enough to build any translator?. The OP of that question is asking if a AST is enough to implement any language feature?. I'm asking how one would execute the source code of a language from an AST?. some parts of the post may be similar, but the overall questions of each one are very different.

Best Answer

The parser just dissects the code into logically meaningful parts. It is then up to a following processing step to do something with the parse result.

The following step (in your case a code interpreter) can focus on the semantics. The parser did all the dirty work, its result is clean and simple. So it does make things easier. Note that you start out with a piece of text (one string or a collection of lines) and the parser gives you an object model. It seems you have no clear picture of an AST, of what that means in terms of an implementation in code. So let's get into that.

An execution engine would read the tree and find one object: an assignnent. The assignment class contains two properties: left and right. Left is a variable object, right is an expression object.

The expression object is a tree too which is basically a bunch of objects that contain other objects. These are typically processed recursively, as the processor hits the first object in the tree it does not know or care how deep the tree is, it just processes that one object and delegates the processing of contained objects to the same code. When that code hits a leaf (an object that is not complex, that has no contained objects) it performs an action and/or returns a result. Then everything bubbles back up the call stack until the top object gets its result and can do its thing, in your example add 10 and 2.

So what makes it easy is the ability to walk down a nested collection of objects and solve it recursively. Parsing text and acting on it in one go can get messy soon and may produce half-baked results because there may be syntax errors further down and you already started executing.

Related Solutions

Java Parsing – Better Design Alternatives for Special Cases

Predictive parser you selected (LL(k)) means you will have to solve left-recursion problems. Algorithm for solving direct and indirect recursions is clearly described on wikipedia:

http://en.wikipedia.org/wiki/Left_recursion

Some info can be found in posts here on StackOverflow:

https://stackoverflow.com/questions/2652060/removing-left-recursion https://stackoverflow.com/questions/2999755/removing-left-recursion-in-antlr https://stackoverflow.com/questions/4994036/left-recursion-elimination

In human language (non-scientific :) "left-recursion problem" means you can't endlessly go into recursion with non-terminal (A -> Ab) again and again. At some time you HAVE TO feed parser algorithm with a terminal to breake a loop.

In BNF this could look like:

Recursion Problem:

NT: NT T
NT: T

One solution:

NT: T NT2
NT2: T NT2
NT2:

For your grammar this could look like:

DataType:
    PrimitiveDataType ArrayDimensions
 |  ComplexDataType ArrayDimensions

ArrayDimensions:
    [] ArrayDimensions
 |

If your parser generator doesn't allow empty productions and/or if you want to process array types separately, try something like this:

DataType:
    DataTypeName
 |  ArrayDataType

ArrayDataType:
    DataTypeName ArrayDimensions

DataTypeName:
    PrimitiveDataType
 |  ComplexDataType

ArrayDimensions:
    []
 |  [] ArrayDimensions

ANTLR vs Parsing Libraries – When to Use Each

You could use either to perform either task. The difference is what each is meant for. You can draw pictures or diagrams in Excel if you want to, but you can also draw a picture in something built for that purpose.

JSON and XML libraries are designed around general purpose loading of documents, pulling parts out of them, or transforming their structures into different structures, etc.

ANTLR on the other hand is a tool designed for generating parsers for compilers. It is tailored specifically to suit the needs of that task.

If you use an xml or json parser to parse either of those, you'll ultimately end up writing a bunch of code that transforms your input into an AST of some sort, for you to process. So, whether you want to write and debug all of that, or use something that gives it to you up front, that's up to you.

Best Answer

Related Solutions

Java Parsing – Better Design Alternatives for Special Cases

ANTLR vs Parsing Libraries – When to Use Each

Related Topic