Abstract Syntax Tree – How It Executes Source Code

parsingprogramming-languages

After researching how a parser generates an AST, I believe that I can know attempt to create one. Before I started on this project, I began to ponder what I should be done next after creating a AST that represented my language grammar. Despite my research into this topic, I have not surfaced any quality resources explaining what should be done with the AST to execute the source code.

Take this example for instance:

var = 10 + 2.

A parser might create an AST that is similar to this:

     =
    / \
  var  +
      / \
     10  2

What would be done next with the example AST above. Would the parser record the variable and its value? Or does the parser simply generate the AST, and its up to some other program to evaluate the AST.

It seems to me that creating a AST is making more work for the rest of the program. Whatever reads over the AST has to keep track of all kinds of statements and scopes. Would it not make more sense to just group your tokens into statements, and execute each statement individually without a AST?


Note: My question is not a duplicate of Is an AST enough to build any translator?. The OP of that question is asking if a AST is enough to implement any language feature?. I'm asking how one would execute the source code of a language from an AST?. some parts of the post may be similar, but the overall questions of each one are very different.

Best Answer

The parser just dissects the code into logically meaningful parts. It is then up to a following processing step to do something with the parse result.

The following step (in your case a code interpreter) can focus on the semantics. The parser did all the dirty work, its result is clean and simple. So it does make things easier. Note that you start out with a piece of text (one string or a collection of lines) and the parser gives you an object model. It seems you have no clear picture of an AST, of what that means in terms of an implementation in code. So let's get into that.

An execution engine would read the tree and find one object: an assignnent. The assignment class contains two properties: left and right. Left is a variable object, right is an expression object.

The expression object is a tree too which is basically a bunch of objects that contain other objects. These are typically processed recursively, as the processor hits the first object in the tree it does not know or care how deep the tree is, it just processes that one object and delegates the processing of contained objects to the same code. When that code hits a leaf (an object that is not complex, that has no contained objects) it performs an action and/or returns a result. Then everything bubbles back up the call stack until the top object gets its result and can do its thing, in your example add 10 and 2.

So what makes it easy is the ability to walk down a nested collection of objects and solve it recursively. Parsing text and acting on it in one go can get messy soon and may produce half-baked results because there may be syntax errors further down and you already started executing.