Visitor Pattern – Traversing an AST Using Visitors

Tags: c, data structures, design-patterns, parsing, visitor-pattern

I'm writing a compiler for a C-like language, and I'm looking for an elegant way to traverse my abstract syntax tree. I'm trying to implement the Visitor pattern, although I'm not convinced that I'm doing it correctly.

struct Visitor {
    // Visitors are used polymorphically, so give the base a virtual destructor
    virtual ~Visitor() = default;

    // Expressions
    virtual void visit(AsgnExpression&);
    virtual void visit(ConstantExpression&);
    ...
    virtual void visit(Statement&);
    ...

    virtual void finished(ASTNode&);

protected:
    virtual void visit(ASTNode&) = 0;
};

visit is overloaded for each type, and by default each overload will call visit(ASTNode&) which subclasses are forced to implement. This makes it easier to do quick and dirty things, although defining a visit for each type is tedious. Each subclass of ASTNode must implement an accept method which is used to traverse the tree structure.

class ASTNode {
public:
    virtual ~ASTNode();
    virtual void accept(Visitor& visitor) = 0;
};

However, this design is quickly becoming tedious because the accept methods are often very similar.
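
To make the repetition concrete, here is a minimal sketch of two node types and their accept methods. It assumes a simplified Visitor with two pure-virtual overloads (the fallback to visit(ASTNode&) from the question is left out), and the fields and the NameLogger visitor are hypothetical names chosen for illustration.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct ConstantExpression;
struct AsgnExpression;

// Simplified visitor: one overload per concrete node type (assumption).
struct Visitor {
    virtual ~Visitor() = default;
    virtual void visit(ConstantExpression&) = 0;
    virtual void visit(AsgnExpression&) = 0;
};

struct ASTNode {
    virtual ~ASTNode() = default;
    virtual void accept(Visitor& visitor) = 0;
};

struct ConstantExpression : ASTNode {
    int value;  // hypothetical field
    explicit ConstantExpression(int v) : value(v) {}
    void accept(Visitor& visitor) override { visitor.visit(*this); }
};

struct AsgnExpression : ASTNode {
    std::string target;              // hypothetical field
    std::unique_ptr<ASTNode> value;  // hypothetical field
    AsgnExpression(std::string t, std::unique_ptr<ASTNode> v)
        : target(std::move(t)), value(std::move(v)) {}
    void accept(Visitor& visitor) override {
        visitor.visit(*this);    // same text as above; only the static type of *this differs
        value->accept(visitor);  // node-driven traversal into the child
    }
};

// Hypothetical visitor that records the order in which nodes are visited.
struct NameLogger : Visitor {
    std::vector<std::string> log;
    void visit(ConstantExpression& n) override {
        log.push_back("const " + std::to_string(n.value));
    }
    void visit(AsgnExpression& n) override { log.push_back("assign " + n.target); }
};
```

Every accept body starts with the same `visitor.visit(*this)` line. It cannot simply be moved into ASTNode, because there `*this` would have the static type ASTNode and a different overload would be selected.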

Who should be responsible for traversing the structure, the nodes or the visitor? I'm leaning towards having ASTNode provide an iterator for accessing its children, and then having the visitor traverse the structure. If you have any experience designing Abstract Syntax Trees, please share your wisdom with me!
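
The "nodes expose their children, the visitor walks them" idea could look roughly like the sketch below. It assumes a hypothetical `children` vector and `label` field on ASTNode and reuses the visit(ASTNode&) and finished(ASTNode&) hooks from the Visitor above. Per-type dispatch is omitted to keep the traversal itself visible.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct ASTNode {
    std::string label;                               // hypothetical, for the demo only
    std::vector<std::unique_ptr<ASTNode>> children;  // hypothetical child access
    explicit ASTNode(std::string l) : label(std::move(l)) {}
    virtual ~ASTNode() = default;
};

struct Visitor {
    virtual ~Visitor() = default;

    // The visitor owns the traversal: pre-order hook, children, post-order hook.
    void traverse(ASTNode& node) {
        visit(node);
        for (auto& child : node.children) traverse(*child);
        finished(node);
    }

protected:
    virtual void visit(ASTNode&) = 0;
    virtual void finished(ASTNode&) {}
};

// Hypothetical visitor that prints the tree shape as nested parentheses.
struct TraceVisitor : Visitor {
    std::string trace;

protected:
    void visit(ASTNode& n) override { trace += "(" + n.label; }
    void finished(ASTNode&) override { trace += ")"; }
};

std::string traceOf(ASTNode& root) {
    TraceVisitor v;
    v.traverse(root);
    return v.trace;
}
```

With this split, the nodes carry no traversal code at all, and a visitor that wants a different order can override or replace `traverse`.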

Best Answer

Who is responsible for the traversal depends largely on the analysis you want to do in your visitors and on the details of the language structure, and partly on personal preference.

In particular, if the visitor for a parent node needs to take an action partway through processing the children, then you must put the traversal logic in the visitor. One example is a language construct in which a newly introduced variable is visible only in some of the child nodes of the node that introduces it.
Another case is when you need a mixture of pre-order and post-order traversal. With traversal in the nodes, each node must call the visitor twice, once before and once after the children. In that case, it might be easier to let the visitor do the traversal.
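
As an illustration of the "action halfway through the children" case, here is a minimal sketch in which accept only dispatches and the visitor drives the traversal. The `Let` node (read as `let name = init in body`, with `name` visible only in `body`), the `VarRef` node, and the ScopeChecker visitor are all hypothetical and not taken from the question's language.

```cpp
#include <memory>
#include <set>
#include <string>
#include <utility>
#include <vector>

struct VarRef;
struct Let;

struct Visitor {
    virtual ~Visitor() = default;
    virtual void visit(VarRef&) = 0;
    virtual void visit(Let&) = 0;
};

struct Node {
    virtual ~Node() = default;
    virtual void accept(Visitor& v) = 0;  // dispatch only, no traversal here
};

struct VarRef : Node {
    std::string name;
    explicit VarRef(std::string n) : name(std::move(n)) {}
    void accept(Visitor& v) override { v.visit(*this); }
};

// Hypothetical construct: `name` is visible in `body` but not in `init`.
struct Let : Node {
    std::string name;
    std::unique_ptr<Node> init, body;
    Let(std::string n, std::unique_ptr<Node> i, std::unique_ptr<Node> b)
        : name(std::move(n)), init(std::move(i)), body(std::move(b)) {}
    void accept(Visitor& v) override { v.visit(*this); }
};

// Collects references to names that are not in scope at the point of use.
struct ScopeChecker : Visitor {
    std::multiset<std::string> visible;  // multiset so shadowing nests correctly
    std::vector<std::string> undefined;

    void visit(VarRef& ref) override {
        if (visible.count(ref.name) == 0) undefined.push_back(ref.name);
    }
    void visit(Let& let) override {
        let.init->accept(*this);                // first child: name not yet visible
        visible.insert(let.name);               // action between the two children
        let.body->accept(*this);                // second child: name is visible
        visible.erase(visible.find(let.name));  // post-order step: leave the scope
    }
};

std::vector<std::string> undefinedNames(Node& root) {
    ScopeChecker checker;
    root.accept(checker);
    return checker.undefined;
}
```

If the traversal lived in `Let::accept` instead, the node would need extra callbacks into the visitor between its children to get the same effect.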

Otherwise, it is mostly a matter of preference. The traversal can be either in the nodes or in the visitor.

Related Solutions

C# Visitor Pattern – Using with Large Object Hierarchy

Do not use overloads in the interface of the visitor

Put the type into the method name, i.e., use

interface IExpressionVisitor {
    void VisitPrimitive(IPrimitiveExpression expr);
    void VisitComposite(ICompositeExpression expr);
}

rather than

interface IExpressionVisitor {
    void Visit(IPrimitiveExpression expr);
    void Visit(ICompositeExpression expr);
}
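
The answer does not spell out its reasoning. One pitfall that is often cited, shown here as a C++ sketch with hypothetical types, is that overloads are selected from the static type of the argument. A call made through a base-class reference therefore silently picks the base overload.

```cpp
#include <string>

struct Expression {
    virtual ~Expression() = default;
};
struct PrimitiveExpression : Expression {};

// Overloaded style: which visit() runs is decided at compile time.
struct OverloadedVisitor {
    std::string visit(const Expression&) { return "Expression"; }
    std::string visit(const PrimitiveExpression&) { return "PrimitiveExpression"; }
};

// Named style: the call site states which method it means.
struct NamedVisitor {
    std::string visitExpression(const Expression&) { return "Expression"; }
    std::string visitPrimitive(const PrimitiveExpression&) { return "PrimitiveExpression"; }
};

std::string describeViaOverload(const Expression& e) {
    OverloadedVisitor v;
    return v.visit(e);  // always the Expression overload, whatever e really is
}

std::string describeViaNamedMethod(const Expression& e) {
    NamedVisitor v;
    // v.visitPrimitive(e) would not compile here, which exposes the mistake
    // instead of silently choosing a different method.
    return v.visitExpression(e);
}
```

Passing a PrimitiveExpression to describeViaOverload still yields "Expression". With named methods the same mistake is either visible in the source or rejected by the compiler.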

Add a "catch unknown" method to your visitor interface

This lets users who cannot modify your code still support expression types of their own:

interface IExpressionVisitor {
    void VisitPrimitive(IPrimitiveExpression expr);
    void VisitComposite(ICompositeExpression expr);
    void VisitExpression(IExpression expr);
}

This would let them build their own implementations of IExpression, along with an IVisitor implementation that "understands" their expressions by using run-time type information in its catch-all VisitExpression method.
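
A rough C++ rendering of the catch-all idea, using `dynamic_cast` as the run-time type check. The "library" part mirrors the interface above in simplified form. RegexExpression and Describer are hypothetical user-side types that are not part of the original answer.

```cpp
#include <string>
#include <utility>

struct PrimitiveExpression;
struct Expression;

// "Library" side: users are assumed unable to change this.
struct ExpressionVisitor {
    virtual ~ExpressionVisitor() = default;
    virtual void visitPrimitive(PrimitiveExpression&) = 0;
    virtual void visitExpression(Expression&) = 0;  // catch-all for unknown types
};

struct Expression {
    virtual ~Expression() = default;
    virtual void accept(ExpressionVisitor& v) = 0;
};

struct PrimitiveExpression : Expression {
    void accept(ExpressionVisitor& v) override { v.visitPrimitive(*this); }
};

// "User" side: a new expression type routed through the catch-all.
struct RegexExpression : Expression {
    std::string pattern;
    explicit RegexExpression(std::string p) : pattern(std::move(p)) {}
    void accept(ExpressionVisitor& v) override { v.visitExpression(*this); }
};

struct Describer : ExpressionVisitor {
    std::string result;
    void visitPrimitive(PrimitiveExpression&) override { result = "primitive"; }
    void visitExpression(Expression& e) override {
        // Run-time type information recovers the user's own type.
        if (auto* regex = dynamic_cast<RegexExpression*>(&e)) {
            result = "regex " + regex->pattern;
        } else {
            result = "unknown";
        }
    }
};

std::string describe(Expression& e) {
    Describer d;
    e.accept(d);
    return d.result;
}
```

The cost of this escape hatch is that the compiler no longer checks whether every user-defined type is handled. Unhandled types end up in the `else` branch.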

Provide a default do-nothing implementation of the `IVisitor` interface

This would let users who need to deal with a subset of expression types build their visitors faster, and make their code immune to you adding more methods to IVisitor. For example, writing a visitor that harvests all variable names from your expressions becomes an easy task, and the code will not break even if you add a bunch of new expression types to your IVisitor later on.
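
The variable-harvesting example could be sketched in C++ as follows. It assumes the nodes drive the traversal (so a do-nothing visitor still reaches every node), and all type names are hypothetical.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct VariableExpression;
struct ConstantExpression;
struct SumExpression;

struct ExpressionVisitor {
    virtual ~ExpressionVisitor() = default;
    virtual void visitVariable(VariableExpression&) = 0;
    virtual void visitConstant(ConstantExpression&) = 0;
    virtual void visitSum(SumExpression&) = 0;
};

// Default do-nothing implementation: subclasses override only what they need.
struct DefaultVisitor : ExpressionVisitor {
    void visitVariable(VariableExpression&) override {}
    void visitConstant(ConstantExpression&) override {}
    void visitSum(SumExpression&) override {}
};

struct Expression {
    virtual ~Expression() = default;
    virtual void accept(ExpressionVisitor& v) = 0;
};

struct VariableExpression : Expression {
    std::string name;
    explicit VariableExpression(std::string n) : name(std::move(n)) {}
    void accept(ExpressionVisitor& v) override { v.visitVariable(*this); }
};

struct ConstantExpression : Expression {
    double value;
    explicit ConstantExpression(double v) : value(v) {}
    void accept(ExpressionVisitor& v) override { v.visitConstant(*this); }
};

struct SumExpression : Expression {
    std::unique_ptr<Expression> left, right;
    SumExpression(std::unique_ptr<Expression> l, std::unique_ptr<Expression> r)
        : left(std::move(l)), right(std::move(r)) {}
    void accept(ExpressionVisitor& v) override {
        left->accept(v);  // traversal lives in the node (assumption of this sketch)
        right->accept(v);
        v.visitSum(*this);
    }
};

// Only one method is overridden; new node types would not break this class
// as long as DefaultVisitor gains a matching empty method.
struct VariableHarvester : DefaultVisitor {
    std::vector<std::string> names;
    void visitVariable(VariableExpression& v) override { names.push_back(v.name); }
};

std::vector<std::string> harvest(Expression& root) {
    VariableHarvester h;
    root.accept(h);
    return h.names;
}
```

If the visitor were responsible for traversal instead, the default implementation of visitSum would need to visit both operands rather than do nothing.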

Visitor Pattern – Implementing for an Abstract Syntax Tree

It is up to the visitor implementation to decide whether to visit child nodes and in which order. That's the whole point of the visitor pattern.

In order to adapt the visitor for more situations it is helpful (and quite common) to use generics like this (it's Java):

public interface ExpressionNodeVisitor<R, P> {
    R visitNumber(NumberNode number, P p);
    R visitBinary(BinaryNode expression, P p);
    // ...
}

And an accept method would look like this:

public interface ExpressionNode extends Node {
    <R, P> R accept(ExpressionNodeVisitor<R, P> visitor, P p);
    // ...
}

This lets you pass additional parameters to the visitor and retrieve a result from it. So, the expression evaluation can be implemented like this:

public class EvaluatingVisitor
    implements ExpressionNodeVisitor<Double, Void> {
    public Double visitNumber(NumberNode number, Void p) {
        // Parse the number and return it.
        return Double.valueOf(number.getText());
    }
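    public Double visitBinary(BinaryNode binary, Void p) {
        switch (binary.getOperator()) {
        case '+':
            return binary.getLeftOperand().accept(this, p)
                + binary.getRightOperand().accept(this, p);
        // More cases for other operators here.
        default:
            // Without this branch the method would not compile (missing return).
            throw new UnsupportedOperationException(
                "Unknown operator: " + binary.getOperator());
        }
    }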
}

The accept method parameter isn't used in the above example, but just believe me: it is quite useful to have one. For example, it can be a Logger instance to report errors to.
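
For readers following the C++ code from the question, here is a rough equivalent in which the extra parameter is actually used. C++ does not allow virtual member function templates, so this sketch fixes accept to a single instantiation (`double` result, error-log parameter). `ErrorLog` is a hypothetical stand-in for the Logger mentioned above, and the field names are assumptions.

```cpp
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

struct NumberNode;
struct BinaryNode;

template <typename R, typename P>
struct ExpressionNodeVisitor {
    virtual ~ExpressionNodeVisitor() = default;
    virtual R visitNumber(NumberNode& number, P p) = 0;
    virtual R visitBinary(BinaryNode& binary, P p) = 0;
};

using ErrorLog = std::vector<std::string>;  // hypothetical stand-in for a Logger
using EvalVisitor = ExpressionNodeVisitor<double, ErrorLog&>;

struct ExpressionNode {
    virtual ~ExpressionNode() = default;
    // Fixed to one instantiation because virtual templates are not allowed.
    virtual double accept(EvalVisitor& visitor, ErrorLog& errors) = 0;
};

struct NumberNode : ExpressionNode {
    std::string text;
    explicit NumberNode(std::string t) : text(std::move(t)) {}
    double accept(EvalVisitor& visitor, ErrorLog& errors) override {
        return visitor.visitNumber(*this, errors);
    }
};

struct BinaryNode : ExpressionNode {
    char op;
    std::unique_ptr<ExpressionNode> left, right;
    BinaryNode(char o, std::unique_ptr<ExpressionNode> l, std::unique_ptr<ExpressionNode> r)
        : op(o), left(std::move(l)), right(std::move(r)) {}
    double accept(EvalVisitor& visitor, ErrorLog& errors) override {
        return visitor.visitBinary(*this, errors);
    }
};

struct EvaluatingVisitor : EvalVisitor {
    double visitNumber(NumberNode& number, ErrorLog& errors) override {
        try {
            return std::stod(number.text);
        } catch (const std::exception&) {
            errors.push_back("bad number: " + number.text);
            return 0.0;  // arbitrary recovery value for the sketch
        }
    }
    double visitBinary(BinaryNode& binary, ErrorLog& errors) override {
        // The visitor decides to evaluate both operands, left first.
        double l = binary.left->accept(*this, errors);
        double r = binary.right->accept(*this, errors);
        switch (binary.op) {
        case '+': return l + r;
        case '*': return l * r;
        default:
            errors.push_back(std::string("unknown operator: ") + binary.op);
            return 0.0;
        }
    }
};
```

Because the log travels as a parameter rather than as visitor state, the same EvaluatingVisitor instance can be reused across evaluations that report to different logs.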
