C++ – Visitor Pattern, replacing objects

cvisitor-pattern

I have a program that translates a DSL to C++, which uses a Visitor pattern on the intermediate representation.

I quite often need to replace the currently processed node with one of a different type (e.g. replacing the "unresolved type" with the type definition).

What is a good pattern to do that?

I've tried:

  1. storing a pointer to the place where the current object is referenced from

    This is massively ugly, but keeps a "standard" visitor pattern, where the visitor itself keeps all of its state as member variables, and the visit method does not return a value.

    The difficulty here is that this is fairly error prone — if I forget to store the pointer before descending in the tree, I may overwrite the wrong reference.

  2. returning a replacement object from the visit method

    The next layer up is responsible for replacing the current object with the newly returned object — deletion is represented by returning a NULL pointer, and no change is represented by returning the old object.

    This is significantly easier to get right, because I can get diagnostics if I accidentally drop a return code, but I think it is still somewhat ugly.

Are there better options?

Best Answer

It is just an implementation detail that the visit() method is usually void when described in C++. This is by no means a central part of the visitor Pattern. In the “Design Patterns” book by Gamma, Helm, Johnson, Vlissides, code examples are usually given in C++ and Smalltalk. Smalltalk is a very dynamic language. The example code for the Visitor Pattern in Smalltalk in that book actually returns values! It is a regex evaluator. I'll translate the code to JavaScript to make it more understandable.

The object structure (regular expressions) is made of four classes, and all of them have an accept() method that takes the visitor as an argument. In class SequenceExpression, the accept method is

function accept(aVisitor) {
  return aVisitor.visitSequence(this);
}

[…]

The ConcreteVisitor class is REMatchingVisitor. […] Its methods […] return the set of streams that the expression would match to identify the current state.

function visitSequence(sequenceExp) {
  this.inputState = sequenceExp.expression1.accept(this);
  return sequenceExp.expression2.accept(this);
}

...

function visitAlternation(alternateExp) {
  var originalState = this.inputState;
  var finalState = alternateExp.alternative1.accept(this);
  this.inputState = originalState;
  finalState.addAll(alternateExp.alternative2.accept(this));
  return finalState;
}

function visitLiteral(literalExp) {
  var finalState = new Set();
  this.inputState.foreach(function (stream) {
    var tStream = stream.copy();
    if (tStream.nextAvailable(literalExp.value.size) == literalExp.value)) {
      finalState.add(tStream);
    }
  });
  return finalState;
}

So why isn't this done in C++? Why are all visit() methods void? Because the virtual accept() methods wouldn't know what to return. Each Visitor may return a different type. While the accept method should just pass that value through, C++ would need the precise type. This can be expressed with templates, but virtual methods can't be templated methods. (The point of a virtual method is that the implementation only gets resolved at runtime (late binding), whereas templates must be fully evaluated at compile time. Since at compile time it wouldn't be known which method would be selected, the template wouldn't be invoked).

The usual workaround is to store the return value in an instance field of the visitor object. This is awkward, and either requires a default-constructible return value or pointer indirection. While this is semantically equivalent to directly returning a value, this makes using the visitor so awkward that I often create a wrapper function to run the visitor:

class Visitor;

class Base {
public:
  virtual void accept(Visitor&) const = 0;
};

class A;
class B;

class Visitor {
public:
  virtual void visitA(A const&) = 0;
  virtual void visitB(B const&) = 0;
};

class A { … };
class B { … };

class ConcreteVisitor : public Visitor final {
  int mResult;
public:
  ConcreteVisitor() : mResult(0) {}
  void visitA(A const& a) override { mResult = 1; }
  void visitB(B const& a) override { mResult = 2; }
  int result() const { return mResult; }
};

int runConcreteVisitor(Base const& base) {
  ConcreteVisitor v;
  base.accept(v);
  return v.result();
}

Each visitor effectively behaves like a (polymorphic) function on the given input hierarchy, or like extension methods. By stuffing additional arguments and return values into member variables, we can model arbitrary function signatures in C++. (Function arguments other than the element to be visited correspond to constructor arguments of the visitor).

But this is just a workaround. If all your visitors will have an effective signature Expression visitor(Expression const&), then you can write all your accept methods as

 virtual Expression accept(Visitor& v) {
   return v.acceptAddition(*this);
 }

For AST operations, this is sometimes all you need. But once you have different “return” types for your visitors, you will need to either duplicate all accept/visit methods for the other type (basically, manual templates), or will have to use the visitor member variable workaround. In that light, it might be sensible to use the workaround from the start. By using runVisitor(…) helpers, this becomes slightly less error prone because you don't have to remember to retrieve the return value, you are given one directly. If your visitor is recursive, this also means you should avoid unnecessary mutable state directly in your visitor since a new visitor is created for each accept/visit invocation.

Related Topic