C++ – Visitor Pattern, replacing objects

c++ visitor-pattern

I have a program that translates a DSL to C++, which uses a Visitor pattern on the intermediate representation.

I quite often need to replace the currently processed node with one of a different type (e.g. replacing the "unresolved type" with the type definition).

What is a good pattern to do that?

I've tried:

storing a pointer to the place where the current object is referenced from

This is massively ugly, but keeps a "standard" visitor pattern, where the visitor itself keeps all of its state as member variables, and the visit method does not return a value.

The difficulty here is that this is fairly error-prone: if I forget to store the pointer before descending in the tree, I may overwrite the wrong reference.

returning a replacement object from the visit method

The next layer up is responsible for replacing the current object with the newly returned object — deletion is represented by returning a NULL pointer, and no change is represented by returning the old object.

This is significantly easier to get right, because I can get diagnostics if I accidentally drop a return code, but I think it is still somewhat ugly.
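For illustration, the second approach could be sketched roughly as follows. This is only a sketch under assumptions: the node classes (UnresolvedType, TypeDefinition, Field), the Rewriter interface and the use of std::shared_ptr are hypothetical and not taken from the actual translator. The C++17 [[nodiscard]] attribute is one way to get the diagnostic for a dropped return value.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

struct Node;
struct UnresolvedType;
struct TypeDefinition;
struct Field;
using NodePtr = std::shared_ptr<Node>;

// Hypothetical rewriting visitor: every visit() returns the node that should
// occupy the visited slot afterwards. Same node = no change, different node =
// replacement, nullptr = deletion.
struct Rewriter {
  virtual ~Rewriter() = default;
  [[nodiscard]] virtual NodePtr visit(UnresolvedType& node) = 0;
  [[nodiscard]] virtual NodePtr visit(TypeDefinition& node) = 0;
  [[nodiscard]] virtual NodePtr visit(Field& node) = 0;
};

// Nodes must be owned by a std::shared_ptr so that shared_from_this() works.
struct Node : std::enable_shared_from_this<Node> {
  virtual ~Node() = default;
  [[nodiscard]] virtual NodePtr accept(Rewriter& r) = 0;
};

struct UnresolvedType : Node {
  std::string name;
  explicit UnresolvedType(std::string n) : name(std::move(n)) {}
  NodePtr accept(Rewriter& r) override { return r.visit(*this); }
};

struct TypeDefinition : Node {
  std::string name;
  explicit TypeDefinition(std::string n) : name(std::move(n)) {}
  NodePtr accept(Rewriter& r) override { return r.visit(*this); }
};

struct Field : Node {
  NodePtr type;
  explicit Field(NodePtr t) : type(std::move(t)) {}
  NodePtr accept(Rewriter& r) override { return r.visit(*this); }
};

// Example pass: replaces every unresolved type with a type definition.
struct ResolveTypes : Rewriter {
  NodePtr visit(UnresolvedType& node) override {
    return std::make_shared<TypeDefinition>(node.name);  // replacement
  }
  NodePtr visit(TypeDefinition& node) override {
    return node.shared_from_this();  // no change
  }
  NodePtr visit(Field& node) override {
    // The parent is the only place that writes to the child slot.
    if (node.type) node.type = node.type->accept(*this);
    return node.shared_from_this();
  }
};
```

With this shape, only the parent ever assigns to a child slot, so there is no stored "pointer to the current reference" that could go stale while descending.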

Are there better options?

Best Answer

It is just an implementation detail that the visit() method is usually void when described in C++. This is by no means a central part of the Visitor Pattern. In the “Design Patterns” book by Gamma, Helm, Johnson, Vlissides, code examples are usually given in C++ and Smalltalk. Smalltalk is a very dynamic language, and the book's Smalltalk example code for the Visitor Pattern actually returns values! It is a regex evaluator. I'll translate the code to JavaScript to make it more understandable.

The object structure (regular expressions) is made of four classes, and all of them have an accept() method that takes the visitor as an argument. In class SequenceExpression, the accept method is
function accept(aVisitor) {
  return aVisitor.visitSequence(this);
}
[…]

The ConcreteVisitor class is REMatchingVisitor. […] Its methods […] return the set of streams that the expression would match to identify the current state.
function visitSequence(sequenceExp) {
  this.inputState = sequenceExp.expression1.accept(this);
  return sequenceExp.expression2.accept(this);
}

...

function visitAlternation(alternateExp) {
  var originalState = this.inputState;
  var finalState = alternateExp.alternative1.accept(this);
  this.inputState = originalState;
  // Union in the matches of the second alternative (a JavaScript Set has no addAll()).
  alternateExp.alternative2.accept(this).forEach(function (stream) {
    finalState.add(stream);
  });
  return finalState;
}

function visitLiteral(literalExp) {
  var finalState = new Set();
  // Keep every stream whose next characters equal the literal.
  this.inputState.forEach(function (stream) {
    var tStream = stream.copy();
    if (tStream.nextAvailable(literalExp.value.size) == literalExp.value) {
      finalState.add(tStream);
    }
  });
  return finalState;
}

So why isn't this done in C++? Why are all visit() methods void? Because the virtual accept() methods wouldn't know what to return: each visitor may return a different type. While the accept method should just pass that value through, C++ needs the precise type at compile time. This could be expressed with templates, but member function templates cannot be virtual. (A virtual method's implementation is only selected at runtime (late binding), whereas templates must be fully instantiated at compile time. Since the compiler cannot know at compile time which override would be selected, it cannot instantiate the template.)

The usual workaround is to store the return value in an instance field of the visitor object. This is awkward, and either requires a default-constructible return value or pointer indirection. While this is semantically equivalent to directly returning a value, this makes using the visitor so awkward that I often create a wrapper function to run the visitor:

class Visitor;

class Base {
public:
  virtual void accept(Visitor&) const = 0;
};

class A;
class B;

class Visitor {
public:
  virtual void visitA(A const&) = 0;
  virtual void visitB(B const&) = 0;
};

class A { … };
class B { … };

// `final` goes after the class name, before the base-class list.
class ConcreteVisitor final : public Visitor {
  int mResult;
public:
  ConcreteVisitor() : mResult(0) {}
  void visitA(A const&) override { mResult = 1; }
  void visitB(B const&) override { mResult = 2; }
  int result() const { return mResult; }
};

int runConcreteVisitor(Base const& base) {
  ConcreteVisitor v;
  base.accept(v);
  return v.result();
}

Each visitor effectively behaves like a (polymorphic) function on the given input hierarchy, or like extension methods. By stuffing additional arguments and return values into member variables, we can model arbitrary function signatures in C++. (Function arguments other than the element to be visited correspond to constructor arguments of the visitor).
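As a rough sketch of that idea, assuming a small hypothetical hierarchy (Leaf and Pair are made up for this example), the visitor below models the function int scaledSum(Base const&, int factor): the extra argument goes in through the constructor and the return value comes out through a member.

```cpp
#include <cassert>

class Leaf;
class Pair;

class Visitor {
public:
  virtual ~Visitor() = default;
  virtual void visitLeaf(Leaf const&) = 0;
  virtual void visitPair(Pair const&) = 0;
};

class Base {
public:
  virtual ~Base() = default;
  virtual void accept(Visitor&) const = 0;
};

// Hypothetical element classes, only here to make the sketch self-contained.
class Leaf : public Base {
public:
  explicit Leaf(int v) : value(v) {}
  void accept(Visitor& v) const override { v.visitLeaf(*this); }
  int value;
};

class Pair : public Base {
public:
  Pair(Base const& l, Base const& r) : left(l), right(r) {}
  void accept(Visitor& v) const override { v.visitPair(*this); }
  Base const& left;
  Base const& right;
};

int scaledSum(Base const& node, int factor);

class ScaledSumVisitor final : public Visitor {
  int mFactor;      // extra "function argument", passed via the constructor
  int mResult = 0;  // "return value", stored in a member
public:
  explicit ScaledSumVisitor(int factor) : mFactor(factor) {}
  void visitLeaf(Leaf const& leaf) override { mResult = leaf.value * mFactor; }
  void visitPair(Pair const& pair) override {
    // Recursion goes through the wrapper, so each call gets a fresh visitor.
    mResult = scaledSum(pair.left, mFactor) + scaledSum(pair.right, mFactor);
  }
  int result() const { return mResult; }
};

// Wrapper that makes the visitor look like an ordinary function.
int scaledSum(Base const& node, int factor) {
  ScaledSumVisitor v(factor);
  node.accept(v);
  return v.result();
}
```

Callers only ever see scaledSum(node, factor); the visitor class stays an implementation detail.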

But this is just a workaround. If all your visitors will have an effective signature Expression visitor(Expression const&), then you can write all your accept methods as

 virtual Expression accept(Visitor& v) {
   return v.visitAddition(*this);
 }

For AST operations, this is sometimes all you need. But once you have different “return” types for your visitors, you will need to either duplicate all accept/visit methods for the other type (basically, manual templates), or use the visitor member variable workaround. In that light, it might be sensible to use the workaround from the start. Using runVisitor(…) helpers makes this slightly less error-prone, because you don't have to remember to retrieve the return value: you are given one directly. If your visitor is recursive, this also means you should avoid unnecessary mutable state directly in your visitor, since a new visitor is created for each accept/visit invocation.
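One possible shape for such a helper is a small function template. This is a sketch under assumptions: the name runVisitor comes from the text above, but its signature, the convention that every visitor exposes a result() member, and the Number/Negation classes are invented for this example.

```cpp
#include <cassert>
#include <string>
#include <utility>

class Number;
class Negation;

class Visitor {
public:
  virtual ~Visitor() = default;
  virtual void visitNumber(Number const&) = 0;
  virtual void visitNegation(Negation const&) = 0;
};

class Expression {
public:
  virtual ~Expression() = default;
  virtual void accept(Visitor&) const = 0;  // stays void for every visitor
};

class Number final : public Expression {
public:
  explicit Number(int v) : value(v) {}
  void accept(Visitor& v) const override { v.visitNumber(*this); }
  int value;
};

class Negation final : public Expression {
public:
  explicit Negation(Expression const* e) : operand(e) {}
  void accept(Visitor& v) const override { v.visitNegation(*this); }
  Expression const* operand;  // non-owning, must not be null
};

// Assumed convention: every visitor type V has a result() member function.
// A fresh visitor is created per call, so no state leaks between invocations.
template <class V, class... Args>
auto runVisitor(Expression const& e, Args&&... args) {
  V visitor(std::forward<Args>(args)...);
  e.accept(visitor);
  return visitor.result();
}

class Evaluate final : public Visitor {
  int mResult = 0;
public:
  void visitNumber(Number const& n) override { mResult = n.value; }
  void visitNegation(Negation const& n) override {
    mResult = -runVisitor<Evaluate>(*n.operand);
  }
  int result() const { return mResult; }
};

class Print final : public Visitor {
  std::string mResult;
public:
  void visitNumber(Number const& n) override { mResult = std::to_string(n.value); }
  void visitNegation(Negation const& n) override {
    mResult = "-(" + runVisitor<Print>(*n.operand) + ")";
  }
  std::string result() const { return mResult; }
};
```

The virtual accept() never needs to know the result type; only the non-virtual template does, which sidesteps the restriction that virtual methods can't be templates.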

Related Solutions

C++ – Is This Observer Pattern Variant an Improvement?

Your idea was already invented 10 years ago by Herb Sutter. See this article: http://www.drdobbs.com/cpp/generalizing-observer/184403873, it contains a full explanation.

So the answer is

yes, it is an improvement over the original GOF variant of the pattern
no, it is not better than every other known implementation of the Observer pattern

Also note that what you suggested here is the "standard observer implementation" in functional languages or languages with functional elements where events/event sinks are typically implemented in a comparable way (for example, in C# using delegates).
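The original question is not reproduced here, so the following is only a guess at the general shape: assuming the variant replaces the Observer interface with stored callables (the rough C++ counterpart of C# delegates), a minimal sketch could look like this. The Subject class and its attach/notify names are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical subject: observers are plain callables instead of classes
// derived from a common Observer interface.
class Subject {
  std::vector<std::function<void(std::string const&)>> mObservers;
public:
  void attach(std::function<void(std::string const&)> observer) {
    mObservers.push_back(std::move(observer));
  }
  // Calls every attached observer in the order it was attached.
  void notify(std::string const& event) const {
    for (auto const& observer : mObservers) observer(event);
  }
};
```

Any lambda, free function or bound member function with a matching signature can then be attached without writing a new observer class.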

Visitor Pattern – AST Processing and Usefulness

One thing the Visitor Pattern does that is often not talked about is letting you choose which side of the Expression Problem you want to tackle.

So, what is the Expression Problem? It refers to the basic problem of extensibility: our programs manipulate data types using operations. As our programs evolve, we need to extend them with new data types and new operations. And particularly, we want to be able to add new operations which work with the existing data types, and we want to add new data types which work with the existing operations. And we want this to be true extension, i.e. we don't want to modify the existing program, we want to respect the existing abstractions, we want our extensions to be separate modules, in separate namespaces, separately compiled, separately deployed, separately type checked. We want them to be type-safe.

The Expression Problem is, how do you actually provide such extensibility in a language?

It turns out that for typical naive implementations of procedural and/or functional programming, it is very easy to add new operations (procedures, functions), but very hard to add new data types, since basically the operations work with the data types using some sort of case discrimination (switch, case, pattern matching) and you need to add new cases to them, i.e. modify existing code:

func print(node):
  case node of:
    AddOperator => print(node.left) + '+' + print(node.right)
    NotOperator => '!' + print(node.expr)

func eval(node):
  case node of:
    AddOperator => eval(node.left) + eval(node.right)
    NotOperator => !eval(node.expr)

Now, if you want to add a new operation, say, type-checking, that's easy, but if you want to add a new node type, you have to modify all the existing pattern matching expressions in all operations.

And for typical naive OO, you have the exact opposite problem: it is easy to add new data types which work with the existing operations (either by inheriting or overriding them), but it is hard to add new operations, since that basically means modifying existing classes/objects.

class AddOperator(left: Node, right: Node) < Node:
  meth print:
    left.print + '+' + right.print

  meth eval:
    left.eval + right.eval

class NotOperator(expr: Node) < Node:
  meth print:
    '!' + expr.print

  meth eval:
    !expr.eval

Here, adding a new node type is easy, because you either inherit, override or implement all required operations, but adding a new operation is hard, because you need to add it either to all leaf classes or to a base class, thus modifying existing code.

Several languages have constructs for solving the Expression Problem: Haskell has typeclasses, Scala has implicit arguments, Racket has Units, Go has Interfaces, CLOS and Clojure have Multimethods.

However, in an OO language that doesn't have a way of solving the Expression Problem (such as Java or C#), the Visitor Pattern at least allows you to "pick your poison". What the pattern does, is turn your design 90° to the side: the operations become classes (PrintVisitor, EvalVisitor) and conversely, the types become methods (visitAddOperator, visitNotOperator (or just visit, if your language supports argument-based overloading)). This does not solve the Expression Problem (i.e. how to make it easy to add both types and operations), but it does allow you to choose which one to make easy.
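The rotation can be sketched in C++ as follows. This is an illustration only: the Literal leaf node and all member names are assumptions added to make the example self-contained, and results are kept in a public member for brevity.

```cpp
#include <cassert>
#include <string>

struct Literal;
struct AddOperator;
struct NotOperator;

// One method per node type: the types have become methods.
struct Visitor {
  virtual ~Visitor() = default;
  virtual void visitLiteral(Literal const&) = 0;
  virtual void visitAddOperator(AddOperator const&) = 0;
  virtual void visitNotOperator(NotOperator const&) = 0;
};

struct Node {
  virtual ~Node() = default;
  virtual void accept(Visitor&) const = 0;
};

struct Literal : Node {
  int value;
  explicit Literal(int v) : value(v) {}
  void accept(Visitor& v) const override { v.visitLiteral(*this); }
};

struct AddOperator : Node {
  Node const* left;   // non-owning, must not be null
  Node const* right;
  AddOperator(Node const* l, Node const* r) : left(l), right(r) {}
  void accept(Visitor& v) const override { v.visitAddOperator(*this); }
};

struct NotOperator : Node {
  Node const* expr;  // non-owning, must not be null
  explicit NotOperator(Node const* e) : expr(e) {}
  void accept(Visitor& v) const override { v.visitNotOperator(*this); }
};

// One class per operation: the operations have become classes.
struct PrintVisitor : Visitor {
  std::string result;
  void visitLiteral(Literal const& n) override { result = std::to_string(n.value); }
  void visitAddOperator(AddOperator const& n) override {
    n.left->accept(*this);
    std::string left = result;
    n.right->accept(*this);
    result = left + "+" + result;
  }
  void visitNotOperator(NotOperator const& n) override {
    n.expr->accept(*this);
    result = "!" + result;
  }
};

struct EvalVisitor : Visitor {
  int result = 0;
  void visitLiteral(Literal const& n) override { result = n.value; }
  void visitAddOperator(AddOperator const& n) override {
    n.left->accept(*this);
    int left = result;
    n.right->accept(*this);
    result = left + result;
  }
  void visitNotOperator(NotOperator const& n) override {
    n.expr->accept(*this);
    result = !result;
  }
};
```

Adding a third operation (say, a type checker) means writing one more Visitor subclass and touching no node class, whereas adding a new node type means extending Visitor and every existing visitor.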

So, if your language does support a way to solve the Expression Problem, then you don't need this workaround.

Note, however, this is not the only thing the Visitor Pattern does.

Note: you will notice the conspicuous absence of any mention of C++. Unfortunately, I simply don't know enough about it. I suspect that between its overloading and argument-based dispatch, virtual inheritance, free functions, macros, and most importantly compile-time template metaprogramming, the Expression Problem is solved in C++, but I don't know for sure.

The problem is that once someone finds a solution for the Expression Problem, they redefine it to make it even harder to solve, so that new solutions are even more powerful and expressive. For example, the original formulation by the Haskell community did not require modular typechecking, but the Scala community proposed that the Expression Problem should not only include modular extension (separate compilation etc.) of types and operations, but also modular typechecking and type inference of those extensions, which at the moment is something only Scala's implicits can do and Haskell's typeclasses and ML's functors can't.
