C++ – Alternatives to the Visitor Design Pattern


I have been trying to come up with a method to "serialize" various objects into various different formats. For example:

class Shape {
public:
    virtual std::string_view name() const = 0;
    virtual double area() const = 0;
};

class Square : public Shape {
...
};

class Triangle: public Shape {
...
};

Suppose I have the two types of shapes above. Now, I want to be able to serialize (and eventually deserialize) these different classes into different formats (e.g., string, JSON, bytes, …).

The first solution is to perform the serialization and deserialization within the class itself (this is what overloading the insertion operator does). However, if I start adding different types of serialization, I have to modify every single Shape class.

class Shape {
public:
    ...
    virtual std::string serializeToString() const = 0;
    virtual json_object serializeToJSON() const = 0;
    //Repeat for every type of serialized output...
};

The second solution I found was to use the visitor pattern. With it, I can create a separate visitor for each serialization format. And since I have far fewer serialization formats than shape classes, I suppose it is acceptable to have to modify every visitor class when a new Shape class is added.

class ShapeVisitor;  // forward declaration, needed by Shape::accept

class Shape {
public:
    virtual void accept(ShapeVisitor& v) = 0;
};

class Square : public Shape { ... };
class Triangle : public Shape { ... };

class ShapeVisitor {
public:
    virtual void visit(const Square& s) = 0;
    virtual void visit(const Triangle& s) = 0;
};

// One concrete visitor per serialization format.
class StringShapeVisitor : public ShapeVisitor {
public:
    void visit(const Square& s) override;
    void visit(const Triangle& s) override;
};

But of course, the problem with the visitor pattern is the visitors have no way to access the private data of each class. And since this is serialization I am talking about, I have to access every single private data member which I cannot see how to do without breaking encapsulation of the shapes completely.

A third option I thought of is just using some form of templates and template specialization to choose the correct function for serialization based upon the format and class. The problem is, this doesn't work at runtime on a generic Shape instance…
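To make the limitation of the template approach concrete, here is a minimal sketch. The `serialize` function template and the `serializeViaBase` helper are hypothetical names used only for illustration, and the shape classes are stripped down to the bare minimum.

```cpp
#include <string>

class Shape {
public:
    virtual ~Shape() = default;
};

class Square : public Shape {};

// Primary template: used for any type without a specialization.
template <typename T>
std::string serialize(const T&) { return "generic shape"; }

// Specialization that is chosen only when the *static* type is Square.
template <>
std::string serialize<Square>(const Square&) { return "square"; }

std::string serializeViaBase(const Shape& shape) {
    // T is deduced as Shape here, regardless of the object's dynamic type,
    // so the Square specialization is never selected on this path.
    return serialize(shape);
}
```

Calling `serialize` directly on a `Square` picks the specialization, but passing the same object through `serializeViaBase` falls back to the primary template, because overload and specialization selection happen at compile time.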

So my questions are:

Is there a modification to the visitor pattern which overcomes the private data problem?
Is there an alternative to the visitor pattern that would ideally not require updating the same classes over and over?
Whatever method I use, is there any way to make the process as "reversible" as possible (e.g., for deserialization)?

Best Answer

What you need is an intermediary unified representation. The problem now is that your serialization procedures need to understand the details/semantics of the various shape types. Instead, what you could do is provide the shapes with the ability to return a self-describing unified representation of some sort, that the serialization code can just treat as generic structured data, without needing to understand what the data means in the context of a specific shape.

Depending on what you're doing and on what exactly the data associated with the shapes is, you might come up with different schemes for this intermediary representation. E.g., it could just be some metadata followed by a list of key-value pairs ({ "width": 1.0, "height": 1.0 }), or perhaps you'd treat shapes as polygons and use a list of vertices and edges. Understand your goals and constraints and try to come up with a scheme that's suitable for what you're doing. Note that, for the shape polymorphism to be useful, there should be parts of your application that are able to work entirely through the abstract Shape interface, without ever needing to know any details of the concrete shapes. If there are aspects of the application (other than serialization) for which this doesn't quite work, perhaps you can make use of this unified representation there too, if you design it well.

You'd then create various shape serializers (which may or may not form a hierarchy) that take the unified representation and output different formats. For deserialization, reconstitute the unified representation and pass it along to the shape, or a factory associated with the shape, or a shape prototype. A factory would have to be aware of different Shape subtypes, but this knowledge would be confined there (in a single place).
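The deserialization side of this idea might look roughly like the sketch below. Everything concrete in it is an assumption made for illustration: `UnifiedRepr` is modeled as a type tag plus a map of named doubles, and `ShapeFactory`, `registerType`, `fromUnifiedRepr`, and the `side` field are hypothetical names.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical unified representation: a type tag plus named numeric fields.
struct UnifiedRepr {
    std::string type;
    std::map<std::string, double> fields;
};

class Shape {
public:
    virtual ~Shape() = default;
    virtual double area() const = 0;
    virtual UnifiedRepr toUnifiedRepr() const = 0;
};

class Square : public Shape {
    double side_;
public:
    explicit Square(double side) : side_(side) {}
    double area() const override { return side_ * side_; }
    UnifiedRepr toUnifiedRepr() const override {
        return {"Square", {{"side", side_}}};
    }
    // The class rebuilds itself from its own representation,
    // so its private data never has to be exposed.
    static std::unique_ptr<Shape> fromUnifiedRepr(const UnifiedRepr& r) {
        return std::make_unique<Square>(r.fields.at("side"));
    }
};

// The single place that knows which concrete shape types exist.
class ShapeFactory {
    using Maker = std::function<std::unique_ptr<Shape>(const UnifiedRepr&)>;
    std::map<std::string, Maker> makers_;
public:
    void registerType(const std::string& name, Maker maker) {
        makers_[name] = std::move(maker);
    }
    std::unique_ptr<Shape> create(const UnifiedRepr& r) const {
        auto it = makers_.find(r.type);
        if (it == makers_.end())
            throw std::runtime_error("unknown shape: " + r.type);
        return it->second(r);
    }
};
```

A deserializer for any format would only need to parse its input back into a `UnifiedRepr` and hand it to `ShapeFactory::create`; adding a new shape then means one extra `registerType` call rather than a change to every format.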

So, something like

class Shape {
public:
    virtual UnifiedRepr toUnifiedRepr() const = 0;
    ...
};

class Square : public Shape {
...
};

class Triangle: public Shape {
...
};

// ----------------
// Elsewhere:  
void JSONSerializer::serialize(const Shape& shape) {
    UnifiedRepr inputData = shape.toUnifiedRepr();
    // encode as JSON
    // ...
}
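To show how format-specific code can stay ignorant of concrete shapes, here is a minimal sketch under stated assumptions: `UnifiedRepr` is again modeled as a type tag plus a map of named doubles, the free functions `toJSON` and `toText` stand in for serializer classes such as `JSONSerializer`, and the `base`/`height` fields are hypothetical. The JSON output is hand-rolled and does no string escaping, so it only suits simple identifiers.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical unified representation: a type tag plus named numeric fields.
struct UnifiedRepr {
    std::string type;
    std::map<std::string, double> fields;
};

class Shape {
public:
    virtual ~Shape() = default;
    virtual UnifiedRepr toUnifiedRepr() const = 0;
};

class Triangle : public Shape {
    double base_, height_;
public:
    Triangle(double base, double height) : base_(base), height_(height) {}
    UnifiedRepr toUnifiedRepr() const override {
        return {"Triangle", {{"base", base_}, {"height", height_}}};
    }
};

// Each format function sees only generic structured data.
std::string toJSON(const Shape& shape) {
    const UnifiedRepr repr = shape.toUnifiedRepr();
    std::ostringstream out;
    out << "{\"type\":\"" << repr.type << "\"";
    for (const auto& [key, value] : repr.fields)
        out << ",\"" << key << "\":" << value;
    out << "}";
    return out.str();
}

std::string toText(const Shape& shape) {
    const UnifiedRepr repr = shape.toUnifiedRepr();
    std::ostringstream out;
    out << repr.type;
    for (const auto& [key, value] : repr.fields)
        out << ' ' << key << '=' << value;
    return out.str();
}
```

Adding a new output format means writing one more function over `UnifiedRepr`; adding a new shape means implementing `toUnifiedRepr` once. Neither change touches the other side.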

Related Solutions

Visitor Pattern Design – How to Implement Different Kinds of Traversal for a Tree

A common misconception about the Visitor Pattern is that it is somehow tied to tree-like structures, which is quite wrong. The question sounds as if it assumes just that, so the first step is to clear up this misconception. After that, I would do exactly what you said: create three different iterators, one for each type of traversal.

But this depends on the complexity of the tree. If every node has the same, specified collection of children, it is easy. The problems start when different types of nodes have differently structured children. That is where a visitor starts making sense. But then the different traversal orders stop making sense, because they only work for n-ary trees, not for trees with arbitrary types of children in each node.
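For the simple case where every node has the same children, the three traversals could be sketched as below. This is an illustration only: it uses callback-based traversal functions rather than full iterator classes, and `Node`, `Callback`, `preOrder`, `inOrder`, `postOrder`, and `collect` are hypothetical names for a plain binary tree.

```cpp
#include <functional>
#include <memory>
#include <vector>

struct Node {
    int value;
    std::unique_ptr<Node> left;
    std::unique_ptr<Node> right;
    explicit Node(int v) : value(v) {}
};

using Callback = std::function<void(const Node&)>;

// Node first, then left subtree, then right subtree.
void preOrder(const Node* n, const Callback& f) {
    if (!n) return;
    f(*n);
    preOrder(n->left.get(), f);
    preOrder(n->right.get(), f);
}

// Left subtree, then node, then right subtree.
void inOrder(const Node* n, const Callback& f) {
    if (!n) return;
    inOrder(n->left.get(), f);
    f(*n);
    inOrder(n->right.get(), f);
}

// Left subtree, then right subtree, then node.
void postOrder(const Node* n, const Callback& f) {
    if (!n) return;
    postOrder(n->left.get(), f);
    postOrder(n->right.get(), f);
    f(*n);
}

// Convenience helper: gathers the visited values in traversal order.
std::vector<int> collect(const Node& root,
                         void (*traverse)(const Node*, const Callback&)) {
    std::vector<int> out;
    traverse(&root, [&out](const Node& n) { out.push_back(n.value); });
    return out;
}
```

The traversal order is chosen by the caller and the per-node action is an ordinary callable, so no visitor is involved as long as all nodes share one structure.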

C++ – Visitor Pattern, replacing objects

It is just an implementation detail that the visit() method is usually void when described in C++. This is by no means a central part of the Visitor Pattern. In the “Design Patterns” book by Gamma, Helm, Johnson, and Vlissides, code examples are usually given in C++ and Smalltalk. Smalltalk is a very dynamic language. The example code for the Visitor Pattern in Smalltalk in that book actually returns values! It is a regex evaluator. I'll translate the code to JavaScript to make it more understandable.

The object structure (regular expressions) is made of four classes, and all of them have an accept() method that takes the visitor as an argument. In class SequenceExpression, the accept method is
function accept(aVisitor) {
  return aVisitor.visitSequence(this);
}
[…]

The ConcreteVisitor class is REMatchingVisitor. […] Its methods […] return the set of streams that the expression would match to identify the current state.
function visitSequence(sequenceExp) {
  this.inputState = sequenceExp.expression1.accept(this);
  return sequenceExp.expression2.accept(this);
}

...

function visitAlternation(alternateExp) {
  var originalState = this.inputState;
  var finalState = alternateExp.alternative1.accept(this);
  this.inputState = originalState;
  finalState.addAll(alternateExp.alternative2.accept(this));
  return finalState;
}

function visitLiteral(literalExp) {
  var finalState = new Set();
  this.inputState.forEach(function (stream) {
    var tStream = stream.copy();
    if (tStream.nextAvailable(literalExp.value.size) == literalExp.value) {
      finalState.add(tStream);
    }
  });
  return finalState;
}

So why isn't this done in C++? Why are all visit() methods void? Because the virtual accept() methods wouldn't know what to return. Each Visitor may return a different type. While the accept method should just pass that value through, C++ would need the precise type. This can be expressed with templates, but virtual methods can't be template methods. (The point of a virtual method is that the implementation is only resolved at runtime (late binding), whereas templates must be fully instantiated at compile time. Since it isn't known at compile time which method will be selected, the template couldn't be instantiated.)

The usual workaround is to store the return value in an instance field of the visitor object. This is awkward, and either requires a default-constructible return value or pointer indirection. While this is semantically equivalent to directly returning a value, this makes using the visitor so awkward that I often create a wrapper function to run the visitor:

class Visitor;

class Base {
public:
  virtual void accept(Visitor&) const = 0;
};

class A;
class B;

class Visitor {
public:
  virtual void visitA(A const&) = 0;
  virtual void visitB(B const&) = 0;
};

class A : public Base { … };
class B : public Base { … };

class ConcreteVisitor final : public Visitor {
  int mResult;
public:
  ConcreteVisitor() : mResult(0) {}
  void visitA(A const& a) override { mResult = 1; }
  void visitB(B const& a) override { mResult = 2; }
  int result() const { return mResult; }
};

int runConcreteVisitor(Base const& base) {
  ConcreteVisitor v;
  base.accept(v);
  return v.result();
}

Each visitor effectively behaves like a (polymorphic) function on the given input hierarchy, or like extension methods. By stuffing additional arguments and return values into member variables, we can model arbitrary function signatures in C++. (Function arguments other than the element to be visited correspond to constructor arguments of the visitor).
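As an illustration of modeling a full function signature this way, the sketch below emulates `std::string describe(Base const&, std::string prefix)`. The `DescribeVisitor` class, the `describe` helper, and the `prefix` argument are hypothetical names chosen for this example; the `A`/`B` hierarchy is reduced to empty classes.

```cpp
#include <string>
#include <utility>

class A;
class B;

class Visitor {
public:
    virtual ~Visitor() = default;
    virtual void visitA(A const&) = 0;
    virtual void visitB(B const&) = 0;
};

class Base {
public:
    virtual ~Base() = default;
    virtual void accept(Visitor&) const = 0;
};

class A : public Base {
public:
    void accept(Visitor& v) const override { v.visitA(*this); }
};

class B : public Base {
public:
    void accept(Visitor& v) const override { v.visitB(*this); }
};

class DescribeVisitor final : public Visitor {
    std::string mPrefix;  // plays the role of an extra function argument
    std::string mResult;  // plays the role of the return value
public:
    explicit DescribeVisitor(std::string prefix) : mPrefix(std::move(prefix)) {}
    void visitA(A const&) override { mResult = mPrefix + "A"; }
    void visitB(B const&) override { mResult = mPrefix + "B"; }
    std::string const& result() const { return mResult; }
};

// Wrapper that hides the visitor plumbing behind an ordinary function call.
std::string describe(Base const& base, std::string prefix) {
    DescribeVisitor v(std::move(prefix));
    base.accept(v);
    return v.result();
}
```

Callers only ever see `describe(base, "type-")`, so the member-variable workaround stays an implementation detail.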

But this is just a workaround. If all your visitors will have an effective signature Expression visitor(Expression const&), then you can write all your accept methods as

virtual Expression accept(Visitor& v) {
  return v.visitAddition(*this);
}

For AST operations, this is sometimes all you need. But once your visitors have different “return” types, you will either need to duplicate all accept/visit methods for the other type (basically, manual templates) or have to use the visitor member variable workaround. In that light, it might be sensible to use the workaround from the start. Using runVisitor(…) helpers makes this slightly less error-prone, because you don't have to remember to retrieve the return value; you are given one directly. If your visitor is recursive, this also means you should avoid unnecessary mutable state directly in your visitor, since a new visitor is created for each accept/visit invocation.
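A recursive visitor that goes through such a helper might be sketched as follows. The tiny AST (`Literal`, `Addition`), the `EvalVisitor` class, and the `evaluate` helper are hypothetical names used only for this illustration.

```cpp
#include <memory>
#include <utility>

class Literal;
class Addition;

class Visitor {
public:
    virtual ~Visitor() = default;
    virtual void visitLiteral(Literal const&) = 0;
    virtual void visitAddition(Addition const&) = 0;
};

class Expression {
public:
    virtual ~Expression() = default;
    virtual void accept(Visitor&) const = 0;
};

class Literal : public Expression {
public:
    int value;
    explicit Literal(int v) : value(v) {}
    void accept(Visitor& v) const override { v.visitLiteral(*this); }
};

class Addition : public Expression {
public:
    std::unique_ptr<Expression> lhs, rhs;
    Addition(std::unique_ptr<Expression> l, std::unique_ptr<Expression> r)
        : lhs(std::move(l)), rhs(std::move(r)) {}
    void accept(Visitor& v) const override { v.visitAddition(*this); }
};

int evaluate(Expression const& e);  // helper, defined after the visitor

class EvalVisitor final : public Visitor {
    int mResult = 0;  // holds the "return value" of this one invocation
public:
    void visitLiteral(Literal const& lit) override { mResult = lit.value; }
    void visitAddition(Addition const& add) override {
        // Each recursive step goes through the helper and gets a fresh visitor,
        // so nothing stored in this visitor is shared with the sub-expressions.
        mResult = evaluate(*add.lhs) + evaluate(*add.rhs);
    }
    int result() const { return mResult; }
};

int evaluate(Expression const& e) {
    EvalVisitor v;
    e.accept(v);
    return v.result();
}
```

Because every sub-expression is evaluated by its own `EvalVisitor`, any state that must carry across the whole traversal would have to be passed in explicitly, for example through the visitor's constructor.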
