C++ – Implementing copy-on-write

cdesign

I have a rather large class that contains a number of related member variables. These member variables can be grouped into related sections. We've noticed that the pattern of use of the class is to make a copy of an existing instance, change only a few of the member variables, and then do work with the resulting instance. There are 2 problems we've hit.

  1. Instances of the class are large and often on the stack, so it would be nice to reduce its size
  2. Copying all the member variables is somewhat slow.

So one idea we had was to group related member variables into their own objects. This is just a good idea for making the code easier to understand. But if we did this and had each of these objects be merely a shared pointer to an existing instance, it would reduce the size of the main class as several member variables would be replaced with a single shared pointer. It would also speed up copying, as the copy would just be creating a new shared pointer pointing to the existing instance.

The problem is that now whenever we have to write to anything in one of these shared objects, we need to make a copy of it on first write. (And we need to check on every write whether it needs to be copied or not.) We don't write very often, so I don't think it will be a performance hit. But implementing this seems problematic.

Is there a way to make a class be copy-on-write when one of its member variables needs to be changed? The class itself is written in C++, though I could also use C or Objective-C to implement this. I don't want to have to manually write the check of the shared pointer's reference count in every setter, for example. Is there any way to avoid that?

Best Answer

C++ does not have built-in copy on write semantics. You can implement such semantics with a smart pointer class, or can choose a design-level approach that makes copies unnecessary, e.g. by using the decorator pattern. However, you may find that these optimizations are not worth their effort.

Copy on Write Smart Pointer

The idea is that your members are stored in a smart pointer that manages their ownership. When the smart pointer is copied, it will not copy the managed object. However, once write-access is required, a copy is made.

This is not entirely trivial to get right, especially if there may be other pointers into the managed object. Some CoW implementations trigger only when a copy wants to write to the object, other implementations also perform a copy when the original owner writes to the (shared) object.

You will therefore have to write your own smart pointer that implements your desired semantics. Likely, this can be implemented with moderate effort on top of std::shared_ptr.

A significant drawback of CoW is that these lazy copies can make your code much harder to debug. Copying a C++ object may have observable side effects, but may now happen at unexpected times. Also, every write access must first check the state of the smart pointer and must possibly dereference multiple pointers.

Decorator Pattern

The decorator pattern can be used to overlay parts of an object with a different implementation. For example, you might want to overlay parts of a struct { A a; B b; C c; }. First, we need to define an interface so that we can combine our Decorators:

class Data {
public:
  virtual A& get_a() = 0;
  virtual B& get_b() = 0;
  virtual C& get_c() = 0;
};

Now we can implement this interface with a base storage class:

class BaseStorage : public Data {
  A a; B b; C c;
public:
  BaseStorage(A const& a, B const& b, C const& c) : a(a), b(b), c(c) {}
  A& get_a() override { return a; }
  B& get_b() override { return b; }
  C& get_c() override { return c; }
};

If we want to overlay the value of A, we can define a class that delegates all calls to a base object, except for requests of the A data:

class OverlayA : public Data {
  Data& base;
  A a;
public:
  OverlayA(Data& base, A&& a) : base(base), a(std::move(a)) {}
  A& get_a() override { return a; }
  B& get_b() override { return base.get_b(); }
  C& get_c() override { return base.get_c(); }
};

Instead of performing a copy of some Data, we can now overlay the part of it we are going to change:

Data& orig = ...;
// call a function with a copied A
will_change_a(OverlayA(orig, A(orig.get_a())));

Unfortunately, these decorators make it really really difficult to write const-correct code – you may want the Data& base to be a const reference in order to prevent writes to the base storage, but the interface may also describe overlaid data where writes are necessary. This could possibly be expressed through templates.

If behaviour of any Data interface implementation changes the private fields directly rather than going through the virtual methods, that may not write to the overlaid data. Therefore, the Data interface should not contain extra behaviour. You can implement such behaviour in a separate class that wraps a Data&.

This technique requires you to know in advance which parts of the object must be overlaid with copied data. As such, it is potentially error-prone.

The decorator pattern constructs a linked list of overlays. If the list grows many levels deep, this pointer chasing may hurt performance.

Copies are good

In many cases, creating a copy is preferable to cleverness like CoW. In particular, if the objects in question are not very large and are trivially copyable, then doing a copy each time may turn out to be cheaper than the alternatives. Because of caching, chasing pointers tends to be expensive compared to operations on contiguous objects. Both CoW and Decorators have continuous per-read overhead, whereas copies only have a predictable per-copy overhead.

If your profiling determines that the copies are a performance problem, then it may be sensible to try out one of these approaches or a combination of them (e.g. performing a copy but using CoW for expensive to copy members like vectors might be sensible). Where possible, avoid ownership of expensive to copy data in your object.