C++ – Merits of Copy-on-Write Semantics


I am wondering what merits copy-on-write might have. Naturally, I am not looking for personal opinions, but for real-world scenarios where it is technically and practically beneficial in a tangible way. By tangible I mean something more than saving you from typing a & character.

To clarify, this question is in the context of datatypes where assignment or copy construction creates an implicit shallow copy, but a modification creates an implicit deep copy and applies the change to that copy instead of the original object.

The reason I am asking is that I can't find any merit in having COW as a default implicit behavior. I use Qt, which implements COW for a lot of its datatypes, practically all of which have some underlying dynamically allocated storage. But how does it really benefit the user?

An example:

QString s("some text");
QString s1 = s; // now both s and s1 internally use the same resource

qDebug() << s1; // const operation, nothing changes
s1[0] = 'z'; // s1 "detaches" from s, allocates new storage and modifies first character
           // s is still "some text"

What do we win by using COW in this example?

If all we intend to do is use const operations, s1 is redundant, might as well use s.

If we intend to change the value, then COW only delays the resource copy until the first non-const operation, at the (albeit minimal) cost of incrementing the ref count for the implicit sharing and detaching from the shared storage. It does look like all the overhead involved in COW is pointless.

It is not much different in the context of parameter passing: if you don't intend to modify the value, pass it as a const reference. If you do want to modify it, you either make an implicit deep copy if you don't want to modify the original object, or pass by reference if you do. Again COW seems like needless overhead that doesn't achieve anything, and only adds the limitation that you cannot modify the original value even if you want to, as any change will detach from the original object.

So depending on whether you know about COW or are oblivious to it, it may either result in code with obscure intent and needless overhead, or completely confusing behavior which doesn't match the expectations and leaves you scratching your head.

To me it seems that there are more efficient and more readable solutions, whether you want to avoid an unnecessary deep copy or you intend to make one. So where is the practical benefit of COW? I assume there must be some benefit, since it is used in such a popular and powerful framework.

Furthermore, from what I've read, COW is now explicitly forbidden in the C++ standard library. I don't know whether the cons I see in it have something to do with that, but either way, there must be a reason for it.

Best Answer

Copy on write is used in situations where you very often will create a copy of the object and not modify it. In those situations, it pays for itself.
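To make the trade-off concrete, here is a minimal single-threaded sketch of a copy-on-write handle. The class name CowBuffer and its members are hypothetical, chosen for illustration; this is not how Qt or any particular library implements implicit sharing. Copies only share a pointer, and the deep copy happens on the first write, if it happens at all.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical copy-on-write buffer, for illustration only.
class CowBuffer {
public:
    explicit CowBuffer(std::vector<int> values)
        : data_(std::make_shared<std::vector<int>>(std::move(values))) {}

    // The compiler-generated copy operations are shallow:
    // they copy the shared_ptr and bump the reference count.

    // Read access never detaches.
    int at(std::size_t i) const {
        assert(i < data_->size());
        return (*data_)[i];
    }

    // Write access detaches first if the storage is shared.
    void set(std::size_t i, int v) {
        assert(i < data_->size());
        detach();
        (*data_)[i] = v;
    }

    // True while both handles still point at the same storage.
    bool sharesStorageWith(const CowBuffer& other) const {
        return data_ == other.data_;
    }

private:
    void detach() {
        // Assumption: single-threaded use. use_count() is not a safe
        // sharing test when other threads may copy or destroy handles.
        if (data_.use_count() > 1)
            data_ = std::make_shared<std::vector<int>>(*data_);
    }

    std::shared_ptr<std::vector<int>> data_;
};
```

If a CowBuffer is copied many times and only rarely written to, most copies never pay for the deep copy. That is the situation where the reference-counting overhead is repaid.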

As you mentioned, you can pass a const reference, and in many cases that is sufficient. However, const only guarantees that the callee can't mutate the object through that reference (unless it uses const_cast, of course). It does not handle multithreading cases, and it does not handle cases where there are callbacks (which might mutate the original object). Passing a COW object by value puts the challenge of managing these details on the API developer, rather than on the API user.
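The callback case can be sketched as follows. The two functions are hypothetical, and std::string stands in for any value type to keep the example free of dependencies. With a COW type the by-value version would cost roughly a reference-count increment instead of the full copy std::string makes here.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical API taking its argument by const reference. The callee
// cannot modify `text` through the reference, but the callback may still
// modify the object it refers to.
inline std::size_t lengthAfterCallbackByRef(const std::string& text,
                                            const std::function<void()>& callback) {
    assert(callback != nullptr);
    callback();          // may change the caller's string behind our back
    return text.size();  // observes whatever the callback did
}

// The same API taking its argument by value: the callee works on its own
// snapshot, so the callback cannot affect it.
inline std::size_t lengthAfterCallbackByValue(std::string text,
                                              const std::function<void()>& callback) {
    assert(callback != nullptr);
    callback();
    return text.size();  // still the size at the time of the call
}
```

Given a string s holding "abc" and a callback that appends "def" to s, the by-reference version sees six characters while the by-value version sees three.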

The new rules in C++11 forbid COW for std::string in particular. Iterators on a string must be invalidated if the backing buffer is detached. If an iterator is implemented as a char* (as opposed to a string* and an index), those iterators are no longer valid after a detach. The C++ community had to decide how often iterators could be invalidated, and the decision was that operator[] should not be one of those cases. operator[] on a non-const std::string returns a char&, which may be modified. Thus, operator[] would need to detach the string, invalidating iterators. This was deemed to be a poor trade, and unlike functions such as end() and cend(), there's no way to ask for the const version of operator[] short of const casting the string.
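The invalidation problem can be reproduced with a toy COW string. CowString is a hypothetical class written for this illustration, not a real std::string implementation, and it assumes single-threaded use. A raw pointer plays the role of a char*-style iterator.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical COW string, for illustration only.
class CowString {
public:
    explicit CowString(std::string s)
        : buf_(std::make_shared<std::string>(std::move(s))) {}

    // Const access: never detaches.
    const char* data() const { return buf_->data(); }
    char operator[](std::size_t i) const {
        assert(i < buf_->size());
        return (*buf_)[i];
    }

    // Non-const access hands out a writable reference, so it must
    // detach from shared storage even if the caller only reads.
    char& operator[](std::size_t i) {
        assert(i < buf_->size());
        if (buf_.use_count() > 1)  // single-threaded assumption
            buf_ = std::make_shared<std::string>(*buf_);
        return (*buf_)[i];
    }

private:
    std::shared_ptr<std::string> buf_;
};
```

Suppose a pointer p is taken from a.data(), then b is copied from a, and then a[0] is assigned. The assignment detaches a, so p now points into the buffer that only b owns. It no longer refers to the contents of a, even though nothing that looks like a reallocation was called.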

COW is still alive and well outside of the STL. In particular, I have found it very useful in cases where it is unreasonable for a user of my APIs to expect that there's some heavyweight object behind what appears to be a very lightweight object. I may wish to use COW in the background to ensure they never have to be concerned with such implementation details.
