Object Lifetime Invariants vs. Move Semantics in C++

cc++11invariantsobject-oriented-design

When I learned C++ long ago, it was strongly emphasized to me that part of the point of C++ is that just like loops have "loop-invariants", classes also have invariants associated to the lifetime of the object — things that should be true for as long the object is alive. Things that should be established by the constructors, and preserved by the methods. Encapsulation / access control is there to help you enforce the invariants. RAII is one thing that you can do with this idea.

Since C++11 we now have move semantics. For a class that supports moving, moving from an object does not formally end its life-time — the move is supposed to leave it in some "valid" state.

In designing a class, is it bad practice if you design it so that the invariants of the class are only preserved up to the point that it is moved from? Or is that okay if it will allow you to make it go faster.

To make it concrete, suppose I have a non-copyable but moveable resource type like so:

class opaque {
  opaque(const opaque &) = delete;

public:
  opaque(opaque &&);

  ...

  void mysterious();
  void mysterious(int);
  void mysterious(std::vector<std::string>);
};

And for whatever reason, I need to make a copyable wrapper for this object, so that it can be used, perhaps in some existing dispatch system.

class copyable_opaque {
  std::shared_ptr<opaque> o_;

  copyable_opaque() = delete;
public:
  explicit copyable_opaque(opaque _o)
    : o_(std::make_shared<opaque>(std::move(_o)))
  {}

  void operator()() { o_->mysterious(); }
  void operator()(int i) { o_->mysterious(i); }
  void operator()(std::vector<std::string> v) { o_->mysterious(v); }
};

In this copyable_opaque object, an invariant of the class established at construction is that the member o_ always points to a valid object, since there is no default ctor, and the only ctor that isn't a copy ctor guarantees these. All of the operator() methods assume that this invariant holds, and preserve it afterwards.

However, if the object is moved from, then o_ will then point to nothing. And after that point, calling any of the methods operator() will cause UB / a crash.

If the object is never moved from, then the invariant will be preserved right up to the dtor call.

Let's suppose that hypothetically, I wrote this class, and months later, my imaginary coworker experienced UB because, in some complicated function where lots of these objects were being shuffled around for some reason, he moved from one of these things and later called one of its methods. Clearly it's his fault at the end of the day, but is this class "poorly designed?"

Thoughts:

It's usually bad form in C++ to create zombie objects that explode if you touch them.
If you can't construct some object, can't establish the invariants, then throw an exception from the ctor. If you can't preserve the invariants in some method, then signal an error somehow and roll-back. Should this be different for moved-from objects?
Is it enough to just document "after this object has been moved from, it is illegal (UB) to do anything with it other than destroy it" in the header?
Is it better to continually assert that it is valid in each method call?

Like so:

class copyable_opaque {
  std::shared_ptr<opaque> o_;

  copyable_opaque() = delete;
public:
  explicit copyable_opaque(opaque _o)
    : o_(std::make_shared<opaque>(std::move(_o)))
  {}

  void operator()() { assert(o_); o_->mysterious(); }
  void operator()(int i) { assert(o_); o_->mysterious(i); }
  void operator()(std::vector<std::string> v) { assert(o_); o_->mysterious(v); }
};

The assertions don't substantially improve the behavior, and they cause a slow-down. If your project does use the "release build / debug build" scheme, rather than just always running with assertions, I guess this is more attractive, since you don't pay for the checks in the release build. If you don't actually have debug builds, this seems quite unattractive though.

Is it better to make the class copyable, but not movable?
This also seems bad and causes a performance hit, but it solves the "invariant" issue in a straightforward way.

What would you consider to be the relevant "best practices" here?

Best Answer

It's usually bad form in C++ to create zombie objects that explode if you touch them.

But that's not what you're doing. You're creating a "zombie object" that will explode if you touch it wrongly. Which is ultimately no different than any other state-based precondition.

Consider the following function:

void func(std::vector<int> &v)
{
  v[0] = 5;
}

Is this function safe? No; the user can pass an empty vector. So the function has a de-facto precondition that v has at least one element in it. If it does not, then you get UB when you call func.

So this function is not "safe". But that doesn't mean it's broken. It is only broken if the code using it violates the precondition. Maybe func is a static function used as a helper in the implementation of other functions. Localized in such a way, nobody would call it in a way that violates its preconditions.

Many functions, whether namespace-scoped or class members, will have expectations on the state of a value they operate on. If these preconditions are not met, then the functions will fail, typically with UB.

The C++ standard library defines a "valid-but-unspecified" rule. This says that, unless the standard says otherwise, every object which is moved from will be valid (it is a legal object of that type), but the specific state of that object is not specified. How many elements does a moved-from vector have? It doesn't say.

This means that you cannot call any function which has any precondition. vector::operator[] has the precondition that the vector has at least one element. Since you don't know the state of the vector, you cannot call it. It would be no better than calling func without first verifying that the vector is not empty.

But this also means that functions which do not have preconditions are fine. This is perfectly legal C++11 code:

vector<int> v1 = {1, 2, 3, 4, 5};
vector<int> v2{std::move(v1)};
v1.assign({6, 7, 8, 9, 10});

vector::assign has no preconditions. It will work with any valid vector object, even one that has been moved from.

So you're not creating an object that is broken. You're creating an object who's state is unknown.

If you can't construct some object, can't establish the invariants, then throw an exception from the ctor. If you can't preserve the invariants in some method, then signal an error somehow and roll-back. Should this be different for moved-from objects?

Throwing exceptions from a move constructor is generally considered... rude. If you move an object that owns memory, then you are transferring ownership of that memory. And that doesn't usually involve anything that can throw.

Sadly, we can't enforce this for various reasons. We have to accept that throwing-move is a possibility.

It should also be noted that you don't have to follow the "valid-yet-unspecified" language. That's simply the way the C++ standard library says that movement for standard types works by default. Certain standard library types have more strict guarantees. For example, unique_ptr is very clear about the state of a moved-from unique_ptr instance: it is equal to nullptr.

So you can choose to provide a stronger guarantee if you wish.

Just remember: movement is a performance optimization, one which is usually being done on objects that are about to be destroyed. Consider this code:

vector<int> func()
{
  vector<int> v;
  //fill up `v`.
  return v;
}

This will move from v into the return value (assuming the compiler doesn't elide it). And there is no way to reference v after the move has completed. So whatever work you did to put v into a useful state is meaningless.

In most code, the likelihood of using a moved-from object instance is low.

Is it enough to just document "after this object has been moved from, it is illegal (UB) to do anything with it other than destroy it" in the header?

Is it better to continually assert that it is valid in each method call?

The whole point of having preconditions is to not check such things. operator[] has a precondition that the vector have an element with the given index. You get UB if you try to access outside the size of the vector. vector::at does not have such a precondition; it explicitly throws an exception if the vector does not have such a value.

Preconditions exist for performance reasons. They're so that you don't have to check things that the caller could have verified for themselves. Every call to v[0] doesn't have to check if v is empty; only the first one does.

Is it better to make the class copyable, but not movable?

No. In fact, a class should never be "copyable but not movable". If it can be copied, then it should be able to be moved by calling the copy constructor. This is the standard behavior of C++11 if you declare a user-defined copy constructor but don't declare a move constructor. And it's the behavior you should adopt if you don't want to implement special move semantics.

Move semantics exist to solve a very specific problem: dealing with objects that have large resources where copying would be prohibitively expensive or meaningless (ie: file handles). If your object doesn't qualify, then copying and moving are the same to you.

Related Solutions

C++ Move Semantics – Move-Return of Local Variables

Scott Meyers posted to comp.lang.c++ (August 2010) about a problem where implicit generation of move constructors could break C++03 class invariants:

struct X
{
  // invariant: v.size() == 5
  X() : v(5) {}

  ~X() { std::cout << v[0] << std::endl; }

private:    
  std::vector<int> v;
};

int main()
{
    std::vector<X> y;
    y.push_back(X()); // X() rvalue: copied in C++03, moved in C++0x
}

Here the problem is that in C++03, X had an invariant that its v member always had 5 elements. X::~X() counted on that invariant, but the newly-introduced move constructor moved from v, thereby setting its length to zero.

This is related to your example since the broken invariant is only detected in the X's destructor (as you say it's possible for a destructor of a local object to reference the object that gets implicitly moved, and therefore unexpectedly see an empty object).

C++11 try to achieve a balance between breaking some of existing code and providing useful optimizations based on move constructors.

Committee initially decided that move constructors and move assignment operators should be generated by the compiler when not provided by the user.

Then decided that this was indeed cause for alarm and it restricted the automatic generation of move constructors and move assignment operators in such a way that it is much less likely, though not impossible, for existing code to break (e.g. explicitly defined destructor).

It’s tempting to think that preventing the generation of implicit move constructors when a user-defined destructor is present is enough but it's not true (N3153 - Implicit Move Must Go for further details).

In N3174 - To Move or not to Move Stroupstrup says:

I consider this a language design problem, rather than a simple backwards compatibility problem. It is easy to avoid breaking old code (e.g. just remove move operations from C++0x), but I see making C++0x a better language by making move operations pervasive a major goal for which it may be worth breaking some C++98 code.

C++ – Choosing Between Non-Copyable Non-Movable and Move-Only Types

The way to look at this is not a matter of copy vs. move vs. immobility. It's two separate questions:

Mobility vs. immobility. Do you want it to be able to transfer the contents of the object to another object in some way?
If it's mobile, then is it fully copyable or move-only?

The thing is, the answer to either question is not always a matter of design.

mutex is not immobile because we conceptually think of a mutex as something that cannot be moved. We know exactly what it ought to mean to move a mutex: the locks on that mutex still work. It isn't immobile by design; it is immobile by implementations. Some backends could make a mobile mutex, but others cannot (without memory allocations or other overhead). Thus, we get the lowest-common-denominator.

By contrast, lock_guard is immobile by design. That's its purpose. The same goes for types like boost::scoped_ptr; the whole point is that the lifetime of the managed construct will end at the termination of that scope. While at some times we may desire to move such objects, from the perspective of those who designed them, they did not want them to be mobile.

When it comes to the question of what form of mobility a type has, the same things apply. unique_ptr is move-only because that's its job. That's why it exists: to permit the transfer of unique ownership from one location to another.

By contrast, one could conceive of the ability to copy a std::thread. You would just pause the thread's execution, copy its call stack, and start a new thread of execution with the copied call stack. That's quite simple conceptually, yet it is impossible to implement with C++'s object model. After all, you might have a scoped_ptr on the stack, a type you're not allowed to copy. You might have a pointer/reference to a stack object on the stack; how do you update it to point to the correct object?

Implementations inform interfaces. The problem with your "conceptual criteria" is that it makes you think that you can build a couple of simple rules to categorize the world, and if you stick to those categories, you'll be alright.

Programming is not that simple. You're going to have to learn how to exercise judgment; it's one of the skills that any good programmer needs.

Best Answer

Related Solutions

C++ Move Semantics – Move-Return of Local Variables

C++ – Choosing Between Non-Copyable Non-Movable and Move-Only Types

Related Topic