C++ Move Semantics – Move-Return of Local Variables

c++ c++11

My understanding is that in C++11, when you return a local variable from a function by value, the compiler is allowed to treat that variable as an rvalue and 'move' it out of the function to return it (if RVO/NRVO doesn't happen instead, of course).

My question is, can't this break existing code?

Consider the following code:

#include <cstdlib>   // EXIT_SUCCESS
#include <iostream>
#include <string>
#include <utility>   // std::move

struct bar
{
  bar(const std::string& str) : _str(str) {}
  bar(const bar&) = delete;
  bar(bar&& other) : _str(std::move(other._str)) {other._str = "Stolen";}
  void print() {std::cout << _str << std::endl;}

  std::string _str;
};

struct foo
{
  foo(bar& b) : _b(b) {}
  ~foo() {_b.print();}

  bar& _b;
};

bar foobar()
{
  bar b("Hello, World!");
  foo f(b);

  return std::move(b);
}

int main()
{
  foobar();
  return EXIT_SUCCESS;
}

My thoughts were that it would be possible for a destructor of a local object to reference the object that gets implicitly moved, and therefore unexpectedly see an 'empty' object. I tried to test this (see http://ideone.com/ZURoeT ), but I got the 'correct' result without the explicit std::move in foobar(). I'm guessing that was due to NRVO, but I didn't try to rearrange the code to disable that.

Am I correct in that this transformation (causing a move out of the function) happens implicitly and could break existing code?

UPDATE
Here is an example which illustrates what I'm talking about. The following two links are for the same code.
http://ideone.com/4GFIRu – C++03
http://ideone.com/FcL2Xj – C++11

If you look at the output, it's different.

So I guess the question now becomes: was this considered when implicit move was added to the standard, and was it decided that this breaking change was acceptable because this kind of code is rare enough? I also wonder if any compilers will warn in cases like this…

Best Answer

Scott Meyers posted to comp.lang.c++ (August 2010) about a problem where implicit generation of move constructors could break C++03 class invariants:

#include <iostream>
#include <vector>

struct X
{
  // invariant: v.size() == 5
  X() : v(5) {}

  ~X() { std::cout << v[0] << std::endl; }

private:
  std::vector<int> v;
};

int main()
{
    std::vector<X> y;
    y.push_back(X()); // X() rvalue: copied in C++03, moved in C++0x
}

Here the problem is that in C++03, X had an invariant that its v member always had 5 elements. X::~X() counted on that invariant, but the newly-introduced move constructor moved from v, thereby setting its length to zero.

This is related to your example, since the broken invariant is only detected in X's destructor (as you say, it's possible for a destructor of a local object to reference the object that gets implicitly moved, and therefore unexpectedly see an empty object).

C++11 tries to strike a balance between breaking some existing code and providing useful optimizations based on move constructors.

The committee initially decided that move constructors and move assignment operators should be generated by the compiler when not provided by the user.

It then decided that this was indeed cause for alarm and restricted the automatic generation of move constructors and move assignment operators (for example, none are generated for a class with an explicitly defined destructor), so that it is much less likely, though not impossible, for existing code to break.

It's tempting to think that preventing the generation of implicit move constructors when a user-defined destructor is present is enough, but it is not (see N3153, Implicit Move Must Go, for further details).

In N3174, To Move or not to Move, Stroustrup says:

I consider this a language design problem, rather than a simple backwards compatibility problem. It is easy to avoid breaking old code (e.g. just remove move operations from C++0x), but I see making C++0x a better language by making move operations pervasive a major goal for which it may be worth breaking some C++98 code.

Related Solutions

C++11 Parameter Passing – Simplifying Optimal Parameter Passing When a Copy is Needed

Sadly, no, because there are too many cases. In your sample, you use std::string@ to represent the perfectly forwarded type of an object that should be perfectly forwarded to a std::string constructor, and say "A similar code could be written for setters." But you're wrong. You'd need another, separate syntax for assignment. For instance, I can construct a std::vector<anything> from an int, but I can't assign an int to a std::vector<anything>. So I'd need something like std::vector<anything># for assignments. And what about the + operator? If I want to perfectly forward a RHS to a member's operator+, then I'd need a notation for that too. And it can't be an existing symbol like +, or that would make C++ much harder to parse than it already is! So you can see that this doesn't apply as universally as you appear to think it does.

Secondly, I disagree that the existing boilerplate doesn't scale well. It scales linearly, which is pretty good, I think. (Note that the members and the mem-init-list boilerplate are required in any case and are thus not part of the scaling. Even if they were, that's still linear.)

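#include <string>
#include <type_traits>  // std::enable_if, std::is_constructible
#include <utility>      // std::forward

class Person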
{
    std::string m_name;
    std::string m_address;
    std::string m_nickname;
    std::string m_phonenumber;
    std::string m_comment;

public:   
    template <class T, class U, class V, class W, class X,
              class = typename std::enable_if <
                  std::is_constructible<std::string, T>::value &&
                  std::is_constructible<std::string, U>::value &&
                  std::is_constructible<std::string, V>::value &&
                  std::is_constructible<std::string, W>::value &&
                  std::is_constructible<std::string, X>::value
              >::type>
    // Each argument is forwarded with its own deduced type.
    explicit Person(T&& name, U&& addr, V&& nick, W&& phone, X&& comment)
        : m_name(std::forward<T>(name)),
          m_address(std::forward<U>(addr)),
          m_nickname(std::forward<V>(nick)),
          m_phonenumber(std::forward<W>(phone)),
          m_comment(std::forward<X>(comment))
    {
    }

    ...
};

Third: This is only needed when you need to pass an unknown type perfectly to the member, which is very rare. Normally, you'd just take all the members as std::string by value, and move them into the members, which is amazingly close to optimal considering how amazingly easy it is.

C++ Move Semantics – Real World Performance Improvements

If you need an overview of the benefits and best-practices on move semantics, please watch some of the conference recordings on the isocpp website.
(At the bottom there's a link to older recordings.)

Bjarne Stroustrup provides a prime motivating example on his website.

http://www.stroustrup.com/C++11FAQ.html#rval

Just consider the typical implementation of std::swap, assuming that the function does not have special access to the type.
(The sample code and comments below are copied verbatim from the link above.)

template<class T> swap(T& a, T& b)      // "old style swap"
{
    T tmp(a);   // now we have two copies of a
    a = b;      // now we have two copies of b
    b = tmp;    // now we have two copies of tmp (aka a)
}

Creating a new object this way incurs the cost of copying the source object. Most of the time this means a deep copy that shares nothing, because each object must be independently modifiable and nothing implies otherwise.

But in this example, it is clear that tmp is a temporary. What can we do to avoid the cost of deep copying in this case?

As @DocBrown points out in a comment, the benefit of move semantics depends on:

- The coding style
- The implementation of the data structures used most heavily in the code

In object-oriented programming, there is a contentious issue: copying or sharing? (Another contentious issue is mutable versus immutable.)

Most software programs will spend time copying stuff. The questions are:

- Does the situation require copying?
- Is there a cheaper way of copying?

If two or more instances of code need access to the same object, and if all of these instances promise they will never modify the object (i.e. cause its states to change), then perhaps sharing the object reference (by pointer or other means) may be sufficient.

If one instance of code needs to make a copy so that the object can be modified, it will not benefit from most "make copy cheap" schemes.

Sometimes there is a middle ground. An object has multiple properties, and the code wants to make a copy so that one or several properties can be modified. In this case, "make copy cheap" would require sharing the unchanged properties between the old and new objects. (Note: move semantics does not enable this. I mention it because move semantics has to compete with a number of other kinds of semantics.)

C++ code that is written in a C style, with its heavy use of pointers, may not see any benefit, because such code already freely shares any data structure by sharing pointers, and does so without many syntactic safeguards.

C++ code that already implements reference counting (such as OpenCV's Mat class, Microsoft COM pointers (com_ptr_t), etc.) already allows multiple instances of code to share the same piece of data.

The kind of C++ code that may benefit from move semantics is code that:

- Relies mainly on STL data structures (most importantly std::vector),
- Uses "value semantics" heavily (makes objects immutable, copies objects often, prefers copying values to sharing references), and
- Is in a position for the performance improvement to be measurable:
  - It should be doing some heavy lifting (i.e. the amount of data and computation should be big enough to measure).
  - It should not be dominated by other types of bottlenecks (such as disk, I/O, database, etc.).

One may say that each of those factors is questionable, and rightly so.

There are C++ programs that implemented their own reference counting, reference-sharing schemes, lazy (on-demand) evaluation, asynchronous operations or promise-futures, etc., long before C++11 was conceived. These C++ programming environments chose a trajectory that makes them largely independent of the evolution of C++. From a historical perspective, they might have been right, because the evolution of C++ had apparently been stagnant for a decade or so, during which most innovations were thought to be doable in library code (such as the Boost libraries) without requiring changes to the language standard.
