C++ – Is this correct usage of C++ ‘move’ semantics

cc++-standard-libraryc++11move-semantics

Tonight I've been taking a look at some code I've been working on over the last few days, and began reading up on move semantics, specifically std::move. I have a few questions to ask you pros to ensure that I am going down the right path and not making any stupid assumptions!

Firstly:

1) Originally, my code had a function that returned a large vector:

template<class T> class MyObject
{
public:
    std::vector<T> doSomething() const;
    {
        std::vector<T> theVector;

        // produce/work with a vector right here

        return(theVector);
    }; // eo doSomething
};  // eo class MyObject

Given "theVector" is temporary in this and "throw-away", I modified the function to:

    std::vector<T>&& doSomething() const;
    {
        std::vector<T> theVector;

        // produce/work with a vector right here

        return(static_cast<std::vector<T>&&>(theVector));
    }; // eo doSomething

Is this correct? Any pitfalls in doing it this way?

2) I noticed in a function I have that returns std::string that it automatically called the move constructor. Debugging in to Return of the String (thankyou, Aragorn), I noticed it called an explicit move constructor. Why is there one for the string class and not vector?

I didn't have to make any modifications to this function to take advantage of move semantics:

// below, no need for std::string&& return value?
std::string AnyConverter::toString(const boost::any& _val) const
{
    string ret;
    // convert here
    return(ret); // No need for static_cast<std::string&&> ?
}; // eo toString

3) Finally, I wanted to do some performance tests, is the amazingly-fast results I got because of std::move semantics or did my compiler (VS2010) do some optimizing too?

(Implementation of _getMilliseconds() omitted for brevity)

std::vector<int> v;
for(int a(0); a < 1000000; ++a)
    v.push_back(a);

std::vector<int> x;
for(int a(0); a < 1000000; ++a)
    x.push_back(a);

    int s1 = _getMilliseconds();
std::vector<int> v2 = v;
    int s2 =  _getMilliseconds();
std::vector<int> v3 = std::move(x);
    int s3 =  _getMilliseconds();

    int result1 = s2 - s1;
    int result2 = s3 - s2;

The results were, obviously, awesome. result1, a standard assignment, took 630ms. The second result, was 0ms. Is this a good performance test of these things?

I know some of this is obvious to a lot of you, but I want to make sure I understand the semantics right before I go blazer on my code.

Thanks in advance!

Best Answer

A reference is still a reference. In the same way you cannot return a reference to a local in C++03 (or you get UB), you can't in C++0x. You'll end up with a reference to a dead object; it just happens to be an rvalue reference. So this is wrong:

std::vector<T>&& doSomething() const
{
    std::vector<T> local;

    return local; // oops
    return std::move(local); // also oops
}

You should just do what you saw in number two:

// okay, return by-value 
std::vector<T> doSomething() const
{
    std::vector<T> local;

    return local; // exactly the same as:
    return std::move(local); // move-construct value
}

Variables local to a function are temporary when you return, so there's no need to change any of your code. The return type is the thing responsible for implementing move semantics, not you.

You want to use std::move to explicitly move something, when it wouldn't be done normally, like in your test. (Which seems to be fine; was that in Release? You should output the contents of the vector, or the compiler will optimize it away.)

If you want to learn about rvalue references, read this.

Related Solutions

C++ – move semantics

I find it easiest to understand move semantics with example code. Let's start with a very simple string class which only holds a pointer to a heap-allocated block of memory:

#include <cstring>
#include <algorithm>

class string
{
    char* data;

public:

    string(const char* p)
    {
        size_t size = std::strlen(p) + 1;
        data = new char[size];
        std::memcpy(data, p, size);
    }

Since we chose to manage the memory ourselves, we need to follow the rule of three. I am going to defer writing the assignment operator and only implement the destructor and the copy constructor for now:

    ~string()
    {
        delete[] data;
    }

    string(const string& that)
    {
        size_t size = std::strlen(that.data) + 1;
        data = new char[size];
        std::memcpy(data, that.data, size);
    }

The copy constructor defines what it means to copy string objects. The parameter const string& that binds to all expressions of type string which allows you to make copies in the following examples:

string a(x);                                    // Line 1
string b(x + y);                                // Line 2
string c(some_function_returning_a_string());   // Line 3

Now comes the key insight into move semantics. Note that only in the first line where we copy x is this deep copy really necessary, because we might want to inspect x later and would be very surprised if x had changed somehow. Did you notice how I just said x three times (four times if you include this sentence) and meant the exact same object every time? We call expressions such as x "lvalues".

The arguments in lines 2 and 3 are not lvalues, but rvalues, because the underlying string objects have no names, so the client has no way to inspect them again at a later point in time. rvalues denote temporary objects which are destroyed at the next semicolon (to be more precise: at the end of the full-expression that lexically contains the rvalue). This is important because during the initialization of b and c, we could do whatever we wanted with the source string, and the client couldn't tell a difference!

C++0x introduces a new mechanism called "rvalue reference" which, among other things, allows us to detect rvalue arguments via function overloading. All we have to do is write a constructor with an rvalue reference parameter. Inside that constructor we can do anything we want with the source, as long as we leave it in some valid state:

    string(string&& that)   // string&& is an rvalue reference to a string
    {
        data = that.data;
        that.data = nullptr;
    }

What have we done here? Instead of deeply copying the heap data, we have just copied the pointer and then set the original pointer to null (to prevent 'delete[]' from source object's destructor from releasing our 'just stolen data'). In effect, we have "stolen" the data that originally belonged to the source string. Again, the key insight is that under no circumstance could the client detect that the source had been modified. Since we don't really do a copy here, we call this constructor a "move constructor". Its job is to move resources from one object to another instead of copying them.

Congratulations, you now understand the basics of move semantics! Let's continue by implementing the assignment operator. If you're unfamiliar with the copy and swap idiom, learn it and come back, because it's an awesome C++ idiom related to exception safety.

    string& operator=(string that)
    {
        std::swap(data, that.data);
        return *this;
    }
};

Huh, that's it? "Where's the rvalue reference?" you might ask. "We don't need it here!" is my answer :)

Note that we pass the parameter that by value, so that has to be initialized just like any other string object. Exactly how is that going to be initialized? In the olden days of C++98, the answer would have been "by the copy constructor". In C++0x, the compiler chooses between the copy constructor and the move constructor based on whether the argument to the assignment operator is an lvalue or an rvalue.

So if you say a = b, the copy constructor will initialize that (because the expression b is an lvalue), and the assignment operator swaps the contents with a freshly created, deep copy. That is the very definition of the copy and swap idiom -- make a copy, swap the contents with the copy, and then get rid of the copy by leaving the scope. Nothing new here.

But if you say a = x + y, the move constructor will initialize that (because the expression x + y is an rvalue), so there is no deep copy involved, only an efficient move. that is still an independent object from the argument, but its construction was trivial, since the heap data didn't have to be copied, just moved. It wasn't necessary to copy it because x + y is an rvalue, and again, it is okay to move from string objects denoted by rvalues.

To summarize, the copy constructor makes a deep copy, because the source must remain untouched. The move constructor, on the other hand, can just copy the pointer and then set the pointer in the source to null. It is okay to "nullify" the source object in this manner, because the client has no way of inspecting the object again.

I hope this example got the main point across. There is a lot more to rvalue references and move semantics which I intentionally left out to keep it simple. If you want more details please see my supplementary answer.

C++ – std::move(), and when should it be used

Wikipedia Page on C++11 R-value references and move constructors

In C++11, in addition to copy constructors, objects can have move constructors.
(And in addition to copy assignment operators, they have move assignment operators.)
The move constructor is used instead of the copy constructor, if the object has type "rvalue-reference" (Type &&).
std::move() is a cast that produces an rvalue-reference to an object, to enable moving from it.

It's a new C++ way to avoid copies. For example, using a move constructor, a std::vector could just copy its internal pointer to data to the new object, leaving the moved object in an moved from state, therefore not copying all the data. This would be C++-valid.

Try googling for move semantics, rvalue, perfect forwarding.

Best Answer

Related Solutions

C++ – move semantics

C++ – std::move(), and when should it be used

Related Topic