C++ Move Semantics – Real World Performance Improvements

c++, c++11, memory, performance

(I've asked a similar question on SO, but it may not have been appropriate there, so I'm also posting it here; please point it out if you think this is a duplicate.)

I've heard a lot about the move semantics (essentially rvalue references) introduced in C++11. In theory, they should bring significant performance improvements because they avoid unnecessary copies.

However, compilers already apply optimizations to legacy code during compilation to deal with inefficient temporary copies, such as copy elision and the (named) return value optimization.

Additionally, for frequently used data structures, some C++ standard library implementations apply special optimizations (e.g., the small string optimization for std::string).

More importantly, even though some pieces of legacy code are genuinely inefficient, they don't add much latency, because they are:

  • not invoked frequently, and
  • small enough that modern computers have plenty of physical memory for them.

So I'm asking: are there real-world examples where modern C++ (C++11/14/17) syntax greatly accelerates a program, or improves its overall performance by a meaningful percentage (e.g., >10%)?

I expect the answer to fall into any of these three categories:

  1. Performance bugs/bottlenecks in real-world repositories that modern C++ handles well.
  2. Profiling results for a piece of code that show the improvement.
  3. Modifications to the trivial code below that turn it into an example with a measurable performance benefit.

#include <cstddef>
#include <vector>
using std::vector;
using std::size_t;

size_t const MAX = NNN;  // NNN is specified by the -DNNN=xxx option
size_t const NUM = NNN / 100;

vector<int> factory(size_t size) {
  vector<int> v;
  for (size_t i = 0u; i < size; ++i) {
    v.push_back(static_cast<int>(i));
  }
  return v;
}

// Version 1
/// void doubles(vector<int> & v) {
// Version 2
void doubles(vector<int> && v) {
  for (size_t i = 0u; i < v.size(); ++i) {
    v[i] = v[i] * 2;
  }
}

int main() {
// Version 1
/// vector<int> v = factory(MAX);
/// doubles(v);
// Version 2
  doubles(factory(MAX));
}

And the improvement should still be observable with default (e.g., -O0) compilation options in gcc, clang, or MSVC (so using -fno-elide-constructors to force a difference is not allowed).

I ask this question because I was doing a survey of the impact of move semantics on the performance of real-world programs, but after trying some (trivial) code and doing some basic profiling myself, I simply could not find any significant difference. So please forgive me if this seems stupid/pedantic.

Best Answer

If you need an overview of the benefits and best practices of move semantics, watch some of the conference recordings on the isocpp website.
(At the bottom there's a link to older recordings.)


Bjarne Stroustrup provides a prime motivating example on his website:

http://www.stroustrup.com/C++11FAQ.html#rval

Just consider the typical implementation of std::swap, assuming that the function has no special access to the type.
The sample code and comments below are adapted from the link above (the FAQ omits the void return type, restored here).

template<class T> void swap(T& a, T& b)  // "old style swap"
{
    T tmp(a);   // now we have two copies of a
    a = b;      // now we have two copies of b
    b = tmp;    // now we have two copies of tmp (aka a)
}

Whenever a new object is created from an existing one, it incurs the cost of copying that object. Most of the time this implies a deep copy (share nothing), because each object must be prepared to be modified independently, and there is nothing to imply otherwise.

But in this example, it is clear that tmp is a temporary. What can we do to avoid the cost of deep copying in this case?


As @DocBrown points out in a comment, the benefit of move semantics depends on:

  • The coding style
  • The implementation of data structures used most heavily in the code

In object-oriented programming, there is a contentious issue: copying or sharing? (Another contentious issue is mutable versus immutable.)

Most software programs spend a lot of their time copying stuff. The questions are:

  • Does the situation require copying?
  • Is there a cheaper way of copying?

If two or more pieces of code need access to the same object, and all of them promise never to modify it (i.e., cause its state to change), then sharing a reference to the object (by pointer or other means) may be sufficient.

If one piece of code needs to make a copy so that it can modify the object, it will not benefit from most "make copies cheap" schemes.

Sometimes it is a middle ground. An object has multiple properties, and the code wants a copy so that one or several properties can be modified. In this case, "making copies cheap" requires sharing the unchanged properties between the old and the new object. (Note: move semantics does not enable this. I mention it because move semantics has to compete with a number of other kinds of semantics.)
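To make that middle ground concrete, here is a hypothetical sketch (the Image type and relabel function are made up for illustration) in which a "copy" shares the unchanged, heavyweight property through a shared_ptr and replaces only the cheap one. As the paragraph says, this is structural sharing, not move semantics:

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical record: "pixels" is expensive to copy, "label" is cheap.
struct Image {
  std::shared_ptr<const std::vector<int>> pixels;  // shared, immutable
  std::string label;                               // per-copy, mutable
};

// A "copy" that changes only the label: the pixel buffer is shared
// between old and new object, never deep-copied.
Image relabel(const Image& src, std::string new_label) {
  return Image{src.pixels, std::move(new_label)};
}
```

The immutability of the shared part (const in the pointee type) is what makes the sharing safe.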


C++ code written in a C style, with heavy use of pointers, may not see any benefit, because such code already freely shares any data structure by passing pointers around, and does so without much syntactic safeguard.

C++ code that already implements reference counting (such as OpenCV's Mat class), or uses Microsoft COM pointers (com_ptr_t), etc., already lets multiple pieces of code share the same piece of data.


The kind of C++ code that may benefit from move semantics is code that:

  1. Relies mainly on STL data structures (most importantly std::vector),
  2. Uses "value semantics" heavily (makes objects immutable, copies objects freely, prefers copying values to sharing references), and
  3. In order for the performance improvement to be measurable:
    • does some heavy lifting (the amount of data and computation should be large enough to measure), and
    • is not dominated by other kinds of bottlenecks (such as disk, I/O, database, etc.).

One may say that each of these factors is questionable, and rightly so.

There are C++ programs that implemented their own reference counting, reference-sharing schemes, lazy (on-demand) evaluation, asynchronous operations or promise-futures, etc., long before C++11 was conceived. These C++ programming environments chose a trajectory that makes them largely independent of the evolution of C++. From a historical perspective, they might have been right, because the evolution of C++ had apparently been stagnant for a decade or so, during which most innovations were thought to be doable in library code (such as the Boost libraries) without requiring changes to the language standard.