C++ Move Semantics – Real World Performance Improvements

c++, c++11, memory, performance

(I've asked a similar question on SO, but it may not have been appropriate there, so I'm also posting it here; please point it out if you think this is a duplicate.)

I've heard a lot about the move semantics (essentially rvalue references) introduced in C++11. In theory, they should bring significant performance improvements because they avoid unnecessary copies.

However, compilers already apply optimizations to legacy code during compilation to deal with inefficient temporary copies, such as copy elision and the (named) return value optimization.

Additionally, for frequently used data structures, some C++ standard library implementations apply special optimizations (e.g., the small string optimization for std::string).

More importantly, even though some pieces of legacy code are genuinely inefficient, they don't add much latency, because they are:

  • not invoked frequently, and
  • small enough that modern computers have plenty of physical memory for them.

So I'm asking: are there real-world examples where modern C++ (C++11/14/17) syntax greatly accelerates a program, or improves its overall performance by a meaningful percentage (e.g., >10%)?

I expect the answer to fall into any of these three categories:

  1. Performance bugs/bottlenecks in real-world repositories that modern C++ handles well.
  2. Profiling results for a piece of code that show the improvement.
  3. Modifications to the trivial code below that turn it into an example with a measurable performance benefit.

#include <cstddef>
#include <vector>
using std::vector;
using std::size_t;

size_t const MAX = NNN;  // NNN is specified by the -DNNN=xxx option
size_t const NUM = NNN / 100;

vector<int> factory(size_t size) {
  vector<int> v;
  for (size_t i = 0u; i < size; ++i) {
    v.push_back(static_cast<int>(i));
  }
  return v;
}

// Version 1
/// void doubles(vector<int> & v) {
// Version 2
void doubles(vector<int> && v) {
  for (size_t i = 0u; i < v.size(); ++i) {
    v[i] = v[i] * 2;
  }
}

int main() {
// Version 1
/// vector<int> v = factory(MAX);
/// doubles(v);
// Version 2
  doubles(factory(MAX));
}

And the improvement should still be observable with default (e.g., -O0) compilation options in gcc, clang, or MSVC (so using -fno-elide-constructors to force a difference is not allowed).

I ask this question because I was doing a survey of the impact of move semantics on the performance of real-world programs, but after trying some (trivial) code and doing some basic profiling myself, I simply could not find any significant difference. So please forgive me if this seems stupid/pedantic.

Best Answer

If you need an overview of the benefits and best practices of move semantics, watch some of the conference recordings on the isocpp website.
(At the bottom there's a link to older recordings.)


Bjarne Stroustrup provides a prime motivating example on his website:

http://www.stroustrup.com/C++11FAQ.html#rval

Just consider the typical implementation of std::swap, assuming that the function has no special access to the type.
The sample code and comments below are adapted from the link above (the FAQ omits the void return type, restored here).

template<class T> void swap(T& a, T& b)  // "old style swap"
{
    T tmp(a);   // now we have two copies of a
    a = b;      // now we have two copies of b
    b = tmp;    // now we have two copies of tmp (aka a)
}

Whenever a new object is created from an existing one, it incurs the cost of copying that object. Most of the time this implies a deep copy (share nothing), because each object must be prepared to be modified independently, and there is nothing to imply otherwise.

But in this example, it is clear that tmp is a temporary. What can we do to avoid the cost of deep copying in this case?


As @DocBrown points out in a comment, the benefit of move semantics depends on:

  • The coding style
  • The implementation of data structures used most heavily in the code

In object-oriented programming, there is a contentious issue: copying or sharing? (Another contentious issue is mutable versus immutable.)

Most software programs spend a lot of their time copying stuff. The questions are:

  • Does the situation require copying?
  • Is there a cheaper way of copying?

If two or more pieces of code need access to the same object, and all of them promise never to modify it (i.e., cause its state to change), then sharing a reference to the object (by pointer or other means) may be sufficient.

If one piece of code needs to make a copy so that it can modify the object, it will not benefit from most "make copies cheap" schemes.

Sometimes it is a middle ground. An object has multiple properties, and the code wants a copy so that one or several properties can be modified. In this case, "making copies cheap" requires sharing the unchanged properties between the old and the new object. (Note: move semantics does not enable this. I mention it because move semantics has to compete with a number of other kinds of semantics.)
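To make that middle ground concrete, here is a hypothetical sketch (the Image type and relabel function are made up for illustration) in which a "copy" shares the unchanged, heavyweight property through a shared_ptr and replaces only the cheap one. As the paragraph says, this is structural sharing, not move semantics:

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical record: "pixels" is expensive to copy, "label" is cheap.
struct Image {
  std::shared_ptr<const std::vector<int>> pixels;  // shared, immutable
  std::string label;                               // per-copy, mutable
};

// A "copy" that changes only the label: the pixel buffer is shared
// between old and new object, never deep-copied.
Image relabel(const Image& src, std::string new_label) {
  return Image{src.pixels, std::move(new_label)};
}
```

The immutability of the shared part (const in the pointee type) is what makes the sharing safe.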


C++ code written in a C style, with heavy use of pointers, may not see any benefit, because such code already freely shares any data structure by passing pointers around, and does so without much syntactic safeguard.

C++ code that already implements reference counting (such as OpenCV's Mat class), or uses Microsoft COM pointers (com_ptr_t), etc., already lets multiple pieces of code share the same piece of data.


The kind of C++ code that may benefit from move semantics is code that:

  1. Relies mainly on STL data structures (most importantly std::vector),
  2. Uses "value semantics" heavily (makes objects immutable, copies objects freely, prefers copying values to sharing references), and
  3. In order for the performance improvement to be measurable:
    • does some heavy lifting (the amount of data and computation should be large enough to measure), and
    • is not dominated by other kinds of bottlenecks (such as disk, I/O, database, etc.).

One may say that each of these factors is questionable, and rightly so.

There are C++ programs that implemented their own reference counting, reference-sharing schemes, lazy (on-demand) evaluation, asynchronous operations or promise-futures, etc., long before C++11 was conceived. These C++ programming environments chose a trajectory that makes them largely independent of the evolution of C++. From a historical perspective, they might have been right, because the evolution of C++ had apparently been stagnant for a decade or so, during which most innovations were thought to be doable in library code (such as the Boost libraries) without requiring changes to the language standard.