I have not written much SIMD code myself, but I wrote a lot of assembler code some decades ago. AFAIK, using SIMD intrinsics is essentially assembly programming, and your whole question could be rephrased simply by replacing "SIMD" with the word "assembly". For example, the points you already mentioned, like
- the code takes 10x to 100x longer to develop than "high level code"
- it is tied to a specific architecture
- the code is never "clean" nor easy to refactor
- you need experts for writing and maintaining it
- debugging and maintaining are hard, evolving it is really hard
are in no way "special" to SIMD - these points are true for any kind of assembly language, and they are all "industry consensus". And the conclusion in the software industry is also pretty much the same as for assembler:
- don't write it if you don't have to - use a high level language wherever possible and let compilers do the hard work
- if the compilers are not sufficient, at least encapsulate the "low level" parts in some libraries, but avoid spreading the code all over your program
- since it is almost impossible to write "self-documenting" assembler or SIMD code, try to balance this with lots of documentation
Of course, there is indeed a difference to the situation with "classic" assembly or machine code: today, modern compilers typically produce high quality machine code from a high level language, often better optimized than manually written assembler. For the SIMD architectures which are popular today, the quality of the available compilers is AFAIK far below that - and maybe it will never reach it, since automatic vectorization is still a topic of scientific research. See, for example, this article describing the differences in optimization between a compiler and a human, which gives a notion of how hard it might be to create good SIMD compilers.
As you described in your question already, there also exists a quality problem with current state-of-the-art libraries. So IMHO the best we can hope for is that in the next years the quality of the compilers and libraries will increase; maybe the SIMD hardware will have to change to become more "compiler friendly", and maybe specialized programming languages supporting easier vectorization (like Halide, which you mentioned twice) will become more popular (wasn't that already a strength of Fortran?). According to Wikipedia, SIMD became "a mass product" around 15 to 20 years ago (and Halide is less than 3 years old, if I interpret the docs correctly). Compare this to the time compilers for "classic" assembly language needed to become mature. According to this Wikipedia article it took almost 30 years (from ~1970 to the end of the 1990s) until compilers exceeded the performance of human experts (in producing non-parallel machine code). So we may just have to wait another 10 to 15 years until the same happens to SIMD-enabled compilers.
If you need an overview of the benefits and best practices of move semantics, please watch some of the conference recordings on the isocpp website.
(At the bottom there's a link to older recordings.)
Bjarne Stroustrup provides a prime motivating example on his website:
http://www.stroustrup.com/C++11FAQ.html#rval
Just consider the typical implementation of `std::swap`, assuming that this function does not have special access to the type. The sample code and comments below are adapted from the link above.
template<class T> void swap(T& a, T& b) // "old style swap"
{
    T tmp(a); // now we have two copies of a
    a = b;    // now we have two copies of b
    b = tmp;  // now we have two copies of tmp (aka a)
}
Each time a new object is created, we incur the cost of copying. Most of the time, this implies a deep copy - sharing nothing - because each object must be prepared to be independently modified, and there is nothing to imply otherwise.
But in this example, it is clear that `tmp` is a temporary. What can we do to avoid the cost of deep copying in this case?
As @DocBrown points out in a comment, the benefits of move semantics depend on:
- The coding style
- The implementation of data structures used most heavily in the code
In object oriented programming, there is a contentious issue: copying or sharing? (Another contentious issue is mutable versus immutable.)
Most software programs will spend time copying stuff. The questions are:
- Does the situation require copying?
- Is there a cheaper way of copying?
If two or more instances of code need access to the same object, and if all of these instances promise they will never modify the object (i.e. cause its state to change), then perhaps sharing the object reference (by pointer or other means) may be sufficient.
If one instance of code needs to make a copy so that the object can be modified, it will not benefit from most "make copy cheap" schemes.
Sometimes it is a middle ground. An object has multiple properties, and the code wants to make a copy so that one or several properties can be modified. In this case, "make copy cheap" would require allowing the unchanged properties to be shared between the old and new object. (Note: move semantics does not enable this. I mention it because move semantics competes with a number of other kinds of semantics.)
C++ code that is written in a C style, with its heavy use of pointers, may not see any benefit, because such code already freely shares any data structure by passing pointers, and does so without many syntactic safeguards.
C++ code that already implements reference counting (such as OpenCV's `Mat` class), Microsoft COM pointers (`com_ptr_t`), etc., already allows multiple instances of code to share the same piece of data.
The kind of C++ code that may benefit from move semantics is code that:
- Mainly relies on STL data structures (most importantly `std::vector`),
- Uses "value semantics" heavily (makes objects immutable, makes copies of objects heavily, prefers copying values to sharing references), and,
- In order for its performance improvements to be measurable,
  - Does some heavy lifting (i.e. the amount of data and computation should be reasonably big to be measurable), and
  - Is not dominated by other types of bottlenecks (such as disk, IO, database, etc.).
One may say that each of those factors is questionable, and rightly so.
There are C++ programs that implemented their own reference counting, reference-sharing schemes, lazy (on-demand) evaluation, asynchronous operations or promise-futures, etc., long before C++11 was conceived. These C++ programming environments chose a trajectory that makes them largely independent of the evolution of C++. From a historical perspective, they might be right, because the evolution of C++ had apparently been stagnant for a decade or so, when most innovations were thought to be doable with library code (such as the Boost libraries) without requiring changes to the language standard.
Best Answer
The bigger increase in performance definitely comes from hardware.
In terms of software, one of the biggest changes in the past 30 years is that we don't write nearly as much low level code as we used to. For example, software now relies on automatic compiler optimizations as opposed to hand written assembly, and makes extensive use of existing frameworks and patterns which have matured in the past few decades. On the other hand, software has become increasingly complex, and there have been corresponding performance hits.
However, hardware capabilities have improved mostly in accordance with Moore's Law, and CPU speeds and memory bandwidth have increased hundreds of times over the past 30 years. Manufacturing processes have improved, allowing components to become smaller and faster because more transistors can be packed together. One of the biggest things that has sped up computers is memory access and the use of caching. CPU cache sizes are now bigger than total RAM used to be, and low level programs have shifted to make better use of this. Also, when 64 bit CPUs became commonplace, a corresponding instruction set (i.e. x86-64, the use of which might still qualify as "software") was required to take proper advantage of them. In that way, it is a combination of improvements in hardware, taken better advantage of by shifts in software design.
In short, the biggest incremental strides in performance come from hardware – however changes to software are often required to make optimal use of new hardware. Either one doesn’t really work without the other!