I have not written much SIMD code myself, but I wrote a lot of assembler code some decades ago. AFAIK, using SIMD intrinsics is essentially assembler programming, and your whole question could be rephrased just by replacing the word "SIMD" with "assembly". For example, the points you already mentioned, like
the code takes 10x to 100x longer to develop than "high-level code"
it is tied to a specific architecture
the code is never "clean" nor easy to refactor
you need experts for writing and maintaining it
debugging and maintaining is hard, evolving really hard
are in no way "special" to SIMD - these points are true for any kind of assembly language, and they are all "industry consensus". And the conclusion in the software industry is also pretty much the same as for assembler:
don't write it if you don't have to - use a high-level language wherever possible and let compilers do the hard work
if the compilers are not sufficient, at least encapsulate the "low level" parts in some libraries, but avoid spreading the code all over your program
since it is almost impossible to write "self-documenting" assembler or SIMD code, try to compensate with thorough documentation.
Of course, there is indeed a difference from the situation with "classic" assembly or machine code: today, modern compilers typically produce high-quality machine code from a high-level language, often better optimized than manually written assembler. For the SIMD architectures which are popular today, the quality of the available compilers is AFAIK far below that - and maybe it will never reach it, since automatic vectorization is still a topic of scientific research. See, for example, this article describing the differences in optimization between a compiler and a human, which gives a notion of how hard it might be to create good SIMD compilers.
As you described in your question already, there also exists a quality problem with current state-of-the-art libraries. So IMHO the best we can hope for is that in the next years the quality of the compilers and libraries will increase, maybe the SIMD hardware will have to change to become more "compiler friendly", and maybe specialized programming languages supporting easier vectorization (like Halide, which you mentioned twice) will become more popular (wasn't that already a strength of Fortran?). According to Wikipedia, SIMD became "a mass product" around 15 to 20 years ago (and Halide is less than 3 years old, if I interpret the docs correctly). Compare this to the time compilers for "classic" assembly language needed to become mature. According to this Wikipedia article, it took almost 30 years (from ~1970 to the end of the 1990s) until compilers exceeded the performance of human experts (in producing non-parallel machine code). So we may just have to wait another 10 to 15 years until the same happens with SIMD-enabled compilers.
The selling points of the Pimpl pattern are:
- total encapsulation: there are no (private) data members mentioned in the header file of the interface object.
- stability: unless you break the public interface (which in C++ includes private members), you'll never have to recompile code that depends on the interface object. This makes the Pimpl a great pattern for libraries that don't want their users to recompile all code on every internal change.
- polymorphism and dependency injection: the implementation or behaviour of the interface object can easily be swapped out at runtime, without requiring dependent code to be recompiled. Great if you need to mock something for a unit test.
To this effect, the classic Pimpl consists of three parts:
An interface for the implementation object, which must be public, and use virtual methods for the interface:
class IFrobnicateImpl
{
public:
    virtual ~IFrobnicateImpl() = default;  // required: the object is deleted through a base-class pointer
    virtual int frobnicate(int) const = 0;
};
This interface is required to be stable.
An interface object that proxies to the private implementation. It does not have to use virtual methods. The only allowed member is a pointer to the implementation:
class Frobnicate
{
    std::unique_ptr<IFrobnicateImpl> _impl;
public:
    explicit Frobnicate(std::unique_ptr<IFrobnicateImpl>&& impl = nullptr);
    int frobnicate(int x) const { return _impl->frobnicate(x); }
};
...
Frobnicate::Frobnicate(std::unique_ptr<IFrobnicateImpl>&& impl /* = nullptr */)
    : _impl(std::move(impl))
{
    if (!_impl)
        _impl = std::make_unique<DefaultImplementation>();
}
The header file of this class must be stable.
At least one implementation of the IFrobnicateImpl interface. This can live entirely in a source file and does not need to be stable.
The Pimpl then buys us a great deal of stability for a library class, at the cost of one heap allocation and an additional virtual dispatch per call.
How does your solution measure up?
- It does away with encapsulation. Since your members are protected, any subclass can mess with them.
- It does away with interface stability. Whenever you change your data members – and that change is just one refactoring away – you'll have to recompile all dependent code.
- It does away with the virtual dispatch layer, preventing easy swapping of the implementation.
So you fail to fulfil every single objective of the Pimpl pattern. It is therefore not reasonable to call your pattern a variation of the Pimpl; it is much more an ordinary class. Actually, it's worse than an ordinary class, because your member variables aren't private - and because of that cast, which is a glaring point of fragility.
Note that the Pimpl pattern is not always optimal – there's a tradeoff between stability and polymorphism on the one hand, and memory compactness on the other. It is semantically impossible for a language to have both (without JIT compilation). So if you're micro-optimizing for memory compactness, clearly the Pimpl is not a suitable solution for your use case. You'll also probably stop using half the standard library, since these awful string and vector classes involve dynamic memory allocations ;-)
Short answer: no.
Long answer: I assume that when you write "size" of an object, you mean the size (in bytes) of its member variables, and when you write "complexity", you mean the number of members, the number of different types used, and so on. Access to an object through a member function will result in IL code which contains, roughly:

- loading the object's reference (a pointer-sized address),
- calling the member function's code,
- inside the function, accessing each member at a fixed offset from that address.
(This is a simplified model, not taking things like inlining into account, but it is sufficient for answering your question.) So as you see, the object size itself has no influence on any of these operations. And neither do the number and structure of the other member functions.
The size only matters from a performance perspective when you copy objects around, when you store them to disk or to a database, or when you access the data of so many different objects that your CPU cache is no longer large enough to hold all the data accessed. The latter might happen when "number of objects x size of object" exceeds a certain limit, but it also depends on what else your program does and which other data is involved. So it is an accumulated effect, and I doubt it counts as "performance impact caused by access to one of its members through a reference".