Algorithms – Performant Algorithm for 4×4 Matrix Multiplication of Affine Transformations

algorithmsmatrix

I am wondering what is a good, performant algorithm for matrix multiplication of 4×4 matrices. I am implementing some affine transformations and I am aware that there are several algorithms for efficient matrix multiplication, like Strassen. But are there some algorithms that are especially efficient for matrices that small? Most sources I had a glance at are looking at which are asymptotically the most efficient.

Best Answer

Wikipedia lists four algorithms for matrix multiplication of two nxn matrices.

The classic one that a programmer would write is O(n3) and is listed as the "Schoolbook matrix multiplication". Yep. O(n3) is a bit of a hit. Lets look at the next best one.

The Strassen algorithim is O(n2.807). This one would work - it has some restrictions to it (such as the size is a power of two) and it has a caveat in the description:

Compared to conventional matrix multiplication, the algorithm adds a considerable O(n2) workload in addition/subtractions; so below a certain size, it will be better to use conventional multiplication.

For those who are interested in this algorithm and its origins, looking at How did Strassen come up with his matrix multiplication method? can be a good read. It gives a hint at the complexity of that initial O(n2) workload that is added and why this would be more expensive than just doing the classic multiplication.

So it really is O(n2 + n2.807) with that bit about lower exponent n being ignored when writing out big O. It appears that if you are working on a nice 2048x2048 matrix, this could be useful. For a 4x4 matrix, its probably going to find it slower as that overhead eats up all the other time.

And then there's the Coppersmith–Winograd algorithm which is O(n2.373) with quite a few bits of improvements. It also comes with a caveat:

The Coppersmith–Winograd algorithm is frequently used as a building block in other algorithms to prove theoretical time bounds. However, unlike the Strassen algorithm, it is not used in practice because it only provides an advantage for matrices so large that they cannot be processed by modern hardware.

So, its better when you are working on super large matrixes, but again, not useful for a 4x4 matrix.

This is again reflected in wikipedia's page in Matrix multiplication: Sub-cubic algorithms which gets at the why of things running faster:

Algorithms exist that provide better running times than the straightforward ones. The first to be discovered was Strassen's algorithm, devised by Volker Strassen in 1969 and often referred to as "fast matrix multiplication". It is based on a way of multiplying two 2 × 2-matrices which requires only 7 multiplications (instead of the usual 8), at the expense of several additional addition and subtraction operations. Applying this recursively gives an algorithm with a multiplicative cost of O(nlog27) ≈ O(n2.807). Strassen's algorithm is more complex, and the numerical stability is reduced compared to the naïve algorithm, but it is faster in cases where n > 100 or so and appears in several libraries, such as BLAS.

And that gets to the core of why the algorithms are faster - you trade off some numerical stability and some additional setup. That additional setup for a 4x4 matrix is much more than the cost of doing the more multiplication.

And now, to answer your question:

But are there some algorithms that are especially efficient for matrices that small?

No, there are no algorithms that are optimized for 4x4 matrix multiplication because the O(n3) one works quite reasonably until you start finding that you are willing to take a big hit for overhead. For your specific situation, there may be some overhead that you could incur knowing specific things beforehand about your matrixes (like how much some data will be reused), but really the easiest thing to do is write good code for the O(n3) solution, let the compiler handle it, and profile it later on to see if you actually have the code being the slow spot in the matrix multiplication.

Related on Math.SE: Minimal number of multiplications required to invert a 4x4 matrix

Related Topic