Strassen’s Algorithm – How the Matrix Multiplication Method Was Developed

algorithmsmatrix

The famous Strassen's matrix multiplication algorithm is a real treat for us, as it reduces the time complexity from the traditional O(n3) to O(n2.8).

But of all the resources I have gone through, even Cormen and Steven Skienna's book, they clearly do not state of how Strassen thought about it.

What is the rationale of Strassen's matrix multiplication algorithm? Is this a lucky accident or is there something deeper in it?

Best Answer

Apart from Strassen, nobody is able to tell you how Strassen has got his idea. Howeber¹, I can tell you, how you could have found that formula yourself—provided that you are interested in algebraic geometry and representation theory. This also gives you the tools to show that Strassen's formula is as good as it can, or more precisely, that there is no formula computing the product of two 2×2 matrices that uses fewer than 7 multiplications.

Since you are interested by matrices I assume you know basic linear algebra and will be a bit blurry for the more advanced details.

First let be E the set of all linear maps from a plane to a plane. This is basically the set of all 2×2 matrices, but we forget about a particular coordinate system—because, if there were a better coordinate system than the “default one” we could have interest in using it for matrix multiplication. We also denote by E† the dual space of E and by X = P(E⊗E†⊗E†) the projective space associated to the tensor product E⊗E†⊗E†.

An element of X = P(E⊗E†⊗E†) of the special form [c⊗α⊗β] can be interpreted as an elementary operation on matrices, which, in some appopriate coordinate systems, reads a coefficient of a matrix A and a coefficient of a matrix B and writes the product of these coefficients in some matrix C. A general element of X is a combination of these elementary operations, so the product π of two matrices, understood as a map from P(E)×P(E) to P(E), is a point in X.

The usual matrix product formula and Strassen's formula can be expressed as combinations of these linear operations, so let me denote by W₁ the set of these elementary operations [c⊗α⊗β] and let me describe geometrically their combinations.

Let W₂ be the variety of secants of W₁ in X. It is obtained by taking the (closure of the) union of all lines going through two (generic) points of W₁. We can think of a it as of the set of all combinations of two elemetary operations.

Let W₃ be the variety of secant planes of W₁ in X. It is obtained by taking the (closure of the) union of all planes going through three (generic) points of W₁. We can think of a it as of the set of all combinations of three elemetary operations.

Similarly, we define secant varieties for greater indices. Note that these varieties grow larger and larger, that is W₁⊂W₂⊂W₃⊂⋯ Hence the classical matrix product formula shows that the product of matrices is a point of W₈. Actually

PROPOSITION(Strassen) — The product of matrices π lies in W₇.

As far as I know, Strassen did not put things that way, however this is a geometric point of view on this question. This point of view is very useful, because it also lets you prove that Strassen's formula is the best, that is, that π does not lie in W₆. Geometric methods developped here can also be used for a broader range of problems.

I hope, I caught your curiosity. You can go further by reading this article by Landsberg and Manivel:

http://arxiv.org/abs/math/0601097

¹ I will not fix this typo, because I caught a cold.