 # Designing Strassen's algorithm

In 1969, Strassen shocked the world by showing that two n x n matrices could be multiplied in time asymptotically less than O(n^3). While the recursive construction in his algorithm is very clear, the key gain was made by showing that 2 x 2 matrix multiplication could be performed with only 7 multiplications instead of 8. The latter construction was arrived at by a process of elimination and appears to come out of thin air. Here, we give the simplest and most transparent proof of Strassen's algorithm that we are aware of, using only a simple unitary 2-design and a few easy lines of calculation. Moreover, using basic facts from the representation theory of finite groups, we use 2-designs coming from group orbits to generalize our construction to all n (although the resulting algorithms aren't optimal for n at least 3).


## 1 Introduction

The complexity of matrix multiplication is a central question in computational complexity. It bears on the complexity not only of most problems in linear algebra, but also of myriad combinatorial problems, e.g., various shortest path problems [Zwi02] and bipartite matching problems [San09]. The main question around matrix multiplication is whether two $n \times n$ matrices can be multiplied in time $O(n^{2+\varepsilon})$ for every $\varepsilon > 0$. The current best upper bound on this exponent is $\omega < 2.3728639$ [LG14], narrowly beating [DS13, Wil12]. The best known lower bound on the rank of $n \times n$ matrix multiplication is still only $3n^2 - o(n^2)$ [Lan14].

Since Strassen’s 1969 paper [Str69], which showed how to beat the standard $O(n^3)$ time algorithm, it has been understood that one way to get asymptotic improvements in algorithms for matrix multiplication is to find algebraic algorithms for multiplying small matrices using only a few multiplications, and then to apply these algorithms recursively.
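To make the payoff of recursion concrete, here is a small sketch of our own. It counts only scalar multiplications and ignores additions, and it compares the schoolbook count $8^k$ with the count $7^k$ you get by recursively applying a 7-multiplication scheme for $2 \times 2$ matrices to $2^k \times 2^k$ matrices. The helper names are hypothetical.

```python
import math

def naive_mults(k: int) -> int:
    """Scalar multiplications used by the schoolbook method on 2^k x 2^k matrices."""
    n = 2 ** k
    return n ** 3

def strassen_mults(k: int) -> int:
    """Scalar multiplications when a 7-product 2x2 scheme is applied recursively:
    M(2^k) = 7 * M(2^(k-1)) with M(1) = 1, hence M(2^k) = 7^k."""
    return 1 if k == 0 else 7 * strassen_mults(k - 1)

for k in range(1, 6):
    print(2 ** k, naive_mults(k), strassen_mults(k))

# 7^k = (2^k)^(log2 7), so the exponent of the recursive scheme is log2(7)
print(math.log2(7))
```

Each level also costs $O(n^2)$ additions, so the recursion satisfies $T(n) = 7T(n/2) + O(n^2)$. This gives running time $O(n^{\log_2 7})$.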

The recursive construction in Strassen’s algorithm is very clear: treat a $2^k \times 2^k$ matrix as a $2 \times 2$ matrix each of whose entries is a $2^{k-1} \times 2^{k-1}$ matrix. The base case, however, is what accounts for how Strassen was able to beat $O(n^3)$, and it seems to come out of thin air. Indeed, Strassen was trying to prove, by a process of (intelligently exhaustive) elimination, that such an algorithm could not exist (e.g., [Lan08, Remark 1.1.1] or [LR13]). His paper presents it as follows, which “one easily sees” [Str69, p. 355] correctly computes the matrix product $C = AB$:

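$$
\begin{aligned}
C_{11} &= \mathrm{I} + \mathrm{IV} - \mathrm{V} + \mathrm{VII}, & C_{12} &= \mathrm{III} + \mathrm{V},\\
C_{21} &= \mathrm{II} + \mathrm{IV}, & C_{22} &= \mathrm{I} + \mathrm{III} - \mathrm{II} + \mathrm{VI},
\end{aligned}
$$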

where

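$$
\begin{aligned}
\mathrm{I} &= (A_{11} + A_{22})(B_{11} + B_{22}), & \mathrm{V} &= (A_{11} + A_{12})B_{22},\\
\mathrm{II} &= (A_{21} + A_{22})B_{11}, & \mathrm{VI} &= (-A_{11} + A_{21})(B_{11} + B_{12}),\\
\mathrm{III} &= A_{11}(B_{12} - B_{22}), & \mathrm{VII} &= (A_{12} - A_{22})(B_{21} + B_{22}),\\
\mathrm{IV} &= A_{22}(-B_{11} + B_{21}). & &
\end{aligned}
$$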

While verifying the above by calculation is not difficult—after all, it’s only seven multiplications and four linear combinations—it is rather un-illuminating. In particular, the verification gives no sense of why such a decomposition exists.
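The formulas above can be transcribed directly into a recursive routine. The sketch below assumes NumPy and square matrices whose size is a power of 2. The function name `strassen` and the absence of a cutoff to the schoolbook method are our choices, not part of [Str69]. Checking the routine against `A @ B` confirms the identities mechanically. As noted, though, it gives no insight into why they hold.

```python
import numpy as np

def strassen(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Multiply square matrices of power-of-2 size using Strassen's seven
    products I..VII recursively (no cutoff to the naive method)."""
    n = A.shape[0]
    if n == 1:
        return A * B
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    # The seven block products, exactly as in Strassen's presentation.
    I = strassen(A11 + A22, B11 + B22)
    II = strassen(A21 + A22, B11)
    III = strassen(A11, B12 - B22)
    IV = strassen(A22, -B11 + B21)
    V = strassen(A11 + A12, B22)
    VI = strassen(-A11 + A21, B11 + B12)
    VII = strassen(A12 - A22, B21 + B22)
    # The four linear combinations giving the blocks of C = AB.
    C11 = I + IV - V + VII
    C12 = III + V
    C21 = II + IV
    C22 = I + III - II + VI
    return np.block([[C11, C12], [C21, C22]])

rng = np.random.default_rng(0)
A = rng.integers(-5, 6, size=(8, 8))
B = rng.integers(-5, 6, size=(8, 8))
assert np.array_equal(strassen(A, B), A @ B)  # exact for integer inputs
```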

In this paper, we give a proof of Strassen’s algorithm that is the most transparent we are aware of. The basic idea is to note that Strassen’s algorithm has a highly symmetric set of vectors lurking in it, which form what is known as a (unitary) 2-design. Using the representation theory of finite groups, we obtain generalizations to higher dimensions, which suggest further directions to explore in our hunt for efficient algorithms.

### 1.1 Other explanations of Strassen’s algorithm

Landsberg [Lan08, Section 3.8] points out that Strassen’s algorithm could have been anticipated because the border-rank of any tensor in $\mathbb{C}^4 \otimes \mathbb{C}^4 \otimes \mathbb{C}^4$ is at most seven. This may lead one to suspect the existence of an algorithm such as Strassen’s. However, it does not explain why the rank (rather than the border-rank) of $2 \times 2$ matrix multiplication is at most seven, nor does it explain Strassen’s particular construction.

Several authors have tried to make Strassen’s construction more transparent through various calculations, e.g., [Gas71, Yuv78, Cha86, Ale97, Pat09, GK00, Min15, CILO16]. These lend some insight, and some provide proofs that are perhaps easier to remember (and teach) than Strassen’s original presentation. Still, each of them involves either some ad hoc constructions or some un-illuminating calculations, which are often left to the reader. We feel that they do not really offer conceptual explanations for the fact that the rank of $\mathrm{MM}_2$ is at most 7.

Clausen [Cla88] (see [BCS97, pp. 11–12] for a more widely available explanation in English) showed how one can use group orbits to show that the rank of $\mathrm{MM}_2$ is at most 7. In fact, Clausen’s beautiful construction was one of the starting points of our investigation. However, that construction relies on a seemingly magical property of a certain multiplication table. More recently, Ikenmeyer and Lysikov [IL17] gave a beautiful explanation of Clausen’s construction. Ultimately, though, their proof for Strassen’s algorithm still relies on the same magical property of the same multiplication table, and it is not immediately obvious how to generalize it to all $n$. In contrast, our result easily generalizes to all $n$, and more generally to orbits of any irreducible representation of any finite group.

### 1.2 Related work

This paper is a simplified and self-contained version of Section 5 of [GM16], in which we explored highly symmetric algorithms for multiplying matrices. Recently, there have been several papers analyzing the geometry and symmetries of algebraic algorithms for small matrices [Bur14, Bur15, LR16, LM16, CILO16]. In [GM16], we tried to take this line of research one step further by using symmetries to discover new algorithms for multiplying matrices of small size. While those algorithms did not improve the state-of-the-art bounds on the matrix multiplication exponent, they suggested that we can use group symmetries and group orbits to find new algorithms for matrix multiplication. In addition to their potential value for future endeavors, we believe that these highly symmetric matrix multiplication algorithms are beautiful in their own right, and deserve to be shared simply for their beauty.

The method of construction suggested in [GM16], and independently in [CILO16], is more general than this. Even so, the constructions we ended up finding in [GM16] were in fact all instances of a single design-based construction yielding $n^3 - n + 1$ multiplications for $n \times n$ matrix multiplication. The proof that this construction works is the simplest and most transparent proof of Strassen’s algorithm that we are aware of.

One may also reasonably wonder whether there is any relationship between our group-based construction and the family of group-based constructions suggested by Cohn and Umans [CU03], including the constructions given in [CKSU05] and generalizations in [CU13]. There may be a common generalization that captures both methods, but at the moment we don’t know of any direct relationship between the two. Indeed, one cannot use the group-theoretic approach of [CU03] to explain Strassen’s result, even though the constructions of [CKSU05] get a better exponent. The only way to use their approach for the $2 \times 2$ case is to embed $\mathrm{MM}_2$ into the cyclic group $\mathbb{Z}_7$, but Cohn and Umans showed that one could not beat $\omega = 3$ using only abelian groups. (Some of their more complicated constructions can beat $\omega = 3$ in abelian groups, but those involve embedding multiple copies of MM into the same group simultaneously. Here, by contrast, we are explicitly talking about embedding a single copy of $\mathrm{MM}_2$.)

## 2 Complexity, symmetry, and designs

For general background on algebraic complexity, we refer the reader to the book [BCS97]. Bläser’s survey article [Blä13], in addition to excellent coverage around matrix multiplication, has a nice tutorial on tensors and their basic properties.

In the algebraic setting, matrix multiplication is a bilinear map, so it can be reformulated as a tensor. The algebraic complexity of matrix multiplication is then within a factor of 2 of the rank of this tensor. The matrix multiplication tensor for $n \times n$ matrices is

$$\mathrm{MM}_{abcdef} = \delta_{ae}\,\delta_{bf}\,\delta_{cd}, \tag{1}$$

where the indices range from $1$ to $n$, and $\delta$ is the Kronecker delta: $\delta_{ij} = 1$ if $i = j$ and $\delta_{ij} = 0$ if $i \neq j$. This is also defined by the inner product

$$\langle \mathrm{MM} \mid A \otimes B \otimes C \rangle = \operatorname{tr} ABC.$$
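As a concrete sketch, the tensor can be stored as a 6-index array of Kronecker deltas and paired entrywise with $A \otimes B \otimes C$. This assumes NumPy. The index labels are our own choice: $(a,b)$ indexes $A$, $(c,d)$ indexes $B$, and $(e,f)$ indexes $C$. This need not match the order of indices in (1).

```python
import numpy as np

def mm_tensor(n: int) -> np.ndarray:
    """MM_n as a 6-index array T[a, b, c, d, e, f], where (a, b) indexes the first
    matrix, (c, d) the second and (e, f) the third. T is 1 exactly when b == c,
    d == e and f == a, i.e. T = sum over a, b, d of E_ab (x) E_bd (x) E_da."""
    eye = np.eye(n)
    return np.einsum("bc,de,fa->abcdef", eye, eye, eye)

n = 3
rng = np.random.default_rng(1)
A, B, C = (rng.standard_normal((n, n)) for _ in range(3))
T = mm_tensor(n)
# Pairing T with A (x) B (x) C entry by entry recovers tr(ABC).
lhs = np.einsum("abcdef,ab,cd,ef->", T, A, B, C)
print(np.isclose(lhs, np.trace(A @ B @ C)))  # True
```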

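Given vector spaces $V_1, V_2, V_3$, a vector $v \in V_1 \otimes V_2 \otimes V_3$ has tensor rank one if it is a separable tensor, that is, of the form $v = u_1 \otimes u_2 \otimes u_3$ for some $u_i \in V_i$. The tensor rank of $v$ is the smallest number of rank-one tensors whose sum is $v$. In the case of $\mathrm{MM}_n$, we have $V_1 = V_2 = V_3 = \mathbb{C}^{n \times n}$ and $v = \mathrm{MM}_n$.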

The matrix multiplication tensor MM is characterized by its symmetries (e.g., [BI11]). That is, up to a constant, it is the unique operator fixed under the following action of $\mathrm{GL}_n \times \mathrm{GL}_n \times \mathrm{GL}_n$: given $X, Y, Z \in \mathrm{GL}_n$, we have

$$\mathrm{MM} = (X \otimes Y \otimes Z)\,\mathrm{MM}\,(Z^{-1} \otimes X^{-1} \otimes Y^{-1}), \tag{2}$$

where, if the notation isn’t already clear, it will become so in the next equation. To see that MM has this symmetry, note that

$$\operatorname{tr} ABC = \operatorname{tr}\,(Z^{-1}AX)(X^{-1}BY)(Y^{-1}CZ).$$

The fact that MM is the only such operator up to a constant comes from a simple representation-theoretic argument, which generalizes the fact that the only matrices which are invariant under conjugation are scalar multiples of the identity.
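The symmetry behind (2) is easy to confirm numerically. This minimal check compares both sides of the trace identity above. It assumes NumPy, and random Gaussian matrices stand in for $X, Y, Z$; such matrices are invertible with probability 1.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
A, B, C, X, Y, Z = (rng.standard_normal((n, n)) for _ in range(6))
inv = np.linalg.inv

# tr(ABC) is unchanged when each factor is conjugated as in the identity above:
# the inserted X X^{-1}, Y Y^{-1} and (cyclically) Z Z^{-1} pairs cancel.
lhs = np.trace(A @ B @ C)
rhs = np.trace((inv(Z) @ A @ X) @ (inv(X) @ B @ Y) @ (inv(Y) @ C @ Z))
assert np.isclose(lhs, rhs)
```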

This suggests that a good way to search for matrix multiplication algorithms is to start with sums of separable tensors where the sum has some symmetry built in from the beginning. As we will see, one useful kind of symmetry is the following. We say that a set $S$ of $n$-dimensional vectors is a unitary 2-design if it has the following two properties:

$$\sum_{v \in S} v = 0 \quad\text{and}\quad \frac{1}{|S|}\sum_{v \in S} |v\rangle\langle v| = \frac{1}{n}\,\mathbb{1}, \tag{3}$$

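where $\mathbb{1}$ denotes the $n \times n$ identity matrix. Here we use the Dirac notation $|u\rangle\langle v|$ for the outer product of $u$ and $v$, i.e., the matrix whose $(i,j)$ entry is $u_i \bar{v}_j$, where $\bar{v}_j$ denotes the complex conjugate of $v_j$.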
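Below is a direct test of the two conditions in (3). It is a NumPy sketch that assumes the vectors are stored as the rows of an array; the function name `is_unitary_2_design` and the tolerance are hypothetical choices. It is applied to the equilateral-triangle vectors of (9), and to a pair of vectors that sums to zero but fails the second condition.

```python
import numpy as np

def is_unitary_2_design(S: np.ndarray, tol: float = 1e-10) -> bool:
    """Check the two conditions in (3) for the rows of S (shape |S| x n):
    the vectors sum to zero, and the average of |v><v| equals (1/n) * identity."""
    s, n = S.shape
    sums_to_zero = np.allclose(S.sum(axis=0), 0, atol=tol)
    # |v><v| has (i, j) entry v_i * conj(v_j); average it over the rows of S.
    avg_outer = np.einsum("ki,kj->ij", S, S.conj()) / s
    return bool(sums_to_zero and np.allclose(avg_outer, np.eye(n) / n, atol=tol))

r = np.sqrt(3) / 2
triangle = np.array([[1, 0], [-0.5, r], [-0.5, -r]])
print(is_unitary_2_design(triangle))                             # True
print(is_unitary_2_design(np.array([[1.0, 0.0], [-1.0, 0.0]])))  # False
```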

The following theorem shows how 2-designs can be used to construct matrix multiplication algorithms.

###### Theorem 2.1.

Let $S \subset \mathbb{C}^n$ be a unitary 2-design, and let $s = |S|$. Then the tensor rank of $\mathrm{MM}_n$ is at most $s(s-1)(s-2) + 1$.

###### Proof.

Let $S = \{w_1, \ldots, w_s\}$. We will show that the following is a decomposition of $\mathrm{MM}_n$:

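$$\mathrm{MM}_n = \mathbb{1}^{\otimes 3} + \frac{n^3}{s^3} \sum_{\substack{i,j,k\\ \text{distinct}}} |w_i\rangle\langle w_j - w_i| \otimes |w_j\rangle\langle w_k - w_j| \otimes |w_k\rangle\langle w_i - w_k|. \tag{4}$$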

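There are $s(s-1)(s-2)$ distinct ordered triples $(i,j,k)$, and $\mathbb{1}^{\otimes 3} = \mathbb{1} \otimes \mathbb{1} \otimes \mathbb{1}$ is itself a single rank-one term. So this decomposition has $s(s-1)(s-2) + 1$ terms.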

To prove (4), we use the fact (1) that MM can be written as a kind of twisted tensor product of identity matrices or Kronecker deltas. By the second property in the definition (3) of a 2-design, we have

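$$\mathrm{MM}_n = \frac{n^3}{s^3} \sum_{i,j,k} |w_i\rangle\langle w_j| \otimes |w_j\rangle\langle w_k| \otimes |w_k\rangle\langle w_i|. \tag{5}$$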

At the same time, the un-twisted version of this identity is

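$$\mathbb{1}^{\otimes 3} = \frac{n^3}{s^3} \sum_{i,j,k} |w_i\rangle\langle w_i| \otimes |w_j\rangle\langle w_j| \otimes |w_k\rangle\langle w_k|. \tag{6}$$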

Now we expand (4). We can sum over all triples $(i,j,k)$, since if any two of the indices are equal, the summand is zero. Then

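$$
\begin{aligned}
\sum_{i,j,k} |w_i\rangle\langle w_j - w_i| \otimes |w_j\rangle\langle w_k - w_j| \otimes |w_k\rangle\langle w_i - w_k|
&= \sum_{i,j,k} |w_i\rangle\langle w_j| \otimes |w_j\rangle\langle w_k| \otimes |w_k\rangle\langle w_i| \\
&\quad - \sum_{i,j,k} \Big[ |w_i\rangle\langle w_i| \otimes |w_j\rangle\langle w_k| \otimes |w_k\rangle\langle w_i| + |w_i\rangle\langle w_j| \otimes |w_j\rangle\langle w_j| \otimes |w_k\rangle\langle w_i| + |w_i\rangle\langle w_j| \otimes |w_j\rangle\langle w_k| \otimes |w_k\rangle\langle w_k| \Big] \qquad (7)\\
&\quad + \sum_{i,j,k} \Big[ |w_i\rangle\langle w_j| \otimes |w_j\rangle\langle w_j| \otimes |w_k\rangle\langle w_k| + |w_i\rangle\langle w_i| \otimes |w_j\rangle\langle w_k| \otimes |w_k\rangle\langle w_k| + |w_i\rangle\langle w_i| \otimes |w_j\rangle\langle w_j| \otimes |w_k\rangle\langle w_i| \Big] \qquad (8)\\
&\quad - \sum_{i,j,k} |w_i\rangle\langle w_i| \otimes |w_j\rangle\langle w_j| \otimes |w_k\rangle\langle w_k|.
\end{aligned}
$$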

The mixed terms in lines (7) and (8) disappear because each product has an index that appears only once, and the first property in the definition (3) of a 2-design implies that summing over that index gives zero. Combining this with (5) and (6) leaves us with

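$$\frac{n^3}{s^3} \sum_{i,j,k} |w_i\rangle\langle w_j - w_i| \otimes |w_j\rangle\langle w_k - w_j| \otimes |w_k\rangle\langle w_i - w_k| = \mathrm{MM}_n - \mathbb{1}^{\otimes 3},$$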

which completes the proof. ∎
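The identity (4) can also be checked numerically. The following sketch assumes NumPy, and the helper names are ours. It stores each term as a 6-index array, with the three matrix factors in the order of (4), and compares the sum with $\sum_{a,b,c} |a\rangle\langle b| \otimes |b\rangle\langle c| \otimes |c\rangle\langle a|$. For any 2-design, this is what the right-hand side of (5) evaluates to.

```python
import itertools
import numpy as np

def outer(u, v):
    """|u><v| as a matrix (entries u_i * conj(v_j))."""
    return np.outer(u, v.conj())

def triple(P, Q, R):
    """P (x) Q (x) R as a 6-index array: [a, b, c, d, e, f] = P[a,b] Q[c,d] R[e,f]."""
    return np.einsum("ab,cd,ef->abcdef", P, Q, R)

def design_decomposition(S):
    """Right-hand side of (4) for the rows w_1..w_s of S, and its number of terms."""
    s, n = S.shape
    eye = np.eye(n)
    total = triple(eye, eye, eye)  # the single term 1 (x) 1 (x) 1
    count = 1
    for i, j, k in itertools.permutations(range(s), 3):  # distinct ordered triples
        wi, wj, wk = S[i], S[j], S[k]
        total = total + (n**3 / s**3) * triple(
            outer(wi, wj - wi), outer(wj, wk - wj), outer(wk, wi - wk))
        count += 1
    return total, count

def mm_tensor(n):
    """sum over a, b, c of E_ab (x) E_bc (x) E_ca, the value of (5) for any 2-design."""
    eye = np.eye(n)
    return np.einsum("bc,de,fa->abcdef", eye, eye, eye)

r = np.sqrt(3) / 2
triangle = np.array([[1, 0], [-0.5, r], [-0.5, -r]])
total, count = design_decomposition(triangle)
print(count)                             # 7
print(np.allclose(total, mm_tensor(2)))  # True
```

The same comparison should also succeed for a regular tetrahedron in $\mathbb{R}^3$, which is a simplex design as in Corollary 3.2, giving $4 \cdot 3 \cdot 2 + 1 = 25$ terms.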

Now, in $n = 2$ dimensions, the three corners of an equilateral triangle form a 2-design:

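$$S = \left\{ (1, 0),\ \left(-\tfrac{1}{2}, \tfrac{\sqrt{3}}{2}\right),\ \left(-\tfrac{1}{2}, -\tfrac{\sqrt{3}}{2}\right) \right\}. \tag{9}$$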

The outer products of these vectors with themselves are

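$$\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \quad \begin{pmatrix} 1/4 & -\sqrt{3}/4 \\ -\sqrt{3}/4 & 3/4 \end{pmatrix}, \quad \begin{pmatrix} 1/4 & \sqrt{3}/4 \\ \sqrt{3}/4 & 3/4 \end{pmatrix},$$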

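and the average of these is $\frac{1}{2}\mathbb{1}$. The three vectors also clearly sum to zero. This design (9) has size $s = 3$, in which case Theorem 2.1 shows that $\mathrm{MM}_2$ has tensor rank at most $3 \cdot 2 \cdot 1 + 1 = 7$.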
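To see the seven multiplications explicitly, the sketch below turns the 7-term decomposition from the triangle design into a bilinear algorithm. It assumes NumPy, and the names are hypothetical. Pair each term $U \otimes V \otimes X$ entrywise with $A \otimes B \otimes C$, using $\langle U, A\rangle = \sum_{a,b} U_{ab}A_{ab}$. Matching the coefficients of $C$ in $\operatorname{tr} ABC$ then gives $AB = \sum_t \langle U_t, A\rangle \langle V_t, B\rangle X_t^{\mathsf{T}}$. As usual in algebraic complexity, multiplications by fixed constants such as $8/27$ or $\sqrt{3}/2$ are not counted.

```python
import itertools
import numpy as np

r = np.sqrt(3) / 2
W = np.array([[1, 0], [-0.5, r], [-0.5, -r]])  # rows: the triangle design (9)
n, s = 2, 3

def terms():
    """The seven rank-one terms (U, V, X) of decomposition (4); the scalar
    n^3/s^3 = 8/27 is folded into U."""
    eye = np.eye(n)
    yield eye, eye, eye
    for i, j, k in itertools.permutations(range(s), 3):
        wi, wj, wk = W[i], W[j], W[k]
        yield ((n**3 / s**3) * np.outer(wi, wj - wi),
               np.outer(wj, wk - wj),
               np.outer(wk, wi - wk))

def multiply_7(A, B):
    """Compute AB with one A-by-B multiplication per term, using
    AB = sum_t <U_t, A> <V_t, B> X_t^T with the entrywise pairing <U, A> = sum(U * A)."""
    C = np.zeros((n, n))
    products = 0
    for U, V, X in terms():
        m = np.sum(U * A) * np.sum(V * B)  # the only multiplication involving both A and B
        products += 1
        C += m * X.T
    return C, products

rng = np.random.default_rng(3)
A, B = rng.standard_normal((2, 2)), rng.standard_normal((2, 2))
C, products = multiply_7(A, B)
print(products, np.allclose(C, A @ B))  # 7 True
```

Each product keeps the $A$-factor on the left and the $B$-factor on the right. So, like Strassen's scheme, this one can in principle be applied recursively to blocks, provided $\frac{1}{2}$, $\frac{1}{3}$ and $\sqrt{3}$ are available.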

The reader might object that we haven’t really re-derived Strassen’s algorithm, since our algorithm doesn’t seem to yield the same equations as Strassen’s. However, de Groote [dG78] has shown that all 7-term decompositions of $\mathrm{MM}_2$ are equivalent up to a change of basis, i.e., an instance of the action (2). Thus the algorithm based on the triangular design (9) is in fact isomorphic to Strassen’s algorithm. In any case, it gives a conceptual explanation for the fact that $\mathrm{MM}_2$ has tensor rank at most $7$. (For the reader wondering about algorithms for matrix multiplication over rings other than $\mathbb{C}$, see Section 4.)

## 3 Generalizations to larger n from group orbits

The triangular design (9) has a pleasing symmetry. In this section we show how to find similar designs in higher dimensions as the orbits of group actions. We assume basic familiarity with finite groups, for which we refer the reader to any standard textbook such as [Art91]. We need a few facts from representation theory, which we spell out for completeness in the hope of making the paper self-contained for a larger audience. Everything we do will be over the complex numbers $\mathbb{C}$, but it generalizes to other fields, with some modifications.

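A representation of a finite group $G$ is a vector space $V$ together with a group homomorphism $\rho: G \to \mathrm{GL}(V)$. Here $\mathrm{GL}(V)$ denotes the general linear group of $V$, namely, the group of all invertible linear transformations from $V$ to itself. By choosing a basis for $V$, we identify $\mathrm{GL}(V) \cong \mathrm{GL}_d(\mathbb{C})$, where $d = \dim V$. Each $\rho(g)$ then becomes a $d \times d$ matrix, and $\rho(gh) = \rho(g)\rho(h)$ for all $g, h \in G$.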

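When the homomorphism $\rho$ is understood from context, we refer to $V$ as a representation of $G$. In this case, for $g \in G$ and $v \in V$ we write $gv$ instead of $\rho(g)v$. The trivial representation is $V = \mathbb{C}$ with each $\rho(g)$ equal to the identity map on $\mathbb{C}$, so that $gv = v$ for all $g \in G$.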

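A representation of $G$ is called unitary if each $\rho(g)$ is a unitary matrix. Any representation of a finite group over $\mathbb{C}$ is equivalent, up to change of basis, to a unitary representation. In this basis we define the inner product $\langle u, v \rangle = \sum_i \bar{u}_i v_i$ and the norm $|v| = \sqrt{\langle v, v \rangle}$ as usual. Note that $\langle gu, gv \rangle = \langle u, v \rangle$ and $|gv| = |v|$.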
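For example, the permutation representation of $S_3$ on $\mathbb{C}^3$ can be written down and checked directly. This is the $n = 2$ case of the action used in the proof of Corollary 3.2. The minimal sketch below assumes NumPy; the helper names are ours, and permutations are composed right-to-left.

```python
import itertools
import numpy as np

def perm_matrix(p):
    """rho(p) for the permutation representation of S_m on C^m: rho(p) e_i = e_{p[i]}."""
    m = len(p)
    M = np.zeros((m, m))
    for i in range(m):
        M[p[i], i] = 1
    return M

def compose(p, q):
    """(p q)(i) = p(q(i)): apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))

G = list(itertools.permutations(range(3)))  # S_3, six elements
# Homomorphism: rho(pq) = rho(p) rho(q) for all p, q in G.
hom = all(np.array_equal(perm_matrix(compose(p, q)), perm_matrix(p) @ perm_matrix(q))
          for p in G for q in G)
# Unitary: each (real) rho(p) satisfies rho(p)^T rho(p) = identity.
unitary = all(np.array_equal(perm_matrix(p).T @ perm_matrix(p), np.eye(3)) for p in G)
print(hom, unitary)  # True True
```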

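Given two representations $\rho: G \to \mathrm{GL}(V)$ and $\rho': G \to \mathrm{GL}(V')$ of the same group $G$, their direct sum is the representation $\rho \oplus \rho': G \to \mathrm{GL}(V \oplus V')$ given by the matrices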

$$(\rho \oplus \rho')(g) = \begin{pmatrix} \rho(g) & 0 \\ 0 & \rho'(g) \end{pmatrix}.$$

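A representation $V$ is irreducible if the only subspaces $W \subseteq V$ that are sent to themselves by every $g$ (i.e., such that $gw \in W$ for all $g \in G$ and $w \in W$) are the trivial subspaces $W = 0$ and $W = V$. Over fields of characteristic zero such as $\mathbb{R}$ or $\mathbb{C}$, a representation of a finite group is a direct sum of two nonzero representations if and only if it is not irreducible.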

Given two representations $V, V'$ of $G$, a homomorphism of representations is a linear map $\varphi: V \to V'$ that commutes with the action of $G$, in the sense that $\varphi(gv) = g\varphi(v)$ for all $g \in G$ and $v \in V$.

###### Lemma (Schur’s Lemma).

If $V$ and $V'$ are two irreducible representations of a group $G$, then every nonzero homomorphism $\varphi: V \to V'$ is invertible. In particular, over an algebraically closed field, every homomorphism $\varphi: V \to V$ is a scalar multiple of the identity $\mathbb{1}_V$.

Schur’s Lemma implies that the orbit of any unit-length vector in an irreducible representation is a 2-design in the sense defined above. We include a proof of this classical fact for completeness.

###### Corollary 3.1.

If $V$ is a nontrivial irreducible representation of $G$, and $v \in V$ with $|v| = 1$, then the orbit $\{gv : g \in G\}$ is a 2-design.

###### Proof.

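First, the vector $u = \sum_{g \in G} gv$ spans a subspace $W$ of dimension at most 1 on which $G$ acts trivially, since $hu = \sum_{g \in G} hgv = u$ for all $h \in G$. Because $V$ is irreducible, either $W = 0$ or $W = V$. Because $V$ is nontrivial, we cannot have $W = V$. Thus $u = 0$.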

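Second, let $\varphi = \sum_{g \in G} |gv\rangle\langle gv|$. Then for any $h \in G$ we have

$$h\varphi h^{-1} = \sum_{g \in G} h|gv\rangle\langle gv|h^{\dagger} = \sum_{g \in G} |hgv\rangle\langle hgv| = \varphi,$$

since $h$ is unitary and $g \mapsto hg$ is a bijection of $G$. Thus $\varphi$ is a homomorphism of representations, and by Schur’s Lemma $\varphi$ is a multiple of $\mathbb{1}_V$. We obtain the scaling factor by taking traces: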

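$$\frac{1}{|G|}\operatorname{tr}\varphi = \frac{1}{|G|}\sum_{g \in G} |gv|^2 = |v|^2 = 1 = \frac{1}{\dim V}\operatorname{tr}\mathbb{1}_V,$$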

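so $\frac{1}{|G|}\varphi = \frac{1}{\dim V}\mathbb{1}_V$. Every point of the orbit equals $gv$ for the same number of elements $g$, namely the size of the stabilizer of $v$. Averages over $G$ therefore equal averages over the orbit, so both conditions in (3) hold. ∎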
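Here is a quick numerical illustration of Corollary 3.1. It is a sketch assuming NumPy, not part of the proof. Take a random unit vector in the 3-dimensional standard representation of $S_4$, i.e., the vectors in $\mathbb{R}^4$ whose coordinates sum to zero, and average over the whole group. In ambient coordinates, $\mathbb{1}_V$ is represented by the orthogonal projection $P = \mathbb{1} - \frac{1}{4}J$, where $J$ is the all-ones matrix (our notation).

```python
import itertools
import numpy as np

m = 4                                   # S_4 permuting the coordinates of R^4
rng = np.random.default_rng(4)
v = rng.standard_normal(m)
v -= v.mean()                           # move v into V: coordinates now sum to 0
v /= np.linalg.norm(v)                  # |v| = 1

# g v for every g in G (averaging over all of G, as in the proof; repeats allowed).
orbit = np.array([v[list(p)] for p in itertools.permutations(range(m))])

P = np.eye(m) - np.ones((m, m)) / m     # orthogonal projection onto V, i.e. 1_V
dim_V = m - 1
print(np.allclose(orbit.sum(axis=0), 0))                       # True
print(np.allclose(orbit.T @ orbit / len(orbit), P / dim_V))    # True
```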

We can now combine Corollary 3.1 with Theorem 2.1 to produce matrix multiplication algorithms in all dimensions, by considering families of finite groups and their irreducible representations. In particular, we have the following.

###### Corollary 3.2.

For every $n \geq 2$, the tensor rank of $\mathrm{MM}_n$ is at most $n^3 - n + 1$.

###### Proof.

Let $G = S_{n+1}$ be the symmetric group acting by permuting the coordinates of $\mathbb{C}^{n+1}$. As a representation of $S_{n+1}$, this splits into a direct sum of two pieces. One is the trivial representation, spanned by the all-ones vector. The other is the so-called “standard representation” $V$ of $S_{n+1}$, which is irreducible of dimension $n$ and consists of the vectors whose coordinates sum to zero. Let $w \in V$ be the unit vector

$$w = \frac{1}{\sqrt{n(n+1)}}\,(n, -1, \ldots, -1).$$

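The orbit of $w$ has size $s = n + 1$, and consists of unit vectors pointing to the corners of a simplex. By Corollary 3.1 it is a 2-design. Identify $V$ with $\mathbb{C}^n$ via an orthonormal basis and apply Theorem 2.1, which gives tensor rank at most $(n+1)n(n-1) + 1 = n^3 - n + 1$. ∎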
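The simplex design can be built explicitly. This sketch assumes NumPy, and `simplex_design` is a hypothetical helper. It writes the orbit of $w$ in an orthonormal basis of the sum-zero subspace, obtained from an SVD of the projection onto that subspace, so that Theorem 2.1 applies to vectors in $\mathbb{R}^n \subset \mathbb{C}^n$. It then compares the term count $s(s-1)(s-2)+1$ with $n^3 - n + 1$.

```python
import numpy as np

def simplex_design(n: int) -> np.ndarray:
    """The S_{n+1}-orbit of w = (n, -1, ..., -1)/sqrt(n(n+1)), written in an
    orthonormal basis of the sum-zero subspace; returns n+1 rows in R^n."""
    N = n + 1
    # Row i: n at position i and -1 elsewhere, scaled to unit length.
    orbit = (N * np.eye(N) - np.ones((N, N))) / np.sqrt(n * N)
    P = np.eye(N) - np.ones((N, N)) / N      # projection onto the sum-zero subspace
    U, _, _ = np.linalg.svd(P)
    basis = U[:, :n]                         # columns with singular value 1 span range(P)
    return orbit @ basis                     # coordinates of each orbit point

for n in range(2, 6):
    S = simplex_design(n)
    s = len(S)
    ok = np.allclose(S.sum(axis=0), 0) and np.allclose(S.T @ S / s, np.eye(n) / n)
    print(n, ok, s * (s - 1) * (s - 2) + 1, n**3 - n + 1)
```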

## 4 Future directions

#### Highly symmetric algorithms?

The average of $|v\rangle\langle v|$ over a design is $\frac{1}{n}\mathbb{1}$, which has full rank, so any design must span $\mathbb{C}^n$ and has size $s \geq n$. In fact, since its elements sum to zero, they are linearly dependent, so $s \geq n + 1$. Thus the simplex designs of Corollary 3.2 are optimal in this context. Applying Theorem 2.1 to larger $n$ therefore cannot directly improve the matrix multiplication exponent, since $\log_n(n^3 - n + 1) > \log_2 7$ for $n \geq 3$. This leaves open the question of whether there are other families of highly symmetric algorithms. See [GM16, CILO16] for work in this direction, as well as [Bur14, Bur15, LR16, LM16, IL17].
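To quantify the last point: recursing on an $n \times n$ base case that uses $r$ multiplications gives exponent $\log_n r$. The short computation below is our illustration; `exponent` is a hypothetical helper. It evaluates $\log_n(n^3 - n + 1)$. Only $n = 2$ matches Strassen's $\log_2 7 \approx 2.807$, and the value approaches $3$ from below as $n$ grows.

```python
import math

def exponent(n: int) -> float:
    """Exponent from recursing on an n x n base case with n^3 - n + 1 products."""
    return math.log(n**3 - n + 1, n)

for n in range(2, 8):
    print(n, round(exponent(n), 4))
```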

#### Using t-designs for t>2?

The key fact we used was that the orbit of a unit vector in a nontrivial irreducible representation of a finite group is a unitary 2-design. Similarly, a unitary $t$-design in a vector space $V$ is a set of vectors $S$ such that, for every polynomial $f$ on $V$ of degree at most $t$, the average of $f$ over $S$ is the same as the average of $f$ over the unit sphere in $V$. (Over the reals, these are traditionally called “spherical $t$-designs,” but we are working in complex vector spaces.) Another open question is then

###### Question 4.1.

Can $t$-designs for $t > 2$ help us construct efficient matrix multiplication algorithms?

For example, one might hope for a construction similar to Theorem 2.1, in which one could leverage the $t$-design property to get even more terms to cancel.

#### Working over arbitrary rings?

One fact that is obvious from Strassen’s original construction, but not from ours, is that the rank of $\mathrm{MM}_2$ is at most 7 over any ring. Strassen’s construction works in all rings since it only uses coefficients in $\{0, \pm 1\}$. Ours uses coefficients in $\mathbb{Z}[\frac{1}{2}, \frac{1}{3}, \sqrt{3}]$, so it works in any ring where the elements $\frac{1}{2}$, $\frac{1}{3}$ and $\sqrt{3}$ exist. Note that $\frac{1}{2}$ and $\frac{1}{3}$ exist in any ring of characteristic $m$ coprime to $6$, since any such ring contains $\mathbb{Z}/m\mathbb{Z}$ as a subring, in which $2$ and $3$ are units. If $\sqrt{3}$ doesn’t exist in a ring $R$, we can formally adjoin it by considering $R[x]/(x^2 - 3)$. By a standard trick (e.g., [BCS97, Section 15.3]), this implies the same exponent over $R$ itself, although it may not actually yield an algorithm for $\mathrm{MM}_2$ over $R$.

###### Question 4.2.

Is there a similarly transparent and conceptual proof of Strassen’s result that works over arbitrary rings?

## Acknowledgments

Parts of this project were inspired in 2015 by discussions with Jonah Blasiak, Thomas Church, Henry Cohn, and Chris Umans, via a collaboration funded by the AIM SQuaRE program, with an additional visit hosted by the Santa Fe Institute. J.A.G. was funded by an Omidyar Fellowship from the Santa Fe Institute during this work and by NSF grant DMS-1620484, and C.M. was funded partly by the John Templeton Foundation. C.M. also thanks École Normale Supérieure for providing a visiting position during which some of this work was carried out.

## References

• [Ale97] V. B. Alekseyev. Maximal extensions with simple multiplication for the algebra of matrices of the second order. Disc. Math. Appl., 7:89–101, 1997.
• [Art91] Michael Artin. Algebra. Prentice Hall, Inc., Englewood Cliffs, NJ, 1991.
• [BCS97] Peter Bürgisser, Michael Clausen, and M. Amin Shokrollahi. Algebraic complexity theory, volume 315 of Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Springer-Verlag, Berlin, 1997. With the collaboration of Thomas Lickteig.
• [BI11] Peter Bürgisser and Christian Ikenmeyer. Geometric complexity theory and tensor rank. In STOC ’11: 43rd Annual ACM Symposium on Theory of Computing, pages 509–518. ACM, New York, 2011. Preprint of the full version available as arXiv:1011.1350 [cs.CC].
• [Blä13] Markus Bläser. Fast Matrix Multiplication. Number 5 in Graduate Surveys. Theory of Computing Library, 2013.
• [Bur14] Vladimir P. Burichenko. On symmetries of the Strassen algorithm. arXiv:1408.6273 [cs.CC], 2014.
• [Bur15] Vladimir P. Burichenko. Symmetries of matrix multiplication algorithms. I. arXiv:1508.01110 [cs.CC], 2015.
• [Cha86] Philippe Chatelin. On transformations of algorithms to multiply matrices. Inform. Process. Lett., 22(1):1–5, 1986.
• [CILO16] Luca Chiantini, Christian Ikenmeyer, J.M. Landsberg, and Giorgio Ottaviani. The geometry of rank decompositions of matrix multiplication I: 2 × 2 matrices. arXiv:1610.08364 [cs.CC], 2016.
• [CKSU05] Henry Cohn, Robert Kleinberg, Balazs Szegedy, and Christopher Umans. Group-theoretic algorithms for matrix multiplication. In FOCS ’05: 46th Annual IEEE Symposium on Foundations of Computer Science, pages 379–388. IEEE Computer Society, 2005. Preprint available as arXiv:math.GR/0511460.
• [Cla88] M. Clausen. Beiträge zum Entwurf schneller Spektraltransformationen. Habilitation, Universität Karlsruhe, 1988.
• [CU03] Henry Cohn and Christopher Umans. A group-theoretic approach to fast matrix multiplication. In FOCS ’03: 44th Annual IEEE Symposium on Foundations of Computer Science, pages 438–449. IEEE Computer Society, 2003. Preprint available as arXiv:math.GR/0307321.
• [CU13] Henry Cohn and Christopher Umans. Fast matrix multiplication using coherent configurations. In SODA ’13: 24th ACM–SIAM Symposium on Discrete Algorithms, pages 1074–1087, 2013. Preprint available as arXiv:1207.6528 [math.NA].
• [dG78] Hans F. de Groote. On varieties of optimal algorithms for the computation of bilinear mappings. II. Optimal algorithms for 2 × 2-matrix multiplication. Theoret. Comput. Sci., 7(2):127–148, 1978.
• [DS13] A. M. Davie and A. J. Stothers. Improved bound for the complexity of matrix multiplication. Proc. Roy. Soc. Edinburgh Sect. A, 143:351–369, 2013.
• [Gas71] N. Gastinel. Sur le calcul des produits de matrices. Numer. Math., 17:222–229, 1971.
• [GK00] Ann Q. Gates and Vladik Kreinovich. Strassen’s algorithm made (somewhat) more natural: A pedagogical remark. Technical Report 502, University of Texas at El Paso, Department of Computer Science, 2000.
• [GM16] Joshua A. Grochow and Cristopher Moore. Matrix multiplication algorithms from group orbits. arXiv:1612.01527 [cs.CC], 2016.
• [IL17] Christian Ikenmeyer and Vladimir Lysikov. Strassen’s matrix multiplication algorithm: a conceptual perspective. arXiv:1708.08083v1 [cs.DS], 2017.
• [Lan08] J. M. Landsberg. Geometry and the complexity of matrix multiplication. Bull. Amer. Math. Soc. (N.S.), 45(2):247–284, 2008.
• [Lan14] J. M. Landsberg. New lower bounds for the rank of matrix multiplication. SIAM J. Comput., 43:144–149, 2014.
• [LG14] François Le Gall. Powers of tensors and fast matrix multiplication. In Proceedings of the 39th International Symposium on Symbolic and Algebraic Computation, ISSAC ’14, pages 296–303, New York, NY, USA, 2014. ACM. Full version available as arXiv:1401.7714 [cs.DS].
• [LM16] J. M. Landsberg and Mateusz Michałek. On the geometry of border rank algorithms for matrix multiplication and other tensors with symmetry. arXiv:1601.08229 [math.AG], 2016.
• [LR13] Richard J. Lipton and Kenneth W. Regan. Volker Strassen: Amazing results. In People, Problems, and Proofs: Essays from Gödel’s Lost Letter: 2010, pages 75–78. Springer Berlin Heidelberg, 2013. Based on blog post available at https://rjlipton.wordpress.com/2010/03/27/fast-matrix-products-and-other-amazing-results/.
• [LR16] J. M. Landsberg and Nicholas Ryder. On the geometry of border rank algorithms for n × 2 by 2 × 2 matrix multiplication. Exper. Math., 2016. In press. Preprint available as arXiv:1509.08323 [cs.NA].
• [Min15] Jacob Minz. Derivation of Strassen’s algorithm for the multiplication of matrices.
• [Pat09] Mike Paterson. Strassen symmetries. Presentation at Leslie Valiant’s 60th birthday celebration, May 2009.
• [San09] Piotr Sankowski. Maximum weight bipartite matching in matrix multiplication time. Theoret. Comput. Sci., 410(44):4480–4488, 2009.
• [Str69] Volker Strassen. Gaussian elimination is not optimal. Numer. Math., 13:354–356, 1969.
• [Wil12] Virginia Vassilevska Williams. Multiplying matrices faster than Coppersmith–Winograd. In STOC ’12: 44th Annual ACM Symposium on Theory of Computing, pages 887–898, New York, NY, USA, 2012. ACM.
• [Yuv78] Gideon Yuval. A simple proof of Strassen’s result. Inform. Process. Lett., 7(6):285–286, 1978.
• [Zwi02] Uri Zwick. All pairs shortest paths using bridging sets and rectangular matrix multiplication. J. ACM, 49(3):289–317 (electronic), 2002.