Designing Strassen's algorithm

Joshua A. Grochow and Cristopher Moore ∙ University of Colorado Boulder and the Santa Fe Institute ∙ 08/30/2017

In 1969, Strassen shocked the world by showing that two $n \times n$ matrices could be multiplied in time asymptotically less than $n^3$. While the recursive construction in his algorithm is very clear, the key gain was made by showing that $2 \times 2$ matrix multiplication could be performed with only 7 multiplications instead of 8. The latter construction was arrived at by a process of elimination and appears to come out of thin air. Here, we give the simplest and most transparent proof of Strassen's algorithm that we are aware of, using only a simple unitary 2-design and a few easy lines of calculation. Moreover, using basic facts from the representation theory of finite groups, we use 2-designs coming from group orbits to generalize our construction to all $n$ (although the resulting algorithms aren't optimal for $n \geq 3$).


1 Introduction

The complexity of matrix multiplication is a central question in computational complexity, bearing on the complexity not only of most problems in linear algebra, but also of myriad combinatorial problems, e.g., various shortest path problems [Zwi02] and bipartite matching problems [San09]. The main question around matrix multiplication is whether two $n \times n$ matrices can be multiplied in time $O(n^{2+\epsilon})$ for every $\epsilon > 0$. The current best upper bound on this exponent is $\omega < 2.3728639$ [LG14], narrowly beating the previous bound of $2.3728642$ [DS13, Wil12]. The best known lower bound on the rank is still only $3n^2 - o(n^2)$ [Lan14].

Since Strassen’s 1969 paper [Str69], which showed how to beat the standard $O(n^3)$-time algorithm, it has been understood that one way to get asymptotic improvements in algorithms for matrix multiplication is to find algebraic algorithms for multiplying small matrices using only a few multiplications, and then to apply these algorithms recursively.

While the recursive construction in Strassen’s algorithm is very clear—treat a $2n \times 2n$ matrix as a $2 \times 2$ matrix each of whose entries is an $n \times n$ matrix—the base case, which accounts for how Strassen was able to beat $O(n^3)$, seems to come out of thin air. Indeed, Strassen was trying to prove, by process of (intelligently exhaustive) elimination, that such an algorithm could not exist (e.g., [Lan08, Remark 1.1.1] or [LR13]). In his paper it is presented as follows, which “one easily sees” [Str69, p. 355] correctly computes the product $C = AB$ of two $2 \times 2$ matrices:

$$c_{11} = m_1 + m_4 - m_5 + m_7, \qquad c_{12} = m_3 + m_5, \qquad c_{21} = m_2 + m_4, \qquad c_{22} = m_1 - m_2 + m_3 + m_6,$$

where

$$\begin{aligned}
m_1 &= (a_{11} + a_{22})(b_{11} + b_{22}), \\
m_2 &= (a_{21} + a_{22})\,b_{11}, \\
m_3 &= a_{11}\,(b_{12} - b_{22}), \\
m_4 &= a_{22}\,(b_{21} - b_{11}), \\
m_5 &= (a_{11} + a_{12})\,b_{22}, \\
m_6 &= (a_{21} - a_{11})(b_{11} + b_{12}), \\
m_7 &= (a_{12} - a_{22})(b_{21} + b_{22}).
\end{aligned}$$
While verifying the above by calculation is not difficult—after all, it’s only seven multiplications and four linear combinations—it is rather un-illuminating. In particular, the verification gives no sense of why such a decomposition exists.
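For readers who want the mechanical check anyway, here is a minimal Python sketch (ours, not Strassen's) implementing the seven products and the recursive construction, assuming for simplicity that the matrix size is a power of 2:

```python
import numpy as np

def strassen(A, B):
    """Multiply square matrices by Strassen's recursion.
    Minimal sketch: assumes the size is a power of 2 and recurses
    all the way down to 1 x 1, so it is illustrative, not practical."""
    n = A.shape[0]
    if n == 1:
        return A * B
    k = n // 2
    a11, a12, a21, a22 = A[:k, :k], A[:k, k:], A[k:, :k], A[k:, k:]
    b11, b12, b21, b22 = B[:k, :k], B[:k, k:], B[k:, :k], B[k:, k:]
    m1 = strassen(a11 + a22, b11 + b22)
    m2 = strassen(a21 + a22, b11)
    m3 = strassen(a11, b12 - b22)
    m4 = strassen(a22, b21 - b11)
    m5 = strassen(a11 + a12, b22)
    m6 = strassen(a21 - a11, b11 + b12)
    m7 = strassen(a12 - a22, b21 + b22)
    return np.block([[m1 + m4 - m5 + m7, m3 + m5],
                     [m2 + m4, m1 - m2 + m3 + m6]])

rng = np.random.default_rng(1)
A, B = rng.standard_normal((8, 8)), rng.standard_normal((8, 8))
assert np.allclose(strassen(A, B), A @ B)
```

The recursion satisfies $T(n) = 7\,T(n/2) + O(n^2)$, which gives the familiar $O(n^{\log_2 7}) = O(n^{2.81\ldots})$ bound.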

In this paper, we give a proof of Strassen’s algorithm that is the most transparent we are aware of. The basic idea is to note that Strassen’s algorithm has a highly symmetric set of vectors lurking in it, which forms what is known as a (unitary) 2-design. Using the representation theory of finite groups, we obtain generalizations to higher dimensions, which suggest further directions to explore in our hunt for efficient algorithms.

1.1 Other explanations of Strassen’s algorithm

Landsberg [Lan08, Section 3.8] points out that Strassen’s algorithm could have been anticipated because the border rank of any tensor in $\mathbb{C}^4 \otimes \mathbb{C}^4 \otimes \mathbb{C}^4$ is at most seven. Although this may lead one to suspect the existence of an algorithm such as Strassen’s, it does not give an explanation for the fact that the rank (rather than border rank) of the $2 \times 2$ matrix multiplication tensor is at most seven, nor does it give an explanation of Strassen’s particular construction.

Several authors have tried to make Strassen’s construction more transparent through various calculations, e.g., [Gas71, Yuv78, Cha86, Ale97, Pat09, GK00, Min15, CILO16]. While these lend some insight, and some provide proofs that are perhaps easier to remember (and teach) than Strassen’s original presentation, each of them either involves some ad hoc constructions or some un-illuminating calculations, which are often left to the reader. We feel that they do not really offer conceptual explanations for the fact that the rank of $\mathrm{MM}_2$ is at most 7.

Clausen [Cla88] (see [BCS97, pp. 11–12] for a more widely available explanation in English) showed how one can use group orbits to show that the rank of $\mathrm{MM}_2$ is at most 7. In fact, Clausen’s beautiful construction was one of the starting points of our investigation. However, that construction relies on a seemingly magical property of a certain multiplication table. More recently, Ikenmeyer and Lysikov [IL17] gave a beautiful explanation of Clausen’s construction, but ultimately their proof for Strassen’s algorithm still relies on the same magical property of the same multiplication table, and it is not immediately obvious how to generalize it to all $n$. In contrast, our result easily generalizes to all $n$, and more generally to orbits of any irreducible representation of any finite group.

1.2 Related work

This paper is a simplified and self-contained version of Section 5 of [GM16], in which we explored highly symmetric algorithms for multiplying matrices. Recently, there have been several papers analyzing the geometry and symmetries of algebraic algorithms for small matrices [Bur14, Bur15, LR16, LM16, CILO16]. In [GM16], we tried to take this line of research one step further by using symmetries to discover new algorithms for multiplying matrices of small size. While those algorithms did not improve the state-of-the-art bounds on the matrix multiplication exponent, they suggested that we can use group symmetries and group orbits to find new algorithms for matrix multiplication. In addition to their potential value for future endeavors, we believe that these highly symmetric matrix multiplication algorithms are beautiful in their own right, and deserve to be shared simply for their beauty.

Although the method of construction suggested in [GM16], and independently in [CILO16], is more general than this, the constructions we ended up finding in [GM16] were in fact all instances of a single design-based construction yielding $n^3 - n + 1$ multiplications for $n \times n$ matrix multiplication. The proof that this construction works is the simplest and most transparent proof of Strassen’s algorithm that we are aware of.

One may also reasonably wonder whether there is any relationship between our group-based construction and the family of group-based constructions suggested by Cohn and Umans [CU03], including the constructions given in [CKSU05] and generalizations in [CU13]. While there may be a common generalization that captures both methods, at the moment we don’t know of any direct relationship between the two. Indeed, one cannot use the group-theoretic approach of [CU03] to explain Strassen’s result, even though the constructions of [CKSU05] achieve a better exponent: to match Strassen’s seven multiplications, their approach would have to embed $\mathrm{MM}_2$ into a group algebra of dimension 7, i.e., into the cyclic group $\mathbb{Z}_7$, but Cohn and Umans showed that one could not beat $O(n^3)$ using only abelian groups. (Some of their more complicated constructions can beat $O(n^3)$ in abelian groups, but those involve embedding multiple copies of matrix multiplication into the same group simultaneously, whereas here we are explicitly talking about embedding a single copy of $\mathrm{MM}_2$.)

2 Complexity, symmetry, and designs

For general background on algebraic complexity, we refer the reader to the book [BCS97]. Bläser’s survey article [Blä13], in addition to excellent coverage of matrix multiplication, has a nice tutorial on tensors and their basic properties.

In the algebraic setting, since matrix multiplication is a bilinear map, it is known that it can be reformulated as a tensor, and that the algebraic complexity of matrix multiplication is within a factor of 2 of the rank of this tensor. The matrix multiplication tensor for $n \times n$ matrices is

$$(\mathrm{MM}_n)_{(i,j),(k,l),(p,q)} = \delta_{jk}\,\delta_{lp}\,\delta_{qi}, \qquad (1)$$

where the indices range from $1$ to $n$, and where $\delta$ is the Kronecker delta, $\delta_{ij} = 1$ if $i = j$ and $\delta_{ij} = 0$ if $i \neq j$. This tensor is also defined by the inner product

$$\langle \mathrm{MM}_n,\, A \otimes B \otimes C \rangle = \mathrm{tr}(ABC).$$

Given vector spaces $U, V, W$, a vector $T \in U \otimes V \otimes W$ is said to have tensor rank one if it is a separable tensor, that is, of the form $u \otimes v \otimes w$ for some $u \in U$, $v \in V$, $w \in W$. The tensor rank of $T$ is the smallest number of rank-one tensors whose sum is $T$. In the case of $\mathrm{MM}_n$, we have $U = V = W = \mathbb{C}^{n \times n}$, the space of $n \times n$ matrices, so each factor of a rank-one term is itself an $n \times n$ matrix.
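As a concrete sanity check, one can build $\mathrm{MM}_n$ from (1) and verify the inner-product identity numerically; the following numpy sketch (the helper name mm_tensor is ours) does this for $n = 3$:

```python
import numpy as np

def mm_tensor(n):
    """MM_n as a 6-index array: the entry at ((i,j),(k,l),(p,q))
    equals delta_{jk} delta_{lp} delta_{qi}, as in (1)."""
    T = np.zeros((n,) * 6)
    for i in range(n):
        for j in range(n):
            for k in range(n):
                T[i, j, j, k, k, i] = 1.0
    return T

n = 3
T = mm_tensor(n)
A, B, C = (np.random.rand(n, n) for _ in range(3))
# <MM_n, A (x) B (x) C> should equal tr(ABC).
lhs = np.einsum('ijklpq,ij,kl,pq->', T, A, B, C)
assert np.isclose(lhs, np.trace(A @ B @ C))
```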

The matrix multiplication tensor $\mathrm{MM}_n$ is characterized by its symmetries (e.g., [BI11]). That is, up to a constant, it is the unique operator fixed under the following action of $\mathrm{GL}_n \times \mathrm{GL}_n \times \mathrm{GL}_n$: given $X, Y, Z \in \mathrm{GL}_n$, we have

$$A \otimes B \otimes C \;\mapsto\; (X A Y^{-1}) \otimes (Y B Z^{-1}) \otimes (Z C X^{-1}), \qquad (2)$$

where, if the notation isn’t already clear, it will become so in the next equation. To see that $\mathrm{MM}_n$ has this symmetry, note that

$$\langle \mathrm{MM}_n,\, (X A Y^{-1}) \otimes (Y B Z^{-1}) \otimes (Z C X^{-1}) \rangle = \mathrm{tr}\!\left(X A Y^{-1} Y B Z^{-1} Z C X^{-1}\right) = \mathrm{tr}(ABC).$$

The fact that $\mathrm{MM}_n$ is the only such operator up to a constant comes from a simple representation-theoretic argument, which generalizes the fact that the only matrices which are invariant under conjugation are scalar multiples of the identity.
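To check the symmetry (2) numerically, one can apply random invertible $X, Y, Z$ to the three slots of $\mathrm{MM}_n$ and confirm that the tensor is unchanged; a small sketch, reusing mm_tensor from the previous sketch (the index bookkeeping is ours):

```python
import numpy as np

n, rng = 2, np.random.default_rng(0)
T = mm_tensor(n)  # as defined in the previous sketch
X, Y, Z = (rng.standard_normal((n, n)) for _ in range(3))
Xi, Yi, Zi = map(np.linalg.inv, (X, Y, Z))
# Slot-wise action (2): A -> X A Y^{-1}, B -> Y B Z^{-1}, C -> Z C X^{-1},
# applied to the index pairs (i,j), (k,l), (p,q) respectively.
S = np.einsum('ijklpq,ai,jb,ck,ld,ep,qf->abcdef',
              T, X, Yi, Y, Zi, Z, Xi)
assert np.allclose(S, T)
```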

This suggests that a good way to search for matrix multiplication algorithms is to start with sums of separable tensors where the sum has some symmetry built in from the beginning. As we will see, one useful kind of symmetry is the following. We say that a set $D = \{v_1, \dots, v_m\}$ of $n$-dimensional vectors is a unitary 2-design if it has the following two properties:

$$\sum_{a=1}^{m} v_a = 0 \qquad \text{and} \qquad \frac{n}{m} \sum_{a=1}^{m} |v_a\rangle\langle v_a| = \mathbb{1}, \qquad (3)$$

where $\mathbb{1}$ denotes the $n \times n$ identity matrix. Here we use the Dirac notation $|u\rangle\langle v|$ for the outer product of $u$ and $v$, i.e., the matrix whose $(i,j)$ entry is $u_i v_j^*$, where $*$ denotes the complex conjugate.

The following theorem shows how 2-designs can be used to construct matrix multiplication algorithms.

Theorem 2.1.

Let $D = \{v_1, \dots, v_m\} \subset \mathbb{C}^n$ be a unitary 2-design, and let $m = |D|$. Then the tensor rank of $\mathrm{MM}_n$ is at most $m(m-1)(m-2) + 1$.

Proof.

Write $\mathbb{1}$ for the $n \times n$ identity matrix as above. We will show that the following is a decomposition of $\mathrm{MM}_n$:

$$\mathrm{MM}_n \;=\; \mathbb{1} \otimes \mathbb{1} \otimes \mathbb{1} \;+\; \left(\frac{n}{m}\right)^{3} \sum_{\substack{a,b,c \\ \text{distinct}}} \big(|v_a\rangle - |v_b\rangle\big)\langle v_b| \otimes \big(|v_b\rangle - |v_c\rangle\big)\langle v_c| \otimes \big(|v_c\rangle - |v_a\rangle\big)\langle v_a|. \qquad (4)$$

Since there are $m(m-1)(m-2)$ distinct ordered triples $(a,b,c)$, this decomposition has $m(m-1)(m-2) + 1$ terms.

To prove (4), we use the fact (1) that $\mathrm{MM}_n$ can be written as a kind of twisted tensor product of identity matrices or Kronecker deltas. By the second property in the definition (3) of a 2-design, we have

$$\mathrm{MM}_n = \left(\frac{n}{m}\right)^{3} \sum_{a,b,c=1}^{m} |v_a\rangle\langle v_b| \otimes |v_b\rangle\langle v_c| \otimes |v_c\rangle\langle v_a|. \qquad (5)$$

At the same time, the un-twisted version of this identity is

$$\mathbb{1} \otimes \mathbb{1} \otimes \mathbb{1} = \left(\frac{n}{m}\right)^{3} \sum_{a,b,c=1}^{m} |v_a\rangle\langle v_a| \otimes |v_b\rangle\langle v_b| \otimes |v_c\rangle\langle v_c|. \qquad (6)$$

Now we expand (4). We can sum over all triples $(a,b,c)$, since if any of these are equal the summand is zero. Then

$$\sum_{a,b,c} \big(|v_a\rangle\langle v_b| - |v_b\rangle\langle v_b|\big) \otimes \big(|v_b\rangle\langle v_c| - |v_c\rangle\langle v_c|\big) \otimes \big(|v_c\rangle\langle v_a| - |v_a\rangle\langle v_a|\big) \qquad (7)$$
$$=\; \sum_{a,b,c} |v_a\rangle\langle v_b| \otimes |v_b\rangle\langle v_c| \otimes |v_c\rangle\langle v_a| \;-\; \sum_{a,b,c} |v_b\rangle\langle v_b| \otimes |v_c\rangle\langle v_c| \otimes |v_a\rangle\langle v_a| \;+\; (\text{mixed terms}). \qquad (8)$$

The mixed terms in lines (7) and (8) disappear because each product has an index that appears only once, and the first property in the definition (3) of a 2-design implies that summing over that index gives zero. Combining this with (5) and (6) leaves us with

$$\mathbb{1} \otimes \mathbb{1} \otimes \mathbb{1} + \left(\frac{n}{m}\right)^{3}\left[\left(\frac{m}{n}\right)^{3} \mathrm{MM}_n - \left(\frac{m}{n}\right)^{3}\, \mathbb{1} \otimes \mathbb{1} \otimes \mathbb{1}\right] = \mathrm{MM}_n,$$

which completes the proof. ∎

Now, in $n = 2$ dimensions the three corners of an equilateral triangle form a 2-design:

$$v_1 = (1, 0), \qquad v_2 = \left(-\tfrac{1}{2},\, \tfrac{\sqrt{3}}{2}\right), \qquad v_3 = \left(-\tfrac{1}{2},\, -\tfrac{\sqrt{3}}{2}\right). \qquad (9)$$

These vectors sum to zero, and the outer products of these vectors with themselves are

$$|v_1\rangle\langle v_1| = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \qquad |v_2\rangle\langle v_2| = \begin{pmatrix} \tfrac{1}{4} & -\tfrac{\sqrt{3}}{4} \\ -\tfrac{\sqrt{3}}{4} & \tfrac{3}{4} \end{pmatrix}, \qquad |v_3\rangle\langle v_3| = \begin{pmatrix} \tfrac{1}{4} & \tfrac{\sqrt{3}}{4} \\ \tfrac{\sqrt{3}}{4} & \tfrac{3}{4} \end{pmatrix},$$

and the average of these is $\mathbb{1}/2$, so that $\frac{n}{m}\sum_a |v_a\rangle\langle v_a| = \mathbb{1}$. This design (9) has size $m = 3$, in which case Theorem 2.1 shows that $\mathrm{MM}_2$ has tensor rank at most $3 \cdot 2 \cdot 1 + 1 = 7$.
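The following numpy sketch (ours) verifies both design properties (3) for the triangle (9), and checks that the seven terms of (4) sum to $\mathrm{MM}_2$, reusing mm_tensor from the earlier sketch:

```python
import numpy as np

def decomposition_terms(vs, n):
    """The rank-one terms of (4) for a real 2-design vs in R^n: one
    term per ordered triple of distinct indices, plus 1 (x) 1 (x) 1."""
    m = len(vs)
    sep = lambda A, B, C: np.einsum('ij,kl,pq->ijklpq', A, B, C)
    terms = [sep(np.eye(n), np.eye(n), np.eye(n))]
    for a in range(m):
        for b in range(m):
            for c in range(m):
                if len({a, b, c}) == 3:
                    terms.append((n / m) ** 3 * sep(
                        np.outer(vs[a] - vs[b], vs[b]),
                        np.outer(vs[b] - vs[c], vs[c]),
                        np.outer(vs[c] - vs[a], vs[a])))
    return terms

n, m = 2, 3
vs = [np.array([1.0, 0.0]),
      np.array([-0.5, np.sqrt(3) / 2]),
      np.array([-0.5, -np.sqrt(3) / 2])]
assert np.allclose(sum(vs), 0)                # first property of (3)
assert np.allclose((n / m) * sum(np.outer(u, u) for u in vs),
                   np.eye(n))                 # second property of (3)

terms = decomposition_terms(vs, n)
assert len(terms) == 7
assert np.allclose(sum(terms), mm_tensor(n))  # mm_tensor from Section 2
```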

The reader might object that we haven’t really re-derived Strassen’s algorithm since our algorithm doesn’t seem to yield the same equations as Strassen’s. However, de Groote [dG78] has shown that all 7-term decompositions of $\mathrm{MM}_2$ are equivalent up to a change of basis, i.e., an instance of the action (2). Thus the algorithm based on the triangular design (9) is in fact isomorphic to Strassen’s algorithm, and in any case, it gives a conceptual explanation for the fact that $\mathrm{MM}_2$ has tensor rank at most $7$. (For the reader wondering about algorithms for matrix multiplication over rings other than $\mathbb{C}$, see Section 4.)

3 Generalizations to larger $n$ from group orbits

The triangular design (9) has a pleasing symmetry. In this section we show how to find similar designs in higher dimensions as the orbits of group actions. We assume basic familiarity with finite groups, for which we refer the reader to any standard textbook such as [Art91]. We need a few facts from representation theory, which we spell out for completeness in the hopes of making the paper self-contained for a larger audience. Everything we do will be over the complex numbers $\mathbb{C}$, but generalizes to other fields, with some modifications.

A representation of a finite group $G$ is a vector space $V$ together with a group homomorphism $\rho : G \to \mathrm{GL}(V)$, where $\mathrm{GL}(V)$ denotes the general linear group of $V$, namely, the group of all invertible linear transformations from $V$ to itself. By choosing a basis for $V$, we identify $V \cong \mathbb{C}^n$, and each $\rho(g)$ becomes an $n \times n$ matrix such that $\rho(gh) = \rho(g)\,\rho(h)$ for all $g, h \in G$.

When the homomorphism $\rho$ is understood from context, we refer to $V$ as a representation of $G$. In this case, for $g \in G$ and $v \in V$ we write $gv$ instead of $\rho(g)\,v$. The trivial representation is the identity map on $\mathbb{C}$, where $gv = v$ for all $g \in G$.

A representation of $G$ is called unitary if each $\rho(g)$ is a unitary matrix. Any representation of a finite group over $\mathbb{C}$ is equivalent, up to a change of basis, to a unitary representation. In this basis we define the inner product $\langle u, v \rangle = \sum_i u_i^* v_i$ and the norm $\|v\| = \sqrt{\langle v, v \rangle}$ as usual. Note that $\langle gu, gv \rangle = \langle u, v \rangle$ and $\|gv\| = \|v\|$.

Given two representations $V$ and $W$ of the same group $G$, their direct sum $V \oplus W$ is a representation given by the block-diagonal matrices

$$\rho_{V \oplus W}(g) = \begin{pmatrix} \rho_V(g) & 0 \\ 0 & \rho_W(g) \end{pmatrix}.$$

A representation $V$ is irreducible if the only subspaces $W \subseteq V$ that are sent to themselves by every $g \in G$, i.e., such that $gw \in W$ for all $g \in G$ and $w \in W$, are the trivial subspaces $W = \{0\}$ or $W = V$. In fields of characteristic zero such as $\mathbb{R}$ or $\mathbb{C}$, a representation is a direct sum if and only if it is not irreducible.

Given two representations $V, W$ of $G$, a homomorphism of representations is a linear map $\varphi : V \to W$ that commutes with the action of $G$, in the sense that $\varphi(gv) = g\,\varphi(v)$ for all $g \in G$ and $v \in V$.

Lemma (Schur’s Lemma).

If $V$ and $W$ are two irreducible representations of a group $G$, then every nonzero homomorphism $\varphi : V \to W$ is invertible. In particular, over an algebraically closed field, every homomorphism $\varphi : V \to V$ is a scalar multiple of the identity $\mathbb{1}$.

Schur’s Lemma implies that the orbit of any unit-length vector in an irreducible representation is a 2-design in the sense defined above. We include a proof of this classical fact for completeness.

Corollary 3.1.

If $V = \mathbb{C}^n$ is a nontrivial irreducible representation of $G$, and $v \in V$ with $\|v\| = 1$, then the orbit $\{gv : g \in G\}$, taken with multiplicity so that $m = |G|$, is a 2-design.

Proof.

First, the vector $u = \sum_{g \in G} gv$ always spans a 1-dimensional trivial sub-representation if it is nonzero, since $hu = u$ for all $h \in G$. If $V$ is irreducible, then either $u = 0$ or $u$ spans $V$, but if $V$ is nontrivial we cannot have the latter. Thus $\sum_{g \in G} gv = 0$.

Second, let $M = \sum_{g \in G} |gv\rangle\langle gv|$. Then for any $h \in G$ we have

$$h M h^{-1} = \sum_{g \in G} |hgv\rangle\langle hgv| = M,$$

since multiplying on the left by $h$ simply permutes the terms of the sum. Thus $M$ is a homomorphism of representations, and by Schur’s Lemma $M$ is a multiple of $\mathbb{1}$, say $M = c\,\mathbb{1}$. We obtain the scaling factor by taking traces:

$$\mathrm{tr}\, M = \sum_{g \in G} \langle gv, gv \rangle = |G| = c\,n,$$

so $c = |G|/n$. ∎
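Corollary 3.1 is easy to test numerically on a group other than the symmetric groups used below; for instance, a sketch (ours) using the 2-dimensional irreducible representation of the dihedral group $D_4$:

```python
import numpy as np

# D_4 acting on R^2: a 90-degree rotation r and a reflection s generate
# a group of 8 orthogonal matrices; this representation is irreducible.
r = np.array([[0.0, -1.0], [1.0, 0.0]])
s = np.array([[1.0, 0.0], [0.0, -1.0]])
group = [np.linalg.matrix_power(r, k) @ f
         for k in range(4) for f in (np.eye(2), s)]

theta = 0.3  # any angle; the orbit of a generic unit vector has 8 elements
v = np.array([np.cos(theta), np.sin(theta)])
orbit = [g @ v for g in group]

n, m = 2, len(orbit)
assert np.allclose(sum(orbit), 0)             # first property of (3)
assert np.allclose((n / m) * sum(np.outer(u, u) for u in orbit),
                   np.eye(n))                 # second property of (3)
```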

We can now combine Corollary 3.1 with Theorem 2.1 to produce matrix multiplication algorithms in all dimensions, by considering families of finite groups and their irreducible representations. In particular, we have the following.

Corollary 3.2.

For every $n$, the tensor rank of $\mathrm{MM}_n$ is at most $n^3 - n + 1$.

Proof.

Let $S_{n+1}$ be the symmetric group acting by permuting the coordinates on $\mathbb{C}^{n+1}$. As a representation of $S_{n+1}$, this splits into a direct sum of the trivial representation (spanned by the all-ones vector) and the so-called “standard representation” of $S_{n+1}$, of dimension $n$ (consisting of the vectors whose coordinates sum to zero). Let $v$ be the unit vector

$$v = \frac{1}{\sqrt{n^2 + n}}\,(n, -1, -1, \dots, -1),$$

which lies in the standard representation. The orbit of $v$ has size $n+1$, and consists of unit vectors pointing to the corners of a regular simplex. Now apply Theorem 2.1 with $m = n + 1$, which gives $(n+1)n(n-1) + 1 = n^3 - n + 1$. ∎
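A sketch (ours) that builds this simplex design in coordinates and re-runs the verification from Section 2 for $n = 3$, where the resulting bound is $25$, two more than Laderman's 23-multiplication algorithm for $3 \times 3$ matrices:

```python
import numpy as np

def simplex_design(n):
    """The orbit of v = (n, -1, ..., -1)/sqrt(n^2+n) under S_{n+1}:
    the n+1 corners of a regular simplex, written in an orthonormal
    basis of the sum-zero hyperplane (the standard representation)."""
    corners = [np.where(np.arange(n + 1) == i, float(n), -1.0)
               / np.sqrt(n * n + n) for i in range(n + 1)]
    # Rows 1..n of Vh form an orthonormal basis of the hyperplane.
    basis = np.linalg.svd(np.ones((1, n + 1)))[2][1:]
    return [basis @ u for u in corners]

n = 3
vs = simplex_design(n)
m = len(vs)  # m = n + 1
assert np.allclose(sum(vs), 0)
assert np.allclose((n / m) * sum(np.outer(u, u) for u in vs), np.eye(n))

# decomposition_terms and mm_tensor as in the earlier sketches.
terms = decomposition_terms(vs, n)
assert len(terms) == n ** 3 - n + 1   # 25 terms for n = 3
assert np.allclose(sum(terms), mm_tensor(n))
```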

4 Future directions

Highly symmetric algorithms?

Since any 2-design must span $\mathbb{C}^n$, it has size $m \geq n$. Indeed, since its elements sum to zero, they are linearly dependent, so $m \geq n + 1$. Thus the simplex designs of Corollary 3.2 are optimal in this context, and applying Theorem 2.1 to larger designs cannot directly improve the matrix multiplication exponent. This leaves open the question of whether there are other families of highly symmetric algorithms. See [GM16, CILO16] for work in this direction, as well as [Bur14, Bur15, LR16, LM16, IL17].

Using $t$-designs for $t > 2$?

The key fact we used was that any orbit of a unit vector in an irreducible representation of a finite group is a unitary 2-design. Similarly, a unitary $t$-design in a vector space $\mathbb{C}^n$ is a set of vectors $D$ such that, for every polynomial $p$ on $\mathbb{C}^n$ of degree at most $t$, the average of $p$ over $D$ is the same as the average of $p$ over the unit sphere in $\mathbb{C}^n$. (Over the reals, these are traditionally called “spherical $t$-designs,” but we are working in complex vector spaces.) Another open question is then

Question 4.1.

Can $t$-designs for $t > 2$ help us construct efficient matrix multiplication algorithms?

For example, one might hope for a similar construction to Theorem 2.1, in which one could leverage the $t$-design property to get even more terms to cancel.

Working over arbitrary rings?

One fact which is obvious from Strassen’s original construction, but not from ours, is that the rank of $\mathrm{MM}_2$ is at most 7 over any ring. Strassen’s construction works in all rings since it only uses coefficients in $\{0, \pm 1\}$; ours uses coefficients in $\mathbb{Z}[1/2, 1/3, \sqrt{3}]$, so it works in any ring where these elements exist. Note that $1/2$ and $1/3$ exist in any ring of prime characteristic $p$ coprime to $6$, since any such ring contains $\mathbb{Z}/p$ as a subring, in which $2$ and $3$ are units. If $\sqrt{3}$ doesn’t exist in $R$, we can formally adjoin it by considering $R[x]/(x^2 - 3)$; by a standard trick (e.g. [BCS97, Section 15.3]), this implies the same exponent over $R$ itself (although it may not actually yield an algorithm for $\mathrm{MM}_2$ over $R$).
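One can confirm in exact arithmetic that only these coefficients are needed: the following sympy sketch (ours) pairs the decomposition (4) for the triangle design against a symbolic $A \otimes B \otimes C$ and recovers $\mathrm{tr}(ABC)$ exactly over $\mathbb{Q}(\sqrt{3})$:

```python
import sympy as sp

r3, n, m = sp.sqrt(3), 2, 3
vs = [sp.Matrix([1, 0]),
      sp.Matrix([sp.Rational(-1, 2), r3 / 2]),
      sp.Matrix([sp.Rational(-1, 2), -r3 / 2])]

A = sp.Matrix(2, 2, lambda i, j: sp.Symbol(f'a{i}{j}'))
B = sp.Matrix(2, 2, lambda i, j: sp.Symbol(f'b{i}{j}'))
C = sp.Matrix(2, 2, lambda i, j: sp.Symbol(f'c{i}{j}'))
dot = lambda P, Q: sum(P[i, j] * Q[i, j] for i in range(2) for j in range(2))

# Pair each rank-one term of (4) with A (x) B (x) C; by (1), pairing
# with MM_2 itself gives tr(ABC), so the totals must agree.
total = dot(sp.eye(2), A) * dot(sp.eye(2), B) * dot(sp.eye(2), C)
for a in range(m):
    for b in range(m):
        for c in range(m):
            if len({a, b, c}) == 3:
                total += sp.Rational(n, m) ** 3 \
                    * dot((vs[a] - vs[b]) * vs[b].T, A) \
                    * dot((vs[b] - vs[c]) * vs[c].T, B) \
                    * dot((vs[c] - vs[a]) * vs[a].T, C)

assert sp.expand(total - (A * B * C).trace()) == 0
```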

Question 4.2.

Is there a similarly transparent and conceptual proof of Strassen’s result that works over arbitrary rings?

Acknowledgments

Parts of this project were inspired in 2015 by discussions with Jonah Blasiak, Thomas Church, Henry Cohn, and Chris Umans, via a collaboration funded by the AIM SQuaRE program, with an additional visit hosted by the Santa Fe Institute. J.A.G. was funded by an Omidyar Fellowship from the Santa Fe Institute during this work and by NSF grant DMS-1620484, and C.M. was funded partly by the John Templeton Foundation. C.M. also thanks École Normale Supérieure for providing a visiting position during which some of this work was carried out.

References