Barriers for fast matrix multiplication from irreversibility

The determination of the asymptotic algebraic complexity of matrix multiplication, succinctly represented by the matrix multiplication exponent ω, is a central problem in algebraic complexity theory. The best upper bounds on ω, leading to the state-of-the-art ω ≤ 2.37.., have been obtained via the laser method of Strassen and its generalization by Coppersmith and Winograd. Recent barrier results show limitations for these and related approaches to improve the upper bound on ω. We introduce a new and more general barrier, providing stronger limitations than in previous work. Concretely, we introduce the notion of "irreversibility" of a tensor and we prove (in some precise sense) that any approach that uses an irreversible tensor in an intermediate step (e.g., as a starting tensor in the laser method) cannot give ω = 2. In quantitative terms, we prove that the best upper bound achievable is lower bounded by two times the irreversibility of the intermediate tensor. The quantum functionals and Strassen support functionals give (so far, the best) lower bounds on irreversibility. We provide lower bounds on the irreversibility of key intermediate tensors, including the small and big Coppersmith–Winograd tensors, that improve limitations shown in previous work. Finally, we discuss barriers on the group-theoretic approach in terms of "monomial" irreversibility.

1. Introduction

Determining the asymptotic algebraic complexity of matrix multiplication is a central open problem in algebraic complexity theory. Recent “barrier results” show various limitations for certain approaches to yield fast matrix multiplication algorithms [AFLG15, BCC17a, BCC17b, AW18a, AW18b]. Constructions of fast matrix multiplication algorithms typically (and this is in particular true for the most successful approaches) consist of two components: an efficient reduction of matrix multiplication to an intermediate problem and an efficient algorithm for the intermediate problem. We introduce a general barrier for such constructions based on a new notion called irreversibility, providing stronger limitations than in previous work. The barrier we introduce in this paper is built on the framework of Strassen developed in [Str87, Str88, Str91, CVZ18].

The asymptotic algebraic complexity of matrix multiplication is succinctly represented by the matrix multiplication exponent $\omega$, which is the infimum over all real numbers $\beta$ such that $n \times n$ matrices can be multiplied with $O(n^{\beta})$ algebraic operations. The state-of-the-art is $2 \leq \omega \leq 2.37..$ and there has been tremendous effort to obtain better upper bounds on $\omega$ [CW90, Sto10, Wil12, LG14, CU03, CU13].

An intuitive explanation of our barrier is as follows. In the language of tensors, the matrix multiplication exponent $\omega$ is the optimal “rate of transformation” from the “unit tensor” to the “matrix multiplication tensor”,

$$\text{unit tensor} \xrightarrow{\ \omega\ } \text{matrix multiplication tensor}. \tag{1}$$

The rate of transformation naturally satisfies a triangle inequality and thus upper bounds on $\omega$ can be obtained by combining the rate of transformation from the unit tensor to some intermediate tensor and the rate of transformation from the intermediate tensor to the matrix multiplication tensor; this is the two-component approach alluded to earlier,

$$\text{unit tensor} \longrightarrow \text{intermediate tensor} \longrightarrow \text{matrix multiplication tensor}. \tag{2}$$

We define the irreversibility of the intermediate tensor as the necessary “loss” that will occur when transforming the unit tensor to the intermediate tensor followed by transforming the intermediate tensor back to the unit tensor. It is well-known that the transformation rate from the matrix multiplication tensor to the unit tensor is $1/2$, so we can extend (2) to

$$\text{unit tensor} \longrightarrow \text{intermediate tensor} \longrightarrow \text{matrix multiplication tensor} \xrightarrow{\ 1/2\ } \text{unit tensor}. \tag{3}$$

We thus see that $\omega$ is directly related to the irreversibility of the intermediate tensor, and hence the irreversibility of the intermediate tensor provides limitations on the upper bounds on $\omega$ that can be obtained from (2). In particular, any fixed irreversible intermediate tensor cannot show $\omega = 2$ via (2), since the matrix multiplication tensor is reversible when $\omega = 2$.

To exemplify our barrier we show that the support functionals [Str91] and quantum functionals [CVZ18] give (so far, the best) lower bounds on the irreversibility of the following families of tensors:

  • the small Coppersmith–Winograd tensors $\mathrm{cw}_q$

  • the big Coppersmith–Winograd tensors $\mathrm{CW}_q$

  • the reduced polynomial multiplication tensors $t_n$

which for small parameters leads to the following explicit barriers:

$q$                        2       3       4       5       6       7
$\mathrm{cw}_q$-barrier    2       2.02..  2.06..  2.09..  2.12..  2.15..

$q$                        1       2       3       4       5       6
$\mathrm{CW}_q$-barrier    2.16..  2.17..  2.19..  2.20..  2.21..  2.23..

$n$                        1       2       3       4       5       6
$t_n$-barrier              2.17..  2.16..  2.15..  2.15..  2.14..  2.14..

Indeed, as suggested by the values in the above tables, the $\mathrm{cw}_q$-barrier and $\mathrm{CW}_q$-barrier increase with $q$ (converging to 3), whereas the $t_n$-barrier decreases with $n$ (converging to 2).

Compared to Ambainis, Filmus and Le Gall [AFLG15] our barriers are valid for a larger class of approaches (and naturally we obtain lower barriers). Compared to Alman and Williams [AW18b] our barriers are valid for a larger class of approaches but our barriers are also higher. As a variation on our barrier we introduce a “monomial” version. Compared to Blasiak, Church, Cohn, Grochow, Naslund, Sawin and Umans [BCC17a], and Blasiak, Church, Cohn, Grochow and Umans [BCC17b] our monomial barriers are valid for a larger class of approaches. We have not tried to optimise the barriers that we obtain here, but focus instead on introducing the barrier itself.

It will become clear to the reader during the development of our ideas that they not only apply to the problem of fast matrix multiplication, but extend to give barriers for the more general problem of constructing fast rectangular matrix multiplication algorithms or even transformations between arbitrary powers of tensors. Such transformations may represent, for example, asymptotic slocc (stochastic local operations and classical communication) reductions among multipartite quantum states [BPR00, DVC00, VDDMV02, HHHH09].

We define irreversibility in Section 2. In Section 3 we introduce the irreversibility barrier. Finally, in Section 4 we present explicit irreversibility barriers.

2. Irreversibility

We begin by introducing some standard notation and terminology. Then we discuss a useful notion called the relative exponent and we define the irreversibility of a tensor. After that we introduce the monomial versions of these ideas and discuss so-called balanced tensors.

2.1. Standard definitions

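We assume familiarity with tensors and with the tensor Kronecker product and direct sum. All our tensors will be 3-tensors over some fixed but arbitrary field $\mathbb{F}$. For two tensors $S \in \mathbb{F}^{n_1} \otimes \mathbb{F}^{n_2} \otimes \mathbb{F}^{n_3}$ and $T \in \mathbb{F}^{m_1} \otimes \mathbb{F}^{m_2} \otimes \mathbb{F}^{m_3}$ we write $S \geq T$ and say $S$ restricts to $T$ if there are linear maps $A_i : \mathbb{F}^{n_i} \to \mathbb{F}^{m_i}$ such that $(A_1 \otimes A_2 \otimes A_3)\,S = T$. For $n \in \mathbb{N}$ we define the diagonal tensor (also called the rank-$n$ unit tensor) $\langle n \rangle = \sum_{i=1}^{n} e_{i,i,i}$, where $e_{i,j,k} = e_i \otimes e_j \otimes e_k$ denotes a standard basis tensor. The tensor rank of $T$ is defined as $R(T) = \min\{n \in \mathbb{N} : T \leq \langle n \rangle\}$ (this coincides with the definition that $R(T)$ is the smallest size of any decomposition of $T$ into a sum of simple tensors) and the subrank of $T$ is defined as $Q(T) = \max\{n \in \mathbb{N} : \langle n \rangle \leq T\}$. The asymptotic rank of $T$ is defined as

$$\widetilde{R}(T) = \lim_{n\to\infty} R(T^{\otimes n})^{1/n} = \inf_{n} R(T^{\otimes n})^{1/n} \tag{4}$$

and the asymptotic subrank of $T$ is defined as

$$\widetilde{Q}(T) = \lim_{n\to\infty} Q(T^{\otimes n})^{1/n} = \sup_{n} Q(T^{\otimes n})^{1/n}. \tag{5}$$

The above limits exist and equal the respective infimum and supremum by Fekete’s lemma. For $n_1, n_2, n_3 \in \mathbb{N}$ the matrix multiplication tensor is defined as

$$\langle n_1, n_2, n_3 \rangle = \sum_{i \in [n_1]} \sum_{j \in [n_2]} \sum_{k \in [n_3]} e_{(i,j),(j,k),(k,i)}. \tag{6}$$

The matrix multiplication exponent is defined as $\omega = \log_2 \widetilde{R}(\langle 2,2,2 \rangle)$. The meaning of $\omega$ in terms of algorithms is: for any $\varepsilon > 0$ there is an algorithm that for any $n \in \mathbb{N}$ multiplies two $n \times n$ matrices using $O(n^{\omega + \varepsilon})$ scalar additions and multiplications. The difficulty of determining the asymptotic rank of $\langle 2,2,2 \rangle$ is to be contrasted with the situation for the asymptotic subrank; to put it in Strassen’s words: Unlike the cynic, who according to Oscar Wilde knows the price of everything and the value of nothing, we can determine the asymptotic value of $\langle h,h,h \rangle$ precisely [Str88],

$$\widetilde{Q}(\langle 2,2,2 \rangle) = 4. \tag{7}$$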

2.2. Relative exponent

For a clean exposition of our barrier we will use the notion of relative exponent, which we will define in this section. This notion is inspired by the notion of rate from information theory and alternatively can be seen as a versatile version of the notion of the asymptotic preorder for tensors of Strassen. In the context of tensors, the relative exponent previously appeared in [VC15].

Assumption 1.

To avoid irrelevant technicalities, we will from now on, without further mentioning, only consider tensors that are not of the form $u \otimes v \otimes w$.

Definition 2.

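For two tensors $S$ and $T$ we define the relative exponent from $S$ to $T$ as

$$\omega(S,T) = \lim_{n\to\infty} \frac{1}{n} \min\{m \in \mathbb{N} : S^{\otimes m} \geq T^{\otimes n}\} \tag{8}$$
$$\phantom{\omega(S,T)} = \inf_{n} \frac{1}{n} \min\{m \in \mathbb{N} : S^{\otimes m} \geq T^{\otimes n}\}. \tag{9}$$

The limit is an infimum by Fekete’s lemma, since the sequence $n \mapsto \min\{m \in \mathbb{N} : S^{\otimes m} \geq T^{\otimes n}\}$ is subadditive. Let us briefly relate the relative exponent to the basic notions and results stated earlier. The reader verifies directly that the identities

$$\omega(\langle 2 \rangle, T) = \log_2 \widetilde{R}(T) \tag{10}$$
$$\omega(T, \langle 2 \rangle) = \frac{1}{\log_2 \widetilde{Q}(T)} \tag{11}$$

hold. By definition of the matrix multiplication exponent,

$$\omega(\langle 2 \rangle, \langle 2,2,2 \rangle) = \omega. \tag{12}$$

We know from (7) that

$$\omega(\langle 2,2,2 \rangle, \langle 2 \rangle) = \tfrac{1}{2}. \tag{13}$$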

The relative exponent has the following two basic properties that the reader verifies directly.

Proposition 3.

Let $S$, $T$ and $U$ be tensors.

  (i) $\omega(T,T) = 1$.

  (ii) $\omega(S,T)\,\omega(T,U) \geq \omega(S,U)$ (triangle inequality).
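The limit in the definition of the relative exponent can be made concrete for diagonal tensors. The following is a minimal sketch, assuming the standard fact that a power of the rank-a unit tensor restricts to a power of the rank-b unit tensor exactly when a^m ≥ b^n, so that the relative exponent between them is log b / log a. The function names are hypothetical and chosen only for this illustration.

```python
import math

def min_power(a: int, b: int, n: int) -> int:
    """Smallest m >= 0 with a**m >= b**n, using exact integer arithmetic."""
    target = b ** n
    m, power = 0, 1
    while power < target:
        power *= a
        m += 1
    return m

def relative_exponent_estimate(a: int, b: int, n: int) -> float:
    """Finite-n approximation (1/n) * min{m : a**m >= b**n} of the relative exponent."""
    return min_power(a, b, n) / n

if __name__ == "__main__":
    # The normalised minima approach log2(3) as n grows.
    for n in (1, 10, 100, 1000):
        print(n, relative_exponent_estimate(2, 3, n))
    print("limit:", math.log2(3))
```

For unit tensors the triangle inequality holds with equality in the limit, since log_a(b) · log_b(c) = log_a(c); for general tensors only the inequality is available.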

2.3. Irreversibility.

Our barrier framework relies crucially on the irreversibility of a tensor, a new notion that we define now.

Definition 4.

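We define the irreversibility of a tensor $T$ as the product of the relative exponent from $\langle 2 \rangle$ to $T$ and the relative exponent from $T$ to $\langle 2 \rangle$, i.e.

$$\mathrm{i}(T) = \omega(\langle 2 \rangle, T)\,\omega(T, \langle 2 \rangle). \tag{14}$$

Thus $\mathrm{i}(T)$ measures the extent to which the asymptotic conversion from $\langle 2 \rangle$ to $T$ is irreversible, explaining the name. Equivalently, the irreversibility is the ratio of the logarithms of the asymptotic rank and the asymptotic subrank, i.e.

$$\mathrm{i}(T) = \frac{\log_2 \widetilde{R}(T)}{\log_2 \widetilde{Q}(T)}. \tag{15}$$

From the basic properties of the relative exponent (Proposition 3) follows directly the inequality $\mathrm{i}(T) = \omega(\langle 2 \rangle, T)\,\omega(T, \langle 2 \rangle) \geq \omega(\langle 2 \rangle, \langle 2 \rangle) = 1$.

Proposition 5.

For any tensor $T$ we have

$$\mathrm{i}(T) \geq 1. \tag{16}$$

Definition 6.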

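Let $T$ be a tensor.

  • If $\mathrm{i}(T) = 1$, then we say that $T$ is reversible.

  • If $\mathrm{i}(T) > 1$, then we say that $T$ is irreversible.

For example, for any $n \geq 2$ the diagonal tensor $\langle n \rangle$ is reversible. In fact, we do not know of any other reversible tensors.

For the matrix multiplication tensor $\langle 2,2,2 \rangle$ we have $\mathrm{i}(\langle 2,2,2 \rangle) = \omega/2$ (using (13)). Thus if $\omega = 2$, then $\langle 2,2,2 \rangle$ is reversible (and also any other $\langle n,n,n \rangle$). As we will see in Section 3, this is ultimately the source of our barrier.

Irreversible tensors exist. For example, $W = e_{0,0,1} + e_{0,1,0} + e_{1,0,0}$ is irreversible. Namely, it is well-known that $\widetilde{R}(W) = 2$ and that $\widetilde{Q}(W) = 2^{h(1/3)} = 1.88..$ [Str91, Theorem 6.7], where $h$ denotes the binary entropy function, so $\mathrm{i}(W) = 1/h(1/3) = 1.08.. > 1$. In Section 4 we will compute lower bounds on the irreversibility of the small and big Coppersmith–Winograd tensors (that play a crucial role in the best upper bounds on $\omega$).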
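The irreversibility of a small example can be evaluated numerically from the ratio of logarithms in the definition. The sketch below assumes, for the tensor W = e_{0,0,1} + e_{0,1,0} + e_{1,0,0}, the values asymptotic rank 2 and asymptotic subrank 2^{h(1/3)} attributed to [Str91, Theorem 6.7], with h the binary entropy; the function names are hypothetical.

```python
import math

def binary_entropy(p: float) -> float:
    """Binary entropy h(p) in bits, with h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def irreversibility_of_w() -> float:
    """log2(asymptotic rank) / log2(asymptotic subrank) for the tensor W."""
    log_asymptotic_rank = math.log2(2.0)            # assumed asymptotic rank 2
    log_asymptotic_subrank = binary_entropy(1 / 3)  # assumed asymptotic subrank 2**h(1/3)
    return log_asymptotic_rank / log_asymptotic_subrank

if __name__ == "__main__":
    print("asymptotic subrank of W:", 2 ** binary_entropy(1 / 3))
    print("irreversibility of W:  ", irreversibility_of_w())
```

Twice this value is about 2.17, which is consistent with the first entry of the reduced polynomial multiplication table in Section 1.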

2.4. Monomial relative exponent and monomial irreversibility

The following restrained version of relative exponent and irreversibility will be relevant. For two tensors $S$ and $T$ we write $S \geq_M T$ and say $S$ monomially restricts to $T$ if there are linear maps $A_1, A_2, A_3$, the corresponding matrices of which are generalised sub-permutation matrices in the standard basis, such that $(A_1 \otimes A_2 \otimes A_3)\,S = T$ [Str87, Section 6]. Replacing the preorder $\geq$ by $\geq_M$ in Section 2 gives the notions of monomial subrank $Q_M$, monomial asymptotic subrank $\widetilde{Q}_M$ and monomial relative exponent $\omega_M$. (For simplicity we will use monomial restriction here, but our results will also hold with $\geq_M$ replaced by monomial degeneration defined in [Str87, Section 6].) Note that the notions $Q_M$ and $\widetilde{Q}_M$ only depend on the support of the tensor, and not on the particular values of the nonzero coefficients. We define the monomial irreversibility $\mathrm{i}_M(T)$ of $T$ as the product of the (normal) relative exponent from $\langle 2 \rangle$ to $T$ and the monomial relative exponent from $T$ to $\langle 2 \rangle$,

$$\mathrm{i}_M(T) = \omega(\langle 2 \rangle, T)\,\omega_M(T, \langle 2 \rangle). \tag{17}$$

Equivalently, we have

$$\mathrm{i}_M(T) = \frac{\log_2 \widetilde{R}(T)}{\log_2 \widetilde{Q}_M(T)}. \tag{18}$$

(This notion may depend on the tensor and not only on the support.)

Proposition 7.

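Let $S$, $T$ and $U$ be tensors.

  (i) $\omega_M(T,T) = 1$.

  (ii) $\omega_M(S,T)\,\omega_M(T,U) \geq \omega_M(S,U)$ (triangle inequality).

  (iii) $\omega_M(S,T) \geq \omega(S,T)$.

  (iv) $\mathrm{i}_M(T) \geq \mathrm{i}(T)$.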

Definition 8.

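Let $T$ be a tensor.

  • If $\mathrm{i}_M(T) = 1$, then we say that $T$ is monomially reversible.

  • If $\mathrm{i}_M(T) > 1$, then we say that $T$ is monomially irreversible.

There exist tensors that are reversible and monomially irreversible. For example, let $T$ be the structure tensor of the algebra $\mathbb{C}[\mathbb{Z}/3\mathbb{Z}]$ in the natural basis,

$$T = \sum_{i,j \in \mathbb{Z}/3\mathbb{Z}} e_{i,j,i+j}. \tag{19}$$

Then we have $\widetilde{R}(T) = 3$, $\widetilde{Q}(T) = 3$ and $\widetilde{Q}_M(T) = 2.75..$ (this is proven in [EG17, Tao16], see also [CVZ18] for the connection to [Str91]), so that $\mathrm{i}(T) = 1$ and $\mathrm{i}_M(T) = 1.08.. > 1$.

With regards to matrix multiplication, the standard construction for (13) in fact shows that

$$\omega_M(\langle 2,2,2 \rangle, \langle 2 \rangle) = \tfrac{1}{2}. \tag{20}$$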

2.5. Balanced tensors

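We finish this section with a general comment on upper bounds on irreversibility. A tensor $T \in V_1 \otimes V_2 \otimes V_3$ with $\dim V_1 = \dim V_2 = \dim V_3$ is called balanced if the corresponding maps $V_1^* \to V_2 \otimes V_3$, $V_2^* \to V_1 \otimes V_3$ and $V_3^* \to V_1 \otimes V_2$ (called flattenings) are full-rank and for each $i \in [3]$ there is an element $\varphi \in V_i^*$ such that $T(\varphi)$ has full-rank [Str88, page 121]. For any tensor space $\mathbb{F}^n \otimes \mathbb{F}^n \otimes \mathbb{F}^n$ with cubic format $(n,n,n)$ over an algebraically closed field $\mathbb{F}$, being balanced is a generic condition, i.e. almost all elements in such a space are balanced. Balanced tensors are called 1-generic tensors in [LM17]. Let $T \in \mathbb{F}^n \otimes \mathbb{F}^n \otimes \mathbb{F}^n$ be balanced. Then [Str88, Proposition 3.6]

$$\widetilde{R}(T) \leq n^{2\omega/3}, \tag{21}$$
$$\widetilde{Q}(T) \geq n^{2/3}, \tag{22}$$

and so

$$\mathrm{i}(T) \leq \omega. \tag{23}$$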

If moreover , then

(24)

3. Barriers through irreversibility

With the new notion of irreversibility available, we present a barrier for approaches to upper bound $\omega$ via an intermediate tensor $T$.

3.1. The irreversibility barrier

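For any tensor $T$ the inequality

$$\omega \leq \omega(\langle 2 \rangle, T)\,\omega(T, \langle 2,2,2 \rangle) \tag{25}$$

holds by the triangle inequality. Any such approach to upper bound $\omega$ respects the following barrier in terms of the irreversibility of $T$.

Theorem 9.

For any tensor $T$ we have

$$\omega(\langle 2 \rangle, T)\,\omega(T, \langle 2,2,2 \rangle) \geq 2\,\mathrm{i}(T). \tag{26}$$

Proof.

By the triangle inequality (Proposition 3),

$$\omega(T, \langle 2,2,2 \rangle)\,\omega(\langle 2,2,2 \rangle, \langle 2 \rangle) \geq \omega(T, \langle 2 \rangle). \tag{27}$$

Therefore, using the fact $\omega(\langle 2,2,2 \rangle, \langle 2 \rangle) = 1/2$ from (13), we have

$$\omega(\langle 2 \rangle, T)\,\omega(T, \langle 2,2,2 \rangle) \geq 2\,\omega(\langle 2 \rangle, T)\,\omega(T, \langle 2 \rangle) = 2\,\mathrm{i}(T). \tag{28}$$

This proves the claim. ∎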
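The barrier of Theorem 9 is easy to evaluate once the asymptotic rank and asymptotic subrank of the intermediate tensor are known or bounded. The following is a minimal sketch with hypothetical function names. It uses the matrix multiplication tensor itself as the intermediate tensor, for which the asymptotic rank is 2^ω by definition and the asymptotic subrank is 4, so the barrier should reproduce whatever value of ω is plugged in.

```python
import math

def irreversibility(asymptotic_rank: float, asymptotic_subrank: float) -> float:
    """Ratio log2(asymptotic rank) / log2(asymptotic subrank)."""
    return math.log2(asymptotic_rank) / math.log2(asymptotic_subrank)

def theorem9_barrier(asymptotic_rank: float, asymptotic_subrank: float) -> float:
    """Two times the irreversibility: no bound obtained via this tensor can go below it."""
    return 2.0 * irreversibility(asymptotic_rank, asymptotic_subrank)

def matmul_barrier(omega: float) -> float:
    """Barrier for the 2x2 matrix multiplication tensor under a hypothetical value of omega."""
    return theorem9_barrier(2.0 ** omega, 4.0)

if __name__ == "__main__":
    for hypothetical_omega in (2.0, 2.2, 2.37):
        print(hypothetical_omega, matmul_barrier(hypothetical_omega))
    # A reversible tensor (equal asymptotic rank and subrank) gives the trivial barrier 2.
    print(theorem9_barrier(5.0, 5.0))
```

If a lower bound on the asymptotic rank and an upper bound on the asymptotic subrank are supplied instead of exact values, the returned number is a lower bound on the barrier.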

Theorem 9, in particular, implies that if $\mathrm{i}(T) > 1$, then $\omega(\langle 2 \rangle, T)\,\omega(T, \langle 2,2,2 \rangle) > 2$, i.e. one cannot prove $\omega = 2$ via any fixed irreversible intermediate tensor. (Of course one can consider sequences of intermediate tensors with irreversibility converging to 1.)

3.2. Better barriers through more structure

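Naturally, we should expect that imposing more structure on the approach to upper bound $\omega$ leads to stronger barriers. In this section we impose that the final step of the approach is an application of the Schönhage $\tau$-theorem. The Schönhage $\tau$-theorem implies that for any $\alpha, \beta \in \mathbb{N}$ with $\beta \geq 1$ and any tensor $T$ holds that

$$\omega \leq \frac{\omega(\langle 2 \rangle, T)\,\omega(T, \langle 2 \rangle^{\otimes \alpha} \otimes \langle 2,2,2 \rangle^{\otimes \beta}) - \alpha}{\beta}. \tag{29}$$

We prove the following barrier in terms of $\alpha$, $\beta$ and the irreversibility of $T$.

Theorem 10.

For any tensor $T$ and $\alpha, \beta \in \mathbb{N}$ with $\beta \geq 1$ holds

$$\frac{\omega(\langle 2 \rangle, T)\,\omega(T, \langle 2 \rangle^{\otimes \alpha} \otimes \langle 2,2,2 \rangle^{\otimes \beta}) - \alpha}{\beta} \geq 2\,\mathrm{i}(T) + \frac{\alpha}{\beta}\,(\mathrm{i}(T) - 1) \geq 2\,\mathrm{i}(T). \tag{30}$$

Proof.

By the triangle inequality,

$$\omega(T, \langle 2 \rangle^{\otimes \alpha} \otimes \langle 2,2,2 \rangle^{\otimes \beta})\,\omega(\langle 2 \rangle^{\otimes \alpha} \otimes \langle 2,2,2 \rangle^{\otimes \beta}, \langle 2 \rangle) \geq \omega(T, \langle 2 \rangle). \tag{31}$$

Therefore, since $\omega(\langle 2 \rangle^{\otimes \alpha} \otimes \langle 2,2,2 \rangle^{\otimes \beta}, \langle 2 \rangle) = 1/(\alpha + 2\beta)$ by (7),

$$\omega(\langle 2 \rangle, T)\,\omega(T, \langle 2 \rangle^{\otimes \alpha} \otimes \langle 2,2,2 \rangle^{\otimes \beta}) \geq (\alpha + 2\beta)\,\mathrm{i}(T). \tag{32}$$

Subtracting $\alpha$, dividing by $\beta$ and using that $\mathrm{i}(T) \geq 1$ (Proposition 5) gives the barrier

$$\frac{\omega(\langle 2 \rangle, T)\,\omega(T, \langle 2 \rangle^{\otimes \alpha} \otimes \langle 2,2,2 \rangle^{\otimes \beta}) - \alpha}{\beta} \geq 2\,\mathrm{i}(T) + \frac{\alpha}{\beta}\,(\mathrm{i}(T) - 1) \geq 2\,\mathrm{i}(T). \tag{33}$$

This proves the claim. ∎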

As a corollary of the above theorem we present a barrier on any approach of the following form. The Schönhage -theorem implies that for any and any tensor holds

(34)

We prove the following barrier in terms of , and the irreversibility of the cyclically symmetrized .

Corollary 11.

For any tensor and and holds

(35)
(36)

One verifies that . If is cyclically symmetric, then and we have the equality .

Proof.

One verifies directly that and

(37)

Note that we are using rational powers here, which is justified by taking large enough powers of the relevant tensors. Using both inequalities and then applying Theorem 10 gives

(38)
(39)

This proves the statement of the theorem. ∎

Remark 12.

For cyclically symmetric tensors  our Corollary 11 implies the lower bound

(40)

on the parameter (and the “universal” version ) studied in [AW18b], which is a significant improvement over the barrier

(41)

proven in [AW18b, Theorem IV.1].

3.3. Better barriers through monomial irreversibility

Finally, we impose as an extra constraint that the transformation from the intermediate tensor to the matrix multiplication tensor happens via monomial restriction (Section 2.4), i.e. we consider the approach

$$\omega \leq \omega(\langle 2 \rangle, T)\,\omega_M(T, \langle 2,2,2 \rangle) \tag{42}$$

and the more structured approaches

(43)

and

(44)

The proofs in the previous sections can be directly adapted to prove:

Theorem 13.

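For any tensor $T$ we have

$$\omega(\langle 2 \rangle, T)\,\omega_M(T, \langle 2,2,2 \rangle) \geq 2\,\mathrm{i}_M(T). \tag{45}$$

Theorem 14.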

For any tensor and holds

(46)
Corollary 15.

For any tensor and and holds

(47)
(48)

4. Explicit irreversibility lower bounds

We have seen how barriers arise from lower bounds on irreversibility. In this section we compute lower bounds on the irreversibility of two well-known intermediate tensors that play a crucial role in the best upper bounds on $\omega$: the small and big Coppersmith–Winograd tensors.

4.1. Irreversibility and the asymptotic spectrum of tensors

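We begin with a general discussion of how to compute irreversibility. The asymptotic spectrum of tensors $X$ is the set of $\leq$-monotone semiring homomorphisms from the semiring of tensors $\mathcal{T}$ (with tensor product and direct sum as multiplication and addition) to the nonnegative reals,

$$X = \{\varphi \in \mathrm{Hom}(\mathcal{T}, \mathbb{R}_{\geq 0}) : S \leq T \Rightarrow \varphi(S) \leq \varphi(T)\}. \tag{49}$$

Strassen proves in [Str88] that $\widetilde{Q}(T) = \min_{\varphi \in X} \varphi(T)$ and $\widetilde{R}(T) = \max_{\varphi \in X} \varphi(T)$ and he also proves (implicitly) that $\omega(S,T) = \max_{\varphi \in X} \log \varphi(T) / \log \varphi(S)$. From this we directly obtain:

Proposition 16.

Let $T$ be a tensor. Then

$$\mathrm{i}(T) = \frac{\max_{\varphi \in X} \log_2 \varphi(T)}{\min_{\varphi \in X} \log_2 \varphi(T)}. \tag{50}$$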

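In an ideal world we would know $X$ and use it to compute $\mathrm{i}(T)$ (or better, we would use it to compute $\omega$). In practice we currently only have partial knowledge of $X$. This partial knowledge is easiest to describe in terms of the best known lower bounds on $\widetilde{R}(T)$ and the best known upper bounds on $\widetilde{Q}(T)$. The best known lower bounds on $\widetilde{R}(T)$ are simply the matrix ranks $R(T_1), R(T_2), R(T_3)$ of each of the three flattenings $T_1, T_2, T_3$ of $T$ as described in Section 2.5. For arbitrary fields, the best general upper bounds on $\widetilde{Q}(T)$ that we are aware of are the Strassen upper support functionals $\zeta^\theta$ from [Str91], which we will define and use in the next section. They relate asymptotically to slice rank via [CVZ18]

$$\widetilde{Q}(T) \leq \limsup_{n\to\infty} \mathrm{SR}(T^{\otimes n})^{1/n} \leq \min_{\theta} \zeta^\theta(T). \tag{51}$$

We are not aware of any example for which any of the inequalities in (51) is strict. For oblique tensors the right inequality is an equality [CVZ18] and for tight tensors both inequalities are equalities [Str91]. We thus have:

Proposition 17.

Let $T$ be a tensor. Then

$$\mathrm{i}(T) \geq \frac{\max_{i} \log_2 R(T_i)}{\min_{\theta} \log_2 \zeta^\theta(T)}. \tag{52}$$

For complex tensors we have a deeper understanding of the theory of upper bounds on the asymptotic subrank, via the quantum functionals $F^\theta$ introduced in [CVZ18]. The quantum functionals satisfy $F^\theta(T) \leq \zeta^\theta(T)$ and their minimum equals the asymptotic slice rank [CVZ18], i.e.

$$\widetilde{Q}(T) \leq \lim_{n\to\infty} \mathrm{SR}(T^{\otimes n})^{1/n} = \min_{\theta} F^\theta(T) \leq \min_{\theta} \zeta^\theta(T). \tag{53}$$

For free tensors the right inequality in (53) is an equality [CVZ18]. We thus have:

Proposition 18.

If $T$ is complex, then

$$\mathrm{i}(T) \geq \frac{\max_{i} \log_2 R(T_i)}{\min_{\theta} \log_2 F^\theta(T)}. \tag{54}$$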

4.2. Irreversibility of Coppersmith–Winograd tensors

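We now compute lower bounds for the irreversibility of the Coppersmith–Winograd tensors. As mentioned, we will use the support functionals of Strassen [Str91] in our computation to upper bound the asymptotic subrank. For any $\theta = (\theta_1, \theta_2, \theta_3) \in \mathbb{R}^3_{\geq 0}$ with $\theta_1 + \theta_2 + \theta_3 = 1$ the upper support functional $\zeta^\theta$ is defined as

$$\zeta^\theta(T) = 2^{\rho^\theta(T)}, \tag{55}$$
$$\rho^\theta(T) = \min_{S \cong T}\ \max_{P \in \mathcal{P}(\operatorname{supp} S)}\ \sum_{i=1}^{3} \theta_i H(P_i), \tag{56}$$

where the minimum is over all tensors $S$ isomorphic to $T$, the maximum is over all probability distributions $P$ on the support of $S$ in the standard basis, and $H(P_i)$ denotes the Shannon entropy of the $i$th marginal of $P$. Strassen proves in [Str91] that $\widetilde{Q}(T) \leq \zeta^\theta(T)$.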
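The quantity inside the support functional, a weighted sum of marginal entropies of a distribution on the support, can be evaluated directly for a fixed basis and a fixed distribution. The sketch below does only that; it does not perform the maximisation over distributions or the minimisation over bases, so a single evaluation is only a lower bound on the inner maximum for that basis. The names `marginal_entropy` and `weighted_marginal_entropy` are hypothetical, and the example support {(0,0,1), (0,1,0), (1,0,0)} is chosen for illustration.

```python
import math
from collections import defaultdict

def marginal_entropy(dist: dict, leg: int) -> float:
    """Shannon entropy (in bits) of the marginal of `dist` on tensor leg `leg` (0, 1 or 2)."""
    marginal = defaultdict(float)
    for point, prob in dist.items():
        marginal[point[leg]] += prob
    return -sum(p * math.log2(p) for p in marginal.values() if p > 0)

def weighted_marginal_entropy(dist: dict, theta: tuple) -> float:
    """Sum over the three legs of theta[i] times the entropy of the i-th marginal."""
    return sum(t * marginal_entropy(dist, leg) for leg, t in enumerate(theta))

if __name__ == "__main__":
    support = [(0, 0, 1), (0, 1, 0), (1, 0, 0)]
    uniform = {point: 1 / len(support) for point in support}
    theta = (1 / 3, 1 / 3, 1 / 3)
    # Every marginal is (2/3, 1/3), so the value is the binary entropy h(1/3).
    print(weighted_marginal_entropy(uniform, theta))
```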

(Besides the Strassen support functionals, upper bounds on the asymptotic subrank of complex tensors may be obtained from the quantum functionals. For the tensors in Theorem 19 and Theorem 21, however, this will give the same bound, since these tensors are free [CVZ18, Section 4.3].)

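Theorem 19 (Small Coppersmith–Winograd tensors [CW90, Section 6]).

For the small Coppersmith–Winograd tensor

$$\mathrm{cw}_q = \sum_{i=1}^{q} \left( e_{0,i,i} + e_{i,0,i} + e_{i,i,0} \right) \tag{57}$$

the lower bound

$$2\,\mathrm{i}(\mathrm{cw}_q) \geq \frac{2 \log_2(q+1)}{\log_2 3 - \tfrac{2}{3} + \tfrac{2}{3} \log_2 q} \tag{58}$$

holds.

Proof.

The rank of each flattening of $\mathrm{cw}_q$ equals $q+1$. Therefore, $\widetilde{R}(\mathrm{cw}_q) \geq q+1$. To upper bound the asymptotic subrank one can upper bound the Strassen upper support functional with $\theta = (1/3, 1/3, 1/3)$ as in [CVZ18, Example 4.22] by

$$\log_2 \zeta^\theta(\mathrm{cw}_q) \leq \log_2 3 - \tfrac{2}{3} + \tfrac{2}{3} \log_2 q. \tag{59}$$

We find that

$$\mathrm{i}(\mathrm{cw}_q) \geq \frac{\log_2(q+1)}{\log_2 3 - \tfrac{2}{3} + \tfrac{2}{3} \log_2 q}. \tag{60}$$

This proves the theorem. ∎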
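The bound for the small Coppersmith–Winograd tensors can be checked numerically against the table in Section 1. The sketch below assumes the closed form 2 log2(q+1) / (log2 3 − 2/3 + (2/3) log2 q) for the right-hand side of (58), which is the form that reproduces the tabulated values; the function name is hypothetical.

```python
import math

def small_cw_barrier(q: int) -> float:
    """Assumed right-hand side of (58): 2*log2(q+1) / (log2(3) - 2/3 + (2/3)*log2(q))."""
    numerator = 2.0 * math.log2(q + 1)        # from the flattening-rank bound q + 1
    denominator = math.log2(3) - 2 / 3 + (2 / 3) * math.log2(q)  # support functional bound
    return numerator / denominator

if __name__ == "__main__":
    for q in range(2, 8):
        print(q, round(small_cw_barrier(q), 4))
```

For q = 2 the numerator and denominator both reduce to multiples of log2 3, so the barrier is exactly 2; for q = 3, ..., 7 the values truncate to the table entries 2.02, 2.06, 2.09, 2.12 and 2.15.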

Remark 20.

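If $q > 2$, then the right-hand side of (58) is at least $2.02..$. See the table in Section 1 for more values. If $q = 2$, however, then the right-hand side of (58) equals 2. Theorem 19 thus does not rule out using $\mathrm{cw}_2$ to prove that $\omega = 2$. Indeed, as observed in [CW90, Section 11], if $\widetilde{R}(\mathrm{cw}_2) = 3$, then $\omega = 2$.

Currently, the best upper bound we have on $\widetilde{R}(\mathrm{cw}_q)$ is $q+2$. If $\widetilde{R}(\mathrm{cw}_q) = q+2$, then instead of (58) we get the better barrier

$$2\,\mathrm{i}(\mathrm{cw}_q) \geq \frac{2 \log_2(q+2)}{\log_2 3 - \tfrac{2}{3} + \tfrac{2}{3} \log_2 q}. \tag{61}$$

The right-hand side of (61) has a minimum value of

$$\frac{18}{5 \log_2 3} = 2.27.. \tag{62}$$

attained at $q = 6$.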
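The location of the minimum can be confirmed by a direct scan over q. This sketch assumes that the right-hand side of (61) is the expression in (58) with q+1 replaced by q+2 in the numerator, i.e. 2 log2(q+2) / (log2 3 − 2/3 + (2/3) log2 q); the function name and the scan range are choices made for this illustration.

```python
import math

def small_cw_barrier_if_rank_is_q_plus_2(q: int) -> float:
    """Assumed right-hand side of (61), valid if the asymptotic rank of cw_q equals q + 2."""
    denominator = math.log2(3) - 2 / 3 + (2 / 3) * math.log2(q)
    return 2.0 * math.log2(q + 2) / denominator

if __name__ == "__main__":
    candidates = range(2, 200)
    best_q = min(candidates, key=small_cw_barrier_if_rank_is_q_plus_2)
    print("minimiser q:", best_q)
    print("minimum    :", small_cw_barrier_if_rank_is_q_plus_2(best_q))
    print("18/(5 log2 3):", 18 / (5 * math.log2(3)))
```

At q = 6 one has log2 6 = 1 + log2 3, so the denominator collapses to (5/3) log2 3 and the numerator to 6, which gives the closed form 18 / (5 log2 3).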

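Theorem 21 (Big Coppersmith–Winograd tensors [CW90, Section 7]).

For the big Coppersmith–Winograd tensor

$$\mathrm{CW}_q = e_{0,0,q+1} + e_{0,q+1,0} + e_{q+1,0,0} + \sum_{i=1}^{q} \left( e_{0,i,i} + e_{i,0,i} + e_{i,i,0} \right) \tag{63}$$

the lower bound

$$2\,\mathrm{i}(\mathrm{CW}_q) \geq \frac{2 \log_2(q+2)}{\max_{0 \leq x \leq 1/3} f_q(x)} \tag{64}$$

holds, where

$$f_q(x) = -\left(\tfrac{1}{3} + x\right) \log_2\left(\tfrac{1}{3} + x\right) - 2\left(\tfrac{1}{3} - x\right) \log_2\left(\tfrac{2}{q}\left(\tfrac{1}{3} - x\right)\right) - x \log_2 x. \tag{65}$$

Proof.

The rank of each flattening of $\mathrm{CW}_q$ equals $q+2$, which coincides with the well-known border rank upper bound $\underline{R}(\mathrm{CW}_q) \leq q+2$. Therefore, $\widetilde{R}(\mathrm{CW}_q) = q+2$.

To upper bound the asymptotic subrank we use the Strassen upper support functional with $\theta = (1/3, 1/3, 1/3)$. In the standard basis, the support of $\mathrm{CW}_q$ is the set

$$\{(0,i,i), (i,0,i), (i,i,0) : i \in [q]\} \cup \{(0,0,q+1), (0,q+1,0), (q+1,0,0)\}. \tag{66}$$

The symmetry implies that we can assign probability $x$ to each of $(0,0,q+1)$, $(0,q+1,0)$ and $(q+1,0,0)$, and $\tfrac{1}{q}\left(\tfrac{1}{3} - x\right)$ to $(0,i,i)$, $(i,0,i)$ and $(i,i,0)$. This leads to an average marginal entropy of $f_q(x)$ as defined in the theorem statement. The maximum of $f_q$ is attained at

$$x = \frac{q\sqrt{q^2 + 32} - q^2 - 8}{6\,(q^2 - 4)} \quad (q \neq 2), \qquad x = \tfrac{1}{9} \quad (q = 2). \tag{67}$$

This proves the theorem. ∎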
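The bound for the big Coppersmith–Winograd tensors can be evaluated by maximising the average marginal entropy numerically. The sketch below rests on an assumed parametrisation: probability x on each of the three support elements containing q+1 and (1/q)(1/3 − x) on each remaining element, so that every marginal puts mass 1/3 + x on the symbol 0, mass (2/q)(1/3 − x) on each of the symbols 1, ..., q, and mass x on the symbol q+1. The function names are hypothetical, and a ternary search is used in place of a closed-form maximiser.

```python
import math

def average_marginal_entropy(q: int, x: float) -> float:
    """Entropy of the (common) marginal under the symmetric distribution with parameter x."""
    masses = [1 / 3 + x] + [(2 / q) * (1 / 3 - x)] * q + [x]
    return -sum(p * math.log2(p) for p in masses if p > 0)

def maximise_entropy(q: int, iterations: int = 200) -> tuple:
    """Ternary search on [0, 1/3]; valid because the entropy is concave in x."""
    lo, hi = 0.0, 1 / 3
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if average_marginal_entropy(q, m1) < average_marginal_entropy(q, m2):
            lo = m1
        else:
            hi = m2
    x_star = (lo + hi) / 2
    return x_star, average_marginal_entropy(q, x_star)

def big_cw_barrier(q: int) -> float:
    """2*log2(q+2) divided by the maximal average marginal entropy."""
    _, max_entropy = maximise_entropy(q)
    return 2.0 * math.log2(q + 2) / max_entropy

if __name__ == "__main__":
    for q in range(1, 7):
        x_star, _ = maximise_entropy(q)
        print(q, round(x_star, 6), round(big_cw_barrier(q), 4))
```

Setting the derivative of the entropy to zero gives the quadratic 9(4 − q²)x² − 3(8 + q²)x + 4 = 0, whose relevant root for q = 3 is (3√41 − 17)/30; the search result can be compared against it.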

Remark 22.

The lowest value of the right-hand side of (64) is attained at $q = 1$. See the table in Section 1 for more values.

Remark 23.

A lower bound on the irreversibility of the tensors $t_n$ mentioned in the introduction follows directly from the results in [Str91, Theorem 6.7].

4.3. Monomial irreversibility of structure tensors of finite group algebras

We now discuss irreversibility and monomial irreversibility in the context of the group-theoretic approach developed in [CU03]. This approach produces upper bounds on  via intermediate tensors that are structure tensors of complex group algebras of finite groups. Let denote the structure tensor of the complex group algebra of the finite group , in the standard basis. The group-theoretic approach (in particular [CU03, Theorem 4.1]) produces an inequality of the form

(68)

which ultimately leads to the bound

(69)

where $\geq_M$ and $\omega_M$ are the monomial restriction and monomial relative exponent defined in Section 2.4.

Now the monomial irreversibility barrier from Section 3.3 comes into play. Upper bounds on the monomial asymptotic subrank of have (using different terminology) been obtained in [BCC17a, BCC17b, Saw17]. Those upper bounds imply that is monomially irreversible for every nontrivial finite group . Together with our results in Section 3.3 and the fact that the tensor  is symmetric up to a permutation of the basis of one of the tensor legs, this directly leads to nontrivial barriers for the left-hand side of (69) for any fixed nontrivial group , thus putting the work of [BCC17a, BCC17b, Saw17] in a broader context. We have not tried to numerically optimise the monomial irreversibility barriers for group algebras.

Finally we mention that the irreversibility barrier (rather than the monomial irreversibility barrier) does not rule out obtaining  via . Namely, is isomorphic to a direct sum of matrix multiplication tensors, and, therefore, . Thus, if , then  is reversible,

Acknowledgements

MC acknowledges financial support from the European Research Council (ERC Grant Agreement No. 337603) and VILLUM FONDEN via the QMATH Centre of Excellence (Grant No. 10059). PV acknowledges support from the Hungarian National Research, Development and Innovation Office (NKFIH) grant no. K124152 and the Hungarian Academy of Sciences Lendület-Momentum grant for Quantum Information Theory, no. 96 141. This research was supported by the National Research, Development and Innovation Fund of Hungary within the Quantum Technology National Excellence Program (Project Nr. 2017-1.2.1-NKP-2017-00001). This material is based upon work supported by the National Science Foundation under Grant No. DMS-1638352.

References