1 Introduction
Given two large matrices, how many arithmetic operations (additions and multiplications) are required to compute their matrix product?
The high school algorithm for multiplying two square matrices of shape $n \times n$ costs roughly $2n^3$ arithmetic operations. On the other hand, we know that at least $n^2$ operations are required. Denoting by $\omega$ the optimal exponent of $n$ in the number of operations required by any arithmetic algorithm, we thus have $2 \le \omega \le 3$. What is the value of $\omega$? Since Strassen published his matrix multiplication algorithm in 1969 we know that $\omega \le \log_2 7 \approx 2.81$ [Str69]. Over the years, more constructions of faster matrix multiplication algorithms, relying on insights involving direct sum algorithms, approximative algorithms and asymptotic induced matchings, led to the current upper bound $\omega < 2.3729$ [CW90, Sto10, Wil12, LG14].
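Concretely, Strassen's bound comes from a scheme that multiplies $2 \times 2$ block matrices using 7 block multiplications instead of 8 and recurses. A minimal Python sketch (the function name and the restriction to power-of-two sizes are ours, for illustration):

```python
import numpy as np

def strassen(A, B):
    """Multiply two n x n matrices, n a power of two, using Strassen's
    recursion: 7 multiplications of half-size blocks instead of 8."""
    n = A.shape[0]
    if n == 1:
        return A * B
    k = n // 2
    A11, A12, A21, A22 = A[:k, :k], A[:k, k:], A[k:, :k], A[k:, k:]
    B11, B12, B21, B22 = B[:k, :k], B[:k, k:], B[k:, :k], B[k:, k:]
    M1 = strassen(A11 + A22, B11 + B22)
    M2 = strassen(A21 + A22, B11)
    M3 = strassen(A11, B12 - B22)
    M4 = strassen(A22, B21 - B11)
    M5 = strassen(A11 + A12, B22)
    M6 = strassen(A21 - A11, B11 + B12)
    M7 = strassen(A12 - A22, B21 + B22)
    # Reassemble the four blocks of the product from the 7 products.
    return np.block([[M1 + M4 - M5 + M7, M3 + M5],
                     [M2 + M4,           M1 - M2 + M3 + M6]])
```

Counting block multiplications gives the recurrence $T(n) = 7\,T(n/2) + O(n^2)$ and hence the operation count $O(n^{\log_2 7})$.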
In applications the matrices to be multiplied are often very rectangular instead of square; see the examples in [LU18]. For any nonnegative real $k$, given an $n \times n$ matrix and an $n \times n^k$ matrix, how many arithmetic operations are required? Denoting, similarly as in the square case, by $\omega(k)$ the optimal exponent of $n$ in the number of operations required by any arithmetic algorithm, we a priori have the bounds $\max(2, 1+k) \le \omega(k) \le 2+k$. (Formally speaking, $\omega(k)$ is the infimum over all real numbers $\tau$ so that the product of any $n \times n$ matrix and any $n \times n^k$ matrix can be computed in $O(n^\tau)$ arithmetic operations. Of course, $\omega(1) = \omega$, and if $k \le k'$, then $\omega(k) \le \omega(k')$.) What is the value of $\omega(k)$? Parallel to the developments in upper bounding $\omega$, the upper bound on $\omega(k)$ was improved drastically over the years for the several regimes of $k$ [HP98, KZHP08, LG12, LU18]. The best lower bound on $\omega(k)$, however, has remained $\max(2, 1+k)$.
So the matrix multiplication exponent $\omega$ characterises the complexity of square matrix multiplication and, for every nonnegative real $k$, the rectangular matrix multiplication exponent $\omega(k)$ characterises the complexity of rectangular matrix multiplication. Coppersmith proved that there exists a value $k > 0$ such that $\omega(k) = 2$ [Cop82]. The largest $k$ such that $\omega(k) = 2$ is denoted by $\alpha$. We will refer to $\alpha$ as the dual matrix multiplication exponent. The algorithms constructed in [LU18] give the currently best bound $\alpha > 0.31389$. If $\omega = 2$, then of course $\alpha = 1$. In fact, $\omega = 2$ if and only if $\alpha = 1$ (Remark 3.20). Thus we study $\alpha$ not only to understand rectangular matrix multiplication, but also as a means to prove $\omega = 2$. The value of $\alpha$ appears explicitly in various applications, for example in the recent work on solving linear programs [CLS19] and empirical risk minimization [LSZ19].

The goal of this paper is to understand why current techniques have not closed the gap between the best lower bound on $\omega(k)$ and the best upper bound on $\omega(k)$, and to thus understand where to find faster rectangular matrix multiplication algorithms. We prove a barrier for current techniques to give much better upper bounds than the current ones. Our work gives a very precise picture of the limitations of the current techniques used to obtain the best upper bounds on $\omega(k)$ and the best lower bounds on $\alpha$.
Our ideas apply as well to $n^{k'} \times n$ by $n \times n^{k}$ matrix multiplication for different $k$ and $k'$. We focus on $k' = 1$ for simplicity.
1.1 How are algorithms constructed?
To understand the current techniques that we prove barriers for, we explain, on a high level, how the fastest algorithms for matrix multiplication are constructed. An algorithm for matrix multiplication should be thought of as a reduction of the “matrix multiplication problem” to the natural “unit problem” that corresponds to multiplying numbers:
\[ \text{matrix multiplication problem} \longrightarrow \text{unit problem}. \]
Mathematically, problems correspond to families of tensors. Several different notions of reduction are used in this context. We will discuss tensors and reductions in more detail later.
In practice, the fastest matrix multiplication algorithms, for square or rectangular matrices, are obtained by a reduction of the matrix multiplication problem to some intermediate problem and a reduction of the intermediate problem to the unit problem:
\[ \text{matrix multiplication problem} \longrightarrow \text{intermediate problem} \longrightarrow \text{unit problem}. \]
The intermediate problems that have been used so far to obtain the best upper bounds on $\omega(k)$ correspond to the so-called small and big Coppersmith–Winograd tensors $\mathrm{cw}_q$ and $\mathrm{CW}_q$.
Depending on the intermediate problem and the notion of reduction, we prove a barrier on the best upper bound on $\omega(k)$ that can be obtained in the above way. Before we say something about our new barrier, we discuss the history of barriers for matrix multiplication.
1.2 History of matrix multiplication barriers
We call a lower bound for all upper bounds on $\omega$ or $\omega(k)$ that can be obtained by some method a barrier for that method. We give a high-level historical account of barriers for square and rectangular matrix multiplication.
Ambainis, Filmus and Le Gall [AFLG15] were the first to prove a barrier in the context of matrix multiplication. They proved that a variety of methods applied to the Coppersmith–Winograd intermediate tensors (which gave the best upper bounds on $\omega$) cannot give $\omega = 2$, and in fact cannot give $\omega < 2.3078$.
Alman and Vassilevska Williams [AW18a, AW18b] proved barriers for a notion of reduction called monomial degeneration, extending the realm of barriers beyond the scope of the Ambainis et al. paper. They prove that some collections of intermediate tensors, including the Coppersmith–Winograd intermediate tensors, cannot be used to prove $\omega = 2$. Their analysis is based on studying the so-called asymptotic independence number of the intermediate problem (also called monomial asymptotic subrank). This paper also for the first time studies barriers for rectangular matrix multiplication, for $\omega(k)$ and monomial degeneration. For example, they prove that the intermediate tensor $\mathrm{CW}_q$ can only give values of $\alpha$ strictly smaller than $1$.
Blasiak et al. [BCC17a, BCC17b] studied barriers for square matrix multiplication algorithms obtained with a subset of the group-theoretic method, which is monomial degeneration applied to certain group algebra tensors.
Christandl, Vrana and Zuiddam [CVZ19] proved barriers that apply more generally than the previous ones, namely for the type of reduction called degeneration. Their barrier is given in terms of the irreversibility of the intermediate tensor. Irreversibility can be thought of as an asymptotic measure of the failure of Gaussian elimination to bring tensors into diagonal form. To compute irreversibility, they used the asymptotic spectrum of tensors and in particular two families of real tensor parameters with special algebraic properties: the quantum functionals [CVZ18] and the support functionals [Str91], although one can equivalently use asymptotic slice rank to compute the barriers for the Coppersmith–Winograd intermediate tensors.
Alman [Alm19] simultaneously and independently obtained the same barrier, relying on a study of asymptotic slice rank.
1.3 New barriers for rectangular matrix multiplication
We prove new barriers for rectangular matrix multiplication using the quantum functionals and support functionals.
We first set up a general barrier framework that encompasses all previously used notions of reductions, and then numerically compute barriers for the degeneration notion of reduction and the Coppersmith–Winograd intermediate problems. We also discuss barriers for “mixed” intermediate problems, which cover a method used by, for example, Coppersmith [Cop97].
We will explain our barrier in more detail in the language of tensors, but first we will give a numerical illustration of the barriers.
1.3.1 Numerical illustration of the barriers
For the popular intermediate tensor $\mathrm{CW}_q$ our barrier for upper bounds on $\omega(k)$ via degeneration looks as follows. In Fig. 1, the horizontal axis goes over all $k \ge 0$. The blue line is the upper bound on $\omega(k)$ obtained via $\mathrm{CW}_q$ as in [LG12]. The yellow line is the barrier and the red line is the best lower bound $\max(2, 1+k)$ on $\omega(k)$. (In [LG12] the best upper bounds on $\omega(k)$ are obtained using $\mathrm{CW}_q$ with a value of $q$ that depends on the regime of $k$.)
How about the barrier for $\mathrm{CW}_q$ for other values of $q$? To see what happens there, we give in Fig. 2 the barrier for several values of $q$ in terms of the dual matrix multiplication exponent $\alpha$. (We recall that $\alpha$ is the largest value of $k$ such that $\omega(k) = 2$.) For $\mathrm{CW}_q$ this barrier corresponds to the smallest value of $k$ in Fig. 1 where the yellow line goes above $2$.
1.3.2 The barrier in tensor language
Let us continue the discussion that we started in Section 1.1 of how algorithms are constructed, but now in the language of tensors. The goal is to explain our barrier in more detail.
As we mentioned, algorithms correspond to reductions from the matrix multiplication problem to some natural unit problem, and the problems correspond to tensors. Let $\mathbb{F}$ be our base field. (The value of $\omega(k)$ may in fact depend on the characteristic of the base field.) A tensor is a trilinear map $T : U \times V \times W \to \mathbb{F}$, where $U$, $V$ and $W$ are finite-dimensional vector spaces over $\mathbb{F}$. The problem of multiplying an $a \times b$ matrix and a $b \times c$ matrix corresponds to the matrix multiplication tensor
\[ \langle a, b, c \rangle = \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{l=1}^{c} x_{ij}\, y_{jl}\, z_{li}. \]
The unit problem corresponds to the family of diagonal tensors
\[ \langle r \rangle = \sum_{i=1}^{r} x_i y_i z_i, \qquad r \in \mathbb{N}. \]
There are several notions of reduction that one can consider, but the following is the most natural one. For two tensors $S$ and $T$ we say $T$ is a restriction of $S$, and write $T \le S$, if there are three linear maps $A$, $B$ and $C$ of appropriate formats such that $T$ is obtained from $S$ by precomposing with $A$, $B$ and $C$, that is, $T = S \circ (A \times B \times C)$.
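To make the definition concrete, here is a small Python sketch (the helper names are ours) that represents tensors as 3-dimensional coefficient arrays: a restriction applies one linear map per factor, and as an example the diagonal tensor $\langle 2 \rangle$ arises as a restriction of $\langle 2,2,2 \rangle$ by sending the $i$-th basis vector to the matrix unit $E_{ii}$ in each factor.

```python
import itertools
import numpy as np

def matmul_tensor(a, b, c):
    """Matrix multiplication tensor <a,b,c> as a 3D array:
    T(X, Y, Z) = trace(XYZ), i.e. T[(i,j), (j,l), (l,i)] = 1."""
    T = np.zeros((a * b, b * c, c * a))
    for i, j, l in itertools.product(range(a), range(b), range(c)):
        T[i * b + j, j * c + l, l * a + i] = 1
    return T

def restrict(S, A, B, C):
    """The restriction of S by linear maps (A, B, C): T = S o (A x B x C)."""
    return np.einsum('pqr,pi,qj,rk->ijk', S, A, B, C)

# <2> <= <2,2,2>: send e_i to the matrix unit E_ii (flattened) in each factor.
A = np.zeros((4, 2))
A[0, 0] = 1   # e_0 -> E_00
A[3, 1] = 1   # e_1 -> E_11
diag2 = restrict(matmul_tensor(2, 2, 2), A, A, A)
```

The resulting array `diag2` is exactly the diagonal tensor $\langle 2 \rangle$.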
A very important observation (see, e.g., [BCS97] or [Blä13]) is that any matrix multiplication algorithm corresponds to an inequality
\[ \langle a, b, c \rangle \le \langle r \rangle. \]
Square matrix multiplication algorithms look like
\[ \langle n, n, n \rangle \le \langle r \rangle \]
and rectangular matrix multiplications, of the form that we study, look like
\[ \langle n, n, n^k \rangle \le \langle r \rangle. \]
In general, faster algorithms correspond to having a smaller $r$ on the right-hand side. In fact, if
\[ \langle n, n, n \rangle \le \langle r \rangle, \]
then $\omega \le \log_n r$, and similarly for any $k$, if
\[ \langle n, n, n^k \rangle \le \langle r \rangle, \]
then $\omega(k) \le \log_n r$. For example, if
\[ \langle 2, 2, 2 \rangle \le \langle 7 \rangle, \]
then $\omega \le \log_2 7$.
Next we utilise a natural product structure on matrix multiplication tensors, which is well known as the fact that block matrices can be multiplied block-wise. For tensors $S$ and $T$ one naturally defines a Kronecker product $S \otimes T$ generalizing the matrix Kronecker product. Then the matrix multiplication tensors multiply like $\langle a, b, c \rangle \otimes \langle d, e, f \rangle = \langle ad, be, cf \rangle$ and the diagonal tensors multiply like $\langle r \rangle \otimes \langle s \rangle = \langle rs \rangle$.
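A short sketch of the tensor Kronecker product on 3D coefficient arrays (the function names are ours): it interleaves the index pairs per factor, and one checks directly that $\langle r \rangle \otimes \langle s \rangle = \langle rs \rangle$; for matrix multiplication tensors the product equals $\langle ad, be, cf \rangle$ after a suitable reordering of basis indices.

```python
import numpy as np

def tensor_kron(S, T):
    """Kronecker product of two 3-tensors, generalizing the matrix
    Kronecker product: (S (x) T)[(p,i),(q,j),(r,l)] = S[p,q,r] * T[i,j,l]."""
    (a1, b1, c1), (a2, b2, c2) = S.shape, T.shape
    return np.einsum('pqr,ijl->piqjrl', S, T).reshape(a1*a2, b1*b2, c1*c2)

def diagonal(r):
    """The diagonal tensor <r>."""
    D = np.zeros((r, r, r))
    for i in range(r):
        D[i, i, i] = 1
    return D

# Diagonal tensors multiply like <r> (x) <s> = <rs>:
assert np.allclose(tensor_kron(diagonal(2), diagonal(3)), diagonal(6))
```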
We can thus say: if
\[ \langle n, n, n \rangle^{\otimes m} = \langle n^m, n^m, n^m \rangle \le \langle r \rangle, \]
then $\omega \le \frac{1}{m} \log_n r$. We now think of our problem as the problem of determining the optimal asymptotic rate of transformation from diagonal tensors $\langle r \rangle$ to matrix multiplication tensors $\langle n, n, n \rangle$. Of course we can do similarly for values of $k$ other than $1$, if we deal carefully with powers $n^k$ that are non-integer. For clarity we will in this section stick to $k = 1$.
In practice, as mentioned before, algorithms are obtained by reductions via intermediate problems. This works as follows. Let $T$ be any tensor, the intermediate tensor. Then clearly, if
\[ \langle n, n, n \rangle \le T^{\otimes m} \quad \text{and} \quad T^{\otimes m} \le \langle r \rangle, \tag{1} \]
then $\omega \le \log_n r$. The barrier we prove is a lower bound on $\log_n r$ depending on $T$ and the notion of reduction used in the inequality $\langle n, n, n \rangle \le T^{\otimes m}$, which in this section we take to be restriction.
We obtain the barrier as follows. Imagine that $F$ is a map from the set of tensors to the nonnegative real numbers that is monotone, multiplicative and normalised, meaning that for any tensors $S$ and $T$ the following holds: if $S \le T$ then $F(S) \le F(T)$; $F(S \otimes T) = F(S)\,F(T)$; and $F(\langle r \rangle) = r$. We apply $F$ to both sides of the first inequality $\langle n, n, n \rangle \le T^{\otimes m}$ to get
\[ F(\langle n, n, n \rangle) \le F(T)^m \]
and so, using the fact that $F(\langle n, n, n \rangle) = n^2$ for the functionals we will use,
\[ m \ge \frac{2 \log n}{\log F(T)}. \]
Let $G$ be another map from tensors to reals that is monotone, multiplicative and normalised. We apply $G$ to both sides of the second inequality $T^{\otimes m} \le \langle r \rangle$ to get
\[ G(T)^m \le r \]
and so
\[ \log_n r \ge m\, \frac{\log G(T)}{\log n}. \]
We conclude that
\[ \log_n r \ge \frac{2 \log G(T)}{\log F(T)}. \]
Our barrier is thus
\[ \max_{F, G} \frac{2 \log G(T)}{\log F(T)}, \]
where the maximisation is over the monotone, multiplicative and normalised maps $F$ and $G$ from tensors to reals.
For tensors over the complex numbers, we know a family of monotone, multiplicative and normalised maps from tensors to reals: the quantum functionals. For tensors over other fields, we know a family of maps with slightly weaker properties that are still sufficient to prove the barrier: the support functionals.
Theorem.
Upper bounds on $\omega(k)$ obtained via the intermediate tensor $T$ are at least an explicit expression in $F(T)$ and $G(T)$, maximised over all support functionals, or over all quantum functionals, $F$ and $G$.
See Theorem 3.13 for the precise statement of the result and Section 1.3.1 for illustrations.
1.3.3 Catalyticity
We discussed that, in practice, the best upper bound on, say, $\omega(k)$ is obtained by a chain of inequalities of the form
\[ \langle n, n, n^k \rangle \le T^{\otimes m} \le \langle r \rangle. \tag{2} \]
We utilised this structure to obtain the barrier. A closer look reveals that the methods used in practice have even more structure. Namely, they give an inequality that also has diagonal tensors on the left-hand side:
\[ \langle s \rangle \oplus \langle n, n, n^k \rangle \le T^{\otimes m} \le \langle r \rangle. \tag{3} \]
Part of the diagonal tensor $\langle r \rangle$ on the far right-hand side acts as a catalyst, since $\langle s \rangle$ is returned on the far left-hand side. We obtain better barriers when we have a handle on the amount of catalyticity that is used in the method (see the schematic Fig. 3), again by applying maps $F$ and $G$ to both sides of the two inequalities and deducing a lower bound on $\log_n r$. The precise statement appears in Theorem 3.13.
1.4 Overview of the next sections
In Section 2 we discuss in more detail the methods that are used to construct rectangular matrix multiplication algorithms and the different notions of reduction.
In Section 3 we introduce and prove our barriers in the form of a general framework, dealing formally with non-integer powers $n^k$. We also discuss how to analyse “mixed” intermediate tensors.
In Section 4 we discuss how to compute the barriers explicitly using the support functionals, and we compute them for the Coppersmith–Winograd tensors $\mathrm{cw}_q$ and $\mathrm{CW}_q$.
2 Algorithms
At the core of the methods that give the best upper bounds on $\omega(k)$ lies the following theorem, which can be proven using the asymptotic sum inequality for rectangular matrix multiplication [LR83] and the monotonicity of the function $k \mapsto \omega(k)$.
Theorem 2.1.
Let $n, s, r \in \mathbb{N}$. If $\langle s \rangle \oplus \langle n, n, n^k \rangle \le \langle r \rangle$, then $s + n^{\omega(k)} \le r$.
Here $\oplus$ denotes the naturally defined direct sum for tensors. The rank $R(T)$ of a tensor $T$ is the smallest number $r$ such that $T \le \langle r \rangle$, or equivalently, the smallest number $r$ such that $T = \sum_{s=1}^{r} u_s \otimes v_s \otimes w_s$ where the $u_s, v_s, w_s$ are linear forms. The asymptotic rank is defined as the limit $\tilde{R}(T) = \lim_{m \to \infty} R(T^{\otimes m})^{1/m}$, which equals the infimum $\inf_m R(T^{\otimes m})^{1/m}$ since tensor rank is submultiplicative under $\otimes$.
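As an illustration of the definition, Strassen's algorithm is exactly a rank decomposition $\langle 2,2,2 \rangle = \sum_{s=1}^{7} u_s \otimes v_s \otimes w_s$, witnessing $R(\langle 2,2,2 \rangle) \le 7$ and hence $\langle 2,2,2 \rangle \le \langle 7 \rangle$. A Python check (our encoding: the convention $\sum_{ijl} x_{ij} y_{jl} z_{li}$ with all index pairs flattened row-major):

```python
import itertools
import numpy as np

# <2,2,2> with the convention T(X,Y,Z) = trace(XYZ) = sum_{ijl} x_ij y_jl z_li.
T = np.zeros((4, 4, 4))
for i, j, l in itertools.product(range(2), repeat=3):
    T[2*i + j, 2*j + l, 2*l + i] = 1

def m(e00, e01, e10, e11):
    """A 2x2 coefficient matrix, flattened row-major."""
    return np.array([e00, e01, e10, e11], dtype=float)

# Strassen's 7 rank-one terms: U[s] holds the coefficients of the entries of A
# in the s-th product, V[s] those of B, and W[s][2*l + i] is the coefficient
# of the s-th product in the output entry C[i, l].
U = [m(1, 0, 0, 1), m(0, 0, 1, 1), m(1, 0, 0, 0), m(0, 0, 0, 1),
     m(1, 1, 0, 0), m(-1, 0, 1, 0), m(0, 1, 0, -1)]
V = [m(1, 0, 0, 1), m(1, 0, 0, 0), m(0, 1, 0, -1), m(-1, 0, 1, 0),
     m(0, 0, 0, 1), m(1, 1, 0, 0), m(0, 0, 1, 1)]
W = [m(1, 0, 0, 1), m(0, 1, 0, -1), m(0, 0, 1, 1), m(1, 1, 0, 0),
     m(-1, 0, 1, 0), m(0, 0, 0, 1), m(1, 0, 0, 0)]

S = sum(np.einsum('p,q,r->pqr', U[s], V[s], W[s]) for s in range(7))
assert np.allclose(S, T)   # the 7 rank-one terms sum exactly to <2,2,2>
```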
Equivalently, phrased in the language of the introduction, for $n, s, r \in \mathbb{N}$, if
\[ \langle s \rangle \oplus \langle n, n, n^k \rangle \le \langle r \rangle, \tag{4} \]
then $s + n^{\omega(k)} \le r$. In practice, the upper bound is obtained from a restriction $\langle s \rangle \oplus \langle n, n, n^k \rangle \le T^{\otimes m}$ for some intermediate tensor $T$ and an upper bound on the rank of $T^{\otimes m}$. The restriction in (4) may be replaced by other types of reductions that we will discuss below.
Reductions.
We say $T$ is a monomial restriction of $S$, and write $T \le_{\mathrm{M}} S$, if $T$ can be obtained from $S$ by setting some variables to zero. We say $T$ is a monomial degeneration of $S$, and write $T \trianglelefteq_{\mathrm{M}} S$, if $T$ can be obtained from $S$ by multiplying the variables by integer powers of a formal parameter $\varepsilon$ so that $T$ appears in the lowest degree of $\varepsilon$. Strassen's application of the laser method uses monomial degenerations, and the modification of Coppersmith and Winograd [CW90] uses combinatorial restrictions, where the variables zeroed out are chosen using a certain combinatorial gadget (a Salem–Spencer set). Degeneration is a very general reduction that generalises the above reductions. We say $T$ is a degeneration of $S$, and write $T \trianglelefteq S$, if $T$ appears in the lowest degree of $\varepsilon$ in $S \circ (A \times B \times C)$ for some linear maps $A$, $B$ and $C$ whose matrices have coefficients that are Laurent polynomials in $\varepsilon$. Restriction is the special case of degeneration where the Laurent polynomials are constant.
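A standard toy example of a degeneration that is not a restriction can be checked numerically (the sketch below and its helper names are ours): the tensor $W = x_0 y_0 z_1 + x_0 y_1 z_0 + x_1 y_0 z_0$ has rank 3, yet it appears in the lowest degree of $\varepsilon$ in a combination of only two rank-one tensors, so $W \trianglelefteq \langle 2 \rangle$.

```python
import numpy as np

def rank_one(u, v, w):
    return np.einsum('i,j,k->ijk', u, v, w)

# W = x0 y0 z1 + x0 y1 z0 + x1 y0 z0 (rank 3, border rank 2).
W = np.zeros((2, 2, 2))
W[0, 0, 1] = W[0, 1, 0] = W[1, 0, 0] = 1

e0 = np.array([1.0, 0.0])
e1 = np.array([0.0, 1.0])

# (1/eps) * [ (e0 + eps*e1)^{(x)3} - e0^{(x)3} ] = W + O(eps):
# W is the lowest-degree term, using only 2 rank-one tensors.
for eps in [1e-1, 1e-2, 1e-3]:
    approx = (rank_one(e0 + eps*e1, e0 + eps*e1, e0 + eps*e1)
              - rank_one(e0, e0, e0)) / eps
    error = np.abs(approx - W).max()
    assert error <= 2 * eps   # error shrinks linearly with eps
```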
Coppersmith–Winograd intermediate tensors.
All improvements on the upper bound on $\omega$ since Coppersmith and Winograd use the Coppersmith–Winograd tensors as intermediate tensors. These are defined, for $q \in \mathbb{N}$, by
\[ \mathrm{cw}_q = \sum_{i=1}^{q} (x_0 y_i z_i + x_i y_0 z_i + x_i y_i z_0) \]
and
\[ \mathrm{CW}_q = \mathrm{cw}_q + x_0 y_0 z_{q+1} + x_0 y_{q+1} z_0 + x_{q+1} y_0 z_0. \]
Degeneration methods give $\tilde{R}(\mathrm{CW}_q) \le q + 2$.
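For reference, both tensors can be written down directly as coefficient arrays; a short sketch (function names ours):

```python
import numpy as np

def cw_small(q):
    """Small Coppersmith-Winograd tensor cw_q on q+1 variables per factor:
    sum_{i=1}^{q} (x0 yi zi + xi y0 zi + xi yi z0)."""
    T = np.zeros((q + 1, q + 1, q + 1))
    for i in range(1, q + 1):
        T[0, i, i] = T[i, 0, i] = T[i, i, 0] = 1
    return T

def cw_big(q):
    """Big Coppersmith-Winograd tensor CW_q on q+2 variables per factor:
    cw_q plus x0 y0 z_{q+1} + x0 y_{q+1} z0 + x_{q+1} y0 z0."""
    T = np.zeros((q + 2, q + 2, q + 2))
    T[:q + 1, :q + 1, :q + 1] = cw_small(q)
    T[0, 0, q + 1] = T[0, q + 1, 0] = T[q + 1, 0, 0] = 1
    return T

# The supports of cw_q and CW_q have sizes 3q and 3q + 3 respectively.
```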
Mixed Coppersmith–Winograd tensors.
3 Barriers
Let $\le$ denote restriction on tensors, as defined in the introduction. We remark that everything we discuss in this section also holds if restriction is replaced with degeneration, monomial degeneration or monomial restriction.
Let $F$ be a map from all tensors to the reals. The statements we prove in this section hold for any $F$ with certain special properties. Two families that satisfy the properties are the quantum functionals (which we will not explicitly use in this paper; we refer to [CVZ18] for the definition) and the (upper) support functionals. For concreteness, we will think of $F$ as one of the support functionals. We will define the support functionals in the next section. For now, we will use the following properties.
Lemma 3.1 (Strassen [Str91]).
Any support functional $F$ is

(a) monotone: if $S \le T$, then $F(S) \le F(T)$,

(b) submultiplicative: $F(S \otimes T) \le F(S)\,F(T)$,

(c) mamu-multiplicative: $F(S \otimes T) = F(S)\,F(T)$ for any two matrix multiplication tensors $S$ and $T$,

(d) additive: $F(S \oplus T) = F(S) + F(T)$,

(e) at most the rank: $F(T) \le R(T)$.
More is known about the support functionals than what is stated in Lemma 3.1. For example, they are multiplicative not only on matrix multiplication tensors, but also on a larger family of tensors called oblique tensors.
Remark 3.2.
The statements in this section can be proven more generally for certain preorders (including degeneration, monomial degeneration and monomial restriction) and certain maps $F$. Here, for concreteness, we discuss everything in terms of restriction and the support functionals. A precise discussion will appear in the full version.
3.1 Non-integer $n^k$
Recall that $k$ is a nonnegative real number. To deal with powers $n^k$ that are not integer we will define a notational shorthand. We first observe the following.
Lemma 3.3.
Let . Suppose that is an integer such that is integer. Then
Proof.
For every rational number we have
From Lemma 3.3 it follows that is the same for any with integer power . We introduce a notation for dealing with this value without referring to the set of possible values of .
Definition 3.4.
We introduce a formal symbol $\langle n, n, n^k \rangle$ for each real $k$, which we call a quasi-tensor. If for integers and , then we define
Otherwise, we define
If $n^k$ is integer, then the values of $F$ on $\langle n, n, n^k \rangle$ as a tensor and as a quasi-tensor coincide. Thus we identify the quasi-tensor with the tensor when the latter exists.
Using this notation, Lemma 3.3 can be rephrased as follows.
Lemma 3.5.
If , then .
Lemma 3.6.
.
Proof.
We have because if , then and , and if , then . Analogous results hold for and .
Suppose . Then
For arbitrary the result follows by a continuity argument. ∎
Lemma 3.7.
If , then .
Proof.
We have
and so
3.2 The $T$-method
For any tensor $T$ we define the notion of a $T$-method for upper bounds on $\omega(k)$ as follows.
Definition 3.8 ($T$-method).
Suppose . Suppose we are given a collection of inequalities with . Then Theorem 2.1 gives the upper bound where the infimum is taken over all appearing in the collection of inequalities. We then say is obtained by a $T$-method.
We say that the method is catalytic if the set of values of is unbounded, the bound is not attained on any one reduction of the method (so ), and in any reduction we have for some constant .
Theorem 3.9.
Any upper bound on $\omega(k)$ obtained by a $T$-method satisfies
Moreover, if the method is catalytic, then
Proof.
It is enough to prove the inequality for one reduction with , which gives an upper bound .
Using Lemma 3.5 and superadditivity of , we have
Therefore . For we get
Since , we have and therefore
If the method is catalytic, then , and as we have
This concludes the proof. ∎
3.3 Asymptotic method
To cover the methods that are used in practice we need the following notion.
Definition 3.10 (Asymptotic method).
Let be a tensor. Suppose . Suppose we are given a collection of inequalities where the values of are unbounded and for some function . Then is at most where the limit is taken over all appearing in the collection of inequalities as . We say is obtained by an asymptotic method.
We say that the asymptotic method is catalytic if in any inequality we have for some constant .
Remark 3.11.
This class of methods works because each reduction gives an upper bound where . As the function is continuous [LR83], we get the required bound on in the limit.
Remark 3.12.
The usual descriptions of the laser method applied to rectangular matrix multiplication result in an asymptotic method because the construction involves an approximation of a certain probability distribution by a rational probability distribution. As a result of this approximation, the matrix multiplication tensor constructed may have format slightly smaller than $\langle n, n, n^k \rangle$.
Theorem 3.13.
Any upper bound obtained by an asymptotic method satisfies
and for catalytic methods,
Proof.
Suppose . Then is an upper bound on . Then, as in Theorem 3.9, we have
Because , both fractions are greater than and for it is true that
As , we have and, if the method is catalytic, then . The upper bound given by the method is the limit . Taking , we get the required inequalities. ∎
3.4 Mixed method
Coppersmith [Cop97] uses a combination of Coppersmith–Winograd tensors of different formats to get an upper bound on the rectangular matrix multiplication exponent. More specifically, he considers a sequence of tensors . Our analysis applies to tensor sequences of this kind because their asymptotic behaviour is similar to sequences of the form in the sense of the following two lemmas.
Lemma 3.14.
Let and be some tensors. Given functions such that for some positive real numbers , define a sequence of tensors . Then for each the sequence is bounded from above.
Proof.
We have
The right-hand side converges to and, therefore, is bounded. ∎
Lemma 3.15.
Let and be some tensors. Given functions such that for some positive real numbers , define a sequence of tensors . Then the sequence converges.
Proof.
For this, we need Strassen’s spectral characterization of the asymptotic rank [Str88]. Strassen defines the asymptotic spectrum of tensors as the set of all monotone, multiplicative, additive maps from tensors to positive reals such that . Then can be made into a compact Hausdorff topological space such that the evaluation map is continuous for all , and
For we have
Because of compactness of this convergence is uniform in . Therefore
Definition 3.16.
A sequence of tensors is called almost exponential if the sequences converges and is bounded for each . Abusing the notation, we write and .
Definition 3.17 (Asymptotic mixed method).
Let be an almost exponential sequence of tensors with . Suppose we are given a collection of inequalities where the values of are unbounded and for some . Then is at most where the limit is taken over all appearing in the collection of inequalities as . We say that is obtained by an asymptotic mixed method.
We say that the asymptotic mixed method is catalytic if in each inequality we have for some constant .
Lemma 3.18.
Asymptotic mixed methods give true upper bounds on $\omega(k)$.
Proof.
Note that for a fixed tensor there are only a finite number of restrictions possible as the left tensor is of format , which should be no greater than the format of . Thus, because in an asymptotic mixed method the set of values of is unbounded, so is the set of values of .
For one restriction we have the inequality , that is, . Since and is a continuous function and , we get in the limit the required inequality. ∎