New ways to multiply 3 × 3-matrices

05/24/2019 ∙ by Marijn J. H. Heule, et al. ∙ Johannes Kepler University Linz ∙ The University of Texas at Austin

It has been known since the 1970s that no more than 23 multiplications are required for computing the product of two 3 × 3-matrices. It is not known whether this can also be done with fewer multiplications. However, there are several mutually inequivalent ways of doing the job with 23 multiplications. In this article, we extend this list considerably by providing more than 13 000 new and mutually inequivalent schemes for multiplying 3 × 3-matrices using 23 multiplications. Moreover, we show that the set of all these schemes is a manifold of dimension at least 17.


1 Introduction

The classical algorithm for multiplying two $n\times n$-matrices performs $n^3-n^2$ additions and $n^3$ multiplications. Strassen's algorithm (Strassen, 1969) does the job with only $O(n^{\log_2 7})\approx O(n^{2.81})$ additions and multiplications, by recursively applying a certain scheme for computing the product of two $2\times2$-matrices with only 7 instead of the usual 8 multiplications. The discovery of Strassen's algorithm has initiated substantial work during the past 50 years on finding the smallest exponent $\omega$ such that matrix multiplication costs $O(n^\omega)$ operations in the coefficient ring. The current record is $\omega<2.3728639$ and was obtained by Le Gall (2014). It improves the previous record $\omega<2.3728642$ of Williams (2012) by just $3\cdot10^{-7}$. Extensive background in this direction is available in textbooks (Bürgisser et al., 2013; Landsberg, 2017) and survey articles (Bläser, 2013; Pan, 2018). Contrary to widespread belief, Strassen's algorithm is not only efficient in theory but also in practice. Special-purpose software for exact linear algebra, such as the FFLAS and FFPACK packages (Dumas et al., 2008), has long been using it, and there are also reports that its performance in a numerical context is not as bad as its reputation (Huang et al., 2016).
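To make the recursion concrete, the following is a minimal sketch in Python (our illustration, not code from the article) that multiplies matrices of power-of-two size with Strassen's seven products; a production implementation would switch to the classical algorithm below some cutoff size instead of recursing down to $1\times1$ blocks.

```python
import numpy as np

def strassen(A, B):
    """Multiply two n-by-n matrices (n a power of two) with Strassen's recursion."""
    n = A.shape[0]
    if n == 1:
        return A * B
    h = n // 2
    a11, a12, a21, a22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    b11, b12, b21, b22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    # the seven recursive products
    m1 = strassen(a11 + a22, b11 + b22)
    m2 = strassen(a21 + a22, b11)
    m3 = strassen(a11, b12 - b22)
    m4 = strassen(a22, b21 - b11)
    m5 = strassen(a11 + a12, b22)
    m6 = strassen(a21 - a11, b11 + b12)
    m7 = strassen(a12 - a22, b21 + b22)
    # reassemble the four blocks of the product
    return np.block([[m1 + m4 - m5 + m7, m3 + m5],
                     [m2 + m4,           m1 - m2 + m3 + m6]])
```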

Besides the quest for the smallest exponent, which only concerns the asymptotic complexity for asymptotically large $n$, it is also interesting to know how many multiplications are needed for a specific (small) $n$ to compute the product of two $n\times n$-matrices. Thanks to Strassen, we know that the answer is at most 7 for $n=2$, and it can be shown (Winograd, 1971) that there is no way to do it with 6 multiplications. It can further be shown that, in a certain sense, Strassen's scheme is the only way of doing it with 7 multiplications (de Groote, 1978).

Already for $n=3$, the situation is not completely understood. Laderman (1976) showed that 23 multiplications suffice, and Bläser (2003) showed that at least 19 multiplications are needed. For larger sizes as well as rectangular matrices, many people have been searching for new schemes using fewer and fewer coefficient multiplications. For $n=4$, the best we know is to apply Strassen's scheme recursively, which requires 49 multiplications. For $n=5$, the record of 100 multiplications was held by Makarov (1987) for 30 years until it was improved to 99 by Sedoglavic (2017b). For $n=6$, there is a recent scheme by Smirnov (2013) which needs only 160 multiplications. For $n=7$, Sedoglavic (2017c) found a way to compute the product with 250 multiplications. For larger sizes and rectangular matrices, see the extensive tables compiled by Smirnov (2013, 2017) and Sedoglavic (2019). Many of the schemes for larger matrix sizes are obtained by combining multiplication schemes for smaller matrices (Drevet et al., 2011).

Although nobody knows whether there is a scheme using only 22 multiplications for $n=3$ (in an exact and non-commutative setting), 23 multiplications can be achieved in many different ways. Johnson and McLoughlin (1986) have in fact found infinitely many ways. They presented a family of schemes involving three free parameters. However, their families involve fractional coefficients and therefore do not apply to arbitrary coefficient rings. Many others have reported isolated schemes with fractional or approximate coefficients. Such schemes can be constructed for example by numerically solving a certain optimization problem, or by genetic algorithms. In Laderman's multiplication scheme, all coefficients are $0$, $1$, or $-1$, which has the nice feature that it works for any coefficient ring. As far as we know, there are so far only three other schemes with this additional property; they are due to Smirnov (2013), Oh et al. (2013), and Courtois et al. (2011), respectively. We add more than 13 000 new schemes to this list.

The isolated scheme presented by Courtois et al. was not found numerically but with the help of a SAT solver. SAT (Biere et al., 2009) refers to the decision problem of propositional logic: given a Boolean formula in conjunctive normal form, is there an assignment of the Boolean variables such that the formula evaluates to true under this assignment? Although SAT is a prototypical example of an NP-complete problem, modern SAT solvers are able to solve very large instances. In addition to various industrial applications, they have recently also contributed to the solution of difficult mathematical problems, see Heule et al. (2016) and Heule (2018) for two examples. SAT solvers also play a central role in our approach. As explained in Section 3, we first use a SAT solver to find multiplication schemes for the coefficient ring $\mathbb{Z}_2$, starting from some known solutions. In a second step, explained in Section 4, we discard solutions that are equivalent to solutions found earlier. Next, we simplify the new solutions (Sect. 5), and use them as starting points for a new round of searching. Altogether about 35 years of computation time were spent in several iterations of this process. In the end, we lifted the solutions from $\mathbb{Z}_2$ to arbitrary coefficient rings (Sect. 6), and we extracted families with up to 17 free parameters from them (Sect. 7). Our more than 13 000 isolated schemes and our parameterized families are provided in various formats on our website (Heule et al., 2019b).

2 The Brent Equations

The general pattern of a matrix multiplication scheme consists of two sections. In the first section, several auxiliary quantities $m_1,\dots,m_r$ are computed, each of which is a product of a certain linear combination of the entries of the first matrix with a certain linear combination of the entries of the second matrix. In the second section, the entries of the resulting matrix are obtained as certain linear combinations of the auxiliary quantities $m_1,\dots,m_r$.

For example, writing

$$\begin{pmatrix}a_{1,1}&a_{1,2}\\a_{2,1}&a_{2,2}\end{pmatrix}\begin{pmatrix}b_{1,1}&b_{1,2}\\b_{2,1}&b_{2,2}\end{pmatrix}=\begin{pmatrix}c_{1,1}&c_{1,2}\\c_{2,1}&c_{2,2}\end{pmatrix},$$

Strassen's multiplication scheme proceeds as follows:

First section.

$$\begin{aligned}
m_1&=(a_{1,1}+a_{2,2})(b_{1,1}+b_{2,2}),\\
m_2&=(a_{2,1}+a_{2,2})\,b_{1,1},\\
m_3&=a_{1,1}(b_{1,2}-b_{2,2}),\\
m_4&=a_{2,2}(b_{2,1}-b_{1,1}),\\
m_5&=(a_{1,1}+a_{1,2})\,b_{2,2},\\
m_6&=(a_{2,1}-a_{1,1})(b_{1,1}+b_{1,2}),\\
m_7&=(a_{1,2}-a_{2,2})(b_{2,1}+b_{2,2}).
\end{aligned}$$

Second section.

$$\begin{aligned}
c_{1,1}&=m_1+m_4-m_5+m_7,\\
c_{1,2}&=m_3+m_5,\\
c_{2,1}&=m_2+m_4,\\
c_{2,2}&=m_1-m_2+m_3+m_6.
\end{aligned}$$

Observe that the number of multiplications is exactly the number of $m_i$'s. Also observe that while it is not obvious how to construct such a scheme from scratch, checking that a given scheme is correct is an easy and straightforward calculation. For example, $c_{1,2}=m_3+m_5=a_{1,1}(b_{1,2}-b_{2,2})+(a_{1,1}+a_{1,2})b_{2,2}=a_{1,1}b_{1,2}+a_{1,2}b_{2,2}$.
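Because the check is purely mechanical, it is easy to automate. Here is a small sketch in Python with SymPy (our illustration; the article itself contains no code) that verifies Strassen's scheme symbolically:

```python
import sympy as sp

a = sp.Matrix(2, 2, lambda i, j: sp.Symbol(f"a{i+1}{j+1}"))
b = sp.Matrix(2, 2, lambda i, j: sp.Symbol(f"b{i+1}{j+1}"))

# First section: the seven products.
m1 = (a[0, 0] + a[1, 1]) * (b[0, 0] + b[1, 1])
m2 = (a[1, 0] + a[1, 1]) * b[0, 0]
m3 = a[0, 0] * (b[0, 1] - b[1, 1])
m4 = a[1, 1] * (b[1, 0] - b[0, 0])
m5 = (a[0, 0] + a[0, 1]) * b[1, 1]
m6 = (a[1, 0] - a[0, 0]) * (b[0, 0] + b[0, 1])
m7 = (a[0, 1] - a[1, 1]) * (b[1, 0] + b[1, 1])

# Second section: the entries of the product matrix.
c = sp.Matrix([[m1 + m4 - m5 + m7, m3 + m5],
               [m2 + m4,           m1 - m2 + m3 + m6]])

# Expanding c - a*b entrywise must give the zero matrix.
assert (c - a * b).expand() == sp.zeros(2, 2)
```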

In order to search for a multiplication scheme for a prescribed shape of matrices (e.g., $3\times3$) and a prescribed number of multiplications (e.g., 23), we can make an ansatz for the coefficients of the various linear combinations,

$$m_\iota=\Bigl(\sum_{i_1,i_2}\alpha^{(\iota)}_{i_1,i_2}a_{i_1,i_2}\Bigr)\Bigl(\sum_{j_1,j_2}\beta^{(\iota)}_{j_1,j_2}b_{j_1,j_2}\Bigr),\qquad c_{k_1,k_2}=\sum_{\iota}\gamma^{(\iota)}_{k_1,k_2}m_\iota,$$

and then compare coefficients of the products $a_{i_1,i_2}b_{j_1,j_2}$ to enforce $c_{k_1,k_2}=\sum_l a_{k_1,l}b_{l,k_2}$. Doing so leads to a system of polynomial equations for the undetermined coefficients $\alpha^{(\iota)}_{i_1,i_2}$, $\beta^{(\iota)}_{j_1,j_2}$, $\gamma^{(\iota)}_{k_1,k_2}$. The equations in this system are known as the Brent equations (Brent, 1970). For $3\times3$-matrices and 23 multiplications, the equations turn out to be

$$\sum_{\iota=1}^{23}\alpha^{(\iota)}_{i_1,i_2}\beta^{(\iota)}_{j_1,j_2}\gamma^{(\iota)}_{k_1,k_2}=\delta_{i_2,j_1}\delta_{i_1,k_1}\delta_{j_2,k_2}$$

for $i_1,i_2,j_1,j_2,k_1,k_2\in\{1,2,3\}$, i.e., there are $3\cdot 9\cdot 23=621$ variables and $3^6=729$ cubic equations. The $\delta$'s on the right refer to the Kronecker delta, i.e., $\delta_{u,v}=1$ if $u=v$ and $\delta_{u,v}=0$ otherwise.

The equations become a bit more symmetric if we connect the matrices through $AB=C^T$ rather than $AB=C$. In the version with the transposition, which we shall use from now on, and which is also more common in the literature, the right hand side has to be replaced with $\delta_{i_2,j_1}\delta_{j_2,k_1}\delta_{k_2,i_1}$.
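To spell the system out, the following sketch (Python, illustrative only; the coefficient arrays are assumed to be indexed as described above) evaluates all 729 transposed Brent equations for a candidate scheme:

```python
import itertools

def satisfies_brent(alpha, beta, gamma):
    """Check the transposed Brent equations for coefficient arrays of shape
    (r, 3, 3), e.g. NumPy integer arrays: for every choice of the six
    indices, the sum over the r summands must equal the product of
    Kronecker deltas on the right hand side."""
    r = alpha.shape[0]
    for i1, i2, j1, j2, k1, k2 in itertools.product(range(3), repeat=6):
        lhs = sum(alpha[t, i1, i2] * beta[t, j1, j2] * gamma[t, k1, k2]
                  for t in range(r))
        rhs = int(i2 == j1 and j2 == k1 and k2 == i1)
        if lhs != rhs:
            return False
    return True
```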

In any case, the problem boils down to finding a solution of the Brent equations. In principle, this system could be solved using Gröbner bases (Buchberger, 1965; Cox et al., 1992; Buchberger and Kauers, 2010), but doing so would require an absurd amount of computation time. Some of the solutions reported in the literature have been found using numerical solvers (Smirnov, 2013; Oh et al., 2013), and Laderman (1976) claims that his solution was found by solving the Brent equations by hand. He writes that he would explain in a later paper how exactly he did this, but apparently this later paper has never been written. Only recently, Sedoglavic (2017a) has given a convincing explanation of how Laderman's scheme can be derived from Strassen's scheme for $2\times2$ matrices. Courtois et al. (2011) found their solution using a SAT solver. We also start our search using SAT solvers.

3 SAT Encoding and Streamlining

In order to encode the problem as a SAT problem, we view the Brent equations as equations over the finite field $\mathbb{Z}_2$, interpret its elements as truth values, its addition as exclusive or ($\oplus$), and its multiplication as conjunction ($\land$). These propositional formulas cannot be directly processed by most state-of-the-art SAT solvers, because they require the formulas in conjunctive normal form (CNF). A formula is in CNF if it is a conjunction of clauses, where a clause is a disjunction ($\lor$) of literals and a literal is a Boolean variable or the negation ($\neg$) of a Boolean variable. For avoiding an exponential blow-up when transforming an arbitrarily structured formula to CNF, auxiliary variables are introduced that abbreviate certain subformulas. For every $\iota$ and all $i_1,i_2,j_1,j_2$, we introduce a fresh variable $s^{(\iota)}_{i_1,i_2,j_1,j_2}$ and impose the condition

$$s^{(\iota)}_{i_1,i_2,j_1,j_2}\leftrightarrow\bigl(\alpha^{(\iota)}_{i_1,i_2}\land\beta^{(\iota)}_{j_1,j_2}\bigr),$$

whose translation to CNF requires three clauses. Similarly, for every $\iota$ and all $i_1,i_2,j_1,j_2,k_1,k_2$, we introduce a fresh variable $t^{(\iota)}_{i_1,i_2,j_1,j_2,k_1,k_2}$ and impose the condition

$$t^{(\iota)}_{i_1,i_2,j_1,j_2,k_1,k_2}\leftrightarrow\bigl(s^{(\iota)}_{i_1,i_2,j_1,j_2}\land\gamma^{(\iota)}_{k_1,k_2}\bigr),$$

whose translation to CNF costs again three clauses.

For each fixed choice $(i_1,i_2,j_1,j_2,k_1,k_2)$, there is a Brent equation which says that the number of $\iota$'s for which $t^{(\iota)}_{i_1,i_2,j_1,j_2,k_1,k_2}$ is set to true should be even (if $\delta_{i_2,j_1}\delta_{j_2,k_1}\delta_{k_2,i_1}=0$) or that it should be odd (if $\delta_{i_2,j_1}\delta_{j_2,k_1}\delta_{k_2,i_1}=1$). It therefore remains to encode the condition that an even number (or an odd number) of a given set of variables should be true, i.e., we need to construct a formula $\mathrm{even}(x_1,\dots,x_n)$ which is true if and only if an even number among the variables $x_1,\dots,x_n$ is true. Such a formula can again be constructed using auxiliary variables. Note that $\mathrm{even}(x_1,\dots,x_n)$ is true if and only if $\mathrm{even}(x_1,\dots,x_k,y)\land\mathrm{even}(y,x_{k+1},\dots,x_n)$ is true for an auxiliary variable $y$, because this is the case if and only if both $\{x_1,\dots,x_k\}$ and $\{x_{k+1},\dots,x_n\}$ contain an even number of variables set to true (and then $y$ is set to false) or both sets contain an odd number of variables set to true (and then $y$ is set to true). Applying this principle recursively for $n=23$ (the number of summands in each Brent equation), the problem can be broken down to chunks of size four:

$$\mathrm{even}(x_1,x_2,x_3,y_1)\land\mathrm{even}(y_1,x_4,x_5,y_2)\land\cdots\land\mathrm{even}(y_{10},x_{22},x_{23}).$$

The small chunks can be encoded directly by observing that $\mathrm{even}(x_1,x_2,x_3,x_4)$ is equivalent to the conjunction of the eight clauses $l_1\lor l_2\lor l_3\lor l_4$ in which each $l_i$ is either $x_i$ or $\neg x_i$ and an odd number of the four literals are positive. For the cases where an odd number of the variables must be true, we can apply the encoding described above to $\neg x_1,x_2,\dots,x_n$.
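The chunked parity encoding is simple to generate. Below is a minimal sketch in Python (our illustration; names such as even_clauses and fresh are our assumptions, not the authors' implementation) that produces DIMACS-style clauses:

```python
import itertools

def even_clauses(xs, fresh):
    """CNF clauses forcing an even number of the literals in xs to be true.

    xs:    list of DIMACS literals (nonzero ints, negative sign = negation)
    fresh: zero-argument function returning a new, unused variable number
    Long sums are split into chunks of size four, linked by auxiliary
    variables, mirroring the recursive construction of 'even' in the text."""
    if len(xs) <= 4:
        clauses = []
        for signs in itertools.product([True, False], repeat=len(xs)):
            if sum(signs) % 2 == 1:                 # forbid odd-parity points
                clauses.append([-l if s else l for l, s in zip(xs, signs)])
        return clauses
    y = fresh()
    # even(x1,x2,x3,y) and even(y,x4,...,xn)  <=>  even(x1,...,xn)
    return even_clauses(xs[:3] + [y], fresh) + even_clauses([y] + xs[3:], fresh)
```

The odd case is obtained by flipping a single literal, e.g. calling even_clauses([-xs[0]] + xs[1:], fresh).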

The SAT problems obtained in this way are very hard. To make them more tractable, we added further constraints that simplify the search performed by the solver. This approach is known as streamlining (Gomes and Sellmann, 2004). The following restrictions turned out to be successful:

  • Instead of a faithful encoding of the sums in the Brent equations using the even predicate as described above, we also used a more restrictive sufficient condition which, instead of requiring an even number of arguments to be true, enforces that zero or two arguments should be true. This predicate zero-or-two can be broken into at-most-two and not-exactly-one. The at-most-two part can be efficiently encoded, for instance, as

$$\begin{aligned}
\mathrm{at\text{-}most\text{-}two}(x_1,\dots,x_n)={}&(\neg x_1\lor\neg x_2\lor\neg x_3)\land(\neg x_1\lor\neg x_2\lor\neg x_4)\\
&\land(\neg x_1\lor\neg x_3\lor\neg x_4)\land(\neg x_2\lor\neg x_3\lor\neg x_4)\\
&\land\textstyle\bigwedge_{i=1}^{4}(\neg x_i\lor y\lor z)\land\bigwedge_{1\le i<j\le4}\bigl((\neg x_i\lor\neg x_j\lor y)\land(\neg x_i\lor\neg x_j\lor z)\bigr)\\
&\land\mathrm{at\text{-}most\text{-}two}(y,z,x_5,\dots,x_n),
\end{aligned}$$

    where $y$ and $z$ are fresh variables. The first two lines of at-most-two assert that at most two variables of $x_1,\dots,x_4$ are true. If two or more of those variables are true then the new variables $y$ and $z$ both have to be true, if one variable is true, then either $y$ or $z$ has to be true, and if all four variables are false, then $y$ and $z$ can both be false. Encoding this information in $y$ and $z$ allows to recursively apply at-most-two with two arguments less. A straightforward direct encoding as in the first two lines is used when $n\le4$. (A code sketch of this recursion follows after this list.)

  • We selected a certain portion, say 50%, of the variables $\alpha^{(\iota)}_{i_1,i_2}$, $\beta^{(\iota)}_{j_1,j_2}$, and $\gamma^{(\iota)}_{k_1,k_2}$ and instantiated them with the values they have in one of the known solutions. The SAT solver then has to solve for the remaining variables. It turns out that in many cases, it does not just rediscover the known solution but finds a truly different one that only happens to have an overlap with the original solution.

  • Another approach was to randomly set half of the terms $\alpha^{(\iota)}_{i_1,i_2}\beta^{(\iota)}_{j_1,j_2}\gamma^{(\iota)}_{k_1,k_2}$ with $i_2\neq j_1$ and $j_2\neq k_1$ and $k_2\neq i_1$ to zero. This strategy was motivated by the observation that in most of the known solutions, almost all these terms are zero.

  • A third approach concerns the terms $\alpha^{(\iota)}_{i_1,i_2}\beta^{(\iota)}_{j_1,j_2}\gamma^{(\iota)}_{k_1,k_2}$ with $i_2=j_1$ and $j_2=k_1$ and $k_2=i_1$. Again motivated by the inspection of known solutions, we specified that for each $\iota$ either one or two such terms should be one. More precisely, we randomly chose a distribution of the 27 terms with $i_2=j_1$ and $j_2=k_1$ and $k_2=i_1$ to the 23 summands of the scheme, with the condition that 19 summands should contain one term each and the remaining four summands should contain two terms each.
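The recursive at-most-two encoding mentioned in the first item can be sketched as follows (Python; a reconstruction consistent with the description above, so the exact clause set may differ from the authors' encoding):

```python
import itertools

def at_most_two(xs, fresh):
    """CNF clauses forcing at most two of the variables in xs to be true.

    A direct encoding is used for up to four variables; otherwise two fresh
    variables y, z absorb how many of the first four variables are true,
    and the recursion continues with two arguments less."""
    if len(xs) <= 4:
        # direct encoding: among any three variables, one must be false
        return [[-a, -b, -c] for a, b, c in itertools.combinations(xs, 3)]
    head, tail = xs[:4], xs[4:]
    y, z = fresh(), fresh()
    clauses = [[-a, -b, -c] for a, b, c in itertools.combinations(head, 3)]
    clauses += [[-a, y, z] for a in head]           # one true => y or z
    for a, b in itertools.combinations(head, 2):    # two true => y and z
        clauses += [[-a, -b, y], [-a, -b, z]]
    return clauses + at_most_two([y, z] + tail, fresh)
```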

Each of the latter three approaches was used in combination with both the 'even' and the 'zero-or-two' encoding of the Brent equations. The resulting instances were presented to the SAT solver yalsat by Biere (2018). When it did not find a solution for an instance within a few minutes, the instance was discarded and a new instance with another random choice was tried. A detailed analysis of the effect of our optimizations on the performance of the solver is provided in a separate paper (Heule et al., 2019a).

4 Recognizing Equivalences

From any given solution of the Brent equations we can generate many equivalent solutions. For example, exchanging $\alpha$ with $\beta$ and flipping all indices maps a solution to another solution. This operation corresponds to the fact that $(AB)^T=B^TA^T$. It is also clear from the equations that replacing $\alpha$ by $\beta$, $\beta$ by $\gamma$, and $\gamma$ by $\alpha$ maps a solution to another solution, although this operation is less obvious in terms of matrix multiplication. Finally, for any fixed invertible matrix $M$, we can exploit the fact $AB=(AM)(M^{-1}B)$ to map solutions to other solutions.

The operations just described form a group of symmetries of matrix multiplication which was introduced by de Groote (1978), who used them for showing that Strassen's scheme for $2\times2$ matrices is essentially unique: it is unique modulo the action of this symmetry group. To describe the group more formally, it is convenient to express matrix multiplication schemes as tensors,

$$\sum_{\iota=1}^{23}A_\iota\otimes B_\iota\otimes C_\iota\in K^{3\times3}\otimes K^{3\times3}\otimes K^{3\times3},$$

where $A_\iota=(\alpha^{(\iota)}_{i,j})$, $B_\iota=(\beta^{(\iota)}_{i,j})$, and $C_\iota=(\gamma^{(\iota)}_{i,j})$. A scheme is correct if and only if it is equal, as an element of $K^{3\times3}\otimes K^{3\times3}\otimes K^{3\times3}$, to $\sum_{i,j,k}E_{i,j}\otimes E_{j,k}\otimes E_{k,i}$, where $E_{i,j}$ refers to the matrix which has a $1$ at position $(i,j)$ and zeros everywhere else.

A permutation $\pi\in S_3$ acts on a tensor $A\otimes B\otimes C$ by permuting the three factors, and transposing each of them if $\pi$ is odd. For example, $(1\,2)\cdot(A\otimes B\otimes C)=B^T\otimes A^T\otimes C^T$ and $(1\,2\,3)\cdot(A\otimes B\otimes C)=C\otimes A\otimes B$. A triple $(U,V,W)$ of invertible matrices acts via

$$(U,V,W)\cdot(A\otimes B\otimes C)=(UAV^{-1})\otimes(VBW^{-1})\otimes(WCU^{-1}).$$

A tuple $(\pi,U,V,W)$ acts on a tensor by first letting the permutation $\pi$ act as described above, and then applying the matrices as described above. The set $G=S_3\times\mathrm{GL}(3,K)^3$ is turned into a group by defining the multiplication in such a way that the operation described above becomes a group action. The action of the group defined on rank-one tensors is extended to the whole space $K^{3\times3}\otimes K^{3\times3}\otimes K^{3\times3}$ by linearity. In other words, elements of $G$ act on sums of tensors by acting independently on all summands.

Two matrix multiplication schemes are called equivalent if they belong to the same orbit under the action of $G$. Whenever a new matrix multiplication scheme is discovered, the question is whether it is equivalent to a known scheme, for if it is, it should not be considered as new. A common test for checking that two schemes are not equivalent proceeds by computing certain invariants of the group action. For example, since permutation and multiplication by invertible matrices do not change the rank of a matrix, we can count how many matrices of rank 1, 2, and 3 appear in the scheme. If the counts differ for two schemes, then these schemes cannot be equivalent. For example, Courtois et al. (2011) and Oh et al. (2013) proved in this way that their schemes were indeed new. Writing a scheme in the form $\sum_{\iota=1}^{23}A_\iota\otimes B_\iota\otimes C_\iota$, we can encode this invariant as the polynomial $\sum_{\iota=1}^{23}\bigl(x^{\operatorname{rk}A_\iota}+x^{\operatorname{rk}B_\iota}+x^{\operatorname{rk}C_\iota}\bigr)$. Similarly, also the polynomials

$$\sum_{\iota=1}^{23}x^{\operatorname{rk}A_\iota}y^{\operatorname{rk}B_\iota}z^{\operatorname{rk}C_\iota}\qquad\text{and}\qquad\sum_{\iota=1}^{23}x^{\operatorname{rk}A_\iota+\operatorname{rk}B_\iota+\operatorname{rk}C_\iota}$$

are invariants, the first one up to a permutation of the variables $x$, $y$, $z$, because changing the order of summation does not affect the relative order of the factors in the tensor, and applying a permutation changes the relative order of the factors in every summand in the same way.
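As an illustration of how such a rank-based invariant can be computed for a concrete scheme, here is a small sketch in Python (our illustration; the function name and the matrix representation are assumptions, not the authors' code):

```python
import numpy as np
from collections import Counter

def rank_invariant(scheme):
    """Multiset of rank triples (rk A, rk B, rk C), each triple sorted.

    scheme: list of triples (A, B, C) of integer matrices. Sorting each
    triple makes the result stable under the permutation part of the
    symmetry group; sandwiching with invertible matrices does not change
    ranks either. For schemes over Z_2, ranks would have to be computed
    by Gaussian elimination modulo 2 instead of numpy's numerical rank."""
    inv = Counter()
    for A, B, C in scheme:
        inv[tuple(sorted(np.linalg.matrix_rank(M) for M in (A, B, C)))] += 1
    return inv
```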

When we have two schemes for which all three invariants match, they may nevertheless be inequivalent. For checking whether a solution found by the SAT solver is really new, comparing invariants is useful as a first step, but it is not sufficient. In fact, many solutions found by the SAT solver were inequivalent although all three invariants stated above agreed. Fortunately, it is not too hard to decide the equivalence of two given schemes by constructing, whenever possible, a group element that maps one to the other. We can proceed as follows.

Suppose we are given two multiplication schemes and we want to decide whether there exists a tuple $(\pi,U,V,W)$ such that $(\pi,U,V,W)\cdot S=S'$. As far as the permutation $\pi$ is concerned, there are only six candidates, so we can simply try each of them. Writing $S=\sum_{\iota=1}^{23}A_\iota\otimes B_\iota\otimes C_\iota$ and $S'=\sum_{\iota=1}^{23}A'_\iota\otimes B'_\iota\otimes C'_\iota$, it remains to find $(U,V,W)$ that map all the summands of $S$ to the summands of $S'$, albeit possibly in a different order. We search for a suitable order by the following recursive algorithm, which is initially called with $E$ being the full space $(K^{3\times3})^3$.

Input: $S$, $S'$ as above, a basis of a subspace $E$ of $(K^{3\times3})^3$
Output: A triple $(U,V,W)\in E$ with $(U,V,W)\cdot S=S'$, or $\bot$ if no such triple exists.

1 if $S$ and $S'$ are empty, then:

2 return any element $(U,V,W)$ of $E$ with $\det(U)\det(V)\det(W)\neq0$, or $\bot$ if no such element exists.

3 for all summands $A'_k\otimes B'_k\otimes C'_k$ of $S'$, do:

4 if $\operatorname{rk}A'_k=\operatorname{rk}A_1$ and $\operatorname{rk}B'_k=\operatorname{rk}B_1$ and $\operatorname{rk}C'_k=\operatorname{rk}C_1$, then:

5 compute a basis of the space $E'$ of all $(U,V,W)$ such that $UA_1=A'_kV$, $VB_1=B'_kW$, $WC_1=C'_kU$, by making an ansatz, comparing coefficients, and solving a homogeneous linear system.

6 compute a basis of $E\cap E'$.

7 if $E\cap E'$ contains at least one triple $(U,V,W)$ with $\det(U)\det(V)\det(W)\neq0$, then:

8 call the algorithm recursively with the first summand of $S$ and the $k$th summand of $S'$ removed, and with $E\cap E'$ in place of $E$.

9 if the recursive call yields a triple $(U,V,W)$, return it.

10 return $\bot$.

The algorithm terminates because each recursive call is applied to a sum with strictly fewer summands. The correctness of the algorithm is clear because it essentially performs an exhaustive search through all options. In order to perform the check in step 7, we can consider a generic linear combination of the basis elements of $E\cap E'$, with variables as coefficients. Then $\det(U)\det(V)\det(W)$ is a polynomial in these variables, and the question is whether this polynomial vanishes identically on $K$. Since we are interested in the case $K=\mathbb{Z}_2$, we can answer this by an exhaustive search.
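Over $\mathbb{Z}_2$, the exhaustive check is trivial to implement; a minimal sketch (Python, our illustration) follows, where the polynomial is supplied as a callable:

```python
import itertools

def vanishes_on_all_of_F2(p, nvars):
    """Decide whether a polynomial function vanishes on all of F_2^nvars.

    p: a callable taking nvars arguments in {0, 1} and returning an int.
    This is the check needed in step 7: a suitable triple (U, V, W)
    exists in the subspace iff this function returns False."""
    return all(p(*point) % 2 == 0
               for point in itertools.product((0, 1), repeat=nvars))
```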

The recursive structure of the algorithm with up to 23 recursive calls at every level may seem prohibitively expensive. However, the two filters in lines 4 and 7 turn out to cut down the number of recursive calls considerably. A straightforward implementation in Mathematica needs no more than about one second of computation time to decide whether or not two given schemes are equivalent. Of course, we first compare the invariants, which is almost for free and suffices to settle many cases.

For each scheme found by the SAT solver we have checked whether it is equivalent (over $\mathbb{Z}_2$) to one of the schemes found earlier, or to one of the four known schemes found by Laderman, Smirnov, Oh et al., and Courtois et al., respectively. From the solutions found by the SAT solver that were distinct modulo the order of the summands, we isolated more than 13 000 schemes that were distinct modulo equivalence. In the appendix, we list the number of schemes we found separated by invariant.

5 Simplifying Solutions

We can use the symmetries introduced in the previous section not only to recognize that a seemingly new scheme is not really new. We can also use them for simplifying schemes. A scheme can for example be regarded as simpler than another scheme if it has fewer nonzero coefficients. Calling the number of nonzero coefficients the weight of a scheme, we prefer schemes with smaller weight.

Ideally, we would like to replace every scheme by an equivalent scheme with the smallest possible weight. In principle, we could find such a minimal equivalent element by applying all elements of $G$ to a given scheme $S$ and taking the smallest result. Unfortunately, even for $K=\mathbb{Z}_2$, the group $G$ has $6\cdot|\mathrm{GL}(3,\mathbb{Z}_2)|^3=6\cdot168^3=28\,449\,792$ elements, so trying them all might be feasible if we had to do it for a few schemes, but not for thousands of them. If we do not insist on the smallest possible weight, we can take a pragmatic approach and just spend for every scheme a prescribed amount of computation time (say, half an hour) applying random elements of $G$ to $S$:

Input: a multiplication scheme $S$
Output: an equivalent multiplication scheme whose weight is less than or equal to the weight of $S$.

1 while the time limit is not exhausted, do:

2 pick a group element $g\in G$ at random

3 if $\operatorname{weight}(g\cdot S)<\operatorname{weight}(S)$, then set $S=g\cdot S$

4 return $S$

With this algorithm, we were able to replace about 20% of the new schemes found by the SAT solver by equivalent schemes with smaller weight. It is not too surprising that no improvement was found for the majority of cases, because the way we specified the problem to the SAT solver already induces a bias towards solutions with a small weight.
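A runnable version of this randomized descent is a few lines of Python; the helpers weight, random_group_element, and apply_element are assumptions standing in for the scheme representation and group action described above:

```python
import time

def reduce_weight(scheme, weight, random_group_element, apply_element,
                  time_limit=1800):
    """Randomized descent on the weight of a scheme (sketch).

    weight counts the nonzero coefficients of a scheme,
    random_group_element samples an element of the symmetry group, and
    apply_element applies a group element to a scheme."""
    deadline = time.time() + time_limit
    best, best_weight = scheme, weight(scheme)
    while time.time() < deadline:
        candidate = apply_element(random_group_element(), best)
        if weight(candidate) < best_weight:
            best, best_weight = candidate, weight(candidate)
    return best
```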

The figure below shows the distribution of our more than 13 000 schemes according to weight, after simplification. It is clear that the weight is always odd, hence the small gaps between the bars. It is less clear why we seem to have an overlay of three normal distributions, but we believe that this is rather an artifact of the way we generated the solutions than a structural feature of the entire solution set.

[Figure: histogram of the schemes by weight after simplification. Horizontal axis: weight (approximately 160 to 300); vertical axis: count (0 to 600).]

6 Generalizing the Coefficient Ring

At this point, we have a considerable number of new matrix multiplication schemes for the coefficient field $\mathbb{Z}_2$. The next step is to lift them to schemes that work in any coefficient ring. The SAT solver presents us with a solution for $\mathbb{Z}_2$ in which all coefficients are $0$ or $1$, and in order to lift such a solution, we make the hypothesis that this solution originated from a solution for an arbitrary coefficient ring in which all coefficients are $0$, $1$, or $-1$. The distinction between $1$ and $-1$ gets lost in $\mathbb{Z}_2$, and the task consists in recovering it. There is a priori no reason why such a lifting should exist, and indeed, we have seen a small number of instances where it fails. One such example is given in the appendix. Interestingly however, these examples seem to be very rare. In almost all cases, a lifting turned out to exist.

In order to explain the lifting process, we return to the Brent equations discussed in Section 2. We set variables corresponding to coefficients that are zero in the SAT solution to zero, which simplifies the system considerably. According to the axioms of tensor products, we have $A\otimes B\otimes C=(uA)\otimes(vB)\otimes((uv)^{-1}C)$ for all matrices $A,B,C$ and all units $u,v$. We may therefore select in every summand one variable appearing in the first factor and one variable appearing in the second factor and set them to $1$. This reduces the number of variables further. However, the resulting system is still too hard to be solved directly.

Before calling a general-purpose Gröbner basis engine, we apply some simplifications to the system, which take into account that we are only interested in solutions whose coordinates are $1$ or $-1$. In particular, we can replace any exponent appearing in any of the polynomials by its residue modulo 2, we can cancel factors that clearly do not vanish on the points of interest, and we can replace polynomials of the form $xy-1$ by $x-y$ and $xy+1$ by $x+y$. These simplifications may bring up some linear polynomials. By triangularizing the linear system corresponding to these polynomials, we can eliminate some of the variables. We can then simplify again, and possibly obtain new linear equations. The process is repeated until no further linear equations appear. We then add for each variable $x$ the polynomial $x^2-1$ and compute a Gröbner basis with respect to a degree order. If this leads to new linear polynomials, we return to iterating triangularization, elimination, and simplification until no further linear equations show up, and then compute again a degree Gröbner basis. The whole process is repeated until we obtain a Gröbner basis that does not contain any new linear equations. If there are more than 15 variables left, we next compute a minimal associated prime ideal of an elimination ideal involving only five variables, and check whether adding it to the original system and computing a Gröbner basis leads to new linear equations. If it does, we start over with the whole procedure. Otherwise, we compute the minimal associated prime ideal of the whole system and return the solution corresponding to one of the prime factors. The process is summarized in the following listing.

Input: A finite subset $S$ of $\mathbb{Z}[x_1,\dots,x_n]$
Output: A common root of all the elements of $S$ with all coordinates in $\{1,-1\}$, or $\bot$ if no such common root exists.

1 Replace every exponent appearing in an element of $S$ by its residue modulo 2

2 For every $p\in S$ and every variable $x$ with $x\mid p$, replace $p$ by $p/x$

3 Replace every element of $S$ of the form $xy-1$ or $xy+1$ by $x-y$ or $x+y$, respectively.

4 if $S$ now contains linear polynomials, then:

5 Use them to eliminate some variables, say $x_{i_1},\dots,x_{i_k}$

6 Call the procedure recursively on the resulting set of polynomials

7 if there is a solution, extend it to the eliminated variables and return the result

8 if there is no solution, return $\bot$.

9 Compute a Gröbner basis $G$ of $S\cup\{x_1^2-1,\dots,x_n^2-1\}$ with respect to a degree order

10 if $G=\{1\}$, return $\bot$

11 if $G$ contains linear polynomials, then call this procedure recursively on $G$ and return the result

12 if more than 15 variables are left, then:

13 Compute a basis $P$ of one of the minimal associated prime ideals of an elimination ideal of $\langle G\rangle$ involving only five variables.

14 Compute a Gröbner basis $G'$ of $G\cup P$ with respect to a degree order

15 if $G'$ contains linear polynomials, then call this procedure recursively on $G'$ and return the result

16 Compute a basis of one of the minimal associated prime ideals of $\langle G\rangle$.

17 Return the common solution of its elements.
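The first three steps of this listing are easy to mechanize. A small sketch in Python with SymPy follows (illustrative only; the authors' actual implementation uses Mathematica and Singular):

```python
import itertools
import sympy as sp

def presimplify(polys):
    """Steps 1-3 of the listing, valid on points with coordinates in {-1, 1}:
    reduce exponents modulo 2 (since x**2 == 1 on such points), cancel
    variable factors (they cannot vanish), and rewrite x*y - 1 -> x - y,
    x*y + 1 -> x + y."""
    out = []
    for p in polys:
        # replace every power x**e by x**(e % 2)
        p = sp.expand(p.replace(
            lambda e: e.is_Pow and e.base.is_Symbol and e.exp.is_Integer,
            lambda e: e.base ** (e.exp % 2)))
        # a variable x divides p iff p vanishes at x = 0; such a factor
        # never vanishes on {-1, 1} and can be cancelled
        for x in list(p.free_symbols):
            while p != 0 and sp.expand(p.subs(x, 0)) == 0:
                p = sp.cancel(p / x)
        # rewrite products of two variables shifted by plus or minus one
        for x, y in itertools.combinations(sorted(p.free_symbols, key=str), 2):
            if sp.expand(p - (x * y - 1)) == 0:
                p = x - y
            elif sp.expand(p - (x * y + 1)) == 0:
                p = x + y
        out.append(p)
    return out
```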

An implementation of this procedure in Mathematica is available on the website of this article (Heule et al., 2019b). In this implementation, we use Singular (Greuel and Pfister, 2002) for the Gröbner basis calculations and for the computation of minimal associated prime ideals. Despite the large number of variables, Singular handles the required computations with impressive speed, so that the whole lifting process takes only about 20 seconds per solution on average. Only a small number of cases, which happen to have a few more variables than the others, need much longer, up to a few hours.

7 Introducing Parameters

The idea of instantiating some of the variables based on a known scheme and then solving for the remaining variables not only applies to SAT solving. It also has an algebraic counterpart. Solving the Brent equations with algebraic methods is infeasible because the equations are nonlinear, but observe that we only have to solve a linear system if we start from a known scheme and only replace all $\gamma^{(\iota)}_{k_1,k_2}$ by fresh variables. Solving linear systems is of course much easier than solving nonlinear ones.

More generally, we can select for each $\iota$ separately whether we want to replace all $\alpha^{(\iota)}_{i_1,i_2}$'s or all $\beta^{(\iota)}_{j_1,j_2}$'s or all $\gamma^{(\iota)}_{k_1,k_2}$'s by fresh variables, and we still just get a linear system for these variables. Once we make a selection, solving the resulting linear system yields an affine vector space. One might expect that this affine space will typically consist of a single point only, but this is usually not the case.

A solution space with positive dimension can be translated into a multiplication scheme involving one or more free parameters. Starting from the resulting parameterized scheme, we can play the same game with another selection of variables, which may allow us to introduce further parameters. If we repeat the procedure several times with random selections of which variables are known, we obtain huge schemes involving 40 or more parameters. These parameters are however algebraically dependent, or at least it is too costly to check whether they are dependent or not. We got better results by proceeding more systematically, as summarized in the following listing.

Input: A matrix multiplication scheme $S$. Write $S=\sum_{\iota=1}^{23}A_\iota\otimes B_\iota\otimes C_\iota$ with $A_\iota=(\alpha^{(\iota)}_{i,j})$, $B_\iota=(\beta^{(\iota)}_{i,j})$, $C_\iota=(\gamma^{(\iota)}_{i,j})$.
Output: A family of matrix multiplication schemes with parameters $p_1,\dots,p_m$

1 for $\iota=1,\dots,23$, do:

2 for every choice $X,Y\in\{A,B,C\}$ with $X\neq Y$, do:

3 replace all entries of $X_\iota$ in $S$ by fresh variables

4 replace all entries of $Y_\kappa$ for $\kappa=1,\dots,23$ and $\kappa\neq\iota$ in $S$ by fresh variables

5 equate the resulting scheme to $\sum_{i,j,k}E_{i,j}\otimes E_{j,k}\otimes E_{k,i}$ and compare coefficients

6 solve the resulting inhomogeneous linear system for the fresh variables introduced in steps 3 and 4

7 substitute the generic solution, using new parameters $p_{m+1},p_{m+2},\dots$, into $S$

8 return $S$
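For the simplest selection mentioned at the beginning of this section, namely replacing all $\gamma$'s of a known scheme, the linear step can be sketched as follows (Python with SymPy; function name and data representation are our assumptions):

```python
import itertools
import sympy as sp

def refresh_gammas(alpha, beta, r=23, n=3):
    """Keep the alpha's and beta's of a known scheme and replace every gamma
    by an unknown; the Brent equations then form an inhomogeneous linear
    system, whose generic solution carries the free parameters."""
    g = [[[sp.Symbol(f"g{t}_{i}{j}") for j in range(n)] for i in range(n)]
         for t in range(r)]
    eqs = []
    for i1, i2, j1, j2, k1, k2 in itertools.product(range(n), repeat=6):
        lhs = sum(alpha[t][i1][i2] * beta[t][j1][j2] * g[t][k1][k2]
                  for t in range(r))
        rhs = int(i2 == j1 and j2 == k1 and k2 == i1)
        eqs.append(sp.Eq(lhs, rhs))
    unknowns = [g[t][i][j] for t in range(r)
                for i in range(n) for j in range(n)]
    return sp.linsolve(eqs, unknowns)
```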

With this algorithm and some slightly modified variants (e.g., letting the outer loop run backwards or transposing the inner and the outer loop), we were able to obtain schemes with altogether up to 17 parameters. Although all new parameters introduced in a certain iteration can only appear linearly in the scheme, old parameters that were considered as belonging to the ground ring during the linear solving can later appear rationally. However, by manually applying suitable changes of variables, we managed to remove all denominators from all the families we inspected. Not even integer denominators are needed. We can also check using Gröbner bases whether the parameters are independent, and for several families with 17 parameters they turn out to be. In the language of algebraic geometry, this means that the solution set of the Brent equations has dimension at least 17 as an algebraic variety.

One of our families is shown in the appendix, and some further ones are provided electronically on our website. These families should be contrasted with the family found by Johnson and McLoughlin in the 1980s (Johnson and McLoughlin, 1986). In particular, while they lament that their family contains fractional coefficients and therefore does not apply in every coefficient ring, our families only involve integer coefficients and therefore have no such restriction. Moreover, their family has only three parameters, and with the method described above, only a limited number of additional parameters can be introduced into it. The numbers of parameters we managed to introduce into the known solutions by Laderman, Courtois et al., Oh et al., and Smirnov are also small in comparison.

8 Concluding Remarks

Although we have found many new multiplication schemes with 23 multiplications, we did not encounter a single scheme with 22 multiplications. We have checked for all schemes whether some of their summands can be merged together using tensor product arithmetic. For doing so, it would suffice if a certain scheme contained some summands which share the same $A_\iota$'s, say, and where the corresponding $B_\iota$'s, say, of these summands are linearly dependent. We could then express one of these $B_\iota$'s in terms of the others and eliminate the summand in which it appears. For example, if $B_3=B_1+B_2$, then we have $A\otimes B_1\otimes C_1+A\otimes B_2\otimes C_2+A\otimes B_3\otimes C_3=A\otimes B_1\otimes(C_1+C_3)+A\otimes B_2\otimes(C_2+C_3)$. Since none of our schemes admits a simplification of this kind, it remains open whether a scheme with 22 multiplications exists.
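A search for this merging criterion can be sketched in a few lines (Python; our illustration, with the same assumed scheme representation as before):

```python
import itertools
import numpy as np

def find_mergeable_triple(scheme):
    """Look for three summands with identical first factors whose second
    factors are linearly dependent, i.e. the merging criterion from the
    text.  scheme: list of (A, B, C) numpy integer matrices.  Returns the
    indices of such a triple, or None.  (Over Z_2 the rank would have to
    be computed modulo 2 rather than numerically.)"""
    for i, j, k in itertools.combinations(range(len(scheme)), 3):
        (A1, B1, _), (A2, B2, _), (A3, B3, _) = scheme[i], scheme[j], scheme[k]
        if np.array_equal(A1, A2) and np.array_equal(A1, A3):
            stacked = np.stack([B1.ravel(), B2.ravel(), B3.ravel()])
            if np.linalg.matrix_rank(stacked) < 3:
                return (i, j, k)
    return None
```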

Another open question is: how many further schemes with 23 multiplications and coefficients in $\{0,1,-1\}$ are there? We have no evidence that we have found them all. In fact, we rather believe that there are many further ones, possibly including schemes that are very different from ours. There may also be parametrized families with more than 17 parameters, and it would be interesting to know the maximal possible number of parameters, i.e., the actual dimension of the solution set of the Brent equations.

References

Appendix

 List of all invariants appearing in our set of non-equivalent schemes. These tables are based on the schemes over $\mathbb{Z}_2$ and include the few schemes that could not be lifted to $\mathbb{Z}$. The numbers on the right indicate how many schemes with the respective invariant we have found.

[Tables: for each invariant, the number of inequivalent schemes found; the counts range from 1 to 3476.]