Multiplication is one of the most fundamental computational problems, and the simple “long multiplication” -time algorithm for multiplying two -digit numbers is taught to elementary school pupils around the world. Despite its centrality, the true complexity of multiplication remains elusive. In 1960, Kolmogorov conjectured that the thousands of years old -time algorithm is optimal and he arranged a seminar at Moscow State University with the goal of proving this conjecture. However, only a week into the seminar, the student Karatsuba came up with an time algorithm [KO62]. The algorithm was presented at the next seminar meeting and the seminar was terminated. This sparked a sequence of improved algorithms such as the Toom-Cook algorithm [Too63, Coo66] and the Schönhage-Strassen algorithm [SS71]. The Schönhage-Strassen algorithm, as well as the current fastest algorithm by Fürer [Fü09], are both based on the Fast Fourier Transform (FFT). Fürer’s algorithm can be shown to run in time when multiplying two -bit numbers [HvdH18]. It can even be implemented as a constant degree Boolean circuit of the same size. Here is the very slowly growing iterated logarithm.
But what is the true complexity of multiplying two -bit numbers? Can it be done via e.g. a Boolean circuit of size like addition? Or is multiplication strictly harder? Our main contribution is to show a connection between multiplication and a central conjecture by Li and Li [LL04] in the area of network coding. Our results show that if the conjecture by Li and Li [LL04] is true, then any constant degree Boolean circuit for computing the product of two -bit numbers must have size . This establishes a conditional lower bound for multiplication that comes within a factor of Fürer’s upper bound and implies that multiplication is strictly harder than addition.
Before diving into the details of our results, we first give a brief introduction to network coding.
Network coding studies communication problems in graphs. Given a graph with capacity constraints on the edges and data streams, each with a designated source-sink pair of nodes in , what is the maximum rate at which data can be transmitted concurrently between the source-sink pairs? One solution is to just forward the data, which reduces the problem to a multicommodity flow problem. The central question in network coding is whether one can achieve a higher rate by using coding/bit tricks. This question is known to have a positive answer in directed graphs, where the rate increase may be as high as a factor (by sending XORs of carefully chosen input bits), see e.g. [AHJ06]. However, the question remains wide open for undirected graphs, where there are no known examples for which network coding can do better than the multicommodity flow rate. A central conjecture in network coding, due to Li and Li [LL04], says that coding yields no advantage in undirected graphs.
Conjecture 1 (Undirected -pairs Conjecture [LL04]).
The coding rate is equal to the Multicommodity-Flow rate in undirected graphs.
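To make the directed-graph phenomenon mentioned above concrete, the following minimal sketch simulates the standard textbook "butterfly" network with two source-sink pairs. This example is not taken from the text: the topology, the function name `butterfly_unicast` and the variable names are illustrative assumptions. Each source has a direct edge to the other pair's sink, and both streams must cross a single unit-capacity middle edge. Forwarding alone would force the two bits to share that edge, whereas sending their XOR lets both sinks decode in one use of the edge.

```python
from itertools import product
from typing import Tuple


def butterfly_unicast(a: int, b: int) -> Tuple[int, int]:
    """Simulate one round on a directed butterfly network (assumed topology).

    Source s1 holds bit a (wanted by sink t1); source s2 holds bit b
    (wanted by sink t2). Every edge carries one bit per round.
    """
    middle = a ^ b       # the shared bottleneck edge carries the XOR
    side_to_t1 = b       # direct edge s2 -> t1
    side_to_t2 = a       # direct edge s1 -> t2
    decoded_at_t1 = middle ^ side_to_t1  # (a ^ b) ^ b == a
    decoded_at_t2 = middle ^ side_to_t2  # (a ^ b) ^ a == b
    return decoded_at_t1, decoded_at_t2


# Both sinks recover their intended bit for every input pair.
all_ok = all(butterfly_unicast(a, b) == (a, b)
             for a, b in product((0, 1), repeat=2))
```

The conjecture states that no comparable gain exists once every edge may be used in both directions.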
Despite the centrality of this conjecture, it has heretofore resisted all attempts at either proving or refuting it. Conjecture 1 has been used twice before for proving lower bounds for computational problems. Adler et al. [AHJ06] were the first to initiate this line of study. They presented conditional lower bounds for computing the transpose of a matrix via an oblivious algorithm. Here oblivious means that the memory access pattern is fixed and independent of the input. Since a circuit is oblivious, they also obtain circuit lower bounds for matrix transpose. Very recently Farhadi et al. [FHLS19] showed how to remove the obliviousness assumption for external memory problems. Their main result was a tight lower bound for external memory integer sorting, conditioned on Conjecture 1 being true.
1.1 Our Results
Our main result is an exciting new connection between network coding and the complexity of multiplication. Formally, we prove the following theorem:
Theorem 1.
Assuming Conjecture 1, every boolean circuit with arbitrary gates and bounded in and out degrees that computes the product of two numbers given as two -bit strings has size .
In fact, we prove our lower bound for an even simpler problem than multiplication, namely the shift problem: In the shift problem, we are given an -bit string and an index . The goal is to construct a circuit that outputs the -bit string whose th bit equals the th bit of for every . Here we think of the index as being given in binary using bits. We prove the following result:
Theorem 2.
Assuming Conjecture 1, every boolean circuit with arbitrary gates and bounded in and out degrees that computes the shift problem has size .
Theorem 1 follows as a corollary of Theorem 2 by observing that shifting by positions is equivalent to multiplication by . Moreover, it is not hard to see that there is a linear sized circuit that has input gates and output gates, where on an index , it outputs the number in binary (i.e. a single -bit at position ).
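The reduction from shifting to multiplication can be illustrated with a short Python sketch. Python integers stand in for bit strings here, and the names `one_hot`, `shift_by_multiplication`, `bit`, `x`, `j` and `n` are illustrative assumptions rather than notation from the text.

```python
def one_hot(j: int, n: int) -> int:
    """Return the number with a single 1-bit at position j, i.e. 2**j."""
    if not 0 <= j < n:
        raise ValueError("shift index out of range")
    return 1 << j


def shift_by_multiplication(x: int, j: int, n: int) -> int:
    """Shift the n-bit number x by j positions using one multiplication."""
    if not 0 <= x < (1 << n):
        raise ValueError("x does not fit in n bits")
    return x * one_hot(j, n)


def bit(value: int, position: int) -> int:
    """Return the bit of value at the given position (0 = least significant)."""
    return (value >> position) & 1


x, j, n = 0b1011, 2, 4
shifted = shift_by_multiplication(x, j, n)
# Bit i + j of the result equals bit i of x, and the result fits in 2n bits.
positions_match = all(bit(shifted, i + j) == bit(x, i) for i in range(n))
```

A multiplication circuit therefore yields a shift circuit once the small decoder producing `one_hot(j, n)` is placed in front of it.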
We find it quite fascinating that even a simple instruction such as shifting requires circuits of size , at least if we believe Conjecture 1.
Valiant’s Depth Reduction and Circuit Complexity Lower Bounds.
In addition to our main lower bound results for multiplication, we also demonstrate that the network coding conjecture sheds new light on some fundamental conjectures by Valiant. In a 1977 survey Valiant [Val77] outlined potentially plausible attacks on the problem of proving a lower bound for the size of any circuit that can compute a permutation or even shifts of a given input. The goal was to prove that achieving both size and depth for such circuits is impossible. While most of his attacks were rebuffed due to the existence of complex and highly connected graphs that only had edges (superconcentrators), Valiant outlined one last potential approach that could still be fruitful. His main brilliant idea was to start with a circuit of some depth and, by applying graph theoretical approaches, reduce the depth of the circuit while eliminating only a small number of edges. The hope was that information theoretical approaches could finish the job once the depth of the circuit was very low and once the (graph theoretical) complexity of the circuit was peeled away.
More formally, Valiant showed that for every circuit with input and output gates, of size , depth and fan-in , and for every , the function computed by can be computed by a boolean circuit with arbitrary gates of depth with input and output gates and extra nodes. Moreover, the number of input gates directly connected to an output gate is bounded. That is, if we denote the set of input and output gates by and respectively, then for every , there are at most wires connecting and .
In turn, this reduction shows that it is enough to prove lower bounds on such depth circuits. Almost 20 years later and based on these ideas, Valiant [Val92] put forward several conjectures that, if resolved, could open the way for proving circuit complexity lower bounds. Loosely speaking, Valiant conjectured that if then such depth circuits cannot compute cyclic-shift permutations. Before discussing Valiant’s conjectures more formally, we first state our second main result, which essentially shows that Conjecture 1 implies one of Valiant’s conjectures, albeit with a smaller (but still constant) bound on .
Theorem 3.
Let be a depth circuit that computes multiplication such that the following holds.
The number of gates in the second layer of is at most for ; and
for every output gate of , the number of input gates directly connected to is at most .
Then assuming Conjecture 1, .
Let be a bipartite graph on two independent sets and such that denotes a set of inputs and denotes a set of outputs. Furthermore, let be extra nodes and connect them by edges to all the nodes in . Denoting the resulting graph by , consider all possible boolean circuits with arbitrary gates whose underlying topology is . We say such a circuit computes a permutation if for every assignment to the input gates, after the evaluation of the circuit is assigned for every . Valiant conjectured that this should be impossible if is too small or if has too few edges. In particular, he proposed the following.
Conjecture 2.
If has maximum degree at most 3 and if , then there exists a permutation such that no circuit that has as its underlying topology can compute the permutation . Moreover, there exists such that is a cyclic shift.
Theorem 3 shows that conditioned on Conjecture 1, if then Valiant’s first conjecture holds. We note that our proof for Theorem 3 continues to hold even if the gates’ boolean functions are fixed after the shift offset is given. That is, if only the topology is fixed in advance. This coincides exactly with the formulation of Valiant’s conjecture. Valiant further conjectured the following.
Conjecture 3.
If has at most edges for some constant , and if , then there exists a permutation such that no circuit that has as its underlying topology can compute the permutation . Moreover, there exists such that is a cyclic shift.
1.2 Related Work
Lower Bounds for Multiplication.
There are a number of previous lower bounds for multiplication in various restricted models of computation. Clifford and Jalsenius [CJ11] considered a streaming variant of multiplication, where one number is fixed and the other is revealed one digit at a time. They require that a digit of the output is reported before the next digit of the input is revealed. In this streaming setting, they prove an lower bound, where is the number of bits in a digit and is the word size. For and , this is . Ponzio [Pon98] considered multiplication via read-once branching programs, i.e. programs that have bounded working memory and may only read each input bit exactly once. He proved that any read-once branching program for computing the middle bit of the product of two -bit numbers must use bits of working memory. Finally, we also mention the work of Morgenstern [Mor73], who proved lower bounds for computing the related FFT. Morgenstern proved an lower bound for computing the unnormalized FFT via an arithmetic circuit when all constants used in the circuit are bounded. Unfortunately, this does not say anything about the complexity of multiplying two -bit numbers.
Despite their importance, Valiant’s conjectures are still mostly open. One interesting development, by Riis [Rii07], shows that Conjecture 3 as stated is incorrect. Riis proved that all cyclic shifts are realizable for where is the total number of edges of . Riis further conjectured that replacing the bound on by a slightly stricter bound should result in a correct conjecture. Specifically, Riis suggests bounding .
We now give a formal definition of Boolean circuits with arbitrary gates, followed by definitions of the -pairs communication problem and the multicommodity flow problem. In the two latter problems we reuse some of the definitions used by Farhadi et al. [FHLS19], which have been simplified a bit compared to the more general definition by Adler et al. [AHJ06]. In particular, we have forced communication networks to be directed acyclic graphs. This is sufficient to prove our lower bounds and simplifies the definitions considerably.
Boolean Circuits with Arbitrary Gates.
A Boolean Circuit with Arbitrary Gates with source or input nodes and target or output nodes is a directed acyclic graph with nodes of in-degree , which are called input gates, and are labeled with input variables and nodes of out-degree , which are called output gates and are labeled with output variables . All other nodes are simply called gates. For every gate of in-degree , is labeled with an arbitrary function . The circuit is also equipped with a topological ordering of , in which for and for all . The depth of a circuit is the length of the longest path in . An evaluation of a circuit on an bit input is conducted as follows. For every , assign to . For every , assign to the value , where are the nodes of with edges going into in the order induced by the topological ordering. The output of on an bit input , denoted is the value assigned to in the evaluation. We say a circuit computes a function if for every , .
For every and , we hardwire for in by removing and all adjacent edges from , and replacing for in the evaluation of for every such that is an edge in .
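The evaluation rule and the hardwiring operation described above can be sketched as follows. This is a minimal illustration under assumed conventions, not the paper's formalism: a circuit is stored as a dictionary mapping each non-input node to its ordered predecessor list and an arbitrary Python function, and the names `evaluate` and `hardwire` are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A gate is (ordered predecessors, arbitrary function of their values).
Gate = Tuple[List[str], Callable[..., int]]


def evaluate(order: Sequence[str], gates: Dict[str, Gate],
             inputs: Sequence[str], outputs: Sequence[str],
             x: Sequence[int]) -> List[int]:
    """Evaluate the circuit on input x, visiting nodes in topological order."""
    value = dict(zip(inputs, x))      # input gates take the input bits
    for v in order:
        if v in value:                # input gate, already assigned
            continue
        preds, f = gates[v]
        value[v] = f(*(value[u] for u in preds))
    return [value[v] for v in outputs]


def hardwire(order: Sequence[str], gates: Dict[str, Gate],
             w: str, b: int) -> Tuple[List[str], Dict[str, Gate]]:
    """Remove the non-input gate w and substitute the constant b for it."""
    new_gates: Dict[str, Gate] = {}
    for v, (preds, f) in gates.items():
        if v == w:
            continue                  # drop w together with its in-edges
        if w in preds:
            kept = [u for u in preds if u != w]

            def g(*args: int, preds=preds, f=f) -> int:
                remaining = iter(args)
                # Feed b wherever w used to be, the real values elsewhere.
                return f(*(b if u == w else next(remaining) for u in preds))

            new_gates[v] = (kept, g)
        else:
            new_gates[v] = (preds, f)
    return [v for v in order if v != w], new_gates
```

For example, with a middle gate `m` computing the AND of two inputs and an output gate `y` computing the XOR of `m` and the first input, hardwiring `m` to 1 leaves a circuit in which `y` depends on the first input only.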
-Pairs Communication Problem.
The input to the -pairs communication problem is a directed acyclic graph where each edge has a capacity . There are sources and sinks .
Each source receives a message from a predefined set of messages . It will be convenient to think of this message as arriving on an in-edge. Hence we add an extra node for each source, which has a single out-edge to . The edge has infinite capacity.
A network coding solution specifies for each edge an alphabet representing the set of possible messages that can be sent along the edge. For a node , define as the set of in-edges at . A network coding solution also specifies, for each edge , a function which determines the message to be sent along the edge as a function of all incoming messages at node . Finally, a network coding solution specifies for each sink a decoding function . The network coding solution is correct if, for all inputs , it holds that applied to the incoming messages at equals , i.e. each sink must receive the intended message.
In an execution of a network coding solution, each of the extra nodes starts by transmitting the message to along the edge . Then, whenever a node has received a message along all incoming edges , it evaluates on all out-edges and forwards the message along the edge .
We define the rate of a network coding solution as follows: Let each source receive a uniform random and independently chosen message from . For each edge , let denote the random variable giving the message sent on the edge when executing the network coding solution with the given inputs. The network coding solution achieves rate if:
for all .
For each edge , we have .
Here denotes binary Shannon entropy. The intuition is that the rate is , if the solution can handle sending a message of entropy bits between every source-sink pair.
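As a small illustration of the entropy of an edge message, the sketch below enumerates all uniformly random inputs and computes the Shannon entropy of what a given edge carries. The helper names `entropy` and `edge_entropy`, and the modelling of an edge as a Python function of the full input, are assumptions made for this example only.

```python
from collections import Counter
from itertools import product
from math import log2
from typing import Callable, Hashable, Iterable, Tuple


def entropy(samples: Iterable[Hashable]) -> float:
    """Binary Shannon entropy of the empirical distribution of samples."""
    counts = Counter(samples)
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())


def edge_entropy(edge_message: Callable[[Tuple[int, ...]], Hashable],
                 num_input_bits: int) -> float:
    """Entropy of an edge message when the input bits are uniform."""
    all_inputs = product((0, 1), repeat=num_input_bits)
    return entropy(edge_message(x) for x in all_inputs)


# An edge carrying the XOR of two uniform bits has one full bit of entropy,
# while an edge carrying their AND has strictly less.
h_xor = edge_entropy(lambda x: x[0] ^ x[1], 2)
h_and = edge_entropy(lambda x: x[0] & x[1], 2)
```

An edge whose message is a biased bit, such as the AND above, therefore consumes less than one unit of capacity in the entropy accounting.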
A multicommodity flow problem in an undirected graph is specified by a set of source-sink pairs of nodes in . We say that is the source of commodity and is the sink of commodity . Each edge has an associated capacity . A (fractional) solution to the multicommodity flow problem specifies for each pair of nodes and commodity , a flow . Intuitively specifies how much of commodity is to be sent from to . The flow satisfies flow conservation, meaning that:
For all nodes that are not a source or sink, we have .
For all sources , we have .
For all sinks we have .
The flow also satisfies that for any pair of nodes and commodity , there is only flow in one direction, i.e. either or . Furthermore, if is not an edge in , then . A solution to the multicommodity flow problem achieves a rate of if:
For all edges , we have .
Intuitively, the rate is if we can handle a demand of for every commodity.
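The following sketch checks whether a given fractional flow is a valid multicommodity flow at a given rate. Because the formulas are not reproduced here, the checker encodes assumptions that should be read as such: every commodity has unit demand, a source emits net flow equal to the rate, a sink absorbs the same amount, and the capacity of an undirected edge bounds the total flow over all commodities in both directions. The name `is_valid_flow` and the data layout are hypothetical.

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple

Flow = Dict[Tuple[int, str, str], float]   # (commodity, u, v) -> amount


def is_valid_flow(nodes: Sequence[str],
                  capacity: Dict[FrozenSet[str], float],
                  pairs: List[Tuple[str, str]],
                  flow: Flow, rate: float, tol: float = 1e-9) -> bool:
    """Check conservation and capacity constraints (assumed unit demands)."""
    def f(i: int, u: str, v: str) -> float:
        return flow.get((i, u, v), 0.0)

    # Flow must be non-negative and may only use existing edges.
    for (i, u, v), amount in flow.items():
        if amount < -tol:
            return False
        if amount > tol and frozenset((u, v)) not in capacity:
            return False

    # Conservation: net outflow is +rate at s_i, -rate at t_i, 0 elsewhere.
    for i, (s, t) in enumerate(pairs):
        for u in nodes:
            net = sum(f(i, u, v) - f(i, v, u) for v in nodes)
            expected = rate if u == s else -rate if u == t else 0.0
            if abs(net - expected) > tol:
                return False

    # Capacity: both directions and all commodities share each edge.
    for edge, c in capacity.items():
        u, v = tuple(edge)
        used = sum(f(i, u, v) + f(i, v, u) for i in range(len(pairs)))
        if used > c + tol:
            return False
    return True
```

On a path a, b, c with unit capacities and the two commodities (a, c) and (c, a), sending half a unit of each commodity along the path passes the check at rate 1/2, while sending a full unit of each fails the capacity constraint.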
The Undirected -Pairs Conjecture.
Conjecture 1 implies the following for our setting: Given an input to the -pairs communication problem, specified by a directed acyclic graph with edge capacities and a set of source-sink pairs, let be the best achievable network coding rate for . Similarly, let denote the undirected graph resulting from making each directed edge in undirected (and keeping the capacities and source-sink pairs). Let be the best achievable flow rate in . Conjecture 1 implies that .
Having defined coding rate and flow rate formally, we also mention that a result of Braverman et al. [BGS17] implies that if there exists a graph where the network coding rate , and the flow rate in the corresponding undirected graph , satisfies for a constant , then there exists an infinite family of graphs for which the corresponding gap is at least for a constant . So far, all evidence suggests that no such gap exists, as formalized in Conjecture 1.
3 Key Tools and Techniques
The main idea at the heart of both proofs is the simple fact that in a graph with vertices and maximum degree at most , most node pairs lie far away from one another. Specifically, for every node in , at least nodes have distance from . While this key observation is almost enough to prove Theorem 2, the proof of Theorem 3 requires a much more subtle approach, as there is no bound on the maximum degree in the circuits in question. The only bound we have is on the number of wires going directly from input gates into output gates. Specifically, every two nodes in the underlying undirected graph are at distance (see Figure 1).
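The counting fact behind this observation can be checked with a breadth-first search. In the sketch below, `ball_size` counts the nodes within a given distance of a source, and `ball_bound` is the simple geometric-sum upper bound for a graph of maximum degree `d`. Both names, the adjacency-list representation and the exact form of the bound are assumptions made for illustration; the paper's precise constants are not reproduced here.

```python
from collections import deque
from typing import Dict, Hashable, List


def ball_size(adj: Dict[Hashable, List[Hashable]],
              source: Hashable, radius: int) -> int:
    """Number of nodes at distance at most radius from source (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == radius:
            continue                      # do not expand beyond the radius
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return len(dist)


def ball_bound(d: int, radius: int) -> int:
    """Upper bound 1 + d + d**2 + ... + d**radius on the ball size."""
    return sum(d ** i for i in range(radius + 1))


# A cycle on 16 nodes has maximum degree 2.
cycle = {i: [(i - 1) % 16, (i + 1) % 16] for i in range(16)}
near = ball_size(cycle, 0, 2)             # nodes 14, 15, 0, 1, 2
far = len(cycle) - near                   # everything else is farther away
```

Since the ball grows at most geometrically in the radius, a radius logarithmic in the number of nodes captures only a small fraction of them, which is why most pairs are far apart.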
In order to overcome this obstacle, we present a construction of a communication network based on the circuit that essentially eliminates the middle layer in the depth- circuit , thus leaving a bipartite graph with bounded maximum degree. To this end, we observe that since the size of the middle layer is bounded by , there exists a large set of inputs in such that on all inputs from , the gates attain the same values. By hardwiring these values to the circuit, we can evaluate the circuit for all inputs in on a depth- circuit obtained from by removing . We next turn to construct the communication network. Employing ideas recently presented by Farhadi et al. [FHLS19], we "wrap" the depth- circuit by adding source and target nodes. In order to cope with inputs that do not belong to , we add a designated supervisor node (see Figure 2). Loosely speaking, the source nodes transmit their input to , and sends back the information needed to "edit" the input string and construct an input string , which is then transferred to the circuit as a black box.
The Correction Game.
The edge capacities of the network must be large enough that the supervisor node can transmit enough information to achieve a high communication rate, yet small enough that not too much flow can go through the supervisor when considering as a multicommodity flow instance. To this end, Farhadi et al. [FHLS19] defined a game between a set of players and a supervisor, where given a fixed set and a random string given as a concatenation of strings of length each, the goal is to "correct" and produce a string such that . The caveat is that the only communication allowed is between the players and the supervisor. That is, no communication, and thus no cooperation, is allowed between the players. Formally, the game is defined as follows.
Let . The -correction game with players is defined as follows. The game is played by ordinary players and one designated supervisor player . The supervisor receives strings chosen independently at random. For every , then sends a message . Given , the player produces a string such that .
Farhadi et al. additionally present a protocol for the -correction game in which the supervisor player sends prefix-free messages to the players, and moreover, they give a bound on the amount of communication needed as a function of the number of players and the size of .
Lemma 4 ([FHLS19]).
If , then there exists a protocol for the -correction game with players such that the messages are prefix-free and
4 A Lower Bound for Boolean Circuits Computing Multiplication
In this section we show that conditioned on Conjecture 1, every bounded degree circuit computing multiplication must have size at least , thus proving Theorems 1 and 2. In fact, we will prove something slightly stronger. Define the shift function as follows. For every and , where if and otherwise. We will show that every circuit with bounded in and out degrees that computes the shift function on -bit numbers has size . Clearly, a circuit that can compute the product of two -bit numbers can also compute the shift function. Let denote the maximum in and out degree in , and let . Then in the undirected graph induced by , there are at most nodes whose distance from is at most . Therefore among , at least are at distance at least . In other words, , where denotes the undirected graph induced by (by removing edge directions). Therefore there exists a shift such that .
Fixing , consider the following communication problem. For each , and . The circuit equipped with -uniform edge capacities is a network coding solution to this problem with rate . By the undirected -pairs conjecture, there is a multicommodity flow in that transfers one unit of flow from each source to its corresponding sink. For every , let be the flow associated with commodity . Then
5 A Lower Bound for Depth Boolean Circuits Computing Multiplication
Let be a depth circuit that computes multiplication such that the number of gates in the second layer of is at most for some small and for every , , where once again denotes the undirected graph induced by , and is the subgraph of induced by . By slightly increasing and (by a small constant factor) and without loss of generality, we can assume that this applies for all as well.
Denote the input and output gates of by and respectively, and denote the set of the middle-layer gates by (see Figure 1).
As before, we focus on computing the shift function, thus limiting the input to to have exactly one -entry. We next partition into consecutive blocks of size bits each. For every let be the set of indices belonging to the th block.
For every and , we say is far from all targets (with respect to ) if all sources in the block are at distance at least from their respective destinations in . That is for every , .
Let . By the constraint on the degrees, for every , there are at most nodes whose distance from is at most in . Therefore for every ,
By averaging we get that for large enough there is some such that there are at least blocks which are far from all targets. Without loss of generality, we may assume for ease of notation that . By hardwiring for into the circuit , the circuit now simply transfers to .
Reduction to Network Coding.
Let and . By slightly abusing notation, we denote the value of the gate when evaluating the circuit by . By averaging, there exist a string and a set such that and such that for every and , . By hardwiring for into the circuit , we get a new circuit denoted that contains only the input and output gates of , and transfers to for every . Moreover, the set of edges between and in is equal to the set of edges between and in .
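The averaging step above is a pigeonhole argument: a middle layer with few gates can take only few distinct value vectors, so many inputs must share one of them. The sketch below groups a set of inputs by the values of the middle gates and returns the largest group. The function name `largest_agreeing_set` and the toy gates are hypothetical and serve only to illustrate the counting.

```python
from collections import defaultdict
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple

Bits = Tuple[int, ...]


def largest_agreeing_set(middle_gates: Sequence[Callable[[Bits], int]],
                         inputs: Sequence[Bits]) -> Tuple[Bits, List[Bits]]:
    """Return a value vector of the middle gates and all inputs attaining it,
    chosen so that the set of inputs is as large as possible."""
    classes: Dict[Bits, List[Bits]] = defaultdict(list)
    for x in inputs:
        key = tuple(g(x) for g in middle_gates)
        classes[key].append(x)
    return max(classes.items(), key=lambda item: len(item[1]))


# Toy middle layer with two gates over 4-bit inputs.
toy_gates = [lambda x: x[0] & x[1], lambda x: x[2] ^ x[3]]
all_inputs = list(product((0, 1), repeat=4))
fixed_values, agreeing = largest_agreeing_set(toy_gates, all_inputs)
# Pigeonhole: with w gates, the largest class has at least 2**(-w) of the inputs.
guaranteed = len(all_inputs) // 2 ** len(toy_gates)
```

Hardwiring `fixed_values` into the circuit is then harmless for every input in `agreeing`, which is exactly what allows the middle layer to be removed.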
Next, we construct a communication network by adding some nodes and edges to , as demonstrated also in Figure 2. We add a new set of nodes . For every , add edges and of capacity and edges and of capacity , where is the message sent to player by the supervisor player in the -correction game protocol for players guaranteed in Lemma 4. In addition, for every and every add edges and of capacity . All edges of are assigned capacity of .
In what follows, we will lower bound the communication rate of the newly constructed network .
Lemma 5.
There exists a network coding solution on that achieves rate .
To this end, let be independent uniform random variables. We next give a protocol by which the sources transmit to the targets . The protocol employs as an intermediate step the correction game protocol guaranteed by Lemma 4.
For every , sends to over the edge and to over the edge .
Employing the -correction game protocol with players, for every , sends a message to over the edge and to over the edge . Following the correction game protocol, for every , given , and produce a string satisfying that .
For every and every , transmits the th bit of to the th gate in the th block, namely . Note that .
Next, the communication network employs the circuit and transmits to . For every and every , transmits to .
Finally, for every , now holds both and . Therefore can recover .
By invoking the protocol described above, every one of the sources sends bits to the corresponding target. For every edge , let denote the random variable giving the message sent on the edge when executing the protocol.
For every , .
First note that for every , every edge leaving has capacity and transmits . Therefore . Every edge that is not leaving any source nor has capacity and transmits exactly one bit (not necessarily uniformly random) of information. Therefore . Finally, let be an edge leaving . Then there exists some such that or . In both cases the message transmitted on is and the capacity of satisfies , where the last inequality follows from Shannon’s Source Coding theorem, as all messages are prefix-free. ∎
We can therefore conclude that the network achieves rate , and the proof of Lemma 5 is complete.
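The capacity argument for the supervisor's edges relies on the fact that the expected length of a prefix-free code is at least the entropy of the message it encodes. The sketch below checks prefix-freeness and compares expected length with entropy on a toy code. The code, the distributions and the helper names are illustrative assumptions and are unrelated to the actual protocol of Lemma 4.

```python
from math import log2
from typing import Dict, Iterable


def is_prefix_free(codewords: Iterable[str]) -> bool:
    """True if no codeword is a prefix of another (or a duplicate).

    After sorting, any prefix relation shows up between neighbours.
    """
    words = sorted(codewords)
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))


def expected_length(code: Dict[str, str], prob: Dict[str, float]) -> float:
    """Expected number of bits sent when symbols follow prob."""
    return sum(prob[s] * len(code[s]) for s in code)


def entropy(prob: Dict[str, float]) -> float:
    """Binary Shannon entropy of the distribution prob."""
    return -sum(p * log2(p) for p in prob.values() if p > 0)


code = {"a": "0", "b": "10", "c": "11"}
dyadic = {"a": 0.5, "b": 0.25, "c": 0.25}   # code is optimal here
skewed = {"a": 0.8, "b": 0.1, "c": 0.1}     # code wastes some bits here
gap_dyadic = expected_length(code, dyadic) - entropy(dyadic)
gap_skewed = expected_length(code, skewed) - entropy(skewed)
```

In both cases the gap is non-negative, so setting an edge's capacity to the expected message length is enough to satisfy the entropy constraint on that edge.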
Deriving the Lower Bound.
By Conjecture 1, the underlying undirected graph achieves a multicommodity-flow rate . Therefore there exists a multicommodity flow that achieves rate . We first observe that at most a constant fraction of the flow can go through the supervisor node . To see this, we note that as , then by Lemma 4 the expected total information sent by the supervisor in the -correction game with players is at most
Therefore by the definition of the capacities we get that for small enough (constant) ,
Since achieves rate we conclude that
By the flow-conservation constraint, we know that therefore the total amount of flow that can go through is . By averaging, at least a fraction of the sources send at least units of flow through . By the choice of , in , at least a of the sources are at least away from their targets. Without loss of generality, assume these are the first sources. We conclude that
and therefore , and the proof of Theorem 3 is now complete.
5.1 Remarks and Extensions
For the sake of fluency, some minor remarks and extensions were intentionally left out of the text, and will be discussed now.
Circuits with Bounded Average Degree.
Our results still hold if we relax the second requirement of Theorem 3 and require instead that the number of edges in is at most . That is, the average degree in is at most . To see this, note that under this assumption, there are at most gates in whose degree in is larger than . For each such gate , add a new node in the middle layer, and connect and all the neighbours of in to . Then delete all the edges adjacent to in . The number of nodes added to the middle layer is at most , and the degree of all nodes in is now bounded by . The rest of our proof continues as before.
Shifts vs. Cyclic Shifts.
In order to prove lower bounds for circuits computing multiplication, our results are stated in terms of shifts (which are a special case of products, as mentioned). This is in contrast to Valiant’s conjectures, which are stated in terms of cyclic shifts. However, we draw the reader’s attention to the fact that our proofs work for cyclic shifts as well. The exact same arguments apply, and the proofs remain unchanged.
- [AHJ06] M. Adler, N. J. A. Harvey, K. Jain, R. Kleinberg, and A. R. Lehman. On the capacity of information networks. In Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’06, pages 241–250. Society for Industrial and Applied Mathematics, 2006. Available from: http://dl.acm.org/citation.cfm?id=1109557.1109585.
- [BGS17] M. Braverman, S. Garg, and A. Schvartzman. Coding in undirected graphs is either very helpful or not helpful at all. In 8th Innovations in Theoretical Computer Science Conference, ITCS 2017, January 9-11, 2017, Berkeley, CA, USA, pages 18:1–18:18, 2017.
- [CJ11] R. Clifford and M. Jalsenius. Lower bounds for online integer multiplication and convolution in the cell-probe model. In Automata, Languages and Programming - 38th International Colloquium, ICALP 2011, Zurich, Switzerland, July 4-8, 2011, Proceedings, Part I, pages 593–604, 2011.
- [Coo66] S. A. Cook. On the minimum computation time of functions. PhD thesis, Harvard University, 1966.
- [Fü09] M. Fürer. Faster integer multiplication. SIAM Journal on Computing, 39(3):979–1005, 2009. doi:10.1137/070711761.
- [FHLS19] A. Farhadi, M. Hajiaghayi, K. G. Larsen, and E. Shi. Lower bounds for external memory integer sorting via network coding. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, STOC 2019, 2019. To appear.
- [HvdH18] D. Harvey and J. van der Hoeven. Faster integer multiplication using short lattice vectors. CoRR, 2018. arXiv:1802.07932.
- [KO62] A. A. Karatsuba and Y. P. Ofman. Multiplication of many-digital numbers by automatic computers. Proceedings of the USSR Academy of Sciences, 145:293–294, 1962.
- [LL04] Z. Li and B. Li. Network coding: The case of multiple unicast sessions. In Proceedings of the 42nd Annual Allerton Conference on Communication, Control, and Computing, 2004.
- [Mor73] J. Morgenstern. Note on a lower bound on the linear complexity of the fast Fourier transform. Journal of the ACM, 20(2):305–306, 1973. doi:10.1145/321752.321761.
- [Pon98] S. Ponzio. A lower bound for integer multiplication with read-once branching programs. SIAM J. Comput., 28(3):798–815, 1998.
- [Rii07] S. Riis. Information flows, graphs and their guessing numbers. The Electronic Journal of Combinatorics, 14(1), 2007.
- [SS71] A. Schönhage and V. Strassen. Schnelle Multiplikation großer Zahlen. Computing, 7(3):281–292, Sep 1971. doi:10.1007/BF02242355.
- [Too63] A. L. Toom. The complexity of a scheme of functional elements realizing the multiplication of integers. Proceedings of the USSR Academy of Sciences, 150(3):496–498, 1963.
- [Val77] L. G. Valiant. Graph-theoretic arguments in low-level complexity. In Mathematical Foundations of Computer Science 1977, pages 162–176, 1977.
- [Val92] L. G. Valiant. Why is boolean complexity theory difficult? In Proceedings of the London Mathematical Society Symposium on Boolean Function Complexity, pages 84–94, 1992.