The unbounded-error communication complexity model, denoted UPP, was introduced by Paturi and Simon as a natural communication analog of the Turing Machine complexity class PP. In a UPP communication protocol for a Boolean function $F : \{-1,1\}^n \times \{-1,1\}^n \to \{-1,1\}$, there are two parties, one with input $x$ and one with input $y$. The two parties engage in a private-coin randomized communication protocol, at the end of which they are required to output $F(x, y)$ with probability strictly greater than $1/2$. The cost of the protocol is the number of bits exchanged by the two parties. As is standard, we use the notation UPP not only to denote the communication model, but also the class of functions solvable in the model by protocols of cost polylogarithmic in the size of the input.
Observe that success probability $1/2$ can be achieved with no communication at all by random guessing, so the UPP model merely requires a strict improvement over this trivial solution. Owing to this liberal acceptance criterion, UPP is a very powerful communication model, essentially the most powerful one against which we know how to prove lower bounds. In particular, UPP is powerful enough to simulate many other models of computing, and this makes UPP lower bounds highly useful. As one example, any function computable by a Threshold-of-Majority circuit of size $s$ has UPP complexity at most $O(\log s + \log n)$, and this connection has been used to translate UPP lower bounds into state-of-the-art lower bounds against threshold circuits (see, for example, [12, 26, 10, 32, 8]).
UPP also happens to be characterized by a natural matrix-analytic complexity measure called sign-rank. Here, the sign-rank of a matrix $M \in \{-1,1\}^{N \times N}$, denoted $\mathrm{sr}(M)$, is the minimum rank of a real matrix whose entries agree in sign with the corresponding entries of $M$. Equivalently, $\mathrm{sr}(M) = \min_A \mathrm{rank}(A)$, where the minimum is over all matrices $A$ such that $M_{ij} \cdot A_{ij} > 0$ for all $(i, j)$. Paturi and Simon showed the following tight connection between UPP and sign-rank: if we associate a function $F$ with the matrix $M_F = [F(x, y)]_{x, y}$, then the UPP communication complexity of $F$ equals $\log \mathrm{sr}(M_F)$, up to an additive constant.
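As a small illustration of the sign-rank definition (this example is ours, not from the paper), the "greater-than" sign matrix has sign-rank at most 2, witnessed by an explicit rank-2 real matrix of inner products:

```python
from itertools import product

# Illustration: the N x N "greater-than" sign matrix M[i][j] = +1 if i > j,
# -1 otherwise, has sign-rank at most 2. With u_i = (i, 1) and
# v_j = (1, -j - 0.5), the rank-2 matrix A[i][j] = <u_i, v_j> = i - j - 0.5
# agrees in sign with M at every entry.
N = 16
M = [[1 if i > j else -1 for j in range(N)] for i in range(N)]
A = [[i * 1 + 1 * (-j - 0.5) for j in range(N)] for i in range(N)]
assert all(M[i][j] * A[i][j] > 0 for i, j in product(range(N), repeat=2))
```

In the communication view, the logarithmic sign-rank of this matrix corresponds to the cheap UPP protocol for the greater-than function.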
While lower bounds on UPP complexity (equivalently, sign-rank) are useful in complexity theory, upper bounds on these quantities imply state-of-the-art learning algorithms, including the fastest known algorithms for PAC learning DNF formulas and read-once formulas [21, 3]. More specifically, suppose we want to learn a concept class $\mathcal{C}$ of functions mapping $\{-1,1\}^n$ to $\{-1,1\}$. The class $\mathcal{C}$ is naturally associated with a matrix $M_{\mathcal{C}}$, whose $i$th row equals the truth table of the $i$th function in $\mathcal{C}$. Then $\mathcal{C}$ can be distribution-independently PAC learned in time polynomial in the sign-rank of $M_{\mathcal{C}}$. (The sign-rank of $M_{\mathcal{C}}$ is often referred to in the learning theory literature as the dimension complexity of $\mathcal{C}$.) Moreover, the resulting learning algorithm is robust to random classification noise, a property not satisfied by the handful of known PAC learning algorithms that are not based on dimension complexity.
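The reduction underlying this paradigm is that a sign-rank-$d$ factorization embeds every concept as a halfspace over a $d$-dimensional feature map, after which any halfspace learner applies. The toy sketch below (ours, not the paper's algorithm) uses the class of dictator functions, the identity embedding, and a classic perceptron:

```python
import random

# Toy sketch of dimension-complexity-based learning: if each concept in C is
# the sign of a linear function over a d-dimensional embedding phi (d = the
# dimension complexity), learning C reduces to halfspace learning over phi.
# Here C = {dictators x -> x[i]} on {-1,1}^n, phi(x) = x, d = n, and the
# halfspace learner is the perceptron update rule.
random.seed(0)
n = 10
target = 3  # the unknown concept is c(x) = x[3]
samples = [[random.choice([-1, 1]) for _ in range(n)] for _ in range(200)]
labels = [x[target] for x in samples]

w = [0.0] * n
for _ in range(20):  # perceptron passes; realizable with margin, so it converges
    for x, y in zip(samples, labels):
        if (sum(wi * xi for wi, xi in zip(w, x)) > 0) != (y > 0):
            w = [wi + y * xi for wi, xi in zip(w, x)]

predict = lambda x: 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1
assert all(predict(x) == y for x, y in zip(samples, labels))
```

The point of the paradigm is that the running time scales with the embedding dimension $d$, which is why dimension complexity upper bounds translate into learning algorithms.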
For the purpose of our work, one particularly important application of the dimension-complexity approach to PAC learning was derived by Klivans et al., who showed that the concept class consisting of intersections of two majority functions has dimension complexity at most $n^{O(\log n)}$. They thereby obtained a quasipolynomial-time algorithm for PAC learning intersections of two majority functions. (In fact, their algorithm runs in quasipolynomial time for intersections of polylogarithmically many majorities.) Prior to our work, it was consistent with current knowledge that the dimension complexity of this concept class is in fact $\mathrm{poly}(n)$, which would yield a polynomial-time PAC learning algorithm for intersections of constantly many majority functions.
1.1 Our Results
Despite considerable effort, progress on understanding sign-rank (equivalently, UPP) has been slow. Our lack of knowledge is highlighted by the following well-known open question (cf. Göös et al.). Throughout, for any function $F$, we write $F \wedge F$ for the function on twice as many inputs obtained by evaluating $F$ on two disjoint inputs and outputting TRUE only if both copies of $F$ evaluate to TRUE, i.e., $(F \wedge F)(x_1, y_1, x_2, y_2) = F(x_1, y_1) \wedge F(x_2, y_2)$.
Question 1. Is the class UPP closed under intersection? In other words, suppose the function $F$ satisfies $\mathrm{UPP}(F) \le \log^c n$ for some constant $c$. Is there always some constant $c'$ (which may depend on $c$) such that $\mathrm{UPP}(F \wedge F) \le \log^{c'} n$? More generally and informally, if $\mathrm{UPP}(F)$ is “small”, does this imply any non-trivial upper bound on $\mathrm{UPP}(F \wedge F)$?
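The intersection operation itself is straightforward; a minimal sketch in code (ours, using the $\pm 1$ convention in which $-1$ encodes TRUE — an assumption about the encoding):

```python
# Sketch of the intersection F ∧ F: evaluate F on two disjoint input pairs and
# output TRUE iff both copies are TRUE. Here -1 encodes TRUE (an assumed
# convention for this illustration).
def intersect(F):
    def FandF(x1, y1, x2, y2):
        return -1 if F(x1, y1) == -1 and F(x2, y2) == -1 else 1
    return FandF

# Tiny sanity check with equality as the inner function.
EQ = lambda x, y: -1 if x == y else 1
EQ2 = intersect(EQ)
assert EQ2(0, 0, 1, 1) == -1   # both copies TRUE
assert EQ2(0, 0, 1, 2) == 1    # second copy FALSE
```

The question is whether this simple combining step can blow up UPP complexity.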
Prior to our work, essentially nothing was known about Question 1. In particular, we are not aware of prior work ruling out the possibility that $\mathrm{UPP}(F \wedge F) = O(\mathrm{UPP}(F))$ for every $F$. On the other hand, for reasons that will become apparent in Section 1.2, there is good reason to suspect that there exists a function $F$ with $\mathrm{UPP}(F) = O(\log n)$, yet $\mathrm{UPP}(F \wedge F) = n^{\Omega(1)}$. While we do not obtain a full resolution of Question 1, we do show for the first time that UPP complexity can increase significantly under intersection.
Babai, Frankl and Simon observed that there are two natural communication complexity analogs of the Turing machine class PP, namely PP$^{cc}$ and UPP$^{cc}$. It is well known that PP$^{cc}$ is closed under intersection. Our work can be viewed as a first step towards showing that, in contrast, UPP$^{cc}$ is not closed under intersection.
Theorem 1.1.
There is a function $F : \{-1,1\}^n \times \{-1,1\}^n \to \{-1,1\}$ such that $\mathrm{UPP}(F) = O(\log n)$, yet $\mathrm{UPP}(F \wedge F) = \Omega(\log^2 n)$.
In fact, for each fixed $y$, the function $F(\cdot, y)$ from Theorem 1.1 simply outputs the majority of some subset of the bits of $x$. This yields the following corollary.
Corollary 1.2.
Let $\mathcal{C}$ be the concept class in which each concept is the intersection of two majorities on $n$ bits. Then $\mathcal{C}$ has dimension complexity $n^{\Omega(\log n)}$.
Corollary 1.2 shows that the dimension complexity upper bound of Klivans et al. is tight for intersections of two majorities, and new approaches will be needed to PAC learn this concept class in polynomial time. For context, we remark that learning intersections of majorities is a special case of the more general problem of learning intersections of halfspaces. (A halfspace is any function of the form $\mathrm{sign}(w_1 x_1 + \dots + w_n x_n - \theta)$ for some real numbers $w_1, \dots, w_n, \theta$.) The latter is a central and well-studied challenge in learning theory, as intersections of halfspaces are powerful enough to represent any convex set, and they contain many basic problems (like learning DNFs) as special cases. In contrast to the well-understood problem of learning a single halfspace, for which many efficient algorithms are known, no subexponential-time algorithm is known for PAC learning even the intersection of two halfspaces. Considerable effort has been devoted to showing that learning intersections of halfspaces is a hard problem [22, 11, 19, 6], but these results apply only to intersections of many halfspaces, or make assumptions about the form of the hypothesis output by the learner. Our work can be seen as a new form of evidence that learning intersections of even two majorities is hard.
1.2 Our Techniques
UPP has a query complexity analog, denoted UPP$^{dt}$ and defined as follows. A UPP$^{dt}$ algorithm is a randomized algorithm which, on input $x \in \{-1,1\}^n$, queries bits of $x$ and must output $f(x)$ with probability strictly greater than $1/2$; the cost of the algorithm is the number of bits of $x$ queried. How UPP$^{dt}$ behaves under intersection is now well understood. More specifically, it is known that there is a function $f$ (in fact, a halfspace) such that $\mathrm{UPP}^{dt}(f) = O(1)$, yet $\mathrm{UPP}^{dt}(f \wedge f) = n^{\Omega(1)}$. Define the Majority function, which we denote by MAJ, to evaluate to TRUE if at least half of its input bits are TRUE. It is also known that MAJ satisfies $\mathrm{UPP}^{dt}(\mathrm{MAJ}) = 1$, yet $\mathrm{UPP}^{dt}(\mathrm{MAJ} \wedge \mathrm{MAJ}) = \Theta(\log n)$. Our goal in this paper is, to the extent possible, to show that the UPP communication model behaves similarly to its query complexity analog.
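The one-query upper bound for MAJ is easy to verify exhaustively: querying a single uniformly random bit and outputting it succeeds with probability strictly above $1/2$ on every input, because (for odd $n$) a strict majority of the bits agree with the majority value. A small check (our illustration, $\pm 1$ convention with $-1$ as TRUE):

```python
from itertools import product

# Why one query suffices for MAJ in the unbounded-error query model (odd n):
# query a uniformly random bit of x and output it. On every input the answer
# equals MAJ(x) with probability strictly greater than 1/2.
n = 5
for x in product([-1, 1], repeat=n):
    maj = -1 if x.count(-1) > n // 2 else 1   # -1 encodes TRUE
    success = x.count(maj) / n                 # prob. the queried bit is correct
    assert success > 0.5
```

The contrast with $\mathrm{MAJ} \wedge \mathrm{MAJ}$, whose unbounded-error query cost is logarithmic, is exactly the phenomenon this paper lifts to communication.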
Over the course of the last decade, there has been considerable progress in proving lifting theorems [27, 16, 15]. These theorems seek to show that if a function $f$ has large complexity in some query model, then for some “sufficiently complicated” function $g$ on a “small” number of inputs, the composition $f \circ g$ has large complexity in the associated communication model (ideally, as large as the query complexity of $f$ itself).
Unfortunately, a “generic” lifting theorem for UPP complexity is not known. That is, it is not known how to take an arbitrary function $f$ with high UPP$^{dt}$ complexity and, by composing it with a function $g$ on a small number of inputs, obtain a function $f \circ g$ with high UPP$^{cc}$ complexity.
However, as we now explain, some significant partial results have been shown in this direction. It is well known that UPP$^{dt}$ complexity is equivalent to an approximation-theoretic notion called threshold degree, denoted $\deg_{\pm}$ (see Appendix B.1 for the definition). The threshold degree of a function can in turn be expressed as the value of a certain (exponentially large) linear program. Linear programming duality then implies that one can prove lower bounds on $\deg_{\pm}(f)$ by exhibiting good solutions to the dual linear program. We refer to such dual solutions as dual witnesses for threshold degree. Sherstov, and Razborov and Sherstov, showed that if $\deg_{\pm}(f)$ is large, and moreover this can be exhibited by a dual witness satisfying a certain smoothness condition, then there is a function $g$ defined on a constant number of inputs such that $f \circ g$ does have large UPP$^{cc}$ complexity. Several recent works [8, 7, 9, 32] have managed to prove new UPP$^{cc}$ lower bounds by constructing, for various functions $f$, smooth dual witnesses exhibiting the fact that $\deg_{\pm}(f)$ is large.
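Concretely, the duality underlying the dual-witness approach can be stated as follows (this is the standard formulation; the notation is ours):

```latex
% Dual characterization of threshold degree (standard form):
% deg_±(f) >= d holds if and only if a dual witness psi exists.
\deg_{\pm}(f) \ge d
\iff
\exists\, \psi : \{-1,1\}^n \to \mathbb{R},\ \psi \not\equiv 0,\ \text{such that}
\begin{cases}
f(x)\,\psi(x) \ge 0 & \text{for all } x \in \{-1,1\}^n, \\
\sum_{x \in \{-1,1\}^n} \psi(x)\, p(x) = 0 & \text{for every polynomial } p \text{ with } \deg p < d.
\end{cases}
```

The first condition says $\psi$ agrees in sign with $f$ everywhere; the second says $\psi$ is orthogonal to all low-degree polynomials, so no low-degree polynomial can sign-represent $f$.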
Our key technical contribution is to bring this approach to bear on the function $\mathrm{MAJ} \wedge \mathrm{MAJ}$. Specifically, we show that the (known) $\Omega(\log n)$ threshold degree lower bound for $\mathrm{MAJ} \wedge \mathrm{MAJ}$ can be exhibited by a smooth dual witness.
We do this as follows. Sherstov showed that for any function $f$, the threshold degree of the function $f \wedge f$ is characterized by the rational approximate degree of $f$, i.e., the least total degree of real polynomials $p$ and $q$ such that $|f(x) - p(x)/q(x)|$ is small for all $x$ in the domain of $f$. He then showed that the rational approximate degree of MAJ is $\Theta(\log n)$, thereby concluding that $\mathrm{MAJ} \wedge \mathrm{MAJ}$ has threshold degree $\Theta(\log n)$.
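To see why degree $O(\log n)$ suffices on the rational-approximation side, here is a classical construction in the spirit of Newman's rational approximation of the sign function (an illustration of ours, not the paper's construction): with $p(t) = \prod_{i=0}^{\lceil \log_2 n \rceil} (t + 2^i)$, the odd rational function $r(t) = (p(t) - p(-t))/(p(t) + p(-t))$ has numerator and denominator of degree $O(\log n)$ and agrees in sign with $t$ on $\pm\{1, \dots, n\}$:

```python
from math import ceil, log2

# Low-degree rational sign agreement on {±1, ..., ±n}: since |p(-t)| < p(t)
# for every t in [1, n], both p(t) - p(-t) and p(t) + p(-t) are positive, so
# r(t) > 0 for t > 0; r is odd, so r(-t) = -r(t) < 0.
n = 1000
k = ceil(log2(n))

def p(t):
    out = 1
    for i in range(k + 1):
        out *= t + 2 ** i
    return out

def r(t):
    return (p(t) - p(-t)) / (p(t) + p(-t))

assert all(r(t) > 0 and r(-t) < 0 for t in range(1, n + 1))
```

This gives sign agreement with degree logarithmic in $n$; the quantitative error bounds needed for the paper's arguments require a more careful analysis.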
From Sherstov’s arguments, one can derive a dual witness for the fact that the rational approximate degree of MAJ is $\Omega(\log n)$, and then transform it into a dual witness for the fact that $\mathrm{MAJ} \wedge \mathrm{MAJ}$ has threshold degree $\Omega(\log n)$. Unfortunately, neither dual witness satisfies the type of smoothness condition required by Razborov and Sherstov’s machinery to yield UPP$^{cc}$ lower bounds.
The smoothness condition required for the Razborov–Sherstov machinery to work essentially states that the mass of the dual witness has to be “relatively large” (a reasonably large fraction of the mass that the uniform distribution would place) on a “large” set of inputs (the fraction of inputs that do not receive large mass has to be small).
To construct a smooth dual witness for $\mathrm{MAJ} \wedge \mathrm{MAJ}$, our primary technical contribution is to construct a smooth dual witness $\psi$ for the fact that the rational approximate degree of MAJ is $\Omega(\log n)$. We then apply a different transformation, due to Sherstov, of $\psi$ into a dual witness for the fact that the threshold degree of $\mathrm{MAJ} \wedge \mathrm{MAJ}$ is $\Omega(\log n)$, and we show that this transformation preserves the smoothness of $\psi$.
In a nutshell, our smooth dual witness for MAJ is obtained in two steps: first, for every input $x$ whose Hamming weight lies in a suitable interval around $n/2$, we define a dual witness that places a large mass on $x$ and not too much mass on other points. Next, we define the final dual witness to be a certain weighted average of all the dual witnesses thus obtained. The resulting mass on each $x$ of Hamming weight in the interval is large enough, and the fraction of inputs whose Hamming weight falls outside the interval is small enough, to allow us to use the Razborov–Sherstov framework (Theorem 2.3) to prove the desired sign-rank lower bound on the pattern matrix of $\mathrm{MAJ} \wedge \mathrm{MAJ}$.
All logarithms in this paper are taken base 2. We use the notation $\exp(t)$ to denote $e^t$, where $e$ is Euler’s number. Given any finite set $X$ and any functions $f, g : X \to \mathbb{R}$, define $\langle f, g \rangle = \sum_{x \in X} f(x) g(x)$ and $\|f\|_1 = \sum_{x \in X} |f(x)|$. We refer to $\|f\|_1$ as the $\ell_1$-norm of $f$. For any $x \in \{-1,1\}^n$, we use the notation $|x|$ to denote the Hamming weight of $x$, i.e., the number of $-1$’s in the string $x$.
Paturi and Simon  showed the following equivalence between the sign-rank of a matrix and the cost of its corresponding communication game.
Theorem 2.1 (Paturi and Simon).
For any $F : \{-1,1\}^n \times \{-1,1\}^n \to \{-1,1\}$, let $M_F$ denote its communication matrix, defined by $M_F[x, y] = F(x, y)$. Then, $\mathrm{UPP}(F) = \log \mathrm{sr}(M_F) \pm O(1)$.
Let $t, n$ be positive integers such that $t$ divides $n$. Partition the set $[n] = \{1, \dots, n\}$ into $t$ disjoint blocks, where the $i$th block is $\{(i-1)n/t + 1, \dots, in/t\}$. Define the set $\mathcal{V}(n, t)$ to be the collection of subsets $V \subseteq [n]$ which contain exactly one element from each block. For $x \in \{-1,1\}^n$ and $V \in \mathcal{V}(n, t)$, let $x|_V = (x_{v_1}, \dots, x_{v_t}) \in \{-1,1\}^t$, where $v_1 < \dots < v_t$ are the elements of $V$.
Definition 2.2 (Pattern matrix).
For any function $f : \{-1,1\}^t \to \{-1,1\}$, the $(n, t, f)$-pattern matrix $F$ is defined as $F = \left[ f(x|_V \oplus w) \right]_{x \in \{-1,1\}^n,\ (V, w) \in \mathcal{V}(n, t) \times \{-1,1\}^t}$, where $x|_V \oplus w$ denotes the string $(x_{v_1} w_1, \dots, x_{v_t} w_t)$.
Note that $F$ is a $2^n \times (n/t)^t 2^t$ matrix.
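The definition is easy to instantiate for small parameters; the sketch below (ours) builds a pattern matrix explicitly and checks its dimensions:

```python
from itertools import product

# Sketch of the (n, t, f)-pattern matrix for small parameters: rows are
# indexed by x in {-1,1}^n, columns by pairs (V, w) with V picking one
# coordinate from each of the t blocks of [n] and w in {-1,1}^t; the entry is
# f(x_{v_1} * w_1, ..., x_{v_t} * w_t).
def pattern_matrix(n, t, f):
    block = n // t
    vs = list(product(*[range(i * block, (i + 1) * block) for i in range(t)]))
    cols = [(V, w) for V in vs for w in product([-1, 1], repeat=t)]
    rows = list(product([-1, 1], repeat=n))
    return [[f([x[v] * wi for v, wi in zip(V, w)]) for (V, w) in cols] for x in rows]

XOR = lambda z: z[0] * z[1]                  # parity on t = 2 bits, ±1 convention
M = pattern_matrix(4, 2, XOR)
assert len(M) == 2 ** 4                       # 2^n rows
assert len(M[0]) == (4 // 2) ** 2 * 2 ** 2    # (n/t)^t * 2^t columns
```

Each row of a pattern matrix evaluates $f$ on all "masked projections" of $x$, which is what makes the construction amenable to lifting arguments.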
In a breakthrough result, Forster proved that an upper bound on the spectral norm of a sign matrix implies a lower bound on its sign-rank. Razborov and Sherstov established a generalization of Forster’s theorem that can be used to prove sign-rank lower bounds for pattern matrices. Specifically, we require the following result, implicit in their work [26, Theorem 1.1].
Theorem 2.3 (Implicit in [26]).
Let $f : \{-1,1\}^t \to \{-1,1\}$ be any Boolean function and let $d \ge 0$, $\mu > 0$, and $\gamma \ge 0$ be real numbers. Suppose there exists a function $\Psi : \{-1,1\}^t \to \mathbb{R}$ with $\|\Psi\|_1 = 1$ satisfying the following conditions.
$\langle \Psi, p \rangle = 0$ for all polynomials $p$ of degree at most $d$.
$|\Psi(x)| \ge \mu \cdot 2^{-t}$ for all but a $\gamma$ fraction of inputs $x \in \{-1,1\}^t$.
Then, the sign-rank of the -pattern matrix can be bounded below as
We require the following well-known combinatorial identity.
Lemma 2.4.
For every polynomial $p$ of degree less than $n$, we have $\sum_{k=0}^{n} (-1)^k \binom{n}{k}\, p(k) = 0$.
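The identity is the statement that the $n$-th finite difference of a polynomial of degree below $n$ vanishes; a quick numerical check (our illustration):

```python
from math import comb

# Check: for any polynomial p of degree < n,
#   sum_{k=0}^{n} (-1)^k * C(n, k) * p(k) = 0
# (the n-th finite difference of a degree-(n-1) polynomial is zero).
n = 8
p = lambda k: 3 * k ** 7 - 5 * k ** 3 + 2   # degree 7 < n = 8
assert sum((-1) ** k * comb(n, k) * p(k) for k in range(n + 1)) == 0
```

Since the computation is over integers, the check is exact.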
Recall from Section 1.2 that the rational $\varepsilon$-approximate degree of $f$ is the least degree of two polynomials $p$ and $q$ such that $|f(x) - p(x)/q(x)| \le \varepsilon$ for all $x$ in the domain of $f$. Sherstov [31, Theorem 6.9] showed that a dual witness to the rational approximate degree of any function $f$ can be converted to a threshold degree dual witness for $f \wedge f$. Implicit in his theorem is the fact that a smooth dual witness to the rational approximate degree of $f$ can be converted to a smooth dual witness for the threshold degree of $f \wedge f$. More precisely, the following result is established by the proof of [31, Theorem 6.9]. (In Theorem 2.5, the two functions in the hypothesis together form a dual witness for the fact that the rational approximate degree of $f$ is large, while the function constructed in the conclusion is a dual witness for a threshold degree lower bound on $f \wedge f$. See Appendix B for details. However, we will not exploit this interpretation of Theorem 2.5 in our analysis.)
Theorem 2.5 (Sherstov ).
Let $f : \{-1,1\}^n \to \{-1,1\}$ be any function, and let $\varepsilon$ and $\delta$ be any real numbers.
Suppose there exist functions $u, v : \{-1,1\}^n \to \mathbb{R}$ that are not identically 0 and satisfy the following properties:
Then there exist functions $\phi, \psi : \{-1,1\}^n \times \{-1,1\}^n \to \mathbb{R}$ such that $\psi$ satisfies the following properties.
3 A Smooth Dual Witness for Majority
Our main technical contribution in this paper is captured in Theorem 3.1 below. This theorem constructs a smooth dual witness for the hardness of rationally approximating the sign function (cf. Appendix B for details of this interpretation of Theorem 3.1). We defer the proof until Section 4.
Let $n$ be odd. There exists a function $\omega : \{0, 1, \dots, n\} \to \mathbb{R}$ such that
For and every ,
If $p$ is any polynomial of degree less than $d$, then
For every we have
The following theorem shows how to convert the (univariate) function from Theorem 3.1 into a dual witness for the (multivariate) MAJ function.
Let $n$ be odd, and let $\omega$ be any function obtained in Theorem 3.1. Then, the multivariate polynomial $\Psi$ defined by $\Psi(x) = \omega(|x|) / \binom{n}{|x|}$ satisfies the following properties.
For and every ,
for any such that .
For any polynomial of degree at most ,
For all such that ,
To establish Equation (3.2), observe:
To establish Equation (14), consider any polynomial $p$ of degree at most $d$. For any permutation $\sigma$ of $[n]$, define the polynomial $p_\sigma$ by $p_\sigma(x) = p(x_{\sigma(1)}, \dots, x_{\sigma(n)})$. Note that, since $\Psi$ is symmetric, $\langle \Psi, p_\sigma \rangle = \langle \Psi, p \rangle$ for all $\sigma$. Define $p_{\mathrm{sym}} = \mathbb{E}_\sigma[p_\sigma]$. Note that $p_{\mathrm{sym}}$ is symmetric and $\langle \Psi, p_{\mathrm{sym}} \rangle = \langle \Psi, p \rangle$. It is a well-known fact (cf. Minsky and Papert) that any symmetric polynomial of degree at most $d$ can be written as a polynomial of degree at most $d$ in the variable $|x|$, and so $p_{\mathrm{sym}}$ can. Hence $\langle \Psi, p \rangle = \langle \Psi, p_{\mathrm{sym}} \rangle = 0$, where the final equality holds by Equation (10).
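The symmetrization step can be checked numerically on small instances (our illustration): averaging any polynomial over all permutations of its inputs yields a function whose value on $\{-1,1\}^n$ depends only on the Hamming weight $|x|$.

```python
from itertools import permutations, product

# Check of Minsky-Papert symmetrization: averaging p over all coordinate
# permutations gives p_sym, whose value depends only on the Hamming weight.
n = 4
p = lambda x: x[0] * x[1] + 0.5 * x[2] - x[0] * x[3]   # an asymmetric polynomial
perms = list(permutations(range(n)))

def p_sym(x):
    return sum(p([x[i] for i in perm]) for perm in perms) / len(perms)

values = {}
for x in product([-1, 1], repeat=n):
    w = x.count(-1)                       # Hamming weight in the ±1 convention
    values.setdefault(w, set()).add(round(p_sym(x), 9))
assert all(len(v) == 1 for v in values.values())   # one value per weight
```

This is exactly why the pairing $\langle \Psi, p_{\mathrm{sym}} \rangle$ collapses to a univariate sum over Hamming weights.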
We are ready to derive a lower bound on the sign-rank of the -pattern matrix.
The -pattern matrix satisfies
Let denote the function in this proof. Set and consider the function obtained via Theorem 3.1. Define the function by . Define the functions by , and . We now verify that satisfy the conditions in Theorem 2.5 for . Set , where is a constant such that .
By Equation (4), . Since , this implies that
By Equation (5), for all .
By Equation (7), we have
which is at most a constant, since .
Combined with the fact that is a constant, we conclude .
By Equation (5), . This implies that for ,
By a standard Chernoff bound, the number of inputs in such that is at least .
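The Chernoff/Hoeffding estimate invoked here is easy to sanity-check exactly on small parameters (our illustration; the particular deviation threshold below is illustrative, not the paper's):

```python
from math import comb, exp, log, sqrt

# Sanity check of the standard bound: the fraction of n-bit strings whose
# Hamming weight k satisfies |k - n/2| > a is at most 2 * exp(-2 * a^2 / n).
n = 100
a = sqrt(n * log(n)) / 2                       # illustrative threshold
tail = sum(comb(n, k) for k in range(n + 1) if abs(k - n / 2) > a) / 2 ** n
assert tail <= 2 * exp(-2 * a ** 2 / n)
```

In the proof, such a bound shows that all but a small fraction of inputs have Hamming weight in the interval where the dual witness is guaranteed to place large mass.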
Plugging and into Theorem 2.3, we conclude that the sign-rank of the pattern matrix is bounded below as
We are now ready to prove Theorem 1.1.
Proof of Theorem 1.1.
Note that the function . Consider the dual witness obtained for the threshold degree of in the previous proof. Note that the function defined by acts as a dual witness for the threshold degree of , and satisfies all the conditions in Theorem 2.3 with the same parameters as in the proof of Theorem 3.3. Proceeding in exactly the same way as in the previous proof, we conclude that the sign-rank of the pattern matrix is bounded below as
Denote by $G$ the communication game corresponding to the pattern matrix. For completeness, we now sketch a standard UPP protocol of cost $O(\log n)$ for $G$. Note that Alice holds $n$ input bits, and Bob holds a string indicating the “relevant bits” in each block of Alice’s input (that is, a set $V \in \mathcal{V}(n, t)$) together with a $t$-bit string $w$. Bob sends Alice the index of a uniformly random relevant bit $v_i$, using $O(\log n)$ bits of communication. Alice responds with her value of that input bit, and Bob outputs $x_{v_i} \cdot w_i$. It is easy to check that this is a valid UPP protocol, and it has cost $O(\log n)$.
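The validity of this protocol on the pattern matrix of MAJ can be verified exhaustively for small parameters (our simulation; $\pm 1$ convention with $-1$ as TRUE, odd $t$):

```python
from fractions import Fraction
from itertools import product

# Simulation of the one-query protocol on the pattern matrix of MAJ: Bob names
# a uniformly random relevant bit v_i, Alice replies with x[v_i], and Bob
# outputs x[v_i] * w_i. For odd t the output equals MAJ(x_V ⊕ w) with
# probability strictly greater than 1/2 on every input pair.
n, t = 6, 3
block = n // t
MAJ = lambda z: -1 if z.count(-1) > t // 2 else 1   # -1 encodes TRUE

for x in product([-1, 1], repeat=n):
    for V in product(*[range(i * block, (i + 1) * block) for i in range(t)]):
        for w in product([-1, 1], repeat=t):
            z = [x[v] * wi for v, wi in zip(V, w)]
            target = MAJ(z)
            success = Fraction(z.count(target), t)  # prob. over Bob's random i
            assert success > Fraction(1, 2)
```

The communication cost is just the $O(\log n)$ bits Bob needs to name an index, plus Alice's one-bit reply.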
Corollary 1.2 follows immediately from the previous proof and the definition of pattern matrices.
4 Proof of Theorem 3.1
The rest of this paper is dedicated to proving Theorem 3.1. Before proving the theorem, we describe the main auxiliary construction and prove some preliminary facts about it.
Let . Fix any . Define the set
Define the polynomial by
Since is odd, notice that , for , and for .
The following claim tells us that for any , the function places a reasonably large mass on input .
(pairing terms corresponding to $k$ and $n - k$)
The next claim tells us that the mass placed by on other points in its support is small.
For every ,
(pairing terms corresponding to $k$ and $n - k$)