1 Introduction
The unbounded-error communication complexity model was introduced by Paturi and Simon [25] as a natural communication analog of the Turing machine complexity class PP. In an unbounded-error communication protocol for a Boolean function $f \colon \{0,1\}^n \times \{0,1\}^n \to \{-1,1\}$, there are two parties, one with input $x$ and one with input $y$. The two parties engage in a private-coin randomized communication protocol, at the end of which they are required to output $f(x,y)$ with probability strictly greater than $1/2$. The cost of the protocol is the number of bits exchanged by the two parties. As is standard, we use the notation UPP not only to denote the communication model, but also the class of functions solvable in the model by protocols of cost polylogarithmic in the size of the input. Observe that success probability $1/2$ can be achieved with no communication by random guessing, so the model merely requires a strict improvement over this trivial solution. Owing to this liberal acceptance criterion, UPP is a very powerful communication model, essentially the most powerful one against which we know how to prove lower bounds. In particular, UPP is powerful enough to simulate many other models of computing, and this makes UPP lower bounds highly useful. As one example, any function computable by a Threshold-of-Majority circuit of size $s$ has UPP complexity at most $O(\log(ns))$, and this connection has been used to translate UPP lower bounds into state-of-the-art lower bounds against threshold circuits (see, for example, [12, 26, 10, 32, 8]).
UPP also happens to be characterized by a natural matrix-analytic complexity measure called sign-rank [25]. Here, the sign-rank of a matrix $M \in \{-1,1\}^{N \times N}$, denoted $\mathrm{sr}(M)$, is the minimum rank of a real matrix whose entries agree in sign with the corresponding entries of $M$. Equivalently, $\mathrm{sr}(M) = \min_A \mathrm{rank}(A)$, where the minimum is over all matrices $A$ such that $M_{ij} \cdot A_{ij} > 0$ for all $i, j$. Paturi and Simon [25] showed the following tight connection between UPP and sign-rank: if we associate a function $f \colon \{0,1\}^n \times \{0,1\}^n \to \{-1,1\}$ with the matrix $M_f = [f(x,y)]_{x,y}$, then the UPP communication complexity of $f$ equals $\log \mathrm{sr}(M_f)$, up to an additive constant.
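As a concrete illustration of the definition (our own example, not from the original text): the $N \times N$ "greater-than" matrix has full ordinary rank, yet its sign-rank is at most 2, witnessed by a rank-2 matrix that agrees with it in sign everywhere. A minimal sketch:

```python
# Sketch: the N x N "greater-than" sign matrix has sign-rank <= 2.
# A[i][j] = i - j + 0.5 is a sum of two rank-1 terms (i * 1 and 1 * (0.5 - j)),
# so rank(A) <= 2, and sign(A[i][j]) matches M[i][j] at every entry.
N = 16
M = [[1 if i >= j else -1 for j in range(N)] for i in range(N)]
A = [[i - j + 0.5 for j in range(N)] for i in range(N)]

assert all(M[i][j] * A[i][j] > 0 for i in range(N) for j in range(N))
```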
While lower bounds on UPP complexity (equivalently, sign-rank) are useful in complexity theory, upper bounds on these quantities imply state-of-the-art learning algorithms, including the fastest known algorithms for PAC learning DNFs and read-once formulas [21, 3]. More specifically, suppose we want to learn a concept class $\mathcal{C}$ of functions mapping $\{0,1\}^n$ to $\{-1,1\}$. $\mathcal{C}$ is naturally associated with a matrix $M_{\mathcal{C}}$, whose $i$-th row equals the truth table of the $i$-th function in $\mathcal{C}$. Then $\mathcal{C}$ can be distribution-independently PAC learned in time polynomial in the sign-rank of $M_{\mathcal{C}}$. (The sign-rank of $M_{\mathcal{C}}$ is often referred to in the learning theory literature as the dimension complexity of $\mathcal{C}$.) Moreover, the resulting learning algorithm is robust to random classification noise, a property not satisfied by the handful of known PAC learning algorithms that are not based on dimension complexity.
For the purpose of our work, one particularly important application of the dimension-complexity approach to PAC learning was derived by Klivans et al. [20], who showed that the concept class consisting of intersections of two majority functions has dimension complexity at most $n^{O(\log n)}$. They thereby obtained a quasipolynomial-time algorithm for PAC learning intersections of two majority functions. (In fact, their algorithm runs in quasipolynomial time even for intersections of polylogarithmically many majorities.) Prior to our work, it was consistent with current knowledge that the dimension complexity of this concept class is $\mathrm{poly}(n)$, which would yield a polynomial-time PAC learning algorithm for intersections of constantly many majority functions.
1.1 Our Results
Despite considerable effort, progress on understanding sign-rank (equivalently, UPP) has been slow. Our lack of knowledge is highlighted by the following well-known open question (cf. Göös et al. [17]). Throughout, for any function $f$, $f \wedge f$ denotes the function on twice as many inputs obtained by evaluating $f$ on two disjoint inputs and outputting $-1$ only if both copies of $f$ evaluate to $-1$, i.e., $(f \wedge f)(x_1, x_2) = -1 \iff f(x_1) = f(x_2) = -1$.
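The intersection operation is easy to compute directly: under the convention that $-1$ denotes "true", $(f \wedge f)(x_1, x_2)$ is simply $\max(f(x_1), f(x_2))$. A small sketch (the use of a majority function as the inner $f$ is purely illustrative):

```python
# (f ∧ f)(x1, x2) = -1 iff f(x1) = f(x2) = -1.
# With the -1 = "true" convention this is just max(f(x1), f(x2)).
def maj(x):                      # illustrative inner function: majority of the bits
    return -1 if sum(x) > len(x) / 2 else 1

def f_and_f(f, x1, x2):
    return max(f(x1), f(x2))

assert f_and_f(maj, [1, 1, 0], [1, 0, 1]) == -1   # both majorities "true"
assert f_and_f(maj, [1, 1, 0], [1, 0, 0]) == 1    # second copy is "false"
```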
Question 1.
Is the class UPP closed under intersection? In other words, suppose the function $f$ satisfies $\mathrm{UPP}(f) \leq \log^c n$ for some constant $c$. Is there always some constant $c'$ (which may depend on $c$) such that $\mathrm{UPP}(f \wedge f) \leq \log^{c'} n$? More generally and informally, if $\mathrm{UPP}(f)$ is "small", does this imply any nontrivial upper bound on $\mathrm{UPP}(f \wedge f)$?
Prior to our work, essentially nothing was known about Question 1. In particular, we are not aware of prior work ruling out the possibility that $\mathrm{UPP}(f \wedge f) = O(\mathrm{UPP}(f))$ for every $f$. On the other hand, for reasons that will become apparent in Section 1.2, there is good reason to suspect that there exists a function $f$ with $\mathrm{UPP}(f) = O(\log n)$, yet $\mathrm{UPP}(f \wedge f)$ polynomially large. While we do not obtain a full resolution of Question 1, we do show for the first time that UPP complexity can increase significantly under intersection.
Babai, Frankl and Simon [4] observed that there are two natural communication complexity analogs of the Turing machine class PP, namely PP$^{cc}$ and UPP$^{cc}$. It is well known [5] that PP$^{cc}$ is closed under intersection. Our work can be viewed as a first step towards showing that, in contrast, UPP$^{cc}$ is not closed under intersection.
Theorem 1.1.
There is a function $F \colon \{0,1\}^n \times \{0,1\}^n \to \{-1,1\}$ such that $\mathrm{UPP}(F) = O(\log n)$, yet $\mathrm{UPP}(F \wedge F) = \Omega(\log^2 n)$.
In fact, for each fixed $y$, the function $x \mapsto F(x, y)$ from Theorem 1.1 simply outputs the majority of some subset of the bits of $x$. This yields the following corollary.
Corollary 1.2.
Let $\mathcal{C}$ be the concept class in which each concept is the intersection of two majorities on $n$ bits. Then $\mathcal{C}$ has dimension complexity $n^{\Omega(\log n)}$.
Corollary 1.2 shows that the dimension complexity upper bound of Klivans et al. [20] is tight for intersections of two majorities, and new approaches will be needed to PAC learn this concept class in polynomial time. For context, we remark that learning intersections of majorities is a special case of the more general problem of learning intersections of halfspaces. (A halfspace is any function of the form $\mathrm{sign}(a_1 x_1 + \dots + a_n x_n - \theta)$ for some real numbers $a_1, \dots, a_n, \theta$.) The latter is a central and well-studied challenge in learning theory, as intersections of halfspaces are powerful enough to represent any convex set, and they contain many basic problems (like learning DNFs) as special cases. In contrast to the well-understood problem of learning a single halfspace, for which many efficient algorithms are known, no polynomial-time algorithm is known for PAC learning even the intersection of two halfspaces. There have been considerable efforts devoted to showing that learning intersections of halfspaces is a hard problem [22, 11, 19, 6], but these results apply only to intersections of many halfspaces, or make assumptions about the form of the output hypothesis of the learner. Our work can be seen as a new form of evidence that learning intersections of even two majorities is hard.
1.2 Our Techniques
UPP has a query complexity analog, denoted UPP$^{dt}$ and defined as follows. A UPP$^{dt}$ algorithm is a randomized algorithm which, on input $x \in \{0,1\}^n$, queries bits of $x$ and must output $f(x)$ with probability strictly greater than $1/2$; the cost of the algorithm is the number of bits of $x$ queried. How UPP$^{dt}$ behaves under intersection is now well understood. More specifically, it is known [30] that there is a function $f$ (in fact, a halfspace) such that $\mathrm{UPP}^{dt}(f) = O(1)$, yet $\mathrm{UPP}^{dt}(f \wedge f)$ is polynomially large. Define the Majority function, which we denote by MAJ, to be $-1$ if at least half of its input bits are $1$, and $1$ otherwise. It is also known [29] that MAJ satisfies $\mathrm{UPP}^{dt}(\mathrm{MAJ}) = O(1)$, yet $\mathrm{UPP}^{dt}(\mathrm{MAJ} \wedge \mathrm{MAJ}) = \Omega(\log n)$. Our goal in this paper is, to the extent possible, to show that the UPP communication model behaves similarly to its query complexity analog.
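To see why MAJ is easy in the query model, note that querying a single uniformly random bit and outputting the corresponding answer already succeeds with probability strictly greater than $1/2$, since a majority of the bits "vote" with the output value. A sketch verifying this exactly for odd $n$ (the sign convention for MAJ below is an assumption of this sketch):

```python
from fractions import Fraction
from itertools import product

def maj(x):                       # -1 ("true") iff a strict majority of bits are 1
    return -1 if sum(x) > len(x) / 2 else 1

def one_query_success_prob(x):
    # The algorithm queries a uniformly random bit i and outputs -1 iff x[i] == 1.
    # Its success probability is the fraction of bits that vote with MAJ(x).
    votes = [(-1 if b == 1 else 1) for b in x]
    return Fraction(sum(v == maj(x) for v in votes), len(x))

n = 5
assert all(one_query_success_prob(x) > Fraction(1, 2)
           for x in product([0, 1], repeat=n))
```

For odd $n$ the majority is always strict, so the success probability is at least $\lceil n/2 \rceil / n > 1/2$ on every input.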
Over the course of the last decade, there has been considerable progress in proving lifting theorems [27, 16, 15]. These theorems seek to show that if a function $f$ has large complexity in some query model, then for some "sufficiently complicated" function $g$ on a "small" number of inputs, the composition $f \circ g$ has large complexity in the associated communication model (ideally, proportional to the query complexity of $f$).
Unfortunately, a "generic" lifting theorem for UPP complexity is not known. That is, it is not known how to take an arbitrary function $f$ with high UPP$^{dt}$ complexity, and by composing it with a function $g$ on a small number of inputs, yield a function with high UPP$^{cc}$ complexity.
However, as we now explain, some significant partial results have been shown in this direction. It is well known that UPP$^{dt}$ complexity is equivalent to an approximation-theoretic notion called threshold degree, denoted $\deg_\pm$ (see Appendix B.1 for the definition). The threshold degree of a function $f$ can in turn be expressed as the value of a certain (exponentially large) linear program. Linear programming duality then implies that one can prove lower bounds on $\deg_\pm(f)$ by exhibiting good solutions to the dual linear program. We refer to such dual solutions as dual witnesses for threshold degree. Sherstov [28] and Razborov and Sherstov [26] showed that if $\deg_\pm(f)$ is large, and moreover this can be exhibited by a dual witness satisfying a certain smoothness condition, then there is a function $g$ defined on a constant number of inputs such that $f \circ g$ does have large UPP$^{cc}$ complexity. Several recent works [8, 7, 9, 32] have managed to prove new UPP$^{cc}$ lower bounds by constructing, for various functions $f$, smooth dual witnesses exhibiting the fact that $\deg_\pm(f)$ is large.

Our key technical contribution is to bring this approach to bear on the function $\mathrm{MAJ} \wedge \mathrm{MAJ}$. Specifically, we show that the (known) threshold degree lower bound for $\mathrm{MAJ} \wedge \mathrm{MAJ}$ can be exhibited by a smooth dual witness.
We do this as follows. Sherstov [29] showed that for any function $f$, the threshold degree of the function $f \wedge f$ is characterized by the rational approximate degree of $f$, i.e., the least total degree of real polynomials $p$ and $q$ such that $|f(x) - p(x)/q(x)| \leq 1/3$ for all $x$. He then showed that the rational approximate degree of MAJ is $\Omega(\log n)$, thereby concluding that $\mathrm{MAJ} \wedge \mathrm{MAJ}$ has threshold degree $\Omega(\log n)$.
From Sherstov's arguments, one can derive a dual witness $\psi$ for the fact that the rational approximate degree of MAJ is large, and then transform $\psi$ into a dual witness $\phi$ for the fact that $\mathrm{MAJ} \wedge \mathrm{MAJ}$ has large threshold degree. Unfortunately, neither $\psi$ nor $\phi$ satisfies the type of smoothness condition required by Razborov and Sherstov's machinery to yield UPP$^{cc}$ lower bounds.
The smoothness condition required for the Razborov–Sherstov machinery to work essentially states that the mass of the dual witness has to be "relatively large" (a reasonably large fraction of the mass the uniform distribution would place) on a "large" set of inputs (the fraction of inputs which do not receive large mass has to be small).
To construct a smooth dual witness for $\mathrm{MAJ} \wedge \mathrm{MAJ}$, our primary technical contribution is to construct a smooth dual witness $\psi$ for the fact that the rational approximate degree of MAJ is large. We then apply a different transformation, due to Sherstov [31], of $\psi$ into a dual witness for the fact that the threshold degree of $\mathrm{MAJ} \wedge \mathrm{MAJ}$ is large, and we show that this transformation preserves the smoothness of $\psi$.
In a nutshell, our smooth dual witness for MAJ is obtained in two steps. First, for each input $x$ whose Hamming weight lies in a suitable range around $n/2$, we define a dual witness $\psi_x$ that places a large mass on $x$ and not too much mass on other points. Next, we define the final dual witness $\psi$ to be a certain weighted average of all the dual witnesses $\psi_x$ thus obtained. The resulting mass placed on each $x$ of Hamming weight in this range is large enough, and the fraction of inputs whose Hamming weight lies outside the range is small enough, to allow us to use the Razborov–Sherstov framework (Theorem 2.3) to prove the desired sign-rank lower bound on the pattern matrix of $\mathrm{MAJ} \wedge \mathrm{MAJ}$.
2 Preliminaries
All logarithms in this paper are taken base 2. We use the notation $\exp(x)$ to denote $e^x$, where $e$ is Euler's number. Given any finite set $X$ and any functions $f, g \colon X \to \mathbb{R}$, define $\langle f, g \rangle = \sum_{x \in X} f(x) g(x)$ and $\|f\|_1 = \sum_{x \in X} |f(x)|$. We refer to $\|f\|_1$ as the $\ell_1$ norm of $f$. For any $x \in \{0,1\}^n$, we use $|x|$ to denote the Hamming weight of $x$, which is the number of $1$'s in the string $x$.
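A minimal sketch of this notation (function names are ours):

```python
# Inner product, l1 norm, and Hamming weight over a finite domain X.
def inner(f, g, X):
    return sum(f(x) * g(x) for x in X)

def l1_norm(f, X):
    return sum(abs(f(x)) for x in X)

def hamming_weight(x):               # |x| = number of 1's in the bit string x
    return sum(x)

X = [0, 1, 2, 3]
assert inner(lambda x: x, lambda x: x, X) == 14      # 0 + 1 + 4 + 9
assert l1_norm(lambda x: x - 2, X) == 4              # 2 + 1 + 0 + 1
assert hamming_weight((1, 0, 1, 1)) == 3
```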
Paturi and Simon [25] showed the following equivalence between the sign-rank of a matrix and the cost of its corresponding communication game.
Theorem 2.1.
For any $f \colon \{0,1\}^n \times \{0,1\}^n \to \{-1,1\}$, let $M_f = [f(x,y)]_{x,y}$ denote its communication matrix. Then, $\mathrm{UPP}(f) = \log \mathrm{sr}(M_f) \pm O(1)$.
Let $m, n$ be positive integers such that $m$ divides $n$. Partition the set $[n]$ into $n/m$ disjoint blocks, each of size $m$. Define the set $\mathcal{V}(n, m)$ to be the collection of subsets $V \subseteq [n]$ which contain exactly one element from each block. For $x \in \{0,1\}^n$ and $V \in \mathcal{V}(n, m)$, let $x|_V = (x_{i_1}, \dots, x_{i_{n/m}})$, where $i_1 < \dots < i_{n/m}$ are the elements of $V$.
Definition 2.2 (Pattern matrix).
For any function $f \colon \{0,1\}^{n/m} \to \mathbb{R}$, the $(n, m)$-pattern matrix $F$ is defined as
$$F = \big[\, f(x|_V \oplus w) \,\big]_{x \in \{0,1\}^n, \; (V, w) \in \mathcal{V}(n,m) \times \{0,1\}^{n/m}}.$$
Note that $F$ is a $2^n \times m^{n/m} 2^{n/m}$ matrix.
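A sketch of this construction for toy parameters, following the standard pattern-matrix definition (variable names and the inner function are ours): Alice's input is split into blocks, a column is a pair $(V, w)$ selecting one position per block together with a shift, and the entry applies $f$ to the selected, shifted bits.

```python
from itertools import product

def pattern_matrix(f, t, block_size):
    """Rows: x in {0,1}^(t * block_size).  Columns: pairs (V, w), where V picks
    one position inside each of the t blocks and w in {0,1}^t is a shift.
    Entry: f applied to the selected bits of x, XORed coordinatewise with w."""
    n = t * block_size
    rows = list(product([0, 1], repeat=n))
    cols = [(V, w)
            for V in product(range(block_size), repeat=t)
            for w in product([0, 1], repeat=t)]
    return [[f(tuple(x[i * block_size + V[i]] ^ w[i] for i in range(t)))
             for (V, w) in cols]
            for x in rows]

# Toy inner function on t = 2 bits: -1 iff both selected (shifted) bits are 1.
f = lambda z: -1 if z == (1, 1) else 1
M = pattern_matrix(f, t=2, block_size=2)
assert len(M) == 2 ** 4 and len(M[0]) == (2 ** 2) * (2 ** 2)   # 16 x 16
```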
In a breakthrough result, Forster [12] proved that an upper bound on the spectral norm of a sign matrix implies a lower bound on its sign-rank. Razborov and Sherstov [26] established a generalization of Forster's theorem [12] that can be used to prove sign-rank lower bounds for pattern matrices. Specifically, we require the following result, implicit in their work [26, Theorem 1.1].
Theorem 2.3 (Implicit in [26]).
Let $f \colon \{0,1\}^t \to \{-1,1\}$ be any Boolean function, $d$ a nonnegative integer, and $\gamma, \delta > 0$ real numbers. Suppose there exists a function $\psi \colon \{0,1\}^t \to \mathbb{R}$ satisfying the following conditions.

$\psi(x) \cdot f(x) \geq 0$ for all $x \in \{0,1\}^t$, and $\|\psi\|_1 = 1$.

For all polynomials $p$ of degree at most $d$, $\langle \psi, p \rangle = 0$.

$|\psi(x)| \geq \gamma \cdot 2^{-t}$ for all but a $\delta$ fraction of inputs $x \in \{0,1\}^t$.

Then, the sign-rank of the pattern matrix of $f$ can be bounded below in terms of $d$, $\gamma$, and $\delta$.
We require the following well-known combinatorial identity.
Claim 2.4.
For every polynomial $p$ of degree less than $n$, we have $\sum_{t=0}^{n} (-1)^t \binom{n}{t} p(t) = 0$.
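This is the standard finite-difference fact: the $n$-th finite difference of a polynomial of degree less than $n$ vanishes, i.e., $\sum_{t=0}^{n} (-1)^t \binom{n}{t} p(t) = 0$. A quick numerical check:

```python
from math import comb

def alternating_binomial_sum(p, n):
    # The n-th finite difference of p at 0, up to sign:
    # sum_{t=0}^{n} (-1)^t * C(n, t) * p(t)
    return sum((-1) ** t * comb(n, t) * p(t) for t in range(n + 1))

n = 7
for p in [lambda t: 1, lambda t: t, lambda t: 3 * t ** 5 - 2 * t ** 2 + 1]:
    assert alternating_binomial_sum(p, n) == 0    # degree < n: the sum vanishes
# A polynomial of degree exactly n need not vanish: for t^n the sum is (-1)^n * n!.
assert alternating_binomial_sum(lambda t: t ** 7, n) == -5040
```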
Recall from Section 1.2 that the rational approximate degree of $f$ is the least degree of two polynomials $p$ and $q$ such that $|f(x) - p(x)/q(x)| \leq 1/3$ for all $x$ in the domain of $f$. Sherstov [31, Theorem 6.9] showed that a dual witness to the rational approximate degree of any function $f$ can be converted to a threshold degree dual witness for $f \wedge f$. Implicit in his theorem is the fact that a smooth dual witness to the rational approximate degree of $f$ can be converted to a smooth dual witness for the threshold degree of $f \wedge f$. More precisely, the following result is established by the proof of [31, Theorem 6.9]. (In Theorem 2.5, the first two functions together form a dual witness for the fact that the rational approximate degree of $f$ is large, while the resulting function is a dual witness to the threshold degree of $f \wedge f$. See Appendix B for details. However, we will not exploit this interpretation of Theorem 2.5 in our analysis.)
Theorem 2.5 (Sherstov [31]).
Let be any function. Let denote , and be any real numbers.
Suppose there exist functions that are not identically 0 and satisfy the following properties:
(1)  
(2)  
(3) 
Then there exist functions such that satisfies the following properties.
(4)  
(5)  
(6)  
(7) 
3 A Smooth Dual Witness for Majority
Our main technical contribution in this paper is captured in Theorem 3.1 below. This theorem constructs a smooth dual witness for the hardness of rationally approximating the sign function (cf. Appendix B for details of this interpretation of Theorem 3.1). We defer the proof to Section 4.
Theorem 3.1.
Let and let
be odd. There exists a function
such that
(8)

For and every ,
(9) 
If is any polynomial of degree less than , then
(10) 
For every we have
(11)
The following theorem shows how to convert the (univariate) function from Theorem 3.1 into a dual witness for the (multivariate) MAJ function.
Theorem 3.2.
Let and let be odd. Let be any function obtained in Theorem 3.1. Then, the multivariate polynomial defined by satisfies the following properties.

(12)

For and every ,
(13) for any such that .

For any polynomial of degree at most ,
(14) 
For all such that ,
(15)
Proof.
To establish Equation (3.2), observe:
where the last equality follows from Equation (3.1). Equation (13) follows directly from Equation (9) and the definition of .
To establish Equation (14), consider any polynomial of degree at most . For any permutation , define the polynomial by . Note that, since is symmetric, for all . Define . Note that is symmetric and . It is a well-known fact (cf. [24]) that can be written as a polynomial of degree at most in the variable , and so can . Hence, where the final equality holds by Equation (10).
We are now ready to derive a lower bound on the sign-rank of the pattern matrix.
Theorem 3.3.
The pattern matrix satisfies
Proof.
Let denote the function in this proof. Set and consider the function obtained via Theorem 3.1. Define the function by . Define the functions by , and . We now verify that satisfy the conditions in Theorem 2.5 for . Set , where is a constant such that .
Moreover, Equation (15) implies that for all such that , and Equation (3.2) implies . Theorem 2.5 now implies the existence of a function satisfying the following properties.

By Equation (4), . Since , this implies that

By Equation (5), for all .

We now note that the functions and obtained in Theorem 2.5 have norm at most a constant. Since , we use Equation (6) to conclude that
By Equation (7), we have
which is at most a constant, since .
Combined with the fact that is a constant, we conclude .

By a standard Chernoff bound, the number of inputs in such that is at least .
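As a generic illustration of this counting step (the deviation threshold used here is an assumption; the paper's exact parameters differ), the fraction of inputs whose Hamming weight deviates from $n/2$ by more than roughly $\sqrt{n \log n}$ can be computed exactly for small $n$ and is indeed tiny:

```python
from math import comb, log, sqrt

n = 20
radius = sqrt(n * log(n))   # generic deviation scale; the paper's constant may differ
inside = sum(comb(n, k) for k in range(n + 1) if abs(k - n / 2) <= radius)
fraction_outside = 1 - inside / 2 ** n

assert inside == 2 ** n - 422      # only weights 0, 1, 2, 18, 19, 20 fall outside
assert fraction_outside < 0.01
```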
Plugging these bounds into Theorem 2.3, we conclude that the sign-rank of the pattern matrix is bounded below as
∎
We are now ready to prove Theorem 1.1.
Proof of Theorem 1.1.
Note that the function . Consider the dual witness obtained for the threshold degree of in the previous proof. Note that the function defined by acts as a dual witness for the threshold degree of , and satisfies all the conditions in Theorem 2.3 with the same parameters as in the proof of Theorem 3.3. Proceeding in exactly the same way as in the previous proof, we conclude that the sign-rank of the pattern matrix is bounded below as
(16) 
Denote by $G$ the communication game corresponding to the pattern matrix. For completeness, we now sketch a standard UPP protocol of cost $O(\log n)$ for $G$. Note that Alice holds $n$ input bits, and Bob holds a string indicating the "relevant bits" in each block of Alice's input, together with a shift string $w$. Bob sends Alice the index of a uniformly random relevant bit, using $O(\log n)$ bits of communication. Alice responds with her value of that input bit, and Bob outputs that bit XORed with the corresponding bit of $w$. It is easy to check that this is a valid UPP protocol, and it has cost $O(\log n)$.
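The correctness of this protocol can be checked exhaustively for toy parameters (conventions and variable names below are ours): since the inner function is a majority, the queried bit agrees with the function's value for a strict majority of Bob's random choices, so the success probability always exceeds $1/2$.

```python
from fractions import Fraction
from itertools import product

def maj(z):                       # -1 ("true") iff a strict majority of bits are 1
    return -1 if sum(z) > len(z) / 2 else 1

def success_prob(x, V, w, block_size):
    t = len(V)
    selected = [x[i * block_size + V[i]] ^ w[i] for i in range(t)]
    target = maj(tuple(selected))
    # Bob samples block i uniformly and outputs -1 iff Alice's bit, XOR w[i], is 1.
    agree = sum((-1 if b == 1 else 1) == target for b in selected)
    return Fraction(agree, t)

t, block_size = 3, 2              # toy parameters; t odd, so majorities are strict
ok = all(success_prob(x, V, w, block_size) > Fraction(1, 2)
         for x in product([0, 1], repeat=t * block_size)
         for V in product(range(block_size), repeat=t)
         for w in product([0, 1], repeat=t))
assert ok
```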
Corollary 1.2 follows immediately from the previous proof and the definition of pattern matrices.
4 Proof of Theorem 3.1
The rest of this paper is dedicated to proving Theorem 3.1. Before proving the theorem, we describe the main auxiliary construction and prove some preliminary facts about it.
Let . Fix any . Define the set
Define the polynomial by
Since is odd, notice that , for , and for .
Define
The following claim tells us that for any , the function places a reasonably large mass on input .
Claim 4.1.
Proof.
We calculate
(pairing terms corresponding to and )  
∎
The next claim tells us that the mass placed by on other points in its support is small.
Claim 4.2.
For every ,
Proof.
We calculate
(pairing terms corresponding to and )  