 # Algorithms and Lower Bounds for de Morgan Formulas of Low-Communication Leaf Gates

The class $\mathsf{FORMULA}[s] \circ \mathcal{G}$ consists of Boolean functions computable by size-$s$ de Morgan formulas whose leaves are any Boolean functions from a class $\mathcal{G}$. We give lower bounds and (SAT, Learning, and PRG) algorithms for $\mathsf{FORMULA}[n^{1.99}] \circ \mathcal{G}$, for classes $\mathcal{G}$ of functions with low communication complexity. Let $R^{(k)}(\mathcal{G})$ be the maximum $k$-party NOF randomized communication complexity of $\mathcal{G}$. We show: (1) The Generalized Inner Product function $\mathrm{GIP}^k_n$ cannot be computed in $\mathsf{FORMULA}[s] \circ \mathcal{G}$ on more than a $1/2+\varepsilon$ fraction of inputs for $s = o\left( n^2 / \left(k \cdot 4^k \cdot R^{(k)}(\mathcal{G}) \cdot \log(n/\varepsilon) \cdot \log(1/\varepsilon)\right)^2 \right)$. As a corollary, we get an average-case lower bound for $\mathrm{GIP}^k_n$ against $\mathsf{FORMULA}[n^{1.99}] \circ \mathsf{PTF}^{k-1}$. (2) There is a PRG of seed length $n/2 + O\left(\sqrt{s} \cdot R^{(2)}(\mathcal{G}) \cdot \log(s/\varepsilon) \cdot \log(1/\varepsilon)\right)$ that $\varepsilon$-fools $\mathsf{FORMULA}[s] \circ \mathcal{G}$. For $\mathsf{FORMULA}[s] \circ \mathsf{LTF}$, we get the better seed length $O\left(n^{1/2} \cdot s^{1/4} \cdot \log(n) \cdot \log(n/\varepsilon)\right)$. This gives the first non-trivial PRG (with seed length $o(n)$) for intersections of $n$ half-spaces in the regime where $\varepsilon \le 1/n$. (3) There is a randomized $2^{n-t}$-time #SAT algorithm for $\mathsf{FORMULA}[s] \circ \mathcal{G}$, where $t = \Omega\left( n / \left(\sqrt{s} \cdot \log^2(s) \cdot R^{(2)}(\mathcal{G})\right) \right)^{1/2}$. In particular, this implies a nontrivial #SAT algorithm for $\mathsf{FORMULA}[n^{1.99}] \circ \mathsf{LTF}$. (4) The Minimum Circuit Size Problem is not in $\mathsf{FORMULA}[n^{1.99}] \circ \mathsf{XOR}$. On the algorithmic side, we show that $\mathsf{FORMULA}[n^{1.99}] \circ \mathsf{XOR}$ can be PAC-learned in time $2^{O(n/\log n)}$.


## 1 Introduction

A (de Morgan) Boolean formula over $\{0,1\}$-valued input variables is a binary tree whose internal nodes are labelled by AND or OR gates, and whose leaves are marked with a variable or its negation. The power of Boolean formulas has been intensively investigated since the early years of complexity theory (see, e.g., [Sub61, Nec66, Khr71, And87, PZ93, IN93, Hås98, Tal14, DM18]). The techniques underlying these complexity-theoretic results have also enabled algorithmic developments. These include learning algorithms [Rei11b, ST17], satisfiability algorithms (cf. [Tal15]), compression algorithms [CKK15], and the construction of pseudorandom generators [IMZ12] for Boolean formulas of different sizes. But despite many decades of research, the current non-trivial algorithms and lower bounds apply only to formulas of less than cubic size, and understanding larger formulas remains a major open problem in circuit complexity.

In many scenarios, however, understanding smaller formulas whose leaves are replaced by certain functions would also be very useful. Motivated by several recent works, we initiate a systematic study of the $\mathsf{FORMULA} \circ \mathcal{G}$ model, i.e., Boolean formulas whose leaves are labelled by an arbitrary function from a fixed class $\mathcal{G}$. This model unifies and generalizes a variety of models that have been previously studied in the literature:

• Oliveira, Pich, and Santhanam [OPS19] show that obtaining a refined understanding of formulas of size over parity (XOR) gates would have significant consequences in complexity theory. Note that de Morgan formulas of size can simulate such devices. Therefore, a better understanding of the $\mathsf{FORMULA} \circ \mathcal{G}$ model even when $\mathcal{G} = \mathsf{XOR}$ is necessary before we are able to analyze super-cubic size formulas. (Footnote 1: We remark that even a single layer of XOR gates can compute powerful primitives, such as error-correcting codes and hash functions.)

• Tal [Tal17] obtains almost quadratic lower bounds for the model of bipartite formulas, where there is a fixed partition of the input variables into two parts, and a formula leaf can compute an arbitrary function over either part. This model was originally investigated by Pudlák, Rödl, and Savický [PRS88], where it was referred to as graph complexity. The model is also equivalent to PSPACE-protocols in communication complexity (cf. [GPW18]).

• Abboud and Bringmann [AB18] consider formulas where the leaves are threshold gates whose input wires can be arbitrary functions applied to either the first or the second half of the input. This extension of bipartite formulas is denoted by in [AB18]. Their work establishes connections between faster -SAT algorithms, the complexity of problems in P such as Longest Common Subsequence and the Fréchet Distance Problem, and circuit lower bounds.

• Polytopes (i.e., intersections of half-spaces), which correspond to $\mathcal{G}$ being the family of linear-threshold functions and to formulas that contain only AND gates as internal gates. The construction of PRGs for this model has received significant attention in the literature (see [OST19] and references therein).

We obtain in a unified way several new results for the $\mathsf{FORMULA} \circ \mathcal{G}$ model, for natural classes $\mathcal{G}$ of functions which include parities, linear (and polynomial) threshold functions, and indeed many other functions of interest. In particular, we show that this perspective leads to stronger lower bounds, general satisfiability algorithms, and better pseudorandom generators for a broad class of functions.

### 1.1 Results

We now describe in detail our main results and how they contrast to previous works. Our techniques will be discussed in Section 1.2, while a few open problems are mentioned in Section 1.3.

We let $\mathsf{FORMULA}[s] \circ \mathcal{G}$ denote the set of Boolean functions computed by formulas containing at most $s$ leaves, where each leaf computes according to some function in $\mathcal{G}$. The set of parity functions and their negations will be denoted by $\mathsf{XOR}$.

We use the following notation for communication complexity. For a Boolean function $f \colon \{0,1\}^n \to \{0,1\}$, we let $D(f)$ be the two-party deterministic communication complexity of $f$, where each party is given an input of $n/2$ bits. Similarly, for a Boolean function $f$, we denote by $R^{(k)}_{\delta}(f)$ the communication cost of the best $k$-party number-on-forehead (NOF) communication protocol that computes $f$ with probability at least $1-\delta$ on every input, where the probability is taken over the random choices of the protocol. For simplicity, we might omit the superscript $(k)$ from $R^{(k)}_{\delta}$ when $k = 2$. One of our results will also consider $k$-party number-in-hand (NIH) protocols, and this will be clearly indicated in order to avoid confusion. We always assume a canonical partition of the input coordinates in all statements involving $k$-party communication complexity, unless stated otherwise. We generalize these definitions for a class of functions $\mathcal{G}$ in the natural way. For instance, we let $R^{(k)}_{\delta}(\mathcal{G}) = \max_{g \in \mathcal{G}} R^{(k)}_{\delta}(g)$.

Our results refer to standard notions in the literature, but in order to fix notation, Section 2 formally defines communication protocols, Boolean formulas, and other notions relevant in this work. We refer to the textbooks [KN97] and [Juk12] for more information about communication complexity and Boolean formulas, respectively. To put our results into context, here we only briefly review a few known upper bounds on the communication complexity of certain classes $\mathcal{G}$.

##### Parities (XOR) and Bipartite Formulas.

Clearly, the deterministic two-party communication complexity of any parity function is at most $2$, since to agree on the output it is enough for the players to exchange the parity of their relevant input bits. Moreover, note that the bipartite formula model discussed above precisely corresponds to formulas whose leaves are computed by a two-party protocol of communication cost at most $1$.
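The parity protocol described above can be sketched in a few lines. This is only an illustration: the function name `parity_protocol`, the 0-based indexing, and the convention that Alice holds the first block of coordinates are assumptions made for the example, not notation from the paper.

```python
from typing import Sequence, Set, Tuple


def parity_protocol(x: Sequence[int], y: Sequence[int], S: Set[int]) -> Tuple[int, int]:
    """Two-party protocol for the parity of the input bits indexed by S.

    Alice holds x (coordinates 0 .. len(x)-1) and Bob holds y
    (coordinates len(x) .. len(x)+len(y)-1).
    Returns (output_bit, number_of_bits_exchanged).
    """
    n_alice = len(x)
    # Alice announces the parity of her relevant bits (1 bit).
    alice_bit = sum(x[i] for i in S if i < n_alice) % 2
    # Bob announces the parity of his relevant bits (1 bit).
    bob_bit = sum(y[i - n_alice] for i in S if i >= n_alice) % 2
    # Both players now know both bits and can output their XOR.
    return alice_bit ^ bob_bit, 2
```

Each player sends a single bit regardless of the input length, so the total communication in this sketch is two bits.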

##### Halfspaces and Polynomial Threshold Functions (PTFs).

Recall that a halfspace, also known as a Linear Threshold Function (LTF), is a Boolean function of the form , where each and , and that a degree- PTF is its natural generalization where degree- monomials are allowed. It is known that if is a halfspace, then its randomized two-party communication complexity, namely , satisfies [Nis94]. On the other hand, if is a degree- PTF, then [Nis94, Vio15].
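To make the two gate types concrete, here is a minimal sketch of how an LTF and a PTF could be evaluated on a Boolean input. The sign convention (output 1 exactly when the weighted sum is at least the threshold), the dictionary encoding of monomials, and all function names are assumptions of this example rather than definitions taken from the paper.

```python
from math import prod
from typing import Dict, Sequence, Tuple


def ltf(weights: Sequence[float], theta: float, x: Sequence[int]) -> int:
    """Halfspace: 1 iff sum_i weights[i] * x[i] >= theta (assumed convention)."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) >= theta else 0


def ptf(coeffs: Dict[Tuple[int, ...], float], theta: float, x: Sequence[int]) -> int:
    """Polynomial threshold function.

    `coeffs` maps a monomial (a tuple of variable indices) to its real
    coefficient; the degree is the length of the longest tuple.
    """
    value = sum(c * prod(x[i] for i in mono) for mono, c in coeffs.items())
    return 1 if value >= theta else 0


def intersection_of_halfspaces(halfspaces, x: Sequence[int]) -> int:
    """AND of several halfspaces, each given as a (weights, theta) pair."""
    return int(all(ltf(w, t, x) for w, t in halfspaces))
```

For instance, `ltf([1, 1, 1], 2, x)` is the majority function on three bits, and an intersection of halfspaces is a formula with only AND gates on top of LTF leaves.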

##### Degree-d Polynomials over GF(2).

It is well known that a degree- -polynomial admits a -party deterministic protocol of communication cost under any variable partition, since in the number-on-forehead model each monomial is entirely seen by some player. In particular, the Inner Product function satisfies .

#### 1.1.1 Lower bounds

Prior to this work, the only known lower bound against $\mathsf{FORMULA} \circ \mathsf{XOR}$ or bipartite formulas was the recent result of [Tal17] showing that the Inner Product function is hard (even on average) against nearly sub-quadratic formulas. In contrast, we obtain a significantly stronger result and establish lower bounds for different Boolean functions. We define such functions next.

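##### $\mathrm{GIP}^k_n$.

The Generalized Inner Product function $\mathrm{GIP}^k_n \colon \{0,1\}^n \to \{0,1\}$ is defined as

$$\mathrm{GIP}^k_n\left(x^{(1)}, x^{(2)}, \ldots, x^{(k)}\right) = \sum_{j=1}^{n/k} \bigwedge_{i=1}^{k} x^{(i)}_j \pmod 2,$$

where $x^{(i)} \in \{0,1\}^{n/k}$ for each $i \in [k]$.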
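As a small illustration of this definition, the sketch below evaluates the Generalized Inner Product on $k$ blocks of equal length. The function name `gip` and the list-of-lists input format are choices made for the example.

```python
from typing import Sequence


def gip(blocks: Sequence[Sequence[int]]) -> int:
    """Generalized Inner Product of k equal-length 0/1 blocks.

    blocks[i][j] plays the role of the j-th bit of the i-th block.
    For each position j we take the AND over all k blocks, and the
    output is the parity of these ANDs.
    """
    block_len = len(blocks[0])
    if any(len(b) != block_len for b in blocks):
        raise ValueError("all blocks must have the same length")
    return sum(all(b[j] for b in blocks) for j in range(block_len)) % 2
```

With two blocks this is the usual Inner Product modulo 2; for example, `gip([[1, 0, 1], [1, 1, 1]])` has two positions where both bits are 1, so the output is 0.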

##### MKtP.

In the Minimum Kt Problem, where $\mathrm{Kt}$ refers to Levin’s time-bounded Kolmogorov complexity (Footnote 2: For a string $x$, $\mathrm{Kt}(x)$ denotes the minimum value $|M| + \log t$ taken over $M$ and $t$, where $M$ is a machine that prints $x$ when it computes for $t$ steps, and $|M|$ is the description length of $M$ according to a fixed universal machine.), we are given a string $x$ and a string encoding a parameter $\ell$. We accept if and only if $\mathrm{Kt}(x) \le \ell$.

##### MCSP.

In the Minimum Circuit Size Problem, we are given as input the description of a Boolean function $f$ (represented as an $n$-bit string), and a string encoding a parameter $\ell$. We accept if and only if the circuit complexity of $f$ is at most $\ell$.

###### Theorem 1 (Lower bounds).

The following unconditional lower bounds hold:

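• If $\mathrm{GIP}^k_n$ is $(1/2 + \varepsilon)$-close under the uniform distribution to a function in $\mathsf{FORMULA}[s] \circ \mathcal{G}$, then

$$s = \Omega\left( \frac{n^2}{k^2 \cdot 16^k \cdot \left(R^{(k)}_{\varepsilon/(2n^2)}(\mathcal{G}) + \log n\right)^2 \cdot \log^2(1/\varepsilon)} \right).$$

• If $\mathsf{MKtP} \in \mathsf{FORMULA}[s] \circ \mathcal{G}$, then

$$s = \tilde{\Omega}\left( \frac{n^2}{k^2 \cdot 16^k \cdot R^{(k)}_{1/3}(\mathcal{G})} \right).$$

• If $\mathsf{MCSP} \in \mathsf{FORMULA}[s] \circ \mathsf{XOR}$, then $s = \tilde{\Omega}(n^2)$, where $\tilde{\Omega}$ hides inverse $\mathrm{polylog}(n)$ factors.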

Observe that, while [Tal17] showed that the Inner Product function is hard against sub-quadratic bipartite formulas, Theorem 1 Item 1 yields lower bounds against formulas whose leaves can compute bounded-degree PTFs and -polynomials, including . PTF circuits were previously studied by Nisan [Nis94], who obtained an almost linear gate complexity lower bound against circuits with degree- PTF gates. Recently, [KKL17] gave a super-linear wire complexity lower bound for constant-depth circuits with constant-degree PTF gates. However, it was open whether we can prove lower bounds against any circuit model that can incorporate a linear number of PTF gates. In fact, it was open before this work to show a super-linear gate complexity lower bound against .

Let us now comment on the relevance of Items 2 and 3. Both $\mathsf{MKtP}$ and $\mathsf{MCSP}$ are believed to be computationally much harder than $\mathrm{GIP}^k_n$. However, it is more difficult to analyze these problems compared to $\mathrm{GIP}^k_n$ because the latter is mathematically “structured,” while the former problems do not seem to be susceptible to typical algebraic, combinatorial, and analytic techniques.

More interestingly, and play an important role in the theory of hardness magnification (see [OPS19, CJW19]). In particular, if one could show that restricted to an input parameter is not in for some , then it would follow that cannot be computed by Boolean formulas of size , where is arbitrary. Theorem 1 makes partial progress on this direction by establishing the first lower bounds for these problems in the model. (We note that the proof of Theorem 1 Item 3 requires instances where the parameter is .)

#### 1.1.2 Pseudorandom generators

We also get pseudorandom generators (PRGs) against $\mathsf{FORMULA}[s] \circ \mathcal{G}$ for various classes of functions $\mathcal{G}$. Recall that a PRG against a class of functions $\mathcal{C}$ is a function $G$ mapping short Boolean strings (seeds) to longer Boolean strings, so that every function in $\mathcal{C}$ accepts $G$’s output on a uniformly random seed with about the same probability as that for an actual uniformly random string. More formally, $G \colon \{0,1\}^{\ell} \to \{0,1\}^n$ is a PRG that $\varepsilon$-fools $\mathcal{C}$ if for every Boolean function $h \colon \{0,1\}^n \to \{0,1\}$ in $\mathcal{C}$, we have

$$\left| \Pr_{z \in \{0,1\}^{\ell}}[h(G(z)) = 1] - \Pr_{x \in \{0,1\}^{n}}[h(x) = 1] \right| \le \varepsilon.$$

Furthermore, we require $G$ to run in deterministic time $\mathrm{poly}(n)$ on an input string $z \in \{0,1\}^{\ell}$. The parameter $\ell$ is called the seed length of the PRG and is the main quantity to be minimized when constructing PRGs.
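For very small parameters, the fooling condition can be checked by brute force. The sketch below computes the left-hand side of the inequality exactly by enumerating all seeds and all inputs; the names `fooling_error` and `toy_generator` are hypothetical, and the toy generator is only a test object, not a construction from the paper.

```python
from itertools import product
from typing import Callable, Tuple

Bits = Tuple[int, ...]


def fooling_error(h: Callable[[Bits], int], G: Callable[[Bits], Bits],
                  seed_len: int, n: int) -> float:
    """Exact value of |Pr_z[h(G(z)) = 1] - Pr_x[h(x) = 1]| by enumeration."""
    p_seed = sum(h(G(z)) for z in product((0, 1), repeat=seed_len)) / 2 ** seed_len
    p_unif = sum(h(x) for x in product((0, 1), repeat=n)) / 2 ** n
    return abs(p_seed - p_unif)


def toy_generator(z: Bits) -> Bits:
    """Hypothetical generator: append the parity of the seed to the seed."""
    return z + (sum(z) % 2,)
```

The toy generator stretches 3 bits to 4 bits. It fools the function that reads only the first bit with error 0, but the parity of all four output bits is always 0, so it is distinguished from uniform by the full parity function with error 1/2. The enumeration takes time exponential in `n`, so this is only a sanity check for tiny cases.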

There exists a PRG that fools formulas of size and that has a seed of length  [IMZ12]. In particular, there are non-trivial PRGs for -variate formulas of size nearly . Unfortunately, such PRGs cannot be used to fool even linear size formulas over parity functions, since the naive simulation of these enhanced formulas by standard Boolean formulas requires size . Moreover, it is not hard to see that this simulation is optimal: Andreev’s function, which is hard against formulas of nearly cubic size (cf. [Hås98]), can be easily computed in . Given that a crucial idea in the construction of the PRG in [IMZ12] (shrinkage under restrictions) comes from this lower bound proof, new techniques are needed in order to approach the problem in the model.

More generally, extending a computational model for which strong PRGs are known to allow parities at the bottom layer can cause significant difficulties. A well-known example is circuits and their extension to -. While the former class admits PRGs of poly-logarithmic seed length (see e.g. [ST19]), the most efficient PRG construction for the latter has seed length [FSUV13]. Consequently, designing PRGs of seed length can already be a challenge. We are not aware of previous results on PRGs for for any non-trivial class .

By combining ideas from circuit complexity and communication complexity, we construct PRGs of various seed lengths for $\mathsf{FORMULA} \circ \mathcal{G}$, where $\mathcal{G}$ ranges from the class of parity functions to the much larger class of functions of bounded randomized $k$-party communication complexity.

###### Theorem 2 (Pseudorandom generators).

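Let $\mathcal{G}$ be a class of $n$-bit functions. Then,

• In the context of parity functions, there is a PRG that $\varepsilon$-fools $\mathsf{FORMULA}[s] \circ \mathsf{XOR}$ of seed length

$$\ell = O\left(\sqrt{s} \cdot \log(s) \cdot \log(1/\varepsilon) + \log(n)\right).$$

• In the context of two-party randomized communication complexity, there is a PRG that $\varepsilon$-fools $\mathsf{FORMULA}[s] \circ \mathcal{G}$ of seed length

$$\ell = n/2 + O\left(\sqrt{s} \cdot \left(R^{(2)}_{\varepsilon/(6s)}(\mathcal{G}) + \log(s)\right) \cdot \log(1/\varepsilon)\right).$$

More generally, for every $k$, let $\mathcal{G}$ be the class of functions that have $k$-party number-in-hand (NIH) $(\varepsilon/(6s))$-error randomized communication protocols of cost at most $R^{(k\text{-NIH})}_{\varepsilon/(6s)}$. There exists a PRG that $\varepsilon$-fools $\mathsf{FORMULA}[s] \circ \mathcal{G}$ with seed length

$$\ell = n/k + O\left(\sqrt{s} \cdot \left(R^{(k\text{-NIH})}_{\varepsilon/(6s)} + \log(s)\right) \cdot \log(1/\varepsilon) + \log(k)\right) \cdot \log(k).$$

• In the setting of $k$-party NOF randomized communication complexity, there is a PRG that $\varepsilon$-fools $\mathsf{FORMULA}[s] \circ \mathcal{G}$ of seed length

$$\ell = n - \frac{n}{O\left(\sqrt{s} \cdot k \cdot 4^k \cdot \left(R^{(k)}_{\varepsilon/(2s)}(\mathcal{G}) + \log(n)\right) \cdot \log(n/\varepsilon)\right)}.$$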

A few comments are in order. Under a standard connection between PRGs and lower bounds (see e.g. [Kab02]), improving the dependence on $s$ in the seed length for $\mathsf{FORMULA}[s] \circ \mathsf{XOR}$ (Theorem 2 Item 1) would require the proof of super-quadratic lower bounds against $\mathsf{FORMULA} \circ \mathsf{XOR}$. We discuss this problem in more detail in Section 1.3. Note that the additive term $n/2$ is necessary in Theorem 2 Item 2, since the model computes in particular every Boolean function on the first $n/2$ input variables (i.e. a protocol of communication cost $1$). Similarly, in Theorem 2 Item 3. Removing the exponential dependence on $k$ would also require advances in state-of-the-art lower bounds for multiparty communication complexity.

Theorem 2 Item 2 has an interesting implication for fooling a well-studied class of functions: intersections of halfspaces. (Footnote 3: Clearly, the intersection of $m$ functions can be computed by an enhanced formula of size $m$.) Note that an intersection of halfspaces is precisely a polytope, or equivalently, the set of solutions of a $\{0,1\}$-integer linear program. Such objects have found applications in many fields, including optimization and high-dimensional geometry. After a long sequence of works on the construction of PRGs for bounded-weight halfspaces, (unrestricted) halfspaces, and generalizations of these classes (Footnote 4: We refer to the recent reference [OST19] for an extensive review of the literature in this area.), the following results are known for the intersection of $m$ halfspaces over $n$ input variables. Gopalan, O’Donnell, Wu, and Zuckerman [GOWZ10] gave a PRG for this class for error $\varepsilon$ with seed length

$$O\left(\left(m \cdot \log(m/\varepsilon) + \log n\right) \cdot \log(m/\varepsilon)\right).$$

Note that the seed length of their PRG becomes trivial if the number of halfspaces is linear in $n$. More recently, O’Donnell, Servedio and Tan [OST19] constructed a PRG with seed length

$$\mathrm{poly}(\log(m), 1/\varepsilon) \cdot \log(n).$$

Their PRG has a much better dependence on $m$, but it cannot be used in the small error regime. For example, the seed length becomes trivial if $\varepsilon \le 1/n$. In particular, before this work it was open to construct a non-trivial PRG for the following natural setting of parameters (cf. [OST19, Section 1.2]): intersection of $n$ halfspaces with error $\varepsilon \le 1/n$.

We obtain the following consequence of Theorem 2 Item 2, which follows from a result of Viola [Vio15] on the $k$-party number-in-hand randomized communication complexity of a halfspace.

###### Corollary 3 (Fooling intersections of halfspaces in the low-error regime).

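For every $n, m \in \mathbb{N}$ and $\varepsilon > 0$, there is a pseudorandom generator with seed length

$$O\left(n^{1/2} \cdot m^{1/4} \cdot \log(n) \cdot \log(n/\varepsilon)\right)$$

that $\varepsilon$-fools the class of intersections of $m$ halfspaces over $\{0,1\}^n$.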
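As a sanity check on the parameters, the following worked calculation instantiates the seed length of Corollary 3 for $m = n$ halfspaces and error $\varepsilon = 1/n$, the low-error setting highlighted in the abstract. Constants hidden by the $O(\cdot)$ notation are not tracked.

$$\begin{aligned} n^{1/2} \cdot m^{1/4} \cdot \log(n) \cdot \log(n/\varepsilon) &= n^{1/2} \cdot n^{1/4} \cdot \log(n) \cdot \log(n^2) \\ &= 2\, n^{3/4} \cdot \log^2(n) \\ &= o(n). \end{aligned}$$

Under these assumptions the seed length is sublinear in $n$, which is what makes the generator non-trivial in this regime.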

We note that the PRG from Theorem 2 Item 3 can fool, even in the exponentially small error regime, not only intersections of halfspaces, but also small formulas over bounded-degree PTFs.

Finally, Theorem 2 Item 2 yields the first non-trivial PRG for formulas over symmetric functions. Let $\mathsf{SYM}$ denote the class of symmetric Boolean functions on any number of input variables.

###### Corollary 4 (Fooling sub-quadratic formulas over symmetric gates).

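For every $n, s \in \mathbb{N}$ and $\varepsilon > 0$, there is a pseudorandom generator with seed length

$$O\left(n^{1/2} \cdot s^{1/4} \cdot \log(n) \cdot \log(1/\varepsilon)\right)$$

that $\varepsilon$-fools $n$-variate Boolean functions in $\mathsf{FORMULA}[s] \circ \mathsf{SYM}$.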

Prior to this work, Chen and Wang [CW19] proved that the number of satisfying assignments of an -variate formula of size over symmetric gates can be approximately counted to an additive error term in deterministic time , where is an arbitrary constant. While their upper bound is achieved by a white-box algorithm, Corollary 4 provides a (black-box) PRG for the same task.

#### 1.1.3 Satisfiability algorithms

In the #SAT problem for a computational model $\mathcal{C}$, we are given as input the description of a computational device $C$ from $\mathcal{C}$, and the goal is to count the number of satisfying assignments for $C$. This generalizes the SAT problem for $\mathcal{C}$, where it is sufficient to decide whether $C$ is satisfiable by some assignment.
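For reference, the trivial baseline that any non-trivial algorithm must beat is exhaustive enumeration over all $2^n$ assignments. The sketch below counts satisfying assignments of a formula whose leaves are arbitrary Boolean functions; the nested-tuple encoding (`("AND", left, right)`, `("OR", left, right)`, `("LEAF", g)`) and the function names are assumptions made for this example.

```python
from itertools import product


def evaluate(formula, x):
    """Evaluate a formula with AND/OR internal gates and arbitrary leaf gates.

    A formula is ("LEAF", g) with g a callable on the full input x,
    or (op, left, right) with op in {"AND", "OR"}.
    """
    op = formula[0]
    if op == "LEAF":
        return int(bool(formula[1](x)))
    left = evaluate(formula[1], x)
    right = evaluate(formula[2], x)
    if op == "AND":
        return left & right
    if op == "OR":
        return left | right
    raise ValueError(f"unknown gate: {op}")


def count_sat(formula, n):
    """Brute-force #SAT: enumerate all 2^n assignments."""
    return sum(evaluate(formula, x) for x in product((0, 1), repeat=n))
```

As a usage example, the AND of the two parity leaves $x_0 \oplus x_1$ and $x_2 \oplus x_3$ has exactly 4 satisfying assignments among the 16 inputs on four variables.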

In this section, we show that #SAT algorithms can be designed for a broad class of functions. We consider the $\mathsf{FORMULA} \circ \mathcal{G}$ model for classes $\mathcal{G}$ that admit two-party communication protocols of bounded cost. We establish a general result in this context which can be used to obtain algorithms for previously studied classes of Boolean circuits.

To put our SAT algorithms for into context, we first mention relevant related work on the satisfiability of Boolean formulas. Recall that in the very restricted setting of CNF formulas, known algorithms run (in the worst-case) in time when the input formulas can have a super-linear number of clauses (cf. [DH09]). On the other hand, for the class of general formulas, there is a better-than-brute-force algorithm for formulas of size almost . In more detail, for any , there is a deterministic SAT algorithm for that runs in time [Tal15]. No results are known for formulas of cubic size and beyond, and for the reasons explained in Section 1.1.2, the algorithm from [Tal15] cannot even be applied to .

Before stating our results, we discuss the input encoding in the SAT problem for . The top formula is represented in some canonical way, while for each leaf of , the input string contains the description of a protocol computing a function in . Our results are robust to the encoding employed for . Recall that a protocol for a two-party function is specified by a protocol tree and a sequence of functions, where each function is associated with some internal node of the tree and depends on input bits. Since a protocol of communication cost has a protocol tree containing at most nodes, it can be specified by a string of length . Our algorithms will run in time closer to , and using a fully explicit input representation for the protocols is not an issue. Another possibility for the input representation is to use “computational efficient” protocols. Informally, the next bit messages of such protocols can be computed in polynomial time from the current transcript of the protocol and a player input. An advantage of this representation is that an input to our SAT problem can be succinctly represented. We observe that these input representations can be generalized to randomized two-party protocols in natural ways. We refer to Section 2 for a formal presentation.

We obtain non-trivial satisfiability algorithms assuming upper bounds on the two-party deterministic and randomized communication complexities of functions in $\mathcal{G}$.

###### Theorem 5 (Satisfiability algorithms).

The following results hold.

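• There is a deterministic #SAT algorithm for $\mathsf{FORMULA}[s] \circ \mathcal{G}$ that runs in time

$$2^{n-t}, \quad \text{where } t = \Omega\left(\frac{n}{\sqrt{s} \cdot \log^2(s) \cdot D(\mathcal{G})}\right).$$

• There is a randomized #SAT algorithm for $\mathsf{FORMULA}[s] \circ \mathcal{G}$ that runs in time

$$2^{n-t}, \quad \text{where } t = \Omega\left(\frac{n}{\sqrt{s} \cdot \log^2(s) \cdot R_{1/3}(\mathcal{G})}\right)^{1/2}.$$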

Theorem 5 readily provides algorithms for many circuit classes. For instance, since one can effectively describe a randomized communication protocol for linear threshold functions [Nis94, Vio15], the algorithm from Theorem 5 Item 2 can be used to count the number of satisfying assignments of Boolean devices from $\mathsf{FORMULA}[s] \circ \mathsf{LTF}$.

###### Corollary 6 (#SAT algorithm for formulas of linear threshold functions).

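There is a randomized #SAT algorithm for $\mathsf{FORMULA}[s] \circ \mathsf{LTF}$ that runs in time

$$2^{n-t}, \quad \text{where } t = \Omega\left(\frac{n}{\sqrt{s} \cdot \log^2(s) \cdot \log(n)}\right)^{1/2}.$$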
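To see why this running time is non-trivial for sub-quadratic formulas, one can instantiate the bound of Corollary 6 with $s = n^{1.99}$, the size used in the abstract. The calculation below is a rough sketch that ignores the constant hidden in the $\Omega(\cdot)$.

$$\begin{aligned} \frac{n}{\sqrt{s} \cdot \log^2(s) \cdot \log(n)} &= \frac{n}{n^{0.995} \cdot (1.99)^2 \log^2(n) \cdot \log(n)} = \frac{n^{0.005}}{(1.99)^2 \cdot \log^3(n)}, \\ t &= \Omega\left(\frac{n^{0.0025}}{\log^{3/2}(n)}\right). \end{aligned}$$

Since this $t$ eventually exceeds any constant multiple of $\log n$, the savings factor $2^{t}$ over exhaustive search is super-polynomial for large enough $n$.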

In connection with Corollary 6, prior to this work essentially two lines of research have been pursued. SAT and/or #SAT algorithms were known for bounded-depth circuits of almost-linear size whose gates can compute LTFs or sparse PTFs (see [KL18] and references therein), and for sub-exponential size circuits with two layers of LTFs at the bottom, assuming a sub-quadratic number of them in the layer next to the input variables (see [ACW16] for this result and further related work). Corollary 6 seems to provide the first non-trivial #SAT algorithm that operates with unbounded-depth Boolean devices containing a layer with a sub-quadratic number of LTFs.

Theorem 5 can be seen as a generalization of several approaches to designing SAT algorithms appearing in the literature, which often employ ad-hoc constructions to convert bottlenecks in the computation of devices from a class $\mathcal{C}$ into non-trivial SAT algorithms for $\mathcal{C}$. We observe that, before this work, [PW10] had made a connection between faster SAT algorithms for CNFs and the 3-party communication complexity of a specific function. Their setting is different though: it seems to work only for CNFs, and they rely on conjectured upper bounds on the communication complexity of a particular problem. More recently, [CW19] employed quantum communication protocols to design approximate counting algorithms for several problems. (Footnote 5: Recall that approximately counting satisfying assignments is substantially easier than solving #SAT, for which the fastest known algorithms run in time .) In comparison to previous works, to our knowledge Theorem 5 is the first unconditional result that yields faster #SAT algorithms via communication complexity in a generic way. (Footnote 6: It has been brought to our attention that Avishay Tal has independently discovered a SAT algorithm for bipartite formulas of sub-quadratic size (see the discussion in [AB18, Page 7]), which corresponds to a particular case of Theorem 5.)

#### 1.1.4 Learning algorithms

We describe a learning algorithm for the class $\mathsf{FORMULA} \circ \mathsf{XOR}$ in Leslie Valiant’s challenging PAC-learning model [Val84]. Recall that a (PAC) learning algorithm for a class of functions $\mathcal{C}$ has access to labelled examples $(x, f(x))$ from an unknown function $f \in \mathcal{C}$, where $x$ is sampled according to some (also unknown) distribution $D$. The goal of the learner is to output, with high probability over its internal randomness and over the choice of random examples (measured by a confidence parameter $\delta$), a hypothesis $h$ that is close to $f$ under $D$ (measured by an error parameter $\varepsilon$). We refer to [KV94] for more information about this learning model, and to Section 2 for its standard formalization.

It is known that formulas of size can be PAC-learned in time [Rei11b]. Therefore, formulas of almost quadratic size can be non-trivially learned from random samples of an arbitrary distribution. A bit more formally, we say that a learning algorithm is non-trivial if it runs in time , i.e., noticeably faster than the trivial brute-force algorithm that takes time . Obtaining non-trivial learning algorithms for various circuit classes is closely connected to the problem of proving explicit lower bounds against the class [OS17] (see also [ST17] for a systematic investigation of such algorithms). We are not aware of the existence of non-trivial learning algorithms for super-quadratic size formulas. However, it seems likely that such algorithms exist at least for formulas of near cubic size. As explained in Section 1.1.2, this would still be insufficient for the learnability of classes such as (linear size) .

We explore structural properties of $\mathsf{FORMULA} \circ \mathsf{XOR}$ employed in previous results and boosting techniques from learning theory to show that sub-quadratic size devices from this class can be PAC-learned in time $2^{O(n/\log n)}$.

###### Theorem 7 (PAC-learning FORMULA∘XOR in sub-exponential time).

For every constant , there is an algorithm that PAC learns the class of -variate Boolean functions to accuracy and with confidence in time .

Note that a sub-exponential running time cannot be achieved for when we consider the communication complexity of . Again, the class is too large, for the same reason discussed in Section 1.1.2. It might still be possible to design a non-trivial learning algorithm in this case, but this would possibly require the introduction of new lower bound techniques for .

In contrast to the algorithm mentioned above that learns (standard) formulas of size in time , the algorithm from Theorem 7 does not learn smaller formulas over parities in time faster than . We discuss this in more detail in Sections 1.2 and 1.3.

Finally, we mention a connection to cryptography that provides a conditional upper bound on the size of circuits that can be learned in time . It is well known that if a circuit class can compute pseudorandom functions (or some variants of this notion), then it cannot be learned in various learning models (see e.g. [KV94]). It has been recently conjectured that depth-two circuits of linear size can compute weak pseudorandom functions of exponential security [BIP18, Conjecture 3.7]. If this conjecture holds, then such circuits cannot be learned in time . Since gates over a linear number of input wires can be simulated by formulas of size at most [Ser17], under this cryptographic assumption it is not possible to learn in time , even if the learner only needs to succeed under the uniform distribution.

### 1.2 Techniques

In order to explain our techniques, we focus for the most part on the design of PRGs for $\mathsf{FORMULA}[s] \circ \mathcal{G}$ when $\mathcal{G}$ is of bounded two-party randomized communication complexity (a particular case of Theorem 2 Item 2). This proof makes use of various ingredients employed in other results. After sketching this argument, we say a few words about our strongest lower bound (Theorem 1 Item 1) and the satisfiability and learning algorithms (Theorems 5 and 7, respectively).

We build on a powerful result showing that any small de Morgan formula can be approximated pointwise by a low-degree polynomial:

(A) For every formula $F$ of size $s$, there is a polynomial $p$ of degree $O(\sqrt{s})$ such that $|p(x) - F(x)| \le 1/3$ on every input $x$.

The only known proof of this result [Rei11b] relies on a sequence of works [BBC01, LLS06, HLS07, FGG08, Rei09, ACR10, RS12] on quantum query complexity, generalizing Grover’s search algorithm for the OR predicate [Gro96] to arbitrary formulas. The starting point of many of our results is a consequence of (A) which is implicit in the work of Tal [Tal17].

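(B) Let $D$ be a distribution over $\{0,1\}^n$, and $F \in \mathsf{FORMULA}[s] \circ \mathcal{G}$. Then, for every function $f \colon \{0,1\}^n \to \{0,1\}$,

$$\text{if } \Pr_{x \sim D}[F(x) = f(x)] \ge 1/2 + \varepsilon \quad \text{then} \quad \Pr_{x \sim D}[h(x) = f(x)] \ge 1/2 + \exp(-t)$$

for some function $h$ which is the XOR of at most $t$ functions in $\mathcal{G}$, where $t = O(\sqrt{s} \cdot \log(1/\varepsilon))$.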

Intuitively, if we could understand well enough the XOR of any small collection of functions in , then we can translate this into results for , as long as . We adapt the techniques behind (B) to provide a general approach to constructing PRGs against :

Main PRG Lemma. In order for a distribution to -fool the class , it is enough for it to -fool the class , where .

Recall that, in Theorem 2 Item 2, we consider a class of functions that admit two-party randomized protocols of cost . It is easy to see that the XOR of any functions from is a function that can be computed by a protocol of cost at most . Thus the lemma above shows that it is sufficient to fool, to exponentially small error, a class of functions of bounded two-party randomized communication complexity. Moreover, since a randomized protocol can be written as a convex combination of deterministic protocols, it is possible to prove that fooling functions of bounded deterministic communication complexity is enough.

Pseudorandom generators in the two-party communication model have been known since [INW94]. Their construction exploits that the Boolean matrix associated with a function of small communication cost can be partitioned into a not too large number of monochromatic rectangles. We provide in Appendix A.2 a slightly modified and self-contained construction based on explicit extractors. It achieves the following parameters: There is an explicit PRG that -fools any -bit function of two-party communication cost and that has seed length . This PRG has non-trivial seed length even when the error is exponentially small, as required by our techniques. One issue here is that the INW PRG was only shown to fool functions with low deterministic communication complexity. To obtain our PRGs for when admits low-cost randomized protocols, we first extend the analysis of the INW PRG to show that it also fools functions with low randomized communication complexity. Combining this construction with the aforementioned discussion completes the proof of Theorem 2 Item 2.

The argument just sketched reduces the construction of PRGs for $\mathsf{FORMULA} \circ \mathcal{G}$ when functions in $\mathcal{G}$ admit low-cost randomized protocols to the analysis of PRGs for functions that admit relatively low-cost deterministic protocols. Our lower bound proof for $\mathrm{GIP}^k_n$ in Theorem 1 Item 1 proceeds in a similar fashion. We combine statement (B) described above with other ideas to show:

Transfer Lemma (Informal). If a function correlates with some small formula whose leaf gates have low-cost randomized $k$-party protocols, then it also non-trivially correlates with some function that has relatively low-cost deterministic $k$-party protocols.

Given this result, we are able to rely on a strong average-case lower bound for $\mathrm{GIP}^k_n$ against $k$-party deterministic protocols from [BNS92] to conclude that $\mathrm{GIP}^k_n$ is hard for $\mathsf{FORMULA} \circ \mathcal{G}$.

Our #SAT algorithms combine the polynomial representation of the top formula provided by (A), for which we show that such a polynomial can be obtained explicitly, with a decomposition of the Boolean matrix at each leaf that is induced by a corresponding low-cost randomized or deterministic two-party protocol. A careful combination of these two representations allows us to adapt a standard technique employed in the design of non-trivial SAT algorithms (fast rectangular matrix multiplication) to obtain non-trivial savings in the running time.

Finally, our learning algorithm for $\mathsf{FORMULA} \circ \mathsf{XOR}$ is a consequence of statement (B) above coupled with standard tools from learning theory. In a bit more detail, since a parity of parities is just another parity function, (B) implies that, under any distribution, every function in $\mathsf{FORMULA} \circ \mathsf{XOR}$ of sub-quadratic size is weakly correlated with some parity function. Using the agnostic learning algorithm for parity functions of [KMV08], it is possible to weakly learn this class in time $2^{O(n/\log n)}$. This weak learner can then be transformed into a (strong) PAC learner using standard boosting techniques [Fre90], with only a polynomial blow-up over its running time.
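The observation that a parity of parities is again a parity can be checked directly: XORing parity functions corresponds to taking the symmetric difference of their index sets. The sketch below illustrates this; the names `parity` and `xor_of_parities` and the set-based encoding are assumptions of the example.

```python
from functools import reduce
from typing import FrozenSet, Iterable, Sequence


def parity(S: Iterable[int], x: Sequence[int]) -> int:
    """Parity of the bits of x indexed by S."""
    return sum(x[i] for i in S) % 2


def xor_of_parities(index_sets: Iterable[Iterable[int]]) -> FrozenSet[int]:
    """Index set of the single parity equal to the XOR of the given parities.

    An index that appears in an even number of the sets cancels out,
    so the result is the iterated symmetric difference.
    """
    return reduce(lambda acc, S: acc ^ frozenset(S), index_sets, frozenset())
```

For example, the XOR of the parities on $\{0,1\}$, $\{1,2\}$ and $\{2,3\}$ is the parity on $\{0,3\}$, because indices 1 and 2 each appear twice.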

### 1.3 Concluding remarks

The main message of our results is that the computational power of a subquadratic-size top formula is not significantly enhanced by leaf gates of low communication complexity. We believe that the idea of decomposing a Boolean device into a computational part and a layer of communication protocols will find further applications in lower bound proofs and algorithm design.

One of our main open problems is to discover a method that can analyze when . For instance, is it possible to adapt existing techniques to show an explicit lower bound against , or achieving this is just as hard as breaking the cubic barrier for formula lower bounds? Results in this direction would be interesting even for .

Finally, we would like to mention a few questions connected to our results and their applications. Is it possible to combine the techniques behind Corollary 3 and [OST19] to design a PRG of seed length and error for the intersection of halfspaces? Can we design a satisfiability algorithm for formulas over -party number-on-forehead communication protocols? Is it possible to learn in time ? (The learning algorithm for formulas from [Rei11b] relies on techniques from [KKMS08], and it is unclear how to extend them to the case of .)

### 1.4 Organization

Theorem 1 Item 1 is proved in Section 3, while Items 2 and 3 rely on our PRG constructions and are deferred to Section 4. The latter describes a general approach to constructing PRGs for $\mathrm{FORMULA} \circ \mathcal{G}$. It includes the proof of Theorem 2 and other applications. Our satisfiability algorithms (Theorem 5) appear in Section 5. Finally, Section 6 discusses learning results for $\mathrm{FORMULA} \circ \mathrm{XOR}$ and contains a proof of Theorem 7.

## 2 Preliminaries

### 2.1 Notation

Let ; we denote by , and denote by the uniform distribution over . We use (and ) to hide polylogarithmic factors. That is, for any , we have that .

In this paper, we will mainly use as the Boolean basis. In some parts of this paper, we will use the basis for the simplicity of the presentation. This will be specified in corresponding sections.

### 2.2 De Morgan formulas and extensions

###### Definition 8.

An $n$-variate de Morgan formula is a directed rooted tree; its non-leaf vertices (henceforth, internal gates) take labels from $\{\mathrm{AND}, \mathrm{OR}, \mathrm{NOT}\}$ and its leaves (henceforth, variable gates) take labels from the set of variables $\{x_1, \dots, x_n\}$. Each internal gate has bounded in-degree (henceforth, fan-in); the NOT gate in particular has fan-in $1$ and every variable gate has fan-in $0$. The size of a de Morgan formula is the number of its leaf gates.

In this work, we denote by $\mathrm{FORMULA}[s]$ the class of Boolean functions computable by size-$s$ de Morgan formulas. Let $\mathcal{G}$ denote some class of Boolean functions; then, we denote by $\mathrm{FORMULA}[s] \circ \mathcal{G}$ the class of functions computable by some size-$s$ de Morgan formula whose leaves are labelled by functions in $\mathcal{G}$.

### 2.3 Approximating polynomials

###### Definition 9 (Point-wise approximation).

For a Boolean function $f$, we say that the function $\tilde{f}$ $\varepsilon$-approximates $f$ if for every input $z$,

$$\left| f(z) - \tilde{f}(z) \right| \le \varepsilon.$$
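As a small illustration of point-wise approximation, the following sketch checks the definition by brute force over all inputs. It assumes the $\{0,1\}$ basis, and the function names and the particular degree-1 polynomial are hypothetical examples chosen for illustration, not objects from the paper.

```python
from itertools import product

def max_pointwise_error(f, f_tilde, n):
    """Largest value of |f(z) - f_tilde(z)| over all z in {0,1}^n."""
    return max(abs(f(z) - f_tilde(z)) for z in product((0, 1), repeat=n))

def approximates(f, f_tilde, n, eps):
    """True if f_tilde eps-approximates f point-wise on {0,1}^n."""
    return max_pointwise_error(f, f_tilde, n) <= eps

def and2(z):
    """AND of two bits."""
    return z[0] & z[1]

def and2_exact_poly(z):
    """Degree-2 polynomial that represents AND exactly."""
    return z[0] * z[1]

def and2_linear(z):
    """Degree-1 polynomial that is within 1/4 of AND on every input."""
    return 0.5 * (z[0] + z[1]) - 0.25

exact_error = max_pointwise_error(and2, and2_exact_poly, 2)
linear_error = max_pointwise_error(and2, and2_linear, 2)
```

The example shows the trade-off that approximating polynomials exploit: allowing a constant point-wise error can lower the degree needed to represent the function.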

We will need the following powerful result for the approximating degree of de Morgan formulas.

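###### Theorem 10.

Let $s > 0$ be an integer and $0 < \varepsilon < 1$. Any de Morgan formula $F$ of size $s$ has an $\varepsilon$-approximating polynomial of degree $D = O(\sqrt{s} \cdot \log(1/\varepsilon))$. That is, there exists a degree-$D$ polynomial $p$ over the reals such that for every input $z$,

$$\left| p(z) - F(z) \right| \le \varepsilon.$$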

Note that Theorem 10 still holds if we use as the Boolean basis.

### 2.4 Communication complexity

We use standard definitions from communication complexity and refer to [KN97] for details. In this paper we consider the standard two-party model of Yao and its generalizations to the multiparty setting. We denote the deterministic communication complexity of a Boolean function $f$ by $D(f)$ in the two-party setting.

###### Definition 11.

Let be a Boolean function. The communication matrix of , namely , is a matrix defined by .

###### Definition 12.

A rectangle is a set of the form , for . A monochromatic rectangle is a rectangle such that for all pairs the value is the same.

###### Lemma 13.

Let $\Pi$ be a protocol that computes $f$ with at most $c$ bits of communication. Then, $\Pi$ induces a partition of the communication matrix of $f$ into at most $2^c$ monochromatic rectangles.

Given a protocol, its transcript is the sequence of bits communicated.

###### Lemma 14.

For every transcript of some communication protocol, the set of inputs that generate it is a rectangle.
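The two lemmas above can be verified by brute force for a toy protocol. The sketch below is an assumption-laden illustration: the function (equality on 2-bit strings) and the 3-bit protocol in which Alice sends her whole input and Bob announces the answer are chosen only for this example.

```python
from itertools import product
from collections import defaultdict

# Toy setting (assumed for illustration): Alice and Bob each hold a 2-bit string.
X = list(product((0, 1), repeat=2))
Y = list(product((0, 1), repeat=2))

def f(x, y):
    """Equality function on the two inputs."""
    return int(x == y)

def transcript(x, y):
    """Alice sends both bits of x, then Bob sends the answer: 3 bits in total."""
    return (x[0], x[1], f(x, y))

def is_rectangle(pairs):
    """True if the set of (x, y) pairs equals A x B for some sets A and B."""
    A = {x for x, _ in pairs}
    B = {y for _, y in pairs}
    return set(pairs) == {(x, y) for x in A for y in B}

# Group all input pairs by the transcript they generate.
groups = defaultdict(list)
for x, y in product(X, Y):
    groups[transcript(x, y)].append((x, y))

c = 3  # number of bits communicated by the toy protocol
all_rectangles = all(is_rectangle(p) for p in groups.values())
all_monochromatic = all(len({f(x, y) for x, y in p}) == 1 for p in groups.values())
```

In this instance every transcript class has the form $\{x\} \times \{x\}$ or $\{x\} \times (Y \setminus \{x\})$, so the protocol partitions the input space into exactly $2^c = 8$ monochromatic rectangles.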

Below, we recount the definitions of two multiparty communication models used in this work, namely the number-on-forehead and the number-in-hand models.

###### Definition 15 (“Number-on-forehead” communication model; informal).

In the -party “number-on-forehead” communication model, there are players and strings and player gets all the strings except for . The players are interested in computing a value , where is some fixed function. We denote by the number of bits that must be exchanged by the best possible number on forehead protocol solving .

We also use the following weaker communication model.

###### Definition 16 (“Number-in-hand” communication model; informal).

In the -party “number-in-hand” communication model, there are players and strings and player gets only . The players are interested in computing a value , where is some fixed function. We denote by the number of bits that must be exchanged by the best possible communication protocol.

Note that , for any -variate Boolean function , as if players write on the blackboard their string, then the player that did not reveal her input may compute on her own and then publish it.
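The trivial protocol described in the note above can be sketched as follows. This is an illustration only: it assumes $\{0,1\}$-valued bits and the standard definition of generalized inner product (the XOR over coordinates of the AND across the $k$ strings), which is not restated in this section, and all helper names are hypothetical.

```python
def gip(strings):
    """Generalized inner product: XOR over coordinates of the AND of the k strings."""
    m = len(strings[0])
    return sum(all(s[i] for s in strings) for i in range(m)) % 2

def nof_view(strings, i):
    """Number-on-forehead: player i sees every string except its own."""
    return [s for j, s in enumerate(strings) if j != i]

def nih_view(strings, i):
    """Number-in-hand: player i sees only its own string."""
    return strings[i]

def trivial_nih_protocol(strings, f):
    """All players but the last write their strings; the last computes f and publishes it.

    Returns the computed value and the number of bits written on the blackboard.
    """
    k = len(strings)
    blackboard = [nih_view(strings, i) for i in range(k - 1)]
    # The last player combines the blackboard with its own string.
    answer = f(blackboard + [nih_view(strings, k - 1)])
    bits = sum(len(s) for s in blackboard) + 1  # written strings plus the answer bit
    return answer, bits

# Example with k = 3 players holding 3 bits each (values assumed for illustration).
strings = [(1, 0, 1), (1, 1, 1), (1, 0, 0)]
value, cost = trivial_nih_protocol(strings, gip)
```

With $k$ players holding $n/k$ bits each, this sketch writes $(k-1) \cdot n/k + 1$ bits, which is the kind of trivial upper bound the note refers to.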

For the communication models mentioned above, there are also bounded-error randomized versions, denoted by , , and , respectively, where is an upper bound on the error probability of the protocol. In this setting, the players have access to some shared random string, say , and the aforementioned error probability of the protocol is considered over the possible choices of . Moreover, we require the error to be at most on each fixed choice of inputs.

We can extend the definitions of the communication complexity measures, defined above, to classes of Boolean functions, in a natural way. That is, for any communication complexity measure and for any class of Boolean functions , we may define

$$M(\mathcal{G}) := \max_{g \in \mathcal{G}} M(g).$$

We note that throughout this paper, we denote by $n$ the number of input bits of the function, regardless of the communication model. In the $k$-party communication setting (either NOF or NIH), we assume without loss of generality that $n$ is divisible by $k$.

### 2.5 Pseudorandomness

A PRG against a class of functions is a deterministic procedure $G$ mapping short Boolean strings (seeds) to longer Boolean strings, so that $G$'s output “looks random” to every function in the class.

###### Definition 17 (Pseudorandom generators).

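Let $G \colon \{-1,1\}^{\ell} \to \{-1,1\}^{n}$ be a function and let $\varepsilon > 0$. We say that $G$ is a pseudorandom generator of seed length $\ell$ that $\varepsilon$-fools a class of Boolean functions if, for every function $f$ in the class, it is the case that

$$\left| \mathbb{E}_{z \sim \{-1,1\}^{\ell}}[f(G(z))] - \mathbb{E}_{x \sim \{-1,1\}^{n}}[f(x)] \right| \le \varepsilon.$$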

A PRG outputting $n$ bits is called explicit if it can be computed in $\mathrm{poly}(n)$ time. All PRGs stated in this paper are explicit.
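For very small parameters, the fooling error in the definition above can be computed exactly by enumerating all seeds and all inputs. The sketch below uses a hypothetical toy generator with seed length 2 and output length 3; it is not one of the constructions of this paper and serves only to make the definition concrete.

```python
from itertools import product

def expectation(f, points):
    """Average of f over a finite collection of points."""
    points = list(points)
    return sum(f(p) for p in points) / len(points)

def fooling_error(G, f, seed_len, n):
    """|E_z[f(G(z))] - E_x[f(x)]| for z uniform in {-1,1}^seed_len, x uniform in {-1,1}^n."""
    pseudo = expectation(lambda z: f(G(z)), product((-1, 1), repeat=seed_len))
    truly = expectation(f, product((-1, 1), repeat=n))
    return abs(pseudo - truly)

def toy_generator(z):
    """Toy generator (assumed): output the two seed bits and their product."""
    return (z[0], z[1], z[0] * z[1])

def dictator(x):
    """Test function that reads a single output bit."""
    return x[2]

def parity3(x):
    """Test function that multiplies all three output bits."""
    return x[0] * x[1] * x[2]

err_dictator = fooling_error(toy_generator, dictator, 2, 3)
err_parity = fooling_error(toy_generator, parity3, 2, 3)
```

The toy generator fools the dictator function perfectly but is distinguished with the maximal error by the parity of all three output bits, which illustrates that fooling is always relative to a class of test functions.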

### 2.6 Learning

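For a function $f$ and a distribution $\mathcal{D}$ supported over the domain of $f$, an example oracle for $f$ and $\mathcal{D}$ is a randomized oracle that outputs independent identically distributed labelled examples of the form $(x, f(x))$, where $x \sim \mathcal{D}$.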

###### Definition 18 (PAC learning model [Val84]).

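Let $\mathcal{C}$ be a class of Boolean functions. We say that a randomized algorithm $A$ learns $\mathcal{C}$ if, when $A$ is given access to an example oracle for $f$ and $\mathcal{D}$ and inputs $n$, $\varepsilon$, and $\delta$, the following holds. For every $n$-variate function $f \in \mathcal{C}$, distribution $\mathcal{D}$ supported over the domain of $f$, and real-valued parameters $\varepsilon > 0$ and $\delta > 0$, $A$ outputs with probability at least $1 - \delta$ over its internal randomness and the randomness of the example oracle a description of a hypothesis $h$ such that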

$$\Pr_{x \sim \mathcal{D}}[f(x) = h(x)] \ge 1 - \varepsilon.$$

The sample complexity of a learning algorithm is the maximum number of random examples requested from the example oracle during its execution.
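To make the roles of the example oracle and of the sample complexity concrete, here is a minimal sketch of an oracle that counts how many examples have been requested, together with an empirical error estimate for a hypothesis. The class name `ExampleOracle`, its interface, and the uniform distribution over $\{0,1\}^4$ are assumptions made for illustration; they are not part of the paper.

```python
import random

class ExampleOracle:
    """Hypothetical example oracle: each call returns (x, f(x)) with x drawn from D."""

    def __init__(self, f, sample_input, seed=0):
        self.f = f
        self.sample_input = sample_input  # function rng -> x, describing D
        self.rng = random.Random(seed)
        self.calls = 0                    # number of examples requested so far

    def draw(self):
        self.calls += 1
        x = self.sample_input(self.rng)
        return x, self.f(x)

def empirical_error(h, oracle, m):
    """Fraction of m fresh labelled examples on which h disagrees with the label."""
    mistakes = 0
    for _ in range(m):
        x, label = oracle.draw()
        if h(x) != label:
            mistakes += 1
    return mistakes / m

def uniform_bits(n):
    """Sampler for the uniform distribution over {0,1}^n (an assumed choice of D)."""
    return lambda rng: tuple(rng.randint(0, 1) for _ in range(n))

def target(x):
    """Target function: parity of the first two bits."""
    return x[0] ^ x[1]

oracle = ExampleOracle(target, uniform_bits(4))
err_exact = empirical_error(target, oracle, 200)
err_flipped = empirical_error(lambda x: 1 - target(x), oracle, 200)
```

The counter `oracle.calls` plays the role of the number of examples requested; in this run it reaches 400 after the two estimates.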

## 3 Lower bounds

In this section, we prove an average-case lower bound for the generalized inner product function against $\mathrm{FORMULA} \circ \mathcal{G}$, where $\mathcal{G}$ is the set of functions that have low-cost randomized communication protocols in the number-on-forehead setting. This corresponds to Item 1 of Theorem 1. Items 2 and 3 rely on our PRG constructions, and the proofs are deferred to Section 4.

###### Theorem 19.

For any integer $k \ge 2$, and any class of functions $\mathcal{G}$, let $C$ be a function in $\mathrm{FORMULA}[s] \circ \mathcal{G}$ such that

$$\Pr_{x \sim \{-1,1\}^{n}}\left[C(x) = \mathrm{GIP}^k_n(x)\right] \ge 1/2 + \varepsilon.$$

Then

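$$s = \Omega\left( \frac{n^2}{k^2 \cdot 16^k \cdot \left( R^{(k)}_{\varepsilon/(2n^2)}(\mathcal{G}) + \log n \right)^2 \cdot \log^2(1/\varepsilon)} \right).$$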

We need a couple of useful lemmas from [Tal16], whose proofs are presented in Section A.1 (Lemma 50 and Lemma 51) for completeness.

###### Lemma 20 ([Tal16]).

Let $\mathcal{D}$ be a distribution over $\{-1,1\}^n$, and let $f, C \colon \{-1,1\}^n \to \{-1,1\}$ be such that

$$\Pr_{x \sim \mathcal{D}}[C(x) = f(x)] \ge 1/2 + \varepsilon.$$

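Let $\tilde{C}$ be an $\varepsilon$-approximating function of $C$, i.e., for every $x$, $|C(x) - \tilde{C}(x)| \le \varepsilon$. Then,

$$\mathbb{E}_{x \sim \mathcal{D}}\left[\tilde{C}(x) \cdot f(x)\right] \ge \varepsilon.$$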
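
The short calculation behind Lemma 20 can be sketched as follows. This is only an outline under the assumption, suggested by the $\{-1,1\}$ convention of this section, that $C$ and $f$ take values in $\{-1,1\}$ and that $|C(x) - \tilde{C}(x)| \le \varepsilon$ for every $x$; the complete proof is the one referred to in Section A.1.

$$\begin{aligned}
\mathbb{E}_{x \sim \mathcal{D}}[C(x) \cdot f(x)] &= \Pr_{x \sim \mathcal{D}}[C(x) = f(x)] - \Pr_{x \sim \mathcal{D}}[C(x) \ne f(x)] = 2 \Pr_{x \sim \mathcal{D}}[C(x) = f(x)] - 1 \ge 2\varepsilon, \\
\mathbb{E}_{x \sim \mathcal{D}}[\tilde{C}(x) \cdot f(x)] &= \mathbb{E}_{x \sim \mathcal{D}}[C(x) \cdot f(x)] + \mathbb{E}_{x \sim \mathcal{D}}\left[(\tilde{C}(x) - C(x)) \cdot f(x)\right] \ge 2\varepsilon - \varepsilon = \varepsilon.
\end{aligned}$$

The second line uses $|f(x)| = 1$, so the correction term is at most $\varepsilon$ in absolute value.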

###### Lemma 21 ([Tal16]).

Let be a distribution over and let be a class of functions. For , suppose that is such that

$$\Pr_{x \sim \mathcal{D}}[D(x) = f(x)] \ge 1/2 + \varepsilon_0.$$

Then there exists some such that

$$\mathbb{E}_{x \sim \mathcal{D}}[h(x) \cdot f(x)] \ge \frac{1}{s^{O(\sqrt{s} \cdot \log(1/\varepsilon_0))}}.$$

We also need the following communication-complexity lower bound for $\mathrm{GIP}^k_n$.

###### Theorem 22 ([BNS92, Theorem 2]).

For any , any function that computes on more than fraction of the inputs (over uniformly random inputs) must have -party deterministic communication complexity at least .

We first show that if a function correlates with some small formula, whose leaves are functions with low randomized communication complexity, then it also correlates non-trivially with some function of relatively low deterministic communication complexity.

###### Lemma 23.

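For any distribution $\mathcal{D}$ over $\{-1,1\}^n$, and any class of functions $\mathcal{G}$, let $f \colon \{-1,1\}^n \to \{-1,1\}$ and $C \in \mathrm{FORMULA}[s] \circ \mathcal{G}$ be such that

$$\Pr_{x \sim \mathcal{D}}[C(x) = f(x)] \ge 1/2 + \varepsilon.$$

Then there exists a function $h$, with $k$-party deterministic communication complexity at most

$$O\left( R^{(k)}_{\varepsilon/(2s)}(\mathcal{G}) \cdot \sqrt{s} \cdot \log(1/\varepsilon) \right),$$

such that

$$\Pr_{x \sim \mathcal{D}}[h(x) = f(x)] \ge 1/2 + \frac{1}{s^{O(\sqrt{s} \cdot \log(1/\varepsilon))}}.$$

###### Proof.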

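Let $C = F(g_1, g_2, \dots, g_s)$ be the function in $\mathrm{FORMULA}[s] \circ \mathcal{G}$, where $F$ is a formula and $g_1, g_2, \dots, g_s$ are leaf functions from the class $\mathcal{G}$. For each $i \in [s]$, consider a $k$-party randomized protocol $\Pi_i$ for $g_i$ of cost at most $R^{(k)}_{\varepsilon/(2s)}(\mathcal{G})$ that has error $\varepsilon/(2s)$. Now consider the following function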

$$\tilde{C}(x) := \mathbb{E}_{\Pi_1, \Pi_2, \dots, \Pi_s}[D(x)],$$

where

$$D(x) := F(\Pi_1(x), \Pi_2(x), \dots, \Pi_s(x)).$$

Note that for any fixed choice of $\Pi_1, \Pi_2, \dots, \Pi_s$, $D$ is a formula whose leaves are functions with deterministic communication complexity at most $R^{(k)}_{\varepsilon/(2s)}(\mathcal{G})$. Next, we show the following.

###### Claim 24.

The function -approximates .

###### Proof of Claim 24.

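First note that since each $\Pi_i$ is an $(\varepsilon/(2s))$-error randomized protocol, by taking the union bound over the $s$ leaf functions, we have that for every input $x$,

$$\Pr_{\Pi_1, \Pi_2, \dots, \Pi_s}\left[ \Pi_1(x) = g_1(x) \wedge \Pi_2(x) = g_2(x) \wedge \dots \wedge \Pi_s(x) = g_s(x) \right] \ge 1 - \varepsilon/2.$$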

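Denote by $\mathcal{E}$ the event $\Pi_1(x) = g_1(x) \wedge \Pi_2(x) = g_2(x) \wedge \dots \wedge \Pi_s(x) = g_s(x)$. We have for every $x$,

$$\begin{aligned}
\tilde{C}(x) &= \mathbb{E}_{\Pi_1, \Pi_2, \dots, \Pi_s}[D(x)] \\
&= \mathbb{E}[D(x) \mid \mathcal{E}] \cdot \Pr[\mathcal{E}] + \mathbb{E}[D(x) \mid \neg\mathcal{E}] \cdot \Pr[\neg\mathcal{E}] \\
&= C(x) \cdot \Pr[\mathcal{E}] + \mathbb{E}[D(x) \mid \neg\mathcal{E}] \cdot \Pr[\neg\mathcal{E}].
\end{aligned}$$

On the one hand, we have

$$\tilde{C}(x) = C(x) \cdot \Pr[\mathcal{E}] + \mathbb{E}[D(x) \mid \neg\mathcal{E}] \cdot \Pr[\neg\mathcal{E}] \le C(x) + \varepsilon/2.$$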

On the other hand, we get

 ~