 # Near-Optimal Lower Bounds on the Threshold Degree and Sign-Rank of AC^0

The threshold degree of a Boolean function f: {0,1}^n → {0,1} is the minimum degree of a real polynomial p that represents f in sign: sgn p(x) = (−1)^{f(x)}. A related notion is sign-rank, defined for a Boolean matrix F = [F_{ij}] as the minimum rank of a real matrix M with sgn M_{ij} = (−1)^{F_{ij}}. Determining the maximum threshold degree and sign-rank achievable by constant-depth circuits (AC^0) is a well-known and extensively studied open problem, with complexity-theoretic and algorithmic applications. We give an essentially optimal solution to this problem. For any ϵ > 0, we construct an AC^0 circuit in n variables that has threshold degree Ω(n^{1−ϵ}) and sign-rank exp(Ω(n^{1−ϵ})), improving on the previous best lower bounds of Ω(√n) and exp(Ω̃(√n)), respectively. Our results subsume all previous lower bounds on the threshold degree and sign-rank of AC^0 circuits of any given depth, with a strict improvement starting at depth 4. As a corollary, we also obtain near-optimal bounds on the discrepancy, threshold weight, and threshold density of AC^0, strictly subsuming previous work on these quantities. Our work gives some of the strongest lower bounds to date on the communication complexity of AC^0.


## 1. Introduction

A real polynomial p is said to sign-represent a Boolean function f: {0,1}^n → {0,1} if sgn p(x) = (−1)^{f(x)} for every input x. The threshold degree of f, denoted deg_±(f), is the minimum degree of a real polynomial that sign-represents f. Equivalent terms in the literature include strong degree, voting polynomial degree, PTF degree, and sign degree. Since any function f: {0,1}^n → {0,1} can be represented exactly by a real polynomial of degree at most n, the threshold degree of f is an integer between 0 and n. Viewed as a computational model, sign-representation is remarkably powerful because it corresponds to the limiting case of pointwise approximation, with error allowed to approach 1. The formal study of threshold degree began in 1969 with the pioneering work of Minsky and Papert on limitations of perceptrons, who famously proved that the parity function on n variables has the maximum possible threshold degree, n. They obtained lower bounds on the threshold degree of several other functions, including DNF formulas and intersections of halfspaces. Since then, sign-representing polynomials have found applications far beyond artificial intelligence. In theoretical computer science, applications of threshold degree include circuit lower bounds [28, 29, 42, 19, 7], size-depth trade-offs [37, 55], communication complexity [42, 19, 44, 39, 7, 53, 51], structural complexity [4, 9], and computational learning [26, 25, 35, 3, 47, 49, 13, 50, 56].
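To make the notion of sign-representation concrete, here is a small illustrative sketch (our own toy example, not taken from the paper): no degree-1 polynomial sign-represents XOR on two bits, while an explicit degree-2 polynomial does, so deg_±(XOR_2) = 2.

```python
# Sign-representation demo (illustrative toy example).
# A polynomial p sign-represents f when sgn p(x) = (-1)^f(x) for all x.
from itertools import product

def sign_represents(coeffs, f, n):
    """Check whether a0 + sum(ai * xi) sign-represents f on {0,1}^n."""
    a0, rest = coeffs[0], coeffs[1:]
    for x in product([0, 1], repeat=n):
        p = a0 + sum(a * xi for a, xi in zip(rest, x))
        if p == 0 or (p > 0) != ((-1) ** f(x) > 0):
            return False
    return True

xor = lambda x: x[0] ^ x[1]

# A grid search over small integer coefficients finds no degree-1
# sign-representation of XOR.  (A full proof: adding the sign constraints
# in pairs forces 2*a0 + a1 + a2 to be both positive and negative.)
grid = range(-3, 4)
assert not any(sign_represents((a0, a1, a2), xor, 2)
               for a0 in grid for a1 in grid for a2 in grid)

# In contrast, the degree-2 polynomial (1 - 2*x1)(1 - 2*x2) does
# sign-represent XOR at all four inputs.
for x in product([0, 1], repeat=2):
    p = (1 - 2 * x[0]) * (1 - 2 * x[1])
    assert (p > 0) == ((-1) ** xor(x) > 0)
```

The grid search is only an illustration; the pairing argument in the comment is what actually proves the degree-1 impossibility.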

The notion of threshold degree has been especially influential in the study of AC^0, the class of constant-depth polynomial-size circuits with AND, OR, and NOT gates of unbounded fan-in. The first such result was obtained by Aspnes et al., who used sign-representing polynomials to give a beautiful new proof of classic lower bounds for AC^0. In communication complexity, the notion of threshold degree played a central role in the first construction [42, 44] of an AC^0 circuit with exponentially small discrepancy and hence large communication complexity in nearly every model. That discrepancy result was used to show the optimality of Allender's classic simulation of AC^0 by majority circuits, solving an open problem on the relation between the two circuit classes. Subsequent work [19, 7, 53, 51] resolved other questions in communication complexity and circuit complexity related to constant-depth circuits by generalizing the threshold degree method of [42, 44].

Sign-representing polynomials also paved the way for algorithmic breakthroughs in the study of constant-depth circuits. Specifically, any function of threshold degree d can be viewed as a halfspace in n^{O(d)} dimensions, with coordinates corresponding to the monomials in a sign-representation of f. As a result, any class of functions of threshold degree at most d can be learned in the standard PAC model under arbitrary distributions in time n^{O(d)}. Klivans and Servedio used this threshold degree approach to give what is currently the fastest algorithm for learning polynomial-size DNF formulas, with running time 2^{Õ(n^{1/3})}. Another learning-theoretic breakthrough based on threshold degree is the fastest algorithm for learning Boolean formulas, obtained by O'Donnell and Servedio for formulas of constant depth and by Ambainis et al. for arbitrary depth. Their algorithm runs in time roughly exponential in the square root of the formula size, and in both cases the bound on the running time follows from the corresponding upper bound on the threshold degree.

A far-reaching generalization of threshold degree is the matrix-analytic notion of sign-rank, which allows sign-representation out of arbitrary low-dimensional subspaces rather than the subspace of low-degree polynomials. The contribution of this paper is to prove essentially optimal lower bounds on the threshold degree and sign-rank of , which in turn imply lower bounds on other fundamental complexity measures of interest in communication complexity and learning theory. In the remainder of this section, we give a detailed overview of the previous work, present our main results, and discuss our proofs.

### 1.1. Threshold degree of AC^0

Determining the maximum threshold degree of an AC^0 circuit in n variables is a longstanding open problem in the area. It is motivated by algorithmic and complexity-theoretic applications [26, 35, 27, 39, 13], in addition to being a natural question in its own right. Table 1 gives a quantitative summary of the results obtained to date. In their seminal monograph, Minsky and Papert proved a lower bound of Ω(n^{1/3}) on the threshold degree of the following DNF formula in n variables:

 f(x) = ⋀_{i=1}^{n^{1/3}} ⋁_{j=1}^{n^{2/3}} x_{i,j}.
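The Minsky–Papert function is simply an AND of disjoint ORs; a direct evaluator (our own illustrative sketch, with block sizes as parameters) makes the structure explicit:

```python
# Evaluator for the Minsky-Papert function: an AND of a blocks, each an OR
# over b disjoint variables (a = n^(1/3), b = n^(2/3) in the paper's setting).
def minsky_papert(bits, a, b):
    """AND_{i<a} OR_{j<b} of bits[i*b + j] on a*b input bits."""
    assert len(bits) == a * b
    return int(all(any(bits[i * b + j] for j in range(b)) for i in range(a)))

# For n = 8 inputs: a = 2 blocks of b = 4 bits each.
assert minsky_papert([0, 0, 1, 0, 1, 0, 0, 0], 2, 4) == 1  # each block has a 1
assert minsky_papert([0, 0, 0, 0, 1, 0, 0, 0], 2, 4) == 0  # first block all zero
```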

Three decades later, Klivans and Servedio obtained an Õ(n^{1/3}) upper bound on the threshold degree of any polynomial-size DNF formula in n variables, essentially matching Minsky and Papert's result and resolving the problem for depth 2. Determining the threshold degree of circuits of depth 3 and higher proved to be challenging. The only upper bound known to date is the trivial O(n), which follows directly from the definition of threshold degree. In particular, it is consistent with our knowledge that there are AC^0 circuits with linear threshold degree. On the lower bounds side, the only progress for a long time was due to O'Donnell and Servedio, who constructed constant-depth circuits whose threshold degree improves on Minsky and Papert's bound by a polylogarithmic factor, and who formally posed the problem of obtaining a polynomial improvement on Minsky and Papert's lower bound. Such an improvement was obtained in [50], for circuits of depth 3. A polynomially stronger result was obtained in [52], with a lower bound of Ω̃(√n) on the threshold degree of an explicit constant-depth circuit. Bun and Thaler recently used a different, depth-3 circuit to give a much simpler proof of the Ω̃(√n) lower bound for AC^0. We obtain a quadratically stronger, and near-optimal, lower bound on the threshold degree of AC^0.

###### Theorem 1.1.

Let k ≥ 2 be a fixed integer. Then there is an explicitly given Boolean circuit family {f_n}, where f_n: {0,1}^n → {0,1} has polynomial size, depth k, and threshold degree

 deg_±(f_n) = Ω(n^{(k−1)/(k+1)} · (log n)^{−(1/(k+1))·⌈(k−2)/2⌉·⌊(k−2)/2⌋}).

Moreover, f_n has bottom fan-in O(log n) for all n.

For large k, Theorem 1.1 essentially matches the trivial upper bound of n on the threshold degree of any function. For any fixed depth, Theorem 1.1 subsumes all previous lower bounds on the threshold degree of AC^0, with a polynomial improvement starting at depth 4. In particular, the lower bounds due to Minsky and Papert and to Bun and Thaler are subsumed as the special cases k = 2 and k = 3, respectively. From a computational learning perspective, Theorem 1.1 definitively rules out the threshold degree approach to learning constant-depth circuits.

### 1.2. Sign-rank of AC^0

The sign-rank of a matrix A = [A_{ij}] without zero entries, denoted rk_±(A), is the least rank of a real matrix M = [M_{ij}] with sgn M_{ij} = sgn A_{ij} for all i, j. In other words, the sign-rank of A is the minimum rank of a matrix that can be obtained by making arbitrary sign-preserving changes to the entries of A. The sign-rank of a Boolean function f: {0,1}^n × {0,1}^n → {0,1} is defined in the natural way as the sign-rank of the matrix [(−1)^{f(x,y)}]_{x,y}. In particular, the sign-rank of f is an integer between 1 and 2^n. This fundamental notion has been studied in contexts as diverse as matrix analysis, communication complexity, circuit complexity, and learning theory. To a complexity theorist, sign-rank is a vastly more challenging quantity to analyze than threshold degree. Indeed, a sign-rank lower bound rules out sign-representation out of every linear subspace of given dimension, whereas a threshold degree lower bound rules out sign-representation specifically by linear combinations of monomials up to a given degree.
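A toy example (ours, not the paper's) may help fix the definition: the 4×4 "greater-than" sign matrix has sign-rank exactly 2. A rank-2 matrix agreeing with it in sign is explicit, and a brute-force check rules out rank 1, whose sign pattern would have to be an outer product of two sign vectors.

```python
# Sign-rank illustration on the 4x4 greater-than matrix A[i][j] = sgn(i - j + 0.5).
from itertools import product

N = 4
A = [[1 if i >= j else -1 for j in range(N)] for i in range(N)]

# M[i][j] = (i + 0.5) - j is a sum of two outer products, hence rank <= 2,
# and it agrees with A in sign at every entry.
M = [[(i + 0.5) - j for j in range(N)] for i in range(N)]
assert all((M[i][j] > 0) == (A[i][j] > 0) for i in range(N) for j in range(N))

# No rank-1 matrix works: its sign pattern must equal sgn(u_i) * sgn(v_j)
# for sign vectors u, v, and no such outer product matches A.
rank1_possible = any(
    all(u[i] * v[j] == A[i][j] for i in range(N) for j in range(N))
    for u in product([-1, 1], repeat=N)
    for v in product([-1, 1], repeat=N)
)
assert not rank1_possible
```

The same shift-by-0.5 trick shows that the N×N greater-than matrix has sign-rank 2 for every N, even though its ordinary rank grows with N.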

Unsurprisingly, progress in understanding sign-rank has been slow and difficult. No nontrivial lower bounds were known for any explicit matrices until the breakthrough work of Forster, who proved strong lower bounds on the sign-rank of Hadamard matrices and, more generally, of all sign matrices with small spectral norm. The sign-rank of constant-depth circuits has since seen considerable work, as summarized in Table 2. The first exponential lower bound on the sign-rank of an AC^0 circuit was obtained by Razborov and Sherstov, solving a longstanding problem posed by Babai, Frankl, and Simon (1986). The authors of [39] constructed a polynomial-size circuit of depth 3 with sign-rank exp(Ω(n^{1/3})). In follow-up work, Bun and Thaler constructed a polynomial-size circuit of depth 3 with sign-rank exp(Ω̃(n^{2/5})). A more recent and incomparable result, also due to Bun and Thaler, is a sign-rank lower bound of exp(Ω̃(√n)) for a circuit of polynomial size and larger constant depth. No nontrivial upper bounds are known on the sign-rank of AC^0. Closing the gap between the best lower bound of exp(Ω̃(√n)) and the trivial upper bound of 2^n has been a challenging open problem. We solve this problem almost completely, by constructing for any ϵ > 0 a constant-depth circuit with sign-rank exp(Ω(n^{1−ϵ})). In quantitative detail, our results on the sign-rank of AC^0 are the following two theorems.

###### Theorem 1.2.

Let k ≥ 1 be a given integer. Then there is an explicitly given Boolean circuit family {F_n}, where F_n: {0,1}^n × {0,1}^n → {0,1} has polynomial size, constant depth depending only on k, and sign-rank

 rk_±(F_n) = exp(Ω(n^{1−1/(k+1)} · (log n)^{−k(k−1)/(2(k+1))})).

As a companion result, we prove the following qualitatively similar but quantitatively incomparable theorem.

###### Theorem 1.3.

Let k ≥ 1 be a given integer. Then there is an explicitly given Boolean circuit family {G_n}, where G_n: {0,1}^n × {0,1}^n → {0,1} has polynomial size, constant depth depending only on k, and sign-rank

 rk_±(G_n) = exp(Ω(n^{1−1/(k+1.5)} · (log n)^{−k²/(2k+3)})).

For large k, the lower bounds of Theorems 1.2 and 1.3 approach the trivial upper bound of 2^n on the sign-rank of any Boolean function F: {0,1}^n × {0,1}^n → {0,1}. For any given depth, Theorems 1.2 and 1.3 subsume all previous lower bounds on the sign-rank of AC^0, with a strict improvement starting at depth 4. From a computational learning perspective, Theorems 1.2 and 1.3 state that AC^0 has near-maximum dimension complexity [41, 43, 39, 17], namely, exp(Ω(n^{1−ϵ})) for any constant ϵ > 0. This rules out the possibility of learning AC^0 circuits via dimension complexity, a far-reaching generalization of the threshold degree approach from the monomial basis to arbitrary bases.

### 1.3. Communication complexity

Theorems 1.1–1.3 imply strong new lower bounds on the communication complexity of AC^0. We adopt the standard randomized model of Yao, with players Alice and Bob and a Boolean function F: X × Y → {0,1}. On input (x, y) ∈ X × Y, Alice and Bob receive the arguments x and y, respectively, and communicate back and forth according to an agreed-upon protocol. Each player privately holds an unlimited supply of uniformly random bits that he or she can use when deciding what message to send at any given point in the protocol. The cost of a protocol is the total number of bits communicated in a worst-case execution. The ϵ-error randomized communication complexity of F, denoted R_ϵ(F), is the least cost of a protocol that computes F with probability of error at most ϵ on every input.

Of particular interest to us are communication protocols with error probability close to that of random guessing, 1/2. There are two standard ways to formalize the complexity of a communication problem in this setting, both inspired by probabilistic polynomial time for Turing machines:

 UPP(F) = inf_{0≤ϵ<1/2} R_ϵ(F)

and

 PP(F) = inf_{0≤ϵ<1/2} { R_ϵ(F) + log₂(1/(1/2 − ϵ)) }.

The former quantity, introduced by Paturi and Simon, is called the communication complexity of F with unbounded error, in reference to the fact that the error probability can be arbitrarily close to 1/2. The latter quantity is called the communication complexity of F with weakly unbounded error. Proposed by Babai et al., it features an additional penalty term that depends on the error probability. It is clear that

 UPP(F) ≤ PP(F) ≤ n + 2

for every communication problem F: {0,1}^n × {0,1}^n → {0,1}, with an exponential gap achievable between the two complexity measures [10, 41]. These two models occupy a special place in the study of communication because they are more powerful than any other standard model (deterministic, nondeterministic, randomized, quantum with or without entanglement). Moreover, unbounded-error protocols represent a frontier in communication complexity theory in that they are the most powerful protocols for which explicit lower bounds are currently known. Our results imply that even for such protocols, AC^0 has near-maximal communication complexity.

To begin with, combining Theorem 1.1 with the pattern matrix method [42, 44] gives:

###### Theorem 1.4.

Let k ≥ 2 be a fixed integer. Then there is an explicitly given Boolean circuit family {F_n}, where F_n has polynomial size, constant depth depending only on k, communication complexity

 PP(F_n) = Ω(n^{(k−1)/(k+1)} · (log n)^{−(1/(k+1))·⌈(k−2)/2⌉·⌊(k−2)/2⌋})

and discrepancy

 disc(F_n) = exp(−Ω(n^{(k−1)/(k+1)} · (log n)^{−(1/(k+1))·⌈(k−2)/2⌉·⌊(k−2)/2⌋})).

Discrepancy is a combinatorial complexity measure of interest in communication complexity theory and other research areas; see Section 2.8 for a formal definition. As k grows, the bounds of Theorem 1.4 approach the best possible bounds for any communication problem F: {0,1}^n × {0,1}^n → {0,1}. The same qualitative behavior was achieved in previous work by Bun and Thaler, who constructed, for any constant ϵ > 0, a constant-depth circuit with communication complexity Ω(n^{1−ϵ}) and discrepancy exp(−Ω(n^{1−ϵ})). Theorem 1.4 strictly subsumes the result of Bun and Thaler and all other prior work on the discrepancy and PP-complexity of constant-depth circuits [42, 10, 44, 50, 52]. For any fixed depth greater than 3, the bounds of Theorem 1.4 are a polynomial improvement in n over all previous work. We further show that Theorem 1.4 carries over to the number-on-the-forehead model, the strongest formalism of multiparty communication. This result, presented in detail in Section 4.4, uses the multiparty version of the pattern matrix method.

Our work also gives near-optimal lower bounds for AC^0 in the much more powerful unbounded-error model. Specifically, it is well-known that the unbounded-error communication complexity of any Boolean function coincides, up to an additive constant, with the logarithm of the sign-rank of its communication matrix. As a result, Theorems 1.2 and 1.3 imply:

###### Theorem 1.5.

Let k ≥ 1 be a given integer. Let {F_n} and {G_n} be the polynomial-size constant-depth circuit families constructed in Theorems 1.2 and 1.3, respectively. Then

 UPP(F_n) = Ω(n^{1−1/(k+1)} · (log n)^{−k(k−1)/(2(k+1))}),
 UPP(G_n) = Ω(n^{1−1/(k+1.5)} · (log n)^{−k²/(2k+3)}).

For large k, the lower bounds of Theorem 1.5 essentially match the trivial upper bound of O(n) on the unbounded-error communication complexity of any function F: {0,1}^n × {0,1}^n → {0,1}. Theorem 1.5 strictly subsumes all previous lower bounds on the unbounded-error communication complexity of AC^0, with a polynomial improvement for any depth greater than 3. The best lower bound on the unbounded-error communication complexity of AC^0 prior to our work was Ω̃(√n), due to Bun and Thaler. Finally, we remark that Theorem 1.5 gives essentially the strongest possible separation of the communication complexity classes PH^cc and UPP^cc. We refer the reader to the work of Babai et al. for definitions and detailed background on these classes.

Qualitatively, Theorem 1.5 is stronger than Theorem 1.4 because communication protocols with unbounded error are significantly more powerful than those with weakly unbounded error. On the other hand, Theorem 1.4 is stronger quantitatively for any fixed depth and has the additional advantage of generalizing to the multiparty setting.

### 1.4. Threshold weight and threshold density

By well-known reductions, Theorem 1.1 implies a number of other lower bounds for the representation of AC^0 circuits by polynomials. For the sake of completeness, we mention two such consequences. The threshold density of a Boolean function f: {0,1}^n → {0,1}, denoted dns(f), is the minimum size of a set family 𝒮 ⊆ 𝒫({1, 2, …, n}) such that

 sgn(∑_{S∈𝒮} λ_S (−1)^{∑_{i∈S} x_i}) ≡ (−1)^{f(x)}

for some reals λ_S. A related complexity measure is threshold weight, denoted W(f), and defined as the minimum of ∑_S |λ_S| over all integers λ_S such that

 sgn(∑_{S⊆{1,2,…,n}} λ_S (−1)^{∑_{i∈S} x_i}) ≡ (−1)^{f(x)}.

It is not hard to see that the threshold density and threshold weight of f correspond to the minimum size of a threshold-of-parity circuit and a majority-of-parity circuit for f, respectively. The definitions imply that dns(f) ≤ W(f) for every f, and a little more thought reveals that both quantities are at most 2^{O(n)}. These complexity measures have seen extensive work, motivated by applications to computational learning and circuit complexity. For a bibliographic overview, we refer the reader to [50, Section 8.2].
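As a concrete toy illustration (ours, not the paper's): AND on 2 bits admits a threshold-of-parity representation with three parity terms of weight 1 each, so dns(AND_2) ≤ 3 and W(AND_2) ≤ 3.

```python
# Threshold-of-parity demo: sgn(sum_S lambda_S * (-1)^{sum_{i in S} x_i})
# must equal (-1)^f(x) at every input.  For f = AND on 2 bits, the family
# {emptyset, {0}, {1}} with weights 1, 1, 1 suffices.
from itertools import product

def parity_char(S, x):
    """The parity character (-1)^{sum_{i in S} x_i}."""
    return (-1) ** sum(x[i] for i in S)

family = [((), 1), ((0,), 1), ((1,), 1)]   # (subset, weight) pairs

for x in product([0, 1], repeat=2):
    val = sum(w * parity_char(S, x) for S, w in family)
    f = x[0] & x[1]
    # Nonzero value with the correct sign at every point of {0,1}^2.
    assert val != 0 and (val > 0) == ((-1) ** f > 0)
```

The total weight of this representation is 3, witnessing both upper bounds at once; certifying matching lower bounds takes a separate argument (e.g., an LP-duality one).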

Krause and Pudlák [28, Proposition 2.1] gave an ingenious method for transforming threshold degree lower bounds into lower bounds on threshold density and thus also threshold weight. Specifically, let f: {0,1}^n → {0,1} be a Boolean function of interest. The authors of [28] considered the related function f′: {0,1}^{3n} → {0,1} given by f′(x, y, z) = f(…, (z_i ∧ x_i) ∨ (¬z_i ∧ y_i), …), and proved that dns(f′) ≥ 2^{deg_±(f)}. In this light, Theorem 1.1 implies that the threshold density of AC^0 is exp(Ω(n^{1−ϵ})) for any constant ϵ > 0:

###### Corollary 1.6.

Let k ≥ 2 be a fixed integer. Then there is an explicitly given Boolean circuit family {F_n}, where F_n has polynomial size and depth k, and satisfies

 W(F_n) ≥ dns(F_n) = exp(Ω(n^{(k−1)/(k+1)} · (log n)^{−(1/(k+1))·⌈(k−2)/2⌉·⌊(k−2)/2⌋})).

For large k, the lower bounds on the threshold weight and density in Corollary 1.6 essentially match the trivial upper bounds. Observe that the circuit family of Corollary 1.6 has the same depth as the circuit family of Theorem 1.1. This is because f_n has bottom fan-in O(log n), and thus the Krause–Pudlák transformation can be "absorbed" into the bottom two levels of f_n. Corollary 1.6 subsumes all previous lower bounds [28, 13, 50, 52, 17] on the threshold weight and density of AC^0, with a polynomial improvement for every depth greater than 3. The improvement is particularly noteworthy in the case of threshold density, where the best previous lower bound [52, 17] was exp(Ω̃(√n)).

### 1.5. Previous approaches

In the remainder of this section, we discuss our proofs of Theorems 1.1–1.3. The notation that we use here is standard, and we defer its formal review to Section 2. We start with the necessary approximation-theoretic background, then review relevant previous work, and finally contrast it with the approach of this paper. To sidestep minor technicalities, we will represent Boolean functions in this overview as mappings {0,1}^n → {−1, +1}. We alert the reader that we will revert to the standard representation starting with Section 2.

#### Background

Recall that our results concern the sign-representation of Boolean functions and matrices. To properly set the stage for our proofs, however, we need to consider the more general notion of pointwise approximation. Let f: {0,1}^n → {−1, +1} be a Boolean function of interest. The ϵ-approximate degree of f, denoted deg_ϵ(f), is the minimum degree of a real polynomial p that approximates f within ϵ pointwise: |f(x) − p(x)| ≤ ϵ for all x. The regimes of most interest are bounded-error approximation, corresponding to constant ϵ ∈ (0, 1), and large-error approximation, corresponding to ϵ = 1 − o(1). In the former case, the choice of error parameter is immaterial and affects the approximate degree of a Boolean function by at most a multiplicative constant. It is clear that pointwise approximation is a stronger requirement than sign-representation, and thus deg_±(f) ≤ deg_ϵ(f) for all 0 ≤ ϵ < 1.

A moment's thought reveals that threshold degree is in fact the limiting case of ϵ-approximate degree as the error parameter approaches 1:

 deg_±(f) = lim_{ϵ↗1} deg_ϵ(f). (1.1)

Both approximate degree and threshold degree have dual characterizations, obtained by appeal to linear programming duality. Specifically, deg_ϵ(f) ≥ d if and only if there is a function ψ: {0,1}^n → ℝ with the following two properties: ∑_x ψ(x) f(x) > ϵ ∑_x |ψ(x)|; and ∑_x ψ(x) p(x) = 0 for every polynomial p of degree less than d. Rephrasing, ψ must have large correlation with f but zero correlation with every low-degree polynomial. By weak linear programming duality, ψ constitutes a proof that deg_ϵ(f) ≥ d, and for that reason ψ is said to witness the lower bound deg_ϵ(f) ≥ d. In view of (1.1), this discussion generalizes to threshold degree. The dual characterization here states that deg_±(f) ≥ d if and only if there is a nonzero function ψ: {0,1}^n → ℝ with the following two properties: ψ(x) f(x) ≥ 0 for all x; and ∑_x ψ(x) p(x) = 0 for every polynomial p of degree less than d. In this dual characterization, ψ agrees in sign with f and is additionally orthogonal to polynomials of degree less than d. The sign-agreement property can be restated in terms of correlation, as ∑_x ψ(x) f(x) = ∑_x |ψ(x)|. As before, ψ is called a threshold degree witness for f.

What distinguishes the dual characterizations of approximate degree and threshold degree is how the dual object ψ relates to f. Specifically, a threshold degree witness must agree in sign with f at every point. An approximate degree witness, on the other hand, need only exhibit such sign-agreement with f at most points, in that the points where the sign of ψ is correct should account for most of the ℓ₁ norm of ψ. As a result, constructing dual objects for threshold degree is significantly more difficult than for approximate degree. This difficulty is to be expected because the gap between threshold degree and approximate degree can be arbitrary, e.g., 1 versus Θ(n) for the majority function on n bits once the error parameter is taken exponentially close to 1.
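The dual characterization can be checked by hand on a tiny case (our own toy example): the function ψ(x) = (−1)^{x₁+x₂}/4 is a threshold degree witness certifying deg_±(XOR_2) ≥ 2, since it agrees in sign with f everywhere and is orthogonal to every polynomial of degree less than 2.

```python
# Toy verification of a threshold degree witness for XOR on 2 bits.
from itertools import product

cube = list(product([0, 1], repeat=2))
f = {x: (-1) ** (x[0] ^ x[1]) for x in cube}        # +/-1 representation of XOR
psi = {x: (-1) ** (x[0] + x[1]) / 4 for x in cube}  # candidate witness

# (a) Sign-agreement at every point; the correlation equals the l1 norm of psi.
assert all(psi[x] * f[x] >= 0 for x in cube)
assert sum(psi[x] * f[x] for x in cube) == 1        # = sum |psi(x)| = 1

# (b) Zero correlation with every monomial of degree < 2: the constants, x1, x2.
assert sum(psi[x] for x in cube) == 0
assert sum(psi[x] * x[0] for x in cube) == 0
assert sum(psi[x] * x[1] for x in cube) == 0
```

By weak LP duality, properties (a) and (b) together rule out any degree-1 sign-representation of XOR, matching the classic Minsky–Papert fact.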

#### Hardness amplification via block-composition

Much of the recent work on approximate degree and threshold degree is concerned with composing functions in ways that amplify their hardness. Of particular significance here is block-composition, defined for functions f: {0,1}^n → {−1, +1} and g: {0,1}^m → {0,1} as the Boolean function f∘g given by (f∘g)(x_1, …, x_n) = f(g(x_1), …, g(x_n)). Block-composition works particularly well for threshold degree. To use an already familiar example, the block-composition AND_{n^{1/3}} ∘ OR_{n^{2/3}} has threshold degree Ω(n^{1/3}), whereas the constituent functions AND and OR have threshold degree 1. As a more extreme example, Sherstov obtained a lower bound of Ω(√n) on the threshold degree of the conjunction of two halfspaces, each of which by definition has threshold degree 1. The fact that threshold degree can increase spectacularly under block-composition is the basis of much previous work, including the best previous lower bounds [50, 52] on the threshold degree of AC^0. Apart from threshold degree, block-composition has yielded strong results for approximate degree in various error regimes, including direct sum theorems, direct product theorems, and error amplification results [46, 13, 56, 14].

How, then, does one prove lower bounds on the threshold degree or approximate degree of a composed function f∘g? It is here that the dual characterizations take center stage: they make it possible to prove lower bounds algorithmically, by constructing the corresponding dual object for the composed function. Such algorithmic proofs run the gamut in terms of technical sophistication, from straightforward to highly technical, but they have some structure in common. In most cases, one starts by obtaining dual objects φ and ψ for the constituent functions f and g, respectively, either by direct construction or by appeal to linear programming duality. They are then combined to yield a dual object Φ for the composed function, using dual block-composition [47, 31]:

 Φ(x_1, x_2, …, x_n) = φ(sgn ψ(x_1), …, sgn ψ(x_n)) · ∏_{i=1}^{n} |ψ(x_i)|. (1.2)

This composed dual object often requires additional work to ensure sign-agreement or correlation with the composed Boolean function. Among the generic tools available to assist in this process is a "corrector" object due to Razborov and Sherstov, with the following four properties: (i) it is orthogonal to low-degree polynomials; (ii) it takes on the value 1 at a prescribed point of the hypercube; (iii) it is bounded on inputs of low Hamming weight; and (iv) it vanishes on all other points of the hypercube. Using the Razborov–Sherstov object, suitably shifted and scaled, one can surgically correct the behavior of a given dual object on a substantial fraction of inputs, thus modifying its metric properties without affecting its orthogonality to low-degree polynomials. This technique has played an important role in recent work, e.g., [15, 16, 11, 17].

#### Hardness amplification for approximate degree

While block-composition has produced a treasure trove of results on polynomial representations of Boolean functions, it is of limited use when it comes to constructing functions with high bounded-error approximate degree. To illustrate the issue, consider arbitrary functions f on n bits and g on m bits with (1/3)-approximate degrees d_f and d_g, respectively. It is well-known that the composed function f∘g on nm variables has (1/3)-approximate degree O(d_f · d_g). This means that relative to the new number of variables, the block-composed function is asymptotically no harder to approximate to bounded error than the constituent functions f and g. In particular, one cannot use block-composition to transform functions on n bits with (1/3)-approximate degree at most n^α into functions on N bits with (1/3)-approximate degree ω(N^α).

Until recently, the best lower bound on the bounded-error approximate degree of AC^0 was Ω̃(n^{2/3}), due to Aaronson and Shi. Breaking this barrier was a fundamental problem in its own right, in addition to being a hard prerequisite for threshold degree lower bounds for AC^0 better than n^{2/3}. This barrier was overcome in a brilliant paper of Bun and Thaler, who proved, for any constant δ > 0, an Ω(n^{1−δ}) lower bound on the (1/3)-approximate degree of AC^0. Their hardness amplification for approximate degree works as follows. Let f be given, with (1/3)-approximate degree d. Bun and Thaler consider the block-composition F = f ∘ OR_m, for an appropriate parameter m. As shown in earlier work [47, 13] on approximate degree, dual block-composition witnesses a lower bound of Ω(d√m) on the (1/3)-approximate degree of F. Next, Bun and Thaler make the crucial observation that the dual object for OR_m has most of its mass on inputs of low Hamming weight, which in view of (1.2) implies that the dual object for F places most of its mass on inputs of low Hamming weight as well. The authors then use the Razborov–Sherstov corrector object to transfer the small amount of mass that the dual object for F places on inputs of high Hamming weight to inputs of low Hamming weight. The resulting dual object for F is supported entirely on inputs of low Hamming weight and therefore witnesses a lower bound on the (1/3)-approximate degree of the restriction of F to inputs of low Hamming weight. By re-encoding the input to this restriction, one finally obtains a function on many fewer variables whose (1/3)-approximate degree, relative to the number of variables, is polynomially larger than that of f. This passage from f to the new function is the desired hardness amplification for approximate degree. We find it helpful to think of Bun and Thaler's technique as block-composition followed by input compression, to reduce the number of input variables in the block-composed function. To obtain an Ω(n^{1−δ}) lower bound on the approximate degree of AC^0, the authors start with a trivial circuit and iteratively apply the hardness amplification step a constant number of times, until approximate degree n^{1−δ} is reached.

In follow-up work, Bun, Kothari, and Thaler refined the technique of [13] by deriving optimal concentration bounds for the dual object for OR_m. They thereby obtained tight or nearly tight lower bounds on the (1/3)-approximate degree of surjectivity, element distinctness, and other important problems. The most recent contribution to this line of work is due to Bun and Thaler, who prove an Ω(n^{1−δ}) lower bound, for any constant δ > 0, on the approximate degree of AC^0 with error parameter exponentially close to 1, by combining the method of [13] with Sherstov's work on direct product theorems for approximate degree. This near-linear lower bound substantially strengthens the authors' previous result on the bounded-error approximate degree of AC^0, but does not address the threshold degree.

### 1.6. Our approach

#### Threshold degree of AC^0

Bun and Thaler refer to obtaining a polynomially stronger threshold degree lower bound for AC^0 as the "main glaring open question left by our work." It is important to note here that lower bounds on approximate degree, even with the error parameter exponentially close to 1 as in [17], have no implications for threshold degree. For example, there are functions, such as majority, whose approximate degree remains Θ(n) for error exponentially close to 1 but whose threshold degree is 1. Our proof of Theorem 1.1 is unrelated to the most recent work of Bun and Thaler on the large-error approximate degree of AC^0 and instead builds on their earlier and simpler "block-composition followed by input compression" approach. The centerpiece of our proof is a hardness amplification result for threshold degree, whereby any function with threshold degree n^α, for a constant α > 0, can be transformed efficiently and within AC^0 into a function with polynomially larger threshold degree.

In more detail, let f be a function of interest, with threshold degree d. We consider the block-composition F = f ∘ MP_m, where m is an appropriate parameter and MP_m is the Minsky–Papert function on m variables, with threshold degree Ω(m^{1/3}). We construct the dual object for MP_m from scratch to ensure concentration on inputs of low Hamming weight. By applying dual block-composition to the threshold degree witnesses of f and MP_m, we obtain a dual object Φ witnessing the threshold degree of F. So far in the proof, our differences from [13] are as follows: (i) since our goal is amplification of threshold degree, we work with witnesses of threshold degree rather than approximate degree; (ii) to ensure rapid growth of threshold degree, we use block-composition with inner function MP_m of threshold degree Ω(m^{1/3}), in place of Bun and Thaler's inner function OR_m of threshold degree 1.

Since the dual object for MP_m by construction has most of its norm on inputs of low Hamming weight, the dual object Φ for the composed function has most of its norm on inputs of low Hamming weight as well. Analogous to [16, 11, 17], we would like to use the Razborov–Sherstov corrector object to remove the mass that Φ has on the inputs of high Hamming weight, transferring it to inputs of low Hamming weight. This brings us to the novel and technically demanding part of our proof. Previous works [16, 11, 17] transferred the mass from the inputs of high Hamming weight to the neighborhood of the all-zeroes input. An unavoidable feature of the Razborov–Sherstov transfer process is that it amplifies the mass being transferred. When the transferred mass finally reaches its destination, it overwhelms Φ's original values at the local points, destroying Φ's sign-agreement with the composed function F. It is this difficulty that most prevented earlier works [16, 11, 17] from obtaining a strong threshold degree lower bound for AC^0.

We proceed differently. Instead of transferring the mass of Φ from the inputs of high Hamming weight to the neighborhood of the all-zeroes input, we transfer it simultaneously to exponentially many strategically chosen neighborhoods. Split this way across many neighborhoods, the transferred mass does not overpower the original values of Φ and in particular does not change any signs. Working out the details of this transfer scheme requires subtle and lengthy calculations; it was not clear to us until the end that such a scheme exists. Once the transfer process is complete, we obtain a witness for the threshold degree of F restricted to inputs of low Hamming weight. Compressing the input as in [16, 11], we obtain an amplification theorem for threshold degree. With this work behind us, the proof of Theorem 1.1 for any given depth amounts to starting with a trivial circuit and amplifying its threshold degree the corresponding number of times.

#### Sign-rank of AC^0

It is not known how to "lift" a threshold degree lower bound in a black-box manner to a sign-rank lower bound. In particular, Theorem 1.1 has no implications a priori for the sign-rank of AC^0. Our proofs of Theorems 1.2 and 1.3 are completely disjoint from Theorem 1.1 and are instead based on a stronger approximation-theoretic quantity that we call γ-smooth threshold degree. Formally, the γ-smooth threshold degree of a Boolean function f is the largest d for which there is a nonzero function ψ: {0,1}^n → ℝ with the following two properties: ψ(x) f(x) ≥ γ · 2^{−n} ∑_u |ψ(u)| for all x; and ∑_x ψ(x) p(x) = 0 for every polynomial p of degree less than d. Taking γ = 0 in this formalism, one recovers the standard dual characterization of the threshold degree of f. In particular, threshold degree is synonymous with 0-smooth threshold degree. The general case of γ-smooth threshold degree for γ > 0 requires threshold degree witnesses that are min-smooth, in that the absolute value of ψ at any given point is at least a γ fraction of the average absolute value of ψ over all points. A substantial advantage of smooth threshold degree is that it has immediate sign-rank implications. Specifically, any lower bound of d on the γ-smooth threshold degree can be converted efficiently and in a black-box manner into a sign-rank lower bound exponential in d whenever γ is not too small, using a combination of the pattern matrix method [42, 44] and Forster's spectral lower bound on sign-rank [21, 22]. Accordingly, we obtain Theorems 1.2 and 1.3 by proving an Ω(n^{1−ϵ}) lower bound on the smooth threshold degree of AC^0, for any constant ϵ > 0.

At the core of our result is an amplification theorem for smooth threshold degree, whose repeated application makes it possible to prove arbitrarily strong lower bounds for constant-depth circuits. Amplifying smooth threshold degree is a complex juggling act due to the presence of two parameters—degree and smoothness—that must evolve in coordinated fashion. The approach of Theorem 1.1 is not useful here because the threshold degree witnesses that arise from the proof of Theorem 1.1 are highly nonsmooth. In more detail, when amplifying the threshold degree of a function f as in the proof of Theorem 1.1, two phenomena adversely affect the smoothness parameter. The first is block-composition itself as a composition technique, which in the regime of interest to us transforms every threshold degree witness for f into a hopelessly nonsmooth witness for the composed function. The other culprit is the input compression step, which re-encodes the input and thereby affects the smoothness in ways that are hard to control. To overcome these difficulties, we develop a novel approach based on what we call local smoothness.

Formally, let ψ be a real-valued function of interest. For a subset X of its domain and a real number K ≥ 1, we say that ψ is K-smooth on X if |ψ(x)| ≤ K^{|x−x′|} · |ψ(x′)| for all x, x′ ∈ X, where |x−x′| denotes the Hamming distance between x and x′. Put another way, for any two points of X at Hamming distance d, the corresponding values of ψ differ in magnitude by a factor of at most K^d. In and of itself, a locally smooth function need not be min-smooth because for a pair of points that are far from each other, the corresponding ψ-values can differ by many orders of magnitude. However, locally smooth functions exhibit extraordinary plasticity. Specifically, we show how to modify a locally smooth function’s metric properties—such as its support or the distribution of its mass—without the change being detectable by low-degree polynomials. This apparatus makes it possible to restore min-smoothness to the dual object that results from the block-composition step and preserve that min-smoothness throughout the input compression step, eliminating the two obstacles to min-smoothness in the earlier proof of Theorem 1.1. The new block-composition step uses a locally smooth witness for the threshold degree of the inner function, which needs to be built from scratch and is quite different from the witness in the proof of Theorem 1.1.
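The local smoothness condition is easy to check mechanically on a small domain. The sketch below assumes the pairwise form of the definition given above (|ψ(x)| ≤ K^{|x−x′|}·|ψ(x′)| with Hamming distance as the metric) and illustrates the point made in the text: a witness whose magnitude decays geometrically with Hamming weight is locally smooth for a suitable K, yet its extreme values differ by orders of magnitude.

```python
from itertools import product

def hamming(x, y):
    """Hamming distance between two equal-length Boolean tuples."""
    return sum(a != b for a, b in zip(x, y))

def is_K_smooth(psi, X, K):
    """Check the assumed local smoothness condition:
    |psi(x)| <= K**hamming(x, x') * |psi(x')| for all x, x' in X."""
    return all(abs(psi[x]) <= K ** hamming(x, y) * abs(psi[y])
               for x in X for y in X)

# A sign-alternating witness with magnitude 2^{-|x|} (|x| = Hamming weight)
# is 2-smooth on the whole cube, but not 1.5-smooth.
n = 4
X = list(product((0, 1), repeat=n))
psi = {x: (-1) ** sum(x) * 2.0 ** (-sum(x)) for x in X}
assert is_K_smooth(psi, X, 2.0)
assert not is_K_smooth(psi, X, 1.5)
```

Note that this ψ is far from min-smooth: its largest and smallest magnitudes differ by a factor of 2^n, exactly the phenomenon the plasticity machinery is designed to repair.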

Our described approach departs considerably from previous work on the sign-rank of constant-depth circuits [39, 15, 17]. The analytic notion in those earlier papers is weaker than γ-smooth threshold degree and in particular allows the dual object to be arbitrary on a small fraction of the inputs. This weaker property is acceptable when the main result is proved in one shot, with a closed-form construction of the dual object. By contrast, we must construct dual objects iteratively, with each iteration increasing the degree parameter and proportionately decreasing the smoothness parameter. This iterative process requires that the dual object in each iteration be min-smooth on the entire domain. Perhaps unexpectedly, we find γ-smooth threshold degree easier to work with than the weaker notion in previous work [39, 15, 17]. In particular, we are able to give a new and short proof of the exp(Ω(n^{1/3})) lower bound on the sign-rank of AC^0, originally obtained by Razborov and Sherstov [39] with a much more complicated approach. The new proof can be found in Section 5.1, where it serves as a prelude to our main result on the sign-rank of AC^0.

## 2. Preliminaries

### 2.1. General

For a string x ∈ {0,1}^n and a set S ⊆ {1, 2, …, n}, we let x|_S denote the restriction of x to the indices in S. In other words, x|_S = (x_{i_1}, x_{i_2}, …, x_{i_{|S|}}), where i_1 < i_2 < ⋯ < i_{|S|} are the elements of S. The characteristic function of a set S is given by

$$\mathbf{1}_S(x)=\begin{cases}1 & \text{if } x\in S,\\ 0 & \text{otherwise.}\end{cases}$$

For a logical condition C, we use the Iverson bracket

$$\mathbf{I}[C]=\begin{cases}1 & \text{if } C \text{ holds},\\ 0 & \text{otherwise.}\end{cases}$$

We let ℕ = {0, 1, 2, …} denote the set of natural numbers. The following well-known bound [24, Proposition 1.4] is used in our proofs without further mention:

$$\sum_{i=0}^{k}\binom{n}{i}\le\left(\frac{en}{k}\right)^{k},\qquad k=0,1,2,\dots,n, \tag{2.1}$$

where e = 2.718… denotes Euler’s number.
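Inequality (2.1) is easy to verify numerically. The short check below is a sanity test rather than a proof; it confirms the bound for a range of small parameters (we start at k = 1, since for k = 0 the right-hand side is interpreted as 1 and the bound reads 1 ≤ 1):

```python
from math import comb, e

def binom_tail_bound_holds(n: int, k: int) -> bool:
    """Check inequality (2.1): sum_{i=0}^{k} C(n, i) <= (e*n/k)**k."""
    lhs = sum(comb(n, i) for i in range(k + 1))
    rhs = (e * n / k) ** k
    return lhs <= rhs

# Spot-check the bound over a grid of parameters.
assert all(binom_tail_bound_holds(n, k)
           for n in range(1, 60)
           for k in range(1, n + 1))
```

At k = n the left-hand side is 2^n while the right-hand side is e^n, so the bound is loose there; it is tightest in the regime k ≪ n where it is typically applied.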

We adopt the extended real number system ℝ ∪ {−∞, +∞} in all calculations, with the natural conventions for arithmetic involving the infinities. We use the comparison operators in a unary capacity to denote one-sided intervals of the real line; thus, ≥a, >a, ≤a, <a stand for [a, ∞), (a, ∞), (−∞, a], (−∞, a), respectively. We let ln x and log x stand for the natural logarithm of x and the logarithm of x to base 2, respectively. We use the following two versions of the sign function:

$$\operatorname{sgn}x=\begin{cases}-1 & \text{if } x<0,\\ 0 & \text{if } x=0,\\ 1 & \text{if } x>0,\end{cases}\qquad \widetilde{\operatorname{sgn}}\,x=\begin{cases}-1 & \text{if } x<0,\\ 1 & \text{if } x\ge 0.\end{cases}$$

The term Euclidean space refers to ℝ^n for some positive integer n. We let e_i denote the vector whose i-th component is 1 and whose other components are 0. Thus, the vectors e_1, e_2, …, e_n form the standard basis for ℝ^n. For vectors u and v, we write u ≤ v to mean that u_i ≤ v_i for each i. The relations ≥, <, > on vectors are defined analogously.

We frequently omit the argument in equations and inequalities involving functions, as in f ≥ 0. Such statements are to be interpreted pointwise. For example, the statement “f ≥ g on X” means that f(x) ≥ g(x) for every x ∈ X. The positive and negative parts of a function f are denoted f⁺ = max{f, 0} and f⁻ = max{−f, 0}, respectively.

### 2.2. Boolean functions and circuits

We view Boolean functions as mappings f: X → {0,1} for some finite set X. More generally, we consider partial Boolean functions f: X → {0, 1, ∗}, with the output value ∗ used for don’t-care inputs. The negation of a Boolean function f is denoted as usual by ¬f. The familiar functions AND_n, OR_n: {0,1}^n → {0,1} are given by AND_n(x) = x_1 ∧ x_2 ∧ ⋯ ∧ x_n and OR_n(x) = x_1 ∨ x_2 ∨ ⋯ ∨ x_n. The generalized Minsky–Papert function MP_{m,r}: ({0,1}^r)^m → {0,1} is given by MP_{m,r}(x) = ⋀_{i=1}^{m} ⋁_{j=1}^{r} x_{i,j}. We abbreviate MP_m = MP_{m,4m²}, which is the right setting of parameters for most of our applications.

We adopt the standard notation for function composition, with f ∘ g defined by (f ∘ g)(x) = f(g(x)). In addition, we use the ∘ operator to denote the componentwise composition of Boolean functions. Formally, the componentwise composition of f: {0,1}^n → {0,1} and g: X → {0,1} is the function f ∘ g: X^n → {0,1} given by (f ∘ g)(x_1, …, x_n) = f(g(x_1), …, g(x_n)). To illustrate, the componentwise composition of AND_m with OR_r is MP_{m,r}. Componentwise composition is consistent with standard composition, which in the context of Boolean functions is only defined for n = 1. Thus, the meaning of f ∘ g is determined by the range of g and is never in doubt. Componentwise composition generalizes in the natural manner to partial Boolean functions f: {0,1}^n → {0, 1, ∗} and g: X → {0, 1, ∗}, as follows:

$$(f\circ g)(x_1,\dots,x_n)=\begin{cases}f(g(x_1),\dots,g(x_n)) & \text{if } x_1,\dots,x_n\in g^{-1}(\{0,1\}),\\ \ast & \text{otherwise.}\end{cases}$$

Compositions of three or more functions, where each instance of the ∘ operator can be standard or componentwise, are well-defined by associativity and do not require parenthesization.
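The definitions above can be made concrete in a few lines. The sketch below uses the string "*" as the don’t-care value and represents Boolean functions as ordinary Python callables; the names are illustrative, not from the paper:

```python
STAR = "*"  # don't-care output value for partial Boolean functions

def componentwise_compose(f, g):
    """Componentwise composition f∘g, as in the displayed equation: the
    result is defined iff every inner value g(x_i) lies in {0, 1}."""
    def h(*xs):
        inner = [g(x) for x in xs]
        if any(v == STAR for v in inner):
            return STAR
        return f(*inner)
    return h

# Illustration with total functions: composing AND_2 componentwise with
# OR_2 yields the generalized Minsky-Papert function MP_{2,2}.
AND2 = lambda a, b: a & b
OR2 = lambda x: x[0] | x[1]
MP22 = componentwise_compose(AND2, OR2)
assert MP22((0, 1), (1, 0)) == 1  # both OR blocks fire
assert MP22((0, 0), (1, 1)) == 0  # first OR block is 0
```

If any inner function value is ∗, the composed value is ∗ as well, matching the partial-function case of the definition.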

For Boolean strings x, y ∈ {0,1}^n, we let x ⊕ y denote their bitwise XOR. The strings x ∧ y and x ∨ y are defined analogously, with the binary connective applied bitwise. A Boolean circuit C in variables x_1, x_2, …, x_n is a circuit with inputs x_1, ¬x_1, …, x_n, ¬x_n and gates of type ∧ and ∨. The circuit is monotone if it does not use any of the negated inputs ¬x_1, ¬x_2, …, ¬x_n. The fan-in of C is the maximum in-degree of any ∧ or ∨ gate. Unless stated otherwise, we place no restrictions on the gate fan-in. The size of C is the number of ∧ and ∨ gates. The depth of C is the maximum number of ∧ and ∨ gates on any path from an input to the circuit output. With this convention, a circuit consisting of a single ∧ or ∨ gate has depth 1. The circuit class AC^0 consists of function families {f_n} such that f_n is computed by a Boolean circuit of size at most cn^c and depth at most c, for some constant c ≥ 1 and all n. We specify small-depth layered circuits by indicating the type of gate used in each layer. For example, an AND-OR-AND circuit is a depth-3 circuit with the top and bottom layers composed of ∧ gates, and the middle layer composed of ∨ gates. A Boolean formula is a Boolean circuit in which every gate has fan-out 1. Common examples of Boolean formulas are DNF and CNF formulas.

### 2.3. Norms and products

For a set X, we let ℝ^X denote the linear space of real-valued functions on X. The support of a function f is denoted supp f = {x : f(x) ≠ 0}. For real-valued functions with finite support, we adopt the usual norms and inner product:

$$\|f\|_\infty=\max_{x\in\operatorname{supp}f}|f(x)|,\qquad \|f\|_1=\sum_{x\in\operatorname{supp}f}|f(x)|,\qquad \langle f,g\rangle=\sum_{x\in\operatorname{supp}f\,\cap\,\operatorname{supp}g}f(x)g(x).$$

This covers as a special case functions on finite sets. The tensor product of f ∈ ℝ^X and g ∈ ℝ^Y is denoted f ⊗ g ∈ ℝ^{X×Y} and given by (f ⊗ g)(x, y) = f(x)g(y). The tensor product f ⊗ f ⊗ ⋯ ⊗ f (n times) is abbreviated f^{⊗n}. For a subset S ⊆ {1, 2, …, n} and a function f ∈ ℝ^X, we define f^{⊗S} ∈ ℝ^{X^n} by f^{⊗S}(x_1, …, x_n) = ∏_{i∈S} f(x_i). As extremal cases, we have f^{⊗∅} ≡ 1 and f^{⊗{1,2,…,n}} = f^{⊗n}. Tensor product notation generalizes naturally to sets of functions: F ⊗ G = {f ⊗ g : f ∈ F, g ∈ G} and F^{⊗n} = {f_1 ⊗ f_2 ⊗ ⋯ ⊗ f_n : f_1, f_2, …, f_n ∈ F}. A conical combination of f_1, f_2, …, f_k ∈ ℝ^X is any function of the form λ_1 f_1 + λ_2 f_2 + ⋯ + λ_k f_k, where λ_1, λ_2, …, λ_k are nonnegative reals. A convex combination of f_1, f_2, …, f_k is any such function where λ_1, λ_2, …, λ_k are nonnegative reals that sum to 1. The conical hull of F, denoted cone F, is the set of all conical combinations of functions in F. The convex hull, denoted conv F, is defined analogously as the set of all convex combinations of functions in F. For any set of functions F, we have

$$(\operatorname{conv}F)^{\otimes n}\subseteq\operatorname{conv}(F^{\otimes n}). \tag{2.2}$$
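The containment (2.2) follows by multiplying out the tensor product of convex combinations: the cross terms carry weights that are products of the original convex weights, hence nonnegative and summing to 1. The numerical sketch below illustrates this for n = 2 and F = {f₁, f₂} on a 3-point domain:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two functions on a 3-point domain, represented as vectors in R^3.
f1, f2 = rng.standard_normal(3), rng.standard_normal(3)
a = 0.3  # convex weight

h = a * f1 + (1 - a) * f2        # an element of conv{f1, f2}
lhs = np.outer(h, h).ravel()     # h (x) h, an element of (conv F)^{(x)2}

# Expanding the product expresses h (x) h as a convex combination of the
# four tensors f_i (x) f_j, with weights a_i * a_j:
weights = [a * a, a * (1 - a), (1 - a) * a, (1 - a) * (1 - a)]
tensors = [np.outer(u, v).ravel() for u in (f1, f2) for v in (f1, f2)]
rhs = sum(w * t for w, t in zip(weights, tensors))

assert np.allclose(lhs, rhs)
assert abs(sum(weights) - 1) < 1e-12  # the cross-term weights are convex
```

The same expansion works for any n and any finite F, which is all that (2.2) asserts.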

Throughout this manuscript, we view probability distributions as real functions. This convention makes available the shorthands introduced above. In particular, for probability distributions μ and λ, the symbol supp μ denotes the support of μ, and μ ⊗ λ denotes the probability distribution given by (μ ⊗ λ)(x, y) = μ(x)λ(y). If μ is a probability distribution on X, we consider μ to be defined also on any superset of X, with the understanding that μ = 0 outside X. We let 𝒟(X) denote the family of all finitely supported probability distributions on X. Most of this paper is concerned with this distribution family and its subfamilies, each of which we will denote with a Fraktur letter.

Analogous to functions, we adopt the familiar norms for vectors in Euclidean space: ‖x‖_∞ = max_i |x_i| and ‖x‖_1 = ∑_i |x_i|. The latter norm is particularly prominent in this paper, and to avoid notational clutter we use |x| interchangeably with ‖x‖_1. We refer to |x| as the weight of x. For any sets X ⊆ {0,1}^n and W ⊆ {0, 1, 2, …, n}, we define

$$X|_W=\{x\in X:|x|\in W\}.$$