## 1. Introduction

A real polynomial $p$ is said to *sign-represent* the Boolean
function $f\colon\{0,1\}^n\to\{0,1\}$ if $\operatorname{sgn} p(x)=(-1)^{f(x)}$ for every
input $x$. The *threshold degree* of $f$, denoted $\deg_{\pm}(f)$,
is the minimum degree of a multivariate real polynomial that sign-represents
$f$. Equivalent terms in the literature include *strong degree* [4],
*voting polynomial degree* [28], *PTF
degree* [34], and *sign degree* [10].
Since any function $f\colon\{0,1\}^n\to\{0,1\}$ can be represented exactly
by a real polynomial of degree at most $n$, the threshold degree
of $f$ is an integer between $0$ and $n$. Viewed as a computational
model, sign-representation is remarkably powerful because it corresponds
to the strongest form of pointwise approximation. The formal study
of threshold degree began in 1969 with the pioneering work of Minsky
and Papert [32]

on limitations of perceptrons. The authors of
[32] famously proved that the parity function on $n$ variables has the maximum possible threshold degree, $n$. They obtained lower bounds on the threshold degree of several other functions, including DNF formulas and intersections of halfspaces. Since then, sign-representing polynomials have found applications far beyond artificial intelligence. In theoretical computer science, applications of threshold degree include circuit lower bounds [28, 29, 42, 19, 7], size-depth trade-offs [37, 55], communication complexity [42, 19, 44, 39, 7, 53, 51], structural complexity [4, 9], and computational learning [26, 25, 35, 3, 47, 49, 13, 50, 56].

The notion of threshold degree has been especially influential in the study of $\mathsf{AC}^0$, the class of constant-depth polynomial-size circuits with gates of unbounded fan-in. The first such result was obtained by Aspnes et al. [4], who used sign-representing polynomials to give a beautiful new proof of classic lower bounds for $\mathsf{AC}^0$. In communication complexity, the notion of threshold degree played a central role in the first construction [42, 44] of an $\mathsf{AC}^0$ circuit with exponentially small discrepancy and hence large communication complexity in nearly every model. That discrepancy result was used in [42] to show the optimality of Allender's classic simulation of $\mathsf{AC}^0$ by majority circuits, solving an open problem posed in [28] on the relation between the two circuit classes. Subsequent work [19, 7, 53, 51] resolved other questions in communication complexity and circuit complexity related to constant-depth circuits by generalizing the threshold degree method of [42, 44].

Sign-representing polynomials also paved the way for *algorithmic*
breakthroughs in the study of constant-depth circuits. Specifically,
any function of threshold degree $d$ can be viewed as a halfspace
in $\binom{n}{0}+\binom{n}{1}+\cdots+\binom{n}{d}$ dimensions, corresponding
to the monomials in a sign-representation of $f$. As a result, a
class of functions of threshold degree at most $d$ can be learned
in the standard PAC model under arbitrary distributions in time polynomial
in $n^{d}$. Klivans and Servedio [26]
used this threshold degree approach to give what is currently the
fastest algorithm for learning polynomial-size DNF formulas, with
running time $2^{\tilde O(n^{1/3})}$. Another learning-theoretic
breakthrough based on threshold degree is the fastest algorithm for
learning Boolean formulas, obtained by O'Donnell and Servedio [35]
for formulas of constant depth and by Ambainis et al. [3]
for arbitrary depth. Their algorithm runs in time $n^{\sqrt{s}\,(\log s)^{O(d)}}$
for formulas of size $s$ and constant depth $d$, and in time $n^{s^{1/2+o(1)}}$
for formulas of unbounded depth. In both cases, the bound on the running
time follows from the corresponding upper bound on the threshold degree.
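
As an illustration of this reduction (again not part of our formal development), the sketch below lifts each example to the space of monomials of degree at most $d$ and runs a simple perceptron, which stands in for a generic halfspace learner.

```python
from itertools import combinations, product

import numpy as np


def monomial_features(x, d):
    # All multilinear monomials of degree <= d, evaluated at x.
    n = len(x)
    return np.array([np.prod([x[i] for i in S])
                     for k in range(d + 1)
                     for S in combinations(range(n), k)], dtype=float)


def learn_halfspace(examples, d, epochs=200):
    # Perceptron over the lifted features; it converges whenever the
    # data is separable there, which threshold degree <= d guarantees.
    w = np.zeros(len(monomial_features(examples[0][0], d)))
    for _ in range(epochs):
        for x, label in examples:  # label in {-1, +1}
            phi = monomial_features(x, d)
            if label * np.dot(w, phi) <= 0:
                w += label * phi
    return w


# Example: (x1 OR x2) AND (x3 OR x4) has threshold degree 2.
f = lambda x: int((x[0] or x[1]) and (x[2] or x[3]))
data = [(x, (-1) ** f(x)) for x in product([0, 1], repeat=4)]
w = learn_halfspace(data, d=2)
assert all(lab * np.dot(w, monomial_features(x, 2)) > 0 for x, lab in data)
```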

A far-reaching generalization of threshold degree is the matrix-analytic
notion of *sign-rank*, which allows sign-representation out of
arbitrary low-dimensional subspaces rather than the subspace of low-degree
polynomials. The contribution of this paper is to prove essentially
optimal lower bounds on the threshold degree and sign-rank of $\mathsf{AC}^0$,
which in turn imply lower bounds on other fundamental complexity measures
of interest in communication complexity and learning theory. In the
remainder of this section, we give a detailed overview of the previous
work, present our main results, and discuss our proofs.

Depth | Threshold degree | Reference |
---|---|---|
$2$ | $\Omega(n^{1/3})$ | Minsky and Papert [32] |
$2$ | $O(n^{1/3}\log n)$ | Klivans and Servedio [26] |
$d$ | $\Omega(n^{1/3}\log^{\Theta(d)}n)$ | O'Donnell and Servedio [35] |
$d$ | $\Omega(n^{\frac{d-1}{2d-1}})$ | Sherstov [50] |
$3$ | $\Omega(\sqrt n)$ | Sherstov [52] |
$3$ | $\tilde\Omega(\sqrt n)$ | Bun and Thaler [17] |
$d$ | $\Omega(n^{\frac{d-1}{d+1}})$ | This paper |

Table 1. Known bounds on the maximum threshold degree of a polynomial-size circuit of given depth.

### 1.1. Threshold degree of AC^{0}

Determining the maximum threshold degree of an $\mathsf{AC}^0$ circuit in $n$ variables is a longstanding open problem in the area. It is motivated by algorithmic and complexity-theoretic applications [26, 35, 27, 39, 13], in addition to being a natural question in its own right. Table 1 gives a quantitative summary of the results obtained to date. In their seminal monograph, Minsky and Papert [32] proved a lower bound of $\Omega(n^{1/3})$ on the threshold degree of the following DNF formula in $n$ variables:

$$\mathrm{MP}(x)=\bigvee_{i=1}^{n^{1/3}}\bigwedge_{j=1}^{n^{2/3}}x_{i,j}.$$
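
For intuition, the following sketch (ours, an illustration only) evaluates the Minsky–Papert function for toy parameters and confirms by linear programming that already its variant on a $2\times2$ array of variables has threshold degree $2$.

```python
from itertools import combinations, product

import numpy as np
from scipy.optimize import linprog


def mp(x, m, r):  # AND of m ORs of r bits each; the DNF above is its negation
    return int(all(any(x[i * r + j] for j in range(r)) for i in range(m)))


def best_margin(f, n, k):
    # Maximum margin t (normalized to t <= 1) of a degree-<=k
    # sign-representation of f; t = 0 means none exists.
    mons = [S for d in range(k + 1) for S in combinations(range(n), d)]
    pts = list(product([0, 1], repeat=n))
    A = [[-(-1) ** f(x) * np.prod([x[i] for i in S]) for S in mons] + [1]
         for x in pts]
    c = np.zeros(len(mons) + 1)
    c[-1] = -1
    res = linprog(c, A_ub=np.array(A), b_ub=np.zeros(len(pts)),
                  bounds=[(None, None)] * len(mons) + [(None, 1)])
    return res.x[-1]


g = lambda x: mp(x, 2, 2)     # Minsky-Papert on a 2 x 2 array
print(best_margin(g, 4, 1))   # ~0.0: no degree-1 sign-representation
print(best_margin(g, 4, 2))   # 1.0: threshold degree is exactly 2
```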

Three decades later, Klivans and Servedio [26] obtained an $O(n^{1/3}\log n)$ upper bound on the threshold degree of any polynomial-size DNF formula in $n$ variables, essentially matching Minsky and Papert's result and resolving the problem for depth $2$. Determining the threshold degree of circuits of depth $3$ and higher proved to be challenging. The only upper bound known to date is the trivial $n$, which follows directly from the definition of threshold degree. In particular, it is consistent with our knowledge that there are $\mathsf{AC}^0$ circuits with linear threshold degree. On the lower bounds side, the only progress for a long time was due to O'Donnell and Servedio [35], who constructed for any $d$ a circuit of depth $d$ with threshold degree $\Omega(n^{1/3}\log^{\Theta(d)}n)$. The authors of [35] formally posed the problem of obtaining a polynomial improvement on Minsky and Papert's lower bound. Such an improvement was obtained in [50], with a threshold degree lower bound of $\Omega(n^{\frac{d-1}{2d-1}})$ for circuits of depth $d$. A polynomially stronger result was obtained in [52], with a lower bound of $\Omega(\sqrt n)$ on the threshold degree of an explicit circuit of depth $3$. Bun and Thaler [17] recently used a different, depth-$3$ circuit to give a much simpler proof of the $\tilde\Omega(\sqrt n)$ lower bound for $\mathsf{AC}^0$. We obtain a quadratically stronger, and near-optimal, lower bound on the threshold degree of $\mathsf{AC}^0$.

###### Theorem 1.1.

Let $d\geq2$ be a fixed integer. Then there is an explicitly given Boolean circuit family $\{f_n\}_{n=1}^{\infty}$, where $f_n\colon\{0,1\}^n\to\{0,1\}$ has polynomial size, depth $d$, and threshold degree $\Omega(n^{\frac{d-1}{d+1}})$.

Moreover, $f_n$ has bottom fan-in $O(\log n)$ for all $d\geq3$.

For large $d$, Theorem 1.1 essentially
matches the trivial upper bound of $n$ on the threshold degree of
any function. For any fixed depth $d$, Theorem 1.1
subsumes all previous lower bounds on the threshold degree
of $\mathsf{AC}^0$, with a polynomial improvement starting at depth $4$.
In particular, the lower bounds due to Minsky and Papert [32]
and Bun and Thaler [17] are subsumed as the
special cases $d=2$ and $d=3$, respectively. From a computational
learning perspective, Theorem 1.1 definitively
rules out the threshold degree approach to learning constant-depth
circuits.

### 1.2. Sign-rank of AC^{0}

The *sign-rank* of a matrix $A=[A_{ij}]$ without zero entries,
denoted $\operatorname{rk}_{\pm}A$, is the least rank of a real matrix $B=[B_{ij}]$
with $A_{ij}B_{ij}>0$ for all $i,j$. In other words,
the sign-rank of $A$ is the minimum rank of a matrix that can be
obtained by making arbitrary sign-preserving changes to the entries
of $A$. The sign-rank of a Boolean function $f\colon X\times Y\to\{0,1\}$
is defined in the natural way as the sign-rank of the matrix
$[(-1)^{f(x,y)}]_{x\in X,\,y\in Y}$.
In particular, the sign-rank of $f\colon\{0,1\}^n\times\{0,1\}^n\to\{0,1\}$ is an integer between $1$ and
$2^{n}$. This fundamental notion has been studied in contexts as
diverse as matrix analysis, communication complexity, circuit complexity,
and learning theory; see [39] for a bibliographic overview.
To a complexity theorist, sign-rank is a vastly more challenging
quantity to analyze than threshold degree. Indeed, a sign-rank lower
bound rules out sign-representation out of *every* linear subspace
of given dimension, whereas a threshold degree lower bound rules out
sign-representation specifically by linear combinations of monomials
up to a given degree.
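
For illustration, Forster's spectral bound (discussed shortly) implies that any sign matrix $A\in\{-1,+1\}^{m\times n}$ has sign-rank at least $\sqrt{mn}/\|A\|$. The snippet below (ours) evaluates this bound on a Sylvester-type Hadamard matrix.

```python
import numpy as np


def sylvester_hadamard(k):
    # The 2^k x 2^k Hadamard matrix, built by Sylvester's doubling.
    H = np.array([[1]])
    for _ in range(k):
        H = np.block([[H, H], [H, -H]])
    return H


H = sylvester_hadamard(4)              # 16 x 16 sign matrix
N = H.shape[0]
spectral_norm = np.linalg.norm(H, 2)   # equals sqrt(N) for Hadamard
print(N / spectral_norm)               # 4.0: sign-rank of H is at least 4
```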

Unsurprisingly, progress in understanding sign-rank has been slow and difficult. No nontrivial lower bounds were known for any explicit matrices until the breakthrough work of Forster [21], who proved strong lower bounds on the sign-rank of Hadamard matrices and, more generally, of all sign matrices with small spectral norm. The sign-rank of constant-depth circuits has since seen considerable work, as summarized in Table 2.

The first exponential lower bound on the sign-rank of an $\mathsf{AC}^0$ circuit was obtained by Razborov and Sherstov [39], solving a 22-year-old problem due to Babai, Frankl, and Simon [5]. The authors of [39] constructed a polynomial-size circuit of depth $3$ with sign-rank $\exp(\Omega(n^{1/3}))$. In follow-up work, Bun and Thaler [15] constructed a polynomial-size circuit of depth $3$ with sign-rank $\exp(\tilde\Omega(n^{2/5}))$. A more recent and incomparable result, also due to Bun and Thaler [17], is a sign-rank lower bound of $\exp(\tilde\Omega(\sqrt n))$ for a polynomial-size circuit of larger constant depth. No nontrivial upper bounds are known on the sign-rank of $\mathsf{AC}^0$. Closing the gap between the best lower bound of $\exp(\tilde\Omega(\sqrt n))$ and the trivial upper bound of $2^{n}$ has been a challenging open problem. We solve this problem almost completely, by constructing for any $\epsilon>0$ a constant-depth circuit with sign-rank $\exp(\Omega(n^{1-\epsilon}))$. In quantitative detail, our results on the sign-rank of $\mathsf{AC}^0$ are the following two theorems.

###### Theorem 1.2.

Let $d\geq2$ be a given integer. Then there is an explicitly given Boolean circuit family $\{F_n\}_{n=1}^{\infty}$, where $F_n\colon\{0,1\}^n\times\{0,1\}^n\to\{0,1\}$ has polynomial size, depth $d$, and sign-rank $\exp(\Omega(n^{\frac{d-2}{d}}))$.

As a companion result, we prove the following qualitatively similar but quantitatively incomparable theorem.

###### Theorem 1.3.

Let $d\geq2$ be a given integer. Then there is an explicitly given Boolean circuit family $\{F_n\}_{n=1}^{\infty}$, where $F_n\colon\{0,1\}^n\times\{0,1\}^n\to\{0,1\}$ has polynomial size, depth $d$, and sign-rank $\exp(\tilde\Omega(n^{\frac{d-1}{d+1}}))$.

For large $d$, the lower bounds of Theorems 1.2
and 1.3 approach the trivial upper
bound of $2^{n}$ on the sign-rank of any Boolean function $F\colon\{0,1\}^n\times\{0,1\}^n\to\{0,1\}$.
For any given depth, Theorems 1.2 and 1.3
subsume all previous lower bounds on the sign-rank of $\mathsf{AC}^0$,
with a strict improvement starting at depth $3$. From a computational
learning perspective, Theorems 1.2 and 1.3
state that $\mathsf{AC}^0$ has near-maximum *dimension complexity* [41, 43, 39, 17],
namely, $\exp(\Omega(n^{1-\epsilon}))$ for any constant $\epsilon>0$.
This rules out the possibility of learning $\mathsf{AC}^0$ circuits
via dimension complexity [39], a far-reaching generalization
of the threshold degree approach from the monomial basis to arbitrary
bases.

### 1.3. Communication complexity

Theorems 1.1–1.3
imply strong new lower bounds on the communication complexity of $\mathsf{AC}^0$.
We adopt the standard randomized model of Yao [30], with
players Alice and Bob and a Boolean function $F\colon X\times Y\to\{0,1\}$.
On input $(x,y)\in X\times Y$, Alice and Bob receive the arguments
$x$ and $y$, respectively, and communicate back and forth according
to an agreed-upon protocol. Each player privately holds an unlimited
supply of uniformly random bits that he or she can use when deciding
what message to send at any given point in the protocol. The *cost*
of a protocol is the total number of bits communicated in a worst-case
execution. The *$\epsilon$-error randomized communication complexity*
$R_{\epsilon}(F)$ of $F$ is the least cost of a protocol that computes $F$
with probability of error at most $\epsilon$ on every input.

Of particular interest to us are communication protocols with error probability close to that of random guessing, $1/2$. There are two standard ways to formalize the complexity of a communication problem in this setting, both inspired by probabilistic polynomial time for Turing machines:

$$\mathrm{UPP}(F)=\inf_{0<\epsilon<1/2}R_{\epsilon}(F)$$

and

$$\mathrm{PP}(F)=\inf_{0<\epsilon<1/2}\left\{R_{\epsilon}(F)+\log_2\frac{1}{\frac{1}{2}-\epsilon}\right\}.$$

The former quantity, introduced by Paturi and Simon [38],
is called the *communication complexity of $F$ with unbounded
error*, in reference to the fact that the error probability can be
arbitrarily close to $1/2$. The latter quantity is called the *communication
complexity of $F$ with weakly unbounded error*. Proposed by Babai
et al. [5], it features an additional penalty term that
depends on the error probability. It is clear that

$$\mathrm{UPP}(F)\leq\mathrm{PP}(F)$$

for every communication problem $F$, with an exponential gap achievable between the two complexity measures [10, 41]. These two models occupy a special place in the study of communication complexity because they are more powerful than any other standard model (deterministic, nondeterministic, randomized, and quantum with or without entanglement). Moreover, unbounded-error protocols represent a frontier in communication complexity theory, in that they are the most powerful protocols for which explicit lower bounds are currently known. Our results imply that even for such protocols, $\mathsf{AC}^0$ has near-maximal communication complexity.

###### Theorem 1.4.

Let $d\geq2$ be a fixed integer. Then there is an explicitly given Boolean circuit family $\{F_n\}_{n=1}^{\infty}$, where $F_n\colon\{0,1\}^n\times\{0,1\}^n\to\{0,1\}$ has polynomial size, depth $d$, communication complexity

$$\mathrm{PP}(F_n)=\Omega\left(n^{\frac{d-1}{d+1}}\right),$$

and discrepancy

$$\operatorname{disc}(F_n)=\exp\left(-\Omega\left(n^{\frac{d-1}{d+1}}\right)\right).$$

*Discrepancy* is a combinatorial complexity measure
of interest in communication complexity theory and other research
areas; see Section 2.8 for a
formal definition. As $d$ grows, the bounds of Theorem 1.4
approach the best possible bounds for any communication problem $F\colon\{0,1\}^n\times\{0,1\}^n\to\{0,1\}$.
The same *qualitative* behavior was achieved in previous work
by Bun and Thaler [17], who constructed, for
any constant $\epsilon>0$, a constant-depth circuit
with communication complexity $\Omega(n^{1-\epsilon})$
and discrepancy $\exp(-\Omega(n^{1-\epsilon}))$. Theorem 1.4
strictly subsumes the result of Bun and Thaler [17]
and all other prior work on the discrepancy and $\mathsf{PP}$-complexity
of constant-depth circuits [42, 10, 44, 50, 52].
For any fixed depth greater than $3$, the bounds of Theorem 1.4
are a polynomial improvement on all previous work. We further
show that Theorem 1.4 carries over to the *number-on-the-forehead
model*, the strongest formalism of multiparty communication. This
result, presented in detail in Section 4.4,
uses the multiparty version [51] of the
pattern matrix method.
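
For a concrete feel, the brute-force snippet below (ours) computes the discrepancy of a tiny communication matrix with respect to the uniform distribution; the normalization shown is one common convention, and Section 2.8 gives the formal definition used in this paper.

```python
from itertools import product

import numpy as np


def discrepancy(A):
    # Max over all rectangles S x T of |sum of entries| / (m * n).
    m, n = A.shape
    best = 0.0
    for rows in product([0, 1], repeat=m):
        for cols in product([0, 1], repeat=n):
            S = [i for i in range(m) if rows[i]]
            T = [j for j in range(n) if cols[j]]
            val = abs(A[np.ix_(S, T)].sum()) / (m * n) if S and T else 0.0
            best = max(best, val)
    return best


# Inner product mod 2 on 2-bit inputs, as a +-1 matrix.
ip = np.array([[(-1) ** (bin(x & y).count("1") % 2) for y in range(4)]
               for x in range(4)])
print(discrepancy(ip))  # small: inner product has low discrepancy
```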

Our work also gives near-optimal lower bounds for $\mathsf{AC}^0$ in the much more powerful unbounded-error model. Specifically, it is well known [38] that the unbounded-error communication complexity of any Boolean function coincides, up to an additive constant, with the logarithm of its sign-rank. As a result, Theorems 1.2 and 1.3 imply:

###### Theorem 1.5.

Let $d\geq2$ be a given integer. Then there is an explicitly given Boolean circuit family $\{F_n\}_{n=1}^{\infty}$, where $F_n\colon\{0,1\}^n\times\{0,1\}^n\to\{0,1\}$ has polynomial size, depth $d$, and unbounded-error communication complexity $\tilde\Omega(n^{\frac{d-1}{d+1}})$.

For large $d$, the lower bounds of Theorem 1.5 essentially match the trivial upper bound of $O(n)$ on the unbounded-error communication complexity of any function $F\colon\{0,1\}^n\times\{0,1\}^n\to\{0,1\}$. Theorem 1.5 strictly subsumes all previous lower bounds on the unbounded-error communication complexity of $\mathsf{AC}^0$, with a polynomial improvement for any depth greater than $3$. The best lower bound on the unbounded-error communication complexity of $\mathsf{AC}^0$ prior to our work was $\tilde\Omega(n^{1/2})$, due to Bun and Thaler [17]. Finally, we remark that Theorem 1.5 gives essentially the strongest possible separation of the communication complexity classes $\mathsf{PH}^{cc}$ and $\mathsf{UPP}^{cc}$. We refer the reader to the work of Babai et al. [5] for definitions and detailed background on these classes.

Qualitatively, Theorem 1.5 is stronger than Theorem 1.4 because communication protocols with unbounded error are significantly more powerful than those with weakly unbounded error. On the other hand, Theorem 1.4 is stronger quantitatively for any fixed depth and has the additional advantage of generalizing to the multiparty setting.

### 1.4. Threshold weight and threshold density

By well-known reductions, Theorem 1.1 implies
a number of other lower bounds for the representation of $\mathsf{AC}^0$
circuits by polynomials. For the sake of completeness, we mention
two such consequences. The *threshold density* of a Boolean function
$f\colon\{0,1\}^n\to\{0,1\}$, denoted $\operatorname{dns}(f)$, is the minimum size of
a set family $\mathcal{S}\subseteq\mathcal{P}(\{1,2,\ldots,n\})$ such that

$$\operatorname{sgn}\Biggl(\sum_{S\in\mathcal{S}}\lambda_S\,(-1)^{\sum_{i\in S}x_i}\Biggr)=(-1)^{f(x)},\qquad x\in\{0,1\}^n,$$

for some reals $\lambda_S$. A related complexity measure is *threshold
weight*, denoted $W(f)$ and defined as the minimum of $\sum_{S}|\lambda_S|$
over all integers $\lambda_S$ such that

$$\operatorname{sgn}\Biggl(\sum_{S\subseteq\{1,2,\ldots,n\}}\lambda_S\,(-1)^{\sum_{i\in S}x_i}\Biggr)=(-1)^{f(x)},\qquad x\in\{0,1\}^n.$$
It is not hard to see that the threshold density and threshold weight of $f$ correspond to the minimum size of a threshold-of-parity circuit and a majority-of-parity circuit for $f$, respectively. The definitions imply that $\operatorname{dns}(f)\leq W(f)$ for every $f$, and a little more thought reveals that $\operatorname{dns}(f)\leq2^{n}$ and $W(f)\leq2^{O(n)}$ for every $f$. These complexity measures have seen extensive work, motivated by applications to computational learning and circuit complexity. For a bibliographic overview, we refer the reader to [50, Section 8.2].

Krause and Pudlák [28, Proposition 2.1] gave an ingenious method for transforming threshold degree lower bounds into lower bounds on threshold density, and thus also threshold weight. Specifically, let $f\colon\{0,1\}^n\to\{0,1\}$ be a Boolean function of interest. The authors of [28] considered the related function $f^{\mathrm{KP}}\colon(\{0,1\}^n)^3\to\{0,1\}$ given by $f^{\mathrm{KP}}(x,y,z)=f(\ldots,(z_i\wedge x_i)\vee(\overline{z_i}\wedge y_i),\ldots)$, and proved that $\operatorname{dns}(f^{\mathrm{KP}})\geq2^{\deg_{\pm}(f)}$. In this light, Theorem 1.1 implies that the threshold density of $\mathsf{AC}^0$ is $\exp(\Omega(n^{1-\epsilon}))$ for any constant $\epsilon>0$:

###### Corollary 1.6.

Let $d\geq2$ be a fixed integer. Then there is an explicitly given Boolean circuit family $\{f_n\}_{n=1}^{\infty}$, where $f_n\colon\{0,1\}^n\to\{0,1\}$ has polynomial size and depth $d$, and satisfies $W(f_n)\geq\operatorname{dns}(f_n)\geq\exp(\Omega(n^{\frac{d-1}{d+1}}))$.

For large $d$, the lower bounds on the threshold weight and density in Corollary 1.6 essentially match the trivial upper bounds. Observe that the circuit family of Corollary 1.6 has the same depth as the circuit family of Theorem 1.1. This is because $f_n$ in Theorem 1.1 has bottom fan-in $O(\log n)$, and thus the Krause–Pudlák transformation can be “absorbed” into the bottom two levels of the circuit. Corollary 1.6 subsumes all previous lower bounds [28, 13, 50, 52, 17] on the threshold weight and density of $\mathsf{AC}^0$, with a polynomial improvement for every depth greater than $3$. The improvement is particularly noteworthy in the case of threshold density, where the best previous lower bound [52, 17] was $\exp(\tilde\Omega(\sqrt n))$.
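
The transformation itself is easy to implement. The sketch below (ours) realizes $f^{\mathrm{KP}}$ as a bitwise multiplexer; the exact assignment of roles to $x$, $y$, and $z$ is a convention, and our choice here is for illustration.

```python
def kp(f, n):
    # f^KP(x, y, z) = f(..., x_i if z_i = 1 else y_i, ...), i.e. the
    # i-th input of f is (z_i AND x_i) OR (NOT z_i AND y_i).
    def g(x, y, z):
        return f(tuple(x[i] if z[i] else y[i] for i in range(n)))
    return g


parity = lambda x: sum(x) % 2
g = kp(parity, 3)
# g is a Boolean function on 9 variables; spot-check two inputs.
print(g((1, 0, 1), (0, 0, 0), (1, 1, 1)))  # 0: parity of x = (1,0,1)
print(g((1, 0, 1), (0, 1, 0), (0, 0, 0)))  # 1: parity of y = (0,1,0)
```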

### 1.5. Previous approaches

In the remainder of this section, we discuss our proofs of Theorems 1.1–1.3. The notation that we use here is standard, and we defer its formal review to Section 2. We start with the necessary approximation-theoretic background, then review relevant previous work, and finally contrast it with the approach of this paper. To sidestep minor technicalities, we will represent Boolean functions in this overview as mappings $X\to\{-1,+1\}$, where $X$ is a finite set. We alert the reader that we will revert to the standard representation $X\to\{0,1\}$ starting with Section 2.

#### Background

Recall that our results concern the sign-representation of Boolean
functions and matrices. To properly set the stage for our proofs,
however, we need to consider the more general notion of pointwise
approximation [33]. Let $f\colon X\to\{-1,+1\}$
be a Boolean function of interest. The *$\epsilon$-approximate
degree* of $f$, denoted $\deg_{\epsilon}(f)$, is the minimum degree of a
real polynomial $p$ that approximates $f$ within $\epsilon$ pointwise:
$\max_{x\in X}|f(x)-p(x)|\leq\epsilon$. The
regimes of most interest are *bounded-error approximation*, corresponding
to constant $\epsilon\in(0,1)$; and *large-error approximation*,
corresponding to $\epsilon=1-o(1)$. In the former case, the choice
of error parameter is immaterial and affects the
approximate degree of a Boolean function by at most a multiplicative
constant. It is clear that pointwise approximation is a stronger requirement
than sign-representation, and thus $\deg_{\pm}(f)\leq\deg_{\epsilon}(f)$ for
all $0\leq\epsilon<1$.

A moment’s thought reveals that threshold degree is in fact the limiting case of $\epsilon$-approximate degree as the error parameter approaches $1$:

$$\deg_{\pm}(f)=\lim_{\epsilon\nearrow1}\deg_{\epsilon}(f).\tag{1.1}$$

Both approximate degree and threshold degree have dual characterizations [44], obtained by appeal to linear programming duality. Specifically, $\deg_{\epsilon}(f)\geq d$ if and only if there is a function $\psi\colon X\to\mathbb{R}$ with the following two properties: $\langle\psi,f\rangle>\epsilon\|\psi\|_1$; and $\langle\psi,p\rangle=0$ for every polynomial $p$ of degree less than $d$. Rephrasing, $\psi$ must have large correlation with $f$ but zero correlation with every low-degree polynomial. By weak linear programming duality, $\psi$ constitutes a proof that $\deg_{\epsilon}(f)\geq d$, and for that reason is said to *witness* the lower bound $\deg_{\epsilon}(f)\geq d$. In view of (1.1), this discussion generalizes to threshold degree. The dual characterization here states that $\deg_{\pm}(f)\geq d$ if and only if there is a nonzero function $\psi\colon X\to\mathbb{R}$ with the following two properties: $\psi(x)f(x)\geq0$ for all $x\in X$; and $\langle\psi,p\rangle=0$ for every polynomial $p$ of degree less than $d$. In this dual characterization, $\psi$ agrees in sign with $f$ and is additionally orthogonal to polynomials of degree less than $d$. The sign-agreement property can be restated in terms of correlation, as $\langle\psi,f\rangle=\|\psi\|_1$. As before, $\psi$ is called a threshold degree *witness* for $f$.
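
The dual characterization is effective in the algorithmic sense: when a witness exists, linear programming finds one. The sketch below (ours, an illustration only) computes a threshold degree witness for a toy function, thereby certifying the corresponding lower bound.

```python
from itertools import combinations, product

import numpy as np
from scipy.optimize import linprog


def witness(f, n, d):  # f maps {0,1}^n -> {-1,+1}
    # Find psi with: psi(x)f(x) >= 0, <psi, p> = 0 for deg(p) < d,
    # and the normalization <psi, f> = 1 (which forces psi != 0).
    pts = list(product([0, 1], repeat=n))
    mons = [S for k in range(d) for S in combinations(range(n), k)]
    A_eq = [[np.prod([x[i] for i in S]) for x in pts] for S in mons]
    A_eq.append([f(x) for x in pts])             # normalization row
    b_eq = [0.0] * len(mons) + [1.0]
    A_ub = np.diag([-f(x) for x in pts])         # -f(x)psi(x) <= 0
    res = linprog(np.zeros(len(pts)), A_ub=A_ub, b_ub=np.zeros(len(pts)),
                  A_eq=np.array(A_eq), b_eq=b_eq,
                  bounds=[(None, None)] * len(pts))
    return dict(zip(pts, res.x)) if res.success else None


f = lambda x: (-1) ** (x[0] ^ x[1])        # PARITY_2 in +-1 form
print(witness(f, 2, 2) is not None)        # True: deg_pm(PARITY_2) >= 2
print(witness(f, 2, 3) is not None)        # False: no witness beyond 2
```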

What distinguishes the dual characterizations of approximate degree
and threshold degree is how the dual object $\psi$ relates to $f$.
Specifically, a threshold degree witness must agree in sign with $f$
at every point. An approximate degree witness, on the other hand,
need only exhibit such sign-agreement with $f$ at *most* points,
in that the points where the sign of $\psi$ is correct should account
for most of the $\ell_1$ norm of $\psi$. As a result, constructing
dual objects for threshold degree is significantly more difficult
than for approximate degree. This difficulty is to be expected because
the gap between threshold degree and approximate degree can be arbitrary,
e.g., $1$ versus $\Theta(\sqrt n)$ for the majority function on $n$
bits [36].

#### Hardness amplification via block-composition

Much of the recent work on approximate degree and threshold degree
is concerned with composing functions in ways that amplify their hardness.
Of particular significance here is *block-composition*, defined
for functions $f\colon\{0,1\}^n\to\{0,1\}$ and $g\colon X\to\{0,1\}$ as
the Boolean function $f\circ g\colon X^n\to\{0,1\}$ given by
$(f\circ g)(x_1,\ldots,x_n)=f(g(x_1),\ldots,g(x_n))$.
Block-composition works particularly well for threshold degree. To
use an already familiar example, the block-composition $\mathrm{AND}_{n^{1/3}}\circ\mathrm{OR}_{n^{2/3}}$
has threshold degree $\Omega(n^{1/3})$, whereas the constituent functions
$\mathrm{AND}_{n^{1/3}}$ and $\mathrm{OR}_{n^{2/3}}$ have threshold degree $1$.
As a more extreme example, Sherstov [49] obtained
a lower bound of $\Omega(n)$ on the threshold degree of the conjunction
of two halfspaces in $n$ variables,
each of which by definition has threshold degree $1$. The fact that
threshold degree can increase spectacularly under block-composition
is the basis of much previous work, including the best previous lower
bounds [50, 52] on the
threshold degree of $\mathsf{AC}^0$. Apart from threshold degree,
block-composition has yielded strong results for approximate degree
in various error regimes, including direct sum theorems [47],
direct product theorems [46], and error
amplification results [46, 13, 56, 14].

How, then, does one prove lower bounds on the threshold degree or
approximate degree of a composed function $f\circ g$? It is here
that the dual characterizations take center stage: they make it possible
to prove lower bounds *algorithmically*, by constructing the
corresponding dual object for the composed function. Such algorithmic
proofs run the gamut in terms of technical sophistication, from straightforward
to highly technical, but they have some structure in common. In most
cases, one starts by obtaining dual objects $\Phi$ and $\phi$ for
the constituent functions $f$ and $g$, respectively, either by direct
construction or by appeal to linear programming duality. They are
then combined to yield a dual object for the composed function,
using *dual block-composition* [47, 31]:

$$(\Phi\star\phi)(x_1,\ldots,x_n)=2^{n}\,\Phi\bigl(\operatorname{sgn}\phi(x_1),\ldots,\operatorname{sgn}\phi(x_n)\bigr)\prod_{i=1}^{n}|\phi(x_i)|.\tag{1.2}$$

This composed dual object often requires additional work to ensure sign-agreement or correlation with the composed Boolean function. Among the generic tools available to assist in this process is a “corrector” object $\zeta$ due to Razborov and Sherstov [39], with the following four properties: (i) $\zeta$ is orthogonal to low-degree polynomials; (ii) $\zeta$ takes on the value $1$ at a prescribed point of the hypercube; (iii) $\zeta$ is bounded in absolute value on inputs of low Hamming weight; and (iv) $\zeta$ vanishes on all other points of the hypercube. Using the Razborov–Sherstov object, suitably shifted and scaled, one can surgically correct the behavior of a given dual object on a substantial fraction of inputs, thus modifying its metric properties without affecting its orthogonality to low-degree polynomials. This technique has played an important role in recent work, e.g., [15, 16, 11, 17].
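
As a toy instance of (1.2), the following computation (ours) composes the parity witness with itself and verifies numerically that orthogonality to polynomials of degree less than $2$ in each constituent yields orthogonality to polynomials of degree less than $4$ in the composition.

```python
from itertools import combinations, product

import numpy as np


def sgn(t):
    return 1 if t > 0 else (-1 if t < 0 else 0)


# Inner dual object phi on {0,1}^2: orthogonal to degree < 2, ||phi||_1 = 1.
phi = {x: (-1) ** (x[0] ^ x[1]) / 4 for x in product([0, 1], repeat=2)}

# Outer dual object Phi on {-1,+1}^2: orthogonal to degree < 2 in z.
Phi = {z: z[0] * z[1] / 4 for z in product([-1, 1], repeat=2)}


def composed(x1, x2):
    # (Phi * phi)(x1, x2) = 2^2 * Phi(sgn phi(x1), sgn phi(x2)) |phi(x1)||phi(x2)|
    return 4 * Phi[(sgn(phi[x1]), sgn(phi[x2]))] * abs(phi[x1]) * abs(phi[x2])


# The composition lives on {0,1}^4 and kills all monomials of degree < 4.
pts = list(product([0, 1], repeat=4))
psi = {x: composed(x[:2], x[2:]) for x in pts}
for k in range(4):
    for S in combinations(range(4), k):
        ip = sum(psi[x] * np.prod([x[i] for i in S]) for x in pts)
        assert abs(ip) < 1e-12
print("orthogonal to all polynomials of degree < 4")
```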

#### Hardness amplification for approximate degree

While block-composition has produced a treasure trove of results on
polynomial representations of Boolean functions, it is of limited
use when it comes to constructing functions with high *bounded-error*
approximate degree. To illustrate the issue, consider arbitrary functions
$f$ and $g$ on $n_f$ and $n_g$ variables,
with $1/3$-approximate degrees $d_f$ and $d_g$,
respectively. It
is well-known [48] that the composed function
$f\circ g$ on $n_f n_g$ variables has $1/3$-approximate degree $O(d_f d_g)$.
This means that relative to the new number of variables, the block-composed
function $f\circ g$ is asymptotically no harder to approximate to
bounded error than the constituent functions $f$ and $g$. In particular,
one cannot use block-composition to transform functions on $n$ bits
with $1/3$-approximate degree at most $n^{\alpha}$ into functions
on $N$ bits with $1/3$-approximate degree $\omega(N^{\alpha})$.

Until recently, the best lower bound on the bounded-error approximate
degree of $\mathsf{AC}^0$ was $\Omega(n^{2/3})$, due to Aaronson
and Shi [1]. Breaking this
barrier was a fundamental problem in its own right, in addition to
being a hard prerequisite for *threshold* degree lower bounds
for $\mathsf{AC}^0$ better than $n^{2/3}$. This barrier was
overcome in a brilliant paper of Bun and Thaler [16],
who proved, for any constant $\delta>0$, an $\Omega(n^{1-\delta})$
lower bound on the $1/3$-approximate degree of $\mathsf{AC}^0$. Their
hardness amplification for approximate degree works as follows. Let
$f\colon\{0,1\}^n\to\{0,1\}$ be given, with $1/3$-approximate degree
$d$. Bun and Thaler consider the
block-composition $F=f\circ\mathrm{OR}_m$, for
an appropriate parameter $m$. As shown in earlier work [47, 13]
on approximate degree, dual block-composition witnesses the lower
bound $\deg_{1/3}(F)=\Omega(d\sqrt m)$.
Next, Bun and Thaler make the crucial observation that the dual object
for $\mathrm{OR}_m$ has most of its mass on inputs of Hamming
weight $O(1)$, which in view of (1.2) implies
that the dual object for $F$ places most of its mass
on inputs of low Hamming weight. The authors of [16]
then use the Razborov–Sherstov corrector object to transfer
the small amount of mass that the dual object for $F$
places on inputs of high Hamming weight, to inputs of low Hamming
weight. The resulting dual object is supported entirely on
inputs of low Hamming weight and therefore witnesses a lower bound
on the $1/3$-approximate degree of the *restriction* of $F$
to inputs of low Hamming weight. By re-encoding the input to
this restriction, one finally obtains a function on much fewer than $nm$ variables
with $1/3$-approximate degree polynomially larger than that of $f$.
This passage from $f$ to the re-encoded composition is the desired hardness amplification
for approximate degree. We find it helpful to think of Bun and Thaler’s
technique as block-composition followed by input compression, to reduce
the number of input variables in the block-composed function. To obtain
an $\Omega(n^{1-\delta})$ lower bound on the approximate degree
of $\mathsf{AC}^0$, the authors of [16] start
with a trivial circuit and iteratively apply the hardness amplification
step a constant number of times, until approximate degree $\Omega(n^{1-\delta})$
is reached.

In follow-up work, Bun, Kothari, and Thaler [11]
refined the technique of [16] by deriving
optimal concentration bounds for the dual object for $\mathrm{OR}_m$. They
thereby obtained tight or nearly tight lower bounds on the $1/3$-approximate
degree of *surjectivity*, *element distinctness*, and other
important problems. The most recent contribution to this line of work
is due to Bun and Thaler [17], who prove an
$\Omega(n^{1-\delta})$ lower bound on the $(1-2^{-n^{1-\delta}})$-approximate
degree of $\mathsf{AC}^0$, for any constant $\delta>0$, by combining the method of [16]
with Sherstov’s work [46] on direct product
theorems for approximate degree. This near-linear lower bound substantially
strengthens the authors’ previous result [16]
on the *bounded-error* approximate degree of $\mathsf{AC}^0$,
but does not address the threshold degree.

### 1.6. Our approach

#### Threshold degree of AC^{0}

Bun and Thaler [17] refer to obtaining an $\omega(\sqrt n)$ threshold degree lower bound for $\mathsf{AC}^0$ as the “main glaring open question left by our work.” It is important to note here that lower bounds on approximate degree, even with the error parameter exponentially close to $1$ as in [17], have no implications for threshold degree. For example, there are functions [49] with $(1-2^{-n})$-approximate degree $n^{\Omega(1)}$ but threshold degree $1$. Our proof of Theorem 1.1 is unrelated to the most recent work of Bun and Thaler [17] on the large-error approximate degree of $\mathsf{AC}^0$, and instead builds on their earlier and simpler “block-composition followed by input compression” approach [16]. The centerpiece of our proof is a hardness amplification result for threshold degree, whereby any function with threshold degree $n^{\alpha}$, for a constant $\alpha\in(0,1)$, can be transformed efficiently and within $\mathsf{AC}^0$ into a function with polynomially larger threshold degree.

In more detail, let $f\colon\{0,1\}^n\to\{0,1\}$ be a function of interest, with threshold degree $d$. We consider the block-composition $f\circ\mathrm{MP}_m$, where $m$ is an appropriate parameter and $\mathrm{MP}_m$ is the Minsky–Papert function, with threshold degree $\Omega(m)$. We construct the dual object for $\mathrm{MP}_m$ from scratch, to ensure concentration on inputs of low Hamming weight. By applying dual block-composition to the threshold degree witnesses of $f$ and $\mathrm{MP}_m$, we obtain a dual object $\Psi$ witnessing the threshold degree of $f\circ\mathrm{MP}_m$. So far in the proof, our differences from [16] are as follows: (i) since our goal is amplification of threshold degree, we work with witnesses of threshold degree rather than approximate degree; (ii) to ensure rapid growth of threshold degree, we use block-composition with inner function $\mathrm{MP}_m$ of threshold degree $\Omega(m)$, in place of Bun and Thaler’s inner function $\mathrm{OR}_m$ of threshold degree $1$.

Since the dual object for $\mathrm{MP}_m$ by construction has most of its
$\ell_1$ norm on inputs of low Hamming weight, the
dual object $\Psi$ for the composed function $f\circ\mathrm{MP}_m$ has most of its
$\ell_1$ norm on inputs of low Hamming weight as well. Analogous to [16, 11, 17],
we would like to use the Razborov–Sherstov corrector object
to *remove* the mass that $\Psi$ has on the inputs
of high Hamming weight, transferring it to inputs of low Hamming weight.
This brings us to the novel and technically demanding part of our
proof. Previous works [16, 11, 17]
transferred the mass from the inputs of high Hamming weight
to the neighborhood of the all-zeroes input $0^n$. An
unavoidable feature of the Razborov–Sherstov transfer process
is that it amplifies the mass being transferred. When
the transferred mass finally reaches its destination, it overwhelms
$\Psi$’s original values at the local points, destroying $\Psi$’s
sign-agreement with the composed function $f\circ\mathrm{MP}_m$. It is
this difficulty that most prevented earlier works [16, 11, 17]
from obtaining a strong threshold degree lower bound for $\mathsf{AC}^0$.

We proceed differently. Instead of transferring the mass
of $\Psi$ from the inputs of high Hamming weight to the neighborhood
of $0^n$, we transfer it simultaneously to *exponentially
many* strategically chosen neighborhoods. Split this way across many
neighborhoods, the transferred mass does not overpower the original
values of $\Psi$ and in particular does not change any signs. Working
out the details of this transfer scheme requires subtle and lengthy
calculations; it was not clear to us until the end that such a scheme
exists. Once the transfer process is complete, we obtain a witness
for the threshold degree of $f\circ\mathrm{MP}_m$
restricted to inputs of low Hamming weight. Compressing the input
as in [16, 11], we obtain
an amplification theorem for threshold degree. With this work behind
us, the proof of Theorem 1.1 for any depth $d$
amounts to starting with a trivial circuit and amplifying its
threshold degree $O(d)$ times.

#### Sign-rank of AC^{0}

It is not known how to “lift” a threshold degree lower bound in
a black-box manner to a sign-rank lower bound. In particular, Theorem 1.1
has no implications a priori for the sign-rank of $\mathsf{AC}^0$.
Our proofs of Theorems 1.2 and 1.3
are completely disjoint from Theorem 1.1 and
are instead based on a stronger approximation-theoretic quantity that
we call *$\gamma$-smooth threshold degree.* Formally, the $\gamma$-smooth
threshold degree of a Boolean function $f\colon X\to\{-1,+1\}$ is the
largest $d$ for which there is a nonzero function $\psi\colon X\to\mathbb{R}$
with the following two properties:
$\psi(x)f(x)\geq\gamma\cdot\|\psi\|_1/|X|$ for all $x\in X$; and $\langle\psi,p\rangle=0$ for every polynomial $p$
of degree less than $d$. Taking $\gamma=0$ in this formalism,
one recovers the standard dual characterization of the threshold degree
of $f$. In particular, threshold degree is synonymous with $0$-smooth
threshold degree. The general case of $\gamma$-smooth threshold degree
for $\gamma>0$ requires threshold degree witnesses that are
*min-smooth*, in that the absolute value of $\psi$ at any given
point is at least a $\gamma$ fraction of the average absolute value
of $\psi$ over all points. A substantial advantage of *smooth*
threshold degree is that it has immediate sign-rank implications.
Specifically, any lower bound of $d$ on the $\gamma$-smooth threshold
degree, for constant $\gamma>0$, can be converted efficiently and in a black-box manner into
a sign-rank lower bound of $\exp(\Omega(d))$, using a combination of
the pattern matrix method [42, 44]
and Forster’s spectral lower bound on sign-rank [21, 22].
Accordingly, we obtain Theorems 1.2 and 1.3
by proving an $\Omega(n^{1-\epsilon})$ lower bound on the $\Theta(1)$-smooth
threshold degree of $\mathsf{AC}^0$, for any constant $\epsilon>0$.
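
To make the new notion concrete, the sketch below (ours, an illustration only) computes, for a toy function and a given degree $d$, the best achievable smoothness $\gamma$ of a degree-$d$ threshold degree witness, by maximizing the min-smoothness margin in a linear program.

```python
from itertools import combinations, product

import numpy as np
from scipy.optimize import linprog


def best_gamma(f, n, d):  # f maps {0,1}^n -> {-1,+1}
    # Maximize t subject to f(x)psi(x) >= t/|X|, <psi, f> = 1 (which
    # fixes ||psi||_1 = 1 given sign agreement), and orthogonality to
    # all monomials of degree < d. The optimum t is gamma.
    pts = list(product([0, 1], repeat=n))
    N = len(pts)
    mons = [S for k in range(d) for S in combinations(range(n), k)]
    A_eq = [[np.prod([x[i] for i in S]) for x in pts] + [0] for S in mons]
    A_eq.append([f(x) for x in pts] + [0])
    b_eq = [0.0] * len(mons) + [1.0]
    A_ub = np.hstack([np.diag([-f(x) for x in pts]), np.full((N, 1), 1.0 / N)])
    c = np.zeros(N + 1)
    c[-1] = -1
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(N), A_eq=np.array(A_eq),
                  b_eq=b_eq, bounds=[(None, None)] * N + [(0, None)])
    return res.x[-1] if res.success else None


par = lambda x: (-1) ** (x[0] ^ x[1])
orf = lambda x: -1 if (x[0] or x[1]) else 1
print(best_gamma(par, 2, 2))  # 1.0: the parity witness is perfectly smooth
print(best_gamma(orf, 2, 1))  # 0.666...: the optimum witness for OR_2
```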

At the core of our result is an amplification theorem for smooth threshold
degree, whose repeated application makes it possible to prove arbitrarily
strong lower bounds for $\mathsf{AC}^0$. Amplifying smooth threshold
degree is a complex juggling act due to the presence of two parameters,
degree and smoothness, that must evolve in coordinated fashion.
The approach of Theorem 1.1 is not useful
here because the threshold degree witnesses that arise from the proof
of Theorem 1.1 are highly nonsmooth. In more
detail, when amplifying the threshold degree of a function $f$ as
in the proof of Theorem 1.1, two phenomena
adversely affect the smoothness parameter. The first is block-composition
itself as a composition technique, which in the regime of interest
to us transforms *every* threshold degree witness for $f$ into
a hopelessly nonsmooth witness for the composed function. The other
culprit is the input compression step, which re-encodes the input
and thereby affects the smoothness in ways that are hard to control.
To overcome these difficulties, we develop a novel approach based
on what we call *local smoothness.*

Formally, let $\psi\colon\{0,1\}^n\to\mathbb{R}$ be a function of interest.
For a subset $X\subseteq\{0,1\}^n$ and a real number $R\geq1$, we
say that $\psi$ is *$R$-smooth on $X$* if $|\psi(x)|\leq R^{|x\oplus x'|}\,|\psi(x')|$
for all $x,x'\in X$. Put another way, for any two points of $X$
at distance $k$, the corresponding values of $\psi$
differ in magnitude by a factor of at most $R^{k}$. In and of itself,
a locally smooth function $\psi$ need not be min-smooth because for
a pair of points that are far from each other, the corresponding $\psi$-values
can differ by many orders of magnitude. However, locally smooth functions
exhibit extraordinary plasticity. Specifically, we show how to modify
a locally smooth function’s metric properties, such as its
support or the distribution of its $\ell_1$ mass, without
the change being detectable by low-degree polynomials. This apparatus
makes it possible to restore min-smoothness to the dual object
that results from the block-composition step and preserve that min-smoothness
throughout the input compression step, eliminating the two obstacles
to min-smoothness in the earlier proof of Theorem 1.1.
The new block-composition step uses a *locally smooth* witness
for the threshold degree of $\mathrm{MP}_m$, which needs to be built from
scratch and is quite different from the witness in the proof of Theorem 1.1.

The approach described above departs considerably from previous work on
the sign-rank of constant-depth circuits [39, 15, 17].
The analytic notion in those earlier papers is weaker than $\gamma$-smooth
threshold degree and in particular allows the dual object to be *arbitrary*
on a small fraction of the inputs. This weaker property is acceptable
when the main result is proved in one shot, with a closed-form construction
of the dual object. By contrast, we must construct dual objects iteratively,
with each iteration increasing the degree parameter and proportionately
decreasing the smoothness parameter. This iterative process requires
that the dual object in each iteration be min-smooth on the entire
domain. Perhaps unexpectedly, we find $\gamma$-smooth threshold degree
easier to work with than the weaker notion in previous work [39, 15, 17].
In particular, we are able to give a new and short proof of the $\exp(\Omega(n^{1/3}))$
lower bound on the sign-rank of $\mathsf{AC}^0$, originally obtained
by Razborov and Sherstov [39] with a much more complicated
approach. The new proof can be found in Section 5.1,
where it serves as a prelude to our main result on the sign-rank of
$\mathsf{AC}^0$.

## 2. Preliminaries

### 2.1. General

For a string $x\in\{0,1\}^n$ and a set $S\subseteq\{1,2,\ldots,n\}$,
we let $x|_S$ denote the restriction of $x$ to the indices in $S$.
In other words, $x|_S=(x_{i_1},x_{i_2},\ldots,x_{i_{|S|}})$,
where $i_1<i_2<\cdots<i_{|S|}$ are the elements of $S$. The
*characteristic function* of a set $A$
is given by

$$\mathbf{1}_A(x)=\begin{cases}1&\text{if }x\in A,\\0&\text{otherwise.}\end{cases}$$

For a logical condition $C$, we use the Iverson bracket $[C]$, equal to $1$ if $C$ holds and $0$ otherwise.

We let $\mathbb{N}=\{0,1,2,\ldots\}$ denote the set of natural numbers. The following well-known bound [24, Proposition 1.4] is used in our proofs without further mention:

$$\sum_{i=0}^{k}\binom{n}{i}\leq\left(\frac{en}{k}\right)^{k},\qquad k=1,2,\ldots,n,\tag{2.1}$$

where $e$ denotes Euler’s number.
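
A quick numeric sanity check of (2.1):

```python
# Verify the binomial-sum bound (2.1) for a small n.
from math import comb, e

n = 30
for k in range(1, n + 1):
    assert sum(comb(n, i) for i in range(k + 1)) <= (e * n / k) ** k
print("bound (2.1) verified for n = 30")
```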

We adopt the extended real number system $\mathbb{R}\cup\{-\infty,+\infty\}$ in all calculations, with the additional convention that $0/0=0$. We use the comparison operators in a unary capacity to denote one-sided intervals of the real line. Thus, $\geq a$, $>a$, $\leq a$, and $<a$ stand for $[a,\infty)$, $(a,\infty)$, $(-\infty,a]$, and $(-\infty,a)$, respectively. We let $\ln x$ and $\log x$ stand for the natural logarithm of $x$ and the logarithm of $x$ to base $2$, respectively. We use the following two versions of the sign function:

$$\operatorname{sgn}t=\begin{cases}-1&\text{if }t<0,\\0&\text{if }t=0,\\1&\text{if }t>0,\end{cases}\qquad\qquad\widetilde{\operatorname{sgn}}\,t=\begin{cases}-1&\text{if }t<0,\\1&\text{if }t\geq0.\end{cases}$$

The term *Euclidean space* refers to $\mathbb{R}^n$ for some positive
integer $n$. We let $e_i$ denote the vector whose $i$th component is $1$ and the others are $0$. Thus, the vectors $e_1,e_2,\ldots,e_n$ form the standard basis for $\mathbb{R}^n$. For vectors $u$ and $v$, we write $u\leq v$ to mean that $u_i\leq v_i$ for each $i$. The relations $\geq,<,>$ on vectors are defined analogously.

We frequently omit the argument in equations and inequalities involving functions, as in $\psi f\geq0$. Such statements are to be interpreted pointwise. For example, the statement “$\psi\geq0$ on $X$” means that $\psi(x)\geq0$ for every $x\in X$. The positive and negative parts of a function $\psi$ are denoted $\psi^{+}=\max\{\psi,0\}$ and $\psi^{-}=\max\{-\psi,0\}$, respectively, so that $\psi=\psi^{+}-\psi^{-}$.

### 2.2. Boolean functions and circuits

We view Boolean functions as mappings $f\colon X\to\{0,1\}$ for some finite
set $X$. More generally, we consider *partial* Boolean functions
$f\colon X\to\{0,1,*\}$, with the output value $*$ used for don’t-care
inputs. The negation of a Boolean function $f$ is denoted as usual
by $\overline{f}=1-f$. The familiar functions $\mathrm{AND}_n\colon\{0,1\}^n\to\{0,1\}$
and $\mathrm{OR}_n\colon\{0,1\}^n\to\{0,1\}$ are given by $\mathrm{AND}_n(x)=\bigwedge_{i=1}^{n}x_i$
and $\mathrm{OR}_n(x)=\bigvee_{i=1}^{n}x_i$. We abbreviate $\mathrm{AND}=\mathrm{AND}_n$ and $\mathrm{OR}=\mathrm{OR}_n$ whenever the arity is clear from the context.
The generalized *Minsky–Papert function* $\mathrm{MP}_{m,r}\colon(\{0,1\}^{r})^{m}\to\{0,1\}$
is given by

$$\mathrm{MP}_{m,r}(x)=\bigwedge_{i=1}^{m}\bigvee_{j=1}^{r}x_{i,j}.$$

We abbreviate $\mathrm{MP}_m=\mathrm{MP}_{m,4m^{2}}$, which is the right setting
of parameters for most of our applications.

We adopt the standard notation for function composition, with $f\circ g$
defined by $(f\circ g)(x)=f(g(x))$. In addition, we use the $\circ$
operator to denote the *componentwise* composition of Boolean
functions. Formally, the componentwise composition of $f\colon\{0,1\}^n\to\{0,1\}$
and $g\colon X\to\{0,1\}$ is the function $f\circ g\colon X^n\to\{0,1\}$
given by $(f\circ g)(x_1,\ldots,x_n)=f(g(x_1),\ldots,g(x_n))$.
To illustrate, $\mathrm{MP}_{m,r}=\mathrm{AND}_m\circ\mathrm{OR}_r$. Componentwise composition
is consistent with standard composition, which in the context of Boolean
functions is only defined for $n=1$. Thus, the meaning of $f\circ g$
is determined by the range of $g$ and is never in doubt. Componentwise
composition generalizes in the natural manner to partial Boolean functions
$f\colon\{0,1\}^n\to\{0,1,*\}$ and $g\colon X\to\{0,1,*\}$, as follows:

$$(f\circ g)(x_1,\ldots,x_n)=\begin{cases}f(g(x_1),\ldots,g(x_n))&\text{if }g(x_1),\ldots,g(x_n)\in\{0,1\},\\ *&\text{otherwise.}\end{cases}$$

Compositions of three or more functions, where each instance of the operator can be standard or componentwise, are well-defined by associativity and do not require parenthesization.
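
In code, componentwise composition is a one-liner; the following sketch (ours) checks the identity $\mathrm{MP}_{2,2}=\mathrm{AND}_2\circ\mathrm{OR}_2$ noted above.

```python
from itertools import product


def compose(f, g):
    # Componentwise composition: feed g's outputs, block by block, to f.
    return lambda *blocks: f(tuple(g(b) for b in blocks))


AND = lambda x: int(all(x))
OR = lambda x: int(any(x))
mp22 = compose(AND, OR)

for b1, b2 in product(product([0, 1], repeat=2), repeat=2):
    assert mp22(b1, b2) == (OR(b1) and OR(b2))
print("MP_{2,2} = AND_2 o OR_2 verified")
```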

For Boolean strings $x,y\in\{0,1\}^n$, we let $x\oplus y$ denote their
bitwise XOR. The strings $x\wedge y$ and $x\vee y$ are defined analogously,
with the binary connective applied bitwise. A *Boolean circuit*
$C$ in variables $x_1,x_2,\ldots,x_n$ is a circuit with inputs $x_1,\overline{x_1},\ldots,x_n,\overline{x_n}$
and gates $\wedge$ and $\vee$. The circuit is *monotone* if it does
not use any of the negated inputs $\overline{x_1},\overline{x_2},\ldots,\overline{x_n}$.
The *fan-in* of $C$ is the maximum in-degree of any $\wedge$
or $\vee$ gate. Unless stated otherwise, we place no restrictions
on the gate fan-in. The *size* of $C$ is the number of $\wedge$
and $\vee$ gates. The *depth* of $C$ is the maximum number
of $\wedge$ and $\vee$ gates on any path from an input to the circuit
output. With this convention, the circuit that computes $\mathrm{MP}_{m,r}$
has depth $2$. The circuit class $\mathsf{AC}^0$ consists of function
families $\{f_n\}_{n=1}^{\infty}$ such that $f_n$
is computed by a Boolean circuit of size at most $n^{c}$ and depth
at most $c$, for some constant $c\geq1$ and all $n$. We specify
small-depth layered circuits by indicating the type of gate used in
each layer. For example, an *AND-OR-AND circuit* is a depth-$3$
circuit with the top and bottom layers composed of $\wedge$ gates,
and the middle layer composed of $\vee$ gates. A *Boolean formula*
is a Boolean circuit in which every gate has fan-out $1$. Common
examples of Boolean formulas are DNF and CNF formulas.

### 2.3. Norms and products

For a set $X$, we let $\mathbb{R}^X$ denote the linear space of real-valued
functions on $X$. The *support* of a function $\psi\in\mathbb{R}^X$
is denoted $\operatorname{supp}\psi=\{x\in X:\psi(x)\neq0\}$. For real-valued functions
with finite support, we adopt the usual norms and inner product:

$$\|\psi\|_\infty=\max_{x}|\psi(x)|,\qquad\|\psi\|_1=\sum_{x}|\psi(x)|,\qquad\langle\psi,\phi\rangle=\sum_{x}\psi(x)\phi(x).$$

This covers as a special case functions on finite sets. The *tensor
product* of $\psi\in\mathbb{R}^X$ and $\phi\in\mathbb{R}^Y$ is denoted $\psi\otimes\phi\in\mathbb{R}^{X\times Y}$
and given by

$$(\psi\otimes\phi)(x,y)=\psi(x)\phi(y).$$

The tensor product $\psi\otimes\psi\otimes\cdots\otimes\psi$ ($n$ times) is abbreviated $\psi^{\otimes n}$. For a subset $S\subseteq X$ and a function $\psi\in\mathbb{R}^X$, we define $\psi_S\in\mathbb{R}^X$ by $\psi_S=\psi\cdot\mathbf{1}_S$; as extremal cases, we have $\psi_X=\psi$ and $\psi_{\varnothing}=0$. Tensor product notation generalizes naturally to *sets* of functions: $\Psi\otimes\Phi=\{\psi\otimes\phi:\psi\in\Psi,\,\phi\in\Phi\}$ and $\Psi^{\otimes n}=\{\psi_1\otimes\psi_2\otimes\cdots\otimes\psi_n:\psi_1,\ldots,\psi_n\in\Psi\}$. A
*conical combination* of $\psi_1,\ldots,\psi_k\in\mathbb{R}^X$ is any function of the form $\lambda_1\psi_1+\cdots+\lambda_k\psi_k$, where $\lambda_1,\ldots,\lambda_k$ are nonnegative reals. A
*convex combination* of $\psi_1,\ldots,\psi_k$ is any function $\lambda_1\psi_1+\cdots+\lambda_k\psi_k$, where $\lambda_1,\ldots,\lambda_k$ are nonnegative reals that sum to $1$. The
*conical hull* of $\Psi\subseteq\mathbb{R}^X$, denoted $\operatorname{cone}\Psi$, is the set of all conical combinations of functions in $\Psi$. The
*convex hull*, denoted $\operatorname{conv}\Psi$, is defined analogously as the set of all convex combinations of functions in $\Psi$. For any set $\Psi$ of functions, we have

$$\operatorname{conv}\Psi\subseteq\operatorname{cone}\Psi.\tag{2.2}$$

Throughout this manuscript, we view probability distributions as real functions. This convention makes available the shorthands introduced above. In particular, for probability distributions $\mu$ and $\lambda$, the symbol $\operatorname{supp}\mu$ denotes the support of $\mu$, and $\mu\otimes\lambda$ denotes the probability distribution given by $(\mu\otimes\lambda)(x,y)=\mu(x)\lambda(y)$. If $\mu$ is a probability distribution on $X$, we consider $\mu$ to be defined also on any superset of $X$, with the understanding that $\mu=0$ outside $X$. We let $\mathfrak{D}(X)$ denote the family of all finitely supported probability distributions on $X$. Most of this paper is concerned with the distribution family $\mathfrak{D}(\{0,1\}^n)$ and its subfamilies, each of which we will denote with a Fraktur letter.

Analogous to functions, we adopt the familiar norms for vectors
in Euclidean space: $\|v\|_\infty=\max_i|v_i|$ and $\|v\|_1=\sum_i|v_i|$.
The latter norm is particularly
prominent in this paper, and to avoid notational clutter we use $|\cdot|$
interchangeably with $\|\cdot\|_1$. We refer to $|x|=x_1+x_2+\cdots+x_n$ as
the *weight* of $x\in\{0,1\}^n$. For any sets $X\subseteq\{0,1\}^n$ and $W\subseteq\{0,1,\ldots,n\}$,
we define $X_W=\{x\in X:|x|\in W\}$.
