Algorithmic Polynomials

Alexander A. Sherstov · January 14, 2018

The approximate degree of a Boolean function f(x_1, x_2, …, x_n) is the minimum degree of a real polynomial that approximates f pointwise within 1/3. Upper bounds on approximate degree have a variety of applications in learning theory, differential privacy, and algorithm design in general. Nearly all known upper bounds on approximate degree arise in an existential manner from bounds on quantum query complexity. We develop a first-principles, classical approach to the polynomial approximation of Boolean functions. We use it to give the first constructive upper bounds on the approximate degree of several fundamental problems:

- O(n^{3/4 − 1/(4(2^k − 1))}) for the k-element distinctness problem;

- O(n^{1 − 1/(k+1)}) for the k-subset sum problem;

- O(n^{1 − 1/(k+1)}) for any k-DNF or k-CNF formula;

- O(n^{3/4}) for the surjectivity problem.

In all cases, we obtain explicit, closed-form approximating polynomials that are unrelated to the quantum arguments from previous work. Our first three results match the bounds from quantum query complexity. Our fourth result improves polynomially on the Θ(n) quantum query complexity of the problem and refutes the conjecture by several experts that surjectivity has approximate degree Ω(n). In particular, we exhibit the first natural problem with a polynomial gap between approximate degree and quantum query complexity.


1. Introduction

Let f : X → {0, 1} be a given Boolean function, defined on a subset X ⊆ {0, 1}^n. The ε-approximate degree of f, denoted deg_ε(f), is the minimum degree of a multivariate real polynomial p such that |f(x) − p(x)| ≤ ε for all x ∈ X. The standard setting of the error parameter for most applications is ε = 1/3, an aesthetically motivated constant that can be replaced by any other constant in (0, 1/2) at the expense of a constant-factor increase in approximate degree. The notion of approximate degree originated 25 years ago in the pioneering work of Nisan and Szegedy [43] and has since proved to be a powerful and versatile tool in theoretical computer science. Lower bounds on approximate degree have complexity-theoretic applications, whereas upper bounds are a tool in algorithm design. In the former category, the notion of approximate degree has enabled spectacular progress in circuit complexity [46, 57, 12, 8, 35, 36, 52, 10], quantum query complexity [9, 15, 3, 1, 4, 32, 20], and communication complexity [16, 47, 19, 52, 53, 48, 38, 23, 50, 10, 56, 55]. On the algorithmic side, approximate degree underlies many of the strongest results obtained to date in computational learning [58, 34, 33, 31, 44, 7], differentially private data release [59, 22], and algorithm design in general [39, 30, 51].

Despite these applications, progress in understanding approximate degree as a complexity measure has been slow and difficult. With very few exceptions [43, 30, 51, 54], all known upper bounds on approximate degree arise from quantum query algorithms. The connection between approximate degree and quantum query complexity was discovered by Beals et al. [9], who proved that the acceptance probability of an algorithm that makes T queries is representable by a real polynomial of degree at most 2T. Put another way, every quantum algorithm implies an approximating polynomial of comparable complexity for the problem in question. Since the seminal work of Beals et al., essentially all upper bounds on approximate degree have come from quantum query algorithms, e.g., [15, 60, 6, 28, 7, 27, 26, 13, 40]. An illustrative example is the problem of determining the approximate degree of Boolean formulas of size n, posed in 2003 by O’Donnell and Servedio [44]. Progress on this question was stalled for a long time until it was finally resolved by Ambainis et al. [7], who built on the work of Farhi et al. [28] to give a near-optimal quantum query algorithm for any Boolean formula.

While quantum query complexity has been a fruitful source of approximate degree upper bounds, the exclusive reliance on quantum techniques for the polynomial approximation of Boolean functions is problematic. For one thing, a quantum query algorithm generally does not give any information about the approximating polynomial apart from its existence. For example, converting the quantum algorithms of [6, 7, 13] to polynomials results in expressions so large and complicated that they are no longer meaningful. More importantly, quantum query algorithms are more constrained objects than real polynomials, and an optimal query algorithm for a given problem may be far less efficient than a polynomial constructed from scratch. Given the many unresolved questions on approximate degree, there is a compelling need for polynomial approximation techniques that go beyond quantum query complexity.

In this paper, we take a fresh look at several breakthrough upper bounds for approximate degree, obtained over the years by sophisticated quantum query algorithms. In each case, we are able to construct an approximating polynomial from first principles that matches or improves on the complexity of the best quantum algorithm. All of our constructions produce explicit, closed-form polynomials that are unrelated to the corresponding quantum algorithms and are in the author’s opinion substantially simpler. In one notable instance, our construction achieves a polynomial improvement on the complexity of the best possible quantum algorithm, refuting a conjecture [21] on the approximate degree of that problem and exhibiting the first natural example of a polynomial gap between approximate degree and quantum query complexity. Our proofs, discussed shortly, contribute novel techniques to the area.

1.1. k-Element distinctness

The starting point in our work is the element distinctness problem [17, 3, 6, 4, 37, 13], which is one of the most studied questions in quantum query complexity and a major success story of the field. The input to the problem is a list of n elements from a given range of size r, and the objective is to determine whether the elements are pairwise distinct. A well-studied generalization of this problem is k-element distinctness, where k ≥ 2 is an arbitrary constant and the objective is to determine whether some k of the elements are equal. Formally, the input to element distinctness and k-element distinctness is represented by an n × r Boolean matrix in which every row has precisely one “1” entry, corresponding to the value of the corresponding list element. (Alternately, the input can be represented by a string of n⌈log r⌉ bits. Switching to this more compact representation changes the complexity of the problem by at most a logarithmic factor, which is negligible in all settings of interest.) Aaronson and Shi [3], Ambainis [4], and Kutin [37] showed that element distinctness has quantum query complexity Ω(n^{2/3}). In follow-up work, Ambainis [6] gave a quantum algorithm for element distinctness with O(n^{2/3}) queries, matching the lower bound in [3, 4, 37]. For the more general problem of k-element distinctness, Ambainis’s algorithm [6] requires O(n^{k/(k+1)}) queries. Using a different approach, Belovs [13] gave a polynomially faster algorithm for k-element distinctness, with query complexity O(n^{3/4 − 1/(4(2^k − 1))}). Belovs’s algorithm is currently the fastest known.
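
To make the input representation concrete, here is a minimal Python sketch (the names are ours and purely illustrative, not the paper's notation) that encodes a list of n elements from a range of size r as an n × r Boolean matrix and tests the k-collision predicate by brute force:

```python
import numpy as np

def to_matrix(elements, r):
    """Encode a list of elements from {0, ..., r-1} as an n x r Boolean
    matrix with exactly one "1" per row; row i marks the value of the
    i-th element."""
    n = len(elements)
    x = np.zeros((n, r), dtype=int)
    for i, v in enumerate(elements):
        x[i, v] = 1
    return x

def k_collision(x, k):
    """True iff some k rows share their "1" column, i.e., some value
    occurs at least k times among the input elements."""
    return bool((x.sum(axis=0) >= k).any())

# Example: ordinary element distinctness corresponds to k = 2.
x = to_matrix([3, 1, 4, 1, 5], r=8)
print(k_collision(x, k=2))   # True: the value 1 occurs twice
```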

The algorithms of Ambainis [6] and Belovs [13] are highly nontrivial. The former is based on a quantum walk on the Johnson graph, whereas the latter uses the framework of learning graphs. We give an elementary, closed-form construction of an approximating polynomial for k-element distinctness that bypasses the quantum work. Formally, we study k-element distinctness as a Boolean function on n × r Boolean matrices of Hamming weight at most n.

Allowing arbitrary input matrices of Hamming weight at most n, with no restriction on the placement of the “1” bits, makes this a problem more general than k-element distinctness proper, where every row contains exactly one “1” entry. We prove:

Theorem 1.1 (k-element distinctness).

Let k ≥ 2 be a fixed integer. Then for all n and r, the k-element distinctness function on n × r matrices of Hamming weight at most n has 1/3-approximate degree O(n^{3/4 − 1/(4(2^k − 1))}).

Moreover, the approximating polynomial is given explicitly in each case.

Theorem 1.1 matches the quantum query bound of O(n^{3/4 − 1/(4(2^k − 1))}) due to Belovs [13] and further generalizes it to arbitrary input matrices of Hamming weight at most n.

1.2. k-Subset sum, k-DNF and k-CNF formulas

Another well-studied problem in quantum query complexity is k-subset sum [25, 14]. The input to this problem is a list of n elements from a given finite Abelian group G, and the objective is to determine whether some k of the elements sum to 0. Formally, the input is represented by an n × |G| Boolean matrix with precisely one “1” entry in every row. Childs and Eisenberg [25] contributed an alternate analysis of Ambainis’s algorithm for k-element distinctness [6] and showed how to adapt it to compute k-subset sum or any other function property with 1-certificate complexity at most k. In particular, any such problem has an approximating polynomial of degree O(n^{1 − 1/(k+1)}). We give a first-principles construction of an approximating polynomial for any problem in this class, using techniques that are elementary and unrelated to the quantum work of Ambainis [6] and Childs and Eisenberg [25]. Our result is more general:

Theorem 1.2 (k-DNF and k-CNF formulas).

Let k ≥ 1 be a fixed integer. Let f be a Boolean function on strings of Hamming weight at most n that is representable on its domain by a k-DNF or k-CNF formula. Then deg_{1/3}(f) = O(n^{1 − 1/(k+1)}).

Moreover, the approximating polynomial is given explicitly in each case.

Recall that a k-DNF formula in Boolean variables x_1, x_2, …, x_N is the disjunction of an arbitrary number of terms, where each term is the conjunction of at most k literals from among x_1, ¬x_1, x_2, ¬x_2, …, x_N, ¬x_N. An essential aspect of Theorem 1.2 is that the approximate degree upper bound depends only on the Hamming weight of the input and does not depend at all on the number of variables N, which can be arbitrarily large. Several special cases of Theorem 1.2 are worth noting. The theorem clearly applies to k-subset sum, which is by definition representable on its domain by a k-DNF formula. Moreover, in the terminology of Childs and Eisenberg [25], Theorem 1.2 applies to any function property with 1-certificate complexity at most k. Finally, taking the Hamming weight bound equal to the number of variables shows that Theorem 1.2 applies to any function representable by a k-DNF or k-CNF formula.
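
As an illustration of the k-DNF representation (our own toy example with hypothetical names, not taken from the paper), the 2-subset sum problem over Z_m becomes a 2-DNF over the matrix encoding: one term for each pair of positions and each pair of values summing to 0 modulo m.

```python
import itertools
import numpy as np

def two_subset_sum_dnf_terms(n, m):
    """Terms of a 2-DNF for 2-subset sum over Z_m in the n x m matrix
    encoding: one term per pair of positions i < j and values a, b with
    a + b = 0 (mod m); the term is the conjunction x[i, a] AND x[j, b]."""
    terms = []
    for i, j in itertools.combinations(range(n), 2):
        for a in range(m):
            terms.append((i, a, j, (-a) % m))
    return terms

def eval_dnf(x, terms):
    return any(x[i, a] and x[j, b] for (i, a, j, b) in terms)

def brute_force(elements, m):
    return any((a + b) % m == 0 for a, b in itertools.combinations(elements, 2))

# Agreement check on random instances with n = 5 positions over Z_7.
rng = np.random.default_rng(0)
n, m = 5, 7
terms = two_subset_sum_dnf_terms(n, m)
for _ in range(200):
    elements = rng.integers(m, size=n)
    x = np.zeros((n, m), dtype=int)
    x[np.arange(n), elements] = 1
    assert eval_dnf(x, terms) == brute_force(elements, m)
print("2-subset sum agrees with its 2-DNF representation")
```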

1.3. Surjectivity

While our proofs of Theorems 1.1 and 1.2 are significantly simpler than their quantum query counterparts, they do not give a quantitative improvement on previous work. This brings us to our next result. In the surjectivity problem [11], the input is a list of n elements from a given range of size r, where r ≤ n. The objective is to determine whether the input features all r elements of the range. In function terminology, the input represents a mapping {1, 2, …, n} → {1, 2, …, r}, and the objective is to determine whether the mapping is surjective. As usual in the quantum query literature, the input is represented by an n × r Boolean matrix in which every row has precisely one “1” entry. Beame and Machmouchi [11] proved that for r = Θ(n), the surjectivity problem has the maximum possible quantum query complexity, namely, Θ(n). This led several experts to conjecture that the approximate degree of surjectivity is also Θ(n); see, e.g., [21]. The conjecture was significant because its resolution would give the first constant-depth circuit with approximate degree Ω(n), closing a long line of research [43, 3, 4, 21].

Surprisingly, we are able to show that surjectivity has an approximating polynomial of substantially lower degree, regardless of the range parameter r. Formally, we study the surjectivity function on n × r Boolean matrices of Hamming weight at most n, defined to be true precisely when every column of the input matrix contains a “1” entry.

In keeping with our other results, our definition allows arbitrary input matrices of Hamming weight at most n. In this generalization of the surjectivity problem, the input can be thought of as an arbitrary relation rather than a function. We prove:

Theorem 1.3 (Surjectivity).

For all positive integers n and r, the surjectivity function on n × r matrices of Hamming weight at most n has 1/3-approximate degree O(n^{3/4}).

Moreover, the approximating polynomial is given explicitly in each case.

In particular, the theorem gives an approximating polynomial of degree O(n^{3/4}) regardless of the range size r. This upper bound is polynomially smaller than the problem’s quantum query complexity, which is Θ(n) for r = Θ(n). While explicit functions with a polynomial gap between approximate degree and quantum query complexity have long been known [5, 2], Theorem 1.3 exhibits the first natural function with this property. The functions in previous work [5, 2] were constructed with the specific purpose of separating complexity measures.

1.4. Symmetric functions

Key building blocks in our proofs are symmetric functions f : {0,1}^n → {0,1}. A classic result due to Paturi [45] states that the 1/3-approximate degree of any such function is Θ(√(nℓ)), where ℓ is the smallest number such that f is constant on inputs of Hamming weight in [ℓ, n − ℓ]. When a symmetric function is used in an auxiliary role as part of a larger construction, it becomes important to have approximating polynomials for every possible setting of the error parameter ε. A complete characterization of the ε-approximate degree of symmetric functions for all ε was obtained by de Wolf [60], who sharpened previous bounds [30, 15, 51] using an elegant quantum query algorithm. Prior to our work, no classical, first-principles proof was known for de Wolf’s characterization, which is telling in view of the basic role that AND, OR, and other symmetric functions play in the area. We are able to give such a first-principles proof; in fact, we give three of them.

Theorem 1.4 (Symmetric functions).

Let f : {0,1}^n → {0,1} be a symmetric function. Let ℓ be an integer such that f is constant on inputs of Hamming weight in [ℓ, n − ℓ]. Then for 2^{−n} ≤ ε ≤ 1/3,

deg_ε(f) = O(√(nℓ) + √(n log(1/ε))).

Moreover, the approximating polynomial is given explicitly in each case.

Theorem 1.4 matches de Wolf’s quantum query result, tightly characterizing the ε-approximate degree of every nonconstant symmetric function.

1.5. Our techniques

Our proofs use only basic tools from approximation theory, such as Chebyshev polynomials. Our constructions additionally incorporate elements of classic algorithm design, e.g., the divide-and-conquer paradigm, the inclusion-exclusion principle, and probabilistic reasoning. The title of our paper, “Algorithmic Polynomials,” is a reference to this combination of classic algorithmic methodology and approximation theory. The informal message of our work is that algorithmic polynomials are not only more powerful than quantum algorithms but also easier to construct. A detailed discussion of Theorems 1.1–1.4 follows.

Extension theorem.

As our starting point, we prove an extension theorem for polynomial approximation. This theorem allows one to construct an approximant for a given function using an approximant for a restriction of that function. In more detail, let f be an arbitrary function defined on inputs of Hamming weight at most m, and let F be the natural extension of f to inputs of Hamming weight at most n, defined to be zero outside the domain of f. From an approximation-theoretic point of view, a fundamental question to ask is how to efficiently “extend” any approximant for f to an approximant for F. Unfortunately, this naïve formulation of the extension problem has no efficient solution; we describe a counterexample in Section 3. We are able to show, however, that the extension problem becomes meaningful if one works with an extension of f to inputs of slightly larger Hamming weight in place of f itself. In other words, we give an efficient, explicit, black-box transformation of any approximant for that intermediate extension into an approximant for the extension F, for any n. This result is essentially as satisfying as the “ideal” extension theorem in that the domain of the intermediate extension almost coincides with the domain of f and can be arbitrarily smaller than the domain of F. Our proof makes use of extrapolation bounds, extremal properties of Chebyshev polynomials, and ideas from rational approximation theory.

Symmetric functions.

As mentioned earlier, we give three proofs of Theorem 1.4 on the ε-approximate degree of symmetric functions. Each of the three proofs is fully constructive. Our simplest proof uses the extension theorem and is only half a page long. Here, we use brute-force interpolation to compute the symmetric function of interest on inputs of small Hamming weight, and then apply the extension theorem to effortlessly extend the interpolant to the full domain. Our second proof of Theorem 1.4 is an explicit, closed-form construction that uses Chebyshev polynomials as its only ingredient. This proof is a refinement of previous, suboptimal approximants for the AND function [30, 51]. We eliminate the inefficiency in previous work by using Chebyshev polynomials to achieve improved control at every point of the domain. Finally, our third proof of Theorem 1.4 is inspired by combinatorics rather than approximation theory. Here, we use a sampling experiment to construct an approximating polynomial for any symmetric function from an approximating polynomial for AND. In more detail, the experiment allows us to interpret the symmetric function as a linear combination of conjunctions of arbitrary degree, where the sum of the absolute values of the coefficients is reasonably small. Once such a representation is available, we simply replace every conjunction with its approximating polynomial. These substitutions increase the error of the approximation by a factor bounded by the sum of the absolute values of the coefficients in the original linear combination, and this increase is negligible.
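
For concreteness, the sketch below implements the classical Chebyshev-based approximant for OR_n that the second proof refines (this is the textbook construction, not the paper's sharper one; all names are ours). Hamming weights 1 through n are mapped into [−1, 1], where the Chebyshev polynomial stays bounded, while weight 0 is mapped just outside, where the polynomial grows rapidly; rescaling then yields a degree-O(√n) approximant with error below 1/3.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def or_approximant(n, d):
    """Univariate q of degree d with q(0) = 0 and q(t) close to 1 for
    t = 1, ..., n, so that q(|x|) approximates OR_n pointwise."""
    T = C.Chebyshev.basis(d)          # Chebyshev polynomial T_d
    def phi(t):                       # affine map sending [1, n] onto [-1, 1]
        return 2.0 * (n - t) / (n - 1.0) - 1.0
    scale = T(phi(0.0))               # value of T_d just outside [-1, 1]
    return lambda t: 1.0 - T(phi(t)) / scale

n = 400
d = int(np.ceil(np.sqrt(n)))          # degree O(sqrt(n))
q = or_approximant(n, d)
errors = [abs(q(0.0))] + [abs(q(float(t)) - 1.0) for t in range(1, n + 1)]
print(d, max(errors))                 # degree 20, maximum error roughly 0.27
```

Running this with n = 400 gives degree 20 and maximum pointwise error roughly 0.27, comfortably below 1/3.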

k-Element distinctness, k-DNF and k-CNF formulas.

We first establish an auxiliary result on the approximate degree of composed Boolean functions. Specifically, we consider compositions of the form F = f(g_1, g_2, …, g_r) and bound the ε-approximate degree of F, on inputs of bounded Hamming weight, in terms of the approximate degrees of the constituent functions. Crucially for our applications, the bound that we derive has no dependence on the number of variables. The proof uses Chebyshev polynomials and the inclusion-exclusion principle. Armed with this composition theorem, we give a short proof of Theorem 1.2 on the approximate degree of k-DNF and k-CNF formulas. The argument proceeds by induction on k, with the composition theorem invoked to implement the inductive step. The proof of Theorem 1.1 on the approximate degree of k-element distinctness is more subtle. It too proceeds by induction, with the composition theorem playing a central role. This time, however, the induction is with respect to both k and the range parameter r, and the extension theorem is required to complete the inductive step. We note that we are able to bound the ε-approximate degree of k-DNF formulas and k-element distinctness for every setting of the error parameter ε, rather than just ε = 1/3 as in Theorems 1.1 and 1.2.

Surjectivity.

Our proof of Theorem 1.3 is surprisingly short, given how improbable the statement was believed to be. As one can see from the defining equation for the surjectivity function, it is the componentwise composition of AND_r with n-bit disjunctions, restricted to inputs of Hamming weight at most n. With this in mind, we start with a polynomial of degree O(√r) that approximates AND_r pointwise within a small constant. The approximant in question is simply a scaled and shifted Chebyshev polynomial. It follows that the componentwise composition of this approximant with the column disjunctions, restricted to inputs of Hamming weight at most n, approximates surjectivity pointwise within the same small constant. We are not finished, however, because the degree of the composition is unacceptably large. Moving on, a few lines of algebra reveal that the composition is a linear combination of conjunctions in which the absolute values of the coefficients sum to 2^{O(√r)}. It remains to approximate each of these conjunctions pointwise within 2^{−Θ(√r)} by a polynomial of degree O(n^{3/4}), for which we use our explicit approximant from Theorem 1.4 along with the guarantee that the input has Hamming weight at most n. The proof of Theorem 1.3 is particularly emblematic of our work in its interplay of approximation-theoretic methodology (Chebyshev polynomials, linear combinations) and algorithmic thinking (reduction of the problem to the approximation of individual conjunctions).
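
As a sanity check on this structural view (our own illustrative code, not the paper's construction), the surjectivity predicate on the matrix encoding is exactly an AND over range elements of an OR over positions:

```python
import numpy as np

def surjectivity(x):
    """x is an n x r Boolean matrix; the predicate holds iff every column
    (range element) contains at least one "1" (is hit by some position)."""
    return bool(np.all(np.any(x, axis=0)))   # AND over columns of OR over rows

# Example with n = 4 positions and range size r = 3.
x = np.array([[1, 0, 0],
              [0, 1, 0],
              [0, 1, 0],
              [0, 0, 1]])
print(surjectivity(x))   # True: every range element is hit
```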

We are pleased to report that our upper bound for the surjectivity problem has just sparked further progress in the area by Bun, Kothari, and Thaler [20], who prove tight or nearly tight lower bounds on the approximate degree of several key problems in quantum query complexity. In particular, the authors of [20] prove that our upper bound for surjectivity is tight. We are confident that the ideas of our work will inform future research as well.

2. Preliminaries

We start with a review of the technical preliminaries. The purpose of this section is to make the paper as self-contained as possible, and comfortably readable by a broad audience. The expert reader may wish to skim it for the notation or skip it altogether.

2.1. Notation

We view Boolean functions as mappings X → {0, 1} for some finite set X. This arithmetization of the Boolean values “true” and “false” makes it possible to use Boolean operations in arithmetic expressions, as in the identity f ∨ g = f + g − fg for Boolean f and g. The familiar functions AND_n and OR_n are given by AND_n(x) = x_1 ∧ x_2 ∧ ⋯ ∧ x_n and OR_n(x) = x_1 ∨ x_2 ∨ ⋯ ∨ x_n. The negation of a Boolean function f is denoted as usual by ¬f = 1 − f. The composition of f and g is denoted f ∘ g, with (f ∘ g)(x) = f(g(x)).

For a string x, we denote its Hamming weight by |x|. We use the following notation for strings of Hamming weight at most k, greater than k, and exactly k:

{0,1}^n_{≤k} = {x ∈ {0,1}^n : |x| ≤ k},    {0,1}^n_{>k} = {x ∈ {0,1}^n : |x| > k},    {0,1}^n_{=k} = {x ∈ {0,1}^n : |x| = k}.

For a string x ∈ {0,1}^n and a set S ⊆ {1, 2, …, n}, we let x|_S denote the restriction of x to the indices in S. In other words, x|_S = (x_{i_1}, x_{i_2}, …, x_{i_{|S|}}), where i_1 < i_2 < ⋯ < i_{|S|} are the elements of S.

The characteristic vector of a subset S ⊆ {1, 2, …, n} is denoted 1_S.

We let and For a set and a real number we define

We analogously define the corresponding variants. We let ln x and log x stand for the natural logarithm of x and the logarithm of x to base 2, respectively. The following bound is well known [29, Proposition 1.4]:

(2.1)    C(n, 0) + C(n, 1) + ⋯ + C(n, k) ≤ (en/k)^k,    k = 1, 2, …, n,

where e = 2.718… denotes Euler’s number and C(n, i) denotes the binomial coefficient “n choose i.” For a logical condition A, we use the Iverson bracket notation I[A], which equals 1 if A holds and 0 otherwise.

For a function φ on a finite set X, we use the standard norms ‖φ‖_∞ = max_{x ∈ X} |φ(x)| and ‖φ‖_1 = ∑_{x ∈ X} |φ(x)|.

2.2. Approximate degree

Recall that the total degree of a multivariate real polynomial p, denoted deg p, is the largest degree of any monomial of p. We use the terms “degree” and “total degree” interchangeably in this paper. This paper studies the approximate representation of functions of interest by polynomials. Specifically, let f : X → ℝ be a given function, for a finite set X ⊂ ℝ^n. Define

E(f, d) = min_p max_{x ∈ X} |f(x) − p(x)|,

where the minimum is over polynomials p of degree at most d. In words, E(f, d) is the least error to which f can be approximated by a real polynomial of degree at most d. For a real number ε ≥ 0, the ε-approximate degree of f is defined as

deg_ε(f) = min {d : E(f, d) ≤ ε}.

Thus, deg_ε(f) is the least degree of a real polynomial that approximates f pointwise to within ε. We refer to any such polynomial as a uniform approximant for f with error ε. In the study of Boolean functions f : X → {0, 1}, the standard setting of the error parameter is ε = 1/3. This constant is chosen mostly for aesthetic reasons and can be replaced by any other constant in (0, 1/2) at the expense of a constant-factor increase in approximate degree. The following fact on the exact representation of functions by polynomials is well known.
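
Since the domain X is finite, the least error achievable at each degree is the optimum of a small linear program: the unknowns are the coefficients of the approximating polynomial together with the error, and every point of X contributes two linear constraints. The following Python sketch (our own illustration using scipy, with hypothetical names) computes this least error for a function given as a dictionary and, from it, the ε-approximate degree by trying d = 0, 1, 2, …:

```python
import itertools
import numpy as np
from scipy.optimize import linprog

def best_error(f, n, d):
    """Least pointwise error achievable on the domain of f (a dict from
    n-bit tuples to reals) by a multilinear polynomial of degree <= d."""
    monomials = [S for k in range(d + 1)
                 for S in itertools.combinations(range(n), k)]
    A, b = [], []
    for x in f:
        row = [float(all(x[i] for i in S)) for S in monomials]
        A.append(row + [-1.0])                 #  p(x) - eps <= f(x)
        b.append(f[x])
        A.append([-v for v in row] + [-1.0])   # -p(x) - eps <= -f(x)
        b.append(-f[x])
    c = [0.0] * len(monomials) + [1.0]         # minimize eps (last variable)
    bounds = [(None, None)] * len(monomials) + [(0, None)]
    res = linprog(c, A_ub=np.array(A), b_ub=np.array(b), bounds=bounds)
    return res.fun

def approx_degree(f, n, eps=1/3):
    return next(d for d in range(n + 1) if best_error(f, n, d) <= eps + 1e-8)

# Example: OR on 4 variables; the call below reports degree 2.
n = 4
f = {x: float(any(x)) for x in itertools.product((0, 1), repeat=n)}
print(approx_degree(f, n))
```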

Fact 2.1.

For every function f : {0,1}^n_{≤m} → ℝ, we have deg_0(f) ≤ m; that is, some polynomial of degree at most m agrees with f on its entire domain.

Proof.

The proof is by induction on m. The base case m = 0 is trivial since f is then a constant function. For the inductive step, let m ≥ 1 be arbitrary. By the inductive hypothesis, there is a polynomial p of degree at most m − 1 such that p(x) = f(x) for all inputs x of Hamming weight at most m − 1. Define

p*(x) = p(x) + ∑_{z ∈ {0,1}^n_{=m}} (f(z) − p(z)) ∏_{i : z_i = 1} x_i.

For any fixed input x with |x| ≤ m − 1, every term in the summation over z evaluates to zero, and therefore p*(x) = p(x) = f(x). For any fixed input x with |x| = m, on the other hand, the summation over z contributes precisely one nonzero term, corresponding to z = x. As a result, p*(x) = p(x) + (f(x) − p(x)) = f(x) in that case. ∎
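
The construction in the proof translates directly into code. The following Python sketch (our own transcription, with hypothetical names) builds the interpolating polynomial one Hamming-weight layer at a time and checks exact agreement on a small example:

```python
import itertools

def evaluate(poly, x):
    """Evaluate a multilinear polynomial given as {monomial tuple: coeff}."""
    return sum(c * all(x[i] for i in S) for S, c in poly.items())

def exact_polynomial(f, n, m):
    """Given f as a dict on n-bit tuples of Hamming weight <= m, build a
    multilinear polynomial of degree <= m agreeing with f on its domain,
    one Hamming-weight layer at a time (as in the proof of Fact 2.1)."""
    poly = {}
    for weight in range(m + 1):
        for support in itertools.combinations(range(n), weight):
            z = tuple(1 if i in support else 0 for i in range(n))
            if z in f:
                # The monomial over `support` vanishes on every input of
                # smaller Hamming weight, so this correction does not
                # disturb the values matched in earlier layers.
                poly[support] = f[z] - evaluate(poly, z)
    return poly

# Check: an arbitrary function on inputs of Hamming weight <= 2 over 4 bits.
n, m = 4, 2
f = {x: (3 * sum(x) + x[0]) % 5
     for x in itertools.product((0, 1), repeat=n) if sum(x) <= m}
p = exact_polynomial(f, n, m)
assert max(len(S) for S in p) <= m
assert all(evaluate(p, x) == f[x] for x in f)
print("degree <=", m, "with exact agreement on the domain")
```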

2.3. Inclusion-exclusion

All Boolean, arithmetic, and relational operations on functions in this paper are to be interpreted pointwise. For example, f ∨ g refers to the mapping x ↦ f(x) ∨ g(x). Similarly, fg is the pointwise product of f and g. Recall that in the case of Boolean functions, we have fg = f ∧ g. The well-known inclusion-exclusion principle, stated in terms of Boolean functions f_1, f_2, …, f_n, asserts that

f_1 ∨ f_2 ∨ ⋯ ∨ f_n = ∑_{∅ ≠ S ⊆ {1, …, n}} (−1)^{|S|+1} ⋀_{i ∈ S} f_i.

We will need the following less common form of the inclusion-exclusion principle, where the AND and OR operators are interchanged.

Fact 2.2.

For any n ≥ 1 and any Boolean functions f_1, f_2, …, f_n,

f_1 ∧ f_2 ∧ ⋯ ∧ f_n = ∑_{∅ ≠ S ⊆ {1, …, n}} (−1)^{|S|+1} ⋁_{i ∈ S} f_i.

Proof.

We have

∑_{∅ ≠ S ⊆ {1, …, n}} (−1)^{|S|+1} ⋁_{i ∈ S} f_i
    = ∑_{∅ ≠ S} (−1)^{|S|+1} (1 − ∏_{i ∈ S} (1 − f_i))
    = ∑_{∅ ≠ S} (−1)^{|S|+1} − ∑_{∅ ≠ S} (−1)^{|S|+1} ∏_{i ∈ S} (1 − f_i)
    = 1 + ∑_{∅ ≠ S} (−1)^{|S|} ∏_{i ∈ S} (1 − f_i)
    = ∑_{S ⊆ {1, …, n}} (−1)^{|S|} ∏_{i ∈ S} (1 − f_i)
    = ∏_{i=1}^{n} (1 − (1 − f_i))
    = f_1 ∧ f_2 ∧ ⋯ ∧ f_n,

where the third step uses the fact that half of the subsets of {1, 2, …, n} have odd cardinality and the other half have even cardinality. ∎
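
Both forms of the inclusion-exclusion principle are easy to confirm by brute force for small n; the following Python check (ours, not part of the paper) verifies the standard identity and the interchanged identity of Fact 2.2 on all truth assignments for n up to 5:

```python
import itertools

def nonempty_subsets(n):
    return [S for k in range(1, n + 1)
            for S in itertools.combinations(range(n), k)]

def check(n):
    for f in itertools.product((0, 1), repeat=n):   # truth values f_1, ..., f_n
        # Standard form: OR as a signed sum of ANDs over nonempty subsets.
        rhs_or = sum((-1) ** (len(S) + 1) * min(f[i] for i in S)
                     for S in nonempty_subsets(n))
        # Interchanged form (Fact 2.2): AND as a signed sum of ORs.
        rhs_and = sum((-1) ** (len(S) + 1) * max(f[i] for i in S)
                      for S in nonempty_subsets(n))
        assert max(f) == rhs_or and min(f) == rhs_and

for n in range(1, 6):
    check(n)
print("inclusion-exclusion identities verified for n = 1, ..., 5")
```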

2.4. Symmetrization

Let S_n denote the symmetric group on n elements. For a permutation σ ∈ S_n and a string x ∈ {0,1}^n, we adopt the shorthand σx = (x_{σ(1)}, x_{σ(2)}, …, x_{σ(n)}). A function f : {0,1}^n → ℝ is called symmetric if it is invariant under permutations of the input variables: f(x) = f(σx) for all x and σ. Symmetric functions on {0,1}^n are intimately related to univariate polynomials, as borne out by Minsky and Papert’s symmetrization argument [42].

Proposition 2.3 (Minsky and Papert).

Let p : {0,1}^n → ℝ be a polynomial of degree d. Then there is a univariate polynomial p* of degree at most d such that

p*(x_1 + x_2 + ⋯ + x_n) = E_{σ ∈ S_n} [p(σx)]    for all x ∈ {0,1}^n.

Minsky and Papert’s result generalizes to block-symmetric functions, as pointed out in [48, Prop. 2.3]:

Proposition 2.4.

Let n_1, n_2, …, n_k be positive integers. Let p : {0,1}^{n_1} × {0,1}^{n_2} × ⋯ × {0,1}^{n_k} → ℝ be a polynomial of degree d. Then there is a polynomial p* of degree at most d such that

p*(|x^{(1)}|, |x^{(2)}|, …, |x^{(k)}|) = E [p(σ_1 x^{(1)}, σ_2 x^{(2)}, …, σ_k x^{(k)})]    for all x^{(1)}, x^{(2)}, …, x^{(k)},

where the expectation is over independent and uniformly random permutations σ_i ∈ S_{n_i}.

Proposition 2.4 follows in a straightforward manner from Proposition 2.3 by induction on the number of blocks, k.

2.5. Chebyshev polynomials

Recall from Euler’s identity that

(2.2)    (cos θ + i sin θ)^d = cos dθ + i sin dθ,

where i denotes the imaginary unit. Multiplying out the left-hand side and using sin² θ = 1 − cos² θ, we obtain a univariate polynomial T_d of degree d such that

(2.3)    T_d(cos θ) = cos dθ.

This unique polynomial T_d is the Chebyshev polynomial of degree d. The representation (2.3) immediately reveals all the roots of T_d, and all the extrema of T_d in the interval [−1, 1]:

(2.4)    T_d(cos((2i − 1)π/(2d))) = 0,    i = 1, 2, …, d,
(2.5)    T_d(cos(iπ/d)) = (−1)^i,    i = 0, 1, …, d,
(2.6)    max_{−1 ≤ t ≤ 1} |T_d(t)| = 1.

The extremum at t = 1 is of particular significance, and we note it separately:

(2.7)    T_d(1) = 1.

In view of (2.2), the defining equation (2.3) implies that

T_d(t) = ∑_{0 ≤ j ≤ d, j even} C(d, j) t^{d − j} (t² − 1)^{j/2},

so that the leading coefficient of T_d for d ≥ 1 is given by ∑_{j even} C(d, j) = 2^{d−1}. As a result, we have the factored representation

(2.8)    T_d(t) = 2^{d−1} ∏_{i=1}^{d} (t − cos((2i − 1)π/(2d))).

By (2.2) and (2.3),

cos dθ = ((cos θ + i sin θ)^d + (cos θ − i sin θ)^d) / 2,

whence

(2.9)    T_d(t) = ((t + √(t² − 1))^d + (t − √(t² − 1))^d) / 2    for |t| ≥ 1.

The following fundamental fact follows from (2.9) by elementary calculus.

Fact 2.5 (Derivative of Chebyshev polynomials).

For any integer d ≥ 1 and any real t ≥ 1, T_d′(t) ≥ d².

Together, (2.9) and Fact 2.5 give the following useful lower bound for Chebyshev polynomials on the interval [1, ∞).

Proposition 2.6.

For any integer d ≥ 1 and any real δ ≥ 0,

T_d(1 + δ) ≥ 1 + d²δ,
T_d(1 + δ) ≥ (1/2)(1 + √(2δ))^d.

Proof.

The first bound follows from the mean value theorem, in view of (2.7) and Fact 2.5. For the second bound, use (2.9) to write

T_d(1 + δ) ≥ (1/2)(1 + δ + √((1 + δ)² − 1))^d ≥ (1/2)(1 + √(2δ))^d,

where the last step uses (1 + δ)² − 1 ≥ 2δ for δ ≥ 0. ∎
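
The rapid growth of T_d just outside [−1, 1] is what powers several constructions in this paper. The following numerical spot check (our own sanity check with numpy, using the two lower bounds as stated above) samples random degrees and offsets and confirms both inequalities:

```python
import numpy as np
from numpy.polynomial import chebyshev as C

rng = np.random.default_rng(1)
for _ in range(1000):
    d = int(rng.integers(1, 41))              # degree between 1 and 40
    delta = float(rng.uniform(0.0, 2.0))      # offset to the right of 1
    value = C.Chebyshev.basis(d)(1.0 + delta)
    assert value >= 1.0 + d * d * delta - 1e-6                      # first bound
    assert value >= 0.5 * (1.0 + np.sqrt(2.0 * delta)) ** d - 1e-6  # second bound
print("both lower bounds on T_d(1 + delta) hold on all sampled cases")
```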

2.6. Coefficient bounds for univariate polynomials

We let P_d stand for the set of univariate real polynomials of degree at most d. For a univariate polynomial p, we let ‖p‖ denote the sum of the absolute values of the coefficients of p. Then ‖·‖ is a norm on the real linear space of polynomials, and it is in addition submultiplicative:

Fact 2.7.

For any polynomials p and q,

  1. ‖p‖ ≥ 0, with equality if and only if p = 0;

  2. ‖cp‖ = |c| · ‖p‖ for any real c;

  3. ‖p + q‖ ≤ ‖p‖ + ‖q‖;

  4. ‖pq‖ ≤ ‖p‖ · ‖q‖.

Proof.

All four properties follow directly from the definition. ∎

We will need a bound on the coefficients of a univariate polynomial in terms of its degree and its maximum absolute value on a given interval. This fundamental problem was solved in the nineteenth century by V. A. Markov [41, p. 81], who proved an upper bound of

(2.10)

on the size of the coefficients of any degree-d polynomial that is bounded on the interval in absolute value by 1. Markov further showed that (2.10) is tight. Rather than appeal to this deep result in approximation theory, we will use the following weaker bound that suffices for our purposes.

Lemma 2.8.

Let p be a univariate polynomial of degree d. Then

(2.11)

Lemma 2.8 is a cosmetic modification of a lemma from [54], stated there in slightly different notation. We include a detailed proof for the reader’s convenience.

Proof of Lemma 2.8.

We use a common approximation-theoretic technique [24, 49] whereby one expresses the polynomial of interest as a linear combination of more structured polynomials and analyzes the latter objects. For this, define a family of interpolating polynomials by

One easily verifies that these polynomials behave like delta functions, in the sense that each of them equals 1 at its own interpolation node and 0 at every other node.

Lagrange interpolation gives

(2.12)

By Fact 2.7,

(2.13)

Now

where the first step uses (2.12), (2.13), and Fact 2.7. ∎

2.7. Coefficient bounds for multivariate polynomials

Let p be a multivariate polynomial. Analogous to the univariate case, we let ‖p‖ denote the sum of the absolute values of the coefficients of p. Fact 2.7 is clearly valid in this multivariate setting as well. Recall that a multivariate polynomial is multilinear if it has degree at most 1 in each variable. The following result is an analogue of Lemma 2.8.

Lemma 2.9.

Let be a symmetric multilinear polynomial. Then

Proof.

Abbreviate and write

where the coefficients are real. For μ ∈ [0, 1], let B_μ denote the Bernoulli distribution with success probability μ. Then

where the second and third steps use Lemma 2.8 and multilinearity, respectively. ∎

The following lemma, due to Razborov and Sherstov [48, Lemma 3.2], bounds the value of a polynomial at a point of large Hamming weight in terms of the polynomial’s values at points of low Hamming weight.

Lemma 2.10 (Extrapolation lemma).

Let be an integer, Let be a polynomial of degree at most Then

As one would expect, one can sharpen the bound of Lemma 2.10 by maximizing over a larger neighborhood of the Boolean hypercube than the hypercube itself. The resulting bound is as follows.

Lemma 2.11 (Generalized extrapolation lemma).

Fix positive integers Let be a polynomial of degree at most Then

One recovers Lemma 2.10 as a special case by taking and

Proof of Lemma 2.11.

Consider an arbitrary vector of Hamming weight and abbreviate Let be a partition of such that for all Observe that

(2.14)

Define by

Then clearly

(2.15)
(2.16)

Moreover, the mapping is a real polynomial on of degree at most As a result,

where the first step uses (2.15); the second step follows by (2.14) and Lemma 2.10; and the third step is valid by (2.16). ∎

2.8. The conjunction norm

Recall that a conjunction in Boolean variables x_1, x_2, …, x_n is the AND of some subset of the literals x_1, ¬x_1, x_2, ¬x_2, …, x_n, ¬x_n. Analogously, a disjunction is the OR of some subset of these literals. We regard conjunctions and disjunctions as Boolean functions and in particular as a special case of real functions. For a subset X ⊆ {0,1}^n and a function f : X → ℝ, we define the conjunction norm of f to be the minimum value of |a_1| + |a_2| + ⋯ + |a_k| such that

f(x) = a_1 C_1(x) + a_2 C_2(x) + ⋯ + a_k C_k(x)    for all x ∈ X,

for some integer k, some conjunctions C_1, C_2, …, C_k, and some real coefficients a_1, a_2, …, a_k. Our choice of a product-style symbol for this norm is motivated by the view of conjunctions as products of literals. In particular, the conjunction norm of any multivariate polynomial p is at most ‖p‖. The next proposition shows that the conjunction norm is indeed a norm on the space of multivariate real functions and establishes other useful properties of this complexity measure.

Proposition 2.12 (Conjunction norm).

Let f and g be given functions on a nonempty set X ⊆ {0,1}^n.