Random restrictions and PRGs for PTFs in Gaussian Space

03/25/2021 ∙ by Zander Kelley, et al. ∙ University of Illinois at Urbana-Champaign

A polynomial threshold function (PTF) f:ℝ^n →ℝ is a function of the form f(x) = 𝗌𝗂𝗀𝗇(p(x)) where p is a polynomial of degree at most d. PTFs are a classical and well-studied complexity class with applications across complexity theory, learning theory, approximation theory, quantum complexity and more. We address the question of designing pseudorandom generators (PRGs) for PTFs in the gaussian space: design a PRG that takes a seed of few bits of randomness and outputs an n-dimensional vector whose distribution is indistinguishable from a standard multivariate gaussian by a degree d PTF. Our main result is a PRG that takes a seed of d^O(1) log(n/ε) log(1/ε)/ε^2 random bits with output that cannot be distinguished from the n-dimensional gaussian distribution with advantage better than ε by degree d PTFs. The best previous generator due to O'Donnell, Servedio, and Tan (STOC'20) had a quasi-polynomial dependence (i.e., seedlength of d^O(log d)) in the degree d. Along the way we prove a few nearly-tight structural properties of restrictions of PTFs that may be of independent interest.

1 Introduction

Polynomial threshold functions (PTFs) are a classical and well-studied class of functions with several applications in complexity theory, learning theory, theory of approximation and more. Here we study the question of designing pseudorandom generators (PRGs) that fool test functions that are PTFs. We start with some standard definitions. Let sign: ℝ → {0,1} be defined as sign(z) = 1 if z ≥ 0 and sign(z) = 0 otherwise.

Definition 1.1.

For an integer d, a degree d PTF is a function f of the form f(x) = sign(p(x)), where p is a polynomial of degree at most d.

Our goal is to design a PRG that takes few bits of randomness and outputs a high-dimensional vector whose distribution is indistinguishable from a standard multivariate gaussian by any low-degree PTF. Specifically:

Definition 1.2.

A function G: {0,1}^r → ℝ^n is a pseudorandom generator for degree d PTFs with error ε if for every PTF f of degree at most d,

|E_{y ∈_u {0,1}^r}[f(G(y))] − E_{x ∼ N(0,1)^n}[f(x)]| ≤ ε.

We call r the seedlength of the generator and say G ε-fools degree d PTFs with respect to the gaussian distribution (we will drop the latter phrase when there is no ambiguity). We say G is explicit if its output can be computed in time polynomial in n.

(Here, and henceforth, y ∈_u S denotes a uniformly random element y from a multi-set S, and N(0,1) denotes the standard univariate gaussian distribution of variance 1.)

Of particular interest is the boolean case, where the target distribution is not gaussian but the uniform distribution on the hypercube. It is known that the boolean case is stronger than the gaussian case (a PRG for the former implies a PRG for the latter). As such, besides being interesting by itself, the gaussian case above has been an important intermediate step in constructing PRGs in the boolean case. In particular, achieving parameters as we do for the boolean case would be a major achievement (as we do not currently have non-trivial correlation lower bounds against PTFs of degree ).

Over the last several years, the question of designing PRGs for PTFs has received a lot of attention. Meka and Zuckerman [MZ13] gave the first non-trivial PRG for bounded degree PTFs with a seedlength of for the boolean and gaussian cases. Independent of [MZ13], [DKN10] showed that bounded independence fools degree- PTFs leading to seedlength . Since then, there have been several other works which make progress on the gaussian case [KAN11a, KAN11b, KAN12, KAN14, KAN15]. The seedlength in all of these works had an exponential dependence on the degree of the PTF. In particular, until recently no non-trivial PRGs (i.e., seedlength ) were known for PTFs of degree . In a remarkable recent work, O’Donnell, Servedio, Tan [OST20] got around this exponential dependence on the degree , achieving a seedlength of . Our work builds on their work (which in turn builds on a framework of [KAN11a]).

1.1 Main Results

Our main result is a PRG that ε-fools n-variate degree-d PTFs:

Theorem 1.3 (PRG for PTFs).

For all n, d, and ε, there exists an explicit PRG that ε-fools n-variate degree d PTFs with respect to the gaussian distribution with seedlength d^O(1) log(n/ε) log(1/ε)/ε^2.

Towards proving the above result, we develop several structural results on PTFs in the gaussian space that we now expand on. Besides these structural results, we additionally show how to use our structural results to carry out the analysis of the PRG in a simpler way when compared to [KAN11a, OST20]. We will expand on this in Section 2 when discussing our analysis.

Gaussian restrictions of PTFs.

Our main result above relies on new structural results about PTFs which might be of additional interest. The results are similar in spirit to switching lemmas that try to show that certain classes of functions simplify significantly under random restrictions. Switching lemmas and random restrictions are a cornerstone in complexity theory, and our approach relies on an analogue for the continuous world as studied in [KAN11a, OST20].

In the boolean case, i.e., when studying distributions on the hypercube, a restriction is a partial assignment in which each variable is either fixed to a value or marked ∗, with the understanding that the ∗-variables are free. Typically, restrictions as above are parametrized by the fraction of ∗’s. In our case, we are working with real-valued random variables and the multivariate gaussian distribution. What should the right analogue be?

The answer comes from the work of [OST20] who introduced the notion of a zoom of a polynomial. To draw a clearer parallel with random restrictions, we term these gaussian restrictions:

Definition 1.4.

Given a function f: ℝ^n → ℝ and x ∈ ℝ^n, and a restriction parameter λ ∈ [0,1], let f_{x,λ} be the function f_{x,λ}(y) = f(√(1−λ)·x + √λ·y). (As the value of λ will often be clear, we will in fact just use f_x for brevity.)

Intuitively, we can view f_{x,λ} as a restriction where a (1−λ)-fraction of the variance is already fixed. (Note that for independent x, y ∼ N(0,1)^n, √(1−λ)·x + √λ·y is distributed as N(0,1)^n.)
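
As a small numerical illustration, the following Python sketch builds a gaussian restriction of a toy PTF. It is only a sketch under stated assumptions: the restriction is taken to have the form y ↦ f(√(1−λ)·x + √λ·y), sign is taken to have values in {0,1}, and the names gaussian_restriction, toy_ptf, and bias_of_restriction are hypothetical helpers introduced here for illustration.

```python
import math
import random


def gaussian_restriction(f, x, lam):
    """Return y -> f(sqrt(1-lam)*x + sqrt(lam)*y) (assumed form of the restriction)."""
    a, b = math.sqrt(1.0 - lam), math.sqrt(lam)

    def restricted(y):
        return f([a * xi + b * yi for xi, yi in zip(x, y)])

    return restricted


def toy_ptf(z):
    # Hypothetical degree-2 PTF: sign(z0*z1 + z2 - 0.5), with sign valued in {0, 1}.
    return 1 if z[0] * z[1] + z[2] - 0.5 >= 0 else 0


def bias_of_restriction(f, n, lam, trials, rng):
    """Fix one x ~ N(0,1)^n, then estimate Pr_y[f_{x,lam}(y) = 1] by sampling y."""
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    g = gaussian_restriction(f, x, lam)
    hits = sum(g([rng.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(trials))
    return hits / trials


rng = random.Random(0)
lam = 0.1
# Sanity check: a coordinate of sqrt(1-lam)*x + sqrt(lam)*y should have variance close to 1.
samples = [
    math.sqrt(1 - lam) * rng.gauss(0, 1) + math.sqrt(lam) * rng.gauss(0, 1)
    for _ in range(20000)
]
second_moment = sum(s * s for s in samples) / len(samples)
```

Calling bias_of_restriction(toy_ptf, 3, lam, 1000, rng) for several small values of lam gives an empirical feel for how often a restricted PTF is close to constant; the toy polynomial is of course no substitute for the general statements below.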

A crucial conceptual ingredient in our analysis is the following corollary, which says that PTFs become almost constant under gaussian restrictions with a polynomially small restriction parameter:

Corollary 1.5.

There is a constant such that the following holds. For any , and , we have that for any degree- PTF

, with probability at least

over , the gaussian restriction of the PTF () is nearly fixed to a constant, in the sense that for some ,

That is, if , then with probability over , the restricted PTF yields the same fixed value with probability over .

The work of [OST20] achieves a similar conclusion but when the restriction parameter is quasi-polynomially small in d, as opposed to being polynomially small as above. This improved significantly on the work of [KAN11a] that implicitly shows a similar claim when the restriction parameter is exponentially small in d.

We remark that in a related line of work, [BLY09, HKM14, DRS+14, KKL17] study random restrictions of PTFs over the hypercube. Our focus here is on gaussian restrictions and obtaining stronger bounds quantitatively: these works had exponential dependence on the degree d.

The above statement, while conceptually nice, is not enough for our analysis of the PRG. The analysis relies on a more refined notion, the hypervariance of a polynomial, which was introduced in [OST20]. This analytical notion is best described in terms of the Hermite expansion of a polynomial. We next expand on this, and on a related statement about derivatives, Lemma 1.8, that may be of independent interest.

Improved hypervariance reduction.

Hermite polynomials are the orthonormal family of polynomials under the gaussian distribution and are widely used as a canonical basis for working with polynomials under the normal distribution. See Section 3 for their formal definition. For now, recall that any degree d polynomial p can be written as

p(x) = Σ_α p̂(α) h_α(x),

where α denotes a multi-index and h_α is the α’th Hermite polynomial. The hypervariance and normalized hypervariance of a polynomial introduced in [OST20] are defined as follows:

Definition 1.6.

For a polynomial of the form , define its hypervariance, , and normalized hypervariance ,, as

Note that for , the orthonormality of Hermite polynomials implies that

Intuitively, if the normalized hypervariance of a polynomial p is small for a large value of its parameter, then the weights of the higher-order Hermite coefficients of p decay geometrically. This (as we will see) tells us that the polynomial is simple in the sense that the corresponding PTF is nearly fixed to a constant, connecting back to Corollary 1.5.

[OST20] showed that for any polynomial p, for a suitable restriction parameter, a gaussian restriction leads to a polynomial being “simple” in the sense of having small normalized hypervariance. Specifically, they showed that if the restriction parameter is quasi-polynomially small in d, then the normalized hypervariance of the restricted polynomial is bounded with high probability over x. They also asked whether this property holds when the restriction parameter is polynomially small instead of being quasi-polynomially small in d. Our second main result, which will play a crucial role in our proof of Theorem 1.3, answers this question:

Lemma 1.7.

For any degree polynomial and , the following holds. Except with probability over , the normalized hypervariance .

Slow-growth of derivatives.

The proof of the above lemma in turn relies on a claim about the magnitude of the derivatives of a polynomial evaluated at a random gaussian input, which may be of independent interest.

For a function p: ℝ^n → ℝ, let ‖∇^k p(x)‖² denote the sum of squares of all partial derivatives of p of order k at x. That is, ‖∇^k p(x)‖ is the Frobenius norm of the tensor of k’th order partial derivatives of p. We show that for any degree d polynomial p, the Frobenius norm of the (k+1)’th order derivatives is comparable to that of the k’th order derivatives on a random gaussian input with high probability:

Lemma 1.8.

For any degree- polynomial , and , the following holds with probability at least :

(1)

Note that the above lemma is tight up to the factor of : consider the example .

Independent and concurrent work.

Independently of and concurrently with our work, [OST+21] also obtained results similar to Theorem 1.3 and Lemma 1.7. They first obtained an analogue of Lemma 1.7 and then combined the improved hypervariance reduction lemma with the framework of [OST20] to yield the improved PRG with polynomial dependence on the degree d.

The two proofs of Lemma 1.7 are similar but our analysis of the PRG is different from that of [OST+21]. In particular, our analysis relies directly on Lemma 1.8 (rather than its corollary Lemma 1.7), and on a new set of identities for Hermite expansions which lead to a possibly simpler approach as described in the next section.

2 Proof Overview

We next describe the high-level ideas underlying our results, Theorem 1.3 and Lemma 1.7. We first describe our approach for proving Lemma 1.7.

Improved hypervariance reduction.

The proof of the analogue of Lemma 1.7 for a quasi-polynomially small restriction parameter in [OST20] was by an iterative process: Intuitively, if one sets , and , then the random restriction is equivalent to independent random restrictions with restriction parameter . The authors in [OST20] show that each such -restriction (essentially) decreases the degree by a factor of . We take a different approach in our work by first connecting the hypervariance of the restricted polynomial to the norms of the derivatives of p at x. The actual proof is relatively simple given a relative anti-concentration lemma from [KAN13] developed in the context of studying the Gotsman-Linial conjecture for PTFs.

First, it is not too hard to prove Lemma 1.7 given Lemma 1.8. For illustration, suppose that we have a degree-d multi-linear polynomial p, and let for brevity. Then, by elementary algebra (if p is multi-linear, then its Hermite expansion is just its expansion in monomials; we can prove the identity for each monomial and use additivity), we have the identity

(2)

Thus, . Now, with probability over , we have , for all . Thus, if we take , the factor of will kill the growing derivatives leading to a bounded .
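
To make the multi-linear case concrete, the following Python sketch checks the standard Taylor identity p(a + b) = Σ_T (∂_T p)(a) · Π_{i∈T} b_i for a multi-linear polynomial p, where T ranges over subsets of the variables. This is a minimal sketch of the kind of expansion discussed here, not a reproduction of Eq. 2; the dictionary representation and the helper names evaluate, derivative, and taylor_sum are assumptions made for illustration. If one assumes the restriction has the form p(√(1−λ)·x + √λ·y), then taking a = √(1−λ)·x and b = √λ·y attaches a factor of λ^(|T|/2) to the order-|T| derivatives.

```python
from itertools import combinations
from math import prod, sqrt


def evaluate(poly, point):
    """Evaluate a multi-linear polynomial stored as {frozenset(variable indices): coefficient}."""
    return sum(c * prod(point[i] for i in S) for S, c in poly.items())


def derivative(poly, T):
    """Partial derivative with respect to every variable in T (each taken once)."""
    T = frozenset(T)
    return {S - T: c for S, c in poly.items() if T <= S}


def taylor_sum(poly, a, b):
    """Sum over all subsets T of (d_T poly)(a) * prod_{i in T} b_i."""
    n = len(a)
    total = 0.0
    for k in range(n + 1):
        for T in combinations(range(n), k):
            total += evaluate(derivative(poly, T), a) * prod(b[i] for i in T)
    return total


# Hypothetical example: p(z) = 1 + 3*z1 - z0*z2 + 2*z0*z1*z2.
p = {frozenset(): 1.0, frozenset({1}): 3.0, frozenset({0, 2}): -1.0, frozenset({0, 1, 2}): 2.0}
lam = 0.04
x, y = [0.3, -1.2, 0.7], [0.5, 0.25, -2.0]
a = [sqrt(1 - lam) * xi for xi in x]  # fixed part of the input
b = [sqrt(lam) * yi for yi in y]      # free part of the input
direct = evaluate(p, [ai + bi for ai, bi in zip(a, b)])
expanded = taylor_sum(p, a, b)
```

The two values direct and expanded agree up to floating-point error, since each monomial Π_{i∈S}(a_i + b_i) expands into a sum over subsets T ⊆ S.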

Notice that Eq. 2 is essentially a Taylor expansion of at : it expresses the function as a polynomial in in the standard basis, whose coefficients are determined by the derivatives of at . In the general case, we would like to do something similar, but in the Hermite basis; for non-multi-linear polynomials these two bases no longer coincide. So, in the general case, we rely on the following identity, which we regard as an analogue of the Taylor expansion for the Hermite basis.

Lemma 2.1 (See Section 3).

Let Then

where

Hermite polynomials are such a ubiquitous tool used in such a wide range of fields that it seems unlikely that such an identity is new. However, we are not aware of any previous appearance of such an identity in the literature (at least in the body of work on PTFs) and we provide a proof.

The proof of Lemma 1.8 is iterative and uses Kane’s relative anti-concentration inequality for degree d polynomials [KAN13]. [KAN13] shows that for any degree d polynomial, and with probability at least , we have . As in the above statement is independent of , for any , is distributed as . This says that the inequality is essentially equivalent to saying that with probability at least over , we have . The latter can be seen as the inequality corresponding to in the statement of Lemma 1.8. The full proof of the lemma proceeds by iteratively applying a vector-valued generalization of the above inequality.

2.1 PRG Construction and Analysis

We now sketch the main ideas behind the proof of our main result, Theorem 1.3. First, note that given the improved hypervariance lemma, Lemma 1.7, it is potentially possible to use the framework of [OST20] to get the improved PRG. However, their analysis is quite involved. We will use the same generator, and the overall strategy of our analysis will be similar in spirit, but working directly from Lemma 1.8 (rather than its corollary Lemma 1.7) will allow us to present a simpler analysis.

As in the works of [KAN11a] and [OST20], the PRG output will be

Z = (Y_1 + ⋯ + Y_L)/√L,

where each Y_i is an independent k-moment-matching gaussian vector. For the time being let us work under the idealized assumption that each Y_i is exactly k-moment-matching with a standard gaussian: i.e., for any polynomial h of degree at most k, E[h(Y_i)] = E_{x ∼ N(0,1)^n}[h(x)]. We will later relax this condition without too much additional work as is now standard (see Section 3 for details), and ultimately output a discrete approximation to Z with finite support. For now, it is appropriate to imagine that the total seedlength will be roughly L times the seedlength required for generating each Y_i. We improve prior works by showing that it suffices to let L depend polynomially on the degree d, rather than exponentially as in [KAN11a] or quasi-polynomially as in [OST20].
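
The structure of the idealized output can be sketched in a few lines of Python. The sketch assumes the normalized-sum form Z = (Y_1 + ⋯ + Y_L)/√L and, purely as a stand-in for a k-moment-matching vector, uses i.i.d. ±1 coordinates, which match the gaussian moments only up to degree 3 and use n random bits per block. The names rademacher_block and normalized_sum are hypothetical; this is not the discretized generator of [KAN11a].

```python
import math
import random


def rademacher_block(n, rng):
    """Stand-in block: i.i.d. +/-1 coordinates (matches gaussian moments up to degree 3 only)."""
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]


def normalized_sum(n, num_blocks, sample_block, rng):
    """Return (Y_1 + ... + Y_L) / sqrt(L) for L = num_blocks independent blocks."""
    total = [0.0] * n
    for _ in range(num_blocks):
        block = sample_block(n, rng)
        for j in range(n):
            total[j] += block[j]
    scale = 1.0 / math.sqrt(num_blocks)
    return [scale * t for t in total]


rng = random.Random(1)
draws = [normalized_sum(3, 16, rademacher_block, rng) for _ in range(5000)]
# Each coordinate of the output should have second moment close to 1.
second_moment = sum(z[0] ** 2 for z in draws) / len(draws)
```

Swapping rademacher_block for a sampler with a short seed and higher-order moment matching is where the actual work of the construction lies; the normalization by √L is what keeps each coordinate at unit variance.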

For the rest of this section, fix a degree polynomial . We wish to compare to where each is an independent standard gaussian. Note that itself is distributed as . At a very high-level, the basic approach of the analysis is to replace each with a -moment matching gaussian vector as in our PRG.

Set , and for each , write so that we may express for any . For a vector , let denote the polynomial . Note that is essentially a gaussian restriction but with a slightly different normalization.

The starting point is that, if is a degree- polynomial with small normalized hypervariance, then it is fooled by -moment-matching for . This is simply because, when the hypervariance of is small, we can use bounds on the moments of to show that it will likely have the same sign as its constant term in the Hermite basis. The latter argument works equally well for limited-independence distributions. The moment bounds follow from hypercontractivity. Specifically, we will use the following:

Lemma 2.2 (See Section 3).

Let be a degree polynomial with normalized hypervariance . Then,

Further, the same holds more generally for , as long as the distribution is -moment-matching.

The above lemma, when combined with Lemma 1.7, implies Corollary 1.5.

The above idea suggests the following strategy: Show that the polynomial has small normalized hypervariance with high probability over and use that is -moment-matching to replace with a standard gaussian . This seems plausible, as our hypervariance reduction lemma, Lemma 1.7, indeed shows that when is standard gaussian, the polynomial does have small normalized hypervariance with high probability.

Immediately, there are two obstacles for this approach:

  • First, our hypervariance-reduction theorem works only for truly random gaussian and not for pseudorandom .

  • Second, even if we argue that likely has small hypervariance, we cannot apply a union bound over . The error guarantee in our hypervariance-reduction statement, Lemma 1.7, is ; whereas, we have choices of , so we cannot use such a straightforward union-bound argument to replace each with a .

The second issue is especially problematic as the error probability in Lemma 1.7 cannot be improved, at least in that variant; the probability that the hypervariance-reduction fails is generally not small compared to . In [KAN11a], Kane shows how to address both obstacles at once with a clever sandwiching argument with a series of mollifier checks. This approach is further expanded in [OST20]. We employ the same high-level approach, but we manage to introduce some substantial simplifications by working directly from our Lemma 1.8 (rather than its corollary Lemma 1.7).

Beating the union bound.

For brevity, say that is well-behaved at a point if

where is a parameter that will be set to be roughly . We say is poorly-behaved at if the above condition does not hold. If is well-behaved at , then we know that is fooled by a moment-matching with very good error.

Roughly speaking, the main insight in going beyond the union bound obstacle mentioned above is as follows. There are two sources of error in the naive hybrid argument outlined above: (1) The probability of failure coming from being poorly-behaved at the points . (2) The error coming from applying Lemma 2.2 to replace a with when is well-behaved at .

Note that we have very good control on the error of type (2) above: we could make it be much smaller than by increasing the amount of independence . We will exploit this critically. We will complement this by showing that even though a naive union bound would be bad for error of type (1) above, it turns out that we don’t have to incur this loss: we (implicitly) show that . We do so by checking only that is well-behaved at the single point (in a slightly stronger sense) and then we conclude that is also highly-likely to be well-behaved at each of the “nearby” points . Intuitively, this is what allows us to circumvent the union bound in the hybrid argument. However, it would be difficult to actually carry out the analysis as stated this way – we use a sandwiching argument to sidestep the complicated conditionings which would arise in this argument as stated.

We proceed to describe the sandwiching argument. We wish to lower-bound the PTF by , where is some “mollifier” function taking values in . The role of is roughly to “test” whether is well-behaved at ; we ideally want at points where is well-behaved and at points where is poorly-behaved. However, we also need to be smooth, so there will be some intermediate region of points for which yields a non-informative, non-boolean value.

We set to be a smoothed version of the indicator function

which tests whether the derivatives of at have controlled growth in the sense of Lemma 1.8. Specifically, we set

where is some smooth univariate function with for and for .

Now, for every point we have

Furthermore, under truly-random gaussian inputs we have

where the final inequality here follows from Lemma 1.8. Combining these, we get that

Note that we can similarly obtain an upper-bound for by repeating this argument on the polynomial .

Thus, it suffices to bound . We do so by a hybrid argument. We first represent as where each is an independent standard gaussian. Recall that is also of a similar form: , where the are -moment-matching. We can replace each with and get

as a consequence of the following lemma.

Lemma 2.3.

There exists a constant such that the following holds for . For any fixed vector , a -moment-matching gaussian vector, and ,

Technically speaking, the above lemma is where our intuition on going around the union bound is quantified, allowing us to use the hybrid argument. We briefly outline our proof of this lemma, where for the purpose of illustration we make the simplifying assumption that the polynomial is multilinear.

The proof is by a case analysis on the behavior of at the fixed point . In the multilinear case it suffices to consider the derivatives ; in the general case we need to consider something slightly different.

  • Case 1: is well-behaved at , i.e., for all .

    • We can use Lemma 2.2 in this case to conclude that , are both almost constant with error .

    • So, it remains to show that fools . We approximate by a low-degree polynomial in using a Taylor-truncation argument. Our assumption on the controlled growth of derivatives allows us to bound the Taylor-truncation error by bounding the higher-moments of the deviations .

  • Case 2: is not well-behaved at ; let be the largest such that .

    • Intuitively, this says that the polynomial is well behaved at degree above , but not at degree . This allows us to show, via an -th moment bound, that both

      are highly likely. Thus, it is highly likely that

      The latter means is still sufficiently poorly-behaved at the point

      that the mollifier classifies it as

3 Preliminaries

The pseudorandom generator construction: idealization vs. discretization. Following [KAN11a] and [OST20], we analyze the idealized pseudorandom distribution

where each is a -moment-matching gaussian (that is, for all polynomials of degree at most ).

Suppose that, for any such with parameters , it is the case that fools degree- PTFs with error . Then, it is shown in [KAN11a] how to obtain a small-seedlength PRG (in the sense of Definition 1.2) by providing a specific instantiation and discretization of this construction.

Theorem 3.1 ([KAN11a], implicit in Section 6).

Suppose a as above with parameters fools degree-d PTFs with error . Then, there is an explicit, efficiently computable PRG with seedlength that -fools degree-d PTFs.

Hermite polynomials. To argue about polynomials which are not necessarily multilinear, we need some simple facts concerning Hermite polynomials. For our purposes, Hermite polynomials are simply a convenient choice of polynomial basis which have nice properties (in particular being orthonormal) with respect to gaussian inputs. For a more detailed background on Hermite polynomials and their use for analyzing functions over gaussian space, see [O’D14, Ch. 11].

One concrete way to define the Hermite polynomials is the following:

  • For the univariate polynomials, the degree- “Probabilist’s” Hermite polynomial is the -th coefficient of the generating function

  • We define the degree- univariate Hermite polynomial by the normalization

  • For a multi-index , we define the multivariate Hermite polynomial via the product

We record some basic properties of this particular choice of polynomial basis. The final two properties say that the Hermite basis is orthonormal with respect to correlation under the standard gaussian distribution – this is the reason for our choice of normalization.

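  • The set {h_α : |α| ≤ d} is a basis for real polynomials in n variables of degree at most d.

  • h_0 is the constant polynomial 1.

  • For multi-indices α ∈ {0,1}^n, h_α is simply the monomial x^α = ∏_{i : α_i = 1} x_i.

  • For x ∼ N(0,1)^n, and distinct multi-indices α, β, E[h_α(x) h_β(x)] = 0.

  • For x ∼ N(0,1)^n, and any multi-index α, E[h_α(x)²] = 1.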
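
The orthonormality properties can be verified exactly in one variable. The Python sketch below builds the probabilists’ Hermite polynomials He_k through the three-term recurrence He_{k+1}(x) = x·He_k(x) − k·He_{k−1}(x) and computes E[He_j(x)·He_k(x)] from the gaussian moments E[x^m] = (m−1)!! for even m. It assumes the standard normalization h_k = He_k/√(k!); the helper names are hypothetical and introduced only for this illustration.

```python
from math import factorial


def hermite_coeffs(k):
    """Integer coefficients (low to high degree) of the probabilists' Hermite polynomial He_k."""
    prev, cur = [0], [1]  # He_{-1} := 0 and He_0 = 1
    for j in range(k):
        # Recurrence: He_{j+1}(x) = x * He_j(x) - j * He_{j-1}(x)
        nxt = [0] + cur
        for i, c in enumerate(prev):
            nxt[i] -= j * c
        prev, cur = cur, nxt
    return cur


def gaussian_moment(m):
    """E[x^m] for x ~ N(0,1): 0 for odd m and (m-1)!! for even m."""
    if m % 2:
        return 0
    result = 1
    for t in range(m - 1, 0, -2):
        result *= t
    return result


def inner_product(j, k):
    """E[He_j(x) * He_k(x)] under the standard gaussian, computed exactly."""
    a, b = hermite_coeffs(j), hermite_coeffs(k)
    return sum(ca * cb * gaussian_moment(i + l)
               for i, ca in enumerate(a) for l, cb in enumerate(b))


# Gram matrix of He_0..He_4; it should be diagonal with entries k!.
gram = [[inner_product(j, k) for k in range(5)] for j in range(5)]
expected = [[factorial(k) if j == k else 0 for k in range(5)] for j in range(5)]
```

Since the Gram matrix is diagonal with entries k!, dividing He_k by √(k!) gives an orthonormal family, and products over coordinates extend this to the multivariate basis.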

Gaussian noise operator. We recall the definition of the noise operator U_ρ, which here we regard as an operator on real polynomials in n variables (see [O’D14, Ch. 11] for background and a more general viewpoint). For a polynomial p and a parameter ρ ∈ [0,1], the action of U_ρ on p is specified by

(U_ρ p)(x) = E_{y ∼ N(0,1)^n}[p(ρx + √(1−ρ²)·y)].

An important feature of the Hermite basis is that the noise operator acts on it diagonally (see [O’D14, Ch. 11]):

U_ρ h_α = ρ^|α| h_α.

Thus, if p is a degree-d polynomial given in the Hermite basis as

p = Σ_α p̂(α) h_α,

then we can express the result of the noise operator applied to p explicitly as

U_ρ p = Σ_α ρ^|α| p̂(α) h_α.
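
The diagonal action of the noise operator can be checked numerically in one variable. The Python sketch below assumes the standard definition (U_ρ f)(x) = E_y[f(ρx + √(1−ρ²)·y)] with y ∼ N(0,1) from [O’D14, Ch. 11], and approximates the expectation by a trapezoid rule on a truncated interval; gaussian_expectation and noise_operator are hypothetical helper names, and the truncation width and step count are arbitrary choices.

```python
import math


def hermite(k, x):
    """Probabilists' Hermite polynomial He_k(x) via the three-term recurrence."""
    prev, cur = 0.0, 1.0
    for j in range(k):
        prev, cur = cur, x * cur - j * prev
    return cur


def gaussian_expectation(g, half_width=10.0, steps=4000):
    """Approximate E[g(y)] for y ~ N(0,1) by the trapezoid rule on [-half_width, half_width]."""
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        y = -half_width + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * g(y) * math.exp(-0.5 * y * y)
    return total * h / math.sqrt(2.0 * math.pi)


def noise_operator(f, rho):
    """Return x -> E_y[f(rho*x + sqrt(1-rho^2)*y)], the assumed form of U_rho f."""
    sigma = math.sqrt(1.0 - rho * rho)
    return lambda x: gaussian_expectation(lambda y: f(rho * x + sigma * y))


rho, x0 = 0.6, 1.3
# Largest deviation between U_rho He_k and rho^k * He_k at the point x0, for k = 0..5.
max_gap = max(
    abs(noise_operator(lambda t, k=k: hermite(k, t), rho)(x0) - rho ** k * hermite(k, x0))
    for k in range(6)
)
```

The quantity max_gap is at the level of floating-point error, consistent with U_ρ scaling the degree-k Hermite polynomial by ρ^k.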

Higher moments and hypercontractivity. Fix a polynomial p. For an even natural number q, we write the gaussian q-norm of p as

‖p‖_q = (E_{x ∼ N(0,1)^n}[p(x)^q])^{1/q}.

We wish to be able to bound this quantity in terms of the magnitudes of the Hermite coefficients of p, p̂(α). For this purpose, we extend the definition of U_ρ also to ρ > 1 by its action on the Hermite basis: U_ρ h_α = ρ^|α| h_α. With this notation, we can express the well-known (2,q)-hypercontractive inequality [O’D14, Ch. 9,11] as

‖p‖_q ≤ ‖U_{√(q−1)} p‖_2,

which is quite convenient for us, as we can use orthonormality of the Hermite basis to explicitly compute

‖U_{√(q−1)} p‖_2² = Σ_α (q−1)^|α| p̂(α)².

To get a feel for the utility of this bound, let’s see how it can be used to prove Lemma 2.2:

Lemma 3.2 (Lemma 2.2 restated).

Let be a degree polynomial with normalized hypervariance , where is an even natural number. Then,

Further, the same holds more generally for , as long as the distribution is -moment-matching.

Proof.

Suppose that is normalized so that

We have the -th moment bound

From the generic concentration inequality

we obtain

Thus, we find that the PTF almost always yields the value under random gaussian inputs. Crucially for us, this argument is also easy to derandomize: since the argument merely relies on a bound on the -th moment , and for which is -moment-matching for we have

we conclude also that is typically equal to . ∎
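
For reference, the concentration step can be written out in generic notation. The following is a sketch, not the exact normalization used in the proof above: X is any real random variable, t > 0, q is an even natural number, and c denotes the constant Hermite coefficient of p (a symbol assumed here for illustration, with c ≠ 0).

```latex
% Markov's inequality applied to the q-th moment (q even, t > 0):
\Pr\bigl[\,|X| \ge t\,\bigr]
  \;=\; \Pr\bigl[\,X^{q} \ge t^{q}\,\bigr]
  \;\le\; \frac{\mathbb{E}[X^{q}]}{t^{q}} .

% If sign(p(x)) differs from sign(c), then |p(x) - c| >= |c|.
% Taking X = p(x) - c and t = |c| gives
\Pr\bigl[\,\mathrm{sign}(p(x)) \ne \mathrm{sign}(c)\,\bigr]
  \;\le\; \frac{\mathbb{E}\bigl[(p(x) - c)^{q}\bigr]}{c^{q}} .
```

The right-hand side depends on the distribution of x only through the expectation of (p(x) − c)^q, a polynomial of degree at most dq, which is why the same bound holds for any distribution that matches gaussian moments up to that degree.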

We remark that this lemma further implies that fools when is small:

Gaussian restrictions and derivatives on the Hermite basis. Besides the effect of the noise operator, it will also be important to understand the effect of two further operations on polynomials:

  • The derivative map,

  • The gaussian restriction at , .

In particular, we are concerned with how these operations affect the Hermite coefficients of a polynomial; ultimately, our goal will be to develop a “Hermite-basis analogue” of the Taylor expansion which can be applied to expand as a function of . We start by computing the effect of these two operations on univariate Hermite polynomials, and then on the full multivariate Hermite basis, and finally on a general polynomial expressed in the Hermite basis.

Proposition 3.3.

For univariate Hermite polynomials, we have the identities

  • ,

Proof.

The first of these identities is standard (see e.g. [O’D14, Ex. 11.10]); we provide a proof of the second.

The second identity can be proved by considering the generating function

and comparing the coefficient of on both sides of

The corresponding identities for multivariate Hermite polynomials follow easily from above.

Proposition 3.4.

We have

  • , where ,

  • ,

We conclude with a Taylor-like expansion in the Hermite basis that we use repeatedly.

Lemma 3.5.

Let Then

where

Proof.

We express

Lastly, we will also need an extension of this theorem which expresses , at the point

as a polynomial in in the Hermite basis.

Theorem 3.6.

Let Then

where

Proof.

We express