On the Fourier Entropy Influence Conjecture for Extremal Classes

06/10/2018 · by Guy Shalev, et al.

The Fourier Entropy-Influence (FEI) Conjecture of Friedgut and Kalai states that H[f] ≤ C · I[f] holds for every Boolean function f, where H[f] denotes the spectral entropy of f, I[f] is its total influence, and C > 0 is a universal constant. Despite significant interest in the conjecture, it has only been shown to hold for some classes of Boolean functions, such as symmetric functions and read-once formulas. In this work, we prove the conjecture for extremal cases: functions with small influence and functions with high entropy. Specifically, we show that:
* FEI holds for the class of functions with I[f] ≤ 2^{-cn}, with the constant C = 4 · (c+1)/c. Furthermore, proving FEI for a class of functions with I[f] ≤ 2^{-s(n)} for some s(n) = o(n) will imply FEI for the class of all Boolean functions.
* FEI holds for the class of functions with H[f] ≥ cn, with the constant C = 1 + c/h^{-1}(c²). Furthermore, proving FEI for a class of functions with H[f] ≥ s(n) for some s(n) = o(n) will imply FEI for the class of all Boolean functions.
Additionally, we show that FEI holds for the class of functions with constant spectral norm ‖f̂‖_1, completing the results of Chakraborty et al. that bounded the entropy of such functions. We also improve the result of Wan et al. for read-k decision trees, from H[f] ≤ O(k) · I[f] to H[f] ≤ O(√k) · I[f]. Finally, we suggest a direction for proving FEI for read-k DNFs, and prove the Fourier Min-Entropy/Influence (FMEI) Conjecture for regular read-k DNFs.


1 Introduction

Boolean functions are one of the most basic objects in theoretical computer science. The Fourier analysis of Boolean functions has become prominent over the years as a powerful tool in their study, with applications in many fields such as complexity theory, learning theory, social choice, inapproximability, metric spaces, random graphs, coding theory, etc. For a comprehensive survey, see the book [O'D14].

For Boolean-valued functions f : {-1,1}^n → {-1,1}, applying Parseval's identity gives Σ_{S⊆[n]} f̂(S)² = E[f²] = 1, and therefore the squared Fourier coefficients {f̂(S)²}_{S⊆[n]} can be viewed as a probability distribution, named the spectral distribution of f. The spectral entropy of f is defined to be the Shannon entropy of this distribution, namely H[f] = Σ_{S⊆[n]} f̂(S)² · log₂(1/f̂(S)²). This can be intuitively thought of as measuring how "spread out" the Fourier coefficients of f are. The total influence I[f] = Σ_{S⊆[n]} |S| · f̂(S)², one of the most basic measures of a Boolean function, is the expected size of a subset drawn according to the spectral distribution, and can be intuitively thought of as measuring the concentration of f on "high" levels.
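To make these two quantities concrete, the following small brute-force sketch (ours, not from the paper; the example function is an arbitrary choice) computes the spectral distribution, the spectral entropy H[f] and the total influence I[f] of a function given as a ±1-valued predicate:

    from itertools import product
    from math import log2, prod

    def fourier_coefficients(f, n):
        # hat_f(S) = E_x[ f(x) * prod_{i in S} x_i ], x uniform over {-1,1}^n
        points = list(product([-1, 1], repeat=n))
        coeffs = {}
        for S in product([0, 1], repeat=n):          # S as a 0/1 indicator vector
            chi = [prod(x[i] for i in range(n) if S[i]) for x in points]
            coeffs[S] = sum(f(x) * c for x, c in zip(points, chi)) / len(points)
        return coeffs

    def spectral_entropy_and_influence(f, n):
        coeffs = fourier_coefficients(f, n)
        H = sum(c * c * log2(1.0 / (c * c)) for c in coeffs.values() if c * c > 1e-12)
        I = sum(sum(S) * c * c for S, c in coeffs.items())
        return H, I

    # example: majority on 3 bits (a toy function, not one used in the paper)
    maj3 = lambda x: 1 if sum(x) > 0 else -1
    H, I = spectral_entropy_and_influence(maj3, 3)
    print(H, I, H / I)   # H = 2.0, I = 1.5, ratio about 1.33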

The Fourier Entropy-Influence conjecture, posed by Friedgut and Kalai [FK96], states that for any Boolean function the ratio of its spectral entropy to its total influence is upper-bounded by a universal constant.

Conjecture 1.

([FK96]) There exists a universal constant C > 0 such that for all f : {-1,1}^n → {-1,1} with total influence I[f] and spectral entropy H[f] we have H[f] ≤ C · I[f].

The original motivation for the conjecture in [FK96] emerged from studying threshold phenomena of monotone graph properties in random graphs. Specifically, for a function f that represents a monotone property of a graph with m vertices (e.g. connectivity), FEI implies that I[f] ≥ c · log²(m). The best known bound as of today, due to Bourgain and Kalai [BK97], is I[f] ≥ c(ε) · log^{2−ε}(m) for every ε > 0.

Proving Conjecture 1 will have other interesting applications. Probably the most important consequence of the conjecture is that it implies a variant of Mansour's conjecture from 1995 [Man95], stating that if a Boolean function can be represented by a DNF formula with m terms, then most of its Fourier weight is concentrated on a set of coefficients of size polynomial in m. Combined with results by Gopalan et al. [GKK08], this in turn will result in an efficient learning algorithm for such DNFs in the agnostic model, a central open problem in computational learning theory. Furthermore, sufficiently strong versions of Mansour's Conjecture would yield improved pseudorandom generators for DNF formulas. See [Kal07], [OWZ11] for more details on this implication.

FEI is also closely related to the fundamental KKL theorem [KKL88], stating that every Boolean function f has a variable whose influence is at least Ω(Var[f] · log(n)/n). We define H_∞[f] = log₂(1/max_S f̂(S)²), the min-entropy of f. It is easy to verify that H_∞[f] ≤ H[f]. A natural relaxation of FEI is the following weaker Fourier Min-Entropy Influence conjecture:

Conjecture 2.

(FMEI) There exists some universal constant C > 0 such that for any Boolean function f we have H_∞[f] ≤ C · I[f].

KKL can be directly derived from FMEI (and therefore is clearly implied by FEI). In the other direction, one can easily prove FMEI for monotone functions using KKL (see [OWZ11]). We note that FEI for monotone functions is still an open problem.
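As a quick illustration of the quantities involved in FMEI (again our own sketch, with an arbitrary example function), the min-entropy is read off the single largest squared Fourier coefficient, and is never larger than the spectral entropy:

    from itertools import product
    from math import log2, prod

    def squared_spectrum(f, n):
        pts = list(product([-1, 1], repeat=n))
        out = []
        for S in product([0, 1], repeat=n):
            c = sum(f(x) * prod(x[i] for i in range(n) if S[i]) for x in pts) / len(pts)
            out.append(c * c)
        return out

    f = lambda x: 1 if any(b == 1 for b in x) else -1   # OR of three bits (toy example)
    p = [w for w in squared_spectrum(f, 3) if w > 1e-12]
    H     = sum(w * log2(1 / w) for w in p)    # spectral (Shannon) entropy
    H_min = log2(1 / max(p))                   # min-entropy
    assert H_min <= H + 1e-9
    print(H_min, H)                            # about 0.83 and 2.22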

1.1 Prior Work

Despite many years of attention, Conjecture 1 remains open, but some significant steps towards proving it have been made. For example, a weaker folklore version of FEI, where instead of a universal constant we settle for a logarithmic factor, is known to be true even for the more general case of real-valued functions on the Boolean cube.

Lemma 1 (Weak FEI).

Let f : {-1,1}^n → ℝ be some function with Σ_S f̂(S)² = 1. Then H[f] ≤ I[f] · log₂(e·n/I[f]).

This can be proved in several different ways, as done in [KMS12], [OWZ11] and [WWW14]. It should be noted that the logarithmic factor is indeed tight for non-Boolean functions, so proofs of FEI will have to make use of the fact that f is Boolean-valued. The tightness can be seen by the following example: take f(x) = (x_1 + ⋯ + x_n)/√n; it is easy to verify that E[f²] = 1, and also that I[f] = 1 and H[f] = log₂(n). For Boolean-valued functions this bound has been recently improved by Gopalan et al. in [GSTW16], replacing the dependence on n by a dependence on the max sensitivity of the function: the sensitivity of an input x in a function f, denoted s_f(x), is the number of indices i for which f(x^⊕i) ≠ f(x), and the max sensitivity of f is defined as s(f) = max_x s_f(x). Clearly, s(f) ≤ n for all functions f.
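A small numerical check of this tightness example (assuming, as above, the normalized linear function; the code is ours):

    from itertools import product
    from math import log2, prod, sqrt

    def spectral_H_and_I(f, n):
        pts = list(product([-1, 1], repeat=n))
        H = I = 0.0
        for S in product([0, 1], repeat=n):
            c = sum(f(x) * prod(x[i] for i in range(n) if S[i]) for x in pts) / len(pts)
            if c * c > 1e-12:
                H += c * c * log2(1 / (c * c))
                I += sum(S) * c * c
        return H, I

    n = 8
    lin = lambda x: sum(x) / sqrt(n)   # real-valued, E[f^2] = 1
    H, I = spectral_H_and_I(lin, n)
    print(H, I)   # H = log2(n) = 3.0 while I = 1.0: the log n factor is unavoidable here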

Furthermore, FEI has been verified for several families of Boolean functions. O'Donnell, Wright and Zhou [OWZ11] proved it for symmetric functions by using the fact that derivatives of symmetric functions are very noise sensitive. They also prove FEI for the class of read-once decision trees.

In another paper, Das, Pal and Visavaliya [DPV11] show that FEI holds with a universal constant for a random function, as I[f] is strongly concentrated around its mean n/2, and the spectral entropy of a function is always bounded by n. We give another proof of this fact (with a worse constant), by proving FEI for functions with entropy linear in n, as is the case for random functions.

In [KMS12], Keller, Mossel and Schlank generalize FEI to the biased setting. Furthermore, for functions with almost all of their Fourier weight on the lowest levels, they give an upper bound on the spectral entropy.

In the paper [OT13], O'Donnell and Tan study FEI under composition: given functions g and h, they ask what properties g and h must satisfy for the FEI conjecture to hold for their disjoint composition. To make progress they present a strengthening of FEI which they call FEI⁺ - a generalization of FEI to product distributions. They prove that FEI⁺ composes, in the sense that if the inner and outer functions respect FEI⁺ with factor C, then so does their composition. They also prove that FEI⁺ holds with a factor depending on the arity of the function (instead of a universal constant). Together with their main result, this is enough to prove FEI for read-once formulas.

In [CKLS16], Chakraborty et al. prove a relaxation of FEI, bounding the spectral entropy in terms of higher moments of the level |S| under the spectral distribution, where the original conjecture needs this bound to include only the first moment, namely I[f]. They also prove FEI for read-once formulas with a more elementary method than the one of O'Donnell and Tan.

Independently, [CKLS16] also give upper bounds on the entropy of a Boolean function in terms of several complexity measures; to name a few, they bound the entropy in terms of the spectral norm ‖f̂‖_1, and also in terms of the average depth of a decision tree computing f. These bounds imply FEI for classes in which the relevant measure is constant, under the additional requirement that the variance of the function is bounded from below by a constant.

This raises the natural question of whether this additional requirement is actually necessary or merely an artifact of the proof. For the spectral norm ‖f̂‖_1 and other measures strongly related to it, we manage to remove this condition by making subtle changes to the proof technique of [CKLS16], generalizing the bound and thus proving FEI for the class of functions with constant ‖f̂‖_1. For the other measure they use to bound the entropy, the average depth of a decision tree computing f, the requirement seems critical, as will be explained in the next paragraph.

In [WWW14], Wan, Wright and Wu present a new perspective on FEI as a communication (or rather, compression) game: one player randomly samples a set S ⊆ [n] according to the spectral distribution f̂², and wishes to send it to another player using a short representation. The price of the protocol is the expected number of bits in the representation of S. For a function f, we know from Shannon's source coding theorem that the price of any such protocol is lower bounded by the spectral entropy H[f], so we are merely left with the challenge of finding a protocol for f with expected price at most C · I[f]. They formalize this into the following lemma:

Lemma.

Let f : {-1,1}^n → {-1,1}, and let P be a prefix-free protocol on the alphabet {0,1}, except that it outputs the empty string on the input ∅. Then H[f] ≤ E_{S∼f̂²}[|P(S)|] + O(I[f]).

They use this technique, combined with observations regarding the covariance of decision trees, to prove a theorem (that is also known due to [CKLS16]): FEI holds for the class of functions computed by decision trees with constant average depth and variance bounded from below by a constant. [WWW14] also provide a reduction, showing that removing this requirement from the latter theorem would in fact result in proving FEI for all Boolean functions. This gives more motivation to examine FEI for functions with low influence.

Using their protocol technique, [WWW14] also achieve H[f] ≤ O(k) · I[f] for read-k decision trees, thus proving FEI for read-k decision trees where k is constant. They explicitly conjecture that the correct coefficient is actually O(log k) and provide a matching example. We improve their bound to H[f] ≤ O(√k) · I[f], but share their belief that the correct bound could be O(log k) · I[f].

In [Hod17], Hod improves the lower bound on the conjectured universal constant for FEI via lexicographic functions, using composition techniques and biased Fourier analysis.

1.2 Our Results

Intrigued by the implicit and explicit difficulties of FEI for low influence functions, we prove FEI for functions with extremely low influence:

Theorem 2.

Let c > 0 be some constant. Let f : {-1,1}^n → {-1,1} with I[f] ≤ 2^{-cn}. Then H[f] ≤ 4 · ((c+1)/c) · I[f].

This result may seem at first somewhat disappointing, as interesting functions usually don’t have such small total influence. Can we do better than this bound? Apparently not, at least without proving the full conjecture. Using a construction presented in [WWW14] we show that any improvement of the last theorem will result in proving FEI:

Theorem 3.

Let s be a function such that s(n) = o(n). Suppose that FEI holds for all f with I[f] ≤ 2^{-s(n)}. Then FEI holds for all Boolean functions.

For example, proving FEI for the class of functions with I[f] ≤ 2^{-√n} will be enough to confirm Conjecture 1.

This result for functions with extremely low influence raises the question of the opposite extremal case - where the entropy is high, say H[f] ≥ cn for some constant c > 0. We provide analogous results for this extremal case.

Theorem 4.

Let 0 < c ≤ 1 be some constant. For any f with H[f] ≥ cn we have H[f] ≤ (1 + c/h^{-1}(c²)) · I[f], where h^{-1} is the inverse of the binary entropy function.

Theorem 5.

Let s be a function such that s(n) = o(n). Suppose that FEI holds for all f with H[f] ≥ s(n). Then FEI holds for all Boolean functions.

For example, proving FEI for the class of functions with H[f] ≥ √n will confirm Conjecture 1. We also note that the other two extremal cases are easy, namely functions with exponentially low entropy and functions with total influence linear in n.

Independently from our work on the extremal classes, we also provide some improvements on previously known results. First, we modify the bound of [CKLS16] to include the influence and variance of the function, thereby showing that FEI holds for the class of functions with constant spectral norm ‖f̂‖_1.

Theorem 6.

Let f : {-1,1}^n → {-1,1} with spectral norm ‖f̂‖_1. Then H[f] can be upper-bounded by an expression involving I[f], Var[f] and ‖f̂‖_1; in particular, from the edge isoperimetric inequality, we obtain H[f] ≤ O(I[f]) whenever ‖f̂‖_1 is constant.

As a direct corollary, we can deduce FEI for functions with some related complexity measures that are constant. We note that some of these results have been previously known.

Corollary 7.

FEI holds for functions with constant spectral norm ‖f̂‖_1, constant sub-cube partition, constant degree, constant decision tree depth, constant decision tree size, constant granularity or constant sparsity.

We also build and improve on the work of [WWW14]. Inspired by their methods, we provide a hopefully promising direction towards proving FEI for read-k DNFs. We give an explicit protocol for the Tribes function which is a read-once DNF, and conjecture its possible generalization to a protocol for read-k DNFs, as a step towards read-k formulas, breaking the barrier of “read-once” as an assumption required for many of the results known today. As a first step, we prove FMEI for regular read-k DNFs, where by regular we mean (informally) that all clauses are more or less of the same width, and the number of clauses is exponential in that width.

Theorem 8.

Let f be computed by a regular read-k DNF. Then the FMEI conjecture holds for f.

We also improve the result of [WWW14] for read-k decision trees. [WWW14] define the tree covariance Cov(T) of a decision tree T recursively, in terms of the functions f_L and f_R defined by the left and right children of the root of T (the precise definition is given in Section 2.1). They come up with a protocol for decision trees whose price is controlled by the tree covariance. Therefore, by bounding Cov(T) by O(k) · I[f] they obtain FEI for read-k decision trees with constant O(k). By improving the bound on the covariance to O(√k) · I[f], we manage to also improve the constant achieved for FEI for this class.

Theorem 9.

Let f be computed by a read-k decision tree T. Then Cov(T) ≤ O(√k) · I[f]. As a result, FEI holds for read-k decision trees with constant O(√k).

We believe the tree covariance of a decision tree and its connection to other measures of the function it computes such as its variance and influence, might be of independent interest in the study of decision trees.

Finally, as an independent result, we refine the known connection between the size of a decision tree and the spectral norm ‖f̂‖_1 = Σ_S |f̂(S)| of the function it computes. It is a well-known fact that ‖f̂‖_1 ≤ size(T), the size of a decision tree being the number of nodes in it. Our improvement involves the covariance of the nodes in the decision tree, and is given by the following proposition:

Proposition 10.

For a Boolean function f that is computed by a decision tree T, the spectral norm ‖f̂‖_1 is at most the number of nodes of T that have at least one child that is a leaf, plus a sum of covariance terms taken over all inner nodes of T, i.e. nodes that have two non-leaf children. This improved bound is tight in some cases where the ‖f̂‖_1 ≤ size(T) bound is far from tight - for example, the parity function on n variables with the natural tree that computes it.

2 Preliminaries

2.1 Fourier Analysis of Boolean Functions

It is well known that functions f : {-1,1}^n → ℝ can be uniquely expressed as multi-linear polynomials: f(x) = Σ_{S⊆[n]} f̂(S) · Π_{i∈S} x_i, where f̂(S) = E_x[f(x) · Π_{i∈S} x_i]. This is known as the Fourier expansion of f, and the f̂(S) are the Fourier coefficients of the function. For Boolean-valued functions, Parseval's identity implies that Σ_{S⊆[n]} f̂(S)² = E[f²] = 1, and therefore {f̂(S)²}_{S⊆[n]} can be viewed as a probability distribution, named the spectral distribution of f and denoted f̂². Two of the central complexity measures of a Boolean function can be defined using its spectral distribution:

Definition.

The spectral entropy of a function f is the Shannon entropy of the squared Fourier coefficients, namely H[f] = Σ_{S⊆[n]} f̂(S)² · log₂(1/f̂(S)²).

Definition.

The influence of a function f (sometimes referred to as its total influence) is I[f] = Σ_{S⊆[n]} |S| · f̂(S)².

The influence of a Boolean function also has a nice combinatorial interpretation. For i ∈ [n], the influence of the variable i in f is Inf_i[f] = Pr_x[f(x) ≠ f(x^⊕i)], namely the probability that for a uniformly random input x, flipping the i'th bit will affect the result. An equivalent definition for the total influence of a function is I[f] = Σ_{i=1}^n Inf_i[f].
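The following sketch (ours) verifies numerically that the combinatorial and spectral definitions of total influence agree on a small example:

    from itertools import product
    from math import prod

    def influence_combinatorial(f, n):
        pts = list(product([-1, 1], repeat=n))
        total = 0.0
        for i in range(n):
            flips = sum(1 for x in pts
                        if f(x) != f(x[:i] + (-x[i],) + x[i + 1:]))
            total += flips / len(pts)        # Inf_i[f] = Pr_x[f(x) != f(x with bit i flipped)]
        return total

    def influence_spectral(f, n):
        pts = list(product([-1, 1], repeat=n))
        I = 0.0
        for S in product([0, 1], repeat=n):
            c = sum(f(x) * prod(x[i] for i in range(n) if S[i]) for x in pts) / len(pts)
            I += sum(S) * c * c
        return I

    maj3 = lambda x: 1 if sum(x) > 0 else -1      # toy example
    print(influence_combinatorial(maj3, 3), influence_spectral(maj3, 3))  # both equal 1.5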

It is sometimes useful to classify the Fourier coefficients by their level, where the level of f̂(S) is |S|. The weight of f at level k is denoted W^k[f] = Σ_{|S|=k} f̂(S)². Note that I[f] = Σ_k k · W^k[f]. Additionally, we use the notations W^{≤k}[f] = Σ_{j≤k} W^j[f] and W^{>k}[f] = Σ_{j>k} W^j[f].

We also use the decision tree model of computation; see [O'D14] for a formal definition. Given a tree T, we call the sub-tree reached by one of the two edges leaving the root the left sub-tree (T_L) and the sub-tree reached by the other edge the right sub-tree (T_R), and denote by f_L and f_R the functions corresponding to each sub-tree. We assume that no variable appears more than once in any root-to-leaf path of T (or else the tree can be easily simplified). For a node v in T we denote by depth(v) the depth of v - its distance from the root node. We say that T is a read-k decision tree if no variable is queried at more than k nodes of T.

Given two functions f, g define Cov(f, g) = E[f·g] − E[f]·E[g]. Following the definitions of [WWW14], we define the covariance of a decision tree T: for an internal node v, let f_{v,L} be the function computed by v's left sub-tree and f_{v,R} be the function computed by v's right sub-tree. Then, Cov(T) is defined as a weighted sum of the covariances Cov(f_{v,L}, f_{v,R}) over the internal nodes v of T. Note that Cov(T) can be equivalently defined recursively over the sub-trees of T, with the base case that Cov(T) = 0 if T has depth 0.

A DNF over Boolean variables x_1, …, x_n is the logical OR of terms, each of which is a logical AND of literals from {x_i, x̄_i}. The number of literals in a term is called its width (sometimes we refer to it as the size of the term). A DNF is read-k if no variable appears in more than k terms.

The Tribes function Tribes_{w,s} with width w and s tribes is a read-once DNF on n = w·s variables, where all terms are of width exactly w:

Tribes_{w,s}(x_1, …, x_n) = (x_1 ∧ ⋯ ∧ x_w) ∨ (x_{w+1} ∧ ⋯ ∧ x_{2w}) ∨ ⋯ ∨ (x_{(s−1)w+1} ∧ ⋯ ∧ x_{sw}).

For a given w, we choose s to be the largest integer such that (1 − 2^{-w})^s ≥ 1/2, so Tribes_{w,s} will be as unbiased as possible. We then denote by Tribes_n, defined only for such pairs of w, s, the (essentially) unbiased Tribes function on n variables. Due to Proposition 4.12 in [O'D14], Pr[Tribes_n = 1] = 1/2 ± O(log n / n) and I[Tribes_n] = Θ(log n).
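A minimal sketch (ours; the tie-breaking rule for s is the one assumed above) that builds Tribes_{w,s}, checks how close to unbiased it is, and computes its total influence exactly by enumeration:

    from itertools import product

    def tribes(bits, w, s):
        # bits: tuple of w*s values in {0,1}; OR over s tribes of the AND of w bits
        return int(any(all(bits[t * w:(t + 1) * w]) for t in range(s)))

    def choose_s(w):
        # largest s with (1 - 2^-w)^s >= 1/2, so Tribes_{w,s} is as unbiased as possible
        s = 1
        while (1 - 2.0 ** (-w)) ** (s + 1) >= 0.5:
            s += 1
        return s

    w = 3
    s = choose_s(w)                      # here s = 5, so n = 15
    n = w * s
    inputs = list(product([0, 1], repeat=n))
    prob_true = sum(tribes(x, w, s) for x in inputs) / len(inputs)
    # total influence = sum over i of Pr[flipping bit i changes the output]
    infl = sum(sum(1 for x in inputs
                   if tribes(x, w, s) != tribes(x[:i] + (1 - x[i],) + x[i + 1:], w, s))
               for i in range(n)) / len(inputs)
    print(prob_true, infl)   # bias close to 1/2, influence of order log n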

Finally, we present the definition of regular DNFs:

Definition.

Let f be computed by a DNF formula. We say f is a regular DNF if there exists some w s.t. the number of variables in each clause is Θ(w), and the number of clauses is 2^{Θ(w)} (informally: all clauses are more or less of the same width, and the number of clauses is exponential in that width).

Expanding on the mentioned notions of the Shannon entropy H and the min-entropy H_∞, the Rényi entropy of order α of a distribution D (we discuss only α > 0, α ≠ 1) is defined as follows:

H_α(D) = (1/(1 − α)) · log₂(Σ_i p_i^α),

where the p_i are the probabilities of the possible instances in D - in our case, these are the squared Fourier coefficients. It can be seen that as α → 1, the Rényi entropy converges to the Shannon entropy, and therefore we denote H_1 = H. Furthermore, as α → ∞, the Rényi entropy converges to the min-entropy H_∞. It is known that for a fixed distribution, the function α ↦ H_α is non-increasing in α.
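For illustration, a short sketch (ours; the example distribution is the squared spectrum of the OR of three bits) that evaluates H_α for several values of α and compares it with the Shannon and min-entropies:

    from math import log2

    def renyi_entropy(p, alpha):
        # H_alpha(p) = 1/(1-alpha) * log2( sum_i p_i^alpha ), for alpha != 1
        return log2(sum(q ** alpha for q in p)) / (1 - alpha)

    # squared Fourier coefficients of OR on 3 bits (they sum to 1)
    p = [0.5625] + [0.0625] * 7
    shannon = sum(q * log2(1 / q) for q in p)
    minent  = log2(1 / max(p))
    for a in (0.5, 0.99, 2, 10, 100):
        print(a, renyi_entropy(p, a))     # non-increasing in alpha
    print("H_1 ~", shannon, "  H_inf =", minent)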

2.2 Edge Isoperimetric Inequality

In its simplest form, the Edge Isoperimetric Inequality lower-bounds the total influence of any Boolean function in terms of how balanced the function is. We rely on the following version; see e.g. Theorem 2.39 in [O'D14]:

Fact 11.

Let f : {-1,1}^n → {-1,1} be a Boolean function. Denote α = min(Pr[f = 1], Pr[f = −1]); then I[f] ≥ 2α · log₂(1/α).
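Assuming the form of Theorem 2.39 stated above, the inequality can be checked exhaustively for small n (our own sketch):

    from itertools import product
    from math import log2

    n = 3
    pts = list(product([-1, 1], repeat=n))

    def total_influence(table):
        f = dict(zip(pts, table))
        return sum(sum(1 for x in pts if f[x] != f[x[:i] + (-x[i],) + x[i + 1:]])
                   for i in range(n)) / len(pts)

    worst = float("inf")
    for table in product([-1, 1], repeat=len(pts)):    # all 256 functions on 3 bits
        alpha = min(table.count(1), table.count(-1)) / len(pts)
        if alpha == 0:
            continue                                   # constant functions: both sides are 0
        slack = total_influence(table) - 2 * alpha * log2(1 / alpha)
        worst = min(worst, slack)
    print(worst)   # stays >= 0, with equality for subcube indicators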

Keeping the notation α = min(Pr[f = 1], Pr[f = −1]), it is easy to see that α can be "replaced" by the variance, losing only a constant multiplicative factor: indeed, Var[f] = 4α(1 − α), and since α ≤ 1/2 this gives Var[f]/4 ≤ α ≤ Var[f]/2. Since log₂(1/α) ≥ log₂(1/Var[f]), we have:

Lemma 12.

Let f be a Boolean function; then I[f] ≥ (Var[f]/2) · log₂(1/Var[f]).

Proof.

I[f] ≥ 2α · log₂(1/α) ≥ (Var[f]/2) · log₂(1/Var[f]), where the second inequality is due to Var[f]/4 ≤ α ≤ Var[f]/2. ∎

Lemma 13.

Let f be a Boolean function with I[f] < 1; then Var[f] ≤ 2 · I[f] / log₂(1/I[f]).

Proof.

The requirement that I[f] < 1 is necessary, or else the term log₂(1/I[f]) is non-positive. We derive the new inequality from the proof of Lemma 12: since Var[f] ≤ I[f], we have log₂(1/Var[f]) ≥ log₂(1/I[f]), and therefore I[f] ≥ (Var[f]/2) · log₂(1/Var[f]) ≥ (Var[f]/2) · log₂(1/I[f]), which rearranges to the claim. ∎

2.3 Tensorization of FEI

Let f : {-1,1}^{n_1} → {-1,1}, g : {-1,1}^{n_2} → {-1,1} be two Boolean functions. Define f ⊗ g to be their tensor product: (f ⊗ g)(x, y) = f(x) · g(y), and note that its Fourier coefficient on S ∪ T equals f̂(S) · ĝ(T) for S ⊆ [n_1], T ⊆ [n_2].

In [Kal07] it has been noted that FEI tensorizes in the following sense:

Fact 14.

For Boolean functions f and g:

  • H[f ⊗ g] = H[f] + H[g]

  • I[f ⊗ g] = I[f] + I[g]

  • n(f ⊗ g) = n(f) + n(g), where n(·) denotes the number of variables of the function.
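A brute-force sanity check of Fact 14 on a toy pair of functions (our own sketch; the example functions are arbitrary):

    from itertools import product
    from math import log2, prod

    def H_and_I(f, n):
        pts = list(product([-1, 1], repeat=n))
        H = I = 0.0
        for S in product([0, 1], repeat=n):
            c = sum(f(x) * prod(x[i] for i in range(n) if S[i]) for x in pts) / len(pts)
            if c * c > 1e-12:
                H += c * c * log2(1 / (c * c))
                I += sum(S) * c * c
        return H, I

    f = lambda x: 1 if sum(x) > 0 else -1                   # Maj3
    g = lambda x: 1 if (x[0] == 1 and x[1] == 1) else -1    # AND2
    fg = lambda z: f(z[:3]) * g(z[3:])                      # tensor product on 5 variables

    Hf, If = H_and_I(f, 3)
    Hg, Ig = H_and_I(g, 2)
    Hfg, Ifg = H_and_I(fg, 5)
    print(abs(Hfg - (Hf + Hg)) < 1e-9, abs(Ifg - (If + Ig)) < 1e-9)   # True True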

We call f^{⊗k} = f ⊗ f ⊗ ⋯ ⊗ f (k times) the self-tensorization of f. As stated in the following lemma, the tensorization technique allows us to deduce FEI for a class closed under self-tensorization (i.e., for all f in the class and all k, f^{⊗k} is also in the class) by proving FEI for that class with a sub-linear additive term. Obviously, the class of all Boolean functions is closed under self-tensorization.

Lemma 15.

Suppose we have a class 𝒞 of Boolean functions that is closed under self-tensorization, a constant C > 0 and some function g(n) = o(n). If H[f] ≤ C · I[f] + g(n) for all f ∈ 𝒞, then H[f] ≤ C · I[f] for all f ∈ 𝒞.

Proof.

Let f ∈ 𝒞 be a Boolean function on n variables. Define F_k = f^{⊗k}, a Boolean function on k·n variables. F_k ∈ 𝒞 since 𝒞 is closed under self-tensorization, and therefore we have H[F_k] ≤ C · I[F_k] + g(k·n).

By Fact 14: H[F_k] = k · H[f] and I[F_k] = k · I[f], so k · H[f] ≤ C · k · I[f] + g(k·n).

Dividing by k, we get: H[f] ≤ C · I[f] + g(k·n)/k.

Fixing n and taking k to infinity, we get g(k·n)/k = n · g(k·n)/(k·n) → 0, and therefore H[f] ≤ C · I[f]. ∎

Additionally, we note that the min-entropy tensorizes as well: for f and g as stated above, H_∞[f ⊗ g] = H_∞[f] + H_∞[g], so a similar proof will suffice for an analogous result.

Lemma 16.

Suppose we have a class 𝒞 of Boolean functions that is closed under self-tensorization, a constant C > 0 and some function g(n) = o(n). If H_∞[f] ≤ C · I[f] + g(n) for all f ∈ 𝒞, then H_∞[f] ≤ C · I[f] for all f ∈ 𝒞.

3 FEI for Low Influence Functions

In this section we prove FEI for the class of functions with exponentially low influence (in n), and then show that improving this will imply Conjecture 1. To state this formally, we introduce some notation and consider the following classes of functions:

  • The class of Boolean functions on n variables, and the class of all Boolean functions (on any number of variables).

  • The class of functions with exponentially-low influence. For every constant c > 0, define ELI_c = {f : I[f] ≤ 2^{-cn}}.

  • The class of functions with "almost" exponentially-low influence. For every function s such that s(n) = o(n), define AELI_s = {f : I[f] ≤ 2^{-s(n)}}.

  • The class of functions with influence at least 1, {f : I[f] ≥ 1}.

Formally, we show that for any constant c > 0, FEI holds for the class ELI_c with constant 4(c+1)/c. We then show that improving on this result, by proving FEI for any class AELI_s, will actually imply FEI for the class of functions with influence at least 1. As a simple corollary of Theorem 5, which we will later prove, this will imply FEI for all Boolean functions, i.e. Conjecture 1.

3.1 Proving FEI for ELI

We restate and prove Theorem 2 as follows:

Theorem 17.

For all f ∈ ELI_c, H[f] ≤ 4 · ((c+1)/c) · I[f].

Proof.

Let f ∈ ELI_c, and denote ε = 1 − f̂(∅)² = Var[f]. We use the concentration method presented by [CKLS16] to bound H[f]. We partition the Fourier coefficients into the family F_0 = {∅} and its complement F_1 = 2^{[n]} \ {∅}. These families have Fourier weight of 1 − ε and ε respectively. By a known formula of entropy partition, we have:

H[f] = h(ε) + (1 − ε) · H(F_0) + ε · H(F_1),

where the entropies of the families are of the adequately normalized distributions and h is the binary entropy function. Note that H(F_0) = 0, since F_0 contains only one element. Also note that H(F_1) ≤ n, as F_1 contains fewer than 2^n elements. Therefore we have:

(3.1) H[f] ≤ h(ε) + ε · n

If ε > 1/2, then by our assumption it follows that 2^{-cn} ≥ I[f] ≥ ε > 1/2, so n < 1/c. From this and from Weak FEI (Lemma 1) we get H[f] ≤ I[f] · log₂(e·n/I[f]) ≤ log₂(2e/c) · I[f], so we are done. Otherwise, we can assume ε ≤ 1/2. To bound the second term of inequality (3.1), ε · n, we note that for ε ≤ I[f] ≤ 2^{-cn}, we have log₂(1/ε) ≥ cn, so n ≤ log₂(1/ε)/c. Acknowledging the fact that ε = Var[f] and applying Lemma 12, we have:

(3.2) ε · n ≤ (1/c) · ε · log₂(1/ε) ≤ (2/c) · I[f]

We can bound the first term of inequality (3.1) by applying Lemma 12 and the fact that ε = Var[f] ≤ I[f]:

(3.3) h(ε) ≤ 4 · I[f]

Inserting (3.2) and (3.3) into (3.1) we obtain the wanted result:

H[f] ≤ 4 · I[f] + (2/c) · I[f] ≤ 4 · ((c+1)/c) · I[f]. ∎

There is also an alternative proof of Theorem 17 using the protocol method of [WWW14]. Intuitively, consider the following trivial protocol: if the sampled set S is non-empty, send n bits, where the i'th bit is set to 1 if i ∈ S and to 0 otherwise. If ∅ is sampled, the protocol sends the empty string. By Lemma 13 we have Var[f] ≤ 2 · I[f]/log₂(1/I[f]) ≤ 2 · I[f]/(cn), so the average cost of this protocol will be n · Pr[S ≠ ∅] = n · Var[f] ≤ (2/c) · I[f], which also gives us FEI for ELI_c with a constant of the same order.
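To illustrate the protocol-based argument, here is a sketch (ours; the encoding convention and the example function are assumptions) that computes the expected price n · Pr[S ≠ ∅] = n · Var[f] of the trivial protocol for a very biased, low-influence function and compares it with H[f] and I[f]:

    from itertools import product
    from math import log2, prod

    def spectrum(f, n):
        pts = list(product([-1, 1], repeat=n))
        return {S: sum(f(x) * prod(x[i] for i in range(n) if S[i]) for x in pts) / len(pts)
                for S in product([0, 1], repeat=n)}

    # a very biased toy example: the AND of all n bits (its influence decays like n/2^n)
    n = 4
    f = lambda x: 1 if all(b == 1 for b in x) else -1
    weights = {S: c * c for S, c in spectrum(f, n).items() if c * c > 1e-12}

    H = sum(w * log2(1 / w) for w in weights.values())
    I = sum(sum(S) * w for S, w in weights.items())
    var = 1 - weights.get((0,) * n, 0.0)
    price = n * var              # expected cost of the trivial n-bit protocol
    print(H, price, I)           # both the price and I shrink rapidly as n grows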

3.2 Proving FEI for AELI Implies FEI Completely

Lemma 18.

Let s be such that s(n) = o(n). Suppose that FEI holds for some class AELI_s with universal constant C. Then FEI holds for the class of functions with influence at least 1, with a constant depending only on C.

Proof.

We follow exactly the same construction appearing in appendix E of [WWW14]. Let f be a Boolean function on n variables with I[f] ≥ 1. For now we assume f is balanced (i.e. that E[f] = 0), and deal with biased functions later.

Consider the function F on n + m variables defined as

(3.4) F(x, y) = f(x) if y_1 = ⋯ = y_m = 1, and F(x, y) = 1 otherwise.

F is extremely biased, as it can get the value −1 only when y_1 = ⋯ = y_m = 1. By direct calculation, [WWW14] show that I[F] = 2^{-m} · (I[f] + m), and also that H[F] = 2^{-m} · (H[f] + Θ(m)).

We would like to argue that F ∈ AELI_s, so we need to pick m large enough accordingly, so that the inequality I[F] ≤ 2^{-s(n+m)} will hold. We also want to use self-tensorization, so the choice of m must behave well when f is replaced by f^{⊗k}. Since s(n) = o(n), it suffices to pick m sufficiently large as a function of n; for such m both requirements are satisfied, using the fact that I[f] ≤ n and that s is monotone increasing - we can assume w.l.o.g. that s is a monotone function, or otherwise redefine it as s'(n) = max_{k≤n} s(k).

So by the fact that F ∈ AELI_s and assuming FEI for AELI_s with universal constant C, we obtain H[F] ≤ C · I[F], which by the calculations above translates into a bound on H[f] of the form H[f] ≤ O(C) · I[f] plus an additive term that becomes sub-linear after self-tensorization. The subclass of balanced functions is closed under self-tensorization, so we can use the tensorization technique (Lemma 15) to get H[f] ≤ C' · I[f] for some constant C' depending only on C, hereby completing the proof for balanced functions.

If f is biased, we can define f'(x, x_{n+1}) = f(x) · x_{n+1}. f' is balanced, its spectral distribution is a shifted copy of that of f, so H[f'] = H[f], and I[f'] = I[f] + 1. Therefore we have:

H[f] = H[f'] ≤ C' · I[f'] = C' · (I[f] + 1) ≤ 2C' · I[f],

where the last inequality is the only place where we use the fact that I[f] ≥ 1 (apart from the fact that the class of balanced functions is closed under self-tensorization). ∎

We would like to extend the lemma from the class of functions with influence at least 1 to all Boolean functions. If we examine for a moment the class of functions with I[f] < 1, it is easy to see from Weak FEI that for all such f, H[f] ≤ log₂(e·n) = O(log n). Therefore, for any s'(n) that is both o(n) and ω(log n) (say s'(n) = √n), the class ALHE_{s'} is contained in the class of functions with influence at least 1, so FEI holds for ALHE_{s'}. By Lemma 18 and Theorem 5, to be proven in the next section, we can now deduce Theorem 3 as a simple corollary:

Theorem 19.

Let s be such that s(n) = o(n). Suppose that FEI holds for some class AELI_s with universal constant C; then FEI holds for the class of all Boolean functions with a constant depending only on C.

It is natural to ask whether this hardness result extends to FMEI, in the sense that proving FMEI for AELI_s will imply FMEI for all Boolean functions. The proof fails because the min-entropy of the original function vanishes, as F̂(∅) becomes the largest coefficient of the constructed function F. Furthermore, FMEI is easy for functions with I[f] ≤ 1/2, and therefore also for functions respecting the stronger condition I[f] ≤ 2^{-s(n)}.

Lemma 20.

Let f be such that I[f] ≤ 1/2. Then H_∞[f] ≤ 3 · I[f].

Proof.

H_∞[f] ≤ log₂(1/f̂(∅)²) = log₂(1/(1 − Var[f])) ≤ 2·log₂(e) · Var[f] ≤ 2·log₂(e) · I[f] ≤ 3 · I[f], where the second inequality makes use of the fact that ln(1/(1 − x)) ≤ 2x for x ≤ 1/2, and the next inequality is due to the fact that Var[f] ≤ I[f]. ∎

4 FEI for Functions With Entropy Linear in n

In the previous section, we proved FEI for functions with exponentially low influence. We have also matched this with a “hardness result”, showing that proving FEI for a class of functions with slightly higher influence will imply FEI for all Boolean functions.

These results raise the question of the other non-trivial extremal case - proving FEI for functions with high entropy. As H[f] ≤ n for every Boolean function on n variables, a natural interpretation of large entropy could be H[f] ≥ cn for some constant c > 0. In this section, we prove FEI for the class of functions with entropy linear in n, and show that improving this to any s(n) = o(n) will prove FEI for all Boolean functions. We consider the following classes of functions:

  • The class of functions with linearly-high entropy. For every constant c ∈ (0, 1], define LHE_c = {f : H[f] ≥ cn}.

  • The class of functions with "almost" linearly-high entropy. For every function s such that s(n) = o(n), define ALHE_s = {f : H[f] ≥ s(n)}.

Formally, we show that for any constant c, FEI holds for the class LHE_c with constant 1 + c/h^{-1}(c²). We then show that improving on this result by proving FEI for any class ALHE_s will imply FEI for all Boolean functions, i.e. Conjecture 1.

4.1 Proving FEI for LHE

We restate and prove Theorem 4 as follows:

Theorem 21.

Let 0 < c ≤ 1 be a constant. For all f ∈ LHE_c, H[f] ≤ (1 + c/h^{-1}(c²)) · I[f], where h^{-1} is the inverse of the binary entropy function.

Proof.

We use the concentration method presented by [CKLS16], where our partition of the coefficients is of the form F_low = {S : |S| ≤ kn} and F_high = {S : |S| > kn}, for a parameter k ≤ 1/2 to be chosen later. Denoting by ε = W^{>kn}[f] the Fourier weight above level kn, we obviously have:

(4.1) H[f] = (1 − ε) · H(F_low) + ε · H(F_high) + h(ε)

Recall that H(F_low) is the entropy of the normalized-to-1 distribution of the squared coefficients of sets in F_low. Intuitively, we upper-bound it by the fact that there are not too many subsets of size kn or less. For a constant k ≤ 1/2 we can approximate the volume of the Hamming ball of radius kn:

(4.2) Σ_{i ≤ kn} C(n, i) ≤ 2^{h(k)·n}

Therefore, H(F_low) ≤ log₂|F_low| ≤ h(k) · n. For the second and third terms of equation (4.1), we trivially have: H(F_high) ≤ n and h(ε) ≤ 1.
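The volume bound (4.2) can be checked numerically (our own sketch):

    from math import comb, log2

    def h(p):
        return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

    # check that sum_{i <= kn} C(n, i) <= 2^{h(k) n} for k <= 1/2
    for n in (20, 40, 80):
        for k in (0.1, 0.25, 0.5):
            ball = sum(comb(n, i) for i in range(int(k * n) + 1))
            print(n, k, log2(ball), h(k) * n)   # the left value never exceeds the right one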

So combining all of these, we obtain:

H[f] ≤ (1 − ε) · h(k) · n + ε · n + 1 ≤ h(k) · n + ε · n + 1.

We start by focusing on functions with H[f] = cn, and later extend our proof to all functions in LHE_c, i.e. with H[f] ≥ cn. Observing that I[f] ≥ kn · ε, and therefore ε · n ≤ I[f]/k, we obtain:

(4.3) H[f] ≤ h(k) · n + I[f]/k + 1

We remove the additive 1 for simplicity, as it is negligible compared to the other terms (to formalize this, the 1 can be absorbed into the other terms at a negligible cost). Now, dividing equation (4.3) by n and rearranging it:

and finally,