# Low degree almost Boolean functions are sparse juntas

Nisan and Szegedy showed that low degree Boolean functions are juntas. Kindler and Safra showed that low degree functions which are almost Boolean are close to juntas. Their result holds with respect to μ_p for every constant p. When p is allowed to be very small, new phenomena emerge. For example, the function y_1 + ... + y_{ϵ/p} (where y_i ∈ {0,1}) is close to Boolean but not close to a junta. We show that low degree functions which are almost Boolean are close to a new class of functions which we call *sparse juntas*. Roughly speaking, these are functions which on a random input look like juntas, in the sense that only a finite number of their monomials are non-zero. This extends a result of the second author for the degree 1 case. As applications of our result, we show that low degree almost Boolean functions must be very biased, and satisfy a large deviation bound. An interesting aspect of our proof is that it relies on a local-to-global agreement theorem. We cover the p-biased hypercube by many smaller dimensional copies of the uniform hypercube, and approximate our function locally via the Kindler–Safra theorem for constant p. We then stitch the local approximations together into one global function that is a sparse junta.

## 1 Introduction

We study the structure of “simple” Boolean functions in the -biased hypercube, for all values of , and in particular when . We introduce a new class of functions that we call sparse juntas which generalize the standard juntas. Our main result is that every Boolean function that has at most of its mass above degree , is close to a degree sparse junta. Throughout the paper we say that is -close to if .

Nisan and Szegedy showed that Boolean functions that are exactly low degree must be juntas [NS94], namely functions that depend on a constant number of coordinates. Classical theorems in the analysis of Boolean functions describe the structure of Boolean functions that are close to being “simple” functions, where closeness is measured with respect to the uniform measure. Notions of “simple” include functions that are noise-stable, or nearly low degree, or have low total-influence [Fri98, FKN02, Bou02, KS02]. These results invariably prove that the function depends on a few coordinates (a dictator or a junta). For example, Friedgut, Kalai and Naor [FKN02] prove that a function whose mass is almost all on Fourier levels must be a function that depends on at most one variable (dictator, anti-dictator or constant). Bourgain [Bou02] and Kindler and Safra [KS02] studied Boolean functions with small mass on the Fourier levels above . Kindler and Safra proved that such functions are close to juntas.

###### Theorem 1.1 (Kindler–Safra [KS02, Kin03]).

Fix . For every there exists such that for every , if satisfies then there exists a degree  function (which necessarily depends on coordinates) satisfying . In particular, when and , .

The term junta was actually coined in an earlier paper of Friedgut who proved that any Boolean function with small total influence is close to a junta [Fri98].

Theorem 1.1 is a generalization to degree of the earlier theorem of Friedgut, Kalai and Naor [FKN02] mentioned above, which states that functions which are close to degree are close to dictators or constants. An alternative way of saying this is that given a function with only fraction of its -mass outside levels , if for all , then it must be -close to a function such that for all (it is easy to verify that such Boolean functions are exactly the dictators, anti-dictators or the constant functions).

It is natural to wonder if the condition that the range is Boolean, namely for all , can be replaced by for all , for any arbitrary finite set . What can be said about such a function that has of its mass outside levels ? The answer becomes more complicated as the size of grows, and the function need not depend on just one variable, as can be seen by the function that takes only three distinct values but depends on more than one variable. Nevertheless, we show that a similarly flavored statement is true: if the function takes values in a finite set and has only mass outside levels , then there is a finite set such that is close to a function whose Fourier coefficients belong to the set .

###### Theorem 1.2 (A-valued functions with low degree).

Let be a finite set, let , and let be a function that has at most fraction of its -mass outside levels , that is, . Then is -close to a function of degree with Fourier coefficients in a finite set .

This theorem is not difficult to prove given Theorem 1.1, but it turns out to be quite useful. In fact, generalizing from Boolean to A-valued allows us to give a new proof of Theorem 1.1 that proceeds by induction on the degree (see Section 8).

Having warmed up, we turn to the main focus of this paper, which is understanding the structure of Boolean (or -valued) functions that are nearly degree in the -biased hypercube. The -biased hypercube is the set equipped with the measure (given by ). We think of as being possibly very small, for example .

The theorem of Kindler and Safra [KS02] continues to hold under the measure, but the quality of the approximation deteriorates with . Indeed, the class of junta functions does not seem to be the correct class of functions for approximating low degree functions that are -almost Boolean. This is demonstrated by the following simple example: Let be a degree function, and let be the Boolean function closest to . If then is -close to , and yet it depends on many coordinates. It turns out that this example is canonical: in previous work [Fil16], the second named author has proved that all functions that are nearly degree one, in , essentially look like this one.

The function considered above is very biased: with probability roughly , it is equal to zero. More generally, the result of [Fil16] implies that if is a degree 1 function which is -close to Boolean then is -close to a constant function. Still, one can hope for an even better approximation, with error of , and indeed this is possible: is -close to a linear function similar to the function considered above (or its negation).
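The example from the abstract, the sum y_1 + ... + y_{ϵ/p} of p-biased bits, can be checked numerically. The sketch below is only an illustration: the helper names and the concrete values of `p` and `eps` are our own choices, not taken from the paper. It computes exactly the expected squared distance of the sum to {0,1} and the probability that the sum is zero.

```python
def binomial_pmf(m, p):
    """Return [Pr[Bin(m, p) = k] for k = 0..m], computed iteratively."""
    pmf = [(1.0 - p) ** m]
    for k in range(m):
        pmf.append(pmf[-1] * (m - k) / (k + 1) * p / (1.0 - p))
    return pmf


def sum_of_bits_stats(m, p):
    """Statistics of f = y_1 + ... + y_m when each y_i is 1 with probability p.

    Returns (E[dist(f, {0,1})^2], Pr[f = 0]).
    """
    pmf = binomial_pmf(m, p)
    # For an integer value k >= 2 the nearest Boolean value is 1.
    dist_sq = sum(prob * (k - 1) ** 2 for k, prob in enumerate(pmf) if k >= 2)
    return dist_sq, pmf[0]


# Illustrative parameters (assumptions, not from the paper).
p, eps = 1e-3, 0.1
m = round(eps / p)  # number of variables; f depends on all of them
dist_sq, prob_zero = sum_of_bits_stats(m, p)
print(m, dist_sq, prob_zero)
```

With these parameters the function depends on 100 coordinates, yet its squared distance from Boolean is below ϵ² and it vanishes with probability above 1 − ϵ, which matches the qualitative picture described in the text.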

Generalizing this theorem to higher degrees requires coming up with a new syntactic class of simple functions that are the good approximators for low degree Boolean functions. As before, constants give an -approximation for some , a fact which comes as a consequence of our main theorem (see Lemma 1.7), but our goal here will be to find an even better approximation, on the order of . The first step is to move away from the Fourier basis whose basis functions depend on and are thus non-canonical. Instead, we will rely on the -expansion of :

###### Definition 1.3 (y-expansion).

The -expansion of a function is the unique expansion where is a basis of functions given by and for , we define .

The -expansion is the standard expansion of as a multilinear polynomial in variables instead of variables. We stress that this is not the Fourier expansion of (under ), which is its expansion as a multilinear polynomial in input variables. The -expansion is better suited for working with for small . The result mentioned above [Fil16] states that any degree function that is close to being Boolean in the -biased hypercube can be approximated by a function whose -expansion coefficients are all in .
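To make the definition concrete, here is a minimal sketch that computes the expansion of a function on {0,1}^n as a multilinear polynomial in 0/1 variables, using Möbius inversion over subsets. The function names are hypothetical and the brute-force enumeration is only meant for tiny n.

```python
from itertools import combinations


def subsets(items):
    """Yield all subsets of `items` as frozensets."""
    items = list(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def y_expansion(f, n):
    """Return {S: c_S} with f(y) = sum_S c_S * prod_{i in S} y_i on {0,1}^n.

    Uses Moebius inversion: c_S = sum_{T subset of S} (-1)^{|S|-|T|} f(1_T).
    """
    coeffs = {}
    for S in subsets(range(n)):
        total = 0
        for T in subsets(S):
            point = tuple(1 if i in T else 0 for i in range(n))
            total += (-1) ** (len(S) - len(T)) * f(point)
        if total != 0:
            coeffs[S] = total
    return coeffs


def evaluate(coeffs, y):
    """Evaluate the expansion at a 0/1 point y: sum the coefficients of monomials that equal 1."""
    return sum(c for S, c in coeffs.items() if all(y[i] == 1 for i in S))


xor = lambda y: y[0] ^ y[1]
xor_coeffs = y_expansion(xor, 2)  # XOR = y_0 + y_1 - 2*y_0*y_1
```

Note that the coefficients of XOR lie in {1, −2} regardless of any bias parameter, which is the sense in which this expansion is canonical.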

This motivates the following generalization:

###### Definition 1.4 (quantized polynomial).

Given a finite set , a function is said to be an -quantized polynomial of degree if all coefficients of the -expansion of belong to .

As part of our main result, stated below as Theorem 1.5, we show that for all , a low degree function that is -close under to being -valued, is close to an -quantized polynomial for some finite set . This can be nicely rephrased as follows: For all and sets , there exists such that for all :

If a function has degree and is -close under to an -valued function, then its -expansion is -close to being -quantized.

Observe that the -expansion is important for making such a statement. It could not be made for the Fourier expansion since the coefficients would have to depend on .

This generalizes Theorem 1.2 above since in the uniform setting a quantized polynomial that has bounded norm must be a junta. Indeed, substituting shows that if then (where is the coefficient of in the -expansion of ), and so Parseval’s identity shows that there is a constant number of non-zero with . Removing them can only increase the norm by a constant, and so applying the same reasoning inductively shows that is a junta.

Our main theorem gives a somewhat stronger syntactic characterization, showing that -valued functions with nearly low degree are close to being sparse juntas. These are quantized polynomials that have an additional structural property which we call bounded branching factor. The branching factor of a quantized polynomial is best explained by considering the hypergraph whose edges correspond to all non-zero coefficients in the -expansion of . This hypergraph has branching factor if for all subsets and integers , there are at most hyperedges in of cardinality containing .

While this is the syntactic definition, the meaning of having small branching factor is that the function is “empirically” a junta, because a typical input only leaves a constant number of monomials non-zero. This is why we call these functions sparse juntas. Finally, we can state our main theorem:

###### Theorem 1.5 (Main).

For every positive integer and finite set , there exists a finite set such that the following holds. For every and of degree there exists a function of degree that satisfies the following properties for :

1. .

2. is an -sparse junta, that is, it is an -quantized polynomial of degree with branching factor .

3. If then is the sum of coefficients of with probability .

We also show a converse to the above theorem (see Lemma 6.1) in the sense that the second and third properties are a complete characterization of degree  functions that are -close to (i.e., ).

As applications of our theorem we show a large deviation bound for degree  functions close to a finite set :

###### Lemma 1.6 (Large deviation bound).

Fix an integer  and a finite set . Suppose that is a degree  function satisfying with respect to for some . For large ,

$$\Pr[|f| \ge t] \le \exp\bigl(-\Omega(t^{1/d}) + O(\varepsilon/t^2)\bigr).$$

We also prove that such functions must be very biased:

###### Lemma 1.7 (Sparse juntas are very biased).

Fix a constant and a finite set . There exist constants such that for all and , the following holds.

Suppose that is a degree  function with branching factor such that . Then there exists such that .

Combining this with our main theorem implies that if an -valued function is close to degree , it must be very biased.

#### A local-to-global aspect of the proof

Let us highlight an interesting aspect of the proof of our main theorem. Previous works analyzing the structure of Boolean functions rely on hypercontractivity. When the hypercontractive behavior breaks down, and this is responsible for the deterioration of the approximation in Theorem 1.1. Our proof doesn’t go down this path, and instead proceeds by breaking up the -biased hypercube into many small sub-cubes that are obtained by setting many variables to  (using the convention for the inputs). The measure on these sub-cubes becomes the uniform measure, and so we are able to approximate locally on them using the classical Kindler-Safra theorem, Theorem 1.2. This gives us a separate junta function on each sub-cube . Moving from local to global, we rely on a recent so-called agreement theorem proven by the authors [DFH17] that gives us a single global function that agrees with most of the local approximations (after ensuring that the local pieces typically agree with each other).

To complete the proof of our main theorem, we use a crucial feature of the agreement theorem proven in [DFH17], namely that agreement is reached by consensus. This means that each coefficient of the -expansion of is chosen by picking the most “popular” value appearing in all relevant . It turns out that this feature guarantees that has branching factor .

#### A new proof of the Kindler–Safra theorem

Our new proof of Theorem 1.1 demonstrates the power of our view of the theorem as stating that if a low degree function is close to being quantized, then its Fourier expansion is close to being quantized. Our inductive proof also makes essential use of the generalization to -valued, rather than just Boolean, functions: even when starting with a Boolean function, -valued functions arise in the proof.

Given a function of degree  which is close to a finite set , we use the theorem for degree (assumed to hold by induction) together with the -valued FKN theorem to show that the degree  and degree  coefficients are almost quantized (this is the heart of the proof). This allows us to replace the two highest levels of  with a quantized polynomial, which must be a junta. Removing these two levels altogether, we get an -valued function for some depending on . Applying the theorem for degree completes the proof.

### Related work

Understanding the structure of Boolean functions that are simple according to some measure such as being nearly low degree is a basic complexity goal. Similar structure theorems such as the KKL theorem [KKL88], Friedgut’s junta theorem [Fri98], and the FKN theorem [FKN02], have found numerous applications. The analogous questions for the p-biased hypercube are understood only to some extent, yet the questions are natural and play an important role in several areas in combinatorics and the theory of computation.

• A major motivation for studying Boolean functions under the measure comes from trying to understand the sharp threshold behavior of graph properties, and of satisfiability of random -CNF formulae.

A large area of combinatorics is concerned with understanding properties of graphs selected from the random graph model of Erdős and Rényi, . A graph property is described via a Boolean function whose input variables describe the edges of a graph and the function is iff the property is satisfied. Selecting a graph at random from the distribution is equivalent to selecting a random input to with distribution . The density of this function is the probability that the property holds, and so its fine behavior as increases from to is the business of sharp threshold theorems. For many of the most interesting graph properties, such as connectivity and appearance of a triangle, a phase transition occurs for very small values of (corresponding to ). Friedgut and Kalai [FK96] used the theorem of Kahn, Kalai and Linial [KKL88] to prove that every monotone graph property has a narrow threshold.

A famous theorem of Friedgut [Fri99] characterizes which graph and hypergraph properties have sharp threshold. As an application, Friedgut establishes the existence of a sharp threshold for the satisfiability of random -CNF formulae. This is done through analyzing the structure of -biased Boolean functions with low total influence, which corresponds to not having a sharp threshold. The same question was also studied by Bourgain [Bou99] and subsequently by Hatami [Hat12], who proved that such functions must be “pseudo-juntas” (see [O’D14, Chapter 10] for a discussion of these results). We recommend the nice recent survey [BK17, Section 3] for a description of some related questions and conjectures.

Our condition of having nearly degree is a strictly stronger condition than having low total influence, and indeed our sparse juntas are in particular pseudo-juntas. Unlike sparse juntas, the pseudo-junta property is not syntactic (it does not define a class of functions, but rather a property of the given function), and it is interesting to understand the relation between pseudo-juntas and sparse juntas.

Friedgut conjectured that every monotone function that has a coarse threshold is approximable by a narrow DNF, which is a function that can be written as . This is quite similar to our class of sparse juntas (in fact, they coincide for degree ), except that our functions are expressed as a sum of monomials rather than their maximum, and thus we must restrict ourselves to functions with bounded branching factor. The assumption of having a coarse threshold is weaker than having nearly degree , yet it is interesting whether our techniques can be applied toward resolution of this conjecture.

• Hardness of approximation: The p-biased hypercube has been used as a gadget for proving hardness of approximation of vertex cover, where the relevant regime is some constant . Other variants of the hypercube have been used or suggested as gadgets for proving inapproximability, including the short code [BGH15], the real code [KM13], and the Grassmann code [KMS17]. In all of these, understanding the structure of Boolean functions with nearly low degree seems crucial. In the Grassmann code, one considers subspaces of small dimension inside a large-dimensional vector space. Some conjectures were made in [DKK16, DKK17] regarding the structure of Boolean functions whose domain is the set of subspaces and that have non-negligible mass on the space of functions that corresponds to having low degree. Thinking of subspaces as subsets of points, this is analogous to the p-biased case, when is very small, on the order of . Toward understanding that question, it is natural to first pursue such a study on the simpler model of the p-biased hypercube for very small , where the analysis is potentially easier since the space is a product space.

• Relatively recent work [KKM17] proves that Reed–Muller codes achieve capacity on the erasure channel, using the Bourgain–Kalai sharp threshold theorem for affine-invariant functions [BourgainKalai]. The regime of this result is only for codes with constant rate, and it seems that extending it to lower rates would require understanding the structure of affine-invariant functions under the -biased measure for small .

### Organization

The rest of the paper is organized as follows. We begin with a few preliminaries in Section 2, which includes the agreement testing results. In Section 3, we define the branching factor and discuss some of its properties. We generalize the classical Kindler–Safra theorem to A-valued functions in Section 4. We then prove the main result of the paper (Theorem 1.5) in Section 5. In Section 6, we prove the converse to our main result. We discuss some applications in Section 7 and give an alternative proof of the classical Kindler–Safra theorem in Section 8.

## 2 Preliminaries

We will need the following definitions:

• We define .

• We define as an element in whose distance from is .

• For a function and a set , the function results from substituting zero to all coordinates outside of .

• For a function , the support of its -expansion naturally corresponds to a hypergraph which we sometimes refer to as the support of .

• For a set , is a distribution over subsets of in which each element of is chosen independently with probability .

• The triangle inequality states that . It implies that

$$\operatorname{dist}(x+y,A)^2 = \min_{a\in A}(x+y-a)^2 \le \min_{a\in A}\bigl[2(x-a)^2+2y^2\bigr] \le 2\operatorname{dist}(x,A)^2+2y^2.$$
• For any satisfying , the distribution is defined to be the distribution on pairs in which each element belongs only to with probability , only to with probability , and to both and with probability .

We will need the following theorems.

###### Theorem 2.1 (Nisan–Szegedy).

If is a degree function, then is a -junta.

###### Theorem 2.2 ((2,p) hypercontractivity).

Let , then for any function of degree at most , we have .

We also need the following result about quantization.

###### Lemma 2.3.

For every finite set and integer there exists a finite set such that the following holds. Suppose that . If all coefficients of the -expansion of belong to , then all coefficients of the -expansion of belong to .

###### Proof.

Let , and let (otherwise ). Since , we have

$$\tilde{g}(A) = \sum_{A_1 \cup A_2 = A} \tilde{g}_1(A_1)\,\tilde{g}_2(A_2).$$

The lemma follows from the fact that the sum contains at most terms. ∎
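The product formula used in the proof can be illustrated with a short sketch. Expansions are represented as dictionaries from monomials (frozensets of indices) to coefficients; this representation and the function name are our own illustrative choices. Since each variable is 0/1-valued, y_i² = y_i, so the product of two monomials is the monomial of the union of their index sets.

```python
from collections import defaultdict


def multiply_y_expansions(g1, g2):
    """Multiply two expansions in 0/1 variables, using y_i * y_i = y_i."""
    product = defaultdict(int)
    for A1, c1 in g1.items():
        for A2, c2 in g2.items():
            product[A1 | A2] += c1 * c2  # monomials merge by taking the union
    return {A: c for A, c in product.items() if c != 0}


def mono(*indices):
    """Hypothetical helper: the monomial prod_{i in indices} y_i."""
    return frozenset(indices)


g1 = {mono(1): 1, mono(2): 1}   # y_1 + y_2
g2 = {mono(1): 1, mono(): -1}   # y_1 - 1
g = multiply_y_expansions(g1, g2)  # (y_1 + y_2)(y_1 - 1) = y_1*y_2 - y_2
```

Each coefficient of the product is a sum over pairs (A_1, A_2) with union A. There are at most 3^{|A|} such pairs, since every element of A lies in A_1 only, in A_2 only, or in both, so for bounded degree the resulting coefficients range over a finite set.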

### 2.1 Agreement testing

Agreement tests are a type of PCP test that captures local-to-global phenomena. Our proof of the main result uses an agreement test recently analyzed by the authors [DFH17], which is an extension of the direct product test to higher dimensions. In the standard direct product test, one is given a ground set and an ensemble of local functions containing a local function for each subset . The direct product test is specified by the distribution over pairs of sets , in which each element is independently added to with probability , to with probability , to with probability , and to neither set with probability . Here, we assume and . The direct product testing results [DG08, IKW12, DS14] state that if the local functions agree most of the time, i.e.,

$$\Pr_{(S_1,S_2)\sim\mu_{p,q}}\bigl[f_{S_1}|_{S_1\cap S_2} = f_{S_2}|_{S_1\cap S_2}\bigr] = 1-\varepsilon,$$

then there must exist a global function that explains most of the local functions:

$$\Pr_{S\sim\mu_p}\bigl[f_S = G|_S\bigr] = 1-O(\varepsilon).$$

In recent work [DFH17], the authors extended this direct product to higher dimensions, wherein the local functions are functions not only on the vertices of but also on hyperedges supported by , i.e., instead of . Furthermore, they demonstrated that the function obtained by majority decoding serves as a good candidate for the global function. Formally:

###### Theorem 2.4 (Agreement theorem via majority decoding).

For every positive integer and alphabet , there exists a constant such that for all and and sufficiently large , the following holds. Let be an ensemble of functions satisfying

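$$\Pr_{S_1,S_2\sim\mu_{p,q}}\bigl[f_{S_1}|_{S_1\cap S_2} \ne f_{S_2}|_{S_1\cap S_2}\bigr] \le \varepsilon.$$

Then the global function defined by plurality decoding (i.e., is the most popular value of over all containing , chosen according to the distribution , i.e., ) satisfies

$$\Pr_{S\sim\mu_p}\bigl[f_S \ne G|_S\bigr] = O_{d,q}(\varepsilon).$$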
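Plurality decoding can be sketched in a few lines. The toy version below is a simplification under our own assumptions: local functions are defined only on vertices (the one-dimensional case), and every set in the ensemble gets equal weight instead of being sampled from the distribution in the theorem. All names are hypothetical.

```python
from collections import Counter


def plurality_decode(local_functions):
    """Build a global function by taking, for each point, the most popular local value.

    local_functions: dict mapping frozenset S -> dict {x: f_S(x) for x in S}.
    """
    votes = {}
    for S, f_S in local_functions.items():
        for x in S:
            votes.setdefault(x, Counter())[f_S[x]] += 1
    return {x: counter.most_common(1)[0][0] for x, counter in votes.items()}


def agreement_fraction(local_functions, G):
    """Fraction of sets S whose local function equals G restricted to S."""
    agree = sum(all(f_S[x] == G[x] for x in S) for S, f_S in local_functions.items())
    return agree / len(local_functions)


truth = {0: "a", 1: "b", 2: "a", 3: "b"}
sets = [{0, 1}, {1, 2}, {2, 3}, {0, 2, 3}, {0, 1, 3}]
ensemble = {frozenset(S): {x: truth[x] for x in S} for S in sets}
ensemble[frozenset({2, 3})][2] = "b"  # corrupt one local function

G = plurality_decode(ensemble)
fraction = agreement_fraction(ensemble, G)
```

Here the decoded function recovers `truth` despite the corrupted local function, and it agrees with four of the five local functions.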

## 3 Branching factor

The analog of juntas for small are quantized functions with branching factor . Let us start by formally defining this concept.

###### Definition 3.1 (branching factor).

For any , a hypergraph over a vertex set is said to have branching factor if for all subsets and integers , there are at most hyperedges in of cardinality containing .

A function is said to have branching factor if the corresponding hypergraph (given by the support of the -expansion of ) has branching factor .
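The definition can be tested by brute force on small hypergraphs. The sketch below assumes the following reading of the definition, in which the symbols stripped from the text above are filled in by us: a hypergraph has branching factor ρ if for every set A and every integer k ≥ 0 there are at most ρ^k hyperedges of cardinality |A| + k containing A. The function name is hypothetical.

```python
from itertools import combinations


def has_branching_factor(edges, rho):
    """Check the (assumed) branching factor condition by enumeration.

    For every set A and integer k >= 0, at most rho**k hyperedges of
    size |A| + k may contain A. Only subsets of hyperedges need checking.
    """
    edges = {frozenset(e) for e in edges}
    counts = {}
    for e in edges:
        for r in range(len(e) + 1):
            for A in combinations(sorted(e), r):
                key = (frozenset(A), len(e) - r)  # (A, k) with k = |e| - |A|
                counts[key] = counts.get(key, 0) + 1
    return all(count <= rho ** k for (_, k), count in counts.items())


# Sixteen disjoint edges versus sixteen edges sharing vertex 0.
matching = [{2 * j, 2 * j + 1} for j in range(16)]
star = [{0, j} for j in range(1, 17)]
```

Both hypergraphs have 16 edges of size 2, so they satisfy the condition for the empty set with ρ = 4. Only the matching has branching factor 4, because in the star all 16 edges extend the single vertex 0 by one element.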

In what sense is a function with branching factor similar to a junta? If is a junta and , then is the sum of a bounded number of coefficients of the -expansion of . Let us call such a coefficient live. In other words, the coefficients left alive by are all for which .

We want a similar property to hold for a function with respect to an input for small . As a first approximation, we need the expected number of live coefficients to be bounded. If then the expected number of live coefficients is

$$\sum_{e=0}^{d} p^e N_e, \quad\text{where } N_e = \bigl|\{|S|=e : \tilde{f}(S) \ne 0\}\bigr|.$$

This sum is bounded if for all . A drawback of this definition is that it is not closed under substitution: if the expected number of live coefficients of is bounded, this doesn’t guarantee the same property for . For example, consider the function

$$f = y_0\,(y_1 + \cdots + y_{1/p^2}).$$

While the expected number of live coefficients is , if we substitute then the expected number of live coefficients jumps to . The recursive nature of the definition of branching factor guarantees that this cannot happen.
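The jump can be computed directly from the formula for the expected number of live coefficients. The sketch below assumes that the substitution in question is y_0 = 1, which is our reading of the elided text; the helper names and the value of `p` are illustrative.

```python
def expected_live(support, p):
    """Expected number of live coefficients: sum over monomials S of p^|S|."""
    return sum(p ** len(S) for S in support)


def substitute_one(support, i):
    """Monomials remaining after setting y_i = 1 (assuming no cancellations occur)."""
    return {frozenset(S) - {i} for S in support}


p = 0.1
m = round(1 / p ** 2)  # 100 monomials y_0 * y_j
support = {frozenset({0, j}) for j in range(1, m + 1)}

before = expected_live(support, p)                     # m * p^2 = 1
after = expected_live(substitute_one(support, 0), p)   # m * p = 1/p
```

Before the substitution there are 1/p² monomials of degree 2, giving expectation 1. After it there are 1/p² monomials of degree 1, giving expectation 1/p.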

Functions with branching factor also have several other desirable properties, such as the large deviation bound proved in Section 7, and Lemma 3.4 below.

In the rest of this section we prove several elementary properties of the branching factor. We start by estimating the branching factor of a sum or product of functions.

###### Lemma 3.2.

Suppose that have degree and branching factor . Then and have branching factor , where the hidden constant depends on .

###### Proof.

The claim about is obvious, so let us consider . Given , we have to show that the number of non-zero coefficients in which extend by elements is .

If then for some such that . Let and , where , and are disjoint and disjoint from , so that . Denote the sizes of by .

There are options for . Given , there are at most non-zero coefficients in extending by elements, and for each such extension, there are options for . Given , there are at most non-zero coefficients in extending by elements. In total, we deduce that for each of the choices of , the number of non-zero coefficients extending by elements is . ∎

As mentioned above, substitution has a bounded effect on the branching factor.

###### Lemma 3.3.

If has branching factor then has branching factor .

###### Proof.

It is enough to prove the lemma when . Let be given. We will show that the number of hyperedges in extending by elements is at most . If then this is clear. Otherwise, for each such hyperedge , either or belongs to . The former case includes all hyperedges of extending by elements, and the latter all hyperedges of extending by elements. Since has branching factor , we can upper bound the number of hyperedges by . ∎

One of the crucial properties of functions with branching factor is that given that a certain -coefficient is live, there is constant probability that no other -coefficient is live.

###### Lemma 3.4 (Uniqueness).

Suppose that has branching factor and degree , where . For every , the probability that and for all in the support of is .

###### Proof.

Let be the hypergraph formed by the support of (that is, is a hyperedge if ). Given that , the probability that for all is exactly equal to . Lemma 3.3 shows that has branching factor , and so it has hyperedges of size . The probability that each such edge survives is , and so the FKG lemma shows that given that , the probability that for all is at least

$$\prod_{e=1}^{d} (1-p^e)^{O(p^{-e})} = \Omega(1).$$

This completes the proof, since . ∎
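One way to verify the lower bound on the product is the following sketch. It assumes p ≤ 1/2 and writes C for the constant hidden in the exponent O(p^{-e}); both are our assumptions about the elided hypotheses. The function x ↦ (1−x)^{1/x} is decreasing on (0,1) and equals 1/4 at x = 1/2, so applying this with x = p^e gives

$$(1-x)^{1/x} \ge \tfrac14 \ \text{ for } 0 < x \le \tfrac12
\quad\Longrightarrow\quad
\prod_{e=1}^{d} (1-p^e)^{C p^{-e}} \;\ge\; \prod_{e=1}^{d} 4^{-C} \;=\; 4^{-Cd} \;=\; \Omega(1),$$

where the final bound depends only on C and d.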

## 4 Generalized Kindler–Safra theorem to A-valued functions

In this section, we prove the following generalization of Kindler–Safra to quantized functions (i.e., A-valued functions for some finite set A). Everything that follows holds with respect to μ_p for fixed p. All hidden constants depend continuously on p.

###### Theorem 4.1.

For all integers and finite sets the following holds. If is a degree  and then is -close to a degree  function .

We start with the following claim, which is an easy consequence of the Nisan–Szegedy theorem (Theorem 2.1).

###### Claim 4.2.

For all integers and finite sets there exists such that the following holds. If has degree then depends on at most coordinates.

###### Proof.

For all , define

$$f_a = \prod_{b \ne a} \frac{f-b}{a-b}.$$

The function has degree at most and is Boolean, and so it depends on at most coordinates. Since

$$f = \sum_{a \in A} a f_a,$$

we see that depends on at most coordinates. ∎
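The interpolation trick in this proof can be checked mechanically: when f takes values in a finite set A, each f_a is the indicator of the event f = a. The sketch below uses exact rational arithmetic; the function name and the sample set are illustrative choices of ours.

```python
from fractions import Fraction


def indicator(value, a, A):
    """Evaluate f_a = prod_{b in A, b != a} (f - b) / (a - b) at f = value."""
    result = Fraction(1)
    for b in A:
        if b != a:
            result *= Fraction(value - b, a - b)
    return result


A = [-1, 0, 2]  # an arbitrary finite set of integers

# For each possible value v of f, list (a, f_a) for all a in A.
table = {v: [(a, indicator(v, a, A)) for a in A] for v in A}
```

For every value v in A, exactly one entry f_a equals 1 (the one with a = v) and the others vanish, so the sum of a·f_a over A recovers v. Each f_a is a polynomial of degree |A| − 1 in f, which is why its degree is bounded in terms of the degree of f and |A|.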

Suppose we are dealing with degree functions which are close to some finite set (i.e., ) and we wish to show that . The following trick (using hypercontractivity, Theorem 2.2) shows that it suffices to show for some .

###### Claim 4.3.

Fix an integer , a finite set , and an exponent . If is a degree  function satisfying and then .

###### Proof.

We can assume that , since otherwise the theorem is trivial. Similarly, we can assume that , since adding can only decrease .

Let denote the element of closest to . Then

$$O(\varepsilon) \ge \mathbb{E}[\operatorname{dist}(h,A)^2] \ge \mathbb{E}[h^2 1_{z=0}] = \mathbb{E}[h^2] - \mathbb{E}[h^2 1_{z\ne 0}].$$

If then , and so for any integer . In particular, for , this shows that

$$\mathbb{E}[h^2 1_{z\ne 0}] = O(\mathbb{E}[h^k]) = O(\|h\|_k^k) = O(\|h\|_2^k) = O(\varepsilon^{k(\alpha/2)}) = O(\varepsilon),$$

using hypercontractivity and . It follows that . ∎

###### Corollary 4.4.

Fix an integer , finite sets , and an exponent . If are degree  functions satisfying , , and , then .

###### Proof.

Let . The triangle inequality shows that . Also, . Claim 4.3 therefore shows that . ∎

We now generalize the Kindler–Safra theorem to the A-valued setting, using the decomposition of Claim 4.2, and thus prove Theorem 4.1.

###### Proof of Theorem 4.1.

Pick some arbitrary and arbitrary constant . The triangle inequality shows that . If , the conclusion of the theorem is trivially satisfied with . Therefore from now on we assume that .

For , define

$$f_a(x) = \prod_{b \ne a} \frac{f(x)-b}{a-b}.$$

Also, let be the element in closest to , and let . Note . We will usually drop the argument from all these functions. Finally, define .

Our first goal is to bound in terms of . Let be a small constant. We consider two cases. If then

$$\operatorname{dist}(f_a,\{0,1\}) \le |f_a| = \frac{|\delta|}{|a-y|} \prod_{b \ne a,y} \frac{|y-b+\delta|}{|a-b|}.$$

If then , and otherwise . If then

$$\operatorname{dist}(f_a,\{0,1\}) \le |f_a-1| = \left|\prod_{b \ne a}\left(1+\frac{\delta}{a-b}\right)-1\right|.$$

Once again, if then , and otherwise .

We can now obtain a rough bound on by considering separately the cases and . The first case is simple:

$$\mathbb{E}\bigl[\operatorname{dist}(f_a,\{0,1\})^2\, 1_{|\delta|\le\delta_0}\bigr] \le O(\mathbb{E}[\delta^2]) = O(\varepsilon).$$

For the second case, we use Cauchy–Schwarz and the bound (recall is a constant):

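$$\mathbb{E}\bigl[\operatorname{dist}(f_a,\{0,1\})^2\, 1_{|\delta|\ge\delta_0}\bigr] \le \sqrt{\mathbb{E}[\delta^{2m}]}\; O(\sqrt{\varepsilon}).$$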

Let . If then clearly , and otherwise . Therefore it always holds that . This shows that

$$\mathbb{E}[\delta^{2m}] \le C^{2m} + \mathbb{E}[f^{2m}] = O(1) + \|f\|_{2m}^{2m}.$$

Since , we have . The triangle inequality shows that , and in total this case contributes . We conclude that

$$\mathbb{E}[\operatorname{dist}(f_a,\{0,1\})^2] = O(\sqrt{\varepsilon}).$$

The triangle inequality also allows us to bound by , by writing it as a polynomial in and bounding separately all the summands.

The Kindler–Safra theorem shows that is -close to a Boolean junta depending on the variables . If then (since there are finitely many options for , up to the choice of ), and so . Choosing appropriately, we can assume that .

Define now , and note that this is an -valued junta of degree at most . The inequality shows that

$$\|f-g\|^2 = \Bigl\|\sum_{a\in A} a(f_a-g_a)\Bigr\|^2 = O\Bigl(\sum_{a\in A}\|f_a-g_a\|^2\Bigr) = O(\sqrt{\varepsilon}).$$

The theorem now follows directly from Corollary 4.4 (with ). ∎

## 5 Main result: sparse juntas

In this section, we prove our main result, an analog of the Kindler-Safra theorem for all .

###### Theorem 5.1 (Restatement of Theorem 1.5).

For every and of degree there exists a function of degree that satisfies the following properties for :

1. .

2. The coefficients of the -expansion of belong to a finite set (depending only on ).

3. The support of has branching factor .

4. If then is the sum of coefficients of with probability .

The following corollary (proved at the end of this section) for -valued functions which have light Fourier tails follows from the above theorem.

###### Corollary 5.2.

Let be any positive integer and any finite set. For every and there exists a function of degree that satisfies the following properties for :

1. .

2. .

3. All other properties of (alone) stated in the theorem.

Given and alphabet , let be the constant given by the agreement theorem (Theorem 2.4). For the rest of this section, we fix the constant , set and . All hidden constants will depend only on and . For all the preliminary claims up to the proof of Theorem 5.1, we further assume that . Finally, as in the hypothesis of the theorem, we assume is a function from to of degree satisfying

The main result of this section extends the generalized Kindler–Safra theorem (Theorem 4.1), which holds only for constant p, to all values of p via the agreement theorem (Theorem 2.4). The idea is to consider, for each subset , a “restriction” of obtained by fixing the inputs outside to be . Namely, we define by where is the input that agrees with on the coordinates of and is zero outside of . We will find an approximate structure for each , and then stitch them together using the agreement theorem (Theorem 2.4). We start by applying the generalized Kindler–Safra theorem to for subsets selected according to two constant values of (namely, and ).
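In terms of the expansion in 0/1 variables, restricting to a subset S by zeroing the coordinates outside S simply discards every monomial that is not contained in S. The following sketch checks this on a small example; the dictionary representation, function names and sample polynomial are our own illustrative choices.

```python
from itertools import product


def evaluate(coeffs, y):
    """Evaluate an expansion {monomial: coefficient} at a 0/1 point y."""
    return sum(c for T, c in coeffs.items() if all(y[i] == 1 for i in T))


def restrict(coeffs, S):
    """Expansion of the restriction to S: keep the monomials contained in S."""
    S = frozenset(S)
    return {T: c for T, c in coeffs.items() if T <= S}


def zero_outside(y, S):
    """The input that agrees with y on S and is zero elsewhere."""
    return tuple(bit if i in S else 0 for i, bit in enumerate(y))


# f = 2*y_0 + y_1*y_2 - y_0*y_2 on three variables, restricted to S = {0, 1}.
f_coeffs = {frozenset({0}): 2, frozenset({1, 2}): 1, frozenset({0, 2}): -1}
S = {0, 1}
f_S_coeffs = restrict(f_coeffs, S)  # only 2*y_0 survives

consistent = all(
    evaluate(f_S_coeffs, y) == evaluate(f_coeffs, zero_outside(y, S))
    for y in product([0, 1], repeat=3)
)
```

This is why the local approximations can be compared coefficient by coefficient: two restrictions to sets S_1 and S_2 share exactly the monomials contained in the intersection of S_1 and S_2.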

###### Claim 5.3.

For every set , let

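$$\varepsilon_S := \mathbb{E}_{\mu_{1/4}}[\operatorname{dist}(f|_S,A)^2], \qquad \delta_S := \mathbb{E}_{\mu_{1/2}}[\operatorname{dist}(f|_S,A)^2].$$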

Then , and for every there exist -valued degree  juntas and such that and .

###### Proof.

If and then , and this explains why . The fact that