 # On parity decision trees for Fourier-sparse Boolean functions

We study parity decision trees for Boolean functions. The motivation of our study is the log-rank conjecture for XOR functions and its connection to Fourier analysis and parity decision tree complexity. Let f be a Boolean function with Fourier support S and Fourier sparsity k. 1) We prove via the probabilistic method that there exists a parity decision tree of depth O(sqrt k) that computes f. This matches the best known upper bound on the parity decision tree complexity of Boolean functions (Tsang, Wong, Xie, and Zhang, FOCS 2013). Moreover, while previous constructions (Tsang et al., FOCS 2013, Shpilka, Tal, and Volk, Comput. Complex. 2017) build the trees by carefully choosing the parities to be queried in each step, our proof shows that a naive sampling of the parities suffices. 2) We generalize the above result by showing that if the Fourier spectra of Boolean functions satisfy a natural "folding property", then the above proof can be adapted to establish existence of a tree of complexity polynomially smaller than O(sqrt k). We make a conjecture in this regard which, if true, implies that the communication complexity of an XOR function is bounded above by the fourth root of the rank of its communication matrix, improving upon the previously known upper bound of square root of rank (Tsang et al., FOCS 2013, Lovett, J. ACM. 2016). 3) It can be shown by elementary techniques that for any Boolean function f and all pairs (alpha, beta) of parities in S, there exists another pair (gamma, delta) of parities in S such that alpha + beta = gamma + delta. We show, among other results, that there must exist several gamma in F_2^n such that there are at least three pairs (alpha_1, alpha_2) of parities in S with alpha_1 + alpha_2 = gamma.

## 1 Introduction

The log-rank conjecture [LS88] is a fundamental unsolved question in communication complexity. It states that the deterministic communication complexity of a Boolean function is polynomially related to the logarithm of the rank (over the reals) of its communication matrix. The importance of the conjecture stems from the fact that it proposes to characterize communication complexity, an interactive complexity measure, by the rank of a matrix, a traditional and well-understood algebraic measure. In this work we focus on the important and well-studied class of XOR functions. Consider a two-party function $F : \mathbb{F}_2^n \times \mathbb{F}_2^n \to \{-1, 1\}$ whose value on any input $(x, y)$ depends only on the bitwise XOR of $x$ and $y$, i.e., there exists a function $f : \mathbb{F}_2^n \to \{-1, 1\}$ such that for each $(x, y)$, $F(x, y) = f(x \oplus y)$. Such a function $F$ is called an XOR function, and is denoted as $f \circ \oplus$. The log-rank conjecture and the communication complexity of such an XOR function have interesting connections with the Fourier spectrum of $f$. For example, it is known that the rank of the communication matrix of $f \circ \oplus$ equals the Fourier sparsity of $f$ (henceforth referred to as $k$) [BC99]. The natural randomized analogue of the log-rank conjecture is the log-approximate-rank conjecture [LS09], which was recently refuted by Chattopadhyay, Mande, and Sherif [CMS19]. The quantum analogue of the log-rank conjecture was subsequently also refuted by Sinha and de Wolf [SdW19] and Anshu, Boddu, and Touchette [ABT19]. It is worth noting that an XOR function was used to refute these conjectures.

To design a cheap communication protocol for $f \circ \oplus$, an approach adopted by many works [STV17, TWXZ13, MO09] is to design a small-depth parity decision tree (henceforth referred to as PDT) for $f$, and have a communication protocol simulate the tree; it is easy to see that the parity of a subset of bits of the string $x \oplus y$ can be computed by the communicating parties by exchanging two bits. The parity decision tree complexity of $f$ (henceforth referred to as $\mathrm{PDT}(f)$) thus places an asymptotic upper bound on the communication complexity of $f \circ \oplus$. The work of Hatami, Hosseini and Lovett [HHL18] shows that this approach is polynomially tight; they showed that $\mathrm{PDT}(f)$ is polynomially related to the deterministic communication complexity of $f \circ \oplus$. In light of this, the log-rank conjecture for XOR functions is readily seen to be equivalent to $\mathrm{PDT}(f)$ being polylogarithmic in $k$.

However, we are currently very far from achieving this goal. Lovett [Lov16] showed that the deterministic communication complexity of any Boolean function $F$ is bounded above by $O(\sqrt{\mathrm{rank}(F)} \log \mathrm{rank}(F))$. In particular, this implies that the deterministic communication complexity of $f \circ \oplus$ is $O(\sqrt{k} \log k)$. Improving upon a work of Shpilka et al. [STV17], Tsang et al. [TWXZ13] showed that $\mathrm{PDT}(f) = O(\sqrt{k})$. In addition to bounding $\mathrm{PDT}(f)$ instead of the communication complexity of $f \circ \oplus$, Tsang et al. achieved a quantitative improvement by a logarithmic factor over Lovett’s bound for the class of XOR functions. Sanyal [San19] showed that the simultaneous communication complexity of $f \circ \oplus$ (characterized by the Fourier dimension of $f$) is bounded above by $O(\sqrt{k} \log k)$, and that this bound is tight (up to the $\log k$ factor) for the addressing function.

In this work we derive new understanding about the structure of Fourier spectra of Boolean functions. Aided by this insight we reprove the $O(\sqrt{k})$ upper bound on $\mathrm{PDT}(f)$ (see Sections 3.1 and 3.2). We conditionally improve this bound by a polynomial factor, assuming a “folding property” of the Fourier spectra of Boolean functions (see Section 3.3). To prove these results, we make use of a simple necessary condition for a function to be Boolean (see Proposition 2.5). While we show that it is not a sufficient condition (see Theorem A.1 in Appendix A), it does enable us to prove the above results. In these proofs, we use Proposition 2.5 in conjunction with probabilistic and combinatorial arguments. Finally, we make progress towards establishing the folding property (see Section 3.4). Here we use the well-known characterization of Boolean functions given by two conditions, namely Parseval’s identity (Equation (2)) and a condition attributed to Titsworth (Equation (3)), in conjunction with combinatorial arguments.

### 1.1 Organization of this paper

In Section 2 we review some preliminaries and introduce the notation, definitions, and concepts that are needed to state our results formally. In Section 3 we motivate and formally state our results, and discuss proof techniques. The formal proofs of our main results can be found in Sections 4, 5, and 6.

## 2 Notation and preliminaries

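All logarithms in this paper are taken with base 2. As is standard, we use the notation () to convey that there exists a constant such that (, , respectively). We use the notation $[n]$ to denote the set $\{1, \dots, n\}$. For any set $A$, we use the notation $\binom{A}{2}$ to denote the set of all subsets of $A$ of size exactly $2$. We abuse notation and denote a generic element of $\binom{A}{2}$ as $(a, b)$ rather than $\{a, b\}$. When we use the notation , the underlying distribution corresponds to being sampled uniformly at random from . We use the symbol “$+$” to denote both coordinate-wise addition over $\mathbb{F}_2$ as well as addition over reals; the meaning in use will be clear from context. For sets $A, B \subseteq \mathbb{F}_2^n$, $A + B$ denotes the sumset defined by $\{a + b : a \in A, b \in B\}$. For a set $A \subseteq \mathbb{F}_2^n$ and $\gamma \in \mathbb{F}_2^n$, we denote by $A + \gamma$ the set $\{a + \gamma : a \in A\}$. The above convention also extends to the symbol “$\sum$”. For a set of vectors $\Gamma \subseteq \mathbb{F}_2^n$, we define $\mathrm{span}\,\Gamma$ to be the set of all $\mathbb{F}_2$-linear combinations of vectors in $\Gamma$, i.e., $\mathrm{span}\,\Gamma := \{\sum_{\gamma \in \Gamma'} \gamma : \Gamma' \subseteq \Gamma\}$.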

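Consider the vector space of functions from $\mathbb{F}_2^n$ to $\mathbb{R}$, equipped with the following inner product.

$$\langle f, g \rangle := \mathbb{E}_{x \in \mathbb{F}_2^n}[f(x) g(x)] = \frac{1}{2^n} \sum_{x \in \mathbb{F}_2^n} f(x) g(x).$$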

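Let $x = (x_1, \dots, x_n) \in \mathbb{F}_2^n$. For each $\alpha = (\alpha_1, \dots, \alpha_n) \in \mathbb{F}_2^n$, define $\alpha(x) := \sum_{i=1}^{n} \alpha_i x_i$ (mod 2), and the associated character $\chi_\alpha : \mathbb{F}_2^n \to \{-1, 1\}$ by $\chi_\alpha(x) := (-1)^{\alpha(x)}$. Observe that $\chi_\alpha(x)$ is the $\pm 1$-valued parity of the bits $\{x_i : \alpha_i = 1\}$; due to this we will also refer to characters as parities. The set of parities $\{\chi_\alpha : \alpha \in \mathbb{F}_2^n\}$ forms an orthonormal (with respect to the above inner product) basis for this vector space. Hence, every function $f : \mathbb{F}_2^n \to \mathbb{R}$ can be uniquely written as $f = \sum_{\alpha \in \mathbb{F}_2^n} \hat{f}(\alpha) \chi_\alpha$, where $\hat{f}(\alpha) = \langle f, \chi_\alpha \rangle$. The coefficients $\hat{f}(\alpha)$ are called the Fourier coefficients of $f$.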

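For any function $f : \mathbb{F}_2^n \to \mathbb{R}$ and any set $A \subseteq \mathbb{F}_2^n$, define the function $f|_A : A \to \mathbb{R}$ by $f|_A(x) = f(x)$ for all $x \in A$. In other words, $f|_A$ denotes the restriction of $f$ to $A$.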

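Throughout this paper, for any Boolean function $f : \mathbb{F}_2^n \to \{-1, 1\}$, we denote by $S$ the Fourier support of $f$, i.e. $S = \{\alpha \in \mathbb{F}_2^n : \hat{f}(\alpha) \neq 0\}$. We also denote by $k$ the Fourier sparsity of $f$, i.e. $k = |S|$. The dependence of $S$ and $k$ on $f$ is suppressed and the underlying function will be clear from context.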
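As a concrete illustration of these definitions, the following minimal Python sketch computes Fourier coefficients, the support $S$, and the sparsity $k$ by brute force. The bitmask encoding of vectors in $\mathbb{F}_2^n$, the function names, and the example function are assumptions made here for illustration; they are not part of the paper.

```python
from fractions import Fraction


def chi(alpha: int, x: int) -> int:
    """Character chi_alpha(x) = (-1)^<alpha, x>; vectors are encoded as bitmasks."""
    return -1 if bin(alpha & x).count("1") % 2 else 1


def fourier_coefficients(f, n: int) -> dict:
    """Brute-force Fourier transform of f: F_2^n -> {-1, 1}, as exact fractions."""
    size = 1 << n
    return {
        alpha: Fraction(sum(f(x) * chi(alpha, x) for x in range(size)), size)
        for alpha in range(size)
    }


def fourier_support(f, n: int) -> set:
    """The set S of parities with a non-zero Fourier coefficient."""
    return {a for a, c in fourier_coefficients(f, n).items() if c != 0}


def example_and(x: int) -> int:
    """Hypothetical example: f(x1, x2) = (-1)^(x1 AND x2) on n = 2 bits."""
    return -1 if x == 0b11 else 1


coeffs = fourier_coefficients(example_and, 2)
S = fourier_support(example_and, 2)
k = len(S)  # Fourier sparsity
print(coeffs, k)
```

For this example all four characters appear in the expansion, so the sparsity equals $2^n = 4$. The brute-force transform takes time exponential in $n$ and is only meant for small sanity checks.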

The representation of Fourier coefficients as an expectation (over ) immediately yields the following observation about granularity of Fourier coefficients of Boolean functions.

###### Observation 2.1.

Let be any Boolean function. Then, for all , is an integral multiple of .

We next define plateaued functions.

###### Definition 2.2 (Plateaued functions).

A Boolean function is said to be plateaued if there exists such that for all .

Next we define the addressing function.

###### Definition 2.3 (Addressing function).

Let be an even power of . The addressing function is defined as

where for , and is the unique integer in whose binary representation is .

The Fourier sparsity of the addressing function can be verified to be $k$. We now define a notion of equivalence on elements of $\binom{S}{2}$.

###### Definition 2.4.

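For any Boolean function $f$, we say a pair $(\alpha_1, \alpha_2) \in \binom{S}{2}$ is equivalent to $(\alpha_3, \alpha_4) \in \binom{S}{2}$ if $\alpha_1 + \alpha_2 = \alpha_3 + \alpha_4$.

In the above definition, if $\alpha_1 + \alpha_2 = \alpha_3 + \alpha_4 = \gamma$, then we say that the pairs $(\alpha_1, \alpha_2)$ and $(\alpha_3, \alpha_4)$ fold in the direction $\gamma$. We also say that the elements $\alpha_1, \alpha_2, \alpha_3$, and $\alpha_4$ participate in the folding direction $\gamma$. It is not hard to verify that the notion of equivalence defined above is indeed an equivalence relation. We will denote by $O_\gamma$ the equivalence class of pairs that fold in the direction $\gamma$, i.e.,

$$O_\gamma := \left\{ (\alpha, \beta) \in \binom{S}{2} \;\middle|\; \alpha + \beta = \gamma \right\}.$$

We suppress the dependence of $O_\gamma$ on the underlying function $f$, which will be clear from context. Unless mentioned otherwise, these are the equivalence classes under consideration throughout this paper.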

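For any Boolean function $f$, we have for each $x \in \mathbb{F}_2^n$:

$$1 = f^2(x) = \sum_{\gamma \in \mathbb{F}_2^n} \left( \sum_{\substack{(\alpha_1, \alpha_2) \in \mathbb{F}_2^n \times \mathbb{F}_2^n : \\ \alpha_1 + \alpha_2 = \gamma}} \hat{f}(\alpha_1) \hat{f}(\alpha_2) \right) \chi_\gamma(x). \tag{1}$$

Matching the constant term of each side of the above identity we have

$$\sum_{\alpha \in \mathbb{F}_2^n} \hat{f}(\alpha)^2 = 1, \tag{2}$$

which is commonly referred to as Parseval’s identity for Boolean functions. By matching the coefficient of each non-constant character $\chi_\gamma$ on each side of Equation (1) we obtain

$$\forall \gamma \neq 0^n, \quad \sum_{\substack{(\alpha_1, \alpha_2) \in \mathbb{F}_2^n \times \mathbb{F}_2^n : \\ \alpha_1 + \alpha_2 = \gamma}} \hat{f}(\alpha_1) \hat{f}(\alpha_2) = 0. \tag{3}$$

Equation (3) is attributed to Titsworth [Tit62]. The following proposition is an easy consequence of Equation (3). It provides a necessary condition for a subset of $\mathbb{F}_2^n$ to be the Fourier support of a Boolean function.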
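Equations (2) and (3) are easy to check numerically on small examples. The sketch below does so under assumptions made here for illustration: vectors are encoded as integer bitmasks (so addition over $\mathbb{F}_2^n$ is bitwise XOR), and the helper names and the majority example are hypothetical.

```python
from fractions import Fraction


def fourier_coefficients(f, n: int) -> list:
    """Brute-force Fourier coefficients of f: F_2^n -> {-1, 1} (bitmask encoding)."""
    size = 1 << n

    def sign(a: int, x: int) -> int:
        return -1 if bin(a & x).count("1") % 2 else 1

    return [Fraction(sum(f(x) * sign(a, x) for x in range(size)), size)
            for a in range(size)]


def self_convolution(coeffs: list) -> list:
    """For each gamma, sum f_hat(a1) * f_hat(a2) over ordered pairs with a1 + a2 = gamma."""
    size = len(coeffs)
    # Over F_2^n, a1 + a2 = gamma means a2 = a1 XOR gamma.
    return [sum(coeffs[a] * coeffs[a ^ g] for a in range(size)) for g in range(size)]


def majority3(x: int) -> int:
    """Hypothetical example: the +-1-valued majority of three bits."""
    return -1 if bin(x).count("1") >= 2 else 1


conv = self_convolution(fourier_coefficients(majority3, 3))
print(conv)  # entry 0 is the left side of Equation (2); the rest are Equation (3)
```

The entry at index $0$ equals $1$ (Parseval) and every other entry equals $0$ (Titsworth), as both identities require for any $\pm 1$-valued function.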

###### Proposition 2.5.

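Let $f$ be a Boolean function. Then, for all $(\alpha, \beta) \in \binom{S}{2}$, there exists $(\gamma, \delta) \in \binom{S}{2} \setminus \{(\alpha, \beta)\}$ such that $\alpha + \beta = \gamma + \delta$. In other words, $|O_{\alpha + \beta}| \geq 2$.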
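The necessary condition of Proposition 2.5 can be tested mechanically for a candidate support. The following sketch groups pairs of a set by their sum and checks that no class is a singleton; the bitmask encoding, the function names, and the two example sets are assumptions introduced here.

```python
from collections import defaultdict
from itertools import combinations


def folding_classes(support) -> dict:
    """Group unordered pairs of S by their sum: O_gamma = {(a, b) : a + b = gamma}."""
    classes = defaultdict(list)
    for a, b in combinations(sorted(support), 2):
        classes[a ^ b].append((a, b))  # addition over F_2^n is bitwise XOR
    return dict(classes)


def satisfies_proposition_2_5(support) -> bool:
    """Necessary condition: every non-empty class O_gamma holds at least two pairs."""
    return all(len(pairs) >= 2 for pairs in folding_classes(support).values())


# Hypothetical supports, encoded as bitmasks.
S_and = {0b00, 0b01, 0b10, 0b11}  # support of (-1)^(x1 AND x2)
S_bad = {0b001, 0b010, 0b100}     # every pair has a distinct sum
print(folding_classes(S_and))
print(satisfies_proposition_2_5(S_and), satisfies_proposition_2_5(S_bad))
```

The second set fails the check, so by Proposition 2.5 it cannot be the Fourier support of any Boolean function. Passing the check does not certify that a set is a valid support, since the condition is only necessary.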

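The Fourier $\ell_1$-norm of $f$ is defined as $\|\hat{f}\|_1 := \sum_{\alpha \in \mathbb{F}_2^n} |\hat{f}(\alpha)|$. By the Cauchy-Schwarz inequality and Equation (2), we have

$$\|\hat{f}\|_1 \leq \sqrt{k} \sqrt{\sum_{\alpha \in \mathbb{F}_2^n} \hat{f}(\alpha)^2} = \sqrt{k}. \tag{4}$$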

We next formally define parity decision trees.

A parity decision tree (PDT) is a binary tree whose leaf nodes are labeled in $\{-1, 1\}$; each internal node is labeled by a parity $\chi_\alpha$ and has two outgoing edges, labeled $-1$ and $1$. On an input $x \in \mathbb{F}_2^n$, the tree’s computation proceeds from the root down as follows: compute $\chi_\alpha(x)$ as indicated by the node’s label, follow the edge indicated by the value output, and continue in a similar fashion until reaching a leaf, at which point the value of the leaf is output. When the computation reaches a particular internal node, the PDT is said to query the parity label of that node. The PDT is said to compute a function $f$ if its output equals the value of $f(x)$ for all $x \in \mathbb{F}_2^n$. The parity decision tree complexity of $f$, denoted $\mathrm{PDT}(f)$, is defined as

$$\mathrm{PDT}(f) := \min_{T \,:\, T \text{ is a PDT computing } f} \mathrm{depth}(T).$$
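To make the model concrete, here is a minimal sketch of a PDT as a data structure with an evaluator and a depth function. The class layout, the field names, and the convention of branching on the $\pm 1$ value of the queried character are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """A PDT node: a leaf carries `value`; an internal node carries a parity mask."""
    value: Optional[int] = None         # leaf label in {-1, 1}
    parity: int = 0                     # bitmask alpha of the queried parity
    on_plus: Optional["Node"] = None    # child followed when chi_alpha(x) = +1
    on_minus: Optional["Node"] = None   # child followed when chi_alpha(x) = -1


def evaluate(tree: Node, x: int) -> int:
    """Walk from the root to a leaf, querying one parity per internal node."""
    node = tree
    while node.value is None:
        odd = bin(node.parity & x).count("1") % 2
        node = node.on_minus if odd else node.on_plus
    return node.value


def depth(tree: Node) -> int:
    """Maximum number of parity queries on any root-to-leaf path."""
    if tree.value is not None:
        return 0
    return 1 + max(depth(tree.on_plus), depth(tree.on_minus))


# Hypothetical tree for f(x1, x2) = (-1)^(x1 AND x2): query x1, then x2 if needed.
tree = Node(parity=0b01,
            on_plus=Node(value=1),
            on_minus=Node(parity=0b10, on_plus=Node(value=1), on_minus=Node(value=-1)))
outputs = [evaluate(tree, x) for x in range(4)]
print(outputs, depth(tree))
```

This tree only queries single bits, which are a special case of parities; a general PDT may use any bitmask at an internal node.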

### 2.1 Restriction to an affine subspace

In this section we discuss the effect of restricting a function $f$ to an affine subspace on the Fourier spectrum of $f$.

###### Definition 2.6.

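A set $H \subseteq \mathbb{F}_2^n$ is called an affine subspace if there exist linearly independent vectors $\gamma_1, \dots, \gamma_t \in \mathbb{F}_2^n$ and $b_1, \dots, b_t \in \mathbb{F}_2$ such that $H = \{x \in \mathbb{F}_2^n : \gamma_i(x) = b_i \text{ for all } i \in [t]\}$. $t$ is called the co-dimension of $H$.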

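Consider a set $\Gamma = \{\gamma_1, \dots, \gamma_t\}$ of vectors in $\mathbb{F}_2^n$. Define the set $G := \mathrm{span}\,\Gamma$, and let $\mathcal{C}$ be the set of cosets of $G$ that have non-trivial intersection with $S$. For each $C \in \mathcal{C}$, let $\alpha(C)$ denote an arbitrary but fixed element in $C$. In light of this, we write the Fourier transform of $f$ as

$$f(x) = \sum_{C \in \mathcal{C}} \left( \sum_{\gamma \in G} \hat{f}(\alpha(C) + \gamma) \chi_\gamma(x) \right) \chi_{\alpha(C)}(x). \tag{5}$$

For any such fixed $C$, the value of the sum that appears in Equation (5) is determined by the values $\gamma_1(x), \dots, \gamma_t(x)$. We denote this sum by $P_C(\gamma_1(x), \dots, \gamma_t(x))$.

For $b = (b_1, \dots, b_t) \in \mathbb{F}_2^t$, let $H_b$ be the affine subspace $\{x \in \mathbb{F}_2^n : \gamma_i(x) = b_i \text{ for all } i \in [t]\}$. It follows immediately that the Fourier transform of $f|_{H_b}$ is given by

$$f|_{H_b}(x) = \sum_{C \in \mathcal{C}} P_C(b_1, \dots, b_t) \chi_{\alpha(C)}(x). \tag{6}$$

In particular, for each $b$, the Fourier sparsity of $f|_{H_b}$ is bounded above by $|\mathcal{C}|$.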

We note here that each element in $S$ is mapped to a unique element in $\mathcal{C}$. The elements of $\mathcal{C}$ can thus be thought of as buckets that form a partition of $S$. Keeping this view in mind we define the following.

###### Definition 2.7 (Bucket complexity).

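Let $f$ be any Boolean function. Consider a set $\Gamma$ of vectors in $\mathbb{F}_2^n$. Let $G := \mathrm{span}\,\Gamma$, and let $\mathcal{C}$ denote the set of cosets of $G$ that have non-empty intersection with $S$, that is, $\mathcal{C} := \{\alpha + G : \alpha \in S\}$. Define the bucket complexity of $f$ with respect to $G$, denoted $B(f, G)$, as

$$B(f, G) = |\mathcal{C}|.$$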

We now make the following useful observation, which follows from Equation (6).

###### Observation 2.8.

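Let $f$, $\Gamma = \{\gamma_1, \dots, \gamma_t\}$ and $G$ be as in Definition 2.7. Let $b \in \mathbb{F}_2^t$ be arbitrary. Let $H_b$ be the affine subspace $\{x \in \mathbb{F}_2^n : \gamma_i(x) = b_i \text{ for all } i \in [t]\}$. Let $k'$ be the Fourier sparsity of $f|_{H_b}$. Then $k' \leq B(f, G)$.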

###### Definition 2.9 (Identification of characters).

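For $f$, $G$, and $\mathcal{C}$ as in Definition 2.7 and any $\alpha, \beta \in S$, we say that $\alpha$ and $\beta$ are identified with respect to $G$ if $\alpha + \beta \in G$, or equivalently, if $\alpha$ and $\beta$ belong to the same coset in $\mathcal{C}$.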

The following observation plays a key role in the results discussed in this paper.

###### Observation 2.10.

Let $f$ and $G$ be as in Definition 2.7. If there exists a set $L \subseteq S$ of size $h$ such that each $\alpha \in L$ is identified with some other $\beta \in S$ with respect to $G$, then $B(f, G) \leq k - h/2$.

###### Proof.

Since $|S \setminus L| = k - h$, there are at most $k - h$ cosets in $\mathcal{C}$ that contain at least one element from $S \setminus L$. Next, each coset in $\mathcal{C}$ that contains only elements from $L$ has at least 2 elements (by the hypothesis). Hence, the number of cosets containing only elements from $L$ is at most $h/2$. Combining the above two, we have that $B(f, G) \leq (k - h) + h/2 = k - h/2$. ∎
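Bucket complexity is straightforward to compute for small instances by enumerating the span and assigning each element of the support a canonical coset representative. The sketch below is illustrative only: the bitmask encoding, the function names, and the example support are assumptions, and the explicit enumeration of the span is exponential in the number of generators.

```python
def span(generators) -> set:
    """All F_2-linear combinations of the given vectors (bitmask encoding)."""
    elements = {0}
    for g in generators:
        elements |= {e ^ g for e in elements}
    return elements


def bucket_complexity(support, generators) -> int:
    """Number of cosets of G = span(generators) that meet the support S."""
    G = span(generators)
    # min(alpha + G) is a canonical representative of the coset containing alpha.
    return len({min(a ^ g for g in G) for a in support})


# Hypothetical support with k = 4, encoded as bitmasks.
S = {0b00, 0b01, 0b10, 0b11}
print(bucket_complexity(S, []))      # no parities: every element is its own bucket
print(bucket_complexity(S, [0b11]))  # 00 ~ 11 and 01 ~ 10 are identified
```

With the single generator `0b11` every element of the example support is identified with another one, so the four elements collapse into two buckets, matching the counting argument in the proof above.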

### 2.2 Folding properties of Boolean functions

###### Definition 2.11.

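Let $f$ be any Boolean function. We say that $f$ is $(\delta, \ell)$-folding if

$$\left| \left\{ (\alpha, \beta) \in \binom{S}{2} \;\middle|\; |O_{\alpha + \beta}| \geq k^{\ell} + 1 \right\} \right| \geq \delta \binom{k}{2}.$$

Proposition 2.5 implies that any Boolean function is $(1, 0)$-folding.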
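For a given support, the largest $\delta$ that works for a fixed $\ell$ is simply the fraction of pairs whose equivalence class is large enough. The following sketch computes that fraction, reading the threshold in the definition as $k^{\ell} + 1$; the function name, the bitmask encoding, and the example support are assumptions made here, and the support is assumed to have at least two elements.

```python
from collections import Counter
from itertools import combinations
from math import comb


def folding_fraction(support, ell: float) -> float:
    """Fraction of pairs (a, b) of S whose class O_{a+b} has size >= k^ell + 1."""
    k = len(support)
    pairs = list(combinations(sorted(support), 2))
    class_size = Counter(a ^ b for a, b in pairs)  # |O_gamma| for each direction
    threshold = k ** ell + 1
    good = sum(1 for a, b in pairs if class_size[a ^ b] >= threshold)
    return good / comb(k, 2)


# Hypothetical support with k = 4, encoded as bitmasks.
S = {0b00, 0b01, 0b10, 0b11}
print(folding_fraction(S, 0))    # threshold 2
print(folding_fraction(S, 0.5))  # threshold 3
```

A function with this support is $(\delta, \ell)$-folding exactly when `folding_fraction(S, ell)` is at least $\delta$. In the example every class has exactly two pairs, so the fraction is $1$ at $\ell = 0$ and $0$ at $\ell = 1/2$.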

We next show by a simple averaging argument that if has “good folding properties”, then there are many , such that is large for many .

###### Claim 2.12.

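Let $f$ be $(\delta, \ell)$-folding with $k$ sufficiently large. Define

$$U := \left\{ \alpha \in S \;\middle|\; \text{there exist at least } \delta k / 2 \text{ many } \beta \in S \setminus \{\alpha\} \text{ with } |O_{\alpha + \beta}| \geq k^{\ell} + 1 \right\}.$$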

Then .

###### Proof.

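For each $\alpha \in S$, define $t(\alpha) := |\{\beta \in S \setminus \{\alpha\} : |O_{\alpha + \beta}| \geq k^{\ell} + 1\}|$. By the hypothesis, $\sum_{\alpha \in S} t(\alpha) \geq 2 \delta \binom{k}{2} = \delta k (k - 1)$. We have

$$|U| \cdot k + (k - |U|) \cdot \frac{\delta k}{2} \geq \sum_{\alpha \in S} t(\alpha) \geq \delta k (k - 1) \implies |U| \left( k - \frac{\delta k}{2} \right) \geq \delta k^2 - \delta k - \frac{\delta k^2}{2} \implies |U| \geq \frac{\delta (k - 2)}{2 - \delta},$$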

implying for sufficiently large .

## 3 Our contributions

In this section we give a high-level account of our contributions in this paper. In Section 3.1 we discuss the PDT construction of Tsang et al. In Sections 3.2, 3.3, and 3.4 we motivate and state our results, and briefly discuss proof ideas.

### 3.1 Low bucket complexity implies shallow PDTs

The following lemma follows from [TWXZ13, Lemma 28] and Equation (4).

###### Lemma 3.1 (Tsang, Wong, Xie, and Zhang).

Let $f$ be any Boolean function. Then there exists an affine subspace $H$ of $\mathbb{F}_2^n$ of co-dimension $O(\sqrt{k})$ such that $f$ is constant on $H$.

Let be the affine subspace obtained from Lemma 3.1, where . Define . We next observe that . To see this, note that since is constant, we have from Equation (6) that for each coset and any ,

$$P_C(b_1, \dots, b_t) = \begin{cases} \pm 1 & \text{if } 0^n \in C, \\ 0 & \text{otherwise.} \end{cases}$$

Since is a non-constant function, this implies that each has at least terms, i.e., each is identified with some other with respect to . Observation 2.10 implies that . Observation 2.8 implies that the Fourier sparsity of the restriction of to each coset of is at most .

This immediately leads to a recursive construction of a PDT for $f$ of depth $O(\sqrt{k})$ as follows. The first step is to query the parities $\gamma_1, \dots, \gamma_t$ that define $H$. After this step, each leaf of the partial tree obtained corresponds to a restriction of $f$ to some coset of $H$. Next we recursively compute each leaf. Since the sparsity reduces by a factor of $2$ after each batch of queries, the depth of the tree thus obtained is $O(\sqrt{k})$.
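The depth estimate is a geometric sum. As a short worked bound, assume that a batch of queries on a restriction of sparsity $k'$ uses at most $c \sqrt{k'}$ parities and that each batch at least halves the sparsity; the constant $c$ is notation introduced here for illustration. Then

$$\mathrm{depth} \;\leq\; \sum_{i \geq 0} c \sqrt{\frac{k}{2^i}} \;=\; c \sqrt{k} \sum_{i \geq 0} 2^{-i/2} \;=\; \frac{c \sqrt{k}}{1 - 2^{-1/2}} \;=\; (2 + \sqrt{2})\, c \sqrt{k} \;=\; O(\sqrt{k}).$$

The same calculation applies whenever the sparsity shrinks by any constant factor per batch; only the leading constant changes.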

### 3.2 A random set of parities achieves low bucket complexity

Tsang et al. proved Lemma 3.1 by an iterative procedure in each step of which a single parity is carefully chosen. We show in this paper that a randomly sampled set of parities achieves the desired bucket complexity upper bound with high probability. More specifically, for a parameter $p$, consider the procedure SampleParity() described in Algorithm 1.

Our first result shows that the set returned by SampleParity satisfies with high probability.
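The sampling experiment can be simulated directly. The sketch below assumes that SampleParity includes each parity of $S$ in the returned set independently with probability $p$, which is how the procedure is used later in the proofs; the function names, the bitmask encoding, and the example support are hypothetical. Cosets of the span are compared through a reduced basis rather than by enumerating the span.

```python
import random


def sample_parity(support, p: float, rng: random.Random) -> set:
    """Assumed form of SampleParity: keep each parity of S independently w.p. p."""
    return {a for a in support if rng.random() < p}


def xor_basis(vectors) -> list:
    """Echelon basis of span(vectors) over F_2, sorted by decreasing leading bit."""
    basis = []
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)  # clears b's leading bit from v if it is set
        if v:
            basis.append(v)
            basis.sort(reverse=True)
    return basis


def bucket_complexity(support, R) -> int:
    """Number of cosets of span(R) that contain at least one element of S."""
    basis = xor_basis(R)

    def canonical(a: int) -> int:
        # Reducing by the echelon basis gives the same value for a whole coset.
        for b in basis:
            a = min(a, a ^ b)
        return a

    return len({canonical(a) for a in support})


# Hypothetical support with k = 4, encoded as bitmasks.
S = {0b00, 0b01, 0b10, 0b11}
R = sample_parity(S, 0.5, random.Random(1))
print(R, bucket_complexity(S, R))
```

Averaging `bucket_complexity` over many independent samples gives an empirical estimate of the expected bucket complexity for a given support and sampling probability.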

###### Theorem 3.2.

Let be a Boolean function and be large enough. Let and be the random set of parities returned by SampleParity(). There exists a constant such that

$$\mathbb{E}[B(f, \mathrm{span}\, R)] \leq ck.$$

With high probability we have $|R| = O(\sqrt{k})$. By an argument analogous to the discussion in the previous section, Theorem 3.2 recovers the $O(\sqrt{k})$ upper bound on $\mathrm{PDT}(f)$. An additional insight that our work provides is that a PDT of depth $O(\sqrt{k})$ can be obtained by a naive sampling procedure applied iteratively.

We note here that while Tsang et al. prove a bucket complexity upper bound of via Lemma 3.1 which restricts the function to a constant, we derive a bucket complexity upper bound of by analyzing the procedure SampleParity.

#### Proof idea.

Fix any . Proposition 2.5 implies that for every , there exists such that . Observe that if two parities in the set are chosen in , then is identified with the third parity in w.r.t. . Now, the expected number of for which the aforementioned identification occurs is seen by linearity of expectation to be , which is by the choice of . The crux of the proof is in strengthening this bound on expectation to conclude that with constant probability, there exists at least one such that the above identification occurs. Theorem 3.2 follows by linearity of expectation over , and an invocation of Observation 2.10.

We prove Theorem 3.2 in Section 4.2. In Section 4.1 we prove a weaker statement that admits a simpler proof, and yet contains some key ideas that go into the proof of Theorem 3.2.

### 3.3 Good folding yields better PDTs

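Assume that for any Boolean function $f$ there exist $(\alpha, \beta) \in \binom{S}{2}$ such that $|O_{\alpha + \beta}| \geq k^{\ell} + 1$. This is a weaker assumption on $f$ than it being $(\delta, \ell)$-folding. Since the pairs in $O_{\alpha + \beta}$ are pairwise disjoint, at least $2(k^{\ell} + 1)$ elements of $S$ are identified with another element with respect to $\mathrm{span}\{\alpha + \beta\}$, and Observation 2.10 implies that $B(f, \mathrm{span}\{\alpha + \beta\}) \leq k - k^{\ell} - 1$. This suggests the following PDT for $f$. First the parity $\chi_{\alpha + \beta}$ is queried at the root. Observation 2.8 implies that the Fourier sparsity of $f$ restricted to the affine subspace (of co-dimension 1) corresponding to each outcome of this query is at most $k - k^{\ell} - 1$. Repeating this heuristic recursively for each leaf leads to a PDT of depth $\tilde{O}(k^{1 - \ell})$.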

We have now set up the backdrop to introduce our next contribution. In the preceding discussion we had assumed the following about any Boolean function : there exists a pair in with a large equivalence class. One implication of our next result is that if we instead assume that any Boolean function is -folding, the procedure SampleParity with set to achieves a bucket complexity upper bound of with high probability. By an argument analogous to the discussion in Section 3.1 (also see Corollary 3.4), this yields a PDT with depth . This is a quadratic improvement over the bound discussed in the last paragraph. Besides, it can be seen to recover (up to a logarithmic factor) our first result by setting , since any Boolean function is -folding by Proposition 2.5.

###### Theorem 3.3.

Let and . Let be -folding with sufficiently large. Set and let be the random subset of that SampleParity() returns. Then with probability at least , .

The proof of Theorem 3.3 proceeds along the lines of that of Theorem 3.2, but is more technical. We prove it in Section 5.

This yields the following corollary.

###### Corollary 3.4.

Let and . Suppose all Boolean functions with sufficiently large are -folding. Then,

$$\mathrm{PDT}(f) = \tilde{O}\left(k^{(1 - \ell)/2}\right).$$

###### Proof.

Fix any Boolean function with sufficiently large . Let and be as in the statement of Theorem 3.3. Since is a constant, . By Theorem 3.3, we have , for some , with probability strictly greater than . By a Chernoff bound with probability strictly greater than . Finally, by a union bound, we have that with non-zero probability the set returned by SampleParity() satisfies both and , for some . Choose such an and consider the following PDT for , whose construction closely follows the discussion in Section 3.1.

First, query all parities in . Now, let be the affine subspace corresponding to an arbitrary leaf of this partial tree. By the properties of and Observation 2.8, we have that the Fourier sparsity of is at most . Repeat the same process inductively for each leaf. The depth of the resultant tree is at most . ∎

Corollary 3.4 naturally raises the question of whether all Boolean functions are -folding.

###### Question 3.5.

Do there exist constants such that every Boolean function is -folding?

An affirmative answer to Question 3.5, in conjunction with Corollary 3.4 and the discussion in Section 1, implies an upper bound on the communication complexity of XOR functions that is polynomially smaller than the best known bound of $O(\sqrt{k})$.

What is the largest for which all Boolean functions are -folding? The addressing function (see Definition 2.3) is -folding, and not -folding for any (see Appendix B). In light of this, we make the following conjecture.

###### Conjecture 3.6.

There exists a constant $\delta > 0$ such that any Boolean function is $(\delta, 1/2)$-folding.

Assuming Conjecture 3.6, Corollary 3.4 would imply an upper bound of $\tilde{O}(k^{1/4})$ on the communication complexity of XOR functions $f \circ \oplus$.

### 3.4 Boolean functions have non-trivial folding properties

Recall that Conjecture 3.6 states that any Boolean function is $(\delta, \ell)$-folding with $\delta = \Omega(1)$ and $\ell = 1/2$. Also recall from Proposition 2.5 that a necessary condition for a function to be Boolean valued is that it is $(\delta, \ell)$-folding with $\delta = 1$ and $\ell = 0$. We show in the appendix (see Theorem A.1) that the conditions in Proposition 2.5 are not sufficient for a function to be Boolean valued.

To the best of our knowledge, it was not known prior to our work whether any better bound than this was known for Boolean functions (in terms of , for any non-zero ). In particular, it was consistent with prior knowledge that there exist functions for which each equivalence class of contains exactly 2 elements. We rule out this possibility, and our contribution is a step towards Conjecture 3.6.

###### Theorem 3.7.

For any Boolean function with , and every , there exists such that .

In order to rule out the possibility mentioned above, it suffices to exhibit a single pair $(\alpha, \beta) \in \binom{S}{2}$ with $|O_{\alpha + \beta}| \geq 3$. Theorem 3.7 further shows that every element $\alpha \in S$ participates in such a pair.

#### Proof idea

We prove this via a series of arguments. Define and . We first show that if there exists with for all , then both of the following hold.

1. Either or is odd.

2. The function must be plateaued.

The proofs use Equation (3). Next, we show that for plateaued Boolean functions, both and are even, yielding a contradiction in view of the first bullet above. This proof involves a careful analysis of the Fourier coefficients and crucially uses Observation 2.1 and Equation (2).

A natural question raised by Theorem 3.7 is whether there exists a Boolean function and such that there exists only one element with . The following theorem answers this question in the positive, and sheds more light on the structure of such functions.

###### Theorem 3.8.

1. There exists a Boolean function and such that for all .

2. Let be any Boolean function. If there exists such that for all , then .

The proof of Part 2 of Theorem 3.8 follows along the lines of the proof of Theorem 3.7. The proof of Part 1 of Theorem 3.8 constructs such a function by applying a simple modification to the addressing function.

We prove Theorems 3.7 and 3.8 in Section 6.

## 4 Proof of Theorem 3.2

In this section we prove our first result, Theorem 3.2.

### 4.1 Warm up: sampling $\tilde{O}(k^{3/4})$ parities

In this section we prove a quantitatively weaker statement. This admits a simpler proof and introduces many key ideas that go into our proof of Theorem 3.2.

###### Claim 4.1.

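Let $p := \frac{2 \sqrt{\log k}}{k^{1/4}}$, and let $R$ be the set returned by SampleParity(). Then

$$\Pr[B(f, \mathrm{span}\, R) \leq k/2] \geq 1 - \frac{1}{k^{1/3}}.$$

By a Chernoff bound, with high probability, $|R| = \tilde{O}(k^{3/4})$.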

###### Proof.

Fix any . By Proposition 2.5 we have that for each , there exist such that . Define . Note that the sets are not necessarily distinct. Define the multiset of unordered triples . For each , define . We now show that with high probability there exists such that . We consider two cases below.

Case 1: There exists such that .

Consider the multiset of unordered pairs . Each pair in can repeat at most thrice. Hence there are at least distinct pairs in . Moreover the distinct pairs in are disjoint. This can be inferred from the observation that the sum of the two elements in each pair in equals . Thus

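$$\Pr[\forall A \in \mathcal{A},\ A \nsubseteq R] \leq (1 - p^2)^{k^{1/2}/3} = \left(1 - \frac{4 \log k}{k^{1/2}}\right)^{k^{1/2}/3} \leq \frac{1}{k^{4/3}}.$$

Case 2: For each , .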

In this case each triple in has non-empty intersection with at most sets in . Thus one can greedily obtain a collection of at least disjoint triples in .

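$$\Pr[\forall T \in \mathcal{T},\ |T \cap R| < 2] \leq (1 - p^2)^{\frac{k - 1}{3 k^{1/2}}} = \left(1 - \frac{4 \log k}{k^{1/2}}\right)^{\frac{k - 1}{3 k^{1/2}}},$$

which is at most $1/k^{4/3}$ for large enough $k$.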

From the above two cases it follows that with probability at least , there exists a triple such that . Assume existence of such a triple , and let . Let . Since , we have that , i.e., is identified with with respect to . By a union bound over all it follows that with probability at least , for every there exists a such that is identified with w.r.t. . The claim follows by Observation 2.10. ∎

### 4.2 Sampling $O(k^{1/2})$ parities

We now proceed to prove Theorem 3.2 by refining the ideas developed in Section 4.1. Recall that by a Chernoff bound, with high probability (where is as in Theorem 3.2). We require the following inequality.

###### Proposition 4.2.

Let $d$ be any non-negative integer, and let $p \in [0, 1]$ be such that $pd \leq 1$. Then,

$$(1 - p)^d \leq 1 - \frac{1}{2} pd.$$

###### Proof.

The proof proceeds via induction on $d$.

Base case: d=0.

The statement can be easily verified to be true; each side evaluates to 1.

Inductive step:

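Assume that the statement is true for $d$ and all $p$ such that $pd \leq 1$. We now show that the statement holds for $d + 1$ and all $p$ such that $p(d + 1) \leq 1$. We have

$$\begin{aligned}
(1 - p)^{d + 1} &= (1 - p) \cdot (1 - p)^d \\
&\leq (1 - p) \left(1 - \tfrac{1}{2} pd\right) && \text{by the inductive hypothesis, since } pd \leq p(d + 1) \leq 1 \\
&= 1 - \left(\tfrac{1}{2} p + \tfrac{1}{2} pd\right) - \tfrac{1}{2} p + \tfrac{1}{2} p^2 d \\
&\leq 1 - \tfrac{1}{2} p (d + 1) && \text{since } pd \leq 1.
\end{aligned}$$

∎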
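The inequality of Proposition 4.2 can also be sanity-checked exactly on a grid of rational values. This is only a numerical illustration, not a substitute for the induction; the function names and the grid parameters are choices made here.

```python
from fractions import Fraction


def inequality_holds(p: Fraction, d: int) -> bool:
    """Check (1 - p)^d <= 1 - p*d/2 exactly, using rational arithmetic."""
    return (1 - p) ** d <= 1 - p * d / 2


def check_grid(max_d: int = 40, steps: int = 200) -> bool:
    """Test every p = j/steps in [0, 1] and every d <= max_d with p*d <= 1."""
    for d in range(max_d + 1):
        for j in range(steps + 1):
            p = Fraction(j, steps)
            if p * d <= 1 and not inequality_holds(p, d):
                return False
    return True


print(check_grid())
# The hypothesis p*d <= 1 cannot simply be dropped: p = 1, d = 3 violates the bound.
print(inequality_holds(Fraction(1), 3))
```

The second call shows why the condition $pd \leq 1$ is needed: for $p = 1$ and $d = 3$ the left side is $0$ while the right side is $-1/2$.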

###### Proof of Theorem 3.2.

For technical reasons we instead consider a two-step probabilistic procedure. Define . Let and be the sets returned by two independent runs of SampleParity(), and let . Each is independently included in with probability equal to . Hence it suffices to prove that there exists a constant such that .

Fix any and let and be as in the proof of Claim 4.1. For , define