 # A Composition Theorem via Conflict Complexity

Let R(·) stand for the bounded-error randomized query complexity. We show that for any relation f ⊆ {0,1}^n × S and partial Boolean function g ⊆ {0,1}^n × {0,1}, R_1/3(f ∘ g^n) = Ω(R_4/9(f) · √(R_1/3(g))). Independently of us, Gavinsky, Lee and Santha proved this result. By an example demonstrated in their work, this bound is optimal. We prove our result by introducing a novel complexity measure called the conflict complexity of a partial Boolean function g, denoted by χ(g), which may be of independent interest. We show that χ(g) = Ω(√(R_1/3(g))) and R_1/3(f ∘ g^n) = Ω(R_4/9(f) · χ(g)).


## 1 Introduction

Let f ⊆ {0,1}^n × S be a relation and g ⊆ {0,1}^n × {0,1} be a partial Boolean function. In this work, we bound the bounded-error randomized query complexity of the composed relation f ∘ g^n from below in terms of the bounded-error query complexities of f and g. Our main theorem is as follows.

###### Theorem 1 (Main Theorem).

For any relation f ⊆ {0,1}^n × S and partial Boolean function g ⊆ {0,1}^n × {0,1},

 R_1/3(f ∘ g^n) = Ω(R_4/9(f) · √(R_1/3(g))).

Prior to this work, Anshu et al. proved a related composition lower bound. Although in the statement of their result g is stated to be a total Boolean function, their result holds even when g is a partial Boolean function.

In the special case of g being a total Boolean function, Ben-David and Kothari showed a composition lower bound in terms of the sabotage complexity of g.

Gavinsky, Lee and Santha independently proved Theorem 1 (possibly with different values of the error parameters). They also prove this bound to be tight by exhibiting an example that matches it. We believe that our proof is sufficiently different, and significantly shorter and simpler, than theirs. We draw on and refine the ideas developed in the works of Anshu et al. and of Ben-David and Kothari to prove our result.

We define a novel measure of the complexity of a partial Boolean function g that we refer to as the conflict complexity of g, denoted by χ(g) (see Section 3 for a definition). This quantity is inspired by the sabotage complexity introduced by Ben-David and Kothari. However, the two measures also have important differences. For example, we can show that for any partial function g, χ(g) and R_1/3(g) are related as follows.

###### Theorem 2.

For any partial Boolean function g,

 χ(g) = Ω(√(R_1/3(g))).

See Section 4 for a proof of Theorem 2. Sabotage complexity is known to be similarly related to the bounded-error randomized query complexity (up to a logarithmic factor) when g is a total Boolean function. For partial Boolean functions, an unbounded separation is possible between sabotage complexity and the bounded-error randomized query complexity.

We next prove the following composition theorem.

###### Theorem 3.

Let S be an arbitrary set, f ⊆ {0,1}^n × S be a relation and g ⊆ {0,1}^n × {0,1} be a partial Boolean function. Then,

 R_1/3(f ∘ g^n) = Ω(R_4/9(f) · χ(g)).

To prove Theorem 3, we draw on the techniques developed by Anshu et al. and by Ben-David and Kothari. See Section 5 for a proof of Theorem 3. Theorem 1 follows from Theorems 2 and 3.

## 2 Preliminaries

A partial Boolean function g is a relation in {0,1}^n × {0,1}. For b ∈ {0,1}, g^{-1}(b) is defined to be the set of strings x ∈ {0,1}^n for which (x, b) ∈ g and (x, 1−b) ∉ g. g^{-1}(0) ∪ g^{-1}(1) is referred to as the set of valid inputs to g. We assume that for all strings x outside g^{-1}(0) ∪ g^{-1}(1), both (x, 0) and (x, 1) are in g. For a string x ∈ g^{-1}(0) ∪ g^{-1}(1), g(x) refers to the unique bit b such that x ∈ g^{-1}(b). All the probability distributions over the domain of a partial Boolean function in this paper are assumed to be supported entirely on the set of valid inputs. Thus g(x) is well-defined for any x in the support of such a distribution.

###### Definition 1 (Bounded-error Randomized Query Complexity).

Let S be any set, f ⊆ {0,1}^n × S be any relation and ϵ ∈ [0, 1/2). The 2-sided error randomized query complexity R_ϵ(f) is the minimum number of queries made in the worst case by a randomized query algorithm A (the worst case is over inputs and the internal randomness of A) that on each input x satisfies Pr[(x, A(x)) ∈ f] ≥ 1 − ϵ (where the probability is over the internal randomness of A).

###### Definition 2 (Distributional Query Complexity).

Let f ⊆ {0,1}^n × S be any relation, μ a distribution on the input space of f, and ϵ ∈ [0, 1/2). The distributional query complexity D^μ_ϵ(f) is the minimum number of queries made in the worst case (over inputs) by a deterministic query algorithm A for which Pr_{x∼μ}[(x, A(x)) ∈ f] ≥ 1 − ϵ.

In particular, if g is a (partial) Boolean function and A is a randomized or distributional query algorithm computing g with error ϵ, then Pr[A(x) ≠ g(x)] ≤ ϵ, where the probability is over the respective sources of randomness.

The following theorem is von Neumann’s minimax principle stated for decision trees.

###### Fact 1 (minimax principle).

For any integer n, set S, and relation h ⊆ {0,1}^n × S,

 R_ϵ(h) = max_μ D^μ_ϵ(h).

Let μ be a probability distribution over {0,1}^n. x ∼ μ means that x is a random string drawn from μ. Let C ⊆ {0,1}^n be arbitrary. Then μ∣C is defined to be the probability distribution obtained by conditioning μ on the event that the sampled string belongs to C, i.e.,

 μ∣C(x) = 0 if x ∉ C, and μ∣C(x) = μ(x) / ∑_{y∈C} μ(y) if x ∈ C.

For a partial Boolean function g, probability distribution μ and bit b ∈ {0,1},

 μ_b := μ∣g^{-1}(b).

Notice that μ_0 and μ_1 are defined with respect to some partial Boolean function g, which will always be clear from the context.

###### Definition 3 (Subcube, Co-dimension).

A subset C of {0,1}^n is called a subcube if there exist a set of indices S ⊆ [n] and an assignment function A : S → {0,1} such that C = {x ∈ {0,1}^n : x_i = A(i) for all i ∈ S}. The co-dimension codim(C) of C is defined to be |S|.
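As an illustration, a subcube can be represented concretely by the assignment A on its fixed index set S. The following sketch is ours, not the paper's; the helper names are hypothetical.

```python
from itertools import product

def in_subcube(x, assignment):
    """x: a bit-string as a tuple; assignment: dict mapping each fixed index i in S to A(i)."""
    return all(x[i] == b for i, b in assignment.items())

def codim(assignment):
    """Co-dimension of the subcube = |S|, the number of fixed coordinates."""
    return len(assignment)

# Fix x_0 = 1 and x_2 = 0 inside {0,1}^4: a subcube of co-dimension 2
# containing 2^(4-2) = 4 strings.
assignment = {0: 1, 2: 0}
members = [x for x in product((0, 1), repeat=4) if in_subcube(x, assignment)]
assert codim(assignment) == 2 and len(members) == 4
```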

Now we define composition of two relations.

###### Definition 4 (Composition of relations).

We now reproduce from Section 1 the definition of composed relations. Let f ⊆ {0,1}^n × S and g ⊆ {0,1}^m × {0,1} be two relations. The composed relation f ∘ g^n ⊆ ({0,1}^m)^n × S is defined as follows: for x = (x^{(1)}, …, x^{(n)}) ∈ ({0,1}^m)^n and s ∈ S, (x, s) ∈ f ∘ g^n if and only if there exists b = (b_1, …, b_n) ∈ {0,1}^n such that for each i ∈ [n], (x^{(i)}, b_i) ∈ g and (b, s) ∈ f.
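For total functions f and g, the composition above specializes to evaluating g block-wise and feeding the resulting bit-string to f. A minimal sketch (the OR/XOR toy example is our own choice, not from the paper):

```python
def compose(f, g):
    """(f ∘ g^n) for total Boolean functions: apply g to each block, then f."""
    return lambda blocks: f(tuple(g(x) for x in blocks))

f = lambda b: b[0] | b[1]        # OR on 2 bits (outer function)
g = lambda x: x[0] ^ x[1]        # XOR on 2 bits (inner function)
h = compose(f, g)                # OR ∘ XOR^2 on 2 blocks of 2 bits each

# h(((0,1),(0,0))) = OR(XOR(0,1), XOR(0,0)) = OR(1, 0) = 1
assert h(((0, 1), (0, 0))) == 1
assert h(((1, 1), (0, 0))) == 0
```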

We will often view a deterministic query algorithm as a binary decision tree. At each internal vertex v of the tree, an input variable is queried. Depending on the outcome of the query, the computation moves to a child of v. The child of v corresponding to outcome b of the query made at v is denoted by v_b.

It is well known that the set of inputs that lead the computation of a decision tree to a certain vertex forms a subcube. We will use the same symbol (e.g., v) to refer to a vertex as well as to the subcube associated with it.

The depth of a vertex v in a tree is the number of vertices on the unique path from the root of the tree to v. Thus, the depth of the root is 1.

###### Definition 5.

Let B be a decision tree on n bits. Let η_0 and η_1 be two probability distributions on {0,1}^n with disjoint supports. Let v be a vertex in B, and let variable x_i be queried at v. Then,

 Δ(v) := |Pr_{x∼η_0}[x_i = 0] − Pr_{x∼η_1}[x_i = 0]| if v ≠ ⊥, and Δ(v) := 1 if v = ⊥.

Note that Δ(v) is defined with respect to the distributions η_0 and η_1. In our applications, we will often consider a decision tree B, a partial Boolean function g and a probability distribution μ over the inputs; Δ(v), for a vertex v of B, will then be understood to be with respect to the distributions μ_0, μ_1.
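Concretely, Δ(v) compares the probabilities of reading a 0 at the queried variable under the two distributions. A small sketch (the distributions and names below are our own illustrative choices):

```python
from fractions import Fraction as F

def delta(i, eta0, eta1):
    """Δ at a vertex querying x_i: |Pr_{η0}[x_i = 0] − Pr_{η1}[x_i = 0]|.
    eta0, eta1: dicts mapping bit-string tuples to probabilities."""
    p0 = sum(p for x, p in eta0.items() if x[i] == 0)
    p1 = sum(p for x, p in eta1.items() if x[i] == 0)
    return abs(p0 - p1)

# η0 uniform on {00, 01}, η1 uniform on {10, 11}: disjoint supports.
eta0 = {(0, 0): F(1, 2), (0, 1): F(1, 2)}
eta1 = {(1, 0): F(1, 2), (1, 1): F(1, 2)}
assert delta(0, eta0, eta1) == 1   # the first bit separates η0 from η1 perfectly
assert delta(1, eta0, eta1) == 0   # the second bit carries no information
```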

###### Claim 2.

Let B be a decision tree on n bits, g a partial Boolean function, and let x be sampled from a distribution μ supported on g^{-1}(0) ∪ g^{-1}(1). Let v be a vertex in B, and let variable x_i be queried at v. Then,

 I(x_i : g(x) ∣ x ∈ v) ≥ 32 · (Pr[g(x) = 0 ∣ x ∈ v] · Pr[g(x) = 1 ∣ x ∈ v] · Δ(v))²,

where Δ(v) is with respect to the distributions μ_0, μ_1.

###### Proof of Claim 2.

Define b = g(x), and condition on the event x ∈ v. Let b ⊗ x_i denote the distribution over pairs of bits whose coordinates are independently distributed according to the marginal distributions of b and x_i respectively. We use the equivalence I(x_i : g(x) ∣ x ∈ v) = Div((b, x_i) || (b ⊗ x_i)). Now, an application of Pinsker's inequality implies that

 Div((b, x_i) || (b ⊗ x_i)) ≥ 2 · ||(b, x_i) − (b ⊗ x_i)||₁². (1)

Next, we bound ||(b, x_i) − (b ⊗ x_i)||₁. To this end, we fix bits z_1, z_2 ∈ {0,1} and bound |Pr[(b, x_i) = (z_1, z_2)] − Pr[(b ⊗ x_i) = (z_1, z_2)]|. We have that,

 Pr[(b, x_i) = (z_1, z_2)] = Pr[b = z_1] · Pr[x_i = z_2 ∣ b = z_1]. (2)

Now,

 Pr[(b ⊗ x_i) = (z_1, z_2)] = Pr[b = z_1] · Pr[x_i = z_2] = Pr[b = z_1] · (Pr[b = z_1] · Pr[x_i = z_2 ∣ b = z_1] + Pr[b = z̄_1] · Pr[x_i = z_2 ∣ b = z̄_1]). (3)

Taking the absolute difference of (2) and (3), we have that,

 |Pr[(b, x_i) = (z_1, z_2)] − Pr[(b ⊗ x_i) = (z_1, z_2)]| = Pr[b = z_1] · Pr[b = z̄_1] · Δ(v) = Pr[b = 0] · Pr[b = 1] · Δ(v). (4)

The claim follows by adding (4) over z_1, z_2 ∈ {0,1} and using (1). ∎
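The identity (4) and a Pinsker-type bound can be sanity-checked numerically. The sketch below uses our own illustrative parameters, and verifies Pinsker in the standard form D ≥ 2·TV² (KL divergence in nats, TV the total variation distance), which may differ in normalization from the form used above.

```python
from math import log

pb0 = 0.5                 # Pr[b = 0]
q0, q1 = 0.8, 0.3         # Pr[x_i = 0 | b = 0] and Pr[x_i = 0 | b = 1]
delta_v = abs(q0 - q1)    # Δ(v) as in Definition 5

# Joint law of (b, x_i) and the product of its marginals (b ⊗ x_i).
joint = {(0, 0): pb0 * q0, (0, 1): pb0 * (1 - q0),
         (1, 0): (1 - pb0) * q1, (1, 1): (1 - pb0) * (1 - q1)}
px0 = pb0 * q0 + (1 - pb0) * q1
prod = {(z1, z2): (pb0 if z1 == 0 else 1 - pb0) * (px0 if z2 == 0 else 1 - px0)
        for z1 in (0, 1) for z2 in (0, 1)}

# Equation (4): every cell of |joint − product| equals Pr[b=0]·Pr[b=1]·Δ(v).
for c in joint:
    assert abs(abs(joint[c] - prod[c]) - pb0 * (1 - pb0) * delta_v) < 1e-12

# Pinsker: KL divergence (in nats) is at least 2·TV².
kl = sum(p * log(p / prod[c]) for c, p in joint.items() if p > 0)
tv = 0.5 * sum(abs(joint[c] - prod[c]) for c in joint)
assert kl >= 2 * tv ** 2
```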

## 3 Conflict Complexity

In this section, we introduce a randomized process P (formally given in Algorithm 1). This process is going to play a central role in the proof of our composition theorem (Theorem 3). Later in the section, we use P to define the conflict complexity of a partial Boolean function g.

Let k be any integer and B be any deterministic query algorithm that runs on inputs in ({0,1}^n)^k. B can be thought of as just a query procedure that queries various input variables, and then terminates without producing any output. Let x = (x_1, …, x_k) be a generic input to B, and let x_i^{(j)} stand for the j-th bit of the block x_i. For a vertex v of B, v also denotes the subcube of inputs corresponding to v, i.e., the set of inputs that lead the computation of B to v. Recall from Section 2 that for b ∈ {0,1}, v_b stands for the child of v corresponding to the query outcome being b. Let μ_0 and μ_1 be any two probability distributions supported on g^{-1}(0) and g^{-1}(1) respectively. Let z = (z_1, …, z_k) ∈ {0,1}^k be arbitrary. Now consider the probabilistic process P given by Algorithm 1. Note that P can be thought of as a randomized query algorithm on input z, where a query to z_i corresponds to the assignment made in line 2. This view of P will be adopted in Section 5.

We now prove an important structural result about P which will be used many times in our proofs. Consider the following distribution γ on ({0,1}^n)^k: for each i ∈ [k], sample x_i independently from μ_{z_i}.

Let v be a vertex of B. Let A_B(v) be the event that the process P reaches node v, and B_B(v) be the event that for a random input sampled from γ, the computation of B reaches node v.

###### Claim 3.

For each vertex v of B,

 Pr[A_B(v)] = Pr[B_B(v)].
###### Proof.

We will prove the claim by induction on the depth of v, i.e., the number of vertices on the unique path from the root to v in B.

Base case:

The depth of v is 1, i.e., v is the root of B. Thus Pr[A_B(v)] = Pr[B_B(v)] = 1.

Inductive step:

Assume that v is at depth d > 1, and that the statement is true for all vertices at depth at most d − 1. Since d > 1, v is not the root of B. Let u be the parent of v, and let variable x_i^{(j)} be queried at u. Without loss of generality assume that v is the child of u corresponding to x_i^{(j)} = 0. We split the proof into the following two cases.

• Case 1:

Condition on A_B(u). The probability that P reaches v is exactly equal to the probability that the real number sampled at u lies in the interval [0, Pr_{x_i∼μ_{z_i}}[x_i^{(j)} = 0 ∣ x_i ∈ u_i]], which is equal to Pr_{x_i∼μ_{z_i}}[x_i^{(j)} = 0 ∣ x_i ∈ u_i]. Thus,

 Pr[A_B(v)] = Pr[A_B(u)] · Pr[A_B(v) ∣ A_B(u)] = Pr[A_B(u)] · Pr_{x_i∼μ_{z_i}}[x_i^{(j)} = 0 ∣ x_i ∈ u_i]. (5)

Now condition on B_B(u). The probability that B reaches v is exactly equal to the probability that x_i^{(j)} = 0 when x is sampled according to the distribution γ conditioned on the event that x ∈ u. Note that in the distribution γ, the x_i's are independently distributed. Thus,

 Pr[B_B(v)] = Pr[B_B(u)] · Pr[B_B(v) ∣ B_B(u)] = Pr[B_B(u)] · Pr_{x_i∼μ_{z_i}}[x_i^{(j)} = 0 ∣ x_i ∈ u_i]. (6)

By the inductive hypothesis, Pr[A_B(u)] = Pr[B_B(u)]. The claim follows from (5) and (6).

• Case 2: Let v′ be the child of u corresponding to x_i^{(j)} = 1. By an argument similar to Case 1, we have that

 Pr[A_B(v′)] = Pr[B_B(v′)]. (7)

Now,

 Pr[A_B(v)] = Pr[A_B(u)] − Pr[A_B(v′)]
  = Pr[B_B(u)] − Pr[A_B(v′)]   (by the inductive hypothesis)
  = Pr[B_B(u)] − Pr[B_B(v′)]   (by (7))
  = Pr[B_B(v)]. ∎

Let k = 1, and let B be a decision tree that computes g. Consider the process P on B. Note that N is set with probability 1. To see this, observe that as long as N is not set, the current subcube contains strings from the supports of both μ_0 and μ_1, and hence from both g^{-1}(0) and g^{-1}(1). If N were not set for the entire run of P, then there would exist inputs from g^{-1}(0) and g^{-1}(1) which belong to the same leaf of B, contradicting the hypothesis that B computes g. Let the random variable N stand for the value of the variable after the termination of P; N is equal to the index of the iteration of the while loop in which it is set. The distribution of N depends on μ_0, μ_1 and B, which in our applications will either be clear from the context or clearly specified. Note that the distribution of N is independent of the value of z.

###### Definition 6.

The conflict complexity of a partial Boolean function g with respect to distributions μ_0 and μ_1 (supported on g^{-1}(0) and g^{-1}(1) respectively) and a decision tree B computing g is defined as:

 χ(μ_0, μ_1, B) = E[N] (as observed before, the choices of μ_0, μ_1 and B are built into the definition of N).

The conflict complexity of g is defined as:

 χ(g) = max_{μ_0, μ_1} min_B χ(μ_0, μ_1, B),

where the maximum is over distributions μ_0 and μ_1 supported on g^{-1}(0) and g^{-1}(1) respectively, and the minimum is over decision trees B computing g.

For a pair of distributions (μ_0, μ_1), let B* be a decision tree computing g such that χ(μ_0, μ_1, B*) is minimized. We call such a decision tree an optimal decision tree for (μ_0, μ_1). We conclude this section by making an important observation about the structure of optimal decision trees. Let v be any node of B*. Let μ_0' = μ_0∣v and μ_1' = μ_1∣v. Let B*_v denote the subtree of B* rooted at v. We observe that B*_v is an optimal tree for μ_0' and μ_1': if it were not, we could replace it by an optimal tree for μ_0' and μ_1', and for the resultant tree the expected value of N with respect to μ_0 and μ_1 would be smaller than that in B*, contradicting the optimality of B*. This recursive sub-structure property of optimal trees will be helpful to us.

## 4 Conflict Complexity and Randomized Query Complexity

In this section, we will prove Theorem 2, restated below.

Theorem 2 (restated). For any partial Boolean function g, χ(g) = Ω(√(R_1/3(g))).

###### Proof.

We will bound the distributional query complexity of g from above by O(χ(g)²), for each input distribution μ and with respect to a fixed constant error below 1/2. Theorem 2 will then follow from the minimax principle (Fact 1), and the observation that the error can be brought down to 1/3 by constantly many independent repetitions followed by a selection of the majority of the answers. It is enough to consider distributions supported on valid inputs of g. To this end, fix a distribution μ supported only on g^{-1}(0) ∪ g^{-1}(1).
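The amplification step invoked here is standard majority voting. The sketch below (the starting error value is our own illustrative choice) computes the exact error of the majority over k independent runs:

```python
from math import comb

def majority_error(eps, k):
    """Error of the majority vote over k independent runs, each failing
    independently with probability eps (k odd): Pr[Binomial(k, eps) > k/2]."""
    return sum(comb(k, j) * eps**j * (1 - eps)**(k - j)
               for j in range(k // 2 + 1, k + 1))

# Any constant error below 1/2 is driven below 1/3 by O(1) repetitions.
eps = 0.45                 # illustrative constant error, below 1/2
assert majority_error(eps, 99) < 1 / 3
```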

Let d := χ(g). For b ∈ {0,1}, let μ_b be the distribution obtained by conditioning μ on the event g(x) = b. Let B be an optimal decision tree for the distributions μ_0 and μ_1. Clearly E[N] = χ(μ_0, μ_1, B) ≤ d.

We first prove some structural results about B. Let B be run on a random input x sampled according to μ. Let v_t be the random vertex at which the t-th query is made; if B terminates before making t queries, define v_t := ⊥. Let E be any event which is a collection of possible transcripts of B, such that Pr[Ē] ≤ 1/4. Recall from Section 2 that for any vertex v of B, Δ(v) is assumed to be with respect to the probability distributions μ_0, μ_1.

###### Claim 4.
 ∑_{t=1}^{10d} E[Δ(v_t) ∣ E] ≥ 13/20.
###### Proof.

Let us sample vertices u_1, u_2, … of B as follows:

1. Sample a bit z ∈ {0,1} with Pr[z = b] = Pr_{x∼μ}[g(x) = b].

2. Run the process P for B with the distributions μ_0, μ_1 and the bit z.

3. Let u_t be the vertex at the beginning of the t-th iteration of the while loop of Algorithm 1. Return u_1, u_2, …; if the simulation stops after T iterations, set u_t := ⊥ for all t > T.

By Claim 3, and since (when z is sampled as above) the input generated by μ_z has the same distribution as x sampled from μ, the vertices u_t and v_t have the same distribution. In the above sampling process, for each t, let E_t be the event that N is not set before the beginning of the t-th iteration of the while loop of Algorithm 1. Conditioned on E_t, the probability that N is set to t in the t-th iteration is Δ(u_t) (note that conditioned on E_t, u_t ≠ ⊥). By a union bound we have that,

 ∑_{t=1}^{10d} E[Δ(v_t) ∣ E] = ∑_{t=1}^{10d} E[Δ(u_t) ∣ E]
  ≥ ∑_{t=1}^{10d} Pr[E_t ∣ E] · E[Δ(u_t) ∣ E_t, E]
  ≥ Pr[¬(⋂_{t=1}^{10d} E_t) ∣ E]
  ≥ Pr[¬(⋂_{t=1}^{10d} E_t)] − Pr[Ē]. (8)

Now, since E[N] ≤ d, we have by Markov's inequality that the probability that the process P, when run for B with the random bit z generated as above (recall that the distribution of N is independent of the value of z), sets N within the first 10d iterations of the while loop is at least 9/10. Thus we have that,

 Pr[¬(⋂_{t=1}^{10d} E_t)] ≥ 9/10. (9)

The claim follows from (8), (9) and the hypothesis Pr[Ē] ≤ 1/4. ∎

The next Lemma follows from Claim 4 and the recursive sub-structure property of optimal trees discussed in the last paragraph of Section 3.

###### Lemma 5.

Let i be any positive integer. Then,

 ∑_{t=1}^{10di} E[Δ(v_t) ∣ E] ≥ 13i/20.

Notice that if B terminates before making t queries, then v_t = ⊥ and Δ(v_t) = 1.

###### Proof of Lemma 5.

For j ∈ {0, 1, …, i−1}, let u be any vertex of B at depth 10jd. Consider the subtree B_u of B rooted at u. By the recursive sub-structure property of B, B_u is an optimal tree for the distributions μ_0∣u, μ_1∣u. Let w_t be the random vertex of B_u at which the t-th query is made, when B_u is run on a random input from μ∣u. By Claim 4, we have that,

 ∑_{t=1}^{10d} E[Δ(w_t) ∣ E] ≥ 13/20. (10)

In (10), Δ is with respect to the distributions (μ∣u)_0, (μ∣u)_1. Now, when u is the random vertex v_{10jd}, w_t is the random vertex v_{10jd+t}. Thus from (10) we have that,

 ∑_{t=10jd+1}^{10(j+1)d} E[Δ(v_t) ∣ E] ≥ 13/20. (11)

The lemma follows by adding (11) over j = 0, 1, …, i−1. ∎

We now finish the proof of Theorem 2 by exhibiting a decision tree of depth O(d²) that computes g with constant error below 1/2 with respect to μ. Let x be distributed according to μ, and let B be run on x. Let E_1 denote the event that in at most 10d² queries, the computation of B reaches a vertex v for which min_{b∈{0,1}} Pr_{x∼μ}[g(x) = b ∣ x ∈ v] ≤ 1/3. Let E_2 denote the event that B terminates after making at most 10d² queries. Let E := E_1 ∨ E_2.

Consider the following decision tree T: start simulating B, and terminate the simulation if one of the following events occurs. The output in each case is specified below.

1. (Event E_2) If B terminates, terminate and output what B outputs.

2. If 10d² queries have been made and the computation is at a vertex v, terminate and output the bit b maximizing Pr_{x∼μ}[g(x) = b ∣ x ∈ v].

By construction, T makes at most 10d² queries in the worst case. We shall show that T computes g with error probability bounded away from 1/2 by a constant. This will prove Theorem 2.

We split the proof into the following two cases.

Case 1:

First, condition on the event that the computation reaches a vertex v for which E_1 holds. Thus one of Pr_{x∼μ}[g(x) = 0 ∣ x ∈ v] and Pr_{x∼μ}[g(x) = 1 ∣ x ∈ v] is at most 1/3. Hence, |Pr_{x∼μ}[g(x) = 0 ∣ x ∈ v] − Pr_{x∼μ}[g(x) = 1 ∣ x ∈ v]| ≥ 1/3. Let m be the random leaf of the subtree of B rooted at v at which the computation ends. The probability that T errs is at most

 E_{x∼μ∣v}[1/2 − (1/2) · |Pr_{x∼μ}[g(x) = 0 ∣ x ∈ m] − Pr_{x∼μ}[g(x) = 1 ∣ x ∈ m]|]
  ≤ 1/2 − (1/2) · |E_{x∼μ∣v} Pr_{x∼μ}[g(x) = 0 ∣ x ∈ m] − E_{x∼μ∣v} Pr_{x∼μ}[g(x) = 1 ∣ x ∈ m]|   (by Jensen's inequality)
  = 1/2 − (1/2) · |Pr_{x∼μ}[g(x) = 0 ∣ x ∈ v] − Pr_{x∼μ}[g(x) = 1 ∣ x ∈ v]| ≤ 1/3.

Then, condition on the event E_2. The probability that T errs in this case is 0, since B computes g and μ is supported on valid inputs.

Thus we have shown that conditioned on E, the probability that T errs is at most 1/3. Thus the probability that T errs is at most 1/3 + Pr[Ē].

Case 2:

By Lemma 5 (applied with i = d) we have that

 ∑_{t=1}^{10d²} E[Δ(v_t) ∣ E] ≥ 13d/20. (12)

Let a_t be the tuple formed by the random input variable queried at the t-th step by B together with the outcome b_t of the query; if B terminates before the t-th step, a_t := ⊥. Notice that the vertex v_t at which the t-th query is made is determined by a_1, …, a_{t−1}, and vice versa. We have,

 I(a_1, …, a_{10d²} : g(x))
  = ∑_{i=1}^{10d²} I(a_i : g(x) ∣ a_1, …, a_{i−1})   (chain rule of mutual information)
  = ∑_{i=1}^{10d²} I(b_i : g(x) ∣ v_i)
  ≥ 32 · ∑_{i=1}^{10d²} E[1_{v_i ≠ ⊥} · (Pr[g(x) = 0 ∣ x ∈ v_i] · Pr[g(x) = 1 ∣ x ∈ v_i] · Δ(v_i))²]   (from Claim 2)
  ≥ 32 · ∑_{i=1}^{10d²} Pr[E] · E[(Pr[g(x) = 0 ∣ x ∈ v_i] · Pr[g(x) = 1 ∣ x ∈ v_i] · Δ(v_i))² ∣ E]   (conditioned on E, v_i ≠ ⊥)
  ≥ 32 · ∑_{i=1}^{10d²} (3/4) · (1/9) · E[Δ(v_i)² ∣ E]
  = (8/3) · ∑_{i=1}^{10d²} E[Δ(v_i)² ∣ E]   (by the assumption Pr[Ē] ≤ 1/4)
  ≥ (8/3) · (1/(10d²)) · (∑_{i=1}^{10d²} E[Δ(v_i) ∣ E])²   (by the Cauchy–Schwarz inequality)
  ≥ 1/10.   (from (12)) (13)

Hence, from (13) we have

 H(g(x) ∣ a_1, …, a_{10d²}) ≤ 1 − 1/10 = 9/10. (14)

Let be the set of leaves of such that . For each , . Conditioned on , the probability that errs is at most . By Markov’s inequality and (14), it follows that . Thus