When Is Amplification Necessary for Composition in Randomized Query Complexity?

Suppose we have randomized decision trees for an outer function $f$ and an inner function $g$. The natural approach for obtaining a randomized decision tree for the composed function $(f\circ g^n)(x^1,\dots,x^n)=f(g(x^1),\dots,g(x^n))$ involves amplifying the success probability of the decision tree for $g$, so that a union bound can be used to bound the error probability over all the coordinates. The amplification introduces a logarithmic-factor overhead in cost. We study the question: When is this log factor necessary? We show that when the outer function is parity or majority, the log factor can be necessary, even for models that are more powerful than plain randomized decision trees. Our results are related to, but qualitatively strengthen in various ways, known results about decision trees with noisy inputs.

1 Introduction

A deterministic decision tree for computing a partial function $f\colon\{0,1\}^n\to Z$ is a binary tree where each internal node is labeled with an index from $[n]$ and each leaf is labeled with an output value from $Z$. On input $x\in\{0,1\}^n$, the computation follows a root-to-leaf path where at a node labeled with index $i$, the value of $x_i$ is queried and the path goes to the left child if $x_i=0$ and to the right child if $x_i=1$. The leaf reached on input $x$ must be labeled with the value $f(x)$ (if the latter is defined). The cost of the decision tree is its depth, i.e., the maximum number of queries it makes over all inputs. The deterministic query complexity of $f$ is the minimum cost of any deterministic decision tree that computes $f$. We will consider several more general models of decision trees (randomized, etc.), so we repurpose traditional complexity class notation to refer to the various associated query complexity measures. Since P is the traditional complexity class corresponding to deterministic computation, we let $\mathrm{P}(f)$ denote the deterministic query complexity of $f$. (Some of the recent literature uses the notation $\mathrm{P}^{\mathrm{dt}}(f)$, but this paper deals exclusively with decision trees, so we drop the dt superscript.)

A randomized decision tree is a probability distribution over deterministic decision trees. Computing $f$ with error $\varepsilon$ means that for every input $x$ (for which $f(x)$ is defined), the probability that the output is not $f(x)$ is at most $\varepsilon$. The cost of a randomized decision tree is the maximum depth of all the deterministic trees in its support. The randomized query complexity $\mathrm{BPP}_\varepsilon(f)$ is the minimum cost of any randomized decision tree that computes $f$ with error $\varepsilon$. When we write $\mathrm{BPP}(f)$ with no $\varepsilon$ specified, we mean $\varepsilon=1/3$. A basic fact about randomized computation is that the success probability can be amplified, with a multiplicative overhead in cost, by running several independent trials and taking the majority vote of the outputs: $\mathrm{BPP}_\varepsilon(f)\le O(\mathrm{BPP}(f)\cdot\log(1/\varepsilon))$. See [BdW02] for a survey of classic results on query complexity.
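To make the amplification bound concrete, here is a minimal Python sketch (our own illustration, not taken from the paper) that computes the exact error of a majority vote over $k$ independent runs, each wrong with probability $1/3$, and searches for the smallest odd $k$ that reaches a target error $\varepsilon$. The helper names `majority_error` and `trials_needed` are hypothetical; the point is only that the required $k$ grows like $\log(1/\varepsilon)$.

```python
from math import comb

def majority_error(k: int, p: float) -> float:
    """Exact probability that a majority vote over k independent runs (k odd)
    is wrong, when each run is wrong independently with probability p."""
    assert k % 2 == 1, "use an odd number of runs so there are no ties"
    # The vote is wrong iff strictly more than k/2 of the runs are wrong.
    return sum(comb(k, j) * p**j * (1 - p) ** (k - j)
               for j in range(k // 2 + 1, k + 1))

def trials_needed(eps: float, p: float = 1 / 3) -> int:
    """Smallest odd k whose majority vote has error at most eps."""
    k = 1
    while majority_error(k, p) > eps:
        k += 2
    return k

for eps in (1e-2, 1e-4, 1e-8):
    print(f"eps={eps:g}: k={trials_needed(eps)}")
```

Squaring the target error (from $10^{-4}$ to $10^{-8}$) roughly doubles $k$, matching the $O(\log(1/\varepsilon))$ overhead.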

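If $f\colon\{0,1\}^n\to Z$ and $g\colon\{0,1\}^m\to\{0,1\}$ are two partial functions, their composition is $f\circ g^n\colon(\{0,1\}^m)^n\to Z$ where $(f\circ g^n)(x^1,\dots,x^n):=f(g(x^1),\dots,g(x^n))$ (which is defined iff $g(x^i)$ is defined for all $i$ and $f(g(x^1),\dots,g(x^n))$ is defined). How does the randomized query complexity of $f\circ g^n$ depend on the randomized query complexities of $f$ and $g$? A simple observation is that to design a randomized decision tree for $f\circ g^n$, we can take a $1/6$-error randomized decision tree for $f$ and replace each query (say to the $i$th input bit of $f$) with a $1/(6n)$-error randomized decision tree for evaluating $g(x^i)$. By a union bound, with probability at least $5/6$ all of the (at most $n$) evaluations of $g$ return the correct answer, and so with probability at least $2/3$ the final evaluation of $f$ is also correct. Since $\mathrm{BPP}_{1/(6n)}(g)\le O(\mathrm{BPP}_{1/n}(g))\le O(\mathrm{BPP}(g)\cdot\log n)$, we can write this upper bound as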

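$$\mathrm{BPP}(f\circ g^n)\;\le\;O\big(\mathrm{BPP}(f)\cdot\mathrm{BPP}_{1/n}(g)\big)\;\le\;O\big(\mathrm{BPP}(f)\cdot\mathrm{BPP}(g)\cdot\log n\big).\qquad(1)$$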
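The following sketch simulates the upper bound (1) for one illustrative choice that we assume for concreteness: $f=\textsc{Xor}$, computed by querying all $n$ coordinates (so the outer tree itself never errs), and $g=\textsc{GapMaj}$, whose 1-query tree outputs a uniformly random input bit and errs with probability exactly $1/3$. Each inner evaluation is amplified by majority vote until its error is at most $1/(3n)$, so a union bound keeps the total error at most $1/3$. All function names and parameters are hypothetical.

```python
import math
import random

def majority_error(k: int, p: float) -> float:
    """Exact error of a majority vote over k (odd) independent p-error runs."""
    return sum(math.comb(k, j) * p**j * (1 - p) ** (k - j)
               for j in range(k // 2 + 1, k + 1))

def inner_repetitions(n: int, p: float = 1 / 3) -> int:
    """Smallest odd k whose majority vote has error <= 1/(3n), so a union bound
    over the n inner evaluations keeps the total error <= 1/3."""
    k = 1
    while majority_error(k, p) > 1 / (3 * n):
        k += 2
    return k

def random_gap_maj_input(m: int, bit: int, rng: random.Random) -> list:
    """Uniform x in {0,1}^m with Hamming weight m/3 (bit=0) or 2m/3 (bit=1)."""
    ones = m // 3 if bit == 0 else 2 * m // 3
    x = [1] * ones + [0] * (m - ones)
    rng.shuffle(x)
    return x

def amplified_gap_maj(x: list, k: int, rng: random.Random) -> int:
    """k independent runs of the 1-query tree (read a random bit), majority vote."""
    ones = sum(x[rng.randrange(len(x))] for _ in range(k))
    return int(2 * ones > k)

def estimate_error(n: int, m: int, trials: int, seed: int = 0) -> float:
    """Empirical error of the composed tree for Xor o GapMaj^n."""
    rng = random.Random(seed)
    k = inner_repetitions(n)
    wrong = 0
    for _ in range(trials):
        y = [rng.randrange(2) for _ in range(n)]                # hidden values g(x^i)
        blocks = [random_gap_maj_input(m, b, rng) for b in y]
        guess = sum(amplified_gap_maj(x, k, rng) for x in blocks) % 2  # outer Xor
        wrong += guess != sum(y) % 2
    return wrong / trials

for n in (8, 64, 512):
    print(n, inner_repetitions(n))   # grows roughly like log n
print(estimate_error(n=32, m=30, trials=300))
```

The repetition count per coordinate, and hence the total cost, grows with $\log n$; the question studied here is when this growth cannot be avoided.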

When is this tight? It will take some effort to suitably formulate this question. We begin by reviewing known related results.

1.1 When is amplification necessary?

As for general lower bounds (that hold for all $f$ and $g$), much work has gone into proving lower bounds on $\mathrm{BPP}(f\circ g^n)$ in terms of complexity measures of $f$ and $g$ that are defined using models more powerful than plain randomized query complexity [GJ16, AGJ17, BK18, BDG20, BB20]. In terms of just $\mathrm{BPP}(f)$ and $\mathrm{BPP}(g)$, the state of the art is that $\mathrm{BPP}(f\circ g^n)\ge\Omega(\mathrm{BPP}(f)\cdot\sqrt{\mathrm{BPP}(g)})$ for all $f$ and $g$ [GLSS19]. Furthermore, it is known that the latter bound is sometimes tight: There exist partial boolean functions $f$ and $g$ for which $\mathrm{BPP}(f\circ g^n)\le O(\mathrm{BPP}(f)\cdot\sqrt{\mathrm{BPP}(g)})$ [GLSS19, BB20]. Thus (1) is far from being always tight, even without worrying about the need for amplification. However, it remains plausible that $\mathrm{BPP}(f\circ g^n)\ge\Omega(\mathrm{BPP}(f)\cdot\mathrm{BPP}(g))$ holds for all total $f$ and all partial $g$. We take this as a working conjecture in this paper. This conjecture has been confirmed for some specific outer functions $f$, such as the identity function Id [JKS10] (this is called a "direct sum" result) and the boolean functions Or, Xor (parity), and Maj (majority) [GJPW18]. These results, however, do not address the need for amplification in the upper bound (1). To formulate our question of whether (1) is tight, a first draft could be:

 Question A, with respect to a particular f:   Is (1) tight for all partial functions g?

This is not quite a fair question, for at least two reasons:

• Regarding the first inequality in (1): The simple upper bound actually shows $\mathrm{BPP}(f\circ g^n)\le O(\mathrm{BPP}(f)\cdot\mathrm{BPP}_{1/\mathrm{BPP}(f)}(g))$ (the union bound is only over queries that take place, not over all possible queries). So for simplicity, let us restrict our attention to $f$ satisfying $\mathrm{BPP}(f)=\Theta(n)$, which is the case for Id, Or, Xor, and Maj.

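• Regarding the second inequality in (1): Some functions $g$ satisfy $\mathrm{BPP}_{1/n}(g)\le O(\mathrm{BPP}(g))$ (e.g., if $\mathrm{P}(g)\le O(\mathrm{BPP}(g))$). So for simplicity, let us restrict our attention to $g$ satisfying $\mathrm{BPP}_{1/n}(g)\ge\Omega(\mathrm{BPP}(g)\cdot\log n)$, which (as we show later) is the case for two partial functions GapOr and GapMaj defined as follows ($|x|$ denotes the Hamming weight of $x\in\{0,1\}^m$):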

$$\textsc{GapOr}(x):=\begin{cases}0&\text{if }|x|=0\\1&\text{if }|x|=m/2\end{cases}\qquad\text{and}\qquad\textsc{GapMaj}(x):=\begin{cases}0&\text{if }|x|=m/3\\1&\text{if }|x|=2m/3.\end{cases}$$

Thus, a better formulation of Question A would be: Assuming $\mathrm{BPP}(f)=\Theta(n)$, is (1) tight for all partial $g$ satisfying $\mathrm{BPP}_{1/n}(g)\ge\Omega(\mathrm{BPP}(g)\cdot\log n)$? Even with these caveats, the answer is always "no." It will be instructive to examine a counterexample. Let $h\colon\{0,1\}^2\to\{0,1\}$ be the partial function such that $h(y)$ indicates the location of the unique 1 in $y$, under the promise that $|y|=1$, and let $g:=h\circ\textsc{GapOr}^2$. Then $g$ takes an input of length $2m$ with the promise that there are exactly $m/2$ many 1s, either all in the left half or all in the right half, and outputs which half has the 1s. It turns out $\mathrm{BPP}(g)\le O(1)$ and $\mathrm{BPP}_{1/n}(g)\ge\Omega(\log n)$ provided $m$ is at least a sufficiently large constant times $\log n$ (for similar reasons as GapOr itself), and yet $\mathrm{BPP}(f\circ g^n)\le O(\mathrm{BPP}(f))$ for all $f$: To compute $f\circ g^n$, we can run an optimal randomized decision tree for $f$ and whenever it queries $g(x^i)$, we repeatedly query uniformly random bit positions of $x^i$ until we find a 1 (so the value of $g(x^i)$ is determined by which half we found a 1 in). This has the same error probability as the randomized decision tree for $f$, and the total number of queries to the bits of $x$ is $O(\mathrm{BPP}(f))$ in expectation, because for each $i$ it takes $O(1)$ queries in expectation to locate a 1 in $x^i$ (a quarter of its $2m$ positions hold a 1). By Markov's inequality, with high constant probability this halts after only $O(\mathrm{BPP}(f))$ total queries. Thus by aborting the computation if it attempts to make too many queries, we obtain a randomized decision tree for $f\circ g^n$ that always makes $O(\mathrm{BPP}(f))$ queries, with only a small hit in the error probability.
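A small simulation of the counterexample, under our own naming: `which_half` is a hypothetical implementation of the zero-error strategy described above for $g=h\circ\textsc{GapOr}^2$ (with $h$ the "which half" function). Since exactly a quarter of the $2m$ positions hold a 1, the number of queries is geometric with mean 4, independent of $n$ and $m$; this is a sketch under that promise, not code from the paper.

```python
import random

def random_instance(m: int, side: int, rng: random.Random) -> list:
    """Input of length 2m for g = h o GapOr^2: m/2 ones, all in the left half
    (side=0) or all in the right half (side=1). Assumes m is even."""
    half = [1] * (m // 2) + [0] * (m // 2)
    rng.shuffle(half)
    empty = [0] * m
    return half + empty if side == 0 else empty + half

def which_half(x: list, rng: random.Random) -> tuple:
    """Zero-error strategy: query uniformly random positions until a 1 appears,
    then report which half it was in. Returns (answer, number_of_queries)."""
    queries = 0
    while True:
        i = rng.randrange(len(x))
        queries += 1
        if x[i] == 1:
            return int(i >= len(x) // 2), queries

def mean_queries(m: int, trials: int, seed: int = 0) -> float:
    """Average query count over random promise inputs (alternating halves)."""
    rng = random.Random(seed)
    total = 0
    for t in range(trials):
        side = t % 2
        answer, q = which_half(random_instance(m, side, rng), rng)
        assert answer == side            # the strategy never errs
        total += q
    return total / trials

print(mean_queries(m=40, trials=2000))   # close to 4
```

A tree for $f\circ g^n$ would call `which_half` once per query of $f$ and abort once the running total exceeds a constant multiple of $\mathrm{BPP}(f)$, as in the Markov argument.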

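Blais and Brody [BB19] adjust the statement of Question A so the answer becomes "yes" in the case $f=\textsc{Id}$. Specifically, they weaken the right-hand side in such a way that the above counterexample is ruled out. Defining $\overline{\mathrm{BPP}}_\varepsilon(f)$ (for which [BB19] used different notation) similarly to $\mathrm{BPP}_\varepsilon(f)$ but where the cost of a randomized decision tree is the maximum over all inputs $x$ (on which $f$ is defined) of the expected number of queries, we now have $\overline{\mathrm{BPP}}_{1/n}(g)\le O(1)$ for the $g$ from the counterexample. The theorem from [BB19] is that $\mathrm{BPP}(f\circ g^n)\ge\Omega(\mathrm{BPP}(f)\cdot\overline{\mathrm{BPP}}_{1/n}(g))$ when $f=\textsc{Id}$, in other words, $\mathrm{BPP}(\textsc{Id}\circ g^n)\ge\Omega(n\cdot\overline{\mathrm{BPP}}_{1/n}(g))$ (a "strong direct sum" result). [BB19] also explicitly asked whether similar results hold for other functions $f$. The corresponding conjecture for $f=\textsc{Or}$ is false (as we note below) while for $f=\textsc{Xor}$ and $f=\textsc{Maj}$ it remains open.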

To make progress, we step back and ask a seemingly more innocuous version of the question:

 Question B, with respect to a particular f:   Is (1) tight for some partial function g?

It turns out the answer is "no" for $f=\textsc{Or}$ and is "yes" for both $f=\textsc{Xor}$ and $f=\textsc{Maj}$.

1.2 Decision trees with noisy inputs

Question B is related to "query complexity with noisy inputs" (introduced in [FRPU94]), so let us review the latter model: When input bit $z_i$ is queried, the wrong bit value $1-z_i$ is returned to the decision tree with some probability at most $1/3$ (and the correct value $z_i$ is returned with the remaining probability). The "noise events" are independent across all queries, including multiple queries to the same input bit. Now the adversary gets to pick not only the input, but also the "noise probabilities." [FRPU94] distinguishes between two extreme possibilities: A static adversary has a single common noise probability for all queries, while a dynamic adversary can choose a different noise probability for each node in the decision tree. In this paper we make a reasonable compromise: The adversary gets to choose a tuple of noise probabilities $(\eta_1,\dots,\eta_n)$, and each query to $z_i$ returns $1-z_i$ with probability exactly $\eta_i$. When a randomized decision tree computes $f$ on noisy inputs with error probability $\varepsilon$, that means for every input $z$ and every noise probability tuple (with $\eta_i\le1/3$ for each $i$), the output is $f(z)$ with probability at least $1-\varepsilon$ over the random noise and randomness of the decision tree. We invent the notation $\mathrm{BPP}^*_\varepsilon(f)$ for the minimum cost of any randomized decision tree that computes $f$ on noisy inputs, with error probability $\varepsilon$. We have $\mathrm{BPP}^*(f)\le O(\mathrm{BPP}(f)\cdot\log n)$ by repeating each query $O(\log n)$ times and taking the majority vote (to drive the noise probabilities down to $1/(6n)$), and using a union bound to absorb the noise probabilities into the error probability. The connection with composition is that $\mathrm{BPP}(f\circ g^n)\le\mathrm{BPP}^*(f)\cdot\mathrm{BPP}(g)$, because to design a randomized decision tree for $f\circ g^n$, we can take a $1/3$-error randomized decision tree for $f$ with noisy inputs, and replace each query (say to $z_i$) with a $1/3$-error randomized decision tree for evaluating $g(x^i)$.
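As a sketch of the repetition strategy behind $\mathrm{BPP}^*(f)\le O(\mathrm{BPP}(f)\cdot\log n)$, take the illustrative case $f=\textsc{Xor}$ (our choice): read every bit $k$ times through the noisy oracle, take the majority per bit, and output the parity. The error can be computed exactly, since the parity is wrong iff an odd number of coordinates are recovered wrongly. The function names are hypothetical, and we evaluate the worst case $\eta_i=1/3$ for all $i$.

```python
import math
from functools import lru_cache

@lru_cache(maxsize=None)
def majority_error(k: int, p: float) -> float:
    """Exact error of a majority over k (odd) independent reads, each flipped w.p. p."""
    return sum(math.comb(k, j) * p**j * (1 - p) ** (k - j)
               for j in range(k // 2 + 1, k + 1))

def noisy_parity_error(etas: list, k: int) -> float:
    """Exact error of the repetition strategy for Xor on noisy inputs: read each
    bit z_i k times (k odd), take the majority, and output the parity of the
    majorities. Bit i is flipped independently w.p. etas[i] on every read."""
    # Coordinate i is recovered wrongly w.p. e_i = majority_error(k, eta_i); the
    # parity is wrong iff an odd number of coordinates are wrong, which happens
    # w.p. (1 - prod_i (1 - 2 e_i)) / 2 for independent errors.
    prod = 1.0
    for eta in etas:
        prod *= 1 - 2 * majority_error(k, eta)
    return (1 - prod) / 2

def repetitions_for_parity(n: int, eta: float = 1 / 3) -> int:
    """Smallest odd k with error <= 1/3 on n bits when every eta_i equals eta
    (the worst case for this strategy, since errors grow with eta)."""
    k = 1
    while noisy_parity_error([eta] * n, k) > 1 / 3:
        k += 2
    return k

for n in (10, 100, 1000):
    print(n, repetitions_for_parity(n))   # grows roughly like log n
```

This only shows the upper bound for this particular strategy; the lower bounds below concern every possible tree.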

There is a similar connection for 1-sided error and 1-sided noise. When a randomized decision tree has 1-sided error $\varepsilon$, that means on 0-inputs the output is wrong with probability 0, and on 1-inputs the output is wrong with probability at most $\varepsilon$. We let $\mathrm{RP}_\varepsilon(f)$ denote the minimum cost of any randomized decision tree that computes $f$ with 1-sided error $\varepsilon$. Similarly, 1-sided noise means that when input bit $z_i$ is queried, if the actual value is 0 then 0 is returned with probability 1, and if the actual value is 1 then 0 is returned with probability $\eta_i$. We invent the notation $\mathrm{BPP}^\dagger_\varepsilon(f)$ for the minimum cost of any randomized decision tree that computes $f$ on 1-sided noisy inputs, with 2-sided error probability $\varepsilon$. We have $\mathrm{BPP}^\dagger(f)\le O(\mathrm{BPP}^*(f))$. The connection $\mathrm{BPP}(f\circ g^n)\le\mathrm{BPP}^\dagger(f)\cdot\mathrm{RP}(g)$ holds like in the 2-sided noise setting. We officially record these observations:

Observation

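For all $f$ and $g$,

$$\mathrm{BPP}(f\circ g^n)\le\mathrm{BPP}^*(f)\cdot\mathrm{BPP}(g)\qquad\text{and}\qquad\mathrm{BPP}(f\circ g^n)\le\mathrm{BPP}^\dagger(f)\cdot\mathrm{RP}(g).$$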

The upshot is that noisy upper bounds imply composition upper bounds, and composition lower bounds imply noisy lower bounds. There are many proofs of the result [FRPU94, KK94, New09, GS10]:

Theorem (Or never necessitates amplification)

$\mathrm{BPP}^*(\textsc{Or})\le O(n)$, and thus for every partial function $g$,

$$\mathrm{BPP}(\textsc{Or}\circ g^n)\le O(n\cdot\mathrm{BPP}(g)).$$

This theorem is not new, but in Appendix A we provide a particularly clean and elementary proof (related to, but more streamlined than, the proof in [KK94]). We mention that the proof straightforwardly generalizes to some other functions $f$, such as "odd-max-bit": $\textsc{OddMaxBit}(z)=1$ iff the highest index of any 1 in $z$ is odd.

We turn our attention to lower bounds. Various special-purpose techniques have been developed for proving query complexity lower bounds in the noisy setting [FRPU94, EP98, DR08, GS10]. However, a conceptual consequence of the Observation is that special-purpose techniques are not generally necessary: We can just use techniques for lower bounding plain (non-noisy) randomized query complexity, applied to composed functions.

1.3 Lower bound for parity

[FRPU94] proved that $\mathrm{BPP}^*(\textsc{Xor})$ and $\mathrm{BPP}^*(\textsc{Maj})$ are $\Theta(n\log n)$. Although apparently not recorded in the literature, it is possible to generalize this result to show $\mathrm{BPP}^\dagger(\textsc{Xor})$ and $\mathrm{BPP}^\dagger(\textsc{Maj})$ are $\Theta(n\log n)$. However, we prove results even stronger than that, using the composition paradigm. Our results involve query complexity models that are more powerful than BPP, and even more powerful than the $\overline{\mathrm{BPP}}$ model from [BB19]. This follows a theme from a lot of prior work: Since BPP query complexity is rather subtle, we can make progress by studying related models that are somewhat more "well-behaved."

• As observed in [BB19], the $\overline{\mathrm{BPP}}$ model is equivalent to one where the cost is the worst-case (rather than expected) number of queries, and a randomized decision tree is allowed to abort (i.e., output a special symbol $\bot$) with at most a small constant probability, and the output should be correct with high probability conditioned on not aborting.

• If we strengthen the above model by allowing the non-abort probability to be arbitrarily close to 0 (rather than close to 1), but require that the non-abort probabilities are approximately the same for all inputs (within some factor close to 1), the resulting model has been called 2WAPP ("2-sided weak almost-wide PP") [GLM16, GJPW18]. The "1-sided" version WAPP, defined later, will be relevant to us.

• If we further strengthen the model by allowing the non-abort probabilities to be completely unrelated for different inputs (and still arbitrarily close to 0), the resulting model has been called PostBPP ("BPP with post-selection") [GLM16, Cad18].

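We first consider the last of these models. $\mathrm{PostBPP}(f)$ is the minimum cost of any randomized decision tree such that on every input $x$ (for which $f(x)$ is defined), the probability of outputting $\bot$ is less than 1, and the probability of outputting $f(x)$ is at least $2/3$ conditioned on not outputting $\bot$; $\mathrm{PostBPP}_\varepsilon(f)$ is defined likewise with $1-\varepsilon$ in place of $2/3$. Trivially, $\mathrm{PostBPP}(f)\le\mathrm{BPP}(f)$. In fact, the PostBPP model is much more powerful than plain randomized query complexity; for example (noted in [GLM16]) it can efficiently compute the aforementioned odd-max-bit function: $\mathrm{PostBPP}(\textsc{OddMaxBit})\le O(1)$.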
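One way to see the $O(1)$ bound for odd-max-bit (a sketch of our own; [GLM16] may argue differently): pick an index $i\in\{0,1,\dots,n\}$ with probability proportional to $4^i$, where $i=0$ is a dummy option that outputs 0 without querying; otherwise query $z_i$, abort if $z_i=0$, and output $i \bmod 2$. Conditioned on not aborting, the highest index holding a 1 (or the dummy index, if $z$ is all zeros) carries at least a $3/4$ fraction of the remaining weight, because $\sum_{j<i}4^j<4^i/3$. The exact computation below checks this for every input of a small length; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def odd_max_bit(z: tuple) -> int:
    """1 iff the highest (1-based) index holding a 1 is odd; 0 if z is all zeros."""
    top = max((i for i, b in enumerate(z, start=1) if b), default=0)
    return top % 2

def postbpp_success(z: tuple) -> Fraction:
    """Exact Pr[output = OddMaxBit(z) | no abort] for the one-query sampler:
    choose i in {0, ..., n} with probability proportional to 4**i; i = 0 outputs 0
    without querying; otherwise query z_i, abort if z_i = 0, else output i mod 2."""
    weights = [Fraction(4**i) for i in range(len(z) + 1)]
    target = odd_max_bit(z)
    kept = correct = Fraction(0)
    for i, w in enumerate(weights):
        if i > 0 and z[i - 1] == 0:
            continue                      # this branch outputs "abort"
        kept += w
        if i % 2 == target:
            correct += w
    return correct / kept                 # normalizing constants cancel

worst = min(postbpp_success(z) for z in product((0, 1), repeat=10))
print(worst)   # at least 3/4
```

The non-abort probability can be exponentially small in $n$, which PostBPP allows but BPP does not.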

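For the noisy input setting, $\mathrm{PostBPP}^*(f)$ and $\mathrm{PostBPP}^\dagger(f)$ are defined in the natural way, and $\mathrm{PostBPP}(f\circ g^n)\le\mathrm{PostBPP}^*(f)\cdot\mathrm{BPP}(g)$ and $\mathrm{PostBPP}(f\circ g^n)\le\mathrm{PostBPP}^\dagger(f)\cdot\mathrm{RP}(g)$ hold like in the Observation.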

In Section 2 we prove something qualitatively much stronger than the bound $\mathrm{BPP}^*(\textsc{Xor})\ge\Omega(n\log n)$:

Theorem (Xor sometimes necessitates amplification)

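For some partial function $g$, namely $g=\textsc{GapMaj}$ with $m$ at least a sufficiently large constant times $\log n$,

$$\mathrm{PostBPP}(\textsc{Xor}\circ g^n)\ge\Omega(n\cdot\mathrm{BPP}_{1/n}(g))\ge\Omega(n\log n\cdot\mathrm{BPP}(g)).$$

In particular, $\mathrm{PostBPP}^*(\textsc{Xor})\ge\Omega(n\log n)$.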

Let us compare this theorem to two previous results.

• [EP98] proved that $\mathrm{BPP}^*(\textsc{Xor})\ge\Omega(n\log n)$ and that this lower bound holds even in the average-case setting (i.e., $\Omega(n\log n)$ queries are needed in expectation to succeed with high probability over a uniformly random input, random noise, and randomness of the decision tree). Our proof of the theorem is simpler than the proof in [EP98] (though both proofs have a Fourier flavor), it also works in the average-case setting, and it yields a stronger result since the model is PostBPP instead of just BPP (and the lower bound holds for composition rather than just noisy inputs). [DR08] presented a different simplified proof of the result from [EP98], but that proof does not generalize to PostBPP.

• Our proof of the theorem shows something analogous, but incomparable, to the strong direct sum from [BB19]. As we explain in Section 2, our proof shows that $\mathrm{PostBPP}(\textsc{Xor}\circ g^n)\ge\Omega(n\cdot\mathrm{PostBPP}_{1/n}(g))$ holds for all $g$ (thus addressing a version of our Question A). Compared to the [BB19] result that $\mathrm{BPP}(\textsc{Id}\circ g^n)\ge\Omega(n\cdot\overline{\mathrm{BPP}}_{1/n}(g))$ for all $g$, our result has the advantages of working for Xor rather than Id and yielding a qualitatively stronger lower bound (PostBPP rather than BPP on the left side), but the disadvantage of also requiring the qualitatively stronger type of lower bound on $g$ (PostBPP rather than $\overline{\mathrm{BPP}}$ on the right side). Our result shows that if amplifying $g$ requires a log factor in a very strong sense (even PostBPP-type decision trees cannot avoid the log factor), then that log factor will be necessary when composing Xor with $g$.

1.4 Lower bound for majority

Our main result strengthens the bound $\mathrm{BPP}^*(\textsc{Maj})\ge\Omega(n\log n)$ from [FRPU94], mainly by holding for the stronger model WAPP (rather than just BPP), but also by directly handling 1-sided noise and by holding for composition rather than just noisy inputs.

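$\mathrm{WAPP}_\varepsilon(f)$ is the minimum cost of any randomized decision tree such that for some $\alpha>0$, on input $x$ the probability of outputting 1 is in the range $[(1-\varepsilon)\alpha,\alpha]$ if $f(x)=1$, and in the range $[0,\varepsilon\alpha]$ if $f(x)=0$. The subscript $\varepsilon$ should always be specified, because unlike BPP and PostBPP, WAPP is not amenable to efficient amplification of the error parameter [GLM16]. For every constant $\varepsilon>0$, we have $\mathrm{WAPP}_\varepsilon(f)\le O(\mathrm{BPP}(f))$.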
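The WAPP condition is a statement about ratios of acceptance probabilities, so $\alpha$ may be tiny. The hypothetical helper below (our own sketch) takes the acceptance probabilities of some fixed randomized tree on the 1-inputs and 0-inputs of $f$ and returns the interval of valid $\alpha$, or `None` if no $\alpha>0$ works.

```python
def wapp_alpha_interval(p_ones, p_zeros, eps):
    """Given Pr[output 1] of a randomized tree on each 1-input (p_ones) and each
    0-input (p_zeros) of f, return the interval (lo, hi) of alpha values that
    witness the WAPP_eps condition, or None if no alpha > 0 works.
    Condition: (1 - eps) * alpha <= p <= alpha on 1-inputs, p <= eps * alpha on 0-inputs."""
    assert 0 < eps < 1
    lo = max(p_ones)                             # alpha >= p on every 1-input
    if p_zeros:
        lo = max(lo, max(p_zeros) / eps)         # eps * alpha >= p on every 0-input
    hi = min(p_ones) / (1 - eps)                 # (1 - eps) * alpha <= p on every 1-input
    if hi <= 0 or lo > hi:
        return None
    return lo, hi

# Tiny acceptance probabilities are fine: only their ratios matter.
print(wapp_alpha_interval([1e-6, 0.95e-6], [1e-8], eps=0.1))
# 1-inputs accepted with probabilities 0.5 and 0.2 are too spread out.
print(wapp_alpha_interval([0.5, 0.2], [0.0], eps=0.1))
```

The first call succeeds even though every acceptance probability is about $10^{-6}$, which is the freedom that separates WAPP from BPP.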

WAPP-type query complexity has several aliases, such as “approximate conical junta degree” and “approximate query complexity in expectation,” and it has recently played a central role in various randomized query (and communication) complexity lower bounds [KLdW15, GLM16, GJ16, GJPW18]. One can think of WAPP as a nonnegative version of approximate polynomial degree (which corresponds to the class AWPP); in other words, it is a classical analogue of the polynomial method used to lower bound quantum algorithms.

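For the noisy input setting, $\mathrm{WAPP}^*_\varepsilon(f)$ and $\mathrm{WAPP}^\dagger_\varepsilon(f)$ are defined in the natural way, and $\mathrm{WAPP}_\varepsilon(f\circ g^n)\le\mathrm{WAPP}^*_\varepsilon(f)\cdot\mathrm{BPP}(g)$ and $\mathrm{WAPP}_\varepsilon(f\circ g^n)\le\mathrm{WAPP}^\dagger_\varepsilon(f)\cdot\mathrm{RP}(g)$ hold like in the Observation. We prove the following theorem, which shows that WAPP sometimes requires amplification, even in the 1-sided noise setting.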

Theorem (Maj sometimes necessitates amplification)

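For some partial function $g$, namely $g=\textsc{GapOr}$ with $m$ at least a sufficiently large constant times $\log n$, and some constant $\varepsilon>0$,

$$\mathrm{WAPP}_\varepsilon(\textsc{Maj}\circ g^n)\ge\Omega(n\cdot\mathrm{BPP}_{1/n}(g))\ge\Omega(n\log n\cdot\mathrm{RP}(g)).$$

In particular, $\mathrm{WAPP}^\dagger_\varepsilon(\textsc{Maj})\ge\Omega(n\log n)$.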

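This theorem should be contrasted with the work of Sherstov about making polynomials robust to noise [She13]. In that work, Sherstov showed that approximate polynomial degree never requires a log factor in the noisy input setting, nor in composition. That is to say, he improved the simple $O(\widetilde{\deg}(f)\cdot\log n)$ bound on the degree of a noise-robust approximating polynomial for $f$ to $O(\widetilde{\deg}(f))$ for all Boolean functions $f$, and showed $\widetilde{\deg}(f\circ g^n)\le O(\widetilde{\deg}(f)\cdot\widetilde{\deg}(g))$. In contrast, for conical juntas (nonnegative linear combinations of conjunctions), the above theorem shows that in a strong sense, the simple bound $\mathrm{WAPP}^*_\varepsilon(f)\le O(\mathrm{P}(f)\cdot\log n)$ (for all constants $\varepsilon$ and total Boolean functions $f$) cannot be improved: $\mathrm{WAPP}^\dagger_\varepsilon(f)\ge\Omega(\mathrm{P}(f)\cdot\log n)$ for some constant $\varepsilon$ and some total $f$, namely $f=\textsc{Maj}$. Thus unlike polynomials, conical juntas cannot be made robust to noise.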

Our proof of this theorem (in Section 3) introduces some technical ideas that may be useful for other randomized query complexity lower bounds.

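By a simple reduction, the theorem for $g=\textsc{GapOr}$ implies the same for $g=\textsc{GapMaj}$ (with $\mathrm{BPP}(g)$ instead of $\mathrm{RP}(g)$ at the end of the statement), but we do not know of a simpler direct proof for the latter result. The theorem cannot be strengthened to have PostBPP in place of WAPP, because PostBPP-type decision trees for $\textsc{Maj}\circ\textsc{GapOr}^n$ can avoid the log factor. However, the theorem does hold with Xor in place of Maj, by the same proof.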

2 Proof of Theorem: Xor sometimes necessitates amplification

We first discuss a standard technique for proving randomized query complexity lower bounds, which will be useful in the proof of the Xor theorem. For any conjunction $C\colon\{0,1\}^n\to\{0,1\}$ and distribution $D$ over $\{0,1\}^n$, we write $C(D):=\Pr_{x\sim D}[C(x)=1]$. The number of literals in a conjunction is called its width.

Fact

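Let $f\colon\{0,1\}^n\to\{0,1\}$ be a partial function, and for each $z\in\{0,1\}$ let $D_z$ be a distribution over $f^{-1}(z)$. Then for every $\varepsilon$ there exist a conjunction $C$ of width at most $\mathrm{PostBPP}_\varepsilon(f)$ and a $z\in\{0,1\}$ such that $C(D_z)>0$ and $\varepsilon\cdot C(D_z)\ge(1-\varepsilon)\cdot C(D_{1-z})$.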

Proof

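Abbreviate $\mathrm{PostBPP}_\varepsilon(f)$ as $d$. Fix a randomized decision tree of cost $d$ computing $f$ with error $\varepsilon$ conditioned on not aborting, and assume w.l.o.g. that for each outcome of the randomness, the corresponding deterministic tree is a perfect tree with $2^d$ leaves, all at depth $d$. Consider the probability space where we sample input $x$ from the mixture $\frac12D_0+\frac12D_1$, sample a deterministic decision tree $T$ as an outcome of the randomized decision tree, and sample a uniformly random leaf $\ell$ of $T$. Let $A$ be the indicator random variable for the event that $\ell$ is the leaf reached by $x$ and its label is $f(x)$. Let $B$ be the indicator random variable for the event that $\ell$ is the leaf reached by $x$ and its label is $1-f(x)$. Conditioned on any particular $x$ and $T$, the probability that $\ell$ is the leaf reached by $x$ is $2^{-d}$. Thus conditioned on any particular $x$, if the non-abort probability is $\alpha_x>0$ then $\mathbb{E}[A\mid x]\ge(1-\varepsilon)\alpha_x2^{-d}$ and $\mathbb{E}[B\mid x]\le\varepsilon\alpha_x2^{-d}$, and thus $\varepsilon\cdot\mathbb{E}[A\mid x]-(1-\varepsilon)\cdot\mathbb{E}[B\mid x]\ge0$ and $\mathbb{E}[A\mid x]>0$. Over the whole probability space, we have $\varepsilon\cdot\mathbb{E}[A]-(1-\varepsilon)\cdot\mathbb{E}[B]\ge0$ and $\mathbb{E}[A]>0$, so by linearity (discarding pairs $(T,\ell)$ with $\mathbb{E}[A\mid T,\ell]=0$, which contribute nonpositively) the same must hold conditioned on some particular $T$ and $\ell$ with $\mathbb{E}[A\mid T,\ell]>0$. Let $C$ be the conjunction of width at most $d$ such that $C(x)=1$ iff $x$ reaches $\ell$ in $T$, and let $z$ be the label of $\ell$ (which is not $\bot$, since $\mathbb{E}[A\mid T,\ell]>0$). Then we have $C(D_z)=2\cdot\mathbb{E}[A\mid T,\ell]>0$ and similarly $C(D_{1-z})=2\cdot\mathbb{E}[B\mid T,\ell]$. Thus

$$\varepsilon\cdot C(D_z)-(1-\varepsilon)\cdot C(D_{1-z})=2\cdot\big(\varepsilon\cdot\mathbb{E}[A\mid T,\ell]-(1-\varepsilon)\cdot\mathbb{E}[B\mid T,\ell]\big)\ge0.$$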

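Now we work toward proving the Xor theorem. Throughout, $n$ is the input length of Xor, and $m$ is the input length of GapMaj (assumed divisible by 3). We have $\mathrm{BPP}(\textsc{GapMaj})\le1$ by outputting the bit at a uniformly random position from the input. We describe one way of seeing that $\mathrm{BPP}_{1/n}(\textsc{GapMaj})\ge\Omega(\log n)$ provided $m\ge\frac74\log_3n$. For $z\in\{0,1\}$, define $G_z$ as the uniform distribution over $\textsc{GapMaj}^{-1}(z)$.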
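The claim $\mathrm{BPP}(\textsc{GapMaj})\le1$ can be checked exactly: on any promise input, a uniformly random bit disagrees with the answer with probability exactly $1/3$. A minimal sketch, with hypothetical function names, where `None` marks inputs outside the promise:

```python
from fractions import Fraction

def gap_maj(x):
    """GapMaj on {0,1}^m: 0 if |x| = m/3, 1 if |x| = 2m/3, None where undefined."""
    m, weight = len(x), sum(x)
    if 3 * weight == m:
        return 0
    if 3 * weight == 2 * m:
        return 1
    return None

def one_query_error(x) -> Fraction:
    """Exact error of the 1-query tree that outputs x_i for a uniformly random i."""
    z = gap_maj(x)
    assert z is not None, "input violates the GapMaj promise"
    return Fraction(sum(1 for b in x if b != z), len(x))

print(one_query_error([1, 0, 0] * 4), one_query_error([1, 1, 0] * 4))  # 1/3 1/3
```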

Fact

For every conjunction $C$ of width $w\le m/7$ and for each $z\in\{0,1\}$,

$$C(G_z)\le3^w\cdot C(G_{1-z}).$$

Proof

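By symmetry we just consider $z=0$. Suppose $C$ has $u$ positive literals and $v$ negative literals ($u+v=w$). Then

$$C(G_0)=\binom{m-w}{m/3-u}\Big/\binom{m}{m/3}\le\binom{m-w}{m/3}\Big/\binom{m}{m/3}=\frac{(2m/3)(2m/3-1)\cdots(2m/3-w+1)}{m(m-1)\cdots(m-w+1)}\le(2/3)^w,$$

$$C(G_1)=\binom{m-w}{m/3-v}\Big/\binom{m}{m/3}\ge\binom{m-w}{m/3-w}\Big/\binom{m}{m/3}=\frac{(m/3)(m/3-1)\cdots(m/3-w+1)}{m(m-1)\cdots(m-w+1)}\ge\Big(\frac{m/3-w}{m-w}\Big)^w\ge\Big(\frac{m/3-m/7}{m-m/7}\Big)^w=(2/9)^w,$$

where the inequalities between binomial coefficients hold because $\binom{m-w}{k}$ is nondecreasing in $k$ for $k\le(m-w)/2$, and the last step uses $w\le m/7$. Thus $C(G_0)\le(2/3)^w=3^w\cdot(2/9)^w\le3^w\cdot C(G_1)$.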
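As a sanity check of this Fact (not a substitute for the proof), the sketch below computes $C(G_0)$ and $C(G_1)$ exactly with rational arithmetic for every split $u+v=w\le m/7$, for a few values of $m$ divisible by 3. By symmetry of $G_z$ under permuting coordinates, only $u$ and $v$ matter. The helper names are hypothetical.

```python
from fractions import Fraction
from math import comb

def conj_prob(m: int, weight: int, u: int, v: int) -> Fraction:
    """C(G) for a conjunction with u positive and v negative literals on distinct
    variables, where G is uniform over x in {0,1}^m of Hamming weight `weight`:
    the m - u - v free positions must contain exactly weight - u ones."""
    free, need = m - u - v, weight - u
    if need < 0 or need > free:
        return Fraction(0)
    return Fraction(comb(free, need), comb(m, weight))

def check_fact(m: int) -> bool:
    """Check C(G_0) <= (2/3)^w, C(G_1) >= (2/9)^w, and C(G_z) <= 3^w C(G_{1-z})
    for every conjunction of width w <= m/7."""
    assert m % 3 == 0
    for w in range(m // 7 + 1):
        for u in range(w + 1):
            g0 = conj_prob(m, m // 3, u, w - u)
            g1 = conj_prob(m, 2 * m // 3, u, w - u)
            if g0 > Fraction(2, 3) ** w or g1 < Fraction(2, 9) ** w:
                return False
            if g0 > 3**w * g1 or g1 > 3**w * g0:
                return False
    return True

print(all(check_fact(m) for m in (21, 42, 63, 105)))   # True
```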

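Combining the two Facts (using $f=\textsc{GapMaj}$, $D_z=G_z$, $\varepsilon=1/n$, and $d=\mathrm{PostBPP}_{1/n}(\textsc{GapMaj})$) implies that if $d\le m/7$, then there is a conjunction $C$ of width at most $d$ and a $z$ with $C(G_z)>0$ and $(1-1/n)\cdot C(G_{1-z})\le(1/n)\cdot C(G_z)\le(1/n)\cdot3^d\cdot C(G_{1-z})$, in other words we have $3^d\ge n-1$, i.e., $\mathrm{PostBPP}_{1/n}(\textsc{GapMaj})\ge\log_3(n-1)$. If $d>m/7$ then $\mathrm{PostBPP}_{1/n}(\textsc{GapMaj})\ge\Omega(\log n)$ holds anyway provided $m\ge\frac74\log_3n$. Since $\mathrm{PostBPP}_{1/n}\le\mathrm{BPP}_{1/n}$, this gives $\mathrm{BPP}_{1/n}(\textsc{GapMaj})\ge\Omega(\log n)$.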

Hence (since also $\mathrm{BPP}_{1/n}(\textsc{GapMaj})\le O(\log n)$ by amplification), our result can be restated as follows.

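Lemma

$\mathrm{PostBPP}(\textsc{Xor}\circ\textsc{GapMaj}^n)\ge\Omega(n\log n)$, provided $m\ge\frac74\log_3n$.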

Proof

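We show $\mathrm{PostBPP}(\textsc{Xor}\circ\textsc{GapMaj}^n)>\frac18n\log_3n$. By the first Fact (using $f=\textsc{Xor}\circ\textsc{GapMaj}^n$ and $\varepsilon=1/3$) it suffices to exhibit for each $z\in\{0,1\}$ a distribution $D_z$ over $(\textsc{Xor}\circ\textsc{GapMaj}^n)^{-1}(z)$, such that for every conjunction $C$ of width $w\le\frac18n\log_3n$ and for each $z$, either $C(D_z)=0$ or $C(D_z)<2\cdot C(D_{1-z})$. Letting $U_z$ be the uniform distribution over $\textsc{Xor}^{-1}(z)\subseteq\{0,1\}^n$, define $D_z$ as the mixture over $y\sim U_z$ of $G_{y_1}\times\cdots\times G_{y_n}$ (i.e., $x=(x^1,\dots,x^n)$ is sampled by drawing $y\sim U_z$ and then independently sampling $x^i\sim G_{y_i}$ for all $i$). Put succinctly, $D_z=\mathbb{E}_{y\sim U_z}[G_{y_1}\times\cdots\times G_{y_n}]$. Letting $G:=\frac12G_0+\frac12G_1$ and $D:=G^n$ (the $n$-fold product), we have $D=\frac12D_0+\frac12D_1$ since $\frac12U_0+\frac12U_1$ is uniform over $\{0,1\}^n$. Since $C(D_{1-z})=2\cdot C(D)-C(D_z)$, our goal of showing "$C(D_z)=0$ or $C(D_z)<2\cdot C(D_{1-z})$" is equivalent to showing "$C(D_z)=0$ or $C(D_z)<\frac43\cdot C(D)$."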

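Now consider any conjunction $C$ of width $w\le\frac18n\log_3n$ such that $C(D)>0$ (if $C(D)=0$ then $C(D_z)=0$ for both $z$), and write $C=\bigwedge_{i\in[n]}C_i$ where $C_i$ is a conjunction on the bits of $x^i$. Since $C(D)=\prod_iC_i(G)>0$, for each $i$ we can write $C_i(G_{y_i})=C_i(G)\cdot(1+(-1)^{y_i}a_i)$ for some number $a_i$ with $|a_i|\le1$ (so $|a_i|=1$ iff $C_i(G_{y_i})=0$ for some $y_i\in\{0,1\}$). Let $w_i$ be the width of $C_i$, so $\sum_iw_i=w$. Then $w_i\le2w/n\le\frac14\log_3n\le m/7$ for at least $n/2$ many values of $i$, and for such $i$ note that by the second Fact, $C_i(G_y)\le3^{w_i}\cdot C_i(G_{1-y})\le n^{1/4}\cdot C_i(G_{1-y})$ for each $y\in\{0,1\}$. The latter implies that $|a_i|\le(n^{1/4}-1)/(n^{1/4}+1)\le1-n^{-1/4}$. Thus

$$\Big|\prod_ia_i\Big|=\prod_i|a_i|\le\big(1-n^{-1/4}\big)^{n/2}\le e^{-n^{3/4}/2}\le1/4,$$

where the last step holds for $n\ge4$.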

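For $S\subseteq[n]$, let $\chi_S$ be the character $\chi_S(y):=(-1)^{\sum_{i\in S}y_i}$. Note that $\mathbb{E}_{y\sim U_1}[\chi_S(y)]$ is $1$ if $S=\emptyset$, is $-1$ if $S=[n]$, and is $0$ otherwise. Putting everything together,

$$C(D_1)=\mathbb{E}_{y\sim U_1}\Big[\prod_{i\in[n]}C_i(G)\cdot\big(1+(-1)^{y_i}a_i\big)\Big]=C(D)\cdot\sum_{S\subseteq[n]}\Big(\prod_{i\in S}a_i\Big)\cdot\mathbb{E}_{y\sim U_1}[\chi_S(y)]=C(D)\cdot\Big(1-\prod_{i\in[n]}a_i\Big)\in C(D)\cdot(1\pm1/4),$$

and hence also $C(D_0)=2\cdot C(D)-C(D_1)\in C(D)\cdot(1\pm1/4)$. This implies $C(D_z)\le\frac54\cdot C(D)<\frac43\cdot C(D)$ for each $z$, since we are assuming $C(D)>0$. This concludes the proof of Theorem 1.3.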
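A numerical sanity check of the two computational steps above, under a modeling shortcut we introduce: since only the per-block numbers $C_i(G_0)$ and $C_i(G_1)$ enter the identity $C(D_1)=C(D)\cdot(1-\prod_ia_i)$, the sketch draws them at random instead of building actual conjunctions, and compares the closed form with a brute-force average over odd-parity $y$. It also checks the chain $(1-n^{-1/4})^{n/2}\le e^{-n^{3/4}/2}\le1/4$ on a range of $n$. The function names are hypothetical.

```python
import itertools
import math
import random

def check_identity(n: int, seed: int = 0):
    """Compare a brute-force C(D_1) with the closed form C(D) * (1 - prod_i a_i).
    Only the per-block numbers C_i(G_0), C_i(G_1) enter, so we draw them at
    random (an illustrative shortcut, not an actual conjunction)."""
    rng = random.Random(seed)
    c0 = [rng.uniform(0.01, 1.0) for _ in range(n)]          # C_i(G_0)
    c1 = [rng.uniform(0.01, 1.0) for _ in range(n)]          # C_i(G_1)
    a = [(p - q) / (p + q) for p, q in zip(c0, c1)]          # C_i(G_y) = C_i(G)(1 + (-1)^y a_i)
    c_d = math.prod((p + q) / 2 for p, q in zip(c0, c1))     # C(D), with D = G^n
    odd = [y for y in itertools.product((0, 1), repeat=n) if sum(y) % 2 == 1]
    # D_1 averages the product distributions G_{y_1} x ... x G_{y_n} over odd-parity y.
    brute = sum(math.prod(c1[i] if y[i] else c0[i] for i in range(n)) for y in odd) / len(odd)
    return brute, c_d * (1 - math.prod(a))

def chain_holds(n: int) -> bool:
    """(1 - n^(-1/4))^(n/2) <= exp(-n^(3/4)/2) <= 1/4, as used in the proof."""
    lhs = (1 - n ** -0.25) ** (n / 2)
    mid = math.exp(-(n ** 0.75) / 2)
    return lhs <= mid <= 0.25

print(check_identity(8))                          # the two numbers agree
print(all(chain_holds(n) for n in range(4, 5000)))
```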

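Using strong LP duality (as in [GL14]), it can be seen that the first Fact is a tight lower bound method up to constant factors: any lower bound on $\mathrm{PostBPP}_{1/n}(g)$ can be proved, up to constant factors, via that Fact by exhibiting "hard input distributions" $D_0$ and $D_1$ (as we did for GapMaj with $G_0$ and $G_1$ in the second Fact). Since this was the only property of GapMaj used in the proof of Theorem 1.3, this implies that $\mathrm{PostBPP}(\textsc{Xor}\circ g^n)\ge\Omega(n\cdot\mathrm{PostBPP}_{1/n}(g))$ holds for all $g$, as we mentioned in Section 1.3.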

3 Proof of Theorem: Maj sometimes necessitates amplification

We first discuss a technique for proving WAPP-type query complexity lower bounds, which will be useful in the proof of the Maj theorem. As in Section 2, for a conjunction $C$ and a distribution $D$ we write $C(D):=\Pr_{x\sim D}[C(x)=1]$, and the width of $C$ is its number of literals.

Fact

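Let $f$ be a partial function, and let $D_0$, $D_1$, $D_2$ be three distributions, over $f^{-1}(0)$, $f^{-1}(1)$, and $f^{-1}(1)$ respectively. Then for every $\varepsilon\in(0,1/10]$ there exists a conjunction $C$ of width at most $\mathrm{WAPP}_\varepsilon(f)$ such that $C(D_0)\le\delta\cdot C(D_1)$ and $C(D_2)\le(1+\delta)\cdot C(D_1)$ and $C(D_1)>0$, where $\delta:=2\sqrt{\varepsilon}$.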

The key calculation underlying the proof of the preceding Fact is encapsulated in the following:

Fact

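Let $P_0$, $P_1$, $P_2$ be three jointly distributed nonnegative random variables. For any $\varepsilon\in(0,1/10]$, if $\mathbb{E}[P_0]\le\varepsilon$ and $\mathbb{E}[P_1]\ge1-\varepsilon$ and $\mathbb{E}[P_2]\le1$, then there exists an outcome such that $P_0\le\delta\cdot P_1$ and $P_2\le(1+\delta)\cdot P_1$ and $P_1>0$, where $\delta:=2\sqrt{\varepsilon}$.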

Proof (of the preceding Fact)

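Let $\delta:=2\sqrt{\varepsilon}$. Suppose for contradiction that for every outcome with $P_1>0$, either $P_0>\delta\cdot P_1$ or $P_2>(1+\delta)\cdot P_1$. Then the event $\{P_1>0\}$ can be partitioned into events $U$ and $V$ such that $P_0>\delta\cdot P_1$ for every outcome in $U$ and $P_2>(1+\delta)\cdot P_1$ for every outcome in $V$. Letting $I_U$ and $I_V$ be the indicator random variables for these events, we have $\mathbb{E}[P_1\cdot I_U]+\mathbb{E}[P_1\cdot I_V]=\mathbb{E}[P_1]\ge1-\varepsilon$ and thus either: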

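• $\mathbb{E}[P_1\cdot I_U]\ge\sqrt{\varepsilon}\cdot(1-\varepsilon)$, in which case

$$\mathbb{E}[P_0]\ge\mathbb{E}[P_0\cdot I_U]>\delta\cdot\mathbb{E}[P_1\cdot I_U]\ge\delta\cdot\sqrt{\varepsilon}\cdot(1-\varepsilon)=2\varepsilon(1-\varepsilon)>\varepsilon,\quad\text{or}$$
• $\mathbb{E}[P_1\cdot I_V]\ge(1-\sqrt{\varepsilon})\cdot(1-\varepsilon)$, in which case

$$\mathbb{E}[P_2]\ge\mathbb{E}[P_2\cdot I_V]>(1+\delta)\cdot\mathbb{E}[P_1\cdot I_V]\ge(1+\delta)\cdot(1-\sqrt{\varepsilon})\cdot(1-\varepsilon)>1$$

where the last inequality can be verified by a little calculus for $0<\varepsilon\le1/10$. Either case contradicts the assumptions $\mathbb{E}[P_0]\le\varepsilon$ and $\mathbb{E}[P_2]\le1$.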
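The two numeric facts used in this proof can be spot-checked on a fine grid. With $s=\sqrt{\varepsilon}$, the second one reduces to $s\,(1-3s-s^2+2s^3)>0$, which holds on $(0,1/10]$ but fails for larger $\varepsilon$ such as $0.2$; the range $(0,1/10]$ is the assumption we adopted in the statement. This is a numerical sketch, not a proof, and the function name is hypothetical.

```python
import math

def inequalities_hold(eps: float) -> bool:
    """With delta = 2*sqrt(eps), check the two numeric facts used in the proof:
    delta*sqrt(eps)*(1-eps) = 2*eps*(1-eps) > eps, and
    (1+delta)*(1-sqrt(eps))*(1-eps) > 1."""
    s = math.sqrt(eps)
    delta = 2 * s
    first = delta * s * (1 - eps) > eps
    second = (1 + delta) * (1 - s) * (1 - eps) > 1
    return first and second

grid = [k / 100000 for k in range(1, 10001)]     # eps in (0, 0.1]
print(all(inequalities_hold(e) for e in grid))   # True
print(inequalities_hold(0.2))                    # False: the second bound fails
```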