# Noisy searching: simple, fast and correct

This work revisits the multiplicative weights update technique (MWU), which has a variety of applications, especially in learning and searching algorithms. In particular, the Bayesian update method is a well-known version of MWU that is particularly applicable to the problem of searching in a given domain. An ideal scenario for that method is when the input distribution is known a priori and each single update maximizes the information gain. In this work we consider two search domains, linear orders (sorted arrays) and graphs, where the aim of the search is to locate an unknown target by performing as few queries as possible. Searching such domains is well understood when each query provides a correct answer and the input target distribution is uniform. Hence, we consider two generalizations: the noisy search both with arbitrary and adversarial (i.e., unknown) target distributions. We obtain several results providing a full characterization of the query complexities in the three settings: adversarial Monte Carlo, adversarial Las Vegas and distributional Las Vegas. Our algorithms either improve, simplify or patch earlier ambiguities in the literature; see the works of Emamjomeh-Zadeh et al. [STOC 2016], Dereniowski et al. [SOSA@SODA 2019] and Ben-Or and Hassidim [FOCS 2008]. In particular, all algorithms give strategies that provide the optimal number of queries up to lower-order terms. Our technical contribution lies in providing generic search techniques that are able to deal with the fact that, in general, queries guarantee only suboptimal information gain.


## 1 Introduction

### 1.1 Problem statement

An adaptive search problem for a general search domain and an arbitrary adversary can be formulated as follows. The goal is to design an adaptive algorithm, also called a strategy, that finds a target that is initially unknown to the algorithm. The process is divided into steps: in each step the algorithm performs a query and receives an answer. Adaptiveness means that the subsequent actions of the algorithm depend on the answers already received. Such a query-reply pair provides new information to the algorithm: it learns that some part $S$ of the search space does not contain the target while its complement does. It is interesting, both from a theoretical and a practical point of view, to have error-resilient algorithms for such a search process. This is modeled by the presence of noise: each reply can be erroneous with some fixed probability $0 < p < \frac{1}{2}$, independently. The performance of a strategy is measured by the number of queries performed. In general, the target is fixed prior to the beginning of the game. This selection is made either in the adversarial model (i.e., the adversary, knowing the algorithm, picks the target maximizing the strategy length) or in a randomized way, according to a public distribution over all possible targets (we call it the distributional model).

The classical choice for the search domain is a sorted array, which leads to the well-known binary search algorithm. In such a case the natural queries are comparisons, although we point out that other queries, like arbitrary subsets, have also been studied. Interestingly, it is possible to have algorithms for the noisy scenarios that are information-theoretically optimal (up to lower-order terms); for the classical binary search see e.g. [4]. A generalization of this search domain is to consider graph structures, which was introduced first for trees [29] and then for general graphs [16]. For graphs, each query simply points to some vertex $v$ of the input graph $G$, and the answer either states that $v$ is the target, or provides a neighbor $u$ of $v$ that is closer to the target than $v$. We remark that the graph setting is a natural generalization of the classical “twenty questions game” (attributed to Rényi [32] and Ulam [35]).

### 1.2 Motivation and earlier techniques

Adaptive query algorithms play an important role in theoretical computer science, with classical binary search and its variations being the most fundamental example. They also provide a foundation for more complex computational problems like sorting; see e.g. [4] for a broader discussion. Historically speaking, the classical binary search with erroneous comparisons (or more general queries as well) has received significant attention, also in a simpler noise setting where the number of errors is bounded by a constant that is known a priori to the algorithm. The transition from sorted data (i.e., paths) to graph-like structures is natural both from the theoretical point of view and from the perspective of potential applications, e.g., in automatic software testing [3] or in machine learning [15]. An interesting spectrum of applications of binary search can be found in designing a series of tests in biological experiments [23].

A typical technique for dealing with noisy search is the multiplicative weights update (MWU): for each element $v$ the algorithm keeps a weight $\omega(v)$ that represents the likelihood that $v$ is the target. Then $\omega(v)$ is appropriately scaled depending on the answer to the query. This general tool is quite well understood and for further reading we refer to the survey of Arora et al. [1].

A special case of this approach is Bayesian learning, in which the initial weights describe the probability distribution of the target (the so-called prior distribution) and later on each weight $\omega(v)$ is updated according to the conditional probabilities of the observed answers. The process continues either until one element accumulates a large enough likelihood of being the target, or until a predetermined number of queries has been performed. Thus, the subject of the analysis in that setting is the estimation of the convergence rate of such a process; see e.g. [20, 24, 34] for examples of such an analysis for general types of prior distributions. We note that if a search model is not distributional (we do not know the prior probabilities), it often works very well to start with the uniform distribution and perform Bayesian updates anyway. We remark that the authors of [4] combine such a Bayesian learning technique with a random shifting of the input sequence as a tool to actually ensure the uniform initial distribution. We also note that such a random shifting will not work in the graph-theoretic setting due to the complex structure of the search space. Moreover, a closer analysis of [4] provides a framework that may give a tool for distributional graph search: each query is selected to maximize the amount of information to be learned. Unfortunately, it turns out that this cannot be transferred to graph search, since the algorithm in [4] strongly relies on a subtlety of the search model, where an incorrect answer, when negated, provides an opposite (correct) answer. This property makes it possible to estimate the expected information gain after each query. In the graph search model, for each query there is more than one answer that constitutes an error, which makes such an approach impossible. For the graph-theoretic case with noise, the multiplicative weights update proved to work well [13, 15, 16] and we further advance this approach in our work.

The distributional noisy model has been considered to date for binary search only [10], where the above-mentioned approaches are combined in the following way: the core of the method is a decision tree traversal. To make the algorithm resilient to errors, verification and backtracking mechanisms are added to the traversal. In particular, two verification methods are used, depending on the cases selected for the algorithm: a separate majority vote, and an observation that enough past queries had replies consistent with accepting the current one as correct. A disadvantage of applying this technique in our context is that it is already quite complex in this restricted case of binary search.

#### Issue of proof correctness in [4].

To motivate our work further, we also point out two flaws in the state-of-the-art noisy binary search analysis [4]. Firstly, the paper bounds the expected number of steps of the algorithm by the total information needed divided by the expected information gain. Such a transition (pushing the expectation to the denominator) works only through a careful application of probabilistic tools, and might result in additional lower-order terms. Secondly, a more serious issue can be found in the proof of Lemma 2.6 in [4]: it takes all the queries asked by the algorithm, in sorted order of positions, selects a subset of them based on their positions relative to the target, and claims that the number of answers of one kind within this subset follows a binomial distribution. However, the issue is that while the answers to the particular queries are independent random variables, the position of each query depends on the answers to the previous queries. Thus, when taking a subset of queries based on their positions, we cannot claim their independence and use the binomial distribution.

### 1.3 Our contribution

In an ideal scenario, there always exists a query such that each possible reply subdivides the search space evenly, thus maximizing the information gain to the algorithm. In such a case the Bayesian updates are sufficient to get an optimal search time. However, there are two obstacles that prevent us from a straightforward application of Bayesian updates. First, a perfect bisection of the search space is typically not possible. This actually occurs both for binary search with comparisons and for graphs, since some elements may have large weights. Thus, these problems can be seen more generally as a game of maximizing the information gain while performing queries. The second problem to overcome, particularly in the graph search, is the fact that the adversarial scenario may be seen as one in which the target distribution is not known a priori to the algorithm.

In terms of the development of new methods, we introduce some ideas that considerably simplify the analysis and lead to tight bounds. One of those is our measure of progress: instead of keeping track of how the total weight decreases after each query, we are interested in how the total weight behaves with some elements excluded. In the graph search it is enough to look at the vertex with the largest current weight, and it turns out that for the remaining vertices the required weight drop occurs (cf. Lemma 3.4). Thus, only one vertex is excluded at any point from the weight analysis. Interestingly, this vertex may be changing throughout the search. We point out that this approach does not adopt the previously used idea of putting such problematic vertices aside to search through them at some later stage. Instead, all vertices are treated uniformly throughout the entire search. This way of measuring progress allows us to conclude that the expected information gain in a single step is precisely optimal (see the proof of Lemma 3.9).

For binary search, having only one excluded element seems to be infeasible. We propose a strategy where the excluded elements accumulate over the search but an analogous bound for the remaining weight can be proved (cf. Equation (4) and the corresponding Lemma 4.2). A simple approach could be to find at each step the element that is the closest to a perfect bisection of the weights, then query and exclude it. However, that might lead to too many excluded elements, which the second phase of the algorithm might not be able to handle within a desirable time bound. Thus, we propose the following approach: we process queries in epochs, where at the start of an epoch the algorithm selects the best possible bisecting element and excludes it. For the duration of the whole epoch the same query is repeated, updating the weights along the way. This limits the number of excluded elements, at the cost of having possibly sub-optimal queries for the duration of the epoch. However, we are able to show bounds on the expected weight drop (cf. Lemma 4.2).

Our results are summarized in Table 1. For the graph searching we obtain the three following algorithmic results. The algorithms achieving the stated query complexities are provided in subsequent Section 3: Algorithms 3.5, 3.8 and 3.10 correspond to Theorems 1.1, 1.2 and 1.3, respectively.

###### Theorem 1.1.

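For an arbitrary graph $G$, a noise parameter $0 < p < \frac{1}{2}$ and a confidence threshold $0 < \delta < 1$, there exists an adaptive graph searching algorithm that after

$$\frac{1}{I(p)}\left(\log_2 n + O\!\left(\sqrt{\log n \log \delta^{-1}}\right) + O\!\left(\log \delta^{-1}\right)\right)$$

queries returns the target correctly with probability at least $1 - \delta$.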

###### Theorem 1.2.

For an arbitrary graph $G$, a noise parameter $0 < p < \frac{1}{2}$ and a confidence threshold $0 < \delta < 1$, there exists an adaptive Las Vegas graph searching algorithm that after the expected number of at most

$$\frac{1}{I(p)}\left(H(\mu) + \log_2 \delta^{-1} + 1\right)$$

queries returns the target correctly with probability at least $1 - \delta$, provided that an initial target distribution $\mu$ is given as an input.

###### Theorem 1.3.

For an arbitrary graph $G$, a noise parameter $0 < p < \frac{1}{2}$ and a confidence threshold $0 < \delta < 1$, there exists an adaptive Las Vegas graph searching algorithm that after the expected number of at most

$$\frac{1}{I(p)}\left(\log_2 n + O(\log\log n) + O\!\left(\log \delta^{-1}\right)\right)$$

queries returns the target correctly with probability at least $1 - \delta$.

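In the statements above, $H(\mu)$ is the entropy of the distribution $\mu$ defined as $H(\mu) = \sum_{v \in V} \mu(v) \log_2 \frac{1}{\mu(v)}$, and $I(p) = 1 - H(p)$ is the information function, where $H(p) = -p\log_2 p - (1-p)\log_2(1-p)$ is the binary entropy.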
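The bounds above are easy to evaluate numerically. The following is a minimal sketch, assuming that $I(p) = 1 - H(p)$ with $H(p)$ the binary entropy (the reading that is consistent with the proofs in Section 3). The function names are illustrative and do not come from the paper.

```python
import math


def binary_entropy(p: float) -> float:
    """H(p) = -p log2 p - (1-p) log2 (1-p), with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def information(p: float) -> float:
    """I(p) = 1 - H(p) (assumed definition of the information function)."""
    return 1.0 - binary_entropy(p)


def entropy(mu) -> float:
    """H(mu) = sum_v mu(v) log2(1/mu(v)); zero-probability elements are skipped."""
    return sum(m * math.log2(1.0 / m) for m in mu if m > 0)


def distributional_bound(mu, p: float, delta: float) -> float:
    """Expression from Theorem 1.2: (H(mu) + log2(1/delta) + 1) / I(p)."""
    return (entropy(mu) + math.log2(1.0 / delta) + 1.0) / information(p)
```

For instance, `distributional_bound([0.25] * 4, 0.1, 0.05)` evaluates the Theorem 1.2 expression for a uniform target distribution on four elements; the value grows as `p` approaches 1/2 because `information(p)` tends to zero.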

The binary search algorithms referred to in the theorems below are in Section 4: Algorithms 4.4 and 4.6 correspond to Theorems 1.4 and 1.5.

###### Theorem 1.4.

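For any $0 < p < \frac{1}{2}$ and a confidence threshold $0 < \delta < 1$, there exists an adaptive binary search algorithm for any linear order that after

$$\frac{1}{I(p)}\left(\log_2 n + O\!\left(\sqrt{\log n \log \delta^{-1}}\right) + O\!\left(\log \delta^{-1}\right)\right)$$

queries returns the target correctly with probability at least $1 - \delta$.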

###### Theorem 1.5.

For any $0 < p < \frac{1}{2}$ and a confidence threshold $0 < \delta < 1$, there exists an adaptive binary search Las Vegas algorithm for any linear order that after the expected number of

$$\frac{1}{I(p)}\left(H(\mu) + O(H_2(\mu)) + O\!\left(\log \delta^{-1}\right)\right)$$

queries returns the target correctly with probability at least $1 - \delta$, provided that an initial target distribution $\mu$ is given as an input.

In the last statement, $H_2(\mu) = \sum_{x} \mu(x) \log_2 \log_2 \frac{1}{\mu(x)}$. Using random shifts as in [4], the case when the input distribution is unknown can be turned into the uniform distribution scenario and thus Theorem 1.5 gives the following.

###### Corollary 1.6.

For any $0 < p < \frac{1}{2}$ and a confidence threshold $0 < \delta < 1$, there exists an adaptive binary search Las Vegas algorithm that after the expected number of

$$\frac{1}{I(p)}\left(\log_2 n + O(\log\log n) + O\!\left(\log \delta^{-1}\right)\right)$$

queries returns the target correctly with probability at least $1 - \delta$.

Our contribution can be summarized as follows. In the graph scenario, we argue that the Bayesian updates technique is enough because the fact that no perfect bisection is possible can be handled on the level of the analysis, and is not imprinted in the algorithm. Here the contribution lies in a much simpler analysis than in prior works. Moreover, the analysis carries over to the remaining new results. In particular, the distributional case has not been considered before for graphs, thus Theorem 1.2 gives a new bound. The algorithm behind this theorem can be directly used to obtain a Las Vegas algorithm for the adversarial case (cf. Theorem 1.3). Here the twist lies in adjusting the confidence level appropriately and the details of the differences between these two settings are hidden in the analysis.

For the binary search, the Bayesian updates leave the algorithm with several candidates for the target. This seems difficult to avoid and we leave it as our main open question whether a one-phase multiplicative update can be sufficient to find the target. The new result for binary search is Theorem 1.4 for the adversarial noisy case. For the Las Vegas settings we provide Theorem 1.5 and its direct Corollary 1.6 which together give a simpler analysis than in prior works, correcting also some ambiguities present in the analysis in [4].

### 1.4 Related work

There are many variants of the interactive query games, depending on the structure of queries and the way erroneous replies occur. There is a substantial amount of literature that deals with fixed number of errors for arbitrary membership queries or comparison queries for binary search; we refer the reader to some surveys [11, 30]. Among the most successful tools for tackling binary search with errors, there is the idea of a volume [5, 31], which exploits the combinatorial structure of a possible distribution of errors. A natural approach of analyzing decision trees has been also successfully applied, see e.g. [17]. See [7, 16] for examples of partitioning strategies into stages, where in each stage the majority of elements is eliminated and only few ‘problematic’ ones remain. For a different reformulation (and asymptotically optimal search results) of the noisy search see [25].

Although the adversarial and noisy models are most widely studied, some other ones are also considered. As an example, we mention the (linearly) bounded error model in which it is guaranteed that the number of errors is a -fraction, , of (any initial prefix of) the number of queries, see e.g. [2, 7, 14]. Interestingly, it might be the case that different models are so strongly related that a good bound for one of them provides also tight bounds for the other ones, see e.g. [13] for an example of such analysis.

The first result regarding distributional query search with a fixed number of errors is due to Shannon [33], where it has been shown a strategy using up to arbitrary queries on average. The Shannon-Fano-Elias code takes comparison queries on average. The distributional version has been recently reconsidered in [10], where it is also assumed that at most answers can be erroneous. Then, an optimal strategy consists of comparison queries up to an additive factor of , where . This generalizes the result of Rivest et al. for the uniform distribution [31], and improves the codes from [33]. In view of the above, it is interesting to see what types of queries can be used for constructing effective strategies. In [9] it is shown that there exists a ‘natural’ set of queries allowing for construction of strategies of length , and it is also shown that this bound is asymptotically tight.

The theory of coding schemes for noisy communication is out of scope of this survey and we point to some recent works [8, 18, 19, 21, 28].

The first steps towards generalizing binary search to a graph-theoretic setting are works on searching in partially ordered data [3, 26, 27]. Specifically for the node search that we consider in this work, the first results are due to Onak and Parys for the case of trees [29] and the recent work of Emamjomeh-Zadeh et al. for general graphs [15, 16]. In the former, an optimal linear-time algorithm for the error-less case was given. It has been shown in [16] how to construct, for the noisy model with an input being an arbitrary graph of order , a strategy of length at most , where , with the confidence threshold . The strategy has been simplified and the query complexity further improved in [13] to reach an upper bound on the query complexity: . It is interesting to note that these strategies, when applied to lines, reach the optimal query complexities (up to lower order terms) for linear orders, thus matching the limits of binary search. (However, one has to keep in mind that graph queries applied to linear orders provide a richer set of replies than the comparisons of a binary search.)

We refer the reader to a description of a few interesting potential applications of this graph searching framework in machine learning, particularly for problems like learning a ranking, a clustering or a classifier [15]. Also, some generalizations with non-uniform query times (where the duration of querying each vertex is dictated by its weight) have been considered [12]. Finally, we mention a somewhat related graph-theoretic model in which a walking agent is performing queries [6, 22].

## 2 Preliminaries

Let $0 < p < \frac{1}{2}$ be the noise parameter. It is useful to denote $\varepsilon = \frac{1}{2} - p$ and $\Gamma = \frac{1-p}{p}$. Whenever we refer to a search space, we mean either an (undirected and unweighted) graph or a linear order. This term will be used for definitions that are common for both. Consequently, by an element of a search space we mean a vertex or an integer, respectively. In the following, $n$ is the size of the search space, i.e., either the number of vertices in a graph or the number of integers in a linear order.

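The strategies will maintain the weights $\omega(v)$ for the elements $v$ of a search space $V$. For an arbitrary $0 \le c \le 1$, $v$ is $c$-heavy if $\omega(v)/\omega(V) \ge c$, where for any subset $U \subseteq V$ we write $\omega(U) = \sum_{u \in U} \omega(u)$. $\frac{1}{2}$-heavy elements will play a special role and we refer to them as heavy for brevity.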

For the graph search, we will adopt a slightly weaker model in which an adaptive algorithm receives less information in some cases. This is done in somewhat artificial way for purely technical reasons, i.e., to simplify several arguments during analysis. In particular, the amount of information will depend on the current weights and the case when the algorithm receives less information than dictated by the graph searching model is when a heavy vertex is queried and the answer says that it is not the target.

Graph queries model specifics. Suppose that a vertex $q$ is queried. The reply either informs that $q$ is the target, which we call a yes-answer, or gives a neighbor $u$ of $q$ that lies on a shortest path from $q$ to the target, which we call a no-answer. However, for the sake of technical simplicity of the analysis, if $q$ was heavy and a no-answer was given, the algorithm reads it as a "the target is not $q$" reply (ignoring the direction in which the target might be). This only makes the algorithms stronger, since they operate in a weaker reply model.

Note that any algorithmic guarantees for the above model carry over to the generic graph search model. To adopt the above, we will say that a vertex $v$ is compatible with the reply to a query $q$ if and only if $v = q$ in case of a yes-answer, or the neighbor $u$ given in a no-answer when $q$ is not heavy lies on a shortest $q$-$v$ path. Thus in particular, in case of a no-answer regarding a heavy $q$, no vertex is compatible.

The weight of an element $v$ at the end of step $t$ is denoted by $\omega_t(v)$, with $\omega_0(v)$ being the initial value. The initial values are set by an algorithm depending on a particular model. For distributional scenarios, the initial weight of each element equals its probability of being the target, that is, $\omega_0(v) = \mu(v)$ for each $v \in V$. For worst-case scenarios, the weights are set uniformly, $\omega_0(v) = \frac{1}{n}$ for each $v \in V$.

We recall the Bayesian updates (Algorithm 2.1) that are at the core of our strategies.

###### Algorithm 2.1.
(Bayesian updates.) In a step $t+1$, for each element $v$ of the search space do: if $v$ is compatible with the answer, then $\omega_{t+1}(v) = (1-p)\,\omega_t(v)$; if $v$ is not compatible with the answer, then $\omega_{t+1}(v) = p\,\omega_t(v)$.

The algorithm relies on the fact that Bayesian updates keep the weights to be the conditional probabilities, that is: at each step during the execution of algorithm,

$$\Pr\left(v = v^* \text{ at step } \tau \mid \text{received answers}\right) = \frac{\omega_\tau(v)}{\omega_\tau(V)}.$$
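A minimal sketch of one Bayesian update step, assuming the multipliers $1-p$ for compatible elements and $p$ for incompatible ones (the same factors that appear in the proof of Lemma 3.2). The dictionary representation and the function names are illustrative choices, not the paper's notation.

```python
def bayesian_update(weights, compatible, p):
    """One step of the Bayesian updates on a dict {element: weight}.

    `compatible` is the set of elements compatible with the observed reply;
    those are scaled by (1 - p), all other elements by p.
    """
    return {v: w * ((1 - p) if v in compatible else p) for v, w in weights.items()}


def posterior(weights):
    """Normalize weights so that they read as conditional target probabilities."""
    total = sum(weights.values())
    return {v: w / total for v, w in weights.items()}
```

With prior weights `{"a": 0.5, "b": 0.25, "c": 0.25}`, a reply compatible only with `"a"` and `p = 0.25`, the normalized weight of `"a"` becomes 0.75, which matches a direct application of Bayes' rule with likelihoods 0.75 and 0.25.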

Denoting by $d(u,v)$ the graph distance between $u$ and $v$, i.e., the length of a shortest path between these vertices, a vertex

$$q = \arg\min_{v \in V} \sum_{u \in V} d(u,v) \cdot \omega(u)$$

is called a median of the graph.
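The median can be computed directly from its definition. Below is a minimal sketch for a connected, unweighted graph given as an adjacency dictionary; it runs one breadth-first search per vertex, so it is meant as an illustration rather than an efficient implementation. The names `bfs_distances` and `weighted_median` are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Distances from `source` to every vertex of a connected, unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for nb in adj[u]:
            if nb not in dist:
                dist[nb] = dist[u] + 1
                queue.append(nb)
    return dist


def weighted_median(adj, weights):
    """Vertex minimizing the weighted sum of distances; ties go to the first found."""
    best, best_cost = None, float("inf")
    for v in adj:
        dist = bfs_distances(adj, v)
        cost = sum(dist[u] * weights[u] for u in adj)
        if cost < best_cost:
            best, best_cost = v, cost
    return best
```

On a path with five vertices and equal weights the median is the middle vertex, while placing most of the weight on an endpoint moves the median to that endpoint.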

For a query and a reply let us use to denote a set of all vertices consistent with that reply, i.e and for .

We note a fundamental bisection property of a median:

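###### Lemma 2.2 (cf. [16] Lemma 4).

If $q$ is a median, then for every no-answer to the query $q$, the set of vertices consistent with that reply has weight at most $\omega(V)/2$.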

###### Proof.

Suppose towards the contradiction that for some . Observe that since by moving from to we get closer to all vertices in . But by our assumption, hence , which yields a contradiction. ∎

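Let $X_1, X_2, \ldots$ be a sequence of i.i.d. random variables. We say that a random variable $T$ is a stopping time (with respect to $X_1, X_2, \ldots$) if, for every $t$, the indicator of the event $\{T \le t\}$ is a function of $X_1, \ldots, X_t$. We will use the following version of Wald’s identity.

###### Theorem 2.3.

(Wald’s Identity) Let $X_1, X_2, \ldots$ be i.i.d. with finite mean, and let $T$ be a stopping time with $E[T] < \infty$. Then $E[X_1 + \cdots + X_T] = E[X_1]\,E[T]$.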

## 3 Graph searching

We first analyze how the weights behave when in each step a median is queried and the Bayesian updates are made. This analysis is independent of the weight initialization and thus is common for all following graph searching algorithms. Essentially we prove that in expectation, that is in an amortized way, the total weight (with the heaviest vertex excluded) decreases by half per step. For some steps this can be concluded directly (cf. Lemma 3.1). Lemmas 3.2 and 3.3 then refer to an interval of queries to the same heavy vertex $x$. If such an interval has the property that it ended (Lemma 3.2), then the required weight drop can be claimed by the end of the interval. For this, informally speaking, the crucial fact is that $x$ received many no-answers during this interval. If a strategy is at a step that is within such an interval, then Lemma 3.3 is used to make the claim on the total weight with the weight of $x$ excluded. Hence, at any point of the strategy the weight decreased appropriately, as shown in Lemma 3.4.

###### Lemma 3.1 (see also [16, 13]).

If in a step $t$ there is no heavy vertex, then

$$\omega_{t+1}(V) \le \omega_t(V)/2.$$
###### Proof.

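Let $q$ be the query and $u$ the answer in step $t$. If $u \neq q$, then by Lemma 2.2 the vertices compatible with the reply have total weight at most $\omega_t(V)/2$, and in case $u = q$ the only compatible vertex is $q$, which is not heavy, and thus the same bound holds. Since compatible vertices are scaled by $1-p$ and the remaining ones by $p < 1-p$, in both cases $\omega_{t+1}(V) \le (1-p)\,\omega_t(V)/2 + p\,\omega_t(V)/2 = \omega_t(V)/2$. ∎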

###### Lemma 3.2 (see also [13]).

Consider an interval of $k$ queries, starting after step $\tau$, such that some $x$ is heavy in each query in the interval and is not heavy after the interval's last query. Then

$$\omega_{\tau+k}(V) \le \omega_\tau(V)/2^k.$$
###### Proof.

First note that in each query in the interval, the queried vertex is $x$. Hence consider any two queries $i$ and $j$ in the interval such that they receive different replies. The contribution of these two queries is that for each vertex $v$, its weight is multiplicatively scaled down by $p(1-p) \le \frac{1}{4}$. Also, for a single no-answer in a query $i$ we get

$$\omega_{i+1}(V) = p\,\omega_i(x) + (1-p)\,\omega_i(V \setminus \{x\}) \le \omega_i(V)/2$$

because $\omega_i(x) \ge \omega_i(V)/2$ for the heavy vertex $x$. By assumption, the number of no-answers is at least the number of yes-answers in the interval. Thus, the overall weight drop is as claimed in the lemma. ∎
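The single no-answer bound used in this proof can be checked in two lines. The derivation below is a worked version under the assumptions that a no-answer scales the heavy vertex $x$ by $p$ and every other vertex by $1-p$, and that heaviness means $\omega_i(x) \ge \omega_i(V)/2$.

```latex
\begin{align*}
\omega_{i+1}(V)
  &= p\,\omega_i(x) + (1-p)\bigl(\omega_i(V) - \omega_i(x)\bigr)
   = (1-p)\,\omega_i(V) - (1-2p)\,\omega_i(x) \\
  &\le (1-p)\,\omega_i(V) - (1-2p)\,\frac{\omega_i(V)}{2}
   = \frac{\omega_i(V)}{2},
\end{align*}
% The inequality uses 1 - 2p > 0 together with \omega_i(x) \ge \omega_i(V)/2.
% A yes/no pair scales every weight by p(1-p) \le 1/4 = 2^{-2},
% i.e. it also accounts for a factor 1/2 per query.
```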

###### Lemma 3.3.

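Consider an interval of $k$ queries, starting after step $\tau$, such that some $x$ is heavy in each query in the interval, and $x$ remains heavy after the last query in the interval. Then

$$\omega_{\tau+k}(V \setminus \{x\}) \le \omega_\tau(V)/2^k.$$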
###### Proof.

Recall that in each query in the interval , the queried vertex is . Assume that there were yes-answers in and no-answers, with . If , then . If , then we bound as follows: . ∎

The bound in the next lemma immediately follows from Lemmas 3.1, 3.2 and 3.3. We say that an element $v$ is heaviest if $\omega(v) \ge \omega(u)$ for each $u \in V$. For each step $\tau$, we denote by $x_\tau$ a heaviest vertex at this step, breaking ties arbitrarily.

###### Lemma 3.4.

If the initial weights satisfy $\omega_0(V) = 1$, then

$$\omega_\tau(V \setminus \{x_\tau\}) \le \frac{1}{2^\tau}.$$
###### Proof.

We consider the first $\tau$ queries and observe that they can be partitioned into a disjoint union of maximal intervals in which either there is a heavy vertex present (in the whole interval) or there is no heavy vertex (in the whole interval). We apply Lemma 3.1 for intervals with no heavy vertex and Lemmas 3.2 and 3.3 for intervals with a heavy vertex present (note that Lemma 3.3 can be applied only to the last interval, if there exists a heavy vertex after we perform all $\tau$ queries). ∎

We note that $\omega_\tau(V \setminus \{x_\tau\})$ will be our key measure of progress, despite the fact that $x_\tau$ may be changing throughout the search, i.e., $x_\tau$ does not have to be the target at some stages of the search. When our strategies complete, however, $x_\tau$ is provably the target within the imposed confidence threshold.

### Proof of Theorem 1.1 (Worst-case strategy length)

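In this section we prove Theorem 1.1. Take $Q$ to be the smallest positive integer for which $I(p)\,Q \ge \log_2 n + \sqrt{\frac{Q}{2}\ln\delta^{-1}}\,\log_2\Gamma$. One can verify that

$$Q = \frac{\log_2 n + O\!\left(\sqrt{\log n \log \delta^{-1}}\right) + O\!\left(\log \delta^{-1}\right)}{I(p)}. \tag{1}$$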

The solution is through Lemma A.1 and by bounding and . Such provides a sufficient strategy length in the adversarial scenario — see Algorithm 3.5.

###### Algorithm 3.5.
(Adversarial graph search.) Initialization: $\omega_0(v) = \frac{1}{n}$ for each $v \in V$. In each step: query the median and perform the Bayesian updates (Algorithm 2.1). Stop condition: perform exactly $Q$ queries with $Q$ as in (1) and return the heaviest vertex.
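The loop of the adversarial strategy can be sketched as follows. This is an illustration under several assumptions, not the analyzed algorithm verbatim: it uses the generic reply model and skips the heavy-vertex simplification of Section 2, it takes $Q$ as the smallest integer with $I(p)Q \ge \log_2 n + \sqrt{\tfrac{Q}{2}\ln\delta^{-1}}\log_2\frac{1-p}{p}$ (the inequality used in Corollary 3.7), and the noisy oracle picks a uniformly random wrong reply when it errs. Weights are renormalized after each step, which changes neither the median nor the heaviest vertex. All function names are hypothetical, and `p` must lie strictly between 0 and 1/2 for the search itself.

```python
import math
import random
from collections import deque


def all_distances(adj):
    """All-pairs BFS distances of a connected, unweighted graph."""
    dist = {}
    for source in adj:
        d = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for nb in adj[u]:
                if nb not in d:
                    d[nb] = d[u] + 1
                    queue.append(nb)
        dist[source] = d
    return dist


def num_queries(n, p, delta):
    """Smallest Q with I(p)*Q >= log2(n) + sqrt(Q/2 * ln(1/delta)) * log2((1-p)/p)."""
    info = 1 + p * math.log2(p) + (1 - p) * math.log2(1 - p)
    log_gamma = math.log2((1 - p) / p)
    q = 1
    while info * q < math.log2(n) + math.sqrt(q / 2 * math.log(1 / delta)) * log_gamma:
        q += 1
    return q


def noisy_reply(adj, dist, q, target, p, rng):
    """Truthful reply with probability 1-p, else a uniformly random wrong reply (assumption)."""
    if q == target:
        truthful = [q]
    else:
        truthful = [u for u in adj[q] if dist[u][target] < dist[q][target]]
    wrong = [r for r in [q] + list(adj[q]) if r not in truthful]
    if wrong and rng.random() < p:
        return rng.choice(wrong)
    return rng.choice(truthful)


def adversarial_graph_search(adj, oracle, p, delta):
    """Query the weighted median Q times with Bayesian updates; return the heaviest vertex."""
    dist = all_distances(adj)
    n = len(adj)
    w = {v: 1.0 / n for v in adj}
    for _ in range(num_queries(n, p, delta)):
        q = min(adj, key=lambda v: sum(dist[v][u] * w[u] for u in adj))
        r = oracle(q)
        for v in adj:
            # v is compatible if r == q == v, or if r lies on a shortest q-v path
            compatible = (v == q) if r == q else dist[r][v] < dist[q][v]
            w[v] *= (1 - p) if compatible else p
        total = sum(w.values())
        for v in adj:
            w[v] /= total  # renormalize to avoid floating-point underflow
    return max(w, key=w.get)
```

A typical call builds the oracle from a seeded generator, for example `oracle = lambda q: noisy_reply(adj, dist, q, target, p, random.Random(0))` with the generator created once outside the lambda in real use, and then runs `adversarial_graph_search(adj, oracle, p, delta)`.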
###### Lemma 3.6.

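If $v^*$ is the target, then after $\tau$ queries, with probability at least $1 - \delta$ it holds

$$\omega_\tau(v^*) \ge \frac{1}{n}\,\Gamma^{-\sqrt{\frac{\tau}{2}\ln\delta^{-1}}}\;2^{-H(p)\tau}.$$

###### Proof.

After $\tau$ queries with at most $\ell$ erroneous replies, the weight of the target satisfies:

$$\omega_\tau(v^*) \ge \frac{1}{n}\,p^{\ell}(1-p)^{\tau-\ell} = \frac{1}{n}\,\Gamma^{p\tau-\ell}\,2^{-H(p)\tau}.$$

Denote $a = \sqrt{\frac{\tau}{2}\ln\delta^{-1}}$. Then by the Hoeffding bound, with probability at least $1 - \delta$ there is $\ell \le p\tau + a$. Thus, after $\tau$ queries, the weight of the target satisfies, with probability at least $1 - \delta$,

$$\omega_\tau(v^*) \ge \frac{1}{n}\,\Gamma^{-a}\,2^{-H(p)\tau},$$

from which the claim follows. ∎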
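The rewriting step in this proof can be verified directly. The derivation below assumes $\Gamma = \frac{1-p}{p}$ and that $H(p)$ is the binary entropy, which is the reading under which the identity holds term by term.

```latex
\begin{align*}
2^{-H(p)\tau}
  &= 2^{\tau\left(p\log_2 p + (1-p)\log_2(1-p)\right)}
   = p^{\,p\tau}\,(1-p)^{(1-p)\tau}, \\
\Gamma^{\,p\tau-\ell}\; 2^{-H(p)\tau}
  &= \frac{(1-p)^{\,p\tau-\ell}}{p^{\,p\tau-\ell}}\; p^{\,p\tau}\,(1-p)^{(1-p)\tau}
   = p^{\,\ell}\,(1-p)^{\tau-\ell}.
\end{align*}
% Hoeffding: \ell is a sum of \tau independent Bernoulli(p) variables, so
% \Pr(\ell - p\tau \ge a) \le \exp(-2a^2/\tau), which equals \delta
% for a = \sqrt{(\tau/2)\ln\delta^{-1}}.
```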

The following implies Theorem 1.1.

###### Corollary 3.7.

Algorithm 3.5 returns the target correctly with probability at least $1 - \delta$.

###### Proof.

By Lemma 3.6 and the definition of $Q$ in (1), the following holds with probability at least $1 - \delta$:

$$\log_2 \omega_Q(v^*) \ge -\log_2 n - \sqrt{\frac{Q}{2}\ln\delta^{-1}}\,\log_2\Gamma - H(p)\,Q \ge -Q \ge \log_2 \omega_Q(V \setminus \{x_Q\}),$$

where the last inequality is due to Lemma 3.4. Since the weights of the vertices are non-negative at all times, the only way for this to happen is to have $x_Q = v^*$, that is, the target being found correctly. ∎

### Proof of Theorem 1.2 (Las Vegas distributional search)

In this scenario the initial weights are set to be the given target distribution $\mu$, and the stopping condition requires that some vertex accumulates a large enough weight; see Algorithm 3.8.

###### Algorithm 3.8.
(Las Vegas distributional graph search.) Initialization: $\omega_0(v) = \mu(v)$ for each $v \in V$. In each step: query the median and perform the Bayesian updates. Stop condition: if for any $v$ in some step $t$ it holds $\frac{\omega_t(v)}{\omega_t(V)} \ge 1 - \delta$, then return $v$.
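A minimal sketch of the Las Vegas variant follows, assuming the stop condition is that some vertex holds at least a $1-\delta$ fraction of the total weight. As before, it uses the generic reply model without the heavy-vertex simplification, renormalizes the weights after each step, and takes the oracle as a callable; the `max_steps` guard and all names are illustrative additions.

```python
from collections import deque


def all_distances(adj):
    """All-pairs BFS distances of a connected, unweighted graph."""
    dist = {}
    for source in adj:
        d = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for nb in adj[u]:
                if nb not in d:
                    d[nb] = d[u] + 1
                    queue.append(nb)
        dist[source] = d
    return dist


def las_vegas_graph_search(adj, mu, oracle, p, delta, max_steps=10_000):
    """Return (vertex, number of queries) once a vertex is (1 - delta)-heavy.

    mu: dict of prior target probabilities; oracle(q) returns q itself
    (yes-answer) or a neighbor of q (no-answer), possibly erroneously.
    """
    dist = all_distances(adj)
    w = dict(mu)
    for queries in range(max_steps + 1):
        best = max(w, key=w.get)
        if w[best] >= (1 - delta) * sum(w.values()):
            return best, queries
        # query the weighted median
        q = min(adj, key=lambda v: sum(dist[v][u] * w[u] for u in adj))
        r = oracle(q)
        for v in adj:
            compatible = (v == q) if r == q else dist[r][v] < dist[q][v]
            w[v] *= (1 - p) if compatible else p
        total = sum(w.values())
        for v in adj:
            w[v] /= total  # keeps weights in a safe floating-point range
    raise RuntimeError("no vertex reached the confidence threshold")
```

The number of queries is now a random variable that depends on the replies, which is what the expectation in Theorem 1.2 refers to; a fixed query budget as in Algorithm 3.5 is not needed.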
###### Lemma 3.9.

For any , Algorithm 3.8 stops and outputs the target after the expected number of steps.

###### Proof.

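We measure the progress at any given moment by a random variable $\zeta_t = \log_2 \omega_t(v^*) + t$. Observe that if the reply is erroneous in a step $t+1$, then $\zeta_{t+1} = \zeta_t + 1 + \log_2 p$, and if it is correct, then $\zeta_{t+1} = \zeta_t + 1 + \log_2(1-p)$.

For the sake of bounding the number of steps of the algorithm, we assume it is simulated indefinitely. Let $Q$ be the smallest integer such that $\zeta_Q \ge \log_2 \frac{1}{\delta}$.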

By Lemma 3.4 we have that , thus since . But if for any there is , then , since implies . Thus we deduce that . Additionally, from we get that is -heavy, hence bounds the strategy length.

From and the minimality of we deduce . In particular

$$E[\zeta_Q] \le \log_2 \frac{1}{\delta} + 1. \tag{2}$$

Let and observe that . Note that trivially , hence in particular . Also, ’s are independent and is a stopping time. Therefore, by using Wald’s identity (Theorem 2.3) we get

$$E[\zeta_Q] - \zeta_0 = E[\zeta_Q - \zeta_0] = E[X_1 + \cdots + X_Q] = E[Q]\,I(p).$$

Thus, by (2) and , we have

$$1 + \log_2 \frac{1}{\delta} \ge E[\zeta_Q] = \zeta_0 + E[Q]\,I(p),$$

which results in a bound

$$E[Q] \le \frac{\log_2 \frac{1}{\mu(v^*)} + \log_2 \frac{1}{\delta} + 1}{I(p)}.$$

∎

Lemma 3.9 implies Theorem 1.2 by taking the expectation over all possible target locations according to the input distribution $\mu$.

### Proof of Theorem 1.3 (Las Vegas adversarial search)

The adversarial setting is resolved by taking the uniform initial target distribution and scaling down the threshold as shown in Algorithm 3.10.

###### Algorithm 3.10.
(Las Vegas adversarial graph search.) Run Algorithm 3.8 with for all and confidence threshold
###### Lemma 3.11.

Algorithm 3.10 finds the target correctly with probability at least after expected number of queries.

###### Proof.

The time bound follows from Theorem 1.2. We argue about the correctness. Denote by the number of yes-answers required to go from a vertex being -heavy to being -heavy. For now assume that ; we will deal with the other case later. For a non-target vertex to be declared by the algorithm as the target, it has to observe a suffix of the strategy being a random walk on a 1-dimensional discrete grid with transition probabilities for and for . We consider a random walk starting at position and ending when reaching either or and call it a subphase (w.l.o.g. we can assume that is even). Any execution of the algorithm can be partitioned into disjoint subphases that are maximal with respect to containment. Each subphase starts when one particular heavy vertex receives more yes-answers than no-answers within the interval in which is heavy. Then, a subphase ends when either the algorithm declares to be the target or stops being heavy. By the standard analysis of the gambler's ruin problem, each subphase (where the heavy vertex is not the target) has failure probability . Let us denote by a random variable the number of subphases in the execution of the algorithm. Let be the length of the -th subphase. By the standard analysis of the gambler's ruin problem,

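$$\mathbb{E}[F_i] = \frac{A/2}{1-2p} - \frac{A}{1-2p}\cdot\frac{1}{1+\left(\frac{1-p}{p}\right)^{A/2}} \ge \frac{A/2}{1-2p}\left(1 - \frac{2}{1+\sqrt{\frac{1-\delta'}{\delta'}}}\right) = \Omega\!\left(\frac{1}{\varepsilon^2}\right),$$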

where the asymptotic bound holds since w.l.o.g. , and also since if , then , and otherwise . Let be the total length of all subphases. Observe that is a stopping time, hence we have by Theorem 2.3. By Theorem 1.2, holds for the strategy length . Since , .

By the union bound, the error probability for the whole procedure is bounded by for an appropriately chosen constant in the definition of .

We now deal with the case of . This requires , and (since if , an appropriate choice of the constant in enforces ) and so the expected strategy length is . By the union bound, the algorithm receives a single erroneous response with probability at most . ∎
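The closed form for the expected subphase length used in the proof above is the classical gambler's ruin duration for a walk started at the midpoint $A/2$ of the interval $[0, A]$. The sketch below compares it with a direct solution of the recurrence $D_z = 1 + p\,D_{z+1} + (1-p)\,D_{z-1}$ with $D_0 = D_A = 0$. It assumes $p < 1/2$ and an even $A$; the function names and the choice of a tridiagonal solver are illustrative only.

```python
def duration_closed_form(p, A):
    """Closed-form expected duration of a biased walk started at A/2 on [0, A]."""
    ratio = (1 - p) / p
    half = A // 2
    return half / (1 - 2 * p) - (A / (1 - 2 * p)) / (1 + ratio ** half)

def duration_linear_system(p, A):
    """Solve D_z = 1 + p*D_{z+1} + (1-p)*D_{z-1}, D_0 = D_A = 0, by the Thomas algorithm."""
    up, down = p, 1 - p
    n = A - 1                      # unknowns D_1, ..., D_{A-1}
    c_prime = [0.0] * n            # modified super-diagonal
    d_prime = [0.0] * n            # modified right-hand side
    # Row for D_z: -down*D_{z-1} + D_z - up*D_{z+1} = 1
    c_prime[0] = -up
    d_prime[0] = 1.0
    for i in range(1, n):
        denom = 1.0 + down * c_prime[i - 1]
        c_prime[i] = -up / denom
        d_prime[i] = (1.0 + down * d_prime[i - 1]) / denom
    D = [0.0] * n
    D[-1] = d_prime[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        D[i] = d_prime[i] - c_prime[i] * D[i + 1]
    return D[A // 2 - 1]           # value at the midpoint z = A/2
```

Because the walk starts at the midpoint, the expected duration does not depend on which of the two directions is the more likely one, so the same formula covers both orientations of the walk.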

## 4 Binary search

The strategies for the binary search setting are partitioned into epochs. By an epoch we mean a sequence of queries to the same element. The lengths of the epochs are fixed in advance and oblivious to the responses of the adversary. However, we allow different epochs to have different lengths: the length of the -th epoch is denoted by (thus the -th epoch starts at step and ends at step ). After each query we perform Bayesian updates using Algorithm 2.1. The element to be queried is selected as follows: at the start of the execution of the algorithm, all elements are unmarked, and during the execution we gradually mark some elements. Denote the set of marked elements in step by (we write when the current step is clear from the context). At the start of an epoch, we find an element , which we call central, such that and . This element is the one repeatedly queried in the epoch. We fix the length of the -th epoch, for each , to be

$$E_i = \max\left(\tfrac{1}{16}\,\varepsilon^{-2}\, i^{-2/3},\ 1\right). \tag{3}$$
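To get a feel for the schedule in (3), the following sketch evaluates the epoch lengths and their running total. It uses the real-valued expression exactly as written in (3); how lengths are rounded to integers is not specified here, so no rounding is applied. The helper names are hypothetical.

```python
def epoch_length(i, eps):
    """Length E_i of the i-th epoch (i >= 1) as given by (3), without rounding."""
    return max(eps ** -2 * i ** (-2.0 / 3.0) / 16.0, 1.0)

def total_length(f, eps):
    """Total length of the first f epochs."""
    return sum(epoch_length(i, eps) for i in range(1, f + 1))

if __name__ == "__main__":
    eps = 0.1
    for i in (1, 8, 64, 10 ** 6):
        print(i, epoch_length(i, eps))
    print("first 100 epochs:", total_length(100, eps))
```

While the first argument of the maximum dominates, $\varepsilon^4 E_i^2 = i^{-4/3}/256$, which is the quantity that appears in the product bound (6).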

Algorithm 4.1 shows the general framework of using epochs in our strategies.

###### Algorithm 4.1.
(Epochs in binary search.) Initialization: , set the initial weights. Run epoch : query the central element for steps, performing the Bayesian updates. End epoch : mark by adding it to , , and proceed to the next epoch.

In order to turn Algorithm 4.1 into a particular strategy, we will provide a weight initialization and a stopping condition. We note that the stopping condition does not affect the starting points of the epochs, but it may end the strategy during some epoch. As a result of the execution of Algorithm 4.1 we get the set of all the elements that we have queried. For each particular strategy we will ensure that the number of queries is large enough such that . On the other hand, we will always be able to bound the size of efficiently. That way, we will be able to treat as a set of "potential targets", and the target selection will be done by performing another binary search within the set . For that we will use the algorithm of Feige et al. [17] of query complexity .

We start the analysis of the running time by determining the expected weight drop during the search. To that end we scrutinize the behavior of the random variable (weight of the unmarked vertices) by analyzing a coupled process which we denote . It is defined as follows: for an epoch of length that starts in step and in which -answers and -answers occurred, with , we denote , and . Since and , we obtain the following property:

$$w_\tau(V\setminus M_\tau) \le W_\tau. \tag{4}$$
###### Lemma 4.2.

Let be the start of an epoch of length . Assume . Then

$$\mathbb{E}[W_{\tau+k}] \le \frac{1}{2^k}\left(1 + O(\varepsilon^4 k^2)\right) W_\tau. \tag{5}$$
###### Proof.

We can assume without loss of generality that . We denote the number of -answers by and -answers by . We have then

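$$\mathbb{E}[W^-_{\tau+k}] = \frac{W_\tau}{2}\,\bigl(2(1-p)p\bigr)^k = \frac{W_\tau}{2}\,2^{-k}(1-4\varepsilon^2)^k,$$

$$\mathbb{E}[W^+_{\tau+k}] = \frac{W_\tau}{2}\,\bigl((1-p)^2+p^2\bigr)^k = \frac{W_\tau}{2}\,2^{-k}(1+4\varepsilon^2)^k,$$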

and

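$$\mathbb{E}[W_{\tau+k}] = \frac{W_\tau}{2}\,2^{-k}\left((1-4\varepsilon^2)^k + (1+4\varepsilon^2)^k\right) = 2^{-k}\left(1 + O(\varepsilon^4 k^2)\right) W_\tau.$$ ∎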
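The two simplifications in this proof can be checked numerically. The sketch below assumes $p = \tfrac12 - \varepsilon$ (the parametrization under which $2(1-p)p = \tfrac12(1-4\varepsilon^2)$ holds; it is not restated in this section) and compares the overhead factor $\tfrac12\bigl((1-4\varepsilon^2)^k + (1+4\varepsilon^2)^k\bigr)$ with its second-order expansion $1 + 8k(k-1)\varepsilon^4$. Function names are hypothetical.

```python
def per_query_factors(eps):
    """Return (2(1-p)p, (1-p)^2 + p^2) under the assumption p = 1/2 - eps."""
    p = 0.5 - eps
    return 2 * (1 - p) * p, (1 - p) ** 2 + p ** 2

def epoch_overhead(eps, k):
    """Factor multiplying 2^{-k} * W_tau in the expectation over an epoch of length k."""
    x = 4 * eps ** 2
    return ((1 - x) ** k + (1 + x) ** k) / 2

def second_order_estimate(eps, k):
    """Leading terms of the binomial expansion: 1 + C(k,2) * (4 eps^2)^2."""
    return 1 + 8 * k * (k - 1) * eps ** 4
```

The odd powers of $4\varepsilon^2$ cancel between the two terms, which is why the overhead is of order $\varepsilon^4 k^2$ rather than $\varepsilon^2 k$ whenever $\varepsilon^2 k$ is bounded by a constant.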

The total length of the first epochs is then . So in queries there are epochs.

We now bound the expected change of in queries, applying Lemma 4.2 to each epoch:

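$$\mathbb{E}[W_Q] = \prod_{i=1}^{f} 2^{-E_i}\left(1 + O(\varepsilon^4 E_i^2)\right) \le 2^{-Q}\prod_{i=1}^{\infty}\left(1 + O(i^{-4/3})\right) \le C\,2^{-Q} \tag{6}$$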

for large enough . By Markov's inequality, we obtain the following corollary.

###### Corollary 4.3.

With probability at least it holds for any .

### Theorem 1.4: Worst-case strategy length

The part of the strategy that consists of epochs, and that leaves the candidate set of potential targets , has a length determined a priori:

$$Q = \frac{1}{I(p)}\left(\log_2 n + O(\log\delta^{-1}) + O\!\left(\sqrt{\log\delta^{-1}\,\log n}\right)\right). \tag{7}$$

Recall that the epochs' lengths are set in (3). We argue (cf. Lemma 4.5) that after steps the target is in with sufficient probability. Since is chosen so that the size of is polylogarithmic in and , any search algorithm of complexity is sufficient to finish the search; we use the one in [17]. See Algorithm 4.4 for a formal statement.

###### Algorithm 4.4.

(Adversarial binary search.)

Initialization: for each element .

Execute Algorithm 4.1 for exactly steps with as in (7).

Run the algorithm from [17] with confidence threshold to find the target in .

###### Lemma 4.5.

For any , Algorithm 4.4 finds the target correctly with probability at least in steps.

###### Proof.

By Corollary 4.3, with probability we have

$$w_Q(V\setminus M_Q) \le W_Q \le \frac{3C}{\delta}\,2^{-Q}.$$

We set so that

$$\frac{1}{n}\,\Gamma^{-\sqrt{\frac{Q}{2}\ln 3/\delta}}\; 2^{-H(p)Q} > \frac{3C}{\delta}\,2^{-Q}. \tag{8}$$

By using similar arguments as in Lemma 3.6, with probability at least it holds . This is possible only if . Since , the algorithm of Feige et al. [17] finds the target correctly in steps with probability . The confidence threshold then follows from the union bound.

To finalize the proof, we observe that (8) requires

$$I(p)\,Q > \sqrt{\tfrac{Q}{2}\ln 3/\delta}\;\log_2\Gamma + \log_2 n + \log_2\!\left(\frac{3C}{\delta}\right).$$

This can be solved to obtain as in (7) through Lemma A.1 and by bounding and . ∎
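Taking base-2 logarithms in (8) yields an inequality that is quadratic in $\sqrt{Q}$, so the smallest admissible $Q$ can be computed explicitly. The sketch below does this under the assumptions $I(p) = 1 - H(p)$ and $\Gamma = (1-p)/p$, neither of which is restated in this section; the value passed for the constant $C$ is a placeholder, and the function names are hypothetical.

```python
import math

def binary_entropy(p):
    """Binary entropy H(p) in bits, for 0 < p < 1."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def slack(Q, n, p, delta, C):
    """log2 of (LHS / RHS) in (8); positive exactly when (8) holds."""
    info = 1 - binary_entropy(p)          # assumption: I(p) = 1 - H(p)
    gamma = (1 - p) / p                   # assumption: Gamma = (1 - p) / p
    noise_term = math.sqrt(Q / 2 * math.log(3 / delta)) * math.log2(gamma)
    return info * Q - noise_term - math.log2(n) - math.log2(3 * C / delta)

def min_strategy_length(n, p, delta, C):
    """Threshold Q0 such that (8) holds for every Q > Q0."""
    info = 1 - binary_entropy(p)
    b = math.sqrt(math.log(3 / delta) / 2) * math.log2((1 - p) / p)
    c = math.log2(n) + math.log2(3 * C / delta)
    # Positive root of info * s^2 - b * s - c = 0 in s = sqrt(Q).
    s = (b + math.sqrt(b * b + 4 * info * c)) / (2 * info)
    return s * s
```

For example, `min_strategy_length(1024, 0.3, 0.1, 2.0)` returns a threshold of roughly four hundred queries; the closed form makes the three contributions in (7) visible, with $c/I(p)$ giving the $\log_2 n$ and $\log\delta^{-1}$ terms and the cross term in $b$ giving the square-root term.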

### 4.1 Theorem 1.5: Las Vegas distributional search

The lengths of the epochs remain the same, that is as in (3). The stop condition (for the execution of Algorithm 4.1) mimics the one for the graph case but takes the entire set of marked elements into account — see Algorithm 4.6.

###### Algorithm 4.6.
(Las Vegas distributional binary search.) Initialization: for each . Execute Algorithm 4.1 until in some step it holds . Run the algorithm from [17] with confidence threshold to find the target in .

The correctness of Algorithm 4.6 follows from the fact that at the time of finishing Algorithm 4.1 at a step , we have . The subsequent search in the set then incurs an additional error of at most . Theorem 1.5 hence follows from the lemma below by taking the expectation over the possible target locations.

###### Lemma 4.7.

Algorithm 4.6 terminates after the expected number of steps.

###### Proof.

We measure the progress at any given step by a random variable . If the answer in step is erroneous, then and otherwise .

For the sake of bounding the number of steps of the algorithm, we consider the first part of the algorithm running indefinitely. Let be the smallest integer such that .

We have