# An Efficient Noisy Binary Search in Graphs via Median Approximation

Consider a generalization of the classical binary search problem in linearly sorted data to the graph-theoretic setting. The goal is to design an adaptive query algorithm, called a strategy, that identifies an initially unknown target vertex in a graph by asking queries. Each query is conducted as follows: the strategy selects a vertex q and receives a reply v: if q is the target, then v=q, and if q is not the target, then v is a neighbor of q that lies on a shortest path to the target. Furthermore, there is a noise parameter 0≤p<1/2, which means that each reply can be incorrect with probability p. The optimization criterion to be minimized is the overall number of queries asked by the strategy, called the query complexity. The query complexity is well understood to be O(ε^-2 log n) for general graphs, where n is the order of the graph and ε=1/2-p. However, implementing such a strategy is computationally expensive, with each query possibly requiring O(n^2) operations. In this work we propose two efficient strategies that keep the optimal query complexity. The first strategy achieves an overall complexity of O(ε^-1 n log n) per query. The second strategy is dedicated to graphs of small diameter D and maximum degree Δ and has an average complexity of O(n+ε^-2 DΔ log n) per query. We stress that we develop an algorithmic tool of graph median approximation that is of independent interest: the median can be efficiently approximated by finding a vertex minimizing the sum of distances to a randomly sampled vertex subset of size O(ε^-2 log n).

## 1 Introduction

Our research problems originate in the classical “twenty questions game” proposed by Rényi [36] and Ulam [42]. The classical problem of binary search with erroneous comparisons has received considerable attention, and algorithms with optimal query complexity are known; see e.g. [8, 11, 21, 24, 35] for asymptotically best results. Binary search in linearly ordered data can be recast as a search on a path, where each query selects a vertex $q$ and the reply indicates whether the target element is $q$, or is to the left or to the right of $q$. This leads to the graph search problem, introduced first for trees by Onak and Parys [33] and recently for general graphs by Emamjomeh-Zadeh et al. [23]. We recall the following formal statement.

Problem formulation.

Consider an arbitrary simple graph $G=(V,E)$, one of whose vertices, $x$, is marked as the target. The target is unknown to the query algorithm. Each query points to a vertex $q$, and a correct reply does the following: if $q=x$, then the reply returns $q$, and if $q\neq x$, then the reply returns a neighbor of $q$ that belongs to a shortest path from $q$ to $x$, breaking ties arbitrarily. We further assume that some replies can be incorrect: each query receives an erroneous reply (independently) with some fixed probability $p<1/2$ (the value of the noise parameter $p$ is known to the algorithm). The goal is to design an algorithm, also called a strategy, that performs as few queries as possible.

Typically, in applications of adaptive query problems the main concern is the number of queries to be performed, i.e., the query complexity. This is because queries usually model a time-consuming and complex event, like running a software check to verify whether a program contains a malfunctioning piece of code, cf. Ben-Asher et al. [7], or asking users for some sort of feedback, cf. Emamjomeh-Zadeh and Kempe [22]. However, the computational complexity comes into play as a second measure, and it is of practical interest to find adaptive query algorithms that keep the optimal query complexity while optimizing the computational cost as a secondary criterion. This may be especially useful when queries are fast, like communication events over a noisy channel.

The query complexity is quite well understood to be roughly $\log_2 n/(1-H(p))=\Theta(\varepsilon^{-2}\log n)$ (cf. [20, 23]), where $n$ is the order of the graph, $\varepsilon=1/2-p$, and $H(p)$ is the binary entropy of $p$. Thus, it is of theoretical and practical interest to know the optimal complexity of computing each particular query. This leads us to a general statement of the type of solution we seek.

Research question.

How much can the computational complexity of an adaptive graph query algorithm be improved without worsening the query complexity?

In this work we make the following assumption: a distance oracle is available to the algorithm, and it gives the graph distance between any pair of vertices. This is dictated by the observation that computing multiple-pair shortest paths throughout the search would dominate the computational complexity. On the other hand, we note that the oracle is only used to resolve (multiple times) the following question for a query: given a vertex $q$, its neighbor $v$ and an arbitrary vertex $u$, does $v$ lie on a shortest path from $q$ to $u$? (Equivalently: does $d(v,u)=d(q,u)-1$ hold?) Thus, some weaker oracles can be assumed instead. We further comment on this assumption in the next section.
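The oracle question above reduces to a single distance comparison. The following is a minimal Python sketch, assuming the graph is given as an unweighted adjacency dict and emulating the distance oracle by precomputed BFS distances; all helper names are hypothetical and not taken from the paper. The function `consistent_set` builds the set of vertices consistent with a reply, which Section 1.2 denotes $N(q,v)$.

```python
from collections import deque


def bfs_distances(adj, src):
    """Single-source BFS distances in an unweighted graph given as an adjacency dict."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


def make_distance_oracle(adj):
    """Precompute all-pairs distances; a stand-in for the assumed oracle d(u, v)."""
    table = {v: bfs_distances(adj, v) for v in adj}
    return lambda u, v: table[u][v]


def on_shortest_path(d, q, v, u):
    """For a neighbor v of q: does v lie on a shortest path from q to u?"""
    return d(v, u) == d(q, u) - 1


def consistent_set(adj, d, q, v):
    """Vertices consistent with reply v to a query on q (the set N(q, v))."""
    if v == q:
        return {q}
    return {u for u in adj if on_shortest_path(d, q, v, u)}
```

Since $d(v,u)\ge d(q,u)-1$ always holds for adjacent $q$ and $v$, the equality test is exactly the shortest-path condition; for example, on the path 0–1–2–3–4, the reply 3 to a query on 2 is consistent with exactly the vertices 3 and 4.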

### 1.1 Motivation

To sketch potential practical scenarios of using graph queries, we mention a set of examples given in [22]. These examples are anchored in the field of machine learning, and since they have the same flavor with respect to how graphs are used, we describe one of them. Consider a situation in which a system wants to learn a clustering by asking queries. Each query presents a potential clustering to a user, and if this is not the target clustering, then as a response the user either points to two clusters that should be merged or points to one cluster that should be split (but does not say how to split it). Thus, the goal is to construct a query algorithm to be used by the system. It turns out that learning the clustering can be done by asking queries on a graph: each vertex corresponds to a clustering, and a reply of the user for a query $q$ will be aligned with one of the edges incident to $q$. In other words, the reply can be associated with an edge outgoing from $q$ that lies on a shortest path to the desired target clustering. We emphasize some properties of this approach. First, the fact that the reply indeed reveals a shortest path to the target is an important property of the underlying graph used by the algorithm, and thus the graph needs to be carefully defined to satisfy it. Second, the user is not aware that such a graph-theoretic approach is used, as only a series of proposed clusterings is presented. Third, this approach is resilient to errors on the user side: graph query algorithms easily handle the fact that some replies can be incorrect (the user may make a mistake, or may not be willing to reveal the truth). It has been shown [22] that the problems of learning a classifier or learning a ranking can be approached in a similar way.

From the standpoint of complexity, we can approach such scenarios in two ways. First, one can derive an algorithm that specifically targets a particular application. More precisely, for one of the above applications it may turn out that, e.g., it is not necessary to construct the entire graph, but only to reconstruct what is needed to perform each query. The second way is the general approach taken in this work: to consider the underlying graph as an abstract data structure, out of the context of how it is used in particular applications. We note that examples like the ones mentioned above reveal that some applications may be burdened by a large underlying graph, in which case the computational complexity, or the use of local search procedures, may be more crucial.

We finally comment on our assumption that a distance oracle is provided to the algorithm. In the machine learning applications [22], the graphs may be constructed in such a way that knowing which objects two vertices represent is sufficient to determine the distance between them, i.e., a low-complexity distance oracle can indeed be implemented. This can be seen as a special case of a general approach to achieving distance oracles in practice through the so-called distance-labeling schemes (cf. Gavoille et al. [26], and for practical approaches, cf. Abraham et al. and Kosowski and Viennot [3, 30]). We also note that having the exact distances between vertices is crucial for this problem: if the distance oracle is allowed to provide even just a $1$-additive approximation of the exact distance, then each query algorithm needs to perform $\Omega(n)$ queries for some graphs, cf. Deligkas et al. [17]. The distance oracle access can be replaced with a multi-source distance computation (e.g., using BFS), at the cost of correspondingly larger factors in the complexity bounds. Alternatively, a popular assumption borrowed from computational geometry is that we operate on a metric space with a given metric (distance) function.

### 1.2 Our Results and Techniques

For a query on a vertex $q$ with a reply $v$, we say that a vertex $u$ is consistent with the reply if $u=v=q$, or $v\neq q$ and $v$ lies on a shortest path between $q$ and $u$; the set of all such consistent vertices is denoted by $N(q,v)$. Our method is based on a multiplicative weight update (MWU): the algorithm keeps weights $\omega(v)$ for all vertices $v\in V$, starting with a uniform assignment. The weight $\omega(v)$ represents the likelihood that $v$ is the target, although we point out that formally this is not a probability distribution. In MWU, the weight of each vertex that is not consistent with a reply is divided by an appropriately fixed constant $\Gamma>1$ that depends on $p$.

To keep the query complexity low, the queried vertex must fulfill a measure of ‘centrality’ in the graph, in the sense that a query to such a central vertex results in an adequate decrease of the total weight. This is a graph-theoretic analogue of the comparison with the ‘central’ element in classical binary search. Two functions that have been used in the literature [17, 20, 22] to formalize this are

$$\Phi(v)=\sum_{u\in V} d(u,v)\cdot\omega(u) \qquad\text{and}\qquad \Lambda(v)=\max_{u\in N(v)}\omega(N(v,u)),$$

where $N(v)$ is the set of neighbors of $v$ in the graph, and $d(u,v)$ is the distance between $u$ and $v$. For brevity, $\omega(A)=\sum_{u\in A}\omega(u)$ for any $A\subseteq V$, and $\omega=\omega(V)$.

###### Definition 1.1.

A vertex $q$ minimizing $\Phi$, i.e., $q\in\arg\min_{v\in V}\Phi(v)$, is called a median.

We note a fundamental bisection property of a median:

###### Lemma 1.2 (cf. [23], Section 2).

If $q$ is a median, then $\Lambda(q)\le\omega/2$.

Such a property is key for building efficient binary-search algorithms in graphs, see [20, 23]: e.g., for the noiseless case, repeatedly querying a median of $C$, where $C$ is the subset of vertices that can still be the target, results in a strategy guaranteeing at most $\log_2 n$ queries.
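The noiseless halving strategy can be simulated in a few lines. In this sketch (function names hypothetical), the median of $C$ is taken with respect to 0/1 weights on $C$, the responder always answers correctly and breaks ties by adjacency order, and the candidate set is intersected with $N(q,v)$ after each reply. By Lemma 1.2 the candidate set at least halves, so the number of queries is at most $\log_2 n$.

```python
from collections import deque


def all_pairs_bfs(adj):
    """dist[s][v] for an unweighted graph given as an adjacency dict."""
    dist = {}
    for s in adj:
        d = {s: 0}
        queue = deque([s])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in d:
                    d[y] = d[x] + 1
                    queue.append(y)
        dist[s] = d
    return dist


def noiseless_search(adj, target):
    """Repeatedly query a median of the candidate set C; return (found vertex, #queries)."""
    dist = all_pairs_bfs(adj)
    candidates = set(adj)
    queries = 0
    while len(candidates) > 1:
        # median with respect to 0/1 weights on C: minimize the sum of distances to C
        q = min(adj, key=lambda v: sum(dist[u][v] for u in candidates))
        queries += 1
        if q == target:
            return q, queries
        # correct reply: a neighbor of q on a shortest path to the target (first one found)
        v = next(u for u in adj[q] if dist[u][target] == dist[q][target] - 1)
        # keep only the candidates consistent with the reply, i.e. C intersected with N(q, v)
        candidates = {u for u in candidates if dist[v][u] == dist[q][u] - 1}
    return candidates.pop(), queries
```

On a cycle with 10 vertices every target is found within 3 queries, and on a path with 16 vertices within 4 queries.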

A disadvantage of the median is that it is computationally costly to find. Moreover, using a multiplicative approximation of it, that is, querying a minimizer of a function that approximates $\Phi$ within a constant multiplicative factor (for any constant), blows up the strategy length exponentially [17], and thus this approach is not suitable. On the other hand, approximating the $\Lambda$-minimizer is feasible, as also noted by [17].

Hence, we work towards a method of efficient median approximation through $\Lambda$-minimization. We believe that this algorithmic approach is of independent interest and can be used in other graph-theoretic problems. Interestingly, it turns out that we do not even need a multiplicative approximation of a $\Lambda$-minimizer; we only need that $\Lambda(q)$ is at most roughly half of the total weight. This is potentially useful in algorithms based on graph bisection in a broad sense. (For an example of using such balanced separators for a somewhat related search with persistent errors, see e.g. Boczkowski et al. [10].) Formally, motivated by Lemma 1.2, we relax the notion of the median as follows.

###### Definition 1.3.

We say that a vertex $q$ is $\delta$-close to a median, for some $\delta>0$, when $\Lambda(q)\le\left(\frac12+\delta\right)\omega$.

To work around the fact that $\Phi$ is not efficient from the algorithmic standpoint, we introduce the following relaxation of $\Phi$:

$$\Phi^*(q)=\sum_{v\in S}d(q,v),$$

where $S$ is a random sample of vertices drawn with probability distribution proportional to $\omega$. We can now formulate our main contribution in terms of new algorithmic tools:

Median approximation.

The relaxation of $\Phi$ to $\Phi^*$ provides, with high probability, a sufficient approximation of the median vertex in a graph.

We formalize this statement in the following way. Consider a sample size $s=O(\varepsilon^{-2}\log n)$, where $n$ is the number of vertices of the graph. This allows us to approximate the median efficiently through a local condition:

###### Theorem 1.4.

Let $q$ be a vertex such that for each $v\in N(q)$ it holds $\Phi^*(v)\ge\Phi^*(q)-\delta s$. Then, with high probability, the vertex $q$ is $\delta$-close to a median.

As a consequence, we obtain:

###### Corollary 1.5.

Let $q=\arg\min_{v\in V}\Phi^*(v)$. Then, the vertex $q$ is $\delta$-close to a median with high probability.
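Corollary 1.5 suggests a simple procedure: draw $s$ vertices with probabilities proportional to $\omega$ and return a minimizer of $\Phi^*$. The following is a minimal sketch, assuming a precomputed distance table `dist` (a stand-in for the distance oracle) and Python's `random.Random.choices` for weighted sampling with replacement; the function names are hypothetical.

```python
import random


def draw_sample(weights, s, rng):
    """Multiset of s vertices drawn with repetition, Pr[v] = w(v) / w(V)."""
    verts = list(weights)
    return rng.choices(verts, weights=[weights[v] for v in verts], k=s)


def phi_star(dist, sample, q):
    """Phi*(q): sum of distances from q to the sampled vertices (with multiplicity)."""
    return sum(dist[q][u] for u in sample)


def approx_median(dist, weights, s, rng):
    """Return a Phi*-minimizer; by Corollary 1.5 it is w.h.p. delta-close to a median."""
    sample = draw_sample(weights, s, rng)
    return min(weights, key=lambda q: phi_star(dist, sample, q))
```

Each call costs $O(ns)$ distance lookups instead of the $\Theta(n^2)$ needed to evaluate $\Phi$ everywhere. If all the weight sits on one vertex, every sample element equals that vertex and it is returned.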

Returning to our search problem, these tools suffice to find the right query vertex in each step, keep the strategy length low, and have a centrality measure that is efficient in terms of computational complexity. This leads us to the following theorem, which is based on MWU with an appropriately fixed scaling factor $\Gamma$.

###### Theorem 1.6.

Let $p=1/2-\varepsilon$ be the noise parameter for some $0<\varepsilon\le1/2$. There exists an adaptive query algorithm that after asking $O(\varepsilon^{-2}\log n)$ queries returns the target correctly with high probability. The computational complexity of the algorithm is $O(\varepsilon^{-1}n\log n)$ per query.

The algorithm behind the theorem iterates over the entire vertex set to find a $\Phi^*$-minimizer. We can refine this algorithm for graphs of low maximum degree $\Delta$ and diameter $D$. For that we use a local search, whose direct application requires ‘visiting’ a sequence of vertices to get to a $\Phi^*$-minimizer. However, we introduce two ideas to speed it up. The first one is adding another approximation layer on top of $\Phi^*$: it is not necessary to find the exact $\Phi^*$-minimizer but only an approximation of it, which we obtain as follows. Whenever the local search moves from a vertex $v$ to its neighbor $u$ and the improvement from $\Phi^*(v)$ to $\Phi^*(u)$ is sufficiently small, then $v$ will do for the next query. The second one is to start the local search from the vertex queried in the previous step. These two ideas combined lead to the second main result.

###### Theorem 1.7.

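Let $p=1/2-\varepsilon$ for some $0<\varepsilon\le1/2$. There exists an adaptive query algorithm that after asking $O(\varepsilon^{-2}\log n)$ queries returns the target correctly with high probability. The average computational complexity per query is $O(n+\varepsilon^{-2}D\Delta\log n)$ for graphs with diameter $D$ and maximum degree $\Delta$.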

### 1.3 Related Work

Median computation is one of the fundamental ways of finding central vertices of a graph, with huge impact on practical research [5, 6, 25, 27, 37, 41]. A significant amount of research has been devoted to efficient algorithms for finding medians of networks [34, 39, 40] or approximating the notion [13, 14]. We note the seminal work of Indyk [28], which includes an approximation to the $1$-median in metric spaces in time sublinear in the size of the input metric. The form of approximation there differs from ours, although the very high-level technique of using random sampling is common. Chechik et al. [15] use (non-uniform) random sampling to answer queries on the sum of distances to the queried vertices in graphs.

We also refer the reader to some recent work on median computation in median graphs, see Bénéteau et al. [9] and references therein. Other related centrality measures of a graph are discussed in [1, 2, 12] in the context of fine-grained complexity, showing, e.g., that efficient computation of a median vertex (in edge-weighted graphs) is equivalent under subcubic reductions to the computation of All-Pairs Shortest Paths.

A substantial amount of research has been done on searching in sorted data (i.e., paths), including investigations for a fixed number of errors [4, 35] and optimal strategies for an arbitrary number of errors and various error models, including linearly bounded [21], prefix-bounded [11] and noisy/probabilistic [8, 29]. Also, a lot of research has been done on how different types of queries influence the search process; see [16] for a recent work and references therein. The most studied comparison queries for paths have been extended to graphs in two ways. The first one is a generalization to partial orders [7, 31], although this does not further generalize well to arbitrary graphs [18]. It is worth noting that a lot of work has been devoted to the computational complexity of finding error-less strategies [19, 31, 32]. The second extension uses the vertex queries studied in this work, for which much less is known in terms of complexity. It is worth mentioning that the problem becomes equivalent to the vertex ranking problem for trees [38], but not for general graphs (see also [33]).

Similarly as in the case of classical binary search, the graph structure guarantees that there always exists a vertex that adequately partitions the search space in the absence of errors [23]. The problem becomes much more challenging when errors are present, as this is no longer the case. A centrality measure that works well for finding the right vertex to be queried is the median, used in [20, 23]. However, as shown in [17], the median is sensitive to approximations in the following way. When the algorithm decides to query an approximation of the median (a minimizer of a function that is a multiplicative approximation of $\Phi$), then some graphs require $\Omega(n)$ queries. This result holds for the error-less case. Furthermore, the authors of [17] introduce the potential $\Lambda$ (denoted differently therein) and prove, also for the error-less case, that it guarantees $O(\log n)$ queries when in each step a constant-factor approximation of the $\Lambda$-minimizer is queried. However, this issue has been considered from a theoretical perspective, and no optimization considerations have been made. In particular, it was left open how to reduce the computational complexity at the expense of working with such approximations. This and the consideration of noise are our two main improvements with respect to [17]. We also stress that our definition of $\delta$-closeness to a median differs from the approximations of [17] in that it is much less strict: a vertex $v$ that is $\delta$-close to a median may have the property that $\Phi(v)$ significantly deviates from $\min_{u\in V}\Phi(u)$.

Some complexity considerations have been touched upon in [22], from the perspective of targeting specific machine learning applications, where a sampling-based approximation of the $\Phi$-minimizer has already been used. To make the statements from that work comparable to our results, we have to distinguish two input size measures that apply. In [22], for a particular application, an input consists of a specific machine learning instance; denote its size by $N$. In order to find a solution for this instance, a graph $G$ with $n$ vertices is constructed and an adaptive query algorithm is run on this graph. It is assumed that … is polynomial in … . The diameter and maximum degree of $G$ are both assumed in [22] to be polylogarithmic in … . A local search is used to find a vertex that approximates the $\Phi$-minimizer. For that, in each step sampling is used for approximation purposes: for each vertex along the local search, all its neighbors are tested in order to find an approximation, giving a complexity that depends on the number $m$ of edges of $G$. It is concluded that the overall complexity of performing a single query is … .

### 1.4 Outline

We proceed as follows. Section 2 provides a ‘template’ strategy in which we simply query a vertex that is $\delta$-close to a median. The strategy length is fixed carefully there to meet the tail bounds on the error probability. Then, in Section 3, we prove that our sample size is enough to ensure a high success probability. Section 4 observes that the overall complexity of the algorithm can be reduced by avoiding resampling the entire sample in each step: it is enough to replace only a small fraction of the current sample when going from one step of the strategy to the next. We then combine these tools to prove our main theorems in Section 5, where for Theorem 1.7 we additionally make several observations on speeding up the classical local search in a graph.

## 2 The Generic Strategy

As an intermediate convenient step, we recall the following adversarial error model: given a constant $0<r<1/2$, if the strategy length is $\tau$, then it is guaranteed that at most $r\tau$ errors occurred throughout the search (their distribution may be arbitrary). We set our parameters as follows: let $\eta=1/2-r$, $\delta=\eta/4$, and assume without loss of generality that $\eta$ is sufficiently small. Let $\tau=10\log_2 n/\eta^2$. With these parameters, we provide Algorithm 1, which runs the multiplicative weight update with $\Gamma=1/(1-4\eta)$ for $\tau$ steps. Then we prove (cf. Lemma 2.1) that this strategy length is sufficient for correct target detection in this error model. We write $\omega_t(v)$ to denote the weight of a vertex $v$ in step $t$, and $\omega_t=\omega_t(V)$. (So, $\omega_0$ is the initial uniform weight assignment.)
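The pseudo-code of Algorithm 1 is not reproduced here, so the following Python sketch is only our reading of the strategy, under several labeled assumptions: (i) it queries the exact weighted median, which is $\delta$-close to a median by Lemma 1.2, instead of a sampled approximation; (ii) it outputs a vertex of maximum final weight; (iii) it uses $\Gamma=1/(1-4\eta)$ and $\tau=\lceil10\log_2 n/\eta^2\rceil$ with $\eta=\varepsilon/2$ (the relation used in Lemma 2.4 below), capped at a hypothetical `eta_cap=0.1` standing in for the assumption that $\eta$ is small; (iv) an erroneous reply is a uniformly random wrong element of $\{q\}\cup N(q)$. It is a simulation harness, not the authors' implementation.

```python
import math
import random
from collections import deque


def all_pairs_bfs(adj):
    """dist[s][v] for an unweighted graph given as an adjacency dict."""
    dist = {}
    for s in adj:
        d = {s: 0}
        queue = deque([s])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in d:
                    d[y] = d[x] + 1
                    queue.append(y)
        dist[s] = d
    return dist


def noisy_graph_search(adj, target, p, rng, eta_cap=0.1):
    """MWU search in the spirit of Algorithm 1 against a simulated noisy responder."""
    n = len(adj)
    dist = all_pairs_bfs(adj)
    eta = min((0.5 - p) / 2, eta_cap)   # assumed eta = eps / 2, capped (hypothetical)
    gamma = 1.0 / (1.0 - 4 * eta)       # Gamma = 1 / (1 - 4 eta)
    tau = math.ceil(10 * math.log2(n) / eta ** 2)
    w = {v: 1.0 for v in adj}           # uniform initial weights
    for _ in range(tau):
        # exact weighted median; by Lemma 1.2 it is delta-close to a median
        q = min(adj, key=lambda v: sum(dist[u][v] * w[u] for u in adj))
        if q == target:
            reply = q
        else:
            reply = next(u for u in adj[q] if dist[u][target] == dist[q][target] - 1)
        if rng.random() < p:            # erroneous reply, independently with probability p
            reply = rng.choice([x for x in [q] + adj[q] if x != reply])
        for u in adj:
            consistent = (u == q) if reply == q else dist[reply][u] == dist[q][u] - 1
            if not consistent:
                w[u] /= gamma           # MWU: penalize vertices inconsistent with the reply
        total = sum(w.values())
        w = {u: x / total for u, x in w.items()}  # rescaling changes no decision, avoids underflow
    return max(w, key=w.get)
```

With $p=0$ the run is deterministic and returns the target; with $p=0.1$ on an 8-cycle, the number of errors stays far below $r\tau$ with overwhelming probability, so Lemma 2.1 applies to the simulated run.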

###### Lemma 2.1.

If during the execution of Algorithm 1 over $\tau$ total queries there were at most $r\tau$ errors, then the algorithm outputs the target.

###### Proof.

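If a vertex $v$ at step $t$ satisfies $\omega_t(v)>\left(\frac12+\delta\right)\omega_t$, then we say that $v$ is heavy at step $t$. We aim at proving that the overall weight is multiplied by at most $(1-\eta)^2$ or $\frac{\Gamma+1}{2\Gamma}$ per step. In the absence of a heavy vertex we get the first bound, which is an immediate consequence of Equation (1) below. If we get a heavy vertex at some point, none of these bounds may hold in this particular step (this phenomenon is inherent to the graph query model itself), but we show below that the second one holds in an amortized way (cf. Lemma 2.3). If at step $t$ there is no heavy vertex, then the vertices consistent with the reply have total weight at most $\left(\frac12+\delta\right)\omega_t$ (the queried vertex is $\delta$-close to a median and is not heavy), while the weights of all other vertices are divided by $\Gamma$; hence

$$\omega_{t+1}\le\left(\frac12+\delta+\frac{\frac12-\delta}{\Gamma}\right)\omega_t=(1-2\eta+4\eta\delta)\,\omega_t=(1-\eta)^2\,\omega_t. \tag{1}$$

Assume otherwise that there is a vertex $q$ that is heavy at step $t$.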

###### Lemma 2.2.

If at any step there is a heavy vertex $q$, then $q$ is the only vertex that is $\delta$-close to a median at this step.

###### Proof.

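For any $v\neq q$, we have $\Lambda(v)\ge\omega(q)>\left(\frac12+\delta\right)\omega$ (consider the neighbor of $v$ on a shortest path to $q$), i.e., $v$ is not $\delta$-close to a median. On the other hand, $\Lambda(q)\le\omega-\omega(q)<\left(\frac12-\delta\right)\omega$, i.e., $q$ is $\delta$-close to a median. ∎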

The above lemma implies that if some $q$ is heavy, then it will be queried in this particular step. The next lemma calculates the overall potential drop in a series of steps in which some vertex is heavy.

###### Lemma 2.3.

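Consider a maximal consecutive segment of steps in which some $q$ is heavy. That is, we pick $t_1<t_2$ such that $q$ is heavy in all steps $t_1,\dots,t_2-1$ and is not heavy in steps $t_1-1$ and $t_2$. Then,

$$\omega_{t_2}\le\left(\frac{\Gamma+1}{2\Gamma}\right)^{t_2-t_1}\omega_{t_1}.$$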

###### Proof.

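First note that, by Lemma 2.2, $q$ is queried in each step in $[t_1,t_2)$. For a query on $q$, we say that a reply $v$ is a yes-answer if $v=q$, and otherwise it is a no-answer. Denote by $a$ and $b$ the number of yes- and no-answers in $[t_1,t_2)$, respectively. Note that $a+b=t_2-t_1$. Moreover,

$$\omega_{t_2}(q)=\left(\frac1\Gamma\right)^{b}\omega_{t_1}(q), \qquad \omega_{t_2}(V\setminus\{q\})\le\left(\frac1\Gamma\right)^{a}\omega_{t_1}(V\setminus\{q\}).$$

The vertex $q$ being heavy at $t_1$ implies $\omega_{t_1}(q)>\left(\frac12+\delta\right)\omega_{t_1}$, and similarly $q$ not being heavy at $t_2$ implies $\omega_{t_2}(q)\le\left(\frac12+\delta\right)\omega_{t_2}$. Combining the equality and the three inequalities above, we obtain $a<b$.

We assume without loss of generality that all the yes-answers were given before all the no-answers in the range $[t_1,t_2)$. Indeed, rearranging these answers does not change the state of the algorithm at step $t_2$, and $q$ remains heavy for all of $[t_1,t_2)$. We then have, for all of the yes-answers and the first $a$ no-answers, the following:

$$\omega_{t_1+2a}\le\left(\frac1\Gamma\right)^{a}\omega_{t_1}=\left(\frac{1}{\sqrt\Gamma}\right)^{2a}\omega_{t_1}\le\left(\frac{\Gamma+1}{2\Gamma}\right)^{2a}\omega_{t_1}, \tag{2}$$

where the first inequality is due to the fact that each of the $a$ pairs (a pair understood as a no-answer and a yes-answer) scales down each vertex by at least a factor of $\Gamma$, while in the last inequality we have used $2\sqrt\Gamma\le\Gamma+1$.

For the remaining $b-a$ steps of $[t_1,t_2)$, the weight of $q$ decreases by a factor of $\Gamma$. Thus, for each $t\in[t_1+2a,t_2)$, using that $q$ is heavy in step $t$:

$$\omega_{t+1}\le\omega_t(V\setminus\{q\})+\frac{\omega_t(q)}{\Gamma}\le\omega_t\left(\frac12-\delta\right)+\frac{\omega_t}{\Gamma}\left(\frac12+\delta\right)\le\frac{\omega_t}{2}+\frac{\omega_t}{2\Gamma}=\frac{\Gamma+1}{2\Gamma}\cdot\omega_t.$$

Thus, $\omega_{t_2}\le\left(\frac{\Gamma+1}{2\Gamma}\right)^{b-a}\omega_{t_1+2a}$, which together with (2) completes the proof of Lemma 2.3. ∎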

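Let $x$ be the target, and $y$ be the output of Algorithm 1. Assume w.l.o.g. that the algorithm ran for $\tau'\ge\tau$ steps. Since

$$\tau'\ge\frac{10\log_2 n}{\eta^2}\ge\frac{\log_2 n}{r\log_2(1-4\eta)-2\log_2(1-\eta)},$$

where the second inequality follows from $r\log_2(1-4\eta)-2\log_2(1-\eta)\ge\eta^2/10$, which holds when $\eta$ is sufficiently small, we obtain the bound

$$(1-4\eta)^{r\tau'}\ge(1-\eta)^{2\tau'}\cdot n. \tag{3}$$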

We assume that the algorithm outputs an incorrect vertex $y\neq x$, and show that this leads to a contradiction. We consider the state of the weights after $\tau'$ steps. We consider two cases.

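1. There is no heavy vertex after $\tau'$ steps. We observe that the starting weight satisfies $\omega_0=n$, and by the bound on the number of errors accumulated on the target vertex (it cannot be more than $r\tau'$), we have $\omega_{\tau'}(x)\ge\Gamma^{-r\tau'}=(1-4\eta)^{r\tau'}$. By Equation (1) and Lemma 2.3, every step contributed a factor of at most $(1-\eta)^2$ or $\frac{\Gamma+1}{2\Gamma}=1-2\eta\le(1-\eta)^2$ multiplicatively to the total weight, so $\omega_{\tau'}\le(1-\eta)^{2\tau'}n$. Thus, by (3), $\omega_{\tau'}(x)\ge\omega_{\tau'}$, which is a contradiction, since all other vertices have positive weight.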

2. The returned vertex $y$ is heavy after $\tau'$ steps. We append at the end of the strategy a virtual sequence of $k$ identical query-answers: the algorithm queries $y$ and receives a no-answer pointing towards $x$. Here, $k$ is chosen to be minimal such that after $\tau'+k$ steps $y$ is no longer heavy (it exists, since each such query divides the weight of $y$ by $\Gamma$ and leaves the weight of $x$ unchanged). However, at the end of round $\tau'+k$, … is minimal (possibly not uniquely). We note that appending those steps did not increase the total number of errors from the answerer, and all of the appended queries were asked to a heavy vertex $y$. This reduces this case to the previous one, with an increased value of $\tau'$. ∎
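The algebra behind (1), (2) and Lemma 2.3 can be sanity-checked in exact rational arithmetic. The sketch below takes $\delta=\eta/4$ and $\Gamma=1/(1-4\eta)$, the values for which the equality chain in (1) holds. It checks (1), the inequality $2\sqrt\Gamma\le\Gamma+1$ used in (2) (squared, to stay rational), and that the amortized factor $\frac{\Gamma+1}{2\Gamma}$ equals $1-2\eta$ and never exceeds $(1-\eta)^2$, so that the two per-step factors can be compared. The function name is hypothetical.

```python
from fractions import Fraction


def check_parameters(eta):
    """Exact checks of the identities behind (1), (2) and Lemma 2.3 for a rational eta < 1/4."""
    delta = eta / 4
    gamma = 1 / (1 - 4 * eta)
    half = Fraction(1, 2)
    # Equation (1): (1/2 + delta) + (1/2 - delta)/Gamma = 1 - 2 eta + 4 eta delta = (1 - eta)^2
    eq1 = (half + delta + (half - delta) / gamma
           == 1 - 2 * eta + 4 * eta * delta
           == (1 - eta) ** 2)
    # Inequality in (2): 2 sqrt(Gamma) <= Gamma + 1, i.e. 4 Gamma <= (Gamma + 1)^2
    amgm = 4 * gamma <= (gamma + 1) ** 2
    # Lemma 2.3's per-step factor equals 1 - 2 eta and never exceeds (1 - eta)^2
    amortized = (gamma + 1) / (2 * gamma) == 1 - 2 * eta <= (1 - eta) ** 2
    return eq1 and amgm and amortized
```

For example, `check_parameters(Fraction(1, 10))` returns `True`.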

We now move from the adversarial search to the noisy setting. This is done by using Algorithm 1 as a black box with $r$ fixed appropriately. Recall that $p=1/2-\varepsilon$; we will use the following dependence of $r$ on $p$ (note that by taking $p$ smaller than $r$ we accommodate the necessary tail bound in the lemma below, i.e., we ensure that the event of having more than $r\tau$ errors is sufficiently unlikely).

###### Lemma 2.4.

Run Algorithm 1 with $r=p+\eta$, where $\eta=\varepsilon/2$. If the answer to each query was erroneous with probability at most $p$, independently, then the algorithm outputs the target vertex with high probability, at least $1-n^{-3}$.

###### Proof.

Recall that $\tau=10\log_2 n/\eta^2$ in Algorithm 1. Denote by $L$ the overall number of errors that occurred during the execution of the algorithm. The expected number of errors is at most $p\tau$. By the Hoeffding inequality,

$$\Pr[L\ge r\tau]\le\exp\left(-2\tau(r-p)^2\right)=\exp(-20\log_2 n)\le n^{-3}.$$

Thus, with high probability, the number of errors is bounded so that we can apply Lemma 2.1 (which by itself gives a deterministic guarantee). ∎
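The arithmetic of Lemma 2.4 and of inequality (3) can also be checked numerically. This sketch assumes, as in our reading of the parameters, $\eta=\varepsilon/2$ and $r=p+\eta$ (so that $r=1/2-\eta$). It returns $\tau$, the Hoeffding bound $\exp(-2\tau(r-p)^2)$, and whether (3) holds with $\tau'=\tau$; the last check is done in natural logarithms to avoid underflow. The function name is hypothetical.

```python
import math


def lemma_2_4_parameters(n, p):
    """Return (tau, Hoeffding tail bound, whether inequality (3) holds for tau' = tau)."""
    eps = 0.5 - p
    eta = eps / 2                              # assumed: eta = eps / 2, hence r = 1/2 - eta
    r = p + eta
    tau = 10 * math.log2(n) / eta ** 2
    tail = math.exp(-2 * tau * (r - p) ** 2)   # equals exp(-20 log2 n)
    # (3) in natural logarithms: r*tau*ln(1 - 4 eta) >= 2*tau*ln(1 - eta) + ln n
    ineq3 = r * tau * math.log(1 - 4 * eta) >= 2 * tau * math.log(1 - eta) + math.log(n)
    return tau, tail, ineq3
```

For $n=1000$ and $p=0.3$ this gives $\tau\approx 9966$ and a tail bound of about $e^{-199}$, far below $n^{-3}=10^{-9}$.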

## 3 Sampling Guarantees

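To define the ‘random sampling’ counterparts of $\Phi$ and $\Lambda$, consider $S=(x_1,\dots,x_s)$ to be a multiset of $s$ vertices sampled from $V$ with repetitions, with sampling probabilities proportional to $\omega$. That is, for each $i$ and each $v\in V$, we have $\Pr(x_i=v)=\omega(v)/\omega$, and the choices made for $x_1,\dots,x_s$ are fully independent. We refer to such an $S$ as a random sample. We then define the following potentials:

$$\Phi^*(v)=\sum_{u\in S}d(u,v) \qquad\text{and}\qquad \Lambda^*(v)=\max_{u\in N(v)}|S\cap N(v,u)|,$$

where the intersection of a multiset $S$ with a set $A$ is defined as the multiset $\{x_i\in S : x_i\in A\}$ (keeping multiplicities).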
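Both sampled potentials can be evaluated directly. This sketch assumes a precomputed distance table `dist` and an adjacency dict (hypothetical names). It uses `collections.Counter` so that repeated sample elements are counted with multiplicity, matching the multiset intersection above.

```python
from collections import Counter


def lambda_star(adj, dist, sample, v):
    """Lambda*(v) = max over neighbors u of v of |S ∩ N(v, u)|, counted with multiplicity."""
    counts = Counter(sample)
    best = 0
    for u in adj[v]:
        # x belongs to N(v, u) iff u lies on a shortest path from v to x
        best = max(best, sum(c for x, c in counts.items() if dist[u][x] == dist[v][x] - 1))
    return best


def phi_star(dist, sample, v):
    """Phi*(v) = sum of d(u, v) over the sample, with multiplicity."""
    return sum(dist[u][v] for u in sample)
```

For the sample $(0,0,1,4)$ on the path 0–1–2–3–4, the middle vertex has $\Lambda^*=3$, because three sample elements lie beyond its left neighbor.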

We note a specific detail regarding these functions: we will prove and use the fact that in order to find a vertex that is $\delta$-close to a median (a vertex we need to query), it is enough to pick an approximation of the $\Phi^*$-minimizer. This is slightly counterintuitive, since $\delta$-closeness is defined in terms of $\Lambda$, which has a similar meaning to $\Lambda^*$. However, the subtlety here is due to a complexity issue: it is easier to recompute $\Phi^*$ upon updating the sample $S$.

We denote $s=8\ln n/\delta^2$ and assume in the rest of the paper that the sample $S$ has size $s$. In this section we prove that this choice of $s$ is sufficient, and then Section 4 deals with the complexity issues of the sampling method. The following is shown in the appendix:

###### Lemma 3.1.

For any $v\in V$, it holds that $\Lambda(v)\le\left(\frac{\Lambda^*(v)}{s}+\frac\delta2\right)\omega$ with high probability, at least $1-n^{-3}$.

###### Proof.

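Consider any neighbor $u$ of $v$. Denote by $X_i$ the indicator variable of the event that the $i$-th sampled vertex belongs to $N(v,u)$. Observe that $\Pr(X_i=1)=\omega(N(v,u))/\omega$ and $\sum_i X_i=|S\cap N(v,u)|$, and so $\mathbb{E}\left[\sum_i X_i\right]=s\cdot\omega(N(v,u))/\omega$. By a standard application of the Hoeffding bound,

$$\Pr\left(\mathbb{E}\Big[\sum_i X_i\Big]-\sum_i X_i\ge\frac{s\delta}{2}\right)\le e^{-2s(\delta/2)^2}=\frac{1}{n^4}.$$

So

$$\frac{\omega(N(v,u))}{\omega}\le\frac{|S\cap N(v,u)|}{s}+\frac\delta2$$

holds with probability at least $1-n^{-4}$.

Taking the union bound over at most $n$ neighbors $u$, we have that with probability at least $1-n^{-3}$ the following holds:

$$\Lambda(v)=\max_{u\in N(v)}\omega(N(v,u))\le\left(\frac{\Lambda^*(v)}{s}+\frac\delta2\right)\cdot\omega.$$

∎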

###### Lemma 3.2.

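Let $q$ be a vertex such that $\Phi^*(v)\ge\Phi^*(q)-\delta s$ for each $v\in N(q)$. Then, $\Lambda^*(q)\le\left(\frac12+\frac\delta2\right)s$.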

###### Proof.

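To see this, suppose, towards a contradiction, that $\Lambda^*(q)>\left(\frac12+\frac\delta2\right)s$, i.e., there is $v\in N(q)$ such that $|S\cap N(q,v)|>\left(\frac12+\frac\delta2\right)s$. Denote $A^+=S\cap N(q,v)$ and $A^-=S\setminus A^+$. Using $d(v,u)=d(q,u)-1$ for $u\in A^+$ and $d(v,u)\le d(q,u)+1$ for $u\in A^-$, we get

$$\Phi^*(v)\le|A^-|-|A^+|+\sum_{u\in S}d(q,u)=\Phi^*(q)-\left(|A^+|-|A^-|\right)<\Phi^*(q)-\delta s,$$

since $|A^+|-|A^-|=2|A^+|-s>\delta s$. This contradicts the choice of $q$. ∎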

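Hence, we can prove Theorem 1.4: combining Lemma 3.2 and Lemma 3.1,

$$\Lambda(q)\le\left(\frac{\Lambda^*(q)}{s}+\frac\delta2\right)\cdot\omega\le\left(\frac12+\frac\delta2+\frac\delta2\right)\cdot\omega=\left(\frac12+\delta\right)\omega$$

with high probability.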

## 4 Maintaining the Sample

We now discuss the complexity of maintaining the sample upon vertex weight updates. Given a sample $S_t$ at step $t$, the next sample $S_{t+1}$ is computed by a call to Algorithm 2 below.

The correctness of Algorithm 2 is given by Lemma 4.1. Its proof follows the cases in the pseudo-code to show that both the vertices that remain in the sample and the new ones meet the probability requirements for a random sample.
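The pseudo-code of Algorithm 2 is not reproduced here. The following sketch is reconstructed from the two cases in the proof of Lemma 4.1 below and should be read as our interpretation: an element consistent with the reply is kept; an inconsistent element is kept with probability $1/\Gamma$ and is otherwise redrawn from the updated weights $\omega_{t+1}$. The function name and signature are hypothetical. `rng.choices` recomputes cumulative weights on every call ($O(n)$ each), so the batch sampling of Observation 4.2 would be faster.

```python
import random


def update_sample(sample, consistent, new_weights, gamma, rng):
    """One resampling round; returns (new sample, number of redrawn elements)."""
    verts = list(new_weights)
    wts = [new_weights[v] for v in verts]
    out, resampled = [], 0
    for x in sample:
        if x in consistent or rng.random() < 1 / gamma:
            out.append(x)                                        # element keeps its value
        else:
            out.append(rng.choices(verts, weights=wts, k=1)[0])  # fresh draw from omega_{t+1}
            resampled += 1
    return out, resampled
```

For instance, with $\omega_t=(1,1,2)$ on vertices $a,b,c$, reply-consistent set $\{a\}$ and $\Gamma=2$, the updated weights are $(1,\frac12,1)$, and the empirical frequencies after one round approach $(0.4,0.2,0.4)$, as Lemma 4.1 predicts.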

###### Lemma 4.1.

Suppose that in Algorithm 1, after each weight update, the current random sample is recalculated by a call to Algorithm 2. Then, with high probability, at most $O(\varepsilon s)$ resampling operations occur at each step.

###### Proof.

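Recall that after querying a vertex $q$ at step $t$ and receiving an answer $v$, the weights are updated as follows. For each $u\in V$:

• if $u\in N(q,v)$, then $\omega_{t+1}(u)=\omega_t(u)$,

• if $u\notin N(q,v)$, then $\omega_{t+1}(u)=\omega_t(u)/\Gamma$,

where we recall that $\Gamma=1/(1-4\eta)$.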

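Consider a single element of the sample, and denote by $x_t$ and $x_{t+1}$ its value in $S_t$ and $S_{t+1}$, respectively. Consider a vertex $u\in V$, and assume that for every $w\in V$, $\Pr(x_t=w)=\omega_t(w)/\omega_t$. We have two cases:

1. If $u\in N(q,v)$, then

$$\begin{aligned}
\Pr(x_{t+1}=u) &= \Pr(x_t=u)+\Pr(x_t\notin N(q,v))\cdot\left(1-\frac1\Gamma\right)\cdot\Pr(x_{t+1}\text{ is sampled as }u)\\
&= \frac{\omega_t(u)}{\omega_t}+\frac{\omega_t(V\setminus N(q,v))}{\omega_t}\left(1-\frac1\Gamma\right)\frac{\omega_{t+1}(u)}{\omega_{t+1}}\\
&= \frac{\omega_{t+1}(u)}{\omega_t}\left(1+\frac{\omega_t(V\setminus N(q,v))}{\omega_{t+1}}\left(1-\frac1\Gamma\right)\right)\\
&= \frac{\omega_{t+1}(u)}{\omega_t}\cdot\frac{\omega_{t+1}+\omega_t(V\setminus N(q,v))\left(1-\frac1\Gamma\right)}{\omega_{t+1}}\\
&= \frac{\omega_{t+1}(u)}{\omega_t}\cdot\frac{\omega_t(N(q,v))+\omega_t(V\setminus N(q,v))\frac1\Gamma+\omega_t(V\setminus N(q,v))\left(1-\frac1\Gamma\right)}{\omega_{t+1}}\\
&= \frac{\omega_{t+1}(u)}{\omega_t}\cdot\frac{\omega_t}{\omega_{t+1}}=\frac{\omega_{t+1}(u)}{\omega_{t+1}}.
\end{aligned}$$

2. Otherwise, if $u\notin N(q,v)$, then

$$\begin{aligned}
\Pr(x_{t+1}=u) &= \Pr(x_t=u)\cdot\frac1\Gamma+\Pr(x_t\notin N(q,v))\cdot\left(1-\frac1\Gamma\right)\cdot\Pr(x_{t+1}\text{ is sampled as }u)\\
&= \frac{\omega_t(u)}{\omega_t}\cdot\frac1\Gamma+\frac{\omega_t(V\setminus N(q,v))}{\omega_t}\left(1-\frac1\Gamma\right)\frac{\omega_{t+1}(u)}{\omega_{t+1}}\\
&= \frac{\omega_{t+1}(u)}{\omega_t}\left(1+\frac{\omega_t(V\setminus N(q,v))}{\omega_{t+1}}\left(1-\frac1\Gamma\right)\right)\\
&= \frac{\omega_{t+1}(u)}{\omega_{t+1}},
\end{aligned}$$

where the last equality follows as in the first case.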

This proves that the probabilities for each sample element are maintained between steps.

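We now bound the actual number of resampling operations necessary. Observe that each element of $S_t$ is re-sampled with probability at most $1-\frac1\Gamma=4\eta$. Let $X$ denote the number of re-sampled elements. Then $\mathbb{E}[X]\le4\eta s$, and since $\eta s=\Omega(\log n)$, the Chernoff bound gives $X=O(\eta s)=O(\varepsilon s)$ with high probability. ∎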

We comment on the computational complexity of sampling according to a distribution.

###### Observation 4.2.

Sampling $k$ vertices according to the distribution $\omega$ can be done in $O(n+k\log k)$ operations.

The time for sampling is $O(k\log k)$ for generating a sorted list of $k$ real values picked uniformly at random from $[0,\omega)$, and $O(n)$ for a linear scan over the weights of all vertices in $V$.
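The sort-and-scan procedure of Observation 4.2 can be sketched as follows (function name hypothetical). We sort $k$ uniform points in $[0,\omega)$ and sweep once over the prefix sums of the weights. Because the points are sorted, the index `j` only moves forward, which gives $O(k\log k)$ for sorting plus $O(n)$ for the scan.

```python
import random


def sample_sorted_scan(weights, k, rng):
    """Draw k vertices i.i.d. with Pr[v] proportional to weights[v]: O(k log k) sort + O(n) scan."""
    verts = list(weights)
    prefix, acc = [], 0.0
    for v in verts:                       # prefix sums of the weights
        acc += weights[v]
        prefix.append(acc)
    points = sorted(rng.random() * acc for _ in range(k))   # k uniform points in [0, w(V))
    out, j = [], 0
    for x in points:                      # j only moves forward: a single pass over the vertices
        while j < len(verts) - 1 and prefix[j] <= x:
            j += 1
        out.append(verts[j])
    return out
```

The returned list is grouped by vertex rather than shuffled. For a multiset sample this is immaterial, and zero-weight vertices are never returned.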

## 5 Proofs of the Main Theorems

We begin with the proof of Theorem 1.6.

###### Proof.

First, assume without loss of generality that $\varepsilon^{-1}\log n\le n$, as otherwise the claimed one-step complexity is $\Omega(n^2)$. This can be met by an algorithm that queries a median vertex at each step, see [23].

Run Algorithm 1, which performs $O(\varepsilon^{-2}\log n)$ queries by Lemma 2.4. The algorithm maintains a sample at each step by using Algorithm 2. By Corollary 1.5, each step of the algorithm indeed uses a vertex that is $\delta$-close to a median with high probability. After each query, the algorithm updates the weights in time $O(n)$, and $O(\varepsilon^{-1}\log n)$ vertices are re-sampled by Lemma 4.1, at a cost (cf. Observation 4.2) that is subsumed by the other terms. Thus the cost of maintaining the values of $\Phi^*$ is $O(\varepsilon^{-1}\log n)$ per vertex, or $O(\varepsilon^{-1}n\log n)$ in total, which is the dominant cost of the algorithm, with the update being performed as follows (the differences are multiset differences):

$$\Phi^*(v)\leftarrow\Phi^*(v)+\sum_{u\in S_{t+1}\setminus S_t}d(u,v)-\sum_{u\in S_t\setminus S_{t+1}}d(u,v).$$

Taking a union bound over all $\tau$ steps, we obtain a high success probability. ∎
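The incremental update touches only the replaced sample elements. The sketch below uses `collections.Counter` for the multiset differences, so that a vertex drawn several times is handled correctly. Here `dist` is a hypothetical precomputed distance table and `phi` is a dict of the current $\Phi^*$ values.

```python
from collections import Counter


def update_phi_star(phi, dist, old_sample, new_sample):
    """In place: Phi*(v) += sum of d(u, v) over added elements - sum over removed ones."""
    old, new = Counter(old_sample), Counter(new_sample)
    added, removed = new - old, old - new     # multiset differences (positive counts only)
    for v in phi:
        phi[v] += sum(c * dist[u][v] for u, c in added.items())
        phi[v] -= sum(c * dist[u][v] for u, c in removed.items())
    return phi
```

With $k$ replaced elements, each vertex costs $O(k)$ distance lookups, which matches the $O(\varepsilon^{-1}\log n)$ per-vertex bound above. After the update, the values coincide with a full recomputation over the new sample.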

Now we turn our attention to the proof of Theorem 1.7, where a local search is used. This is a natural approach that gives an improvement for low-degree, low-diameter graphs. The two ‘twists’ that we add are early termination (see the pseudo-code shown as Algorithm 3) and resuming from the vertex that is the output of the previous execution of the local search (which is used in the proof of Theorem 1.7). The former allows us to directly bound the number of iterations; cf. Observation 5.1.

###### Observation 5.1.

If Algorithm 3 run with an input vertex returns a vertex , then the number of iterations is upper-bounded by .
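Observation 5.1 can be made concrete with a hypothetical local search. The search moves to the best neighbor only if that lowers Φ* by at least `threshold` (a stand-in for δs), and otherwise stops. This stopping rule is an assumption about Algorithm 3, chosen because it yields a bound of the same form as the per-step term in the bound on K in the proof of Theorem 1.7. Every iteration except the last lowers Φ* by at least `threshold`, so the iteration count is at most 1 + (Φ*(v_in) − Φ*(v_out))/threshold. The name `local_search` is hypothetical.

```python
def local_search(adj, phi, start, threshold):
    """Greedy descent on phi with early termination (hypothetical sketch of Algorithm 3).

    Returns (final_vertex, number_of_iterations).
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    v, iterations = start, 0
    while True:
        iterations += 1
        # Neighbor with the smallest potential; v itself if v has no neighbors.
        best = min(adj[v], key=lambda u: phi[u], default=v)
        if phi[v] - phi[best] >= threshold:
            v = best                      # improvement large enough: keep descending
        else:
            return v, iterations          # early termination

if __name__ == "__main__":
    path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
    phi = {0: 6, 1: 5, 2: 4, 3: 5, 4: 6}  # sum of distances to the sample [0, 2, 4]
    print(local_search(path, phi, 0, 1))  # (2, 3): 1 + (6 - 4) / 1 = 3 iterations at most
```

Resuming from the previous output, the second twist, amounts to passing the last returned vertex as `start` in the next step, so iterations are paid only for the decrease of Φ* since then.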

See Theorem 1.7.

###### Proof.

First, w.l.o.g. assume that , by the same reasoning as in the proof of Theorem 1.6.

By Lemma 2.4, Algorithm 1 performs queries. We consider the following modification of Algorithm 1. As before, the algorithm updates the weights in time and maintains a sample at each step (by using Algorithm 2) in time which is subsumed by other terms. However, instead of choosing a vertex that is -close to a median in line 1, the modified algorithm runs Algorithm 3 with the previously queried vertex as input and queries the vertex that Algorithm 3 outputs. In other words, at each step , it runs Algorithm 3 with input , which returns , and queries . The algorithm initializes arbitrarily.

By Lemma 1.4, is -close to a median. By Observation 5.1, we bound the total number of iterations done by Algorithm 3 by

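$$
\begin{aligned}
K &\le \sum_{t=0}^{\tau-1}\left(1+\frac{\Phi^*_{t+1}(v_t)-\Phi^*_{t+1}(v_{t+1})}{\delta s}\right)\\
&= \tau+\frac{\Phi^*_1(v_0)+\sum_{t=1}^{\tau-1}\left(\Phi^*_{t+1}(v_t)-\Phi^*_t(v_t)\right)-\Phi^*_\tau(v_\tau)}{\delta s}\\
&\le \tau+\frac{sD+2\tau s\varepsilon D}{\delta s}=O(D\tau),
\end{aligned}
$$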

where we used that holds with high probability by Lemma 4.1. Each iteration of Algorithm 3 has complexity , making the total complexity of the algorithm . ∎
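The equality in the bound on K is a telescoping regrouping. Writing it out for τ = 3, a value chosen only for illustration, makes the step explicit:

$$
\begin{aligned}
\sum_{t=0}^{2}\left(\Phi^*_{t+1}(v_t)-\Phi^*_{t+1}(v_{t+1})\right)
&= \Phi^*_1(v_0)-\Phi^*_1(v_1)+\Phi^*_2(v_1)-\Phi^*_2(v_2)+\Phi^*_3(v_2)-\Phi^*_3(v_3)\\
&= \Phi^*_1(v_0)+\left(\Phi^*_2(v_1)-\Phi^*_1(v_1)\right)+\left(\Phi^*_3(v_2)-\Phi^*_2(v_2)\right)-\Phi^*_3(v_3).
\end{aligned}
$$

Each bracket compares two consecutive potentials at a fixed vertex, so it reflects only how the sample changed between steps, not the local search itself. Assuming Φ*(v) sums the distances from v to the s sample vertices, each at most D, we get Φ*_1(v_0) ≤ sD and −Φ*_τ(v_τ) ≤ 0. Each bracket is at most D times the number of newly sampled vertices, which is consistent with the 2τsεD term in the bound.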

## 6 Open Problems

Given an algorithm that keeps the optimal query complexity while achieving low computational complexity, one can ask what tradeoffs are possible between the two. Another question is how much further the computational complexity can be decreased. Are there lower bounds that reveal the limits of what is achievable for these problems? Regarding the centrality measures we consider, we propose an efficient median approximation; motivated by this, one may ask which other vertex functions could allow further improvements, e.g., in the computational complexity.

## References

• [1] Amir Abboud, Fabrizio Grandoni, and Virginia Vassilevska Williams. Subcubic equivalences between graph centrality problems, APSP and diameter. In SODA 2015, pages 1681–1697.
• [2] Amir Abboud, Virginia Vassilevska Williams, and Joshua R. Wang. Approximation and fixed parameter subquadratic algorithms for radius and diameter in sparse graphs. In SODA 2016, pages 377–391.
• [3] Ittai Abraham, Daniel Delling, Amos Fiat, Andrew V. Goldberg, and Renato F. Werneck. Highway dimension and provably efficient shortest path algorithms. J. ACM, 63(5):41:1–41:26, 2016.
• [4] Martin Aigner. Searching with lies. J. Comb. Theory, Ser. A, 74(1):43–56, 1996.
• [5] Alex Bavelas. Communication patterns in task-oriented groups. The Journal of the Acoustical Society of America, 22(6):725–730, 1950.
• [6] Murray A. Beauchamp. An improved index of centrality. Behavioral Science, 10(2):161–163, 1965.
• [7] Yosi Ben-Asher, Eitan Farchi, and Ilan Newman. Optimal search in trees. SIAM J. Comput., 28(6):2090–2102, 1999.
• [8] Michael Ben-Or and Avinatan Hassidim. The Bayesian learner is optimal for noisy binary search (and pretty good for quantum as well). In FOCS 2008, pages 221–230.
• [9] Laurine Bénéteau, Jérémie Chalopin, Victor Chepoi, and Yann Vaxès. Medians in median graphs in linear time. CoRR, abs/1907.10398, 2019. arXiv:1907.10398.
• [10] Lucas Boczkowski, Amos Korman, and Yoav Rodeh. Searching a tree with permanently noisy advice. In ESA 2018, pages 54:1–54:13.
• [11] Ryan S. Borgstrom and S. Rao Kosaraju. Comparison-based search in the presence of errors. In STOC 1993, pages 130–136.
• [12] Sergio Cabello. Subquadratic algorithms for the diameter and the sum of pairwise distances in planar graphs. In SODA 2017, pages 2143–2152.
• [13] Domenico Cantone, Gianluca Cincotti, Alfredo Ferro, and Alfredo Pulvirenti. An efficient approximate algorithm for the 1-median problem in metric spaces. SIAM Journal on Optimization, 16(2):434–451, 2005.
• [14] Ching-Lueh Chang. Some results on approximate 1-median selection in metric spaces. Theor. Comput. Sci., 426:1–12, 2012.
• [15] Shiri Chechik, Edith Cohen, and Haim Kaplan. Average distance queries through weighted samples in graphs and metric spaces: High scalability with tight statistical guarantees. In APPROX-RANDOM 2015, pages 659–679.
• [16] Yuval Dagan, Yuval Filmus, Ariel Gabizon, and Shay Moran. Twenty (simple) questions. In STOC 2017, pages 9–21.
• [17] Argyrios Deligkas, George B. Mertzios, and Paul G. Spirakis. Binary search in graphs revisited. Algorithmica, 81(5):1757–1780, 2019.
• [18] Dariusz Dereniowski. Edge ranking and searching in partial orders. Discrete Applied Mathematics, 156(13):2493–2500, 2008.
• [19] Dariusz Dereniowski, Adrian Kosowski, Przemyslaw Uznański, and Mengchuan Zou. Approximation strategies for generalized binary search in weighted trees. In ICALP 2017, pages 84:1–84:14.
• [20] Dariusz Dereniowski, Stefan Tiegel, Przemyslaw Uznański, and Daniel Wolleb-Graf. A framework for searching in graphs in the presence of errors. In SOSA@SODA 2019, pages 4:1–4:17.
• [21] Aditi Dhagat, Péter Gács, and Peter Winkler. On playing "twenty questions" with a liar. In SODA 1992, pages 16–22.
• [22] Ehsan Emamjomeh-Zadeh and David Kempe. A general framework for robust interactive learning. In NIPS 2017, pages 7085–7094.
• [23] Ehsan Emamjomeh-Zadeh, David Kempe, and Vikrant Singhal. Deterministic and probabilistic binary search in graphs. In STOC 2016, pages 519–532.
• [24] Uriel Feige, Prabhakar Raghavan, David Peleg, and Eli Upfal. Computing with noisy information. SIAM J. Comput., 23(5):1001–1018, 1994.
• [25] Linton C. Freeman. Centrality in social networks conceptual clarification. Social Networks, 1(3):215–239, 1978.
• [26] Cyril Gavoille, David Peleg, Stéphane Pérennes, and Ran Raz. Distance labeling in graphs. J. Algorithms, 53(1):85–112, 2004.
• [27] S. Louis Hakimi. Optimum locations of switching centers and the absolute centers and medians of a graph. Operations Research, 12(3):450–459, 1964.
• [28] Piotr Indyk. Sublinear time algorithms for metric space problems. In STOC 1999, pages 428–434.
• [29] Richard M. Karp and Robert Kleinberg. Noisy binary search and its applications. In SODA 2007, pages 881–890.
• [30] Adrian Kosowski and Laurent Viennot. Beyond highway dimension: Small distance labels using tree skeletons. In SODA 2017, pages 1462–1478.
• [31] Tak Wah Lam and Fung Ling Yue. Optimal edge ranking of trees in linear time. Algorithmica, 30(1):12–33, 2001.
• [32] Shay Mozes, Krzysztof Onak, and Oren Weimann. Finding an optimal tree searching strategy in linear time. In SODA 2008, pages 1096–1105.
• [33] Krzysztof Onak and Pawel Parys. Generalization of binary search: Searching in trees and forest-like partial orders. In FOCS 2006, pages 379–388.
• [34] Lawrence M. Ostresh. On the convergence of a class of iterative methods for solving the Weber location problem. Operations Research, 26(4):597–609, 1978.
• [35] Ronald L. Rivest, Albert R. Meyer, Daniel J. Kleitman, Karl Winklmann, and Joel Spencer. Coping with errors in binary search procedures. J. Comput. Syst. Sci., 20(3):396–404, 1980.
• [36] Alfréd Rényi. On a problem of information theory. MTA Mat. Kut. Int. Kozl., 6B:505–516, 1961.
• [37] Gert Sabidussi. The centrality index of a graph. Psychometrika, 31(4):581–603, 1966.
• [38] Alejandro A. Schäffer. Optimal node ranking of trees in linear time. Inf. Process. Lett., 33(2):91–96, 1989.
• [39] Koji Tabata, Atsuyoshi Nakamura, and Mineichi Kudo. Fast approximation algorithm for the 1-median problem. In DS 2012, pages 169–183.
• [40] Koji Tabata, Atsuyoshi Nakamura, and Mineichi Kudo. An efficient approximate algorithm for the 1-median problem on a graph. IEICE Trans. Inf. Syst., 100-D(5):994–1002, 2017.
• [41] Barbaros C Tansel, Richard L Francis, and Timothy J Lowe. State of the art—location on networks: a survey. part i: the p-center and p-median problems. Management science, 29(4):482–497, 1983.
• [42] Stanislaw M. Ulam. Adventures of a Mathematician. Scribner, New York, 1976.