Bipartite Independent Set Oracles and Beyond: Can it Even Count Triangles in Polylogarithmic Queries?

10/08/2021
by   Arijit Bishnu, et al.

Beame et al. [ITCS 2018] introduced the Bipartite Independent Set (BIS) and Independent Set (IS) oracle access to an unknown, simple, unweighted and undirected graph, and used it to solve the edge estimation problem. The introduction of these oracles set forth a series of works in a short span of time that either solved open questions mentioned by Beame et al. or generalized their work, as in Dell and Lapinskas [STOC 2018], Dell, Lapinskas and Meeks [SODA 2020], Bhattacharya et al. [ISAAC 2019 and arXiv 2019], and Chen et al. [SODA 2020]. Edge estimation using BIS can be done using polylogarithmic queries, while IS queries need sub-linear but more than polylogarithmic queries. Chen et al. improved Beame et al.'s upper bound for edge estimation using IS and also showed an almost matching lower bound. This result was significant because it was the first lower bound for independent set based oracles; to date, no lower bound results exist for BIS. On the other hand, Beame et al. in their introductory work asked a few open questions, one of which was whether structures of higher order than edges can be estimated using a polylogarithmic number of BIS queries. Motivated by this question, we resolve it in the negative by showing a lower bound (greater than polylogarithmic) for estimating the number of triangles using BIS. While doing so, we prove the first lower bound result involving BIS. We also provide a matching upper bound. Till now, query oracles were used for commensurate jobs: BIS and IS for edge estimation, TIS for triangle estimation, and CID for hyperedge estimation. Ours is a work that uses a lower order oracle access, like BIS, to estimate a higher order structure like triangles.


1 Introduction

The starting point of this work is an open question of Beame et al. [BeameHRRS18, DBLP:journals/talg/BeameHRRS20], who introduced a new query oracle named Bipartite Independent Set (BIS) access to an undirected graph (henceforth, the graph will mean an undirected simple graph) to solve the problem of edge estimation using polylogarithmically many queries. We resolve, using matching upper and lower bounds, the query complexity of triangle estimation. Our result implies that BIS access cannot estimate the number of triangles, the next higher order structure after edges in a graph, using polylogarithmic queries.

1.1 Query oracle access to a graph.

Motivated by connections to group testing, to emptiness versus counting questions in computational geometry, and to the complexity of decision versus counting problems, Beame et al. introduced the Bipartite Independent Set (shortened as BIS) and Independent Set (shortened as IS) oracles as a counterpoint to the local queries [GoldreichR02, Feige06, GoldreichR08]. The BIS query oracle can be seen in the lineage of the query oracles [Stockmeyer83, Stockmeyer85, RonT16] that go beyond the local queries. Let us start by looking into the formal definitions of BIS and IS.

Definition 1.1.

(Bipartite Independent Set) Given disjoint subsets $A, B \subseteq V(G)$, a BIS query answers whether there exists an edge between $A$ and $B$ in $G$.

Definition 1.2.

(Independent Set) Given a subset $A \subseteq V(G)$, an IS query answers whether there exists an edge between vertices of $A$ in $G$.
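The two definitions above can be illustrated with a small brute-force sketch. The class name `GraphOracles`, its method names, and the query counters are hypothetical and only serve the illustration; a real oracle hides the edge set from the algorithm, whereas this sketch simply stores it.

```python
from itertools import combinations


class GraphOracles:
    """Brute-force BIS and IS oracles over a hidden simple undirected graph.

    Hypothetical helper for illustration only; not taken from the paper.
    """

    def __init__(self, edges):
        # Store every edge as a frozenset so that {u, v} and {v, u} coincide.
        self._edges = {frozenset(e) for e in edges}
        self.bis_queries = 0
        self.is_queries = 0

    def bis(self, a, b):
        """Return True iff some edge has one endpoint in a and the other in b."""
        a, b = set(a), set(b)
        if a & b:
            raise ValueError("BIS inputs must be disjoint")
        self.bis_queries += 1
        return any(frozenset((u, v)) in self._edges for u in a for v in b)

    def is_query(self, x):
        """Return True iff some edge has both endpoints in x."""
        self.is_queries += 1
        return any(frozenset(p) in self._edges for p in combinations(set(x), 2))
```

On the path 0-1-2, for instance, `bis({0}, {1})` reports an edge while `bis({0}, {2})` does not, and `is_query({0, 2})` reports that no edge lies inside the set.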

The introduction of this new type of oracle access to a graph spawned a series of works that either solved open questions [DellL18, DellLM19] mentioned in Beame et al. or were generalizations [DellLM19, Bhatta-abs-1808-00691, abs-1908-04196]. Beame et al. used BIS and IS queries to estimate the number of edges in a graph [BeameHRRS18, DBLP:journals/talg/BeameHRRS20]. One of their striking observations was that BIS queries were more effective than IS queries for estimating edges. This observation also fits in with the fact that IS queries can be simulated in a randomized fashion using polylogarithmic BIS queries (see footnote 1). Edge estimation using BIS was also solved in [DellL18-stoc18], albeit with a higher query complexity than [BeameHRRS18]. There were later generalizations of the BIS oracle to estimate higher order structures like triangles and hyperedges [DellLM19, BhattaISAAC, abs-1908-04196]. On the IS front, Beame et al.’s result for edge estimation using the IS oracle was improved in [CLW-soda-2020] with an almost matching lower bound. One can observe the interest generated in these (bipartite) independent set based oracles in a short span of time. The results are summarized in Table 1; a cursory glance shows that commensurate higher order queries were needed for estimating higher order structures (Tripartite Independent Set (shortened as TIS) for counting triangles, Colorful Independence Oracle (shortened as CID) for hyperedges) if a polylogarithmic number of queries is the benchmark. We provide the definitions of TIS and CID below.

Footnote 1: Let us consider an IS query with input $X$. We partition $X$ into two parts $X_1$ and $X_2$ by putting each vertex of $X$ into $X_1$ or $X_2$ independently and uniformly at random. Then we make a BIS query with inputs $X_1$ and $X_2$, and report that $X$ is an independent set if and only if BIS reports that there is no edge with one endpoint in each of $X_1$ and $X_2$. Observe that we will be correct with probability at least $1/2$. We can boost the probability by repeating the above procedure a suitable number of times.

Definition 1.3.

(Tripartite Independent Set) [BhattaISAAC]: Given three disjoint subsets $A, B, C$ of the vertex set $V(G)$ of a graph $G$, the TIS oracle reports whether there exists a triangle having one vertex in each of $A$, $B$ and $C$.
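As a brute-force illustration of the TIS definition, the following sketch checks every triple with one vertex per part. The function name `tis_query` and the explicit `edges` argument are assumptions made for the example; an actual oracle would not expose the edge list.

```python
from itertools import product


def tis_query(edges, a, b, c):
    """Return True iff some triangle has one vertex in each of a, b and c.

    Hypothetical brute-force stand-in for the TIS oracle.
    """
    edge_set = {frozenset(e) for e in edges}
    a, b, c = set(a), set(b), set(c)
    if a & b or a & c or b & c:
        raise ValueError("TIS inputs must be pairwise disjoint")
    return any(
        frozenset((u, v)) in edge_set
        and frozenset((v, w)) in edge_set
        and frozenset((u, w)) in edge_set
        for u, v, w in product(a, b, c)
    )
```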

Definition 1.4.

(Colorful Independence Oracle) [BishnuGKM018, DellLM19]: Given pairwise disjoint subsets of vertices of a hypergraph ( is the vertex set of the hypergraph ) as input, CID query oracle answers whether , where denotes the number of hyperedges in having exactly one vertex in each , .
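The randomized simulation of an IS query by BIS queries mentioned earlier (random bipartition of the query set, one BIS query per trial, repeated to boost the success probability) can be sketched as follows. The function names, the default of 40 repetitions, and the `rng` parameter are assumptions chosen for the illustration. The error is one-sided: an independent set is never misreported, while a set containing an edge is missed by a single trial with probability at most 1/2.

```python
import random


def make_bis_oracle(edges):
    """Build a brute-force BIS oracle (hypothetical helper) for a fixed edge list."""
    edge_set = {frozenset(e) for e in edges}

    def bis(a, b):
        return any(frozenset((u, v)) in edge_set for u in a for v in b)

    return bis


def is_independent_via_bis(bis, x, repetitions=40, rng=None):
    """Report whether x looks independent, using only BIS queries.

    Each trial splits x into two random parts and asks BIS whether an
    edge crosses the split. A crossing edge is a witness that x is not
    independent.
    """
    rng = rng or random.Random()
    vertices = list(x)
    for _ in range(repetitions):
        left, right = set(), set()
        for v in vertices:
            # Each vertex picks a side independently and uniformly at random.
            (left if rng.random() < 0.5 else right).add(v)
        if bis(left, right):
            return False
    return True
```

With `repetitions` trials, a set that contains an edge is wrongly reported as independent with probability at most 2 raised to the power `-repetitions`.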

Work | Oracle used | Structure estimated | Upper bound | Lower bound | Any other problem solved?
[GoldreichR08] | Local | edge | | | Approximating average distance.
[ChenLW19] | IS | edge | | |
[BeameHRRS18] | BIS | edge | | | Edge estimation using IS queries.
[BhattaISAAC] | TIS | triangle | | |
[DellLM19], [abs-1908-04196] | CID | hyperedge | | | [DellLM19] resolved Q2 in positive.
[EdenLRS15] | Local | triangle | | |
[DBLP:conf/innovations/AssadiKK19] | Local + Random Edge | triangle | | | Estimated number of arbitrary subgraphs.
This work | BIS | triangle | | |
Table 1: The whole gamut of results involving Local queries [G2017], BIS, IS and its generalizations. is the maximum number of triangles on an edge. Both these results estimate the number of hyperedges in a -uniform hypergraph, where is treated as a constant. Here and denote the number of vertices, edges and triangles in a graph , respectively. and hide a multiplicative factor of and , respectively.

1.2 The open questions suggested by Beame et al.

For a work that has spawned many interesting results in such a short span of time, let us focus on the open problems and future research directions mentioned in [BeameHRRS18, DBLP:journals/talg/BeameHRRS20].

  • (Q1) Can the number of cliques be estimated using polylogarithmic number of BIS queries?

  • (Q2) Can polylogarithmic number of BIS queries sample an edge uniformly at random?

  • (Q3) Can BIS or IS queries possibly be used in combination with local queries for graph parameter estimation problems?

  • (Q4) What other oracles, besides subset queries, allow estimating graph parameters with a polylogarithmic number of queries?

The answers to the questions and a discussion.

Only Q2 has been resolved till now, in the positive [DellLM19], as can be observed from Table 1. At its core, Q1 asks if a query oracle can step up, i.e., if it can estimate a structure that is of a higher order than what the oracle was designed for. The framing of Q1 suggests that Beame et al. expected a polylogarithmic query complexity for estimating the number of cliques using BIS. Pertinent to these questions, we also want to bring into focus a work [abs-2006-14015] whose authors mention that, in their view, estimating higher order structures will require higher order queries (see the discussion after Proposition 23 of [abs-2006-14015]). They showed that many BIS queries are required to separate triangle-free graph instances from graph instances having at least one triangle. This lower bound follows directly from the communication complexity of triangle freeness testing [DBLP:conf/soda/Bar-YossefKS02]. However, the full complexity of triangle estimation when emptiness queries like BIS are available remains elusive. It seems to us that the observations in [BeameHRRS18, DBLP:journals/talg/BeameHRRS20] and [abs-2006-14015] about the power of BIS in estimating higher order structures stand in contrast. Against this backdrop, we place our results by answering Q1 in the negative with a lower bound involving BIS in this paper. BIS has an inherent asymmetry in its structure in the following sense: when BIS says that there exists no edge between two disjoint sets, then BIS stands as a witness to the existence of two sets of vertices having no interdependence, while a yes answer implies that there can be any number of edges, varying from one to the product of the cardinalities of the two sets, going across the two sets. We feel that this property of BIS gives it its power, but on the other hand, also makes it difficult to analyze. That is probably the reason why works on upper bounds for BIS and its generalizations exist, whereas works on lower bounds were not forthcoming. Though not on BIS, the work of Chen et al. using IS queries was the first to discuss a lower bound on independent set based oracles. Our work goes one step further in being the first one to prove a lower bound for the BIS oracle. We resolve the open question by showing a requisite lower bound involving BIS for estimating triangles. Our result goes even further: if we want to estimate the number of triangles using a polylogarithmic number of queries, then even a stronger query than BIS (named Edge Emptiness, see Definition 1.5) is hopeless (see Theorem 1.6).

1.3 A stronger oracle than BIS, our main result and its consequences.

Now we define the Edge Emptiness (shortened as EE) query oracle, which is stronger than both BIS and IS. The Edge Emptiness query is a form of a subset query [Stockmeyer83, Stockmeyer85, RonT16], where a subset query with a subset $S \subseteq U$ asks whether $S \cap T$ is empty or not, where $T$ is also a subset of the universe $U$. The Edge Emptiness query operates with $U$ being the set of all vertex pairs in $G$, $T$ being the set of edges in $G$, and $S$ being a subset of pairs of vertices of $G$.

Definition 1.5.

(Edge Emptiness) Given a subset $Y \subseteq \binom{V(G)}{2}$, an EE query answers whether there exists a pair $\{u, v\} \in Y$ such that $\{u, v\}$ is an edge in $G$.

Note that each BIS query can be simulated by an EE query (see footnote 2). We prove our lower bound in terms of the stronger EE queries, which directly implies the lower bound in terms of BIS. The matching upper bound, however, is proved in terms of BIS. Our main results are stated below informally. The formal statements are given in Theorems 3.1 and 4.1.

Footnote 2: Let us consider a BIS query with inputs $A$ and $B$. Let $Y$ be the set of vertex pairs with one vertex from each of $A$ and $B$. We call the EE oracle with input $Y$, and report that there is an edge having one vertex in each of $A$ and $B$ if and only if the EE oracle reports that there exists a pair in $Y$ that forms an edge in $G$. Similarly, we can simulate an IS query with input $X$ by using an EE query whose input is the set of all pairs of vertices of $X$.
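The simulation of BIS and IS queries by a single EE query each can be sketched as below. The names `make_ee_oracle`, `bis_via_ee` and `is_via_ee` are hypothetical, and the EE oracle is a brute-force stand-in that holds the edge set explicitly.

```python
from itertools import combinations, product


def make_ee_oracle(edges):
    """Build a brute-force EE oracle (hypothetical helper) for a fixed edge list."""
    edge_set = {frozenset(e) for e in edges}

    def ee(pairs):
        # True iff at least one queried vertex pair is an edge.
        return any(frozenset(p) in edge_set for p in pairs)

    return ee


def bis_via_ee(ee, a, b):
    """Answer a BIS query on disjoint sets a, b with one EE query."""
    # The EE input is every pair with one vertex from each side.
    return ee(product(a, b))


def is_via_ee(ee, x):
    """Answer an IS query on x with one EE query."""
    # The EE input is every unordered pair of vertices of x.
    return ee(combinations(set(x), 2))
```

Each simulated query costs exactly one EE query, which is why a lower bound against EE carries over to BIS and IS.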

Theorem 1.6 (Main lower bound (informal statement)).

Let . Any (randomized) algorithm that has BIS query access to a graph with vertices and edges, requires many BIS queries to decide whether the number of triangles in is at most or at least .

Theorem 1.7 (Main upper bound (informal statement)).

There exists an algorithm, that has BIS query access to a graph , finds a -approximation to the number of triangles in with high probability, and makes many BIS queries in expectation. Here denote the number of vertices, edges and triangles in .

Note that Edge Emptiness query is the strongest subset query on edges of the graph. Informally speaking, our lower bound states that no subset query on edges can estimate the number of triangles in a graph by using polylogarithmic many queries. However, the results of Bhattacharya et al. [abs-1908-04196] and Dell et al. [DellLM19] imply that polylogarithmic many TIS queries are enough to estimate the number of triangles in the graph. Note that TIS query is also a subset query on triangles in the graph. To complement our lower bound result, we also give an algorithm (see Theorem 1.7) for estimating the number of triangles in a graph with BIS queries that matches our lower bound. Here we would also like to mention that the number of BIS queries our algorithm uses is less than that of the number of local queries [G2017] needed to estimate the number of triangles in a graph. This implies that we are resolving Q3 in positive in the sense that BIS queries are efficient queries for triangle estimation vis-a-vis local queries [EdenLRS15] coupled with even random edge queries [DBLP:conf/innovations/AssadiKK19] (see Table 1).

1.4 Notations

Throughout the paper, the graphs are undirected and simple. For a graph , and denote the set of vertices and edges, respectively; , and the number of triangles is , unless otherwise specified. We use to denote the set of vertex pairs in . Note that . For , represents the set of vertices that belong to at least one pair in . The neighborhood of a vertex is denoted by , and is called the degree of vertex in . denotes the set , that is, the set of common neighbors of and in . If , denotes the set of vertices that forms triangles with as one of their edges. The induced degree of a vertex in is the cardinality of . For , the subgraph of induced by is denoted by . Note that . For two disjoint sets , the bipartite subgraph of induced by and is denoted by . Note that is the set of edges having one vertex in and the other vertex in .

Throughout the paper, is the approximation parameter. When we say is a -approximation of , then . Polylogarithmic means . and hide a multiplicative factor of and , respectively. We have avoided floor and ceiling for simplicity of presentation. The constants in this paper are not taken optimally. We have taken them to let the calculation work with clarity. However, those can be changed to other suitable and appropriate constants.
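To make the notation concrete, the sketch below builds neighborhoods, computes the common neighbors of a vertex pair, and counts triangles exactly by summing, over all edges, the number of vertices that form a triangle with that edge. Every triangle is counted once for each of its three edges, hence the division by three. The function names are hypothetical, and vertices are assumed to be sortable labels such as integers.

```python
from collections import defaultdict


def neighborhoods(edges):
    """Map every vertex to the set of its neighbors."""
    nbrs = defaultdict(set)
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    return nbrs


def common_neighbors(nbrs, u, v):
    """Vertices adjacent to both u and v."""
    return nbrs[u] & nbrs[v]


def count_triangles(edges):
    """Exact triangle count of a simple undirected graph given as an edge list."""
    nbrs = neighborhoods(edges)
    # Deduplicate edges so that (u, v) and (v, u) are not counted twice.
    unique_edges = {tuple(sorted(e)) for e in edges}
    on_edges = sum(len(common_neighbors(nbrs, u, v)) for u, v in unique_edges)
    return on_edges // 3
```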

1.5 Paper organization

We start with the technical overview of our lower and upper bounds in Section 2.1 and Section 2.2, respectively. The detailed lower and upper bound proofs are in Section 3 and Section 4, respectively. The missing proofs are presented in Appendix A. In Appendix B, we state some useful probability results.

2 Technical overview

2.1 Overview for the proof of our lower bound (Theorem 1.6)

Let us consider as in Theorem 1.6. We prove the desired bound for BIS (stated in Theorem 1.6) by proving the lower bound is when and when for EE query access, where is a suitably chosen constant.

The idea for the lower bound of when :

We prove this by using Yao’s method [DBLP:books/crc/99/0001R99]. There are two distributions and (as described below) from which is sampled satisfying . Note that, for each , (footnote 3: without loss of generality, we assume that is an integer; the proof can be extended to any graph having edges by adding a suitable number of isolated vertices), and with a probability of at least . But the number of triangles in each is at least a factor of two more than the number of triangles in any , with a probability of at least .

  • The vertex set (with ) is partitioned into four parts uniformly at random. Vertex set forms a biclique with vertex set and vertex set forms a biclique with vertex set . Then every vertex pair , with and , is added as an edge to graph with probability ;

  • The vertex set (with ) is partitioned into four parts uniformly at random. Vertex set forms a biclique with vertex set and vertex set forms a biclique with vertex set . Then every vertex pair , with and , is added as an edge to graph with probability . Then each vertex of is sampled with probability . Let be the sampled set. Each vertex of is connected to every vertex of with an edge;

The constants, including , in the order notations above are suitably set to have the following:

When :

The number of triangles in each graph is at most , with a probability of at least ;

When :

with a probability of at least . Hence, the number of triangles in each is at least , with a probability of at least .

Now, consider a particular EE query with input . Here, we divide the discussion into two parts, based on and , where is a threshold. If we query with the number of vertex pairs more than the threshold, chances are more we will not be able to distinguish between and . When , we can show that there exists a vertex pair such that is an edge in , with a probability of at least , irrespective of whether or . Intuitively, this is because the number of vertices and edges in are and , respectively. So, EE queries with input such that will not be useful to distinguish whether or .

We prove the desired lower bound by proving many EE queries are necessary to decide whether or with a probability of at least . Note that when . So, the number of EE queries needed to decide whether or , is at least the number of EE queries needed to touch at least one vertex of when . Here, by touching at least a vertex of , we mean . As we have argued that only EE query with input with can be useful, the probability that we touch a vertex in with such a query is at most . Hence the number of EE queries to touch at least a vertex of , is at least , that is, .

To let the above discussion work, when , must be at least . But with a probability of at least . Because of this, we take in the above discussion. The formal statement of the lower bound, when , is given in Lemma 3.2 in Section 3. What we have discussed here is just an overview; the formal proof of Lemma 3.2 is much more involved and delicate, and is presented in Section 3.1.

The idea for the lower bound of when :

Let us consider an algorithm , having EE query access to an unknown graph , that decides whether the number of triangles in is at most or at least with a probability of at least , where the parameter satisfies . We prove the desired lower bound by reducing from the case in a graph to the case in a graph . After the reduction, we get the lower bound for the case as we have already established the lower bound for any .

Let be the unknown graph to which we have EE query access and , where and . The unknown graph (for algorithm ) is such that and , where is a triangle-free graph having many vertices (disjoint from ), and many edges. We choose the constants in and such that (footnote 4: this is to satisfy the requirement of algorithm ). Note that , , and the number of triangles in is the same as that of . Also, an EE query to graph can be answered by an EE query to graph . Hence, because of our lower bound in the case of ,

The lower bound for the number of EE queries made by algorithm
The number of EE queries required by any algorithm that estimates the number of triangles in

The formal statement of the lower bound, when , is given in Lemma 3.3 in Section 3, and the proof is presented in Section 3.2.

2.2 Overview for our upper bound (Theorem 1.7)

We establish the upper bound claimed in Theorem 1.7 by giving two algorithms that report a -approximation to the number of triangles in the graph:

  • that makes many BIS queries;

  • that makes many BIS queries.

Informally speaking, our final algorithm Triangle-Est calls and when and , respectively. Observe that, if Triangle-Est knows within a constant factor, then it can decide which one to use among and . If Triangle-Est does not know within a constant factor, then it starts from a guess and makes a geometric search on until the output of Triangle-Est is consistent with . Depending on whether or , Triangle-Est decides which one among and to call. This guessing technique is very standard by now in property testing literature [GoldreichR08, EdenLRS15, EdenRS18, DBLP:conf/innovations/AssadiKK19]. Another point to note is that we do not know . However, we can estimate by using many BIS queries (see Table 1). An estimate of will perfectly work for us in this case.
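The guessing technique described above can be sketched generically. The starting guess (the maximum possible number of triangles on n vertices), the consistency test (accept once the estimate is at least the current guess), and the callback name `estimate_given_guess` are assumptions made for this illustration; the paper's exact parameters are not reproduced here.

```python
from math import comb


def estimate_with_geometric_search(n, estimate_given_guess):
    """Halve a guess for the triangle count until the estimate is consistent.

    estimate_given_guess(guess) is a hypothetical subroutine that returns
    an estimate which is assumed reliable only when the guess is not much
    larger than the true count.
    """
    guess = comb(n, 3)  # assumed starting point: largest possible count
    while guess >= 1:
        estimate = estimate_given_guess(guess)
        if estimate >= guess:
            # The output is consistent with the guess, so accept it.
            return estimate
        guess //= 2  # geometric search: halve the guess and retry
    return 0
```

The number of rounds is logarithmic in the starting guess, so the search adds only a logarithmic factor on top of the cost of the subroutine.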

Algorithm :

Algorithm Triangle-Est-High is inspired by the triangle estimation algorithm of Assadi et al. [DBLP:conf/innovations/AssadiKK19], where we have Adjacency, Degree, Random neighbor and Random Edge queries. Please see Section 4.2 for formal definitions of these queries. Note that the algorithm by Assadi et al. can be suitably modified even if we have approximate versions of Degree, Random neighbor and Random Edge queries. Also refer to Section 4.2 for the formal definitions of the approximate versions of the above queries. By Corollary 4.8, many BIS queries are enough to simulate the approximate versions of Degree and Random neighbor, with a probability of at least . By Proposition 4.7, the approximate version of Random Edge queries can also be simulated by many BIS queries, with a probability of at least . Putting everything together, we get for triangle estimation that makes many BIS queries. The formal statement of the corresponding triangle estimation result is given in Lemma 4.2, and algorithm is described in Section 4.2.

Algorithm :

This algorithm is inspired by the two pass streaming algorithm for triangle estimation by McGregor et al. [DBLP:conf/pods/McGregorVV16]. Basically, we show that the steps of McGregor et al.’s algorithm can be executed by using BIS queries. To do so, we have used the fact that, given any , all the edges of the subgraph induced by can be enumerated by using many BIS queries (see Proposition 4.4 for the formal statement). The formal statement of the corresponding triangle estimation result is given in Lemma 4.3, and algorithm is described in Section 4.3 along with its correctness proof and query complexity analysis.
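One way to enumerate the edges of an induced subgraph with BIS queries is binary splitting, sketched below. This is a standard group-testing style procedure offered as an illustration; it is not claimed to be the procedure behind Proposition 4.4 or to match its query bound. The helper names are hypothetical.

```python
def make_bis_oracle(edges):
    """Build a brute-force BIS oracle (hypothetical helper) for a fixed edge list."""
    edge_set = {frozenset(e) for e in edges}

    def bis(a, b):
        return any(frozenset((u, v)) in edge_set for u in a for v in b)

    return bis


def edges_between(bis, a, b):
    """List all edges with one endpoint in a and the other in b (a, b disjoint)."""
    a, b = list(a), list(b)
    if not a or not b or not bis(a, b):
        return []  # a 'no' answer certifies that nothing crosses
    if len(a) == 1 and len(b) == 1:
        return [(a[0], b[0])]
    # Split the larger side in half and recurse on both halves.
    if len(a) >= len(b):
        mid = len(a) // 2
        return edges_between(bis, a[:mid], b) + edges_between(bis, a[mid:], b)
    mid = len(b) // 2
    return edges_between(bis, a, b[:mid]) + edges_between(bis, a, b[mid:])


def induced_edges(bis, x):
    """List all edges of the subgraph induced by the vertex set x."""
    x = list(x)
    if len(x) < 2:
        return []
    mid = len(x) // 2
    left, right = x[:mid], x[mid:]
    # Edges inside each half, plus edges crossing between the halves.
    return (
        induced_edges(bis, left)
        + induced_edges(bis, right)
        + edges_between(bis, left, right)
    )
```

The design relies on the asymmetry of BIS: a negative answer prunes a whole block of vertex pairs at once, so queries are spent only on blocks that contain at least one edge.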

3 Lower bound for estimating triangles using Edge Emptiness queries

In this section, we prove the main lower bound result sketched in Theorem 1.6; the formal statement is given below. As mentioned earlier, the lower bound proofs will be for the stronger query oracle EE. This will imply the lower bound for BIS.

Theorem 3.1 (Main lower bound result).

Let be such that . Any (randomized) algorithm that has EE oracle access to a graph must make many EE queries to decide whether the number of triangles in is at most or at least with a probability of at least , where has many vertices, many edges.

We prove the above theorem by proving Lemmas 3.2 and 3.3, as stated below. Note that Lemmas 3.2 and 3.3 talk about the desired lower bound when the number of triangles in the graph is large () and small (), respectively.

Lemma 3.2 (Lower bound when there is a large number of triangles).

Let be such that . Any (randomized) algorithm that has EE oracle access to a graph must make many EE queries to decide whether the number of triangles in is at most or at least with a probability of at least , where has many vertices, many edges.

Lemma 3.3 (Lower bound when there is a small number of triangles).

Let be such that . Any (randomized) algorithm that has EE oracle access to a graph must make many EE queries to decide whether the number of triangles in is at most or at least with a probability of at least , where has many vertices and many edges.

We first show Lemma 3.2 in Section 3.1, and then Lemma 3.3 in Section 3.2. Note that the proof of Lemma 3.3 will use Lemma 3.2.

3.1 Proof of Lemma 3.2

Without loss of generality, assume that is an integer. We prove for the case when . But, we can make the proof work for any by adding many isolated vertices. Note that here. We further assume that , and . Otherwise, the stated lower bound of trivially follows as hides a multiplicative factor of .

We use Yao’s min-max principle to prove the lower bound. To do so, we consider two distributions and on graphs where

  • Any graph has many vertices;

  • Any graph has many edges with a probability of at least ;

  • The number of triangles in any graph is at most with a probability of at least , and any graph has at least many triangles with a probability of at least .

Note that, if we can show that any deterministic algorithm that distinguishes graphs from and , with a probability of at least , must make many EE queries, then we are done with the proof of Lemma 3.2.

3.1.1 The (hard) distribution for the input, its properties, and the proof set up

  • A graph is sampled as follows:

    • Partition the vertex set into parts , by initializing as empty sets, and then putting each vertex in into one of the parts uniformly at random and independent of other vertices;

    • Connect each vertex of with every vertex of with an edge to form a biclique. Also, connect each vertex of with every vertex of with an edge to form another biclique;

    • For every where and , add edge to with probability .

  • A graph is sampled as follows:

    • Partition the vertex set into parts , by initializing as empty sets, and then putting each vertex in into one of the partitions uniformly at random and independent of other vertices;

    • Connect each vertex of with every vertex of with an edge to form a biclique. Also, connect each vertex of with every vertex of with an edge to form another biclique;

    • For every where and , add edge to with probability .

    • Select by putting each into with a probability of at least , independently, and then, add each edge in to .

The following observation establishes the number of vertices, edges, and the number of triangles in the graphs that can be sampled from . The proof uses large deviation inequalities (see Lemmas B.1 and B.4 in Appendix B), and is presented in Appendix A.1.

Observation 3.4 (Properties of the graph ).
  • For , the number of vertices in is . Also, holds with a probability of at least , and the number of edges in is with a probability of at least ;

  • If , then there are at most triangles in with a probability of at least ,

  • If , with a probability of at least , and there are at least many triangles in with a probability of at least .

The following remark is regarding the connection between graphs in and that in . This will be used later in our proof, particularly in the proof of Claim 3.12.

Remark 1 (A graph can be generated from a graph ).

Let us first generate a graph . Select by putting each into with a probability of at least , and then, add each edge in to to generate , then (the resulting graph) .

The following observation says that a (with some condition) forms an edge with a probability of at least a constant. It will be used while we prove Claim 3.11.

Observation 3.5 (Any vertex pair is an edge in with constant probability).

Let , and we are in the process of generating . Let at most one of and has been put into one of the parts out of and . Then is an edge in with probability at least .

The above observation follows from the description of – each vertex in is put into one of the parts out of uniformly at random, each vertex of is connected with every vertex in , and each vertex of is connected with every vertex in .

To prove Lemma 3.2 by contradiction, assume that there is a randomized algorithm that makes many EE queries and decides whether the number of triangles in the input graph is at most or at least , with a probability of at least . Then there exists a deterministic algorithm ALG that makes many EE queries and decides the following (when the input graph is such that both and hold with probability ):

(Here and denote the probability of the event under the conditional space and , respectively.) Hence, we will be done with the proof of Lemma 3.2 by showing the following lemma.

Lemma 3.6 (Lower bound on the number of EE queries when ).

Let the unknown graph be such that and hold with equal probabilities. Consider any deterministic algorithm ALG that has EE access to , and makes many EE queries to . Then

Next, we define an augmented EE oracle ( oracle). is tailor-made for the graphs coming from . Moreover, it is stronger than EE, that is, any EE query can be simulated by a query. We will prove the claimed lower bound in Lemma 3.6 when we have access to oracle. Note that this will imply Lemma 3.6.

Before getting into the formal description of oracle, note that the algorithm (with access) maintains a four tuple data structure initialized with . With each query to oracle, the oracle updates the data structure and returns the updated data structure to the algorithm. Note that the updated data structure is a function of all previously made queries, and it is enough to answer corresponding EE queries.

3.1.2 Augmented Edge Emptiness oracle ():

Before describing the query oracle and its interplay with the algorithm, first we present the data structure that the algorithm maintains with the help of oracle. The data structure keeps track of the following information.

Information maintained by :
  • is a subset of that have been seen by the algorithm till now, is the set of vertices present in any vertex pair in .

  • such that means the algorithm knows that is an edge in , means that is not an edge in .

  • , where

Intuitively speaking, unless the algorithm knows about the presence of some vertex in , it cannot distinguish whether the unknown graph or . So, we define the notion of good and bad vertices, along with good and bad data structures. This notion will be used later in our proof.

Definition 3.7 (Bad vertex).

A vertex is said to be a bad vertex if . is said to be good if there does not exist any bad vertex in .

oracle and its interplay with the algorithm:

The algorithm initializes the data structure with , . So, and are initialized with trivial functions with domain . At the beginning of each round, the algorithm queries the oracle with a subset deterministically. Note that the choice of depends on the current status of the data structure. Now, we explain how oracle responds to the query and how the data structure is updated.

  • If , the oracle sets , and changes accordingly. The oracle also sets the function and as per their definitions, and then it sends the updated data structure to the algorithm.

  • Otherwise (if ), the oracle finds a random subset such that . The oracle checks if there is a pair such that is an edge. If yes, then the oracle responds as in the first case with being replaced by . If no, the oracle sends the data structure corresponding to the entire graph along with a Failure signal (footnote 5: we later argue that the Failure signal is sent with a very low probability).

Owing to the way oracle updates the data structure after each query, we can make some assumptions on the inputs to the oracle, as described in Remark 2. It will actually be useful when we prove Claim 3.11.

Remark 2 (Some assumptions on the query).

Let be the data structure just before the algorithm makes EE query with input , and let be the data structure updated by oracle after the algorithm makes query with input . Without loss of generality, we assume that

  • is disjoint from . It is because maintains whether is an edge or not for each ;

  • When , there does not exist and in . It is because the oracle updates the data structure in the same way in each of the following three cases when – (i) and are in , (ii) and , and (iii) and . By the description of oracle and its interplay with the algorithm, in all the three cases, the updated data structure contains labels of all the three vertices along with the information whether and form edges in or not. So, instead of having both and in with , it is equivalent to have exactly one among and in .

In the following observation, we formally show that oracle is stronger than EE. Then we prove Lemma 3.9, which says that queries are necessary to distinguish between and . Note that Lemma 3.9 will imply Lemma 3.6.

Observation 3.8 ( is stronger than EE).

Let . Each EE query to can be simulated by using an query to .

Proof.

Let us consider an EE query with input . We make a query with the same input , and answer the EE query as follows depending on whether or .

:

The oracle updates the data structure; let be the updated data structure. It contains the information about each whether it forms an edge in or not. So, from , the EE query with input can be answered as follows: there exists an edge with if and only if .

:

In this case, the oracle finds a random subset such that . It checks if there is an such that is an edge. If yes, the updated data structure contains the information about each whether it forms an edge. In this case, we can report that there exists an such that is an edge in . If there is no such that is an edge, then (by the description of EE oracle and its interplay with the algorithm) the oracle sends the data structure corresponding to the entire graph. From it, we can report whether there exists an such that or not.

Hence, in any case, we can report the answer to EE query with input . ∎
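The case analysis in this proof can be sketched as a short function. It is a minimal sketch assuming that an EE query is a set of vertex pairs and that the augmented oracle's response is available as a dictionary from revealed pairs to True/False; the function name `answer_ee` and this representation are assumptions made for illustration.

```python
def answer_ee(query_pairs, knowledge):
    """Answer an EE query from the data structure returned by the oracle.

    `knowledge` maps each revealed vertex pair (a frozenset) to True/False.
    Returns True iff some pair in `query_pairs` is an edge.
    """
    query = {frozenset(p) for p in query_pairs}
    revealed = {p for p in query if p in knowledge}
    # An edge among the revealed pairs settles the query positively.
    # This covers a small query and a large query whose sample hit an edge.
    if any(knowledge[p] for p in revealed):
        return True
    # Otherwise either the query was small, or the oracle failed and sent
    # the entire graph; in both situations every queried pair is revealed.
    if revealed != query:
        raise ValueError("response does not determine the EE answer")
    return False
```

The `ValueError` branch should be unreachable for responses produced as described above; it is kept only as a guard against malformed input.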

We are left with proving the following technical lemma. As noted earlier, this will imply Lemma 3.6.

Lemma 3.9 (Lower bound on the number of queries when ).

Let the unknown graph be such that and hold with equal probabilities. Consider any deterministic algorithm that has access to , and makes many queries to . Then

3.1.3 Proof of Lemma 3.9

For clarity of explanation, we first describe the algorithm as a decision tree. Then we prove Lemma 3.9.

Decision tree view of :

  • Each internal node of is labeled with a nonempty subset and each leaf node is labeled with YES or NO;

  • Each edge in the tree is labeled with a data structure ;

  • The algorithm starts the execution from the root node by setting as the current node. Note that for the root node , and and are the trivial functions. As the algorithm is deterministic, the first query is same irrespective of the graph that we are querying. By making that query, we get an updated data structure from the oracle and let be the edge that is labeled with the updated data structure. Then sets as the current node.

  • If the current node is not a leaf node in , makes a query with a subset , where is determined by the label of the node . Note that satisfies the condition described in Remark 2. The oracle updates the data structure and moves to a child of depending on the updated data structure;

  • If the current node is a leaf node in , report YES or NO according to the label of .
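The traversal described above can be sketched as follows. This is a minimal sketch, assuming each updated data structure is available as a hashable snapshot so that it can label a tree edge; the names `Node` and `run_decision_tree`, and the callable `oracle`, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    query: Optional[frozenset] = None   # label of an internal node
    answer: Optional[str] = None        # "YES" or "NO" at a leaf
    # Maps an edge label (snapshot of the data structure) to the child node.
    children: dict = field(default_factory=dict)


def run_decision_tree(root, oracle):
    """Walk from the root to a leaf and return the leaf's label.

    `oracle(query)` is assumed to return a hashable snapshot of the
    updated data structure after the query is made.
    """
    node = root
    while node.answer is None:
        snapshot = oracle(node.query)    # the query is fixed by the node label
        node = node.children[snapshot]   # follow the edge with that label
    return node.answer
```

Because the algorithm is deterministic, the query made at a node depends only on the path taken to reach it, which is why a fixed label per node suffices in this sketch.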

Now, we define the notion of good and bad nodes in . The following definition is inspired from Definition 3.7.

Definition 3.10 (Bad node in the decision tree).

Let be a node of and be the current data structure. is said to be good if there does not exist in such that . Otherwise, is a bad node.

If , then will never encounter a bad node. In other words, when reaches a bad node of the tree , then it can (easily) decide . However, the converse is not true. From this fact, consider two claims (Claims