The graph matching problem is a well-studied computational problem in a great many areas of computer science. Some examples include machine learning [CSS07, CL12, BBM05], computational biology [SXB08, VCP11], social network analysis [KL14], and de-anonymization [NS09]. (See the surveys [LR13, CFSV04], the latter of which is titled “Thirty Years of Graph Matching in Pattern Recognition”.) The graph matching problem is the task of computing, given a pair $(G_0, G_1)$ of $n$-vertex graphs, the permutation
$$\pi^* = \arg\min_{\pi \in S_n} \| G_0 - \pi(G_1) \| \qquad (1)$$
where we identify the graphs with their adjacency matrices, and write $\pi(G)$ for the matrix obtained by permuting the rows and columns of $G$ according to $\pi$ (i.e., the matrix $P^\top G P$ where $P$ is the permutation matrix corresponding to $\pi$).
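For very small graphs, the minimization in (1) can be carried out by exhaustive search. The following is a minimal sketch rather than anything used in this work: it assumes the objective counts the vertex pairs on which the two adjacency matrices disagree, and the function name and the edge-list representation are hypothetical choices made for illustration.

```python
from itertools import permutations

def brute_force_match(g0_edges, g1_edges, n):
    """Exhaustively search for the relabelling of g1 that best matches g0.

    Graphs are collections of edges (u, v) on vertices 0..n-1.
    Returns (pi, cost): pi[u] is the new label of vertex u of g1, and cost
    is the number of vertex pairs that form an edge in exactly one graph.
    """
    g0 = {frozenset(e) for e in g0_edges}
    best_pi, best_cost = None, None
    for pi in permutations(range(n)):
        relabelled = {frozenset((pi[u], pi[v])) for u, v in g1_edges}
        cost = len(g0 ^ relabelled)  # size of the symmetric difference
        if best_cost is None or cost < best_cost:
            best_pi, best_cost = pi, cost
    return best_pi, best_cost

# Two labellings of the same 3-vertex path: the centre is vertex 1 in the
# first graph and vertex 0 in the second.
pi, cost = brute_force_match([(0, 1), (1, 2)], [(0, 1), (0, 2)], 3)
```

The search visits all $n!$ permutations, so it is only a way to make the objective concrete; avoiding this cost is the point of the algorithms discussed in the rest of the paper.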
1.1 The Correlated Erdös-Rényi model
The graph matching problem can be thought of as a noisy (and hence harder) variant of the graph isomorphism problem. In fact, the graph matching problem is NP-hard in the worst case. (If we allow weights and self-loops it is equivalent to the quadratic assignment problem [Law63, BÇPP99].) O’Donnell et al. also show that graph matching is hard to approximate assuming Feige’s Random 3SAT hypothesis [OWWZ14]. Hence, much of the existing work is focused on practical heuristics or specific generative models. In a 2011 paper, Pedarsani et al. [PG11] introduced the correlated Erdös-Rényi model as a case study for a de-anonymization task. This is the model in which the pair $(G_0, G_1)$ is generated as follows. (Some works also studied a more general variant where $G_0$ and $G_1$ use different subsampling parameters.)
We sample a “base graph” $B$ from the Erdös-Rényi distribution $G(n,p)$.
We sample $\pi$ at random in $S_n$ (the set of permutations on the elements $[n]$).
We let $G_0$ be a randomly subsampled subgraph of $B$ obtained by including every edge of $B$ in $G_0$ with probability $\gamma$ independently.
We let $G_1$ be an independently subsampled subgraph of $\pi(B)$ obtained by including every edge of $\pi(B)$ in $G_1$ with probability $\gamma$ independently.
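The four sampling steps above can be sketched as follows. This is an illustrative sampler under stated assumptions, not code from this work: the names `p` (base edge probability), `gamma` (subsampling probability) and `sample_correlated_er`, as well as the edge-set representation, are hypothetical.

```python
import random
from itertools import combinations

def sample_correlated_er(n, p, gamma, seed=None):
    """Sample (g0, g1, pi) from the correlated Erdos-Renyi model.

    Graphs are sets of edges (u, v) with u < v on vertices 0..n-1, and
    pi[u] is the label that base-graph vertex u receives in g1.
    """
    rng = random.Random(seed)
    # Base graph: every possible edge appears independently with probability p.
    base = [e for e in combinations(range(n), 2) if rng.random() < p]
    # Hidden permutation, chosen uniformly at random.
    pi = list(range(n))
    rng.shuffle(pi)
    # g0 keeps each base edge independently with probability gamma.
    g0 = {e for e in base if rng.random() < gamma}
    # g1 subsamples the relabelled base graph with fresh, independent coins.
    g1 = {tuple(sorted((pi[u], pi[v])))
          for (u, v) in base if rng.random() < gamma}
    return g0, g1, pi
```

With `gamma = 1` the two graphs are exactly isomorphic via `pi`; smaller values of `gamma` give the noisy setting studied here.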
Given $(G_0, G_1)$, our goal is to recover $\pi$. Though initially introduced as a toy model for a specific application, the problem of recovering $\pi$ in this model is a natural and well-motivated statistical inference problem, and it has since received a lot of attention in the information theory and statistics communities (see, e.g., [YG13, KL14, LFP14, KHG15, CK16, CK17, MX18]).
Below we will use $\mathcal{D}_{\mathsf{struct}}(n,p;\gamma)$ (or $\mathcal{D}_{\mathsf{struct}}$ for short, when the parameters are clear from the context) to denote the “structured” joint distribution above on triples $(G_0, G_1, \pi)$ of pairs of graphs and a permutation $\pi$ such that $G_1$ is a noisy version of $\pi(G_0)$. One can see that the graphs $G_0$ and $G_1$ are individually distributed according to the Erdös-Rényi distribution $G(n, p\gamma)$, but there is significant correlation between $G_0$ and $G_1$. It can be shown that as long as $p\gamma^2 \gg \log n / n$, the permutation $\pi$ will be the one that minimizes the right-hand side of (1), and hence it is possible to recover $\pi$ information theoretically. Indeed, Cullina and Kiyavash [CK16, CK17] precisely characterized the parameters for which information theoretic recovery is possible. Specifically, they showed recovery is possible if $p\gamma^2 > \frac{\log n + \omega(1)}{n}$ and impossible when $p\gamma^2 < \frac{\log n - \omega(1)}{n}$.
However, none of these works have given efficient algorithms. Yartseva and Grossglauser [YG13] analyzed a simple algorithm known as Percolation Graph Matching (PGM), which was used successfully by Narayanan and Shmatikov [NS09] to de-anonymize many real-world networks. (Similar algorithms were also analyzed by [KL14, KHG15, LFP14].) This algorithm starts with a “seed set” of vertices in $G_0$ that are mapped by $\pi$ to $G_1$, and for which the mapping is given. It propagates this information according to a simple percolation, until it recovers the original permutation. Yartseva and Grossglauser gave a precise characterization of the size of the seed set required as a function of $p$ and $\gamma$ [YG13]. Specifically, in the case that and (where the expected degree of $G_0$ and $G_1$ is ), the size of the seed set required is . In the general setting when one is not given such a seed set, we would require about steps to obtain it by brute force, which yields an time algorithm in this regime. Lyzinski et al. [LFF16] also gave negative results for popular convex relaxations for graph matching on random correlated graphs.
Subsequent to our work, Mossel and Xu [MX18] obtained new algorithms for the seeded setting based on a delicate analysis of local neighborhoods. Notably, they achieve recovery at the information-theoretic threshold. Though the exact dependence of the seed set size on the average degree is complicated to state, roughly speaking whenever for , their seed set has size , giving quasi-polynomial time algorithms. However, when , the seed set size is with the constant in the exponent depending on the difference of and .
|Cullina & Kiyavash||info-theoretic|
|Yartseva & Grossglauser||percolation|
|Mossel & Xu||seeded local||if ,|
1.2 Our results
In this work we give quasipolynomial time algorithms for recovering the hidden permutation in the model for every constant (and even slightly sub-constant) and a wide range of .
Theorem 1.1 (Recovery).
For every and , if or , then there is a quasipolynomial-time randomized algorithm such that with high probability over and over the choices of , .
One can see that we obtain (nearly) efficient recovery even for sub-polynomial degrees. As discussed in Section 3, our results are more general and handle (slightly) sub-constant noise . See Theorem 3.2 for a precise statement of the parameters (including the in the minimum sparsity ). To the best of our knowledge, the best previously known algorithms for any required subexponential (i.e., ) time.
At first, the requirement that the average degree be in a union of two disjoint intervals may seem strange. Indeed, modulo a combinatorial conjecture, our algorithm works for all values of . In order to give this conjecture, we need the following definition; for the sake of exposition, we have pared it down. For the full requirements, see Theorem 4.1.
Definition 1.2 (simplified version).
Let be positive integers. We say that is a -test family if is a set of -vertex -edge graphs, such that each has no non-trivial automorphisms, every strict subgraph of has edge density , and further for pairs of distinct , no shared subgraph of has density larger than . Finally, we also require . (Notice that there are only graphs on vertices and edges, so the size requirement on is quite stringent.)
Conjecture 1.3.
For all sufficiently large integers , for every integer such that , there exists a -test family.
A proof of this conjecture would immediately extend Theorem 1.1 to every . In fact, our proof of Theorem 1.1 proceeds by establishing this conjecture for and . We find it difficult to believe that the existence of a -test family would be discontinuous in as a function of ; however our techniques for the two regimes are different, and while we did not make a special effort to optimize the constants or , it seems that completely filling in the gap requires some delicate and technical combinatorial arguments.
We also consider the potentially easier “hypothesis testing” task of distinguishing a pair of graphs sampled from from a pair that is drawn from the “null distribution” of two independent samples from . For this problem we give a polynomial time algorithm for a range of values of .
Theorem 1.4 (Distinguishing).
For arbitrarily small and for every , if or , then there is a pseudo-polynomial time deterministic algorithm that distinguishes with probability at least between the case that are sampled from and the case that they are sampled from . (The algorithm is pseudo-polynomial because it depends on the bit complexity of . We can amplify the success probability to , but this incurs a dependence on in the exponent of the runtime.)
See Theorem 2.2 for the full settings of parameters that we achieve for distinguishing.
1.3 Approach and techniques
In this section we illustrate our approach and techniques. For the sake of simplicity and concreteness we first focus on the following task. Given a pair of graphs , distinguish between the following two cases for (i.e., graphs of average degree ):
- Null case:
are drawn from the distribution of two independent graphs from the Erdös-Rényi distribution .
- Planted/structured case:
are drawn from the distribution . That is, we sample from and a random permutation , and both and are independently subsampled subgraphs of where each edge is kept with probability . The labels of the vertices of are additionally permuted according to .
Before we present our approach to solve this problem, we explain some of the challenges. In the case the graphs are completely unrelated, and there is no permutation of the vertices so that and overlap on more than a fraction of the edges, while in the case they are “roughly isomorphic”, in the sense that there is a permutation that will make them agree on about a quarter of their edges. Since random graphs are in fact an easy instance of the graph isomorphism problem, we could perhaps hope that known graph isomorphism algorithms will actually succeed in this “random noisy” case as well. Alas, it turns out not to be the case.
We now present some rough intuition why common graph isomorphism heuristics fail in our setting. (If you are not interested in seeing why some algorithms fail but rather only why our approach succeeds, feel free to skip ahead to Section 1.3.1.) Let’s start with one of the simplest possible heuristics for graph isomorphism: sort the vertices of and according to their degrees and then match them to each other. If and are isomorphic via some permutation then it will of course be the case that the degree of every vertex of is equal to the degree of in . Generally, even in a random graph, this heuristic will not completely recover the isomorphism since we will have many ties: vertices with identical degrees. Nevertheless, this approach would map many vertices correctly, and in particular the highest degree vertex in a random graph is likely to be unique and so be mapped correctly.
However, in the noisy setting, even the highest degree vertex is unlikely to be the same in both graphs. The reason is that in a random graph of average degree , the degrees of all the vertices are roughly distributed as independent Poisson random variables with expectation . The vertex with highest degree in $G_0$ is likely to have degree which is standard deviations higher than the mean. But since the graphs are only -correlated, the corresponding matched vertex is likely to have degree which is only higher than the mean. It can be calculated that this means that the matched vertex is extremely unlikely to be the highest degree vertex of $G_1$. In fact, we expect that about vertices will have degree larger than that of the matched vertex.
In the context of graph isomorphism algorithms, we often go beyond the degree to look at the degree profile of a vertex , which is the set of degrees of all the neighbors of . In the case that and are isomorphic via , the degree profiles of and are identical. However, in the noisy case when and are only -correlated, the degree profiles of and are quite far apart. About a quarter of the neighbors of and will be matched, but for them the degrees are only roughly correlated, rather than equal. Moreover the other three quarters of neighbors will not be matched, and for them the degrees in both graphs will just be independent Poisson variables.
Another common heuristic for graph isomorphism is to match $G_0$ and $G_1$ by taking their top eigenvectors and sorting them (breaking ties using lower order eigenvectors). Once again this will fail in our case, because even if the permutation was the identity, the top eigenvector of $G_0$ is likely to be very different from the top eigenvector of $G_1$. This is for similar reasons as before: the top eigenvector of $G_0$ is the vector such that the quantity is standard deviations higher than the mean for some particular value . However, it is likely that the corresponding quantity for $G_1$ will only be or so standard deviations higher than the mean, and hence this vector will not be the top eigenvector of $G_1$.
One could also imagine using a different heuristic, such as cycle counts, to distinguish—in Section 2, we discuss the shortcomings of such “simple” heuristics in detail.
1.3.1 The “black swan” approach
Now that we have appreciated the failure of the canonical graph isomorphism algorithms, we describe our approach. Our approach can be thought of as “using a flock of black swans”. Specifically, suppose that $H$ is an -sized subgraph that is a “black swan,” in the sense that it has extremely low probability of appearing as a subgraph of a random graph $G$ drawn from . (For technical reasons, for the distinguishing section we actually take and only use for recovery, but we discuss here the case for intuition.) Another way to say it is that $\mathbb{E}[X_H(G)]$ is very small, where $X_H(G)$ is the subgraph count of $H$ in $G$, or the number of subgraphs of $G$ isomorphic to $H$. (More formally, $X_H(G)$ is the number of injective homomorphisms of $H$ to $G$, divided by the number of automorphisms of $H$. That is, if $H$ has vertex set $[v]$ and $G$ has vertex set $[n]$, then $X_H(G) = \frac{1}{\mathrm{aut}(H)} \sum_{\sigma : [v] \to [n] \text{ injective}} \prod_{(i,j) \in E(H)} G_{\sigma(i), \sigma(j)}$, where $G_{a,b}$ denotes the $(a,b)$ entry of the adjacency matrix of $G$.)
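The formal definition of the subgraph count (injective homomorphisms divided by automorphisms) can be checked by brute force on tiny graphs. The sketch below is only meant to make that definition concrete; the function names and the edge-list representation are hypothetical, and the enumeration is exponential in the size of the pattern graph.

```python
from itertools import permutations

def injective_homs(h_edges, h_n, g_edges, g_n):
    """Count injective maps V(H) -> V(G) that send every edge of H to an edge of G."""
    g = {frozenset(e) for e in g_edges}
    count = 0
    # Each ordered choice of h_n distinct vertices of G is one injective map.
    for image in permutations(range(g_n), h_n):
        if all(frozenset((image[a], image[b])) in g for a, b in h_edges):
            count += 1
    return count

def automorphisms(h_edges, h_n):
    """Automorphisms of H are exactly the injective homomorphisms from H to itself."""
    return injective_homs(h_edges, h_n, h_edges, h_n)

def subgraph_count(h_edges, h_n, g_edges, g_n):
    """Number of (not necessarily induced) copies of H in G."""
    return injective_homs(h_edges, h_n, g_edges, g_n) // automorphisms(h_edges, h_n)

# Example inputs: the complete graph on 4 vertices, a triangle, a 2-edge path.
k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
triangle = [(0, 1), (1, 2), (0, 2)]
path = [(0, 1), (1, 2)]
```

Dividing by the number of automorphisms is what turns the count of labelled embeddings into a count of unlabelled copies.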
If we are lucky and appears in both and , then we can conclude that it is most likely that the vertices of in are mapped to the vertices of in , since the probability that both copies appear by chance is . This is much smaller than , which is the probability that appears in and those edges were not dropped in . If we are further lucky (or chose our swan carefully) so that has no non-trivial automorphism, then it turns out that we can in such a case deduce the precise permutation of the vertices of in to the vertices of in .
The above does not seem helpful in designing an algorithm to recover the permutation, or even to distinguish between and since by its nature as a “black swan”, most of the times will not appear as a subgraph of , and hence we would not be able to use it. Our approach is to use a flock of such swans, which are a set of graphs such that the probability of every individual graph occurring as a subgraph is very small, but the probability of some graph occurring is very high. We carefully designate properties of the family (which we call the test graph family) so that when the events that and occur as subgraphs are roughly independent.
Already, this allows us to use the common occurrences of graphs in this family to deduce whether came from the null distribution (in which case such occurrences will be rare) or whether they came from the structured distribution (in which case they will be more frequent). In particular, if we define , and the polynomial
then the value of will be noticeably higher when are drawn from than when they are drawn from . This will result in an efficient algorithm to distinguish the two distributions. We also use the “swans” for recovery, as we will discuss below.
1.3.2 Constructing the flock of black swans
It turns out that demonstrating the existence of a family of “swans,” or “test graphs,” is a delicate task, as we need to satisfy several properties that are in opposition to one another. (We note that since the graphs in the family will be of size $v$, and counting the number of occurrences of a graph on $v$ vertices takes time $n^{O(v)}$, once we fix $v$ we can perform brute-force enumeration over all graphs on $v$ vertices with negligible effect on the asymptotic runtime. For this reason, demonstrating the existence of a family is enough. The construction need not be algorithmic, though ours is.) On one hand, we want the family to be large, so that we can compensate for the fact that each member is a “black swan” and appears with very small probability. On the other hand, we need each member of the family to have a small number of edges. Suppose that one of our swans, $H$, has $e$ edges. If it appears in the base graph $B$, then it only survives in both $G_0$ and $G_1$ with probability $\gamma^{2e}$. That is, the correlation between the occurrences of $H$ in $G_0$ and in $G_1$ decays exponentially in the number of edges of $H$, and if $e$ is too large we cannot expect it to help us recover. As a side effect, keeping $H$ small helps with the computational efficiency of the task of finding these occurrences. A third constraint is that we need the events that each member of the family occurs to be roughly independent. It is very easy to come up with a large family of graphs for which the events of them co-occurring together are highly correlated, but such a family would not be useful for our algorithm. Ensuring this independence amounts to obtaining control over the edge density of the common subgraphs that appear in pairs of distinct test graphs. Luckily, we are able to demonstrate the existence of families of graphs achieving the desired properties, though this requires some care.
The above discussion applies to the distinguishing problem of telling and apart. However, if we can ensure that the joint occurrences of our family of test graphs cover all the vertices of and , then we can actually recover the permutation. This underlies our recovery algorithm. To simultaneously ensure these conditions we need to make the number of vertices of each logarithmic rather than constant, which results in a quasipolynomial time algorithm.
Properties of the test graphs.
We now describe more precisely (though still not in full formality, see Section 4) the properties that our family of “black swans” or test graphs needs to satisfy so the above algorithm will succeed:
- Low likelihood of appearing.
Each graph in our test family will have vertices and edges. To ensure that it is indeed a “black swan”, we require that for slightly subconstant. In particular, in the regime this will require . In fact, we will set to be almost exactly . Note that when, say, these are graphs with less than, say edges, so that the average degree is close to .
- Strict balance.
This condition is well-known in the random graphs literature, and it ensures that the random variable is well behaved. It states that for every in the family , every induced subgraph of with vertices and edges has strictly smaller edge density, . We will actually require a strengthened, quantitative notion of strict balance, in which the density of is related to its size.
- Intersection balance.
To ensure that for every pair of distinct graphs in our family the random variables and will be asymptotically independent when , we will need to have even tighter control over the density of their common subgraphs. We will require that for every two such graphs, any subgraph of their intersection satisfies the stronger condition that for some sufficiently large .
- No non-trivial automorphism.
To ensure we can recover the permutation correctly from an occurrence of in and an occurrence in , we require that every in has no non-trivial automorphism.
Finally to ensure that there actually will be many subgraphs from this family in our graph, we will require that . (For distinguishing, it will suffice that .)
We conjecture that a family achieving these properties can be obtained with any density (see Conjecture 1.3). However, at the moment we only demonstrate the existence of such families of graphs with certain densities, which is why our algorithms do not work for all ranges of $p$. (More accurately, we do have conjectured constructions that demonstrate the existence of such graphs for all densities, but have not yet been able to analyze them in all regimes.)
We now illustrate one such construction. First and foremost, it can be shown that for integer , random -regular graphs satisfy the requirements of strict balance and trivial automorphism group. Further, a sufficiently large fraction of the set of -regular random graphs will satisfy the intersection balance property. So for graphs with where , we easily have such a family.
However, the above construction does not give us graphs of all densities, and in particular does not allow us to handle the most interesting regime of sparse Erdös-Rényi correlated graphs (e.g., or so) which requires test graph of density roughly for some small and in particular a non integer average degree of roughly . Here is one example for such a construction when is for some large integer . We start with a random -regular graph on vertices (and hence edges). We then subdivide every edge by inserting intermediate vertices into it, and so turning it into a path of length . The resulting graph will have edges and vertices, and one can verify that . Moreover, it can be shown that the densest subgraphs of will “respect” the underlying structure, in the sense that for every original edge of , a subgraph of maximizing the density will either include all the corresponding path or none of it. Using this observation, and the expansion properties of random graphs, it is possible to show that strict balance condition and even the intersection balance condition hold. Moreover, we can also use known properties of random graphs to rule out non-trivial automorphism. Finally, since the number of 3-regular graphs on vertices is , for we get a super exponential (i.e., ) number of graphs, which will allow us to get a sufficiently large family. We will furthermore need to make the notion of strict balance quantitative. For this and the remaining details, see Section 4, where we also give our constructions for other values of .
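The vertex and edge counts of the subdivision construction can be checked mechanically. The sketch below assumes each edge receives $k$ new intermediate vertices (so it becomes a path with $k+1$ edges); the function names are hypothetical, and the complete graph $K_4$ is used only as a small, fixed 3-regular example. It has many automorphisms, so it is not itself a valid test graph.

```python
from fractions import Fraction

def subdivide(edges, n, k):
    """Replace every edge by a path with k new internal vertices (k + 1 edges).

    edges: list of pairs on vertices 0..n-1.
    Returns (new_edges, new_vertex_count).
    """
    new_edges = []
    next_vertex = n
    for u, v in edges:
        # Fresh internal vertices for this edge, then the path u - ... - v.
        path = [u] + list(range(next_vertex, next_vertex + k)) + [v]
        next_vertex += k
        new_edges.extend(zip(path, path[1:]))
    return new_edges, next_vertex

def density(edges, n):
    """Edge density: number of edges divided by number of vertices."""
    return Fraction(len(edges), n)

# K4 is 3-regular on 4 vertices with 6 edges.
k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
sub_edges, sub_n = subdivide(k4, 4, 2)
```

For a 3-regular graph on $m$ vertices this gives $m + 3mk/2$ vertices and $3m(k+1)/2$ edges, so the density is $1 + \frac{1}{3k+2}$, which approaches $1$ from above as $k$ grows.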
1.4 Related work
As mentioned above, there is an extremely large body of literature on the graph matching problem. We discussed above the works on correlated Erdös-Rényi graphs, but people have also studied other generative models such as power law graphs and others (e.g., see [JLG15]).
On a technical level, our work is inspired by recent works on sum-of-squares, and using low degree polynomials for inference problems [HKP17]. In particular, our starting point is a low degree distinguisher from the planted and structured distributions. However, there are some differences with the prior works. These works typically searched for objects such as cuts, vectors, or assignments that are less structured than searching for permutations. Moreover, unlike prior works where the polynomial distinguishers used fairly simple polynomials (such as counting edges, triangles, cycles, etc.), we need to use subgraphs with more complex structure. This is related to the fact that despite this inspiration, our algorithm at the moment is not a sum-of-squares algorithm. It remains an open problem whether the natural sum-of-squares relaxation for (1) captures our algorithm.
In our analysis we draw on the vast literature on analyzing the distribution of subgraph counts (e.g., see [JLR11]). Our setting is however somewhat different as we need to construct a family of graphs with related but not identical properties to those studied in prior works; in particular, some differences arise because of the fact that the graphs are correlated, and the fact that we work in the regime where the graphs have size growing with and appear times in expectation.
In Section 2 we give our distinguishing algorithm between and (Theorem 1.4). Then in Section 3 we build on this algorithm to obtain a recovery algorithm that recovers the “ground truth” permutation from drawn from . This algorithm builds upon and extends the techniques of our distinguishing algorithm. Both the recovery algorithm and distinguishing algorithms use as a “black box” the existence of families of “test graphs” (the black swans) that satisfy certain properties. In Section 4 we show how to construct such test families.
For a graph and a subset of the vertices , we use to denote the vertex-induced subgraph on and we use to denote the set of edges with both endpoints in . We use to denote the variance of the random variable. For an event , is the 0-1 indicator that occurs. For two graphs , we use to indicate that and are isomorphic, and to indicate that contains as an edge-induced subgraph. We use to denote the falling factorial, . We will also use standard big-$O$ notation, and we will use to denote that .
2 Distinguishing the null and structured distributions
In this section, we give an algorithm for the following distinguishing problem:
Problem 2.1 (Distinguishing).
We are given two -vertex graphs , sampled equally likely from one of the following distributions:
The null distribution, : and are sampled independently from .
The structured distribution, : First, a graph is sampled. Then, we independently sample from by subsampling every edge with probability . Finally, we set to be a copy of in which the vertex labels have been permuted according to a uniformly random permutation .
Our goal is to decide with probability whether were sampled from or .
This section will be devoted to a proof of the following theorem, which is a generalization and directly implies Theorem 1.4:
Theorem 2.2 (Distinguishing algorithm, restatement).
For arbitrarily small , if or and if for constant , there is a time algorithm that distinguishes with probability at least between the case that are sampled from and the case that they are sampled from . (We can amplify this to probability by incurring extra runtime, gaining a dependence on in the exponent of .)
In particular, if then the algorithm runs in polynomial time.
Recall that for graphs , we define the subgraph count to be the number of subgraphs of isomorphic to . Since sampled from are correlated, if a subgraph appears in then it is more likely to also appear in , and the subgraph counts are correlated. The following lemma uses this approach to give a certain “test”: a polynomial that has zero mean when is chosen from the null distribution, but positive mean when they are chosen from the structured distribution. This test will not be good enough, since even in the structured case, it will be unlikely that the polynomial takes a non-zero value, but will serve as our starting point.
Lemma 2.3.
Let $H$ be a graph with $v$ vertices and $e$ edges, and define the subgraph count-deviation correlation polynomial
where are two vertex graphs and the expectation is taken over from the Erdös-Rényi distribution . Then in the structured distribution,
where is the subgraph of which minimizes .
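To make the idea of a count-deviation correlation concrete, here is a minimal sketch for the simplest possible test graph, a single edge. It assumes the polynomial is the product of the two centred subgraph counts (which is consistent with its having zero mean when the two graphs are independent); the name `q` for the marginal edge probability and the function name are hypothetical.

```python
from math import comb

def edge_count_deviation(g0_edges, g1_edges, n, q):
    """Product of centred edge counts for two n-vertex graphs.

    q is the assumed marginal edge probability of each graph, so the
    expected number of edges in either graph is C(n, 2) * q.
    """
    mean = comb(n, 2) * q
    return (len(g0_edges) - mean) * (len(g1_edges) - mean)

# Both graphs are K4: each has 6 edges against an expected 3 when q = 1/2.
k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
value = edge_count_deviation(k4, k4, 4, 0.5)
```

When the two graphs are correlated, their counts tend to deviate from the mean in the same direction, so the product is positive on average; for independent graphs the two factors are independent with mean zero.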
Proof.
Note that under the null distribution, $G_0$ and $G_1$ are independent. Therefore
On the other hand, in the structured distribution, and are correlated. That is,
For an ordered subset of vertices of size , we define to be the indicator that contains as a labeled subgraph (at times we will drop the parameter for the sake of conciseness). Expanding and into sums of such indicators, we have
where we use to denote all ordered subsets of vertices of . The is due to the fact that the sum is over ordered subsets of and of size and thus it counts the number of labeled ordered copies of in as well as . To avoid over-counting, we divide by the number of automorphisms of and get the factor.
We recall that originally, we identified and both with the set . For each summand, the value of the expectation is determined by the number of edges shared between the realization of on and the realization of on , where is the random permutation we applied to the vertices of . Without loss of generality, suppose that was the identity permutation (for notational convenience). Then let be the number of edges in the intersection of as realized on and when both are identified with . Then letting be the number of automorphisms of , we have
We can more elegantly express this quantity as a sum over all subgraphs , upon which the copy of on and the copy of on may intersect. So we may re-group the sum according to these unlabeled edge-induced subgraphs that give the intersection. Further, for each we can define the number to be the number of ways one can obtain a graph by taking two ordered, labeled copies of and intersecting them on a subgraph isomorphic to .
Specifically, to have such graphs with as an intersection, one must (a) choose a copy of in for the copy, (b) choose a copy of in for the copy, (c) choose an automorphism between the copies. Thus, for each subgraph , we have .
Now, moving from the summation over ordered subsets to the summation over unlabeled edge-induced subgraphs , we have
as there are ways of intersecting two copies of on the subgraph , and for each such type of intersection there are choices of vertices for .
To finish off the proof, we observe that by following an identical sequence of manipulations we can re-write the squared expectation in the same manner,
The difference is of course that because the expectations were taken separately, the intersection has no effect on the exponent of . This allows us to re-write Eq. 3:
where we have used that if , the terms cancel.
To obtain the final conclusion, we use that is bounded away from , and that and are independent of . ∎
The need for test sets.
Lemma 2.3 guarantees that the count-deviation polynomial has larger expected value under than . However, this fact alone does not prove that is a distinguisher. To illustrate this point, let us for simplicity suppose that we are in the regime where for constant. In this case, Lemma 2.3 gives us that
up to lower-order terms (assuming that has no subgraph with ). On the other hand, a simple calculation gives an optimistic bound on the standard deviation of under of
So the standard deviation in the null case is too large for us to reliably detect the offset expectation.
Our solution is to identify a “test set” of graphs $\mathcal{H}$, such that the estimators for $H \in \mathcal{H}$ are close to independent. (Here we mean in the sense that the variance of their average is asymptotically equal to that of an average of independent estimators.) If we had $|\mathcal{H}|$ independent trials, intuitively we expect the standard deviation to decrease by a factor of $\sqrt{|\mathcal{H}|}$. So long as we satisfy
the variance in the null case may be sufficiently small that we reliably distinguish.
In order to translate this cartoon sketch into a reality, we will require some additional properties of our test set which will be crucial in controlling the variance.
2.1 Test subgraphs for distinguishing
Given a graph with edges and vertices, in expectation there are copies of in . Because of our prevailing intuition that random graphs are well-behaved, we might naively expect that the number of copies of is concentrated around its mean; however some simple examples demonstrate that this is not always the case.
Example: the need for balance.
Consider for example the graph given by appending a “hair”, or a path of length , to a clique of size (see Fig. 3). has vertices, edges, and automorphisms, so in with , we have
So in expectation, appears in times.
However, if we restrict our attention to the clique which is a subgraph of , in expectation
So while the expected number of copies of is constant, is not expected to appear even once! The large expected count of is due to a small-probability event; if appears (with polynomially small probability), then we will see many copies of .
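The phenomenon in this example can be reproduced numerically from the standard formula for the expected number of copies of a graph with $v$ vertices, $e$ edges and $a$ automorphisms in $G(n,p)$, namely $n(n-1)\cdots(n-v+1)\, p^e / a$. The parameters below are hypothetical and chosen only for illustration (a clique $K_4$ with a single pendant edge, and $p = n^{-5/7}$); they are not necessarily the ones used in the example above.

```python
from math import perm

def expected_copies(n, p, num_vertices, num_edges, num_automorphisms):
    """Expected number of copies of H in G(n, p).

    Uses the falling factorial n(n-1)...(n-v+1) for the number of ordered
    vertex choices, times p^e, divided by the number of automorphisms.
    """
    return perm(n, num_vertices) * p ** num_edges / num_automorphisms

n = 10 ** 6
p = n ** (-5 / 7)  # hypothetical choice: makes n^5 * p^7 a constant

# K4 plus one pendant edge: 5 vertices, 7 edges, 6 automorphisms.
hairy = expected_copies(n, p, 5, 7, 6)
# K4 alone: 4 vertices, 6 edges, 24 automorphisms.
clique = expected_copies(n, p, 4, 6, 24)
```

With these choices the expected count of the hairy clique stays bounded away from zero as $n$ grows, while the expected count of the clique inside it tends to zero, so the larger graph can only appear on the rare event that the clique does.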
This issue is well known in the study of random graphs (it is among the first topics in the textbook of Janson et al. [JLR11]). From that well-developed literature, we borrow the following concept:
Definition 2.4 (Balanced graph).
The density of a graph is the ratio of its number of edges to its number of vertices.
A graph with edge density is called balanced if it has no strict subgraphs of density larger than . If all strict subgraphs of have density strictly smaller than , then is called strictly balanced.
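For small graphs, strict balance can be verified by brute force. The sketch below is illustrative only and uses hypothetical names; it relies on the observation that, for a fixed vertex subset, the induced subgraph has the most edges, so it suffices to compare the density of every induced subgraph on a proper, non-empty vertex subset with the density of the whole graph.

```python
from fractions import Fraction
from itertools import combinations

def is_strictly_balanced(edges, n):
    """Check that every proper subgraph is strictly less dense than the graph.

    edges: list of pairs on vertices 0..n-1; density is edges / vertices.
    Subgraphs on the full vertex set with fewer edges are automatically
    less dense, so only proper vertex subsets need to be examined.
    """
    full_density = Fraction(len(edges), n)
    for size in range(1, n):
        for subset in combinations(range(n), size):
            chosen = set(subset)
            induced = sum(1 for u, v in edges if u in chosen and v in chosen)
            if Fraction(induced, size) >= full_density:
                return False
    return True

k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
cycle5 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
k4_with_hair = k4 + [(3, 4)]  # pendant edge attached to the clique
```

Cliques and cycles pass this check, whereas a clique with a pendant edge fails it because the clique alone is denser than the whole graph.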
If a graph is expected to appear at least once, then the balancedness of a graph is what determines whether the number of copies of in is well-concentrated around its mean. For example, Lemma 2.3 already allows us the following observation:
If is a graph of fixed size such that , then if is not balanced, .
This follows from applying Lemma 2.3 with , in which case , and taking to be the densest subgraph of , which must have .
To ensure asymptotic independence, we will require that each of the graphs in our test set be strictly balanced.
Theorem (see Theorem 4.1).
Let be a rational number, so that or . Let be a sufficiently large even integer. There exists a test set of graphs , each on vertices and containing edges, which satisfies the following properties:
Every is strictly balanced.
Every has no non-trivial automorphisms.
, for a constant independent of .
2.2 Concentration for distinguisher
We are now ready to prove that there is a poly-time computable distinguishing polynomial with bounded variance. The existing results on concentration of subgraph counts are not sufficient for us here, because we are interested in the setting of correlated graphs. We will bound the variance directly.
Theorem 2.6.
Suppose that for with or . Then there exists a polynomial such that , and
Further, is a sum of subgraph count-deviation correlation polynomials for subgraphs of size , and is computable in time , where the in the exponent of hides only a dependence on the size of the representation of as a ratio of two integers. When , the algorithm runs in polynomial time.
Proof of Theorem 2.6.
Choose to be a sufficiently large even integer such that , where is the constant from Theorem 4.1 so that . Let be the test set of subgraphs guaranteed by Theorem 4.1 with vertices and edges, so that . By this setting of parameters we have then that .
Define the polynomial to be the average of over ,
where we have also used that for every .
We define the following quantity, which will show up repeatedly in our variance bounds:
The bounds from Lemma 2.3 give us the expectation of under and . Using the balancedness properties of our test set, we will bound the variance of under and .
Variance bound for .
Because and are independent and identically distributed, we have
Now, we will re-write the expression within the square as we did in the proof of Lemma 2.3, when bounding the expectation of under . We will sum over graphs which are subgraphs of both and . For each such , let and let . Then, letting be the graph given by taking the union of and and identifying the vertices and edges on the subgraph , the first term in Eq. 9 can be re-written as follows.