1 Introduction
In this paper we study property testing of graphs in the bounded degree model. The input is a graph $G$ on $n$ vertices where all the vertices have degree at most $d$. Given a graph property $P$, we say that $G$ is $\varepsilon$-far from satisfying $P$ if at least $\varepsilon d n$ edges need to be added or removed from $G$ to satisfy $P$. A property testing algorithm for $P$ is an algorithm that accepts every graph satisfying $P$ with probability at least $2/3$ and rejects every graph that is $\varepsilon$-far from satisfying $P$ with probability at least $2/3$.
$G$ is represented as an oracle that returns the $i$th neighbor of any vertex $v$ for any $i \le d$. If $i$ is larger than the degree of $v$, a special symbol is returned. The goal of property testing is to find an algorithm with an efficient query complexity, defined as the number of oracle queries that the algorithm performs. This framework of property testing of graphs was developed by Goldreich and Ron [6] and has been applied to study various properties such as bipartiteness [5] and 3-colorability [6]. See [6] and [10] for more examples.
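The oracle access model above can be sketched in a few lines of code. This is an illustrative toy implementation (the names `make_neighbor_oracle` and the use of `None` as the special symbol are our own choices, not notation from the paper):

```python
def make_neighbor_oracle(adjacency):
    """adjacency: dict mapping each vertex to a list of its neighbors.

    Returns an oracle(v, i) that answers the i-th neighbor of v, or the
    special symbol (here: None) when i exceeds the degree of v.
    """
    def oracle(v, i):
        neighbors = adjacency[v]
        return neighbors[i] if i < len(neighbors) else None
    return oracle

# A 4-cycle: every vertex has degree 2, so the degree bound d = 2.
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
oracle = make_neighbor_oracle(cycle)
```

A tester interacts with the graph only through such calls, and its query complexity is the number of `oracle(v, i)` invocations it makes.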
Our paper deals with a generalization of property testing. We are interested in testing for a family of properties $\{P_\alpha\}$ that depends on a single parameter $\alpha$ and is nested, satisfying $P_\alpha \subseteq P_\beta$ for all $\alpha \ge \beta$. Our goal is an algorithm which accepts graphs satisfying $P_\alpha$ with probability at least $2/3$ and rejects graphs that are $\varepsilon$-far from satisfying $P_\beta$ with probability at least $2/3$, where $\beta \le \alpha$. A diagram for this generalization of property testing is shown in Figure 1.
We are interested in the property of $(k, \varphi)$-clusterability as defined by Czumaj, Peng, and Sohler in [3]. Roughly speaking, a graph is $(k, \varphi)$-clusterable if it can be partitioned into at most $k$ clusters where vertices in the same cluster are "well-connected." The connectedness of the clusters is measured in terms of their inner conductance, defined below. The idea of using conductance for graph clustering has been studied in numerous works, such as [11].
Testing for clusterability is inspired by expansion testing, which has been studied extensively. A graph is called an $\alpha$-expander if every $S \subseteq V$ of size at most $n/2$ has a neighborhood of size at least $\alpha |S|$. Czumaj and Sohler [4] showed that an algorithm proposed by Goldreich and Ron in [7] can distinguish between $\alpha$-expanders and graphs which are $\varepsilon$-far from having expansion at least $\alpha'$, for a suitable $\alpha' < \alpha$, in the bounded degree model. This work was subsequently improved by Kale and Seshadhri [8] and then by Nachmias and Shapira [9], who showed that the same algorithm distinguishes graphs which are $\alpha$-expanders from graphs which are $\varepsilon$-far from being expanders with a weaker expansion parameter. The work of Nachmias and Shapira also shows that expansion testing is related to the second eigenvalue of the Laplacian matrix. In addition, as shown by [2], testing for $(k, \varphi)$-clusterability is related to the $(k+1)$st eigenvalue of the Laplacian, so clusterability testing is a natural extension of expansion testing.

We now define conductance, which is closely related to expansion. Let $S \subseteq V$ be such that $|S| \le n/2$. The conductance of $S$ is defined to be $\varphi(S) = \frac{e(S, V \setminus S)}{d |S|}$ where $e(S, V \setminus S)$ is the number of edges between $S$ and $V \setminus S$. The conductance of $G$ is defined to be the minimum conductance over all subsets $S$ with $|S| \le n/2$ and is denoted $\varphi(G)$. Now for any $C \subseteq V$, let $G[C]$ denote the induced subgraph of $G$ on the vertex set defined by $C$. We let $\varphi(G[C])$ denote the conductance of this subgraph. To avoid confusion we call $\varphi(G[C])$ the inner conductance of $C$.
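The conductance definitions above can be computed directly on small examples. The following brute-force sketch (our own illustration; infeasible beyond tiny graphs, since it enumerates all subsets) uses the bounded-degree normalization $\varphi(S) = e(S, V \setminus S) / (d|S|)$:

```python
from itertools import combinations

def cut_edges(adjacency, S):
    """Number of edges with exactly one endpoint in S."""
    S = set(S)
    return sum(1 for u in S for w in adjacency[u] if w not in S)

def conductance_of_set(adjacency, S, d):
    # phi(S) = e(S, V \ S) / (d * |S|)
    return cut_edges(adjacency, S) / (d * len(S))

def conductance(adjacency, d):
    """Minimum conductance over all nonempty S with |S| <= n/2 (brute force)."""
    V = list(adjacency)
    best = 1.0
    for size in range(1, len(V) // 2 + 1):
        for S in combinations(V, size):
            best = min(best, conductance_of_set(adjacency, S, d))
    return best

# 4-cycle with d = 2: cutting it in half crosses 2 edges, phi = 2/(2*2) = 0.5.
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
```

Dense subsets with few outgoing edges have low conductance; an inner conductance bound asks that no such sparse cut exist *inside* a cluster.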
We say $G$ is $(k, \varphi)$-clusterable if there exists a partition of $V$ into at most $k$ subsets $C_1, \ldots, C_k$ such that $\varphi(G[C_i]) \ge \varphi$ for all $i$. This definition is slightly different from the one used by Czumaj, Peng, and Sohler in [3] because their definition also requires the outer conductance $\varphi(C_i)$ to be bounded for all $i$. The algorithm of Czumaj, Peng, and Sohler accepts all $(k, \varphi)$-clusterable graphs with probability at least $2/3$ and rejects all graphs that are $\varepsilon$-far from $(k, \varphi^*)$-clusterable, where $\varphi^*$ depends on $\varphi$, on $\varepsilon$, and on a constant that depends only on $k$ [3]. Our work improves upon this result by removing the $\varepsilon$ dependency in $\varphi^*$ for the case of $k = 2$.
Our main result is an algorithm in the bounded degree model that accepts every $(2, \varphi)$-clusterable graph with probability at least $2/3$ and rejects every graph that is $\varepsilon$-far from $(2, \varphi^*)$-clusterable with probability at least $2/3$ if $\varphi^* \le \mu \varphi^2$, where $\mu$ is a parameter that we can choose which affects the query complexity. Our algorithm has query complexity $n^{1/2 + O(\mu)} \cdot \mathrm{poly}$, where $\mathrm{poly}$ denotes a polynomial in $1/\varepsilon$, $1/\varphi$, and $1/\mu$.
The work of Czumaj et al. for testing clusterability uses property testing of distributions, such as testing the $\ell_2$ norm of a discrete distribution and testing the closeness of two discrete distributions. For some work on testing the $\ell_2$ norm of a discrete distribution and testing closeness of discrete distributions, see [4] and [1] respectively.
Our work was concurrent with the work of Chiplunkar et al. [2], who give an algorithm that for any fixed $k$ accepts $(k, \varphi)$-clusterable graphs with probability at least $2/3$ and rejects every graph that is $\varepsilon$-far from $(k, \varphi^*)$-clusterable with probability at least $2/3$ using $n^{1/2 + O(\mu)} \cdot \mathrm{poly}(1/\varepsilon, 1/\varphi)$ queries. This matches the query bound achieved by Nachmias and Shapira in the expander testing setting ($k = 1$). The algorithm of Chiplunkar et al. also looks at the $(k+1)$st largest eigenvalue of a transformation of the lazy random walk matrix $M$, and accepts if this eigenvalue is below a certain threshold. We essentially employ the same approach in the case of $k = 2$. Our proof of correctness is rather different, relying on the geometric properties of the endpoint distributions of random walks on the input graph to deduce the size of the eigenvalues of $M$.
We present our algorithm, ClusterTest, in Section 2; the definitions it relies on appear in Section 1.1. We prove that ClusterTest accepts clusterable graphs in Section 3.1 and that it rejects graphs that are far from clusterable in Section 3.2.
1.1 Definitions
Definition 1.1 (Graph clusterability).
$G$ is $(2, \varphi)$-clusterable if the conductance of $G$ is at least $\varphi$, or $V$ can be partitioned into two subsets $C_1$ and $C_2$ such that the inner conductance of $C_i$ is at least $\varphi$ for each $i \in \{1, 2\}$.
The motivating idea in designing ClusterTest is to compute the rank of $M^t - \frac{1}{n} J$, where $M$ is the lazy random walk matrix and $J$ is the matrix of all $1$'s. Essentially, we show that when $G$ is clusterable, $M^t - \frac{1}{n} J$ is "close" to a rank $1$ matrix, while when $G$ is far from clusterable, $M^t - \frac{1}{n} J$ is not "close" to rank $1$. The intuition for this comes from Lemma 3.3, which tells us that the third eigenvalue of $M^t$ is small if $G$ is clusterable.
Because computing the eigenvalues of $M^t - \frac{1}{n} J$ is too expensive, we instead look at the eigenvalues of $2 \times 2$ principal submatrices of $M^{2t} - \frac{1}{n} J$. These principal submatrices are the Gram matrices of the endpoint distribution vectors of random walks on $G$, minus the uniform vector $\frac{1}{n} \mathbf{1}$. This allows us to show that if $G$ is clusterable then we can expect all of these submatrices to have at least one small eigenvalue, while if $G$ is far from clusterable, both of the eigenvalues of most of these principal submatrices are large. This is essentially what our algorithm tests for.

Before we present our algorithm we introduce some standard definitions and tools that we use. Given a graph $G$ with maximum degree $d$, we work with the lazy random walk matrix $M$ defined as follows: the off-diagonal entries of $M$ are $\frac{1}{2d}$ times the corresponding entry in the adjacency matrix, while the diagonal entries of $M$ are set so that the columns of $M$ add to $1$, which corresponds to adding self loops of the appropriate weights in $G$. We then define the Laplacian matrix as $L = I - M$. Our definition of the Laplacian follows the convention used in [3] so that we can easily use some of their results.
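The lazy random walk matrix just defined is straightforward to construct explicitly. A minimal numerical sketch (our own illustration, following the column-stochastic convention above):

```python
import numpy as np

def lazy_walk_matrix(adjacency, d, n):
    """Lazy random walk matrix: off-diagonal entries are 1/(2d) times the
    adjacency matrix; diagonal entries make every column sum to 1
    (equivalently, self-loops of the appropriate weight are added)."""
    M = np.zeros((n, n))
    for u, nbrs in adjacency.items():
        for w in nbrs:
            M[w, u] = 1.0 / (2 * d)
    for u in range(n):
        M[u, u] = 1.0 - M[:, u].sum()
    return M

cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
M = lazy_walk_matrix(cycle, d=2, n=4)
L = np.eye(4) - M   # Laplacian, L = I - M
```

Since the graph is undirected and all columns sum to $1$, $M$ is symmetric and doubly stochastic, and $L$ is positive semidefinite.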
Let $1 = \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$ denote the eigenvalues of $M$ and let $v_1, \ldots, v_n$ denote the corresponding orthonormal eigenvectors. Let $\hat{\lambda}_1 \le \cdots \le \hat{\lambda}_n$ denote the eigenvalues of $L$, where $\hat{\lambda}_i = 1 - \lambda_i$ for all $i$. For $v \in V$, we define $p_v^t$ to be the probability distribution of the endpoint of a length $t$ lazy random walk that starts at vertex $v$. That is,

$$p_v^t = M^t e_v, \qquad (1)$$

where $e_v$ is the vector with $1$ in the entry corresponding to the vertex $v$ and $0$ elsewhere. Because $v_1 = \frac{1}{\sqrt{n}} \mathbf{1}$, we typically work with $p_v^t - \frac{1}{n} \mathbf{1}$ for convenience. From Eq. (1), we have

$$p_v^t - \frac{1}{n} \mathbf{1} = \sum_{i=2}^{n} \lambda_i^t \langle e_v, v_i \rangle v_i. \qquad (2)$$
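Endpoint distributions are cheap to compute exactly on small graphs, and the key identity used throughout — that inner products of endpoint distributions are entries of $M^{2t}$, since $M$ is symmetric — can be checked numerically. An illustrative sketch (our own, on the 4-cycle):

```python
import numpy as np

def endpoint_distribution(M, v, t):
    """p_v^t = M^t e_v: distribution of the endpoint of a length-t lazy walk."""
    p = np.zeros(M.shape[0])
    p[v] = 1.0                 # indicator vector e_v
    for _ in range(t):
        p = M @ p
    return p

# Lazy walk matrix of a 4-cycle with d = 2.
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
n, d = 4, 2
M = np.zeros((n, n))
for u, nbrs in cycle.items():
    for w in nbrs:
        M[w, u] = 1 / (2 * d)
M += np.diag(1 - M.sum(axis=0))

t = 5
p0 = endpoint_distribution(M, 0, t)
p1 = endpoint_distribution(M, 1, t)
# Since M is symmetric: <p_u^t, p_v^t> = (M^(2t))_{uv}.
gram_entry = p0 @ p1
power_entry = np.linalg.matrix_power(M, 2 * t)[0, 1]
```

This identity is exactly why the $2 \times 2$ principal submatrices of $M^{2t} - \frac{1}{n}J$ are Gram matrices of the centered endpoint distributions.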
1.2 Preliminary Results
In this paper $\| \cdot \|$ always denotes the $\ell_2$ norm unless stated otherwise. We need the following classical result from [12], which roughly states that eigenvalues are stable under small perturbations.
Proposition 1.2 (Weyl’s Inequality).
Let $A, B \in \mathbb{R}^{n \times n}$ be symmetric, and suppose $A$ has eigenvalues $\alpha_1 \ge \cdots \ge \alpha_n$ and $B$ has eigenvalues $\beta_1 \ge \cdots \ge \beta_n$. Furthermore, suppose $\|A - B\|_F \le \xi$, where $\| \cdot \|_F$ denotes the Frobenius norm. Then $|\alpha_i - \beta_i| \le \xi$ for all $i$.
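The stability statement above is easy to verify numerically: perturbing a symmetric matrix moves each eigenvalue by at most the Frobenius norm of the perturbation. A quick sanity check (our own illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6

# Random symmetric A and a small symmetric perturbation E.
A = rng.standard_normal((n, n)); A = (A + A.T) / 2
E = 0.01 * rng.standard_normal((n, n)); E = (E + E.T) / 2
B = A + E

alpha = np.sort(np.linalg.eigvalsh(A))   # ascending eigenvalues of A
beta = np.sort(np.linalg.eigvalsh(B))    # ascending eigenvalues of B
frob = np.linalg.norm(A - B, 'fro')

# Weyl: |alpha_i - beta_i| <= ||A - B||_2 <= ||A - B||_F for every i.
max_shift = np.max(np.abs(alpha - beta))
```

The spectral-norm version of the bound is tighter, but the Frobenius-norm form is what the proofs below use, since Frobenius norms of Gram-matrix differences are easy to estimate.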
Our work relies on estimating dot products and norms of various distributions, where we view distributions over $n$ elements as vectors in $\mathbb{R}^n$. To estimate these quantities, we use the following result about distribution property testing.

Theorem 1.3 (Theorem 1.2 in [1]).
Let $p, q$ be two distributions over $n$ elements with $\|p\|, \|q\| \le b$. There is an algorithm which computes an estimate of $\langle p, q \rangle$ that is accurate to within additive error $\xi$ with probability at least $1 - \delta$, and requires a number of samples from each of the distributions $p$ and $q$ that is bounded in terms of $b$, $\xi$, $\delta$, and some absolute constant $C$.
Theorem 1.4 (Lemma 3.2 in [3]).
Let $p$ be a distribution over $n$ elements. There is an algorithm that accepts if $\|p\| \le \sigma$ and rejects if $\|p\| > (1 + \xi)\sigma$ with probability at least $1 - \delta$, and requires a number of samples from $p$ bounded in terms of $\sigma$, $\xi$, and $\delta$. A condition on the input is that the threshold $\sigma$ must be at least $\frac{1}{\sqrt{n}}$.
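Estimators of this flavor are typically built on collision statistics: the probability that two independent samples from $p$ collide is exactly $\sum_i p_i^2 = \|p\|^2$. The following sketch is a folklore collision-based estimator of $\|p\|^2$, shown only to illustrate the idea — it is not the specific algorithm of [1] or [3], and its sample complexity is not tuned:

```python
import numpy as np

def estimate_l2_norm_squared(sampler, m, rng):
    """Unbiased estimate of ||p||^2: the fraction of colliding pairs among
    m i.i.d. samples estimates Pr[two samples collide] = sum_i p_i^2."""
    samples = [sampler(rng) for _ in range(m)]
    collisions = sum(samples[i] == samples[j]
                     for i in range(m) for j in range(i + 1, m))
    return collisions / (m * (m - 1) / 2)

rng = np.random.default_rng(1)
p = np.array([0.5, 0.25, 0.25])
true_norm_sq = float(np.sum(p ** 2))            # = 0.375
sampler = lambda r: int(r.choice(3, p=p))
estimate = estimate_l2_norm_squared(sampler, 1000, rng)
```

Inner products $\langle p, q \rangle$ can be estimated the same way, by counting collisions between a sample from $p$ and a sample from $q$.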
2 Algorithm
We now describe our algorithm ClusterTest. Our algorithm performs multiple lazy random walks on the input graph and uses the distribution testing results from Theorems 1.3 and 1.4 to approximate a $2 \times 2$ principal submatrix of $M^{2t} - \frac{1}{n} J$. As shown in Sections 3.1 and 3.2, our choice of the walk length $t$ in Theorem 2.1 is large enough so that a random walk of length $t$ mixes well in the case that $G$ is clusterable, and small enough so that the random walk does not mix well if $G$ is far from clusterable.

We use the notation $A[u, v]$ for the $2$ by $2$ submatrix of $A$ with rows and columns indexed by the vertices $u$ and $v$. Noting that $(M^{2t})_{uv} = \langle p_u^t, p_v^t \rangle$, we can write $(M^{2t} - \frac{1}{n} J)[u, v]$ as the Gram matrix of the vectors $p_u^t - \frac{1}{n} \mathbf{1}$ and $p_v^t - \frac{1}{n} \mathbf{1}$.

Note that we assume the walk length $t$ depends on the parameters that are input to ClusterTest.
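To make the shape of the algorithm concrete, here is an idealized, full-knowledge variant (our own sketch): instead of estimating entries by sampling random walks as ClusterTest does, it computes $M^{2t}$ exactly and checks the eigenvalues of random $2 \times 2$ principal submatrices of $M^{2t} - \frac{1}{n}J$. The function names, the threshold, and the round count are illustrative, not the parameters of Theorem 2.1:

```python
import numpy as np

def lazy_walk(adjacency, d, n):
    M = np.zeros((n, n))
    for u, nbrs in adjacency.items():
        for w in nbrs:
            M[w, u] = 1 / (2 * d)
    return M + np.diag(1 - M.sum(axis=0))

def cluster_test_exact(adjacency, d, n, t, threshold, rounds, rng):
    """Idealized ClusterTest: for random pairs (u, v), examine the 2x2
    principal submatrix of M^(2t) - J/n and reject if BOTH of its
    eigenvalues exceed the threshold."""
    B = np.linalg.matrix_power(lazy_walk(adjacency, d, n), 2 * t) - 1 / n
    for _ in range(rounds):
        u, v = rng.choice(n, size=2, replace=False)
        sub = B[np.ix_([u, v], [u, v])]
        if np.min(np.linalg.eigvalsh(sub)) > threshold:
            return False   # reject
    return True            # accept

# Two 8-cliques joined by a single edge: a (2, phi)-clusterable graph.
n, d = 16, 8
adj = {v: [] for v in range(n)}
for a in range(8):
    for b in range(a + 1, 8):
        for off in (0, 8):
            adj[a + off].append(b + off)
            adj[b + off].append(a + off)
adj[0].append(8)
adj[8].append(0)

rng = np.random.default_rng(2)
accepted = cluster_test_exact(adj, d, n, t=30, threshold=1e-6,
                              rounds=20, rng=rng)
```

On this clusterable example every $2 \times 2$ submatrix has a tiny smallest eigenvalue (by the interlacing argument sketched in Section 3.1), so the test accepts; the actual algorithm replaces the exact entries of $B$ with sample-based estimates.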
We now present our main theorem about the guarantees of ClusterTest.
Theorem 2.1 (Main Theorem).
Let $G$ be an $n$ vertex graph with maximum degree at most $d$. For any $\varepsilon, \varphi, \mu > 0$ we set the walk length $t$ and the accuracy parameters of ClusterTest as functions of $n$, $\varepsilon$, $\varphi$, and $\mu$, where the constants involved are defined in Theorem 1.3 and Lemmas 3.3, 3.5, and 3.10 respectively. Then,

1. ClusterTest with the parameters defined above accepts every $(2, \varphi)$-clusterable graph with probability at least $2/3$.

2. ClusterTest with the parameters defined above rejects every graph that is $\varepsilon$-far from $(2, \varphi^*)$-clusterable for any $\varphi^* \le \mu \varphi^2$ with probability at least $2/3$.

Furthermore, the query complexity of ClusterTest is $n^{1/2 + O(\mu)} \cdot \mathrm{poly}(1/\varepsilon, 1/\varphi)$.
3 Proof of Main Theorem
3.1 Completeness: accepting clusterable graphs
In this section we show that ClusterTest with the parameters defined in Theorem 2.1 accepts with probability greater than $2/3$ if $G$ is $(2, \varphi)$-clusterable. We first introduce the main geometric property of our paper.
Definition 3.1.
Vectors $a$ and $b$ are $\delta$-close to collinear if they can each be moved a distance of at most $\delta$ to lie on a line through the origin. Vectors $a$ and $b$ are $\delta$-far from collinear if they are not $\delta$-close to collinear. See Figure 4 for reference.
Let $G$ be a $(2, \varphi)$-clusterable graph. We show that ClusterTest accepts $G$ with probability at least $2/3$ using the following argument.
Lemma 3.2.
If $a$ and $b$ are $\delta$-close to collinear then the smallest eigenvalue of the Gram matrix of $a$ and $b$ is less than $2\sqrt{2}\,\delta \sqrt{\|a\|^2 + \|b\|^2} + 2\delta^2$. Conversely, if $a$ and $b$ are $\delta$-far from collinear then both of the eigenvalues of the Gram matrix are larger than $\delta^2$.
Proof.
Write $A = [a \;\; b]$, the matrix with columns $a$ and $b$. Then the Gram matrix of $a$ and $b$ is $A^{\mathsf{T}} A$. Because $A^{\mathsf{T}} A$ is positive semidefinite, we can also write

$$A^{\mathsf{T}} A = \sigma_1^2 u_1 u_1^{\mathsf{T}} + \sigma_2^2 u_2 u_2^{\mathsf{T}},$$

where $u_1, u_2$ are orthonormal and $\sigma_1 \ge \sigma_2 \ge 0$ are the singular values of $A$. Suppose $a$ and $b$ are $\delta$-close to collinear. An equivalent formulation of Definition 3.1 is that there exist $a', b'$ such that $\|a - a'\|, \|b - b'\| \le \delta$ and $a'$ and $b'$ lie on a line through the origin. This implies that the matrix $A'$ with columns $a', b'$ is such that $A'$ is rank $1$. Therefore, $(A')^{\mathsf{T}} A'$ is also a rank $1$ matrix, so it has a zero eigenvalue. We know by Weyl's inequality that $A^{\mathsf{T}} A$ has an eigenvalue less than

$$\left\| A^{\mathsf{T}} A - (A')^{\mathsf{T}} A' \right\|_F,$$

where $\| \cdot \|_F$ denotes the Frobenius norm. Because $\|a - a'\|, \|b - b'\| \le \delta$, we have $\|A - A'\|_F \le \sqrt{2}\,\delta$, and we can easily compute that

$$\left\| A^{\mathsf{T}} A - (A')^{\mathsf{T}} A' \right\|_F \le 2 \|A\|_F \|A - A'\|_F + \|A - A'\|_F^2 \le 2\sqrt{2}\,\delta \sqrt{\|a\|^2 + \|b\|^2} + 2\delta^2,$$

which proves the first part of our lemma.

For the second part, we prove the contrapositive. Suppose that the smallest eigenvalue of $A^{\mathsf{T}} A$ is $\sigma_2^2 \le \delta^2$. We wish to show that $a$ and $b$ are $\delta$-close to collinear. Write the singular value decomposition $A = \sigma_1 u'_1 u_1^{\mathsf{T}} + \sigma_2 u'_2 u_2^{\mathsf{T}}$, where $u'_1, u'_2$ are the (orthonormal) left singular vectors, and define $B = \sigma_1 u'_1 u_1^{\mathsf{T}}$. Then $B$ is rank $1$ and its columns lie on the line through the origin spanned by $u'_1$. Moreover, $\|A - B\|_F = \sigma_2 \le \delta$, so each column of $A$ lies within distance $\delta$ of the corresponding column of $B$. Thus $a$ and $b$ are $\delta$-close to collinear, as desired. ∎
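The SVD-truncation step in the proof above also gives a direct numerical witness of closeness to collinear: the second singular value of $[a \;\; b]$. A small illustrative helper (our own naming):

```python
import numpy as np

def collinearity_defect(a, b):
    """Second singular value of the matrix [a b]. If it is at most delta,
    the columns of the nearest rank-1 matrix (the SVD truncation) lie on a
    line through the origin, and each column of [a b] moves by at most
    delta in l2 norm to reach it: a and b are delta-close to collinear."""
    A = np.column_stack([a, b])
    return np.linalg.svd(A, compute_uv=False)[1]

a = np.array([1.0, 2.0, 3.0])
exactly_collinear = collinearity_defect(a, -2 * a)        # same line: defect 0
far = collinearity_defect(np.array([1.0, 0.0, 0.0]),
                          np.array([0.0, 1.0, 0.0]))      # orthonormal: defect 1
```

Squaring this quantity recovers the smallest eigenvalue of the Gram matrix, which is exactly what ClusterTest thresholds.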
We proceed to show that if the input graph $G$ is clusterable then for any pair of vertices $u, v$, the vectors $p_u^t - \frac{1}{n}\mathbf{1}$ and $p_v^t - \frac{1}{n}\mathbf{1}$ are close to collinear. To show this, we need the following lemma from [3], which relates the property of being clusterable to the eigenvalues of the Laplacian matrix.
Lemma 3.3 (Lemma in [3]).
There exists a constant $c > 0$ (depending only on the degree bound $d$) such that for a $(2, \varphi)$-clusterable graph $G$ of maximum degree at most $d$, $\hat{\lambda}_i \ge c \varphi^2$ for $i \ge 3$, where $\hat{\lambda}_i$ is the $i$th smallest eigenvalue of the Laplacian matrix of $G$.
We note here that there is a short proof that $M^{2t} - \frac{1}{n} J$ has at most one large eigenvalue if $G$ is clusterable. Lemma 3.3 states that $M$ has at most one large eigenvalue besides the trivial one, hence $M^{2t} - \frac{1}{n} J$ also has at most one large eigenvalue. Then the Cauchy interlacing theorem implies that all the $2 \times 2$ principal minors also have at most one large eigenvalue. However, we present the longer proof that uses the definition of close to collinear to highlight the similarities between the proofs of the soundness and completeness cases.
We now show that, given Lemma 3.3, it follows that $p_v^t - \frac{1}{n}\mathbf{1}$ is close to the line spanned by $v_2$.
Lemma 3.4.
Let $G$ be $(2, \varphi)$-clusterable. Then for any pair of vertices $u, v$, the vectors $p_u^t - \frac{1}{n}\mathbf{1}$ and $p_v^t - \frac{1}{n}\mathbf{1}$ are $(1 - c\varphi^2)^t$-close to collinear, where $c$ is the constant defined in Lemma 3.3.
Proof.
Recall that $p_v^t = M^t e_v$, where $p_v^t$ is the probability distribution of the endpoint of a length $t$ lazy random walk starting at vertex $v$. Writing $e_v$ in the eigenbasis of $M$ gives us

$$p_v^t - \frac{1}{n}\mathbf{1} = \sum_{i=2}^{n} \lambda_i^t \langle e_v, v_i \rangle v_i.$$

Therefore,

$$\left\| p_v^t - \frac{1}{n}\mathbf{1} - \lambda_2^t \langle e_v, v_2 \rangle v_2 \right\|^2 = \sum_{i=3}^{n} \lambda_i^{2t} \langle e_v, v_i \rangle^2 \le (1 - c\varphi^2)^{2t},$$

where we used Lemma 3.3 to bound $\lambda_i = 1 - \hat{\lambda}_i \le 1 - c\varphi^2$ for $i \ge 3$, together with $\|e_v\| = 1$. Hence each such vector lies within distance $(1 - c\varphi^2)^t$ of the line spanned by $v_2$. It follows that for any vertices $u$ and $v$, the vectors $p_u^t - \frac{1}{n}\mathbf{1}$ and $p_v^t - \frac{1}{n}\mathbf{1}$ are $(1 - c\varphi^2)^t$-close to collinear. ∎
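Lemma 3.4 can be observed numerically. The following sketch (our own illustration; the graph, the walk length, and the tolerance are arbitrary choices) builds two 8-cliques joined by a single edge — a clusterable graph — and measures how far the centered endpoint distributions of two walks are from collinear, using the second singular value of the matrix with those two vectors as columns:

```python
import numpy as np

n, d, t = 16, 8, 30
adj = {v: set() for v in range(n)}
for off in (0, 8):
    for a in range(8):
        for b in range(8):
            if a != b:
                adj[a + off].add(b + off)
adj[0].add(8)
adj[8].add(0)

# Lazy random walk matrix as defined in Section 1.1.
M = np.zeros((n, n))
for u, nbrs in adj.items():
    for w in nbrs:
        M[w, u] = 1 / (2 * d)
M += np.diag(1 - M.sum(axis=0))

Mt = np.linalg.matrix_power(M, t)
centered = lambda v: Mt[:, v] - 1 / n            # p_v^t - (1/n) * all-ones

# One start vertex in each clique.
A = np.column_stack([centered(0), centered(12)])
defect = np.linalg.svd(A, compute_uv=False)[1]   # distance to collinear
```

Both centered vectors are dominated by their $v_2$ components (the cluster-indicator direction), so the defect decays like $(1 - c\varphi^2)^t$ as the lemma predicts.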
Lemmas 3.2 and 3.4 together guarantee that both of the eigenvalues of $(M^{2t} - \frac{1}{n}J)[u, v]$ cannot be large. We now want to show that this also holds when ClusterTest approximates $(M^{2t} - \frac{1}{n}J)[u, v]$. We need the following lemma, which tells us that the norm test accepts with high probability in the corresponding step of ClusterTest. This lemma is just a technicality that we need for the query complexity of Theorem 1.3.

Lemma 3.5 (Lemma in [3]).
Let $t$ be as in Theorem 2.1. There exists a constant $c'$ such that for a $(2, \varphi)$-clusterable graph, there exists $\widehat{V} \subseteq V$ containing all but a small constant fraction of the vertices such that for any $v \in \widehat{V}$ and any $t' \ge t$, the following holds: $\|p_v^{t'}\| \le \frac{c'}{\sqrt{n}}$.
We now prove that ClusterTest with the parameters defined in Theorem 2.1 passes the completeness case.
Lemma 3.6.
ClusterTest with the parameters defined in Theorem 2.1 accepts $(2, \varphi)$-clusterable graphs with probability greater than $2/3$.
Proof.
Let $G$ be a $(2, \varphi)$-clusterable graph. We analyze one round of ClusterTest and calculate the rejection probability of one round. Note that ClusterTest samples a pair of vertices $u$ and $v$ uniformly at random from $V$ at each round. There are three ways one round can reject $G$:

1. One of the vertices $u$ or $v$ lies in the complement of the set $\widehat{V}$ from Lemma 3.5.

2. The norm test rejects $p_u^t$ or $p_v^t$ in the corresponding step of ClusterTest.

3. Both of the eigenvalues of the submatrix computed by ClusterTest are larger than the acceptance threshold.

By Lemma 3.5, both $u$ and $v$ lie inside $\widehat{V}$ with high probability, so the rejection probability of case 1 is small.

If $v \in \widehat{V}$ as defined in Lemma 3.5, then $\|p_v^t\|$ is below the threshold of the norm test. Given this, along with the guarantees of Theorem 1.4, the norm test accepts with high probability. Therefore, the rejection probability of case 2 is also small.

By Lemma 3.4, $p_u^t - \frac{1}{n}\mathbf{1}$ and $p_v^t - \frac{1}{n}\mathbf{1}$ are close to collinear for the choice of $t$ in Theorem 2.1. Therefore, by Lemma 3.2, $(M^{2t} - \frac{1}{n}J)[u, v]$ has at least one small eigenvalue.

The matrix that ClusterTest computes can be written as $(M^{2t} - \frac{1}{n}J)[u, v] + E$, where, by Theorem 1.3, each entry of the $2$ by $2$ error matrix $E$ is small with high probability. If this holds, then by Weyl's inequality the computed matrix also has an eigenvalue below the acceptance threshold. Therefore, the rejection probability of case 3 is small as well.

Adding up the rejection probabilities of the three cases shows that one round rejects with probability small enough that, by a union bound over all rounds, the total probability that we reject in one of the rounds is at most $1/3$, as desired. The query complexity is dominated by the samples drawn for Theorems 1.3 and 1.4. ∎
3.2 Soundness: rejecting graphs far from clusterable
In this section we show that ClusterTest rejects with probability greater than $2/3$ if $G$ is $\varepsilon$-far from $(2, \varphi^*)$-clusterable for $\varphi^* \le \mu \varphi^2$. We introduce two properties that expand on the property of close to collinear.
Definition 3.7.
Vectors $a$ and $b$ are $\delta$-close to antipodal if they can each be moved a distance of at most $\delta$ to lie on a line through the origin where the origin lies between the two moved points. Vectors $a$ and $b$ are $\delta$-far from antipodal if they are not $\delta$-close to antipodal.
Definition 3.8.
Vectors $a$ and $b$ are $\delta$-close to podal if they can each be moved a distance of at most $\delta$ to lie on a line through the origin where the origin does not lie between the two moved points. Vectors $a$ and $b$ are $\delta$-far from podal if they are not $\delta$-close to podal.
See Figure 4 for reference. Note that vectors $a$ and $b$ are $\delta$-far from collinear if and only if they are $\delta$-far from both antipodal and podal.

Figure 4. (a): Vectors $a$ and $b$ are close to antipodal. (b): Vectors $a$ and $b$ are close to podal.

We now outline our argument, which shows that ClusterTest rejects a graph $G$ if $G$ is far from clusterable. We do this by showing that there are many pairs of vertices $(u, v)$ where $p_u^t - \frac{1}{n}\mathbf{1}$ and $p_v^t - \frac{1}{n}\mathbf{1}$ are far from collinear, which allows us to say that the eigenvalues of $(M^{2t} - \frac{1}{n}J)[u, v]$ are large due to Lemma 3.2. This is a relatively harder task than showing that the vectors are close to collinear in the completeness case, so we need a more complicated argument, which is detailed below. For $S \subseteq V$, we define $p_S^t = \frac{1}{|S|} \sum_{v \in S} p_v^t$.

• We let $\Pi$ be the projection onto the span of the eigenvectors of $M$ with "large" eigenvalues. We use the above result to show that the aggregate vectors $\Pi p_{P_1}^t$ and $\Pi p_{P_2}^t$, for two parts $P_1$ and $P_2$ of a partition given by Lemma 3.10, are far from collinear in Lemma 3.11. This projection trick is necessary to relate the aggregate vectors to the individual endpoint distributions later on.

We now want to use the fact that the aggregate vectors $\Pi p_{P_1}^t$ and $\Pi p_{P_2}^t$ are far from collinear to find many pairs of vectors that are far from collinear.

• We use the pigeonhole principle to deduce that there are many pairs of vertices $(u, v)$ such that $\Pi p_u^t$ and $\Pi p_v^t$ are far from antipodal. Similarly, we show that there are many pairs of vertices $(u, v)$ such that $\Pi p_u^t$ and $\Pi p_v^t$ are far from podal. This is shown in Lemma 3.13.

Note that the above point does not immediately imply that there are many pairs of vertices $(u, v)$ such that $\Pi p_u^t$ and $\Pi p_v^t$ are far from both podal and antipodal.

• Using properties of $\Pi$, we transfer this result on the $\Pi p_v^t$ vectors to the $p_v^t$ vectors.
We now give quantitative versions of the definitions of antipodal and podal, which will be useful later on in our argument.

Lemma 3.9.
If vectors $a$ and $b$ are $\delta$-close to antipodal then

$$\langle a, b \rangle \le -\|a\| \|b\| + 2\delta \left( \|a\| + \|b\| \right) + 2\delta^2. \qquad (3)$$

Similarly, if $a$ and $b$ are $\delta$-close to podal then

$$\langle a, b \rangle \ge \|a\| \|b\| - 2\delta \left( \|a\| + \|b\| \right) - 2\delta^2. \qquad (4)$$
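The antipodal bound of the form in Eq. (3) can be checked numerically: build an exactly antipodal pair (opposite rays through the origin), perturb each endpoint by a vector of norm at most $\delta$, and verify that the inner product never exceeds $-\|a\|\|b\| + 2\delta(\|a\| + \|b\|) + 2\delta^2$. This is our own sanity-check sketch, with illustrative constants:

```python
import numpy as np

def random_perturbation(delta, dim, rng):
    """A vector of l2 norm at most delta, in a uniformly random direction."""
    v = rng.standard_normal(dim)
    return delta * rng.uniform(0, 1) * v / np.linalg.norm(v)

rng = np.random.default_rng(3)
delta = 0.1
violations = 0
for _ in range(200):
    # Exactly antipodal pair: opposite rays through the origin.
    direction = rng.standard_normal(5)
    direction /= np.linalg.norm(direction)
    a = rng.uniform(0.5, 2.0) * direction + random_perturbation(delta, 5, rng)
    b = -rng.uniform(0.5, 2.0) * direction + random_perturbation(delta, 5, rng)

    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    bound = -na * nb + 2 * delta * (na + nb) + 2 * delta ** 2
    if a @ b > bound + 1e-12:
        violations += 1
```

The podal bound (4) is the mirror image: the same experiment with both vectors on the same ray gives a lower bound on the inner product.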
Proof.
Write $a = a' + e_a$ and $b = b' + e_b$, where $a'$ and $b'$ lie on opposite sides of the origin on a common line and $\|e_a\|, \|e_b\| \le \delta$, so that $\langle a', b' \rangle = -\|a'\| \|b'\|$. Expanding $\langle a, b \rangle$ and using $\|a'\| \ge \|a\| - \delta$ and $\|b'\| \ge \|b\| - \delta$ gives Eq. (3). The proof of Eq. (4) is identical with the signs reversed. ∎
We restate a lemma from [3] which says that we can partition a graph that is $\varepsilon$-far from $(2, \varphi^*)$-clusterable into three subsets of vertices that are separated by sparse cuts.
Lemma 3.10 (Lemma in [3]).
Let $G$ be a graph with maximum degree at most $d$. There are constants $c_1$ and $c_2$, that depend on $\varepsilon$, such that if $G$ is $\varepsilon$-far from $(2, \varphi^*)$-clusterable with $\varphi^* \le \mu \varphi^2$, then there exists a partition of $V$ into three subsets $P_1, P_2, P_3$ such that for each $i$, we have $|P_i| \ge c_1 n$ and $\varphi(P_i) \le c_2 \varphi^*$.

From now on we assume that $P_1$ and $P_2$ are the two smallest of the three parts, so that $|P_1|, |P_2| \le n/2$ always holds.
We begin by showing that a projection of the aggregate vectors is far from collinear, by using tools from [8].
Lemma 3.11.
Let $S_1$ and $S_2$ be two disjoint subsets of vertices such that the cut $(S_i, V \setminus S_i)$ has conductance less than $\varphi^*$ for each $i \in \{1, 2\}$. Suppose that $|S_1|, |S_2| \le n/2$, and let $\Pi$ denote the projection onto the span of the eigenvectors of $M$ with eigenvalue greater than a threshold chosen in terms of $\varphi^*$ and $t$. Then $\Pi p_{S_1}^t$ and $\Pi p_{S_2}^t$ are far from collinear.
Proof.
Recall that $L$ is the Laplacian and $M$ is the lazy random walk matrix, related by the equation $L = I - M$. Also recall that the eigenvalues of $M$ are $1 = \lambda_1 \ge \cdots \ge \lambda_n$ with corresponding orthonormal eigenvectors $v_1, \ldots, v_n$, and $\hat{\lambda}_i = 1 - \lambda_i$. Let $\alpha, \beta > 0$ and define the vector $y$ as

$$y_u = \begin{cases} \alpha & \text{if } u \in S_1, \\ -\beta & \text{if } u \in S_2, \\ 0 & \text{otherwise,} \end{cases}$$

where $\alpha$ and $\beta$ are chosen so that $\|y\| = 1$ and $\alpha |S_1| = \beta |S_2|$. Write $y$ in the eigenbasis of $M$ as $y = \sum_i a_i v_i$. We have $\|y\|^2 = \sum_i a_i^2 = 1$, and one can compute that $a_1 = \langle y, v_1 \rangle = \frac{1}{\sqrt{n}} \left( \alpha |S_1| - \beta |S_2| \right) = 0$. Equating these two gives

$$\sum_{i=2}^{n} a_i^2 = 1. \qquad (5)$$

We now also compute $\langle y, L y \rangle$ in two different ways. We have $\langle y, L y \rangle = \sum_i \hat{\lambda}_i a_i^2$. On the other hand, using the quadratic form of $L$ gives us $\langle y, L y \rangle = \frac{1}{2d} \sum_{(u, w) \in E} (y_u - y_w)^2$. Now note that there are three cases where the term $(y_u - y_w)^2$ is nonzero:

• One of vertex $u$ and vertex $w$ lies in $S_1$ and the other lies in $V \setminus (S_1 \cup S_2)$,

• One of vertex $u$ and vertex $w$ lies in $S_2$ and the other lies in $V \setminus (S_1 \cup S_2)$,

• One of vertex $u$ and vertex $w$ lies in $S_1$ and the other lies in $S_2$.

In these three cases, $(y_u - y_w)^2$ evaluates to $\alpha^2$, $\beta^2$, and $(\alpha + \beta)^2$ respectively. We bound these expressions from above by $2\alpha^2$, $2\beta^2$, and $2\alpha^2 + 2\beta^2$ respectively to extract the bound

$$\langle y, L y \rangle \le \frac{1}{2d} \left( 2\alpha^2\, e(S_1, V \setminus S_1) + 2\beta^2\, e(S_2, V \setminus S_2) \right).$$

Now using the fact that the cut $(S_i, V \setminus S_i)$ has conductance less than $\varphi^*$ for each $i \in \{1, 2\}$, we have

$$\langle y, L y \rangle \le \frac{1}{2d} \left( 2\alpha^2 \varphi^* d |S_1| + 2\beta^2 \varphi^* d |S_2| \right) = \varphi^* \left( \alpha^2 |S_1| + \beta^2 |S_2| \right) = \varphi^*.$$
It follows that