1 Introduction
The graph partitioning problem is to partition a graph into smaller components such that the components will have some specific properties. This problem is sometimes also referred to as community structure detection in networks. One kind of graph partitioning problem that has gained much scientific interest focuses on partitioning the graph into components with similar size and tries to minimize the number of edges cut in the process. Examples of applications are given in Section 4.
There are many algorithms for this kind of problem that give good graph partitioning results. Among the numerous methods, two clustering techniques that use spectral properties of matrices derived from the adjacency matrices of graphs are widely used and researched. Fiedler [6] discovered that a graph’s structure is closely related to one of the eigenvectors of the Laplacian matrix of the graph, namely the eigenvector corresponding to the second smallest eigenvalue. Fiedler suggested in [7] using the signs of the entries in this eigenvector to partition a graph. The clustering method developed by Fiedler is widely referred to as spectral clustering. The concept of modularity was first introduced by Newman and Girvan in [16], and further explained by Newman in [15]. The modularity clustering method aims to partition a graph while maximizing the modularity. Like the spectral clustering method suggested by Fiedler, the modularity clustering method also uses the signs of the entries in an eigenvector, in this case the eigenvector corresponding to the largest eigenvalue of a modularity matrix.
There are some modified versions of the spectral clustering and modularity clustering methods. Chung [5] analyzes the properties of a scaled version of Laplacian matrices. Shi and Malik [20] use the scaled Laplacian matrices to develop a normalized spectral clustering method and apply it to image segmentation. Ng et al. [17] discuss another version of normalized spectral clustering. In their method a one-sided scaled Laplacian matrix is used. Bolla [2] analyzes a normalized version of modularity clustering.
Since modularity matrices are derived from the adjacency matrices of graphs, it is interesting to see if the same or similar clustering results can be obtained from eigenvalues of the adjacency matrices. In this paper relations and comparisons between clustering results from using eigenvectors of modularity matrices and adjacency matrices will be given, and the equivalence between using normalized modularity matrices and normalized adjacency matrices to cluster will be proved.
Throughout the paper we assume that $G$ is a connected simple graph with $n$ vertices and $m$ edges. Unless specifically noted, $A$ is assumed to be an adjacency matrix of a graph, i.e.
$$a_{ij} = \begin{cases} 1 & \text{if vertices } i \text{ and } j \text{ are adjacent,} \\ 0 & \text{otherwise.} \end{cases}$$
The degree of a vertex $i$ is $d_i = \sum_{j=1}^{n} a_{ij}$, and $D = \mathrm{diag}(d_1, \dots, d_n)$. The number of clusters is always fixed to be 2. If more clusters are needed, the clustering methods can be run iteratively to build a hierarchy to get the desired number of clusters. The signs of the entries in the eigenvectors will be used to partition the graph. Assume there are no zero entries in the eigenvectors used. It should be noted that although the adjacency matrices are used in this paper, extending the results to use similarity matrices is also possible. The graph Laplacian is defined by
$$L = D - A,$$
and the modularity matrix is defined by
$$B = A - \frac{d d^T}{2m},$$
where $d = (d_1, \dots, d_n)^T$ is the vector containing the degrees of the nodes. The normalized versions of the graph Laplacian and the modularity matrix are
$$L_{sym} = D^{-1/2} L D^{-1/2} \quad \text{and} \quad B_{sym} = D^{-1/2} B D^{-1/2},$$
respectively. If $e$ is a column vector with all ones, then it is easy to see that $(0, e)$ is an eigenpair of $L$ and $B$, and $(0, D^{1/2} e)$ is an eigenpair of $L_{sym}$ and $B_{sym}$.
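The definitions above can be checked numerically. The following is a minimal sketch, assuming the standard forms $L = D - A$, $B = A - dd^T/(2m)$ and their symmetric normalizations; the function name `graph_matrices` and the six-vertex example graph are hypothetical and chosen only for illustration.

```python
import numpy as np

def graph_matrices(A):
    """Return L, B, L_sym, B_sym for a symmetric adjacency matrix A (all degrees > 0)."""
    A = np.asarray(A, dtype=float)
    d = A.sum(axis=1)                          # degree vector
    two_m = d.sum()                            # sum of degrees = 2 * number of edges
    L = np.diag(d) - A                         # graph Laplacian
    B = A - np.outer(d, d) / two_m             # modularity matrix
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))     # D^{-1/2}
    L_sym = D_inv_sqrt @ L @ D_inv_sqrt
    B_sym = D_inv_sqrt @ B @ D_inv_sqrt
    return L, B, L_sym, B_sym

# Hypothetical example: two triangles joined by the single edge (2, 3).
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

L, B, L_sym, B_sym = graph_matrices(A)
e = np.ones(6)
v = np.sqrt(A.sum(axis=1))                     # D^{1/2} e

# Norms of L e, B e, L_sym v, B_sym v; each should vanish up to rounding.
residuals = [np.linalg.norm(M @ x) for M, x in [(L, e), (B, e), (L_sym, v), (B_sym, v)]]
print(residuals)
```

All four residuals should be at the level of floating-point rounding, which is the numerical counterpart of the eigenpair statements for the zero eigenvalue.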
The paper is organized as follows. Section 2 contains the approximation of the leading eigenvector of the modularity matrix with eigenvectors of the adjacency matrix. Section 3 gives the equivalence between normalized adjacency clustering and normalized modularity clustering. Section 4 gives example applications. Conclusions are in Section 5.
2 Dominant Eigenvectors of Modularity and Adjacency Matrices
In this section, we will write the eigenvector corresponding to the largest eigenvalue of a modularity matrix as a linear combination of the eigenvectors of the corresponding adjacency matrix. Before that, we first state a theorem from [3] about the interlacing property of a diagonal matrix and its rank-one modification, and a corollary on how to calculate the eigenvectors of a diagonal plus rank one (DPR1) matrix [14]. The theorem can also be found in [23]. These results will be used in our analysis.
Let $C = D + \rho z z^T$, where $D$ is diagonal, $\|z\|_2 = 1$. Let $d_1 \le d_2 \le \cdots \le d_n$ be the eigenvalues of $D$, and let $\tilde{d}_1 \le \tilde{d}_2 \le \cdots \le \tilde{d}_n$ be the eigenvalues of $C$. Then $\tilde{d}_1 \le d_1 \le \tilde{d}_2 \le d_2 \le \cdots \le \tilde{d}_n \le d_n$ if $\rho < 0$. If the $d_i$ are distinct and all the elements of $z$ are nonzero, then the eigenvalues of $C$ strictly separate those of $D$.
With the notations in Theorem 2, the eigenvector of $C$ corresponding to the eigenvalue $\tilde{d}_i$ is given by $(D - \tilde{d}_i I)^{-1} z$.
Theorem 2 tells us the eigenvalues of a DPR1 matrix are interlaced with the eigenvalues of the original diagonal matrix. Next we will write the eigenvector corresponding to the largest eigenvalue of a modularity matrix as a linear combination of the eigenvectors of the corresponding adjacency matrix.
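The interlacing property can be illustrated numerically. The sketch below assumes the standard statement for a negative rank-one modification $C = D + \rho z z^T$ with $\rho < 0$, distinct diagonal entries and a unit vector $z$ without zero entries, together with the eigenvector formula $(D - \tilde{d} I)^{-1} z$. The size, seed and value of `rho` are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
d = np.sort(rng.uniform(-5.0, 5.0, size=n))    # diagonal entries, ascending
z = rng.uniform(0.5, 1.5, size=n)              # all entries nonzero
z /= np.linalg.norm(z)                         # ||z||_2 = 1
rho = -2.0                                     # negative rank-one modification

C = np.diag(d) + rho * np.outer(z, z)          # DPR1 matrix
c = np.linalg.eigvalsh(C)                      # eigenvalues of C, ascending

# For rho < 0 the eigenvalues satisfy c[0] < d[0] < c[1] < d[1] < ... < c[n-1] < d[n-1].
interlaced = bool(np.all(c < d) and np.all(d[:-1] < c[1:]))

# Eigenvector for the largest eigenvalue of C via (D - c I)^{-1} z.
x = z / (d - c[-1])
x /= np.linalg.norm(x)
residual = np.linalg.norm(C @ x - c[-1] * x)
print(interlaced, residual)
```

The residual `C x - c x` should be at rounding level, which confirms that the closed-form vector is an eigenvector without calling an eigenvector routine.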
With the notations in Section 1, since
is an adjacency matrix, it is symmetric and therefore orthogonally similar to a diagonal matrix. Therefore, there exists orthogonal matrix
and diagonal matrix such that
Suppose the rows and columns of are ordered such that , where . Let . Similarly, since is symmetric, it is orthogonally similar to a diagonal matrix. Suppose the eigenvalues of are with .
Suppose , , and . Then the eigenvector corresponding to the largest eigenvalue of is given by
Since , we have
where and . Since is also symmetric, it is orthogonally similar to a diagonal matrix. So we have
where is orthogonal and is diagonal. Since is a DPR1 matrix, and , the interlacing theorem applies to the eigenvalues of and . More specifically, we have
The strict inequalities hold because and . Then implies . Let . Since , we have . Suppose is an eigenpair of , then
implies that is an eigenpair of if and only if is an eigenpair of . By Corollary 2, the eigenvector of corresponding to is given by
and hence the eigenvector of corresponding to is given by
The point of Theorem 2 is to realize that the vector is a linear combination of the . Let
The purpose of the next theorem is to approximate by a linear combination of that have the largest and examine how good the approximation is by calculating the norm between and its approximation.
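The linear-combination representation can be checked numerically. The sketch below is an illustration under stated assumptions, not the displayed formula of the theorem: with $A = U \Lambda U^T$, $B = A - dd^T/(2m)$ and $\beta$ the largest eigenvalue of $B$, the DPR1 eigenvector formula gives the coefficient $(u_i^T d)/(\lambda_i - \beta)$ for the eigenvector $u_i$ of $A$, provided $\beta$ is not an eigenvalue of $A$. The weighted example graph with two planted groups is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8
# Hypothetical weighted similarity graph: strong weights inside two groups of
# four vertices, weak weights between the groups.
W = rng.uniform(0.0, 0.2, size=(n, n))
W[:4, :4] = rng.uniform(0.6, 1.0, size=(4, 4))
W[4:, 4:] = rng.uniform(0.6, 1.0, size=(4, 4))
A = np.triu(W, k=1)
A = A + A.T                                    # symmetric, zero diagonal

d = A.sum(axis=1)                              # (weighted) degrees
B = A - np.outer(d, d) / d.sum()               # modularity matrix

lam, U = np.linalg.eigh(A)                     # A = U diag(lam) U^T
beta, V = np.linalg.eigh(B)
beta_max, b = beta[-1], V[:, -1]               # leading eigenpair of B

# Coefficients of the leading eigenvector of B in the eigenbasis of A.
gamma = (U.T @ d) / (lam - beta_max)
x = U @ gamma
x /= np.linalg.norm(x)

# Eigenvectors are only defined up to sign, so compare against both b and -b.
err = min(np.linalg.norm(x - b), np.linalg.norm(x + b))
print(err)
```

The vector `gamma` plays the role of the coefficient vector in the expansion; its entries of largest magnitude indicate which eigenvectors of the adjacency matrix dominate the leading eigenvector of the modularity matrix.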
With the notations and assumptions in Theorem 2 , let
Suppose that , and the are reordered such that . Then given , can be approximated by
with relative error
where is the 2-norm of the vector .
Since
the vector can be written as
So if
is an approximation of , then the difference between and its approximation is
and the 2-norm of is
because the are orthonormal. So if is the 2-norm of the vector , then the relative error of the approximation is
This error estimate helps gauge the number of terms required to obtain a given level of accuracy when approximating the dominant eigenvector of the modularity matrix with eigenvectors of the adjacency matrix.
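The truncation and its relative error can be sketched as follows. This is an illustration under assumptions: the coefficients are taken to be $(u_i^T d)/(\lambda_i - \beta)$ as given by the DPR1 eigenvector formula, the terms are ordered by decreasing coefficient magnitude, and the relative error is the 2-norm of the discarded coefficients divided by the 2-norm of all coefficients. The function name `truncated_leading_eigvec` and the example graph are hypothetical.

```python
import numpy as np

def truncated_leading_eigvec(A, k):
    """Approximate the leading eigenvector of B = A - d d^T / (2m) with the k
    eigenvectors of A whose expansion coefficients are largest in magnitude.
    Returns (approximation, relative error, full unnormalized expansion)."""
    d = A.sum(axis=1)
    B = A - np.outer(d, d) / d.sum()
    lam, U = np.linalg.eigh(A)
    beta_max = np.linalg.eigvalsh(B)[-1]
    gamma = (U.T @ d) / (lam - beta_max)       # expansion coefficients
    order = np.argsort(-np.abs(gamma))         # largest magnitude first
    keep, drop = order[:k], order[k:]
    approx = U[:, keep] @ gamma[keep]
    rel_err = np.linalg.norm(gamma[drop]) / np.linalg.norm(gamma)
    return approx, rel_err, U @ gamma

# Hypothetical weighted graph with two planted groups of four vertices.
rng = np.random.default_rng(1)
n = 8
W = rng.uniform(0.0, 0.2, size=(n, n))
W[:4, :4] = rng.uniform(0.6, 1.0, size=(4, 4))
W[4:, 4:] = rng.uniform(0.6, 1.0, size=(4, 4))
A = np.triu(W, k=1)
A = A + A.T

errors = []
for k in range(1, n + 1):
    approx, rel_err, exact = truncated_leading_eigvec(A, k)
    direct = np.linalg.norm(exact - approx) / np.linalg.norm(exact)
    errors.append((rel_err, direct))           # formula vs. directly measured error
print(errors)
```

Because the eigenvectors of the adjacency matrix are orthonormal, the error computed from the discarded coefficients agrees with the directly measured error, and it decreases monotonically to zero as `k` grows.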
3 Normalized Adjacency and Modularity Clustering
Parallel to the previous analysis, we will prove that the eigenvectors corresponding to the largest eigenvalues of a normalized adjacency matrix and a normalized modularity matrix will give the same clustering results in nontrivial cases. A similar statement is mentioned in [2] without a complete proof, and it is considered in [24] from a different perspective.
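Suppose $A$ is an adjacency matrix and $A_{sym} = D^{-1/2} A D^{-1/2}$ is the corresponding normalized adjacency matrix. Let $L = D - A$ be the unnormalized Laplacian matrix and $L_{sym} = D^{-1/2} L D^{-1/2}$ be the normalized Laplacian matrix. Finally let $B = A - dd^T/(2m)$ be the unnormalized modularity matrix defined in Section 1, $D = \mathrm{diag}(d)$, and $B_{sym} = D^{-1/2} B D^{-1/2}$ be the normalized modularity matrix. We first state the theorem then prove it.
Suppose that zero is a simple eigenvalue of $B_{sym}$, and one is a simple eigenvalue of $A_{sym}$. If $\lambda \neq 0$ and $\lambda \neq 1$, then $(\lambda, u)$ is an eigenpair of $B_{sym}$ if and only if $(\lambda, u)$ is an eigenpair of $A_{sym}$.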
The proof of the theorem is obtained by combining the following two observations. The second observation needs more lines to explain so we write it as a lemma.
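Observation
$(\lambda, u)$ is an eigenpair of the normalized Laplacian $L_{sym}$ if and only if $(1 - \lambda, u)$ is an eigenpair of the normalized adjacency matrix $A_{sym}$ because
$$L_{sym} = D^{-1/2} (D - A) D^{-1/2} = I - A_{sym}.$$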
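Suppose that $0$ is a simple eigenvalue of both the normalized Laplacian $L_{sym}$ and the normalized modularity matrix $B_{sym}$. It follows that if $\lambda \neq 0$ and $(\lambda, u)$ is an eigenpair of $B_{sym}$, then $(1 - \lambda, u)$ is an eigenpair of $L_{sym}$. If $\alpha \neq 0$ and $(\alpha, u)$ is an eigenpair of $L_{sym}$, then $(1 - \alpha, u)$ is an eigenpair of $B_{sym}$. For $B_{sym}$, it is easy to observe that
$$B_{sym} = D^{-1/2} \left( A - \frac{d d^T}{2m} \right) D^{-1/2} = I - L_{sym} - \frac{D^{-1/2} d d^T D^{-1/2}}{2m}.$$
Let $E = D^{-1/2} d d^T D^{-1/2} / (2m)$. If $(\lambda, u)$ is an eigenpair of $B_{sym}$, we have
$$\lambda u = B_{sym} u = u - L_{sym} u - E u.$$
Note that $d d^T$ is an outer product and $d \neq 0$, so rank($d d^T$)=1. Because $E$ is congruent to $d d^T / (2m)$, $E$ and $d d^T / (2m)$ have the same number of positive, negative and zero eigenvalues by Sylvester’s law [14]. Therefore rank($E$)=rank($d d^T$)=1. To prove $L_{sym} u = (1 - \lambda) u$, it is sufficient to prove $u$ is in the nullspace of $E$.
Let $e$ be the vector such that all its entries are one. Observe that
$$E D^{1/2} e = \frac{D^{-1/2} d \, d^T e}{2m} = D^{-1/2} d = D^{1/2} e,$$
since $d^T e = 2m$ is just the sum of the degrees of all the nodes in the graph. Because $D^{1/2} e \neq 0$, $(1, D^{1/2} e)$ is an eigenpair of $E$. Also observe that
$$B_{sym} D^{1/2} e = D^{-1/2} B e = D^{-1/2} \left( d - \frac{d \, d^T e}{2m} \right) = 0.$$
Therefore, $(0, D^{1/2} e)$ is an eigenpair of $B_{sym}$. Since $u$ is an eigenvector of $B_{sym}$ corresponding to a nonzero eigenvalue $\lambda$, we have $u \perp D^{1/2} e$, so $u$ is in the nullspace of $E$. This gives $E u = 0$ and thus $\lambda u = u - L_{sym} u$. Therefore $L_{sym} u = (1 - \lambda) u$.
On the other hand, if $(\alpha, u)$ is an eigenpair of $L_{sym}$, then we have
$$B_{sym} u = u - L_{sym} u - E u = (1 - \alpha) u - E u.$$
Observe that
$$L_{sym} D^{1/2} e = D^{-1/2} L e = 0$$
because the row sums of $L$ are all zeros. Therefore, $(0, D^{1/2} e)$ is an eigenpair of $L_{sym}$. Since $u$ is an eigenvector of $L_{sym}$ corresponding to a nonzero eigenvalue $\alpha$, we have $u \perp D^{1/2} e$, so $u$ is in the nullspace of $E$.
This gives $E u = 0$ and thus $B_{sym} u = (1 - \alpha) u - E u = (1 - \alpha) u$. Therefore $(1 - \alpha, u)$ is an eigenpair of $B_{sym}$.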
By Theorem 3, a bijection from the nonzero eigenvalues of the normalized modularity matrix $B_{sym}$ to the eigenvalues of the normalized adjacency matrix $A_{sym}$ that are not equal to one can be established, and the order of these eigenvalues is maintained. Since zero is always an eigenvalue of $B_{sym}$, the largest eigenvalue of $B_{sym}$ is always nonnegative. Newman [15] gives a discussion of when the largest eigenvalue of the modularity matrix $B$ can be zero. Since $B$ and $B_{sym}$ are congruent, it follows that if zero is the largest eigenvalue of $B$, then it is also the largest eigenvalue of $B_{sym}$. In this case, all nodes in the graph will be put into one cluster because $(0, D^{1/2} e)$ is an eigenpair of $B_{sym}$ and all entries in the vector $D^{1/2} e$ are larger than zero. The following theorem establishes that the eigenvectors corresponding to the largest eigenvalues of a normalized adjacency matrix and a normalized modularity matrix are the same for nontrivial cases (i.e. when the largest eigenvalue of $B$ is not zero), and therefore they will provide the same clustering results in nontrivial cases.
With the assumptions in Theorem 3, and given zero is not the largest eigenvalue of $B_{sym}$, the eigenvector corresponding to the largest eigenvalue of $B_{sym}$ and the eigenvector corresponding to the second largest eigenvalue of $A_{sym}$ are identical.
Since the normalized Laplacian $L_{sym}$ is positive semidefinite [22], zero is the smallest eigenvalue of $L_{sym}$. Then by Observation 3, one is the largest eigenvalue of $A_{sym}$. Since all eigenvalues of $A_{sym}$ that are not equal to one are also the eigenvalues of $B_{sym}$, it follows that if the simple zero eigenvalue is not the largest eigenvalue of $B_{sym}$, then the largest eigenvalue of $B_{sym}$ is the second largest eigenvalue of $A_{sym}$ and they have the same eigenvectors by Theorem 3.
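The equivalence can be observed numerically. The sketch below assumes the symmetric normalizations $A_{sym} = D^{-1/2} A D^{-1/2}$ and $B_{sym} = D^{-1/2} (A - dd^T/(2m)) D^{-1/2}$, and uses a hypothetical weighted graph with two planted groups so that the largest eigenvalue of $B_{sym}$ is positive.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 8
# Hypothetical weighted graph: strong weights within two groups, weak between.
W = rng.uniform(0.0, 0.2, size=(n, n))
W[:4, :4] = rng.uniform(0.6, 1.0, size=(4, 4))
W[4:, 4:] = rng.uniform(0.6, 1.0, size=(4, 4))
A = np.triu(W, k=1)
A = A + A.T

d = A.sum(axis=1)
s = 1.0 / np.sqrt(d)                                   # diagonal of D^{-1/2}
A_sym = s[:, None] * A * s[None, :]
B_sym = s[:, None] * (A - np.outer(d, d) / d.sum()) * s[None, :]

a_val, a_vec = np.linalg.eigh(A_sym)                   # ascending eigenvalues
b_val, b_vec = np.linalg.eigh(B_sym)

u_adj = a_vec[:, -2]                                   # second largest eigenvalue of A_sym
u_mod = b_vec[:, -1]                                   # largest eigenvalue of B_sym

same_value = bool(np.isclose(a_val[-2], b_val[-1]))
err = min(np.linalg.norm(u_adj - u_mod), np.linalg.norm(u_adj + u_mod))

# Sign-based two-way partitions, compared up to swapping the two cluster labels.
p_adj, p_mod = u_adj > 0, u_mod > 0
same_partition = bool(np.array_equal(p_adj, p_mod) or np.array_equal(p_adj, ~p_mod))
print(same_value, err, same_partition)
```

The two eigenvectors agree up to sign, so the sign-based partitions coincide up to a relabeling of the two clusters.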
4 Some Applications and Experiments
To corroborate the theoretical results obtained in the previous sections, experiments were conducted with three well-known data sets. In the following experiments the effects of the units are first eliminated by normalizing each variable by its 2-norm if necessary, then the Gaussian similarity function is applied to the data to generate a similarity matrix $S$. The parameters in the similarity function differ between data sets and are specified individually. The mean of all off-diagonal entries in $S$ is computed, and the adjacency matrix $A$ is formed from $S$ by thresholding at this mean.
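A minimal sketch of this preprocessing is given below. Several details are assumptions made for illustration: the Gaussian similarity is taken as $s_{ij} = \exp(-\|x_i - x_j\|^2 / (2\sigma^2))$, an edge is placed where an off-diagonal similarity is strictly greater than the off-diagonal mean, and the function name `similarity_to_adjacency`, the toy data and the value of `sigma` are hypothetical.

```python
import numpy as np

def similarity_to_adjacency(X, sigma):
    """Build a Gaussian similarity matrix S and a 0/1 adjacency matrix A from
    data X (rows = samples, columns = variables with nonzero 2-norm)."""
    X = np.asarray(X, dtype=float)
    X = X / np.linalg.norm(X, axis=0)                  # scale each variable to unit 2-norm
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    S = np.exp(-sq_dist / (2.0 * sigma ** 2))          # Gaussian similarity (assumed form)
    off_diag = ~np.eye(len(X), dtype=bool)
    mean_off = S[off_diag].mean()                      # mean of off-diagonal entries
    A = ((S > mean_off) & off_diag).astype(float)      # assumed thresholding rule
    return S, A, mean_off

# Toy data: two tight groups of three points each.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
S, A, mean_off = similarity_to_adjacency(X, sigma=0.2)
print(A)
```

For this toy data the resulting graph consists of two disjoint triangles, one per group, since within-group similarities are close to one and between-group similarities are close to zero.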
4.1 Data Sets
We used three popular data sets from the literature, and they are described below.
4.1.1 Wine Recognition Data Set
The wine recognition data from the UCI data repository [12] is one of the most famous data sets used in data mining [8][10][18]. The data set is the result of a chemical analysis of wines grown in the same region but derived from three different cultivars. The data contains 178 wine samples, a label for each sample that tells which kind of wine it is, and 13 variables from the chemical analysis. In the experiments the data of the first two kinds of wines are used to generate a similarity matrix. A good clustering method should be able to put samples from the same class into the same cluster. To build the similarity matrix, the Gaussian similarity function with is used.
4.1.2 Breast Cancer Wisconsin (Original) Data Set
The Breast Cancer Wisconsin Data Set [13] is a widely used data set in classification and clustering [1] and it can be downloaded from the UCI data repository. The data contains 699 instances and 9 attributes. The attributes are measurements of the sample tissues. Each instance has a label to indicate whether the tissue is benign or malignant. The data contains 16 instances that have missing values in the attributes, and the missing values are replaced by zeros in the experiments. To build the similarity matrix, the Gaussian similarity function with is used.
4.1.3 PenDigit Data Sets from MNIST database
The PenDigit data sets are subsets of the widely used MNIST database [11][25][9][4][19]. The original data contains a training set of 60,000 handwritten digits from 44 writers. One subset used in the experiments contains some of the digits 1 and 7, and the other subset contains some of the digits 2 and 3 (the data can be downloaded at http://www.kaggle.com/c/digitrecognizer/data). Each piece of data is a row vector converted from a greyscale image. Each image is 28 pixels in height and 28 pixels in width, so there are 784 pixels in total. Each row vector contains the label of the digit and the lightness of each pixel. Lightness of a pixel is represented by a number from 0 to 255 inclusive, and smaller numbers represent lighter pixels. To build the similarity matrix, the Gaussian similarity function with is used.
4.2 Clustering Synchronization Rate
In classification methods the performance is gauged by using metrics such as accuracy and error rate. More specifically, the accuracy is defined as the number of correct classifications over the total number of classifications, and the error rate is defined as the number of wrong classifications over the total number of classifications [21]. Similar metrics can be used to evaluate how close the results from two different clustering methods are. Suppose two clustering methods $X$ and $Y$ are used to partition the same $n$ data points into two parts, $X_1, X_2$ and $Y_1, Y_2$ respectively. Define the clustering synchronization rate (CSR) between $X$ and $Y$ to be
$$\mathrm{CSR}(X, Y) = \frac{1}{n} \max \{ |X_1 \cap Y_1| + |X_2 \cap Y_2|, \; |X_1 \cap Y_2| + |X_2 \cap Y_1| \}. \qquad (1)$$
In other words this is the percentage of agreement between $X$ and $Y$. Because different clustering methods may give different labels to a cluster, the max function is used in the definition. If $Y$ is the known ground truth of the clusters, then the CSR between $X$ and $Y$ is the same as the accuracy of $X$. It should be noted that the CSR is not related to the accuracy of the clustering methods unless one of them is the ground truth.
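For two-way partitions the CSR can be computed in a few lines. The sketch below assumes that each clustering is given as a vector of 0/1 labels over the same data points; the function name `csr` and the example label vectors are hypothetical.

```python
import numpy as np

def csr(labels_x, labels_y):
    """Clustering synchronization rate of two 2-way partitions given as 0/1 labels.
    The max accounts for the two possible matchings of cluster labels."""
    x = np.asarray(labels_x).astype(bool)
    y = np.asarray(labels_y).astype(bool)
    agree = float(np.mean(x == y))          # agreement with labels matched as given
    return max(agree, 1.0 - agree)          # or with the two labels of y swapped

x = [0, 0, 0, 1, 1, 1]
y = [1, 1, 1, 0, 0, 1]                      # same split as x except for the last point
print(csr(x, y))                            # 5/6: five of six points agree after swapping labels
print(csr(x, x))                            # 1.0
```

When the partitions come from the signs of two eigenvectors `u` and `v`, the same function can be called as `csr(u > 0, v > 0)`, which makes the result independent of the arbitrary sign of each eigenvector.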
4.3 Results
The experimental results are in the tables below. Table 1 contains the number of data points in each data set and the accuracy of each clustering method when applied to the data sets. The first three accuracy columns report the unnormalized spectral clustering, unnormalized modularity clustering and normalized spectral clustering, respectively. These clustering results are used as the benchmarks. Note that by Theorem 3, the clustering results from using the normalized adjacency matrix and the normalized modularity matrix are the same. The last three columns use the approximations described in Theorem 2 with the first one, two and three eigenvectors of the adjacency matrix to do clustering. In Table 2, the CSRs between the leading eigenvector of the modularity matrix and its approximations are computed, and the largest four magnitudes of the coefficients described in Theorem 2 are listed.
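Table 1: Number of data points and clustering accuracy (%) of each method.
| Data | Number of data points | Unnormalized spectral | Unnormalized modularity | Normalized spectral | Approximation, 1 eigenvector | Approximation, 2 eigenvectors | Approximation, 3 eigenvectors |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Wine | 130 | 56.2 | 91.2 | 92.3 | 91.2 | 91.2 | 91.2 |
| Breast Cancer | 699 | 70.0 | 96.6 | 96.3 | 87.0 | 96.7 | 96.7 |
| PenDigit17 | 9085 | 51.8 | 96.9 | 96.3 | 82.9 | 96.5 | 97.0 |
| PenDigit23 | 8528 | 51.2 | 90.1 | 88.2 | 89.2 | 90.2 | 90.3 |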
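Table 2: CSR (%) between the leading eigenvector of the modularity matrix and its approximations, and the four largest coefficient magnitudes.
| Data | CSR, 1 eigenvector | CSR, 2 eigenvectors | CSR, 3 eigenvectors | Largest magnitude | 2nd largest | 3rd largest | 4th largest |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Wine | 100.0 | 100.0 | 100.0 | 1344.7 | 21.1 | 1.2 | 0.53 |
| Breast Cancer | 88.4 | 99.9 | 99.9 | 59.5 | 31.2 | 2.1 | 0.45 |
| PenDigit17 | 82.8 | 99.0 | 99.6 | 265.2 | 146.3 | 42.6 | 7.1 |
| PenDigit23 | 94.2 | 99.6 | 99.6 | 653.0 | 150.1 | 17.2 | 5.1 |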
From Table 1, it can be seen that the approximations of the leading eigenvector of the modularity matrix outperform the unnormalized spectral clustering method for all data sets considered, and the accuracy is about the same as that of the unnormalized modularity method and the normalized spectral clustering method. In some cases, the clustering results from the approximations are better than the benchmarks. From Table 2, it can be seen that the CSRs between the leading eigenvector of the modularity matrix and its approximations are higher than 80%. If two or three eigenvectors of the adjacency matrix are used, then the CSRs are at least 99%.
5 Conclusion
In this paper the exact linear relation between the leading eigenvector of the unnormalized modularity matrix and the eigenvectors of the adjacency matrix is developed. It is proven that the leading eigenvector of the modularity matrix can be written as a linear combination of the eigenvectors of the adjacency matrix, and the coefficients in the linear combination are determined. Then a method to approximate the leading eigenvector of the modularity matrix is given, and the relative error of the approximation is derived. It is also proven that when the largest eigenvalue of the modularity matrix is nonzero, the normalized modularity clustering method will give the same clustering results as obtained by using the eigenvector corresponding to the second largest eigenvalue of the normalized adjacency matrix. A new metric, the clustering synchronization rate, is defined to compare different clustering methods. Some applications and experiments are given to illustrate and corroborate the points that are made in the theoretical development.
References
[1] R. W. Abbey, Stochastic clustering: Visualization and application, PhD Thesis, North Carolina State University, (2013).
[2] M. Bolla, Penalized versions of the Newman-Girvan modularity and their relation to normalized cuts and k-means clustering, Physical Review E, 84 (2011), p. 016108.
[3] J. R. Bunch, C. P. Nielsen, and D. C. Sorensen, Rank-one modification of the symmetric eigenproblem, Numerische Mathematik, 31 (1978), pp. 31–48.
[4] R. Chitta, R. Jin, and A. K. Jain, Efficient kernel clustering using random Fourier features, in Data Mining (ICDM), 2012 IEEE 12th International Conference on, IEEE, 2012, pp. 161–170.
[5] F. R. Chung, Spectral graph theory, vol. 92, American Mathematical Soc., 1997.
[6] M. Fiedler, Algebraic connectivity of graphs, Czechoslovak Mathematical Journal, 23 (1973), pp. 298–305.
[7] M. Fiedler, A property of eigenvectors of nonnegative symmetric matrices and its application to graph theory, Czechoslovak Mathematical Journal, 25 (1975), pp. 619–633.
[8] J. Goldberger, G. E. Hinton, S. T. Roweis, and R. R. Salakhutdinov, Neighbourhood components analysis, in Advances in Neural Information Processing Systems 17, L. Saul, Y. Weiss, and L. Bottou, eds., MIT Press, 2005, pp. 513–520.
[9] T. Hertz, A. Bar-Hillel, and D. Weinshall, Boosting margin based distance functions for clustering, in Proceedings of the twenty-first international conference on Machine learning, ACM, 2004, p. 50.
[10] H. Ishibuchi and T. Yamamoto, Rule weight specification in fuzzy rule-based classification systems, Fuzzy Systems, IEEE Transactions on, 13 (2005), pp. 428–435.
[11] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE, 86 (1998), pp. 2278–2324.
[12] M. Lichman, UCI machine learning repository, 2013.
[13] O. L. Mangasarian, W. N. Street, and W. H. Wolberg, Breast cancer diagnosis and prognosis via linear programming, Operations Research, 43 (1995), pp. 570–577.
[14] C. D. Meyer, Matrix analysis and applied linear algebra, SIAM, 2000.
[15] M. E. Newman, Modularity and community structure in networks, Proceedings of the National Academy of Sciences, 103 (2006), pp. 8577–8582.
[16] M. E. Newman and M. Girvan, Finding and evaluating community structure in networks, Physical Review E, 69 (2004), p. 026113.
[17] A. Y. Ng, M. I. Jordan, Y. Weiss, et al., On spectral clustering: Analysis and an algorithm, Advances in neural information processing systems, 2 (2002), pp. 849–856.
[18] S. L. Race, Iterative consensus clustering, PhD Thesis, North Carolina State University, (2014).
[19] S. L. Race, C. Meyer, and K. Valakuzhy, Determining the number of clusters via iterative consensus clustering, in Proceedings of the SIAM Conference on Data Mining (SDM), SIAM, 2013, pp. 94–102.
[20] J. Shi and J. Malik, Normalized cuts and image segmentation, Pattern Analysis and Machine Intelligence, IEEE Transactions on, 22 (2000), pp. 888–905.
[21] P.-N. Tan, M. Steinbach, V. Kumar, et al., Introduction to data mining, vol. 1, Pearson Addison Wesley Boston, 2006.
[22] U. Von Luxburg, A tutorial on spectral clustering, Statistics and computing, 17 (2007), pp. 395–416.
[23] J. H. Wilkinson, The algebraic eigenvalue problem, vol. 87, Clarendon Press Oxford, 1965.
[24] L. Yu and C. Ding, Network community discovery: solving modularity clustering via normalized cut, in Proceedings of the Eighth Workshop on Mining and Learning with Graphs, ACM, 2010, pp. 34–36.
[25] R. Zhang and A. I. Rudnicky, A large scale clustering scheme for kernel k-means, in Pattern Recognition, 2002. Proceedings. 16th International Conference on, vol. 4, IEEE, 2002, pp. 289–292.