1 Introduction
Clustering is one of the most fundamental machine learning problems [8]. It aims to assign the samples in a data set to different clusters in such a way that samples in the same cluster are more similar to each other than to samples in different clusters. In the past decades, many clustering techniques have been proposed. Among them, spectral clustering has drawn considerable attention and has been widely applied to computer vision, image processing and speech processing [1, 11]. Although spectral clustering has superior performance, the involved eigendecomposition procedure has a time complexity of $O(n^3)$, where $n$ is the number of samples in the data set. The high computational cost can easily become an obstacle in many large-scale applications [4, 2]. To solve this problem, considerable effort has been devoted to research on approximate spectral clustering. [13] proposed a k-means-based hierarchical spectral clustering framework (KASP); inspired by sparse coding theory, [2] proposed a landmark-based representation method (LSC); [3] leveraged the Nyström method to approximate the affinity matrix. However, none of these methods preserves the spectrum of the original graph, which results in a significant degradation of clustering accuracy. For example, on the Covtype data set the clustering accuracy drops substantially when using the LSC method, the KASP framework, or the Nyström method. [14] introduced a graph coarsening method for dividing a graph topology into a random number of partitions. However, [14] conflates spectral clustering with graph partitioning. All the graphs used for partitioning in [14] carry no cluster membership (labels), so the claim that the method works for clustering is unfounded. Clustering aims to group similar samples together, whereas the graphs in [14] are pure topologies rather than data graphs: their nodes are not associated with data samples, so there is no notion of "similarity" between nodes to cluster. [14] is not applicable to spectral clustering for the following reasons: 1) spectral clustering aims to discover clusters for all the samples in the data set, but the original samples are missing from the coarsened graph; 2) the number of clusters in spectral clustering is fixed and meaningful (for example, handwritten-digit data sets have 10 clusters representing the digits 0-9), so clustering is meaningless if the number of clusters is random; 3) clustering performance should be measured directly, for example with clustering accuracy, whereas there is no direct way to evaluate a random graph partitioning: quotient-style metrics, the edge-cut metric, and expansion or conductance all need to be considered, and reporting only the normalized cut, as in [14], cannot demonstrate effectiveness.
In this paper, we improve spectral clustering via spectrum-preserving node reduction. Our contributions are as follows: (1) To the best of our knowledge, this is the first attempt to employ spectrum-preserving node reduction in the general approximate spectral clustering framework of [13] to improve the KASP method. (2) The proposed method significantly improves clustering accuracy: it achieves roughly 17% and 18% accuracy gains on the USPS and MNIST data sets over the standard spectral clustering method, respectively. (3) Our method fundamentally addresses the computational challenge of the eigendecomposition procedure in the standard spectral clustering algorithm. Compared to the existing approximation methods, ours is the only one that accelerates spectral clustering on very large data sets without loss of clustering accuracy.
2 Preliminaries
k-means is the most fundamental clustering method [10]. It discovers clusters by minimizing the following objective function:

$$\min \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2, \qquad (1)$$

where

$$\mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i, \qquad (2)$$

and $\mu_j$ is the centroid of the cluster $C_j$.
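For illustration, the objective in Eq. (1) can be evaluated directly; the following is a minimal NumPy sketch (the variable names are ours, not from the paper):

```python
import numpy as np

def kmeans_objective(X, labels, centroids):
    """Sum of squared distances of each sample to its cluster centroid (Eq. 1)."""
    diffs = X - centroids[labels]  # broadcast each sample's own cluster centroid
    return float(np.sum(diffs ** 2))

# Tiny example: two well-separated clusters of two points each.
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
labels = np.array([0, 0, 1, 1])
# Centroids as in Eq. (2): the mean of the samples in each cluster.
centroids = np.array([X[labels == j].mean(axis=0) for j in range(2)])
obj = kmeans_objective(X, labels, centroids)  # each cluster contributes 2 * 0.5^2
```

Here `obj` evaluates to 1.0, since every point lies at distance 0.5 from its centroid.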
However, k-means often fails to handle complicated geometric shapes, such as nonconvex clusters [7]. In contrast, spectral clustering is good at detecting nonconvex and linearly non-separable patterns [6]. Spectral clustering includes the following three steps: 1) construct a graph Laplacian matrix $L$ according to the similarities between data points; 2) embed the nodes into a $k$-dimensional space using the first $k$ nontrivial eigenvectors of $L$; 3) apply k-means to partition the embedded data points into $k$ clusters. As shown in Figure 1, the two moons data set consists of two slightly entangled nonconvex shapes, where each cluster corresponds to a moon; in the two circles data set, the samples are arranged in two concentric circles, where each cluster corresponds to a circle. k-means gives incorrect clustering results on both data sets, while spectral clustering produces an ideal clustering: k-means only considers distances, whereas spectral clustering considers the connectivity between nodes. The connectivity information is embedded in the spectrum of the underlying graph, so it is critical to preserve the spectrum when manipulating graphs [12].

3 Methods
3.1 Algorithmic Framework
We first construct a standard kNN graph $G$. Then, based on the node proximity measure proposed in [5], the spectral similarity of two nodes $p$ and $q$ can be calculated as:

$$c_{pq} = \frac{\left|(X_p, X_q)\right|^2}{(X_p, X_p)\,(X_q, X_q)}, \qquad (3)$$

where $X$ is obtained by applying Gauss-Seidel relaxation to solve $Lx = 0$, starting with $K$ random vectors, and $X_p$ denotes the relaxed values at node $p$ across the $K$ test vectors. We aggregate nodes with high spectral similarity to reduce the graph size while preserving the spectrum of the original graph. During this process, the fine-to-coarse graph mapping operators and the coarse-to-fine mapping operators can be constructed [14]. By aggregating nodes based on spectral similarity, the reduced graph best represents the original data set in the sense of minimizing the structural distortion [12], as shown in Figure 2. To retrieve the cluster membership of the samples in the original data set, we build a correspondence table using the mapping operators to associate original samples with aggregated nodes. If the desired graph size is not reached, the graph can be further reduced in a multilevel way [14]: given the graph Laplacian $L_1$ at the finest level and the fine-to-coarse mapping operators $H_i$ from level $i$ to level $i+1$, the Laplacian at each coarser level is obtained as $L_{i+1} = H_i L_i H_i^{\top}$.
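To make the affinity computation concrete, the following sketch (our own illustration, not the paper's implementation; the path-graph Laplacian and the sweep count are arbitrary choices) smooths a few random test vectors with Gauss-Seidel relaxation and evaluates the affinity of Eq. (3):

```python
import numpy as np

def gauss_seidel_sweeps(L, X, sweeps=4):
    """Relax L x = 0 with a few Gauss-Seidel sweeps applied to each test vector."""
    n = L.shape[0]
    X = X.copy()
    for _ in range(sweeps):
        for i in range(n):
            # x_i <- -(sum_{j != i} L_ij x_j) / L_ii, using already-updated rows
            X[i] = -(L[i] @ X - L[i, i] * X[i]) / L[i, i]
    return X

def spectral_affinity(X, p, q):
    """Affinity of Eq. (3): |(X_p, X_q)|^2 / ((X_p, X_p)(X_q, X_q))."""
    return np.dot(X[p], X[q]) ** 2 / (np.dot(X[p], X[p]) * np.dot(X[q], X[q]))

# Laplacian of an unweighted path graph 0-1-2-3.
L = np.array([[ 1., -1.,  0.,  0.],
              [-1.,  2., -1.,  0.],
              [ 0., -1.,  2., -1.],
              [ 0.,  0., -1.,  1.]])
rng = np.random.default_rng(0)
T = gauss_seidel_sweeps(L, rng.standard_normal((4, 8)))  # K = 8 test vectors
# Affinities lie in [0, 1] by the Cauchy-Schwarz inequality; adjacent nodes are
# expected to show higher affinity than the two path endpoints.
close, far = spectral_affinity(T, 0, 1), spectral_affinity(T, 0, 3)
```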
Then standard spectral clustering is applied to the aggregated nodes to obtain clusters of the reduced data set. Finally, we determine the cluster membership of each original sample by looking up the cluster membership of its corresponding aggregated node in the correspondence table. The complete algorithm flow is shown in Algorithm 1.
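The retrieval step amounts to an index lookup; the following is a minimal sketch (the array `fine_to_coarse` stands in for the correspondence table built from the mapping operators and is a hypothetical example, not from the paper):

```python
import numpy as np

# Hypothetical correspondence table: original sample i maps to aggregated node
# fine_to_coarse[i].
fine_to_coarse = np.array([0, 0, 1, 1, 2, 2, 2])

# Cluster labels produced by standard spectral clustering on the reduced graph
# (one label per aggregated node).
coarse_labels = np.array([0, 0, 1])

# Each original sample inherits the label of its aggregated node.
fine_labels = coarse_labels[fine_to_coarse]
# fine_labels -> [0, 0, 0, 0, 1, 1, 1]
```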
3.2 Algorithm Complexity
The computational complexity of spectrum-preserving node reduction is $O(|V| + |E|)$, where $|E|$ is the number of edges in the original graph and $|V|$ is the number of nodes. The computational complexity of standard spectral clustering on the reduced graph is $O(m^3)$, where $m$ is the number of aggregated nodes. The complexity of cluster membership retrieval is $O(n)$, where $n$ is the number of samples in the original data set.
4 Experiment
Experiments are performed using MATLAB R2020b running on a laptop. The reported results are averaged over multiple runs. The node reduction scheme implementation is available at https://github.com/cornellzhang/GraphZoom/tree/master/mat_coarsen.
4.1 Experiment Setup
One mid-sized data set (USPS), one large data set (MNIST) and one very large data set (Covtype) are used in our experiments. They can be downloaded from the UCI machine learning repository (https://archive.ics.uci.edu/ml/) and LibSVM Data (https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/). USPS includes 9,298 images of handwritten digits, each with 256 attributes; MNIST, a data set from Yann LeCun's website (http://yann.lecun.com/exdb/mnist/), includes 70,000 images, each represented by 784 attributes; Covtype includes 581,012 instances for predicting forest cover type from cartographic variables, where each instance has 54 attributes and belongs to one of seven classes.
We compare the proposed method against both the baseline and state-of-the-art fast spectral clustering methods: (1) the standard spectral clustering algorithm [10]; (2) the Nyström method [3]; (3) the landmark-based spectral clustering (LSC) method, which uses random sampling for landmark selection [2]; and (4) the KASP method, which uses k-means for centroid selection [13]. For a fair comparison, we use the same parameter setting as [2] for the compared algorithms: the number of sampled points in the Nyström method (equivalently, the number of landmarks in LSC, or the number of centroids in KASP) is set to 500.
4.2 Evaluation Metrics
Clustering accuracy (ACC) is the most widely used measurement of clustering quality [2, 4]. It is defined as follows:

$$ACC = \frac{\sum_{i=1}^{n} \delta\left(y_i, \mathrm{map}(c_i)\right)}{n}, \qquad (4)$$

where $n$ is the number of data samples, $y_i$ is the ground-truth label, and $c_i$ is the label generated by the algorithm. $\delta(x, y)$ is a delta function defined as $\delta(x, y) = 1$ for $x = y$ and $\delta(x, y) = 0$ otherwise, and $\mathrm{map}(\cdot)$ is a permutation function that can be realized using the Hungarian algorithm [9]. A higher value of ACC indicates better clustering quality.
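ACC can be computed from Eq. (4) by solving the label-matching problem with the Hungarian algorithm, which SciPy exposes as `linear_sum_assignment` (a minimal sketch of the metric, not the paper's code):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(y_true, y_pred):
    """ACC of Eq. (4): best one-to-one label permutation via the Hungarian algorithm."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    k = int(max(y_true.max(), y_pred.max())) + 1
    # Contingency table: w[p, t] = number of samples with predicted label p, true label t.
    w = np.zeros((k, k), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        w[p, t] += 1
    row, col = linear_sum_assignment(-w)  # negate to maximize matched samples
    return w[row, col].sum() / y_true.size

# Predicted labels are a relabeling of the ground truth with one error,
# so the best permutation recovers 5 of 6 samples.
acc = clustering_accuracy([0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 1])  # 5/6
```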
4.3 Experimental Results
Table 1: Clustering accuracy (%).

| Data Set | Standard SC | Nyström | KASP | LSC | Ours (5X) | Ours (10X) | Ours (50X) |
|----------|-------------|---------|------|-----|-----------|------------|------------|
| USPS     | 64.31 | 69.31 | 70.62 | 66.28 | 70.65 | 72.25 | 81.39 |
| MNIST    | 64.20 | 55.86 | 71.23 | 58.94 | 70.99 | 72.42 | 82.18 |
| Covtype  | 48.81 | 33.24 | 27.56 | 22.60 | 48.77 | 44.50 | 48.28 |
Table 2: Runtime of the eigendecomposition and k-means steps.

| Data Set | Standard SC | Nyström | KASP | LSC | Ours (5X) | Ours (10X) | Ours (50X) |
|----------|-------------|---------|------|-----|-----------|------------|------------|
| USPS     | 0.72   | 0.29 | 0.16 | 0.22 | 0.21 | 0.18 | 0.17 |
| MNIST    | 252.59 | 0.95 | 0.18 | 0.49 | 0.59 | 0.35 | 0.21 |
| Covtype  | 128.70 | 5.48 | 0.70 | 3.77 | 0.92 | 0.50 | 0.49 |
Table 3: Number of nodes at different reduction ratios.

| Data Set | Original | 5X | 10X | 50X |
|----------|----------|-----|-----|-----|
| USPS     | 9,298   | 1,692   | 767    | 138   |
| MNIST    | 70,000  | 13,658  | 6,368  | 1,182 |
| Covtype  | 581,012 | 104,260 | 44,188 | 8,192 |
Table 1 shows the clustering accuracy of the compared methods and of our method with node reduction ratios of 5X, 10X and 50X. The runtimes of the eigendecomposition and k-means steps are reported in Table 2. As observed, the proposed method consistently leads to dramatic performance improvements, beating all competitors in clustering accuracy across all three data sets: our method achieves more than 10% accuracy gain on USPS and MNIST over the second-best methods, and more than 15% gain on Covtype over the second-best approximate method; for the USPS and MNIST data sets, it achieves over 17% and 17.9% accuracy gain over the standard spectral clustering method, respectively. The superior clustering results of our method clearly illustrate that a spectrum-preserving concise representation can improve clustering accuracy by removing redundant or false relations among data samples.
As shown in Table 3, the reduced graphs generated by our framework have only 138, 1,182 and 8,192 nodes at 50X reduction for the USPS, MNIST and Covtype data sets, respectively, thereby allowing much faster eigendecompositions. As shown in Fig. 3 and Fig. 4, with increasing node reduction ratio, the proposed method consistently produces high clustering accuracy. For the very large Covtype data set, our method is the only one that accelerates the spectral clustering algorithm without loss of clustering accuracy.
5 Conclusion
We presented a novel framework for aggressively accelerating the eigendecomposition and improving the accuracy of spectral clustering. We use spectrum-preserving node reduction to reduce the data size and remove redundant relations among samples. Experimental results demonstrate that our method outperforms state-of-the-art methods by a large margin.
References
[1] Learning spectral clustering. In Advances in Neural Information Processing Systems 16(2), pp. 305-312, 2004.
[2] Large scale spectral clustering with landmark-based representation. In AAAI, 2011.
[3] Spectral grouping using the Nyström method. IEEE Transactions on Pattern Analysis and Machine Intelligence 26(2), pp. 214-225, 2004.
[4] Large-scale spectral clustering on graphs. In Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence, pp. 1486-1492, 2013.
[5] Lean algebraic multigrid (LAMG): fast graph Laplacian linear solver. SIAM Journal on Scientific Computing 34(4), pp. B499-B522, 2012.
[6] On spectral clustering: analysis and an algorithm. In Advances in Neural Information Processing Systems 2, pp. 849-856, 2002.
[7] K-Multiple-Means: a multiple-means clustering method with specified k clusters. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 959-967, 2019.
[8] Spectral embedded clustering. In Twenty-First International Joint Conference on Artificial Intelligence, 2009.
[9] Combinatorial optimization: algorithms and complexity. Courier Corporation, 1982.
[10] A tutorial on spectral clustering. Statistics and Computing 17(4), pp. 395-416, 2007.
[11] A new approach to image retrieval with hierarchical color clustering. IEEE Transactions on Circuits and Systems for Video Technology 8(5), pp. 628-643, 1998.
[12] High performance spectral methods for graph-based machine learning. Ph.D. thesis, Michigan Technological University, 2021.
[13] Fast approximate spectral clustering. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 907-916, 2009.
[14] Towards scalable spectral embedding and data visualization via spectral coarsening. In Proceedings of the 14th ACM International Conference on Web Search and Data Mining, pp. 869-877, 2021.