Improving Spectral Clustering Using Spectrum-Preserving Node Reduction

10/24/2021
by   Yongyu Wang, et al.

Spectral clustering is one of the most popular clustering methods. However, the high computational cost of the involved eigen-decomposition procedure can quickly become an obstacle in large-scale tasks. In this paper, we use spectrum-preserving node reduction to accelerate eigen-decomposition and to generate concise representations of data sets. Specifically, we create a small number of pseudo-nodes based on spectral similarity. The standard spectral clustering algorithm is then performed on the smaller node set. Finally, each data point in the original data set is assigned to the cluster of its representative pseudo-node. The proposed framework runs in nearly-linear time. Meanwhile, clustering accuracy can be significantly improved by mining concise representations. The experimental results show dramatically improved clustering performance when compared with state-of-the-art methods.


1 Introduction

Clustering is one of the most fundamental machine learning problems [8]. It aims to assign the samples in a data set to different clusters in such a way that samples in the same cluster are more similar to each other than to samples in different clusters.

In the past decades, many clustering techniques have been proposed. Among them, spectral clustering has drawn considerable attention and has been widely applied to computer vision, image processing and speech processing [1, 11]. Although spectral clustering has superior performance, the involved eigen-decomposition procedure has a time complexity of $O(N^3)$, where $N$ is the number of samples in the data set. The high computational cost can easily become an obstacle in large-scale applications [4, 2]. To address this problem, considerable effort has been devoted to approximate spectral clustering. [13] proposed a k-means-based hierarchical spectral clustering framework (KASP); inspired by sparse coding theory, [2] proposed a landmark-based representation method (LSC); [3] leveraged the Nyström method to approximate the affinity matrix. However, none of these methods preserves the spectrum of the original graph, which results in a significant degradation of clustering accuracy. For example, on the Covtype data set the clustering accuracy drops by more than 26% with the LSC method, by more than 21% with the KASP framework, and by more than 15% with the Nyström method (see Table 1).

[14] introduced a graph coarsening method for dividing a graph topology into a random number of partitions. However, [14] conflates spectral clustering with graph partitioning. All the graphs used for partitioning in [14] come without cluster membership (labels), so the claim that the method works for clustering is not supported. Clustering aims to group similar samples together; the graphs used in [14], however, are pure topologies rather than data graphs, and their nodes are not associated with data samples. Without a notion of similarity between nodes, there is nothing to cluster. [14] is not applicable to spectral clustering for the following reasons: 1) spectral clustering aims to discover clusters for all the samples in the data set, but the original samples are missing from the coarsened graph; 2) the number of clusters in spectral clustering is fixed and meaningful (for example, handwritten digit data sets have 10 clusters representing the digits 0-9), so clustering is meaningless if the number of clusters is random; 3) to measure clustering performance, direct measurements such as clustering accuracy should be used, whereas there is no direct way to evaluate a random graph partitioning: quotient-style metrics, the edge-cut metric, and expansion or conductance all need to be considered, and using only the normalized cut, as in [14], cannot demonstrate effectiveness.

In this paper, we improve spectral clustering via spectrum-preserving node reduction. Our contributions are as follows: (1) to the best of our knowledge, this is the first attempt to employ spectrum-preserving node reduction in the general approximate spectral clustering framework of [13] to improve the KASP method; (2) the proposed method can significantly improve clustering accuracy: our method achieves about 17% and 18% accuracy gains on the USPS and MNIST data sets over the standard spectral clustering method, respectively (see Table 1); (3) our method fundamentally addresses the computational challenge of the eigen-decomposition procedure in the standard spectral clustering algorithm. Compared to existing approximation methods, ours is the only one that can accelerate spectral clustering for very large data sets without loss of clustering accuracy.


2 Preliminaries

k-means is the most fundamental clustering method [10]. It discovers clusters by minimizing the following objective function:

$\min_{C_1, \dots, C_k} \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2$,   (1)

where

$\mu_j = \dfrac{1}{|C_j|} \sum_{x_i \in C_j} x_i$   (2)

and $\mu_j$ is the centroid of the cluster $C_j$.

However, it often fails to handle complicated geometric shapes, such as non-convex shapes [7]. In contrast, spectral clustering is good at detecting non-convex and linearly non-separable patterns [6]. Spectral clustering includes the following three steps: 1) construct a graph Laplacian matrix $L$ according to the similarities between data points; 2) embed the nodes into a $k$-dimensional space using the first $k$ nontrivial eigenvectors of $L$; 3) apply k-means to partition the embedded data points into $k$ clusters. As shown in Figure 1, in the two moons data set there are two slightly entangled non-convex shapes, where each cluster corresponds to a moon; in the two circles data set, the samples are arranged in two concentric circles, where each cluster corresponds to a circle. k-means gives incorrect clustering results for both data sets, while spectral clustering produces an ideal clustering. This is because k-means only considers distances, whereas spectral clustering considers the connectivity between nodes. The connectivity information is embedded in the spectrum of the underlying graph, so it is critical to preserve the spectrum when manipulating graphs [12].

(a) k-means of the two moons data set
(b) Spectral clustering of the two moons data set
(c) k-means of the two circles data set
(d) Spectral clustering of the two circles data set
Figure 1: Performance of k-means and spectral clustering
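The contrast in Figure 1 can be reproduced with a few lines of scikit-learn. The following is a minimal sketch; the sample size, noise level and neighborhood size are illustrative choices and are not taken from the paper.

```python
# Minimal sketch: k-means vs. spectral clustering on the two-moons data set.
from sklearn.datasets import make_moons
from sklearn.cluster import KMeans, SpectralClustering
from sklearn.metrics import adjusted_rand_score

X, y = make_moons(n_samples=1000, noise=0.05, random_state=0)

# k-means partitions by distance to centroids and cuts each moon in half.
km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Spectral clustering uses the connectivity of a k-NN graph, so each moon
# is recovered as one cluster.
sc_labels = SpectralClustering(n_clusters=2, affinity="nearest_neighbors",
                               n_neighbors=10, random_state=0).fit_predict(X)

print("k-means ARI: ", adjusted_rand_score(y, km_labels))
print("spectral ARI:", adjusted_rand_score(y, sc_labels))
```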

3 Methods

3.1 Algorithmic Framework

We first construct a standard k-NN graph $G$ from the data set. Then, based on the node proximity measure proposed in [5], the spectral similarity of two nodes $p$ and $q$ can be calculated as:

$a_{pq} = \dfrac{(\mathbf{x}_p \cdot \mathbf{x}_q)^2}{(\mathbf{x}_p \cdot \mathbf{x}_p)\,(\mathbf{x}_q \cdot \mathbf{x}_q)}$,   (3)

where $\mathbf{x}_p$ collects the values of node $p$ in $K$ test vectors obtained by applying Gauss-Seidel relaxation to solve $L\mathbf{x} = \mathbf{0}$, starting with $K$ random vectors. We aggregate nodes with high spectral similarity to reduce the graph size while preserving the spectrum of the original graph. During this process, the fine-to-coarse and coarse-to-fine graph mapping operators can be constructed [14]. By aggregating nodes based on spectral similarity, the reduced graph best represents the original data set in the sense of minimizing structural distortion [12], as shown in Figure 2.

(a) The adjacency graph corresponding to the original node set of 9298 nodes
(b) The adjacency graph corresponding to the reduced node set of 138 pseudo-nodes
Figure 2: Visualization of node reduction results of USPS data set
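The affinity computation in Eq. (3) can be sketched as follows; the number of test vectors $K$, the number of relaxation sweeps, and the use of a vectorized Jacobi-style sweep in place of true Gauss-Seidel are simplifying assumptions made for illustration.

```python
# Sketch: spectral node affinities (Eq. 3) from smoothed random test vectors.
import numpy as np

def smoothed_test_vectors(L, K=8, sweeps=4, seed=0):
    """Relax L x = 0 from K random vectors; row p of X characterizes node p."""
    n = L.shape[0]
    X = np.random.default_rng(seed).standard_normal((n, K))
    d = L.diagonal()                      # node degrees (diagonal of the Laplacian)
    for _ in range(sweeps):
        X = X - (L @ X) / d[:, None]      # one Jacobi-style relaxation sweep
    return X

def spectral_similarity(X, p, q):
    """Normalized squared inner product of the smoothed values at nodes p and q."""
    num = np.dot(X[p], X[q]) ** 2
    den = np.dot(X[p], X[p]) * np.dot(X[q], X[q])
    return num / den
```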

To retrieve the cluster membership of the samples in the original data set, we build a correspondence table using the mapping operators to associate the original samples with the aggregated nodes. If the desired graph size has not been reached, the graph can be further reduced in a multi-level way [14]: given the graph Laplacian $L_l$ at level $l$ and the fine-to-coarse mapping operator $H_{l+1}^{l}$, the next-level Laplacian is computed as $L_{l+1} = H_{l+1}^{l} L_l (H_{l+1}^{l})^{\top}$, and this step is repeated from the finest level to the coarsest level.
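A possible realization of such a multi-level reduction, including the composed correspondence table, is sketched below; how each per-level node-to-pseudo-node assignment is produced (e.g., by greedy aggregation of high-affinity neighbors) is omitted and treated as an assumption here.

```python
# Sketch: multi-level coarsening with a composed correspondence table.
# `assignments` is a list of per-level maps: assign[i] = pseudo-node of node i.
import numpy as np
import scipy.sparse as sp

def multilevel_coarsen(L, assignments):
    corr = np.arange(L.shape[0])          # correspondence table: finest -> current level
    for assign in assignments:
        n, m = L.shape[0], assign.max() + 1
        # Fine-to-coarse mapping operator H: H[c, i] = 1 if node i maps to pseudo-node c.
        H = sp.csr_matrix((np.ones(n), (assign, np.arange(n))), shape=(m, n))
        L = H @ L @ H.T                   # next-level Laplacian
        corr = assign[corr]               # update the finest-to-coarsest mapping
    return L, corr
```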

Then standard spectral clustering is applied to the aggregated nodes to obtain clusters of the reduced data set. Finally, we determine the cluster membership of each original node by looking up the cluster membership of its corresponding aggregated node in the correspondence table. The complete algorithm flow is shown in Algorithm 1.

Input: A data set with $N$ samples $x_1, \dots, x_N$; number of clusters $k$.
Output: Clusters $C_1, \dots, C_k$.

1:  Construct a k-nearest-neighbor (kNN) graph $G$ from the input data;
2:  Compute the adjacency matrix $A$ and the diagonal degree matrix $D$ of graph $G$;
3:  Compute the Laplacian matrix $L = D - A$;
4:  Perform spectrum-preserving node reduction to obtain the reduced graph Laplacian $L_r$;
5:  Build a correspondence table to associate each node with its corresponding aggregated node;
6:  Compute the eigenvectors $u_1, \dots, u_k$ that correspond to the bottom $k$ nonzero eigenvalues of $L_r$;
7:  Construct the matrix $U$ with the eigenvectors $u_1, \dots, u_k$ stored as column vectors;
8:  Perform the k-means algorithm to partition the rows of $U$ into $k$ clusters;
9:  Retrieve the cluster membership of each original sample by assigning it to the same cluster as its corresponding aggregated node, and return the result.
Algorithm 1: Spectrum-preserving node-reduction-based high-performance spectral clustering framework
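Putting the pieces together, a minimal end-to-end sketch of Algorithm 1 might look as follows; it reuses the hypothetical multilevel_coarsen helper from the previous sketch, and the kNN size and eigensolver settings are illustrative rather than the configuration used in our experiments.

```python
# Sketch of Algorithm 1: spectral clustering on the reduced node set, followed by
# cluster-membership retrieval through the correspondence table.
from scipy.sparse.csgraph import laplacian
from scipy.sparse.linalg import eigsh
from sklearn.neighbors import kneighbors_graph
from sklearn.cluster import KMeans

def reduced_spectral_clustering(X, k, assignments, n_neighbors=10):
    # Steps 1-3: kNN graph, adjacency matrix and Laplacian L = D - A.
    A = kneighbors_graph(X, n_neighbors=n_neighbors, mode="connectivity")
    A = 0.5 * (A + A.T)                                  # symmetrize the kNN graph
    L = laplacian(A)

    # Steps 4-5: spectrum-preserving node reduction and correspondence table.
    L_r, corr = multilevel_coarsen(L, assignments)

    # Steps 6-7: eigenvectors for the bottom k nonzero eigenvalues of L_r
    # (index 0 is the trivial constant eigenvector of a connected graph).
    _, vecs = eigsh(L_r, k=k + 1, which="SM")
    U = vecs[:, 1:k + 1]

    # Step 8: k-means on the rows of U clusters the pseudo-nodes.
    coarse_labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(U)

    # Step 9: each original sample inherits the label of its pseudo-node.
    return coarse_labels[corr]
```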

3.2 Algorithm Complexity

The computational complexity of spectrum-preserving node reduction is $O(|E|)$, where $|E|$ is the number of edges in the original graph; since the k-NN graph is sparse, this is nearly linear in the number of nodes $|V|$. The computational complexity of standard spectral clustering on the reduced graph is $O(m^3)$, where $m$ is the number of aggregated nodes. The complexity of cluster membership retrieval is $O(N)$, where $N$ is the number of samples in the original data set.
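As a rough illustration using the 50X-reduced graph sizes in Table 3, the eigen-decomposition for MNIST operates on $m \approx 1{,}182$ pseudo-nodes instead of $N = 70{,}000$ samples, so the $O(m^3)$ cost is smaller than the $O(N^3)$ cost of standard spectral clustering by a factor of roughly $(70{,}000/1{,}182)^3 \approx 2 \times 10^{5}$.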

4 Experiment

Experiments are performed using MATLAB R2020b running on a laptop. The reported results are averaged over multiple runs. The node reduction scheme implementation is available at https://github.com/cornell-zhang/GraphZoom/tree/master/mat_coarsen.

4.1 Experiment Setup

One mid-sized data set (USPS), one large data set (MNIST) and one very large data set (Covtype) are used in our experiments. They can be downloaded from the UCI machine learning repository (https://archive.ics.uci.edu/ml/) and LibSVM Data (https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/). USPS includes 9,298 images of USPS handwritten digits, each with 256 attributes; MNIST, a data set from Yann LeCun's website (http://yann.lecun.com/exdb/mnist/), includes 70,000 images, each represented by 784 attributes; Covtype includes 581,012 instances for predicting forest cover type from cartographic variables, where each instance has 54 attributes and belongs to one of seven classes.

We compare the proposed method against both the baseline and state-of-the-art fast spectral clustering methods: (1) the standard spectral clustering algorithm [10]; (2) the Nyström method [3]; (3) the landmark-based spectral clustering (LSC) method that uses random sampling for landmark selection [2]; and (4) the KASP method, which uses k-means for centroid selection [13]. For a fair comparison, we use the same parameter setting as in [2] for the compared algorithms: the number of sampled points in the Nyström method (equivalently, the number of landmarks in LSC or the number of centroids in KASP) is set to 500.

4.2 Evaluation Metrics

Clustering accuracy (ACC) is the most widely used measure of clustering quality [2, 4]. It is defined as follows:

$ACC = \dfrac{\sum_{i=1}^{n} \delta\big(y_i, \mathrm{map}(c_i)\big)}{n}$,   (4)

where $n$ is the number of data samples, $y_i$ is the ground-truth label, and $c_i$ is the label generated by the algorithm. $\delta$ is a delta function defined as $\delta(x, y) = 1$ for $x = y$ and $\delta(x, y) = 0$ otherwise, and $\mathrm{map}(\cdot)$ is a permutation function that can be realized using the Hungarian algorithm [9]. A higher value of ACC indicates better clustering quality.
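The permutation can be computed with the Hungarian algorithm as implemented in SciPy; the sketch below is one possible realization of Eq. (4), assuming integer labels in {0, ..., k-1}.

```python
# Sketch: clustering accuracy (Eq. 4) using the label permutation found by the
# Hungarian algorithm (SciPy's linear_sum_assignment).
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    k = int(max(y_true.max(), y_pred.max())) + 1

    # Contingency matrix: C[i, j] = number of samples with predicted label i
    # and ground-truth label j.
    C = np.zeros((k, k), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        C[p, t] += 1

    # Permutation of predicted labels that maximizes the number of matches.
    row_ind, col_ind = linear_sum_assignment(-C)
    return C[row_ind, col_ind].sum() / y_true.size
```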

4.3 Experimental Results


Data Set   Standard SC   Nyström   KASP    LSC     Ours (5X)   Ours (10X)   Ours (50X)
USPS       64.31         69.31     70.62   66.28   70.65       72.25        81.39
MNIST      64.20         55.86     71.23   58.94   70.99       72.42        82.18
Covtype    48.81         33.24     27.56   22.60   48.77       44.50        48.28

Table 1: Spectral clustering accuracy (%); Ours (5X/10X/50X) denotes our method at the corresponding node reduction ratio.

Data Set   Standard SC   Nyström   KASP   LSC    Ours (5X)   Ours (10X)   Ours (50X)
USPS       0.72          0.29      0.16   0.22   0.21        0.18         0.17
MNIST      252.59        0.95      0.18   0.49   0.59        0.35         0.21
Covtype    128.70        5.48      0.70   3.77   0.92        0.50         0.49

Table 2: Runtime (seconds)
Data Set   Original   5X        10X      50X
USPS       9,298      1,692     767      138
MNIST      70,000     13,658    6,368    1,182
Covtype    581,012    104,260   44,188   8,192

Table 3: Graph complexity comparison (number of nodes at each reduction ratio)
Figure 3: ACC vs. reduction ratio for the MNIST data set.
Figure 4: Graph size vs. reduction ratio for the MNIST data set.

Table 1 shows the clustering accuracy of the compared methods and of our method with node reduction ratios of 5X, 10X and 50X. The runtimes of the eigen-decomposition and k-means steps are reported in Table 2. As observed, the proposed method consistently leads to dramatic performance improvements, beating all competing approximation methods in clustering accuracy across all three data sets: our method achieves more than 10% accuracy gain on USPS and MNIST over the second-best methods and more than 15% gain on Covtype over the best competing approximation method; for the USPS and MNIST data sets, our method achieves over 17% accuracy gain over the standard spectral clustering method. The superior clustering results of our method clearly illustrate that spectrum-preserving concise representation can improve clustering accuracy by removing redundant or false relations among data samples.

As shown in Table 3, the reduced graphs generated by our framework have only 138, 1,182 and 8,192 nodes (at 50X reduction) for the USPS, MNIST and Covtype data sets, respectively, thereby allowing much faster eigen-decompositions. As shown in Fig. 3 and Fig. 4, with increasing node reduction ratio, the proposed method consistently produces high clustering accuracy. For the very large Covtype data set, our method is the only one that can accelerate the spectral clustering algorithm without loss of clustering accuracy.

5 Conclusion

We presented a novel framework for aggressively accelerating eigen-decomposition and improving accuracy in spectral clustering. We use spectrum-preserving node reduction to reduce the data size and to remove redundant relations among samples. Experimental results demonstrate that our method outperforms state-of-the-art methods by a large margin.

References

  • [1] F. Bach and M. Jordan (2004) Learning spectral clustering. Advances in Neural Information Processing Systems 16 (2), pp. 305–312. Cited by: §1.
  • [2] X. Chen and D. Cai (2011) Large scale spectral clustering with landmark-based representation. In AAAI. Cited by: §1, §4.1, §4.2.
  • [3] C. Fowlkes, S. Belongie, F. Chung, and J. Malik (2004) Spectral grouping using the Nyström method. IEEE Transactions on Pattern Analysis and Machine Intelligence 26 (2), pp. 214–225. Cited by: §1, §4.1.
  • [4] J. Liu, C. Wang, M. Danilevsky, and J. Han (2013) Large-scale spectral clustering on graphs. In Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence, pp. 1486–1492. Cited by: §1, §4.2.
  • [5] O. E. Livne and A. Brandt (2012) Lean algebraic multigrid (LAMG): fast graph Laplacian linear solver. SIAM Journal on Scientific Computing 34 (4), pp. B499–B522. Cited by: §3.1.
  • [6] A. Y. Ng, M. I. Jordan, Y. Weiss, et al. (2002) On spectral clustering: analysis and an algorithm. Advances in Neural Information Processing Systems 2, pp. 849–856. Cited by: §2.
  • [7] F. Nie, C. Wang, and X. Li (2019) K-multiple-means: a multiple-means clustering method with specified k clusters. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 959–967. Cited by: §2.
  • [8] F. Nie, D. Xu, I. W. Tsang, and C. Zhang (2009) Spectral embedded clustering. In Twenty-First International Joint Conference on Artificial Intelligence. Cited by: §1.
  • [9] C. H. Papadimitriou and K. Steiglitz (1982) Combinatorial Optimization: Algorithms and Complexity. Courier Corporation. Cited by: §4.2.
  • [10] U. Von Luxburg (2007) A tutorial on spectral clustering. Statistics and Computing 17 (4), pp. 395–416. Cited by: §2, §4.1.
  • [11] X. Wan and C. Kuo (1998) A new approach to image retrieval with hierarchical color clustering. IEEE Transactions on Circuits and Systems for Video Technology 8 (5), pp. 628–643. Cited by: §1.
  • [12] Y. Wang (2021) High performance spectral methods for graph-based machine learning. Ph.D. Thesis, Michigan Technological University. Cited by: §2, §3.1.
  • [13] D. Yan, L. Huang, and M. I. Jordan (2009) Fast approximate spectral clustering. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 907–916. Cited by: §1, §4.1.
  • [14] Z. Zhao, Y. Zhang, and Z. Feng (2021) Towards scalable spectral embedding and data visualization via spectral coarsening. In Proceedings of the 14th ACM International Conference on Web Search and Data Mining, pp. 869–877. Cited by: §1, §3.1.