Towards Scalable Spectral Clustering via Spectrum-Preserving Sparsification

10/12/2017
by   Yongyu Wang, et al.
Michigan Technological University

The eigendecomposition of nearest-neighbor (NN) graph Laplacian matrices is the main computational bottleneck in spectral clustering. In this work, we introduce a highly-scalable, spectrum-preserving graph sparsification algorithm that makes it possible to build ultra-sparse NN (u-NN) graphs with guaranteed preservation of the original graph spectra, such as the first few eigenvectors of the original graph Laplacian. Our approach immediately leads to scalable spectral clustering of large data networks without sacrificing solution quality. The proposed method starts by constructing low-stretch spanning trees (LSSTs) from the original graphs, and then iteratively recovers small portions of "spectrally critical" off-tree edges to the LSSTs by leveraging a spectral off-tree embedding scheme. To determine the suitable amount of off-tree edges to be recovered, an eigenvalue stability checking scheme is proposed, which ensures robust preservation of the first few Laplacian eigenvectors within the sparsified graph. Additionally, an incremental graph densification scheme is proposed for identifying extra edges that are missing from the original NN graphs but can still play important roles in spectral clustering tasks. Our experimental results for a variety of well-known data sets show that the proposed method can dramatically reduce the complexity of NN graphs, leading to significant speedups in spectral clustering.

1 Introduction

Data clustering and graph partitioning are playing increasingly important roles in many compute-intensive applications related to scientific computing, data mining, machine learning, image processing, etc. Among existing data clustering and graph partitioning methods, spectral methods have gained great attention in recent years [1, 2, 3, 4]; they typically involve solving eigenvalue decomposition problems associated with graph Laplacians. For example, classical spectral clustering or partitioning algorithms leverage the first few nontrivial eigenvectors corresponding to the smallest nonzero eigenvalues of graph Laplacians for low-dimensional embedding, which is followed by a k-means clustering procedure that usually leads to high-quality data clustering (graph partitioning) results. Although spectral methods have many advantages, such as easy implementation, good solution quality and rigorous theoretical foundations [5, 3, 4], the high computational cost (e.g., memory and runtime) of the eigenvalue decomposition procedure can immediately hinder their application in emerging big data (graph) analytics tasks [6].

To address the computational bottleneck of data clustering and graph partitioning methods, recent research efforts aim to reduce the complexity of the original data network (graph Laplacian) through various kinds of approximations: k-nearest-neighbor (k-NN) graphs maintain k nearest neighbors for each node, whereas ε-neighborhood graphs keep only the neighbors within distance ε [7]; a sampling-based approach for affinity matrix approximation using the Nyström method has been introduced [8, 9], and its error analysis is presented in [10, 11]; a landmark-based method for representing the original data points has been introduced in [12]; [13] proposed a general framework for fast approximate spectral clustering by collapsing the original data points into a small number of centroids using k-means or random-projection trees; [14] introduced a method for compressing the original graph into a sparse bipartite graph by generating a small number of "supernodes"; and a graph sparsification method using a similarity-based heuristic has been proposed for scalable clustering [15]. However, none of these approximation methods can efficiently and robustly preserve the spectra of the original graphs, and they may therefore lead to degraded or misleading clustering results.

Recent spectral graph sparsification research makes it possible to compute nearly-linear-sized sparsifiers (sparsifiers whose number of edges is similar to their number of nodes) that can well preserve the spectrum (eigenvalues and eigenvectors) of the original graph (Laplacian), which immediately leads to a series of "theoretically nearly-linear" numerical and graph algorithms for solving sparse matrices, graph-based semi-supervised learning (SSL), as well as spectral graph (data) partitioning (clustering) and max-flow problems [16, 17, 18, 19, 20, 21, 22, 23, 24]. For instance, sparsified transportation networks allow developing more scalable navigation (routing) algorithms in large transportation systems; sparsified social networks make it possible to more effectively understand and predict information propagation phenomena in large social networks; sparsified data networks allow big data networks to be stored, partitioned (clustered) and analyzed more efficiently; and sparsified matrices can be leveraged to accelerate the solution of large linear systems of equations.

Inspired by recent progress in the development of efficient spectral graph sparsification methods [25, 22, 26], we propose a novel spectrum-preserving graph sparsification method for constructing ultra-sparse nearest-neighbor (u-NN) graphs that can immediately lead to highly-scalable spectral clustering without loss of accuracy. The key contributions of this work include:

  1. In contrast to existing graph approximation approaches, the proposed method directly preserves the key spectral (structural) properties of the original graph within nearly-linear-sized u-NN graphs for scalable spectral clustering.

  2. A novel spectral edge embedding scheme as well as a robust eigenvalue stability checking procedure are introduced for iteratively recovering small portions of off-tree edges to the low-stretch spanning tree (LSST), which can dramatically improve the spectral similarity of the sparsified graph.

  3. An incremental graph densification procedure based on an efficient spectral graph embedding scheme is proposed for adding extra (new) edges that are missing from the original k-NN graph but can still be critical for high-quality spectral clustering tasks.

2 Preliminaries

2.1 Spectral clustering algorithms.

Spectral clustering methods can often outperform traditional clustering algorithms such as k-means [27]. Consider a data graph G = (V, E, w), where V and E denote the graph vertex and edge sets, respectively, while w denotes a weight (similarity) function that assigns positive weights to all edges. The symmetric diagonally dominant (SDD) Laplacian matrix L_G of graph G can be constructed as follows:

L_G(p, q) = \begin{cases} -w(p, q), & \text{if } (p, q) \in E, \\ \sum_{(p, t) \in E} w(p, t), & \text{if } p = q, \\ 0, & \text{otherwise.} \end{cases}    (1)

Spectral clustering methods typically include the following three steps: 1) construct a Laplacian matrix according to the entire data set; 2) embed the data points into k-dimensional space using the first k nontrivial eigenvectors of the graph Laplacian; 3) apply the k-means algorithm to partition the embedded data points into different clusters. Existing spectral clustering algorithms can be very computationally expensive for handling large-scale data sets due to the very costly procedure for computing Laplacian eigenvectors.
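To make these steps concrete, the following minimal Python sketch (using NumPy, SciPy and scikit-learn, with a simple Gaussian-weighted k-NN affinity as a stand-in for the affinity construction detailed in Section 4.3) implements the standard pipeline on toy data:

```python
import numpy as np
from scipy.sparse import csgraph
from scipy.sparse.linalg import eigsh
from sklearn.cluster import KMeans
from sklearn.neighbors import kneighbors_graph

def spectral_clustering(X, n_clusters, n_neighbors=10):
    # Step 1: build a k-NN affinity graph and its (unnormalized) Laplacian.
    A = kneighbors_graph(X, n_neighbors=n_neighbors, mode='distance')
    A.data = np.exp(-A.data ** 2 / (2.0 * np.median(A.data) ** 2))   # Gaussian weights
    A = 0.5 * (A + A.T)                            # symmetrize the k-NN graph
    L = csgraph.laplacian(A, normed=False)

    # Step 2: embed the points with the first k nontrivial Laplacian eigenvectors.
    vals, vecs = eigsh(L.asfptype(), k=n_clusters + 1, which='SM')
    embedding = vecs[:, 1:]                        # drop the first (trivial) eigenvector

    # Step 3: run k-means on the embedded points.
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(embedding)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(c, 0.4, size=(60, 2)) for c in (0.0, 2.0, 4.0)])
    print(spectral_clustering(X, n_clusters=3)[:12])
```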

2.2 Spectral graph sparsification.

Graph sparsification aims to find a graph proxy (sparsifier) P that has the same set of vertices as the original graph G but many fewer edges. In general, there are two types of graph sparsification methods. Cut sparsification methods preserve graph cuts through random sampling of edges [29], whereas spectral sparsification methods preserve graph spectral properties, such as the eigenvalues and eigenvectors of the Laplacian, and are thus more powerful than cut sparsification methods. Since preserving the first (bottom) few eigenvectors of the graph Laplacian within the graph sparsifier is key to spectral clustering tasks, this work focuses only on spectral sparsification of graphs. The Laplacian quadratic form is defined as

x^\top L_G\, x = \sum_{(p, q) \in E} w_{p,q}\, \big( x(p) - x(q) \big)^2,    (2)

where x is a real vector of dimension |V|.

Figure 1: From a spanning tree to a spectral sparsifier.

We say two graphs G and P are σ-spectrally similar if the following holds for all real vectors x:

\frac{x^\top L_P\, x}{\sigma} \;\le\; x^\top L_G\, x \;\le\; \sigma\, x^\top L_P\, x.    (3)

By defining the relative condition number to be:

\kappa(L_G, L_P) = \frac{\lambda_{\max}}{\lambda_{\min}},    (4)

where λ_max and λ_min denote the largest and smallest nonzero generalized eigenvalues satisfying:

L_G\, u = \lambda\, L_P\, u,    (5)

with u corresponding to a generalized eigenvector of (L_G, L_P). It can then be shown that:

\kappa(L_G, L_P) \;\le\; \sigma^2,    (6)

which indicates that a smaller relative condition number corresponds to a higher spectral similarity of two graphs.
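As a small illustration of the above definitions, the relative condition number of a toy graph and its spanning tree can be estimated from the nonzero eigenvalues of L_P^+ L_G; the dense pseudo-inverse used below is only practical for tiny examples and is not part of the proposed algorithm:

```python
import numpy as np

def laplacian(n, weighted_edges):
    """Dense graph Laplacian built from a list of (u, v, weight) tuples."""
    L = np.zeros((n, n))
    for u, v, w in weighted_edges:
        L[u, u] += w; L[v, v] += w
        L[u, v] -= w; L[v, u] -= w
    return L

def relative_condition_number(L_G, L_P, tol=1e-9):
    """kappa(L_G, L_P): ratio of the largest to the smallest nonzero eigenvalue
    of L_P^+ L_G (dense pseudo-inverse, toy-sized graphs only)."""
    lam = np.linalg.eigvals(np.linalg.pinv(L_P) @ L_G).real
    lam = np.sort(lam[lam > tol])
    return lam[-1] / lam[0]

if __name__ == "__main__":
    # A 4-node cycle versus one of its spanning trees (a 4-node path).
    cycle = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (3, 0, 1.0)]
    path = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0)]
    print("kappa =", relative_condition_number(laplacian(4, cycle), laplacian(4, path)))
```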

A recent nearly-linear-time spectral graph sparsification algorithm has been proposed to dramatically reduce the relative condition number via the following two steps [25], which is also illustrated in Fig. 1: 1) extract a low-stretch spanning tree (LSST) from the original graph; 2) recover a small portion of "spectrally critical" off-tree edges to the LSST to form the spectral sparsifier. It has been shown that a σ-similar spectral sparsifier with a bounded number of off-tree edges can be obtained in almost-linear time based on the LSST [30].

3 Spectrally-Sparsified Spectral Clustering

3.1 Overview of this work.

An overview of the proposed method is shown in Fig. 2. For a given input data set, an LSST is first extracted based on its original NN graph. Next, spectral (generalized eigenvalue) embedding and ranking of off-tree edges are performed by leveraging the generalized eigenvalue perturbation analysis framework of [25]. Then small portions of "spectrally critical" off-tree edges are iteratively selected and recovered to the LSST to dramatically improve the approximation quality of the spectral sparsifier. To determine the suitable amount of off-tree edges needed for high-quality spectral clustering tasks, we propose an effective scheme for checking the stability of the first few eigenvalues and eigenvectors, which ensures good preservation of the original graph spectra. Additional edges that are missing from the original NN graph are then added to the sparsifier by performing an efficient spectral graph embedding procedure on the latest spectral sparsifier.

Figure 2: Scalable spectral clustering via spectrum-preserving sparsification

3.2 Spanning-tree spectral sparsifier.

We assume that the original graph G = (V, E, w) is weighted, undirected and connected, whereas P denotes its graph sparsifier. We start by analyzing the spectral similarity between a given graph G and its spanning-tree sparsifier P. The stretch of an edge e = (u, v) ∈ E with respect to the spanning tree P can be defined as:

\mathrm{st}_P(e) = w(e) \sum_{f \in \mathrm{path}_P(u, v)} \frac{1}{w(f)},    (7)

where path_P(u, v) denotes the set of tree edges forming the unique path in P that connects the endpoints u and v of e. The total stretch of G with respect to P is defined as:

\mathrm{st}_P(G) = \sum_{e \in E} \mathrm{st}_P(e).    (8)

The total stretch st_P(G) is a good measure of the overall distortion due to the spanning-tree approximation. Denoting the Moore–Penrose pseudo-inverse of L_P by L_P^+, and the descending eigenvalues of L_P^+ L_G by λ_1 ≥ λ_2 ≥ ... ≥ λ_n, recent work shows that the trace of L_P^+ L_G is equal to the total stretch [31]:

\mathrm{Tr}\big(L_P^{+} L_G\big) = \sum_{i=1}^{n} \lambda_i = \mathrm{st}_P(G).    (9)

Recent theoretical computer science (TCS) research has shown that every undirected graph has an LSST whose total stretch is nearly linear in the number of edges [18]. It has also been shown that there are not many large generalized eigenvalues: L_P^+ L_G has at most k eigenvalues greater than st_P(G)/k [31]. Consequently, a good spectral sparsifier can be obtained by recovering a small number of off-tree edges into the spanning tree so as to dramatically reduce the largest generalized eigenvalues.
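The sketch below illustrates how the edge stretches in (7) and the total stretch in (8) could be computed for a small graph. Note that it uses a simple minimum-resistance spanning tree as a stand-in for the LSST of [30] on which the proposed method actually relies, and obtains tree-path resistances with Dijkstra on the tree using edge lengths 1/w:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import minimum_spanning_tree, dijkstra

def spanning_tree_stretches(A):
    """Stretch of every edge of A with respect to a spanning tree, per (7)-(8).
    A is a symmetric sparse adjacency (similarity) matrix with positive weights."""
    # Edge "resistances" 1/w; the minimum spanning tree of the resistance graph
    # is a crude stand-in for the low-stretch spanning tree of [30].
    R = A.copy().tocsr()
    R.data = 1.0 / R.data
    T = minimum_spanning_tree(R)
    T = T + T.T                                  # symmetric tree (stored as resistances)

    dist = dijkstra(T, directed=False)           # all-pairs tree-path resistances

    stretches = {}
    rows, cols = A.nonzero()
    for u, v in zip(rows, cols):
        if u < v:
            stretches[(u, v)] = A[u, v] * dist[u, v]   # st(e) = w(e) * sum_path 1/w(f)
    return stretches

if __name__ == "__main__":
    # A weighted 5-node graph: a 4-cycle with a chord plus a pendant node.
    A = csr_matrix(np.array([[0, 2, 0, 1, 1],
                             [2, 0, 2, 1, 0],
                             [0, 2, 0, 2, 0],
                             [1, 1, 2, 0, 0],
                             [1, 0, 0, 0, 0]], dtype=float))
    st = spanning_tree_stretches(A)
    print("total stretch:", sum(st.values()))
```

Tree edges always have stretch 1 under this definition, so the total stretch essentially measures how poorly the tree paths substitute for the recovered off-tree edges.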

3.3 Spectral embedding of off-tree edges.

A practically efficient nearly-linear-time algorithm for constructing σ-similar spectral sparsifiers by recovering off-tree edges has been proposed in [25]. To identify "spectrally critical" off-tree edges for adding to the LSST, the following generalized eigenvalue perturbation analysis is adopted:

L_G\, (u_i + \delta u_i) = (\lambda_i + \delta\lambda_i)\, (L_P + \delta L_P)\, (u_i + \delta u_i),    (10)

where a perturbation δL_P that includes the candidate off-tree edges is applied to L_P, resulting in perturbed generalized eigenvalues λ_i + δλ_i and eigenvectors u_i + δu_i, respectively. The key to effective spectral sparsification is to identify the off-tree edges that will result in the greatest reduction of the largest generalized eigenvalues via the following spectral embedding steps:
Step 1: Start with an initial random vector h_0 orthogonal to the all-ones vector, written as:

h_0 = \sum_{i=1}^{n} \alpha_i\, u_i,    (11)

where u_i for i = 1, ..., n are the L_P-orthogonal generalized eigenvectors of (L_G, L_P) satisfying:

L_G\, u_i = \lambda_i\, L_P\, u_i, \qquad u_i^\top L_P\, u_j = \begin{cases} 1, & i = j, \\ 0, & i \neq j. \end{cases}    (12)

Step 2: Perform t-step generalized power iterations with h_0:

h_t = \big(L_P^{+} L_G\big)^{t} h_0 = \sum_{i=1}^{n} \alpha_i\, \lambda_i^{t}\, u_i.    (13)

Step 3: Compute the Laplacian quadratic form of δL_P with h_t:

h_t^\top\, \delta L_P\, h_t = \sum_{(p, q) \in E \setminus E_P} w_{p,q}\, \big( e_{p,q}^\top h_t \big)^2,    (14)

where E \ E_P includes all off-tree edges, and e_{p,q} is a vector with only the p-th element being 1, the q-th element being -1, and all other elements being 0. (14) reflects the spectral similarity between graphs G and P: greater values indicate a larger relative condition number and thus lower spectral similarity. More importantly, (14) allows embedding the generalized eigenvalues into the Laplacian quadratic form of each off-tree edge and subsequently ranking off-tree edges according to their "spectral criticality" levels. Recovering the off-tree edges with the largest quadratic-form values is highly likely to significantly impact the largest generalized eigenvalues. It should be noted that the required number of generalized power iterations can be rather small in practice for achieving good spectral embedding results.
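A minimal sketch of Steps 1-3 under the notation above: the t-step generalized power iteration h_t = (L_P^+ L_G)^t h_0 is realized here by repeatedly solving a slightly regularized tree Laplacian, and the off-tree edges are then ranked by their quadratic-form contributions w_{p,q}(h_t(p) - h_t(q))^2:

```python
import numpy as np
from scipy.sparse import csgraph, identity
from scipy.sparse.linalg import factorized

def rank_off_tree_edges(A_graph, A_tree, t=2, seed=0):
    """Rank off-tree edges of A_graph (edges not present in A_tree) by the
    quadratic-form score in (14); both inputs are symmetric sparse adjacency
    matrices with positive weights."""
    n = A_graph.shape[0]
    L_G = csgraph.laplacian(A_graph, normed=False).tocsc()
    L_T = csgraph.laplacian(A_tree, normed=False).tocsc()

    # Factor the tree Laplacian once; a tiny diagonal shift handles its null space.
    solve_tree = factorized(L_T + 1e-8 * identity(n, format='csc'))

    rng = np.random.default_rng(seed)
    h = rng.standard_normal(n)
    h -= h.mean()                       # start orthogonal to the all-ones vector
    for _ in range(t):                  # t-step generalized power iteration:
        h = solve_tree(L_G @ h)         #   h <- L_T^+ (L_G h)
        h -= h.mean()

    # Spectral criticality score of each off-tree edge: w_pq * (h(p) - h(q))^2.
    scores = {}
    rows, cols = A_graph.nonzero()
    for p, q in zip(rows, cols):
        if p < q and A_tree[p, q] == 0:
            scores[(p, q)] = A_graph[p, q] * (h[p] - h[q]) ** 2
    return sorted(scores.items(), key=lambda kv: -kv[1])
```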

3.4 Preservation of bottom eigenvalues

It is important to ensure that recovering the most "spectrally critical" off-tree edges identified by (14) always effectively improves the preservation of the key (bottom) eigenvalues and eigenvectors within the sparsified Laplacian. To this end, the following theoretical analysis is provided.

A "spectrally unique" off-tree edge (p_i, q_i) is defined to be an edge that connects vertices p_i and q_i and only impacts a single large generalized eigenvalue λ_i, even though each off-tree edge usually influences more than one eigenvalue or eigenvector according to (14). Then the truncated expansion of the Laplacian quadratic form in (14), retaining only the top dominant "spectrally unique" off-tree edges for fixing the largest eigenvalues of L_P^+ L_G, can be written as:

(15)

Since each off-tree edge only impacts one generalized eigenvalue, the following is true according to (14):

(16)

The effective resistance of edge (p_i, q_i) then becomes:

(17)

which immediately leads to:

(18)

Consequently, the most "spectrally critical" off-tree edges identified by (14) or (18) will have the largest stretch values and will therefore immediately impact the largest eigenvalues of L_P^+ L_G. In fact, (18) can be regarded as a randomized version of the trace in (9), with each off-tree edge term scaled up by an edge-dependent factor.

Denote the descending eigenvalues and the corresponding unit-length, mutually-orthogonal eigenvectors of L_G by ζ_1 ≥ ζ_2 ≥ ... ≥ ζ_n and ω_1, ..., ω_n, and similarly denote the eigenvalues and eigenvectors of L_P by τ_1 ≥ τ_2 ≥ ... ≥ τ_n and μ_1, ..., μ_n. Then the following spectral decompositions of L_G and L_P always hold:

L_G = \sum_{i=1}^{n} \zeta_i\, \omega_i\, \omega_i^\top, \qquad L_P = \sum_{i=1}^{n} \tau_i\, \mu_i\, \mu_i^\top,    (19)

which leads to the following trace of L_P^+ L_G:

\mathrm{Tr}\big(L_P^{+} L_G\big) = \sum_{\tau_j \neq 0} \sum_{i=1}^{n} \frac{\zeta_i}{\tau_j}\, \epsilon_{i,j},    (20)

where the coupling coefficient ε_{i,j} = (ω_i^T μ_j)^2 between the i-th eigenvector of L_G and the j-th eigenvector of L_P satisfies:

0 \le \epsilon_{i,j} \le 1, \qquad \sum_{j=1}^{n} \epsilon_{i,j} = 1.    (21)

According to (18) and (20), the most "spectrally critical" off-tree edges identified by (14) will impact the largest eigenvalues of L_P^+ L_G as well as the bottom (smallest nonzero) eigenvalues of the sparsified Laplacian L_P, since the smallest τ_j values directly contribute to the largest components in the trace of L_P^+ L_G. This fact makes it possible to recover small portions of the most "spectrally critical" off-tree edges into the LSST for preserving the key spectral graph properties within the sparsified graph.
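The tiny script below numerically checks the trace expansion as reconstructed in (20) and (21) by comparing Tr(L_P^+ L_G), computed directly, against the double sum over eigen-pairs for a toy graph and one of its spanning trees (dense eigendecompositions, so only suitable for small examples):

```python
import numpy as np

def laplacian(n, edges):
    """Dense graph Laplacian from a list of (u, v, weight) tuples."""
    L = np.zeros((n, n))
    for u, v, w in edges:
        L[u, u] += w; L[v, v] += w; L[u, v] -= w; L[v, u] -= w
    return L

# A 5-node graph and one of its spanning trees.
L_G = laplacian(5, [(0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 4, 1), (4, 0, 1), (1, 3, 2)])
L_P = laplacian(5, [(0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 4, 1)])

zeta, omega = np.linalg.eigh(L_G)        # eigen-pairs of the original Laplacian
tau, mu = np.linalg.eigh(L_P)            # eigen-pairs of the sparsified Laplacian

direct = np.trace(np.linalg.pinv(L_P) @ L_G)

# Double-sum expansion over the nonzero eigenvalues of L_P, as in (20).
expansion = 0.0
for j in range(5):
    if tau[j] > 1e-9:
        for i in range(5):
            expansion += (zeta[i] / tau[j]) * (omega[:, i] @ mu[:, j]) ** 2

print(direct, expansion)   # the two values should agree up to round-off
```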

3.5 Criteria for selecting off-tree edges.

We propose to iteratively recover small portions of the top off-tree edges while checking the stability of the bottom eigenvalues of the sparsified Laplacian. We stop adding off-tree edges when the bottom eigenvalues become sufficiently stable and output the final spectral sparsifier for spectral clustering purposes.

3.5.1 Eigen-stability checking.

We propose a novel method for checking the stability of the bottom eigenvalues of the sparsified Laplacian. Our approach proceeds as follows: 1) in each iteration for recovering off-tree edges, we compute and record the several smallest eigenvalues of the latest sparsified Laplacian, for example, the bottom k eigenvalues that are critical for spectral clustering tasks; 2) we determine whether more off-tree edges should be recovered by comparing these eigenvalues with those computed in the previous iteration: if the change of eigenvalues is significant, more off-tree edges should be added to the current sparsifier. More specifically, we store the bottom k eigenvalues computed in the previous and current iterations into vectors v_prev and v_curr, respectively, and calculate the eigenvalue variation ratio as:

\mathrm{ratio} = \frac{\lVert v_{\mathrm{curr}} - v_{\mathrm{prev}} \rVert}{\lVert v_{\mathrm{prev}} \rVert}.    (22)

A greater eigenvalue variation ratio indicates less stable eigenvalues within the latest sparsified graph Laplacian, and thus justifies another iteration to allow adding more ”spectrally-critical” off-tree edges into the sparsifier.
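A minimal sketch of the stability check, assuming the variation ratio in (22) is taken as the relative change between the bottom-k eigenvalue vectors of two consecutive sparsifiers, with eigenvalues computed under a loose tolerance as suggested in Section 3.5.2 (the add_top_off_tree_edges helper in the usage comment is hypothetical):

```python
import numpy as np
from scipy.sparse import csgraph
from scipy.sparse.linalg import eigsh

def bottom_eigenvalues(A, k, tol=1e-2):
    """Bottom k nonzero Laplacian eigenvalues of adjacency A (loose tolerance)."""
    L = csgraph.laplacian(A, normed=False).asfptype()
    vals = eigsh(L, k=k + 1, which='SM', tol=tol, return_eigenvectors=False)
    return np.sort(vals)[1:]            # drop the (near-)zero trivial eigenvalue

def variation_ratio(prev_vals, curr_vals):
    """Eigenvalue variation ratio between two consecutive sparsification iterations."""
    return np.linalg.norm(curr_vals - prev_vals) / np.linalg.norm(prev_vals)

# Usage inside the edge-recovery loop (sketch; add_top_off_tree_edges is a
# hypothetical helper standing in for Section 3.3's edge recovery step):
#   prev = bottom_eigenvalues(sparsifier, k)
#   while True:
#       sparsifier = add_top_off_tree_edges(sparsifier, graph)
#       curr = bottom_eigenvalues(sparsifier, k)
#       if variation_ratio(prev, curr) < threshold:
#           break
#       prev = curr
```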

3.5.2 Inexact yet fast eigenvalue computation.

The software package ARPACK has become the standard solver for practical large-scale eigenvalue problems [32]. As shown in [6], ARPACK employs an implicitly restarted Arnoldi process whose number of steps per restart cycle is bounded by the Arnoldi length, which is empirically set proportional to the number of desired eigenvalues for sparse matrices. Since the cost of each iteration is dominated by a sparse matrix-vector product, the overall runtime of the ARPACK solver grows with the number of data points, the number of nearest neighbors (which determines the matrix sparsity), the Arnoldi length and the number of desired eigenvalues, and its memory cost grows with the number of data points and the Arnoldi length.

Algorithms operating on a sparsified Laplacian can dramatically reduce the time and memory cost of the ARPACK solver due to the dramatically reduced number of non-zero matrix entries. To gain even higher efficiency, we propose to quickly compute eigenvalues for stability checking based on an inexact implicitly restarted Arnoldi method [33]. It has been shown that by relaxing the convergence tolerance, the total inner iteration count can be significantly reduced, while the inner iteration cost can be further reduced by using subspace recycling with an iterative linear solver.
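As a rough illustration of the inexact-computation idea (not the subspace-recycling solver of [33]), the smallest nonzero Laplacian eigenvalue can be estimated by inverse power iteration in which each inner solve is deliberately truncated to a few conjugate-gradient iterations:

```python
import numpy as np
from scipy.sparse import csgraph, identity
from scipy.sparse.linalg import cg
from sklearn.neighbors import kneighbors_graph

def fiedler_value_inexact(A, outer_iters=30, inner_cg_iters=20, seed=0):
    """Estimate the smallest nonzero Laplacian eigenvalue of adjacency A by
    inverse power iteration; each inner solve is a truncated CG run (the
    'inexact' part). Returns a Rayleigh-quotient estimate."""
    L = csgraph.laplacian(A, normed=False).tocsr()
    n = L.shape[0]
    M = L + 1e-6 * identity(n, format='csr')       # tiny shift keeps CG well posed
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    for _ in range(outer_iters):
        x -= x.mean()                               # deflate the all-ones null space
        x, _ = cg(M, x, maxiter=inner_cg_iters)     # loose (truncated) inner solve
        x /= np.linalg.norm(x)
    x -= x.mean(); x /= np.linalg.norm(x)
    return float(x @ (L @ x))

if __name__ == "__main__":
    pts = np.random.default_rng(1).standard_normal((1000, 3))
    A = kneighbors_graph(pts, n_neighbors=10, mode='connectivity')
    A = 0.5 * (A + A.T)
    print("estimated Fiedler value:", fiedler_value_inexact(A))
```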

3.6 Incremental graph densification.

The spectral sparsifier (u-NN graph) computed by the proposed spectral sparsification method can achieve spectral clustering quality similar to that of the original k-NN graph. To further improve clustering accuracy, an incremental graph densification scheme is introduced in this work for identifying extra (new) edges that are missing from the original k-NN graph but can still be critical for spectral clustering purposes.

The proposed graph densification procedure leverages an efficient graph embedding scheme that approximately preserves the distances between nodes, similar to the effective-resistance metric used in [17] for sampling edges. Our graph embedding scheme is motivated by one-dimensional graph embedding (into a line graph) using the Fiedler vector that corresponds to the smallest nonzero eigenvalue of the graph Laplacian. However, it can be too expensive to compute the exact Fiedler vector in practice, since many inverse power iterations may be required. To this end, we propose to perform only a small number of inverse power iterations using multiple random vectors, which can also lead to decent graph embedding results. Our embedding scheme requires solving the sparsified Laplacian with a few random right-hand-side (RHS) vectors, and the solution vectors can subsequently be used for embedding the sparsified graph into a low-dimensional space. As a result, extra "spectrally-critical" edges that connect remote nodes in the sparsifier can be effectively identified and added to the sparsifier if their stretch values are large enough. The extra edges that are missing from the original k-NN graph but included in the latest sparsified graph can significantly improve the spectral clustering accuracy.

In the following, we show that when a one-step inverse power iteration is used for graph embedding, the resulting distance between nodes is very similar to the effective-resistance distance used in [17]. If the Laplacian of an edge (p, q) is given by:

L_{p,q} = w_{p,q}\, e_{p,q}\, e_{p,q}^\top,    (23)

where e_{p,q} can be expressed using the eigenvectors of L_P as:

e_{p,q} = \sum_{i=1}^{n} \beta_i\, \mu_i, \qquad \beta_i = \mu_i^\top e_{p,q},    (24)

then the effective resistance between p and q, using the spectral decomposition in (19), can be written as:

R(p, q) = e_{p,q}^\top L_P^{+}\, e_{p,q} = \sum_{\tau_i \neq 0} \frac{\beta_i^2}{\tau_i}.    (25)

Write a random vector x that is orthogonal to the all-ones vector using the eigenvectors of L_P as:

x = \sum_{\tau_i \neq 0} c_i\, \mu_i.    (26)

Performing a one-step inverse power iteration with x leads to:

y = L_P^{+}\, x = \sum_{\tau_i \neq 0} \frac{c_i}{\tau_i}\, \mu_i.    (27)

If we use y to embed the sparsified graph into a (one-dimensional) line graph, the distance between nodes p and q after embedding becomes:

|y(p) - y(q)| = \big| e_{p,q}^\top\, y \big| = \Big| \sum_{\tau_i \neq 0} \frac{c_i\, \beta_i}{\tau_i} \Big| \;\le\; \sum_{\tau_i \neq 0} \frac{|c_i|\, |\beta_i|}{\tau_i}.    (28)

It can be observed from (25) and (28) that the distance obtained with either graph embedding method is mainly influenced by the smallest nonzero Laplacian eigenvalues of L_P, while there is only a slight difference between the effective-resistance distance (25) and the upper-bound distance obtained by the proposed embedding scheme: one of the β_i factors is replaced by the random factor c_i. When using multiple random vectors and more steps of inverse power iterations, we can effectively reduce the impact of these random factors, and thus achieve decent graph embedding results for identifying the extra "spectrally-critical" edges.
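The following sketch reflects our reading of this embedding procedure: a few inverse power iterations of the sparsified Laplacian are applied to several random vectors, and the resulting coordinates are used to measure how far apart two nodes are, so that remote pairs can be examined as candidate extra edges (candidates would still be filtered by their stretch values, as described above):

```python
import numpy as np
from scipy.sparse import csgraph, identity
from scipy.sparse.linalg import factorized

def spectral_node_embedding(A_sparsifier, n_vectors=5, n_iters=2, seed=0):
    """Embed each node of the sparsified graph into an n_vectors-dimensional
    space by running a few inverse power iterations on random vectors."""
    n = A_sparsifier.shape[0]
    L = csgraph.laplacian(A_sparsifier, normed=False).tocsc()
    solve = factorized(L + 1e-8 * identity(n, format='csc'))   # tiny shift for the null space

    rng = np.random.default_rng(seed)
    coords = []
    for _ in range(n_vectors):
        x = rng.standard_normal(n)
        for _ in range(n_iters):
            x -= x.mean()              # stay orthogonal to the all-ones vector
            x = solve(x)               # one inverse power iteration
            x /= np.linalg.norm(x)
        coords.append(x)
    return np.column_stack(coords)     # row i = embedding of node i

def embedding_distance(emb, p, q):
    """Distance between two nodes in the embedding; large values flag remote
    node pairs that may deserve an extra 'spectrally critical' edge."""
    return np.linalg.norm(emb[p] - emb[q])
```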

It should be noted that, as observed in our extensive experiments, the number of extra edges added in the incremental graph densification procedure is usually small, and thus will not significantly increase the complexity of the sparsifier P.

3.7 Algorithm flow and complexity analysis.

The complete algorithm flow for spectrum-preserving graph sparsification is shown in Algorithm 1. The complexity of the proposed method is analyzed as follows: step 1 for constructing the low-stretch spanning tree (LSST) takes nearly-linear time according to [30]; step 2 can also be finished in almost-linear time for a fixed power-iteration number, since the factorization of the spanning-tree Laplacian can be done in linear time [25]; the cost associated with steps 4-8 for the eigen-stability checking procedure has been analyzed in Section 3.5.2; and the cost associated with steps 9-10 for the incremental graph densification procedure is also low, since the sparsified Laplacian can be solved quickly using either preconditioned iterative or direct methods. Based on the above discussion, we expect the overall algorithm to be highly scalable even for handling very large-scale data sets.

Input: The original nearest-neighbor (NN) graph G and the number of clusters k.
Output: A spectrally-similar u-NN graph P.

1:  Construct an initial LSST from the original graph [30];
2:  Embed off-tree edges via spectral perturbation analysis [25];
3:  Rank off-tree edges by spectral criticality levels (14);
4:  Iteratively add small portions of off-tree edges to the LSST:
5:  while the bottom k eigenvalues are not stable: do
6:     add a small portion of critical off-tree edges;
7:     check the stability of the bottom k eigenvalues with (22);
8:  end while
9:  Perform graph embedding for the sparsifier using a few inverse power iterations with multiple random vectors;
10:  Add extra spectrally-critical edges with large stretches to P;
11:  Return the final u-NN graph for spectral clustering.
Algorithm 1 Spectrum-preserving graph sparsification.

4 Experimental Evaluation

We perform extensive experiments to demonstrate the effectiveness of our proposed method. All the experiments have been performed using C++ (spectral sparsification engine) and MATLAB R2015A running on a Linux machine. The reported results are averaged over multiple runs.

4.1 Data sets.

Several real-world data sets are used in our experiments. COIL-20: a data set containing images of 20 different objects, each of which has 72 normalized gray-scale images. PenDigits: a data set consisting of 7,494 images of handwritten digits from 44 writers, represented using sampled coordinate information. USPS: a data set with 9,298 scanned handwritten digits from envelopes processed by the U.S. Postal Service. MNIST: a data set consisting of 70,000 images of handwritten digits. RCV1: a data set consisting of 193,844 documents of newswire stories in 103 categories [6]. The statistics of these data sets are shown in Table 1.

Data set Size Dimensions Classes
COIL-20 1,440 1,024 20
PenDigits 7,494 16 10
USPS 9,298 256 10
MNIST 70,000 784 10
RCV1 193,844 47,236 103
Table 1: Statistics of the data sets.

4.2 Algorithms for comparison.

We compare our method with the following clustering algorithms: 1) standard spectral clustering (SC) algorithm, 2) Nyström method [8] (MATLAB implementation from [6]), and 3) landmark-based spectral clustering (LSC-K) algorithm in [12].

4.3 Parameter selection.

The number of nearest neighbors k is set to 10 for all data sets, except for the RCV1 data set, for which k is set to 80. We use the following Gaussian kernel as the similarity function for converting the original distance matrix into the affinity matrix:

A_{ij} = \exp\!\left( -\, \frac{\lVert x_i - x_j \rVert^2}{\sigma_i\, \sigma_j} \right),    (29)

We also adopt the self-tuning method [34] to determine the scaling parameters σ_i and σ_j. The sparsification threshold (sparTH) used for recovering off-tree edges is a good indicator of the density of the sparsified graph; it is set to a small value for all our data sets. In these experiments, only a small fraction of extra edges is added in the graph densification procedure.
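A sketch of the affinity construction described above, assuming (29) is the self-tuning Gaussian kernel of [34] with per-point scales σ_i; the neighbor index used to set σ_i (7 below, the default suggested in [34]) is an assumption, since the exact choice is not stated here:

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.neighbors import NearestNeighbors

def knn_affinity(X, k=10, self_tuning_neighbor=7):
    """Symmetric k-NN affinity matrix with self-tuning Gaussian weights:
    A_ij = exp(-||x_i - x_j||^2 / (sigma_i * sigma_j))."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    dist, idx = nn.kneighbors(X)                   # column 0 is the point itself
    sigma = dist[:, min(self_tuning_neighbor, k)]  # local scale per point [34]

    n = X.shape[0]
    rows, cols, vals = [], [], []
    for i in range(n):
        for d, j in zip(dist[i, 1:], idx[i, 1:]):
            rows.append(i); cols.append(j)
            vals.append(np.exp(-d ** 2 / (sigma[i] * sigma[j])))
    A = csr_matrix((vals, (rows, cols)), shape=(n, n))
    return 0.5 * (A + A.T)                         # symmetrize the k-NN graph
```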

4.4 Evaluation metric.

We measure clustering quality with two metrics: clustering accuracy (ACC) and normalized mutual information (NMI) [36], both computed between the clustering results generated by the algorithms and the ground-truth labels provided by the data sets. A higher ACC value indicates better clustering quality. The NMI value lies in the range [0, 1], and a higher NMI value indicates a better match between the algorithm-generated result and the ground-truth result.
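Both metrics are readily computed: NMI is available off the shelf, while clustering accuracy additionally requires the best one-to-one matching between predicted cluster labels and ground-truth classes, obtainable with the Hungarian algorithm. A minimal sketch:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score

def clustering_accuracy(y_true, y_pred):
    """ACC: best one-to-one match between predicted clusters and true classes."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    classes, clusters = np.unique(y_true), np.unique(y_pred)
    cost = np.zeros((clusters.size, classes.size))
    for i, c in enumerate(clusters):
        for j, t in enumerate(classes):
            cost[i, j] = -np.sum((y_pred == c) & (y_true == t))   # maximize overlap
    row, col = linear_sum_assignment(cost)
    return -cost[row, col].sum() / y_true.size

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [1, 1, 0, 0, 2, 2]          # same partition, permuted labels
print("ACC:", clustering_accuracy(y_true, y_pred))              # 1.0
print("NMI:", normalized_mutual_info_score(y_true, y_pred))     # 1.0
```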

4.5 Experimental results.

Table 2 and Fig. 3 show the impact of adding off-tree edges on the stability (variation ratio) of the bottom eigenvalues computed by (22), which verifies the theoretical foundation of the proposed method. Adding extra off-tree edges immediately reduces the variation ratio of the bottom eigenvalues, indicating gradually improved eigenvalue stability. It is also observed that by adding only a small number of off-tree edges to the spanning tree, very good spectral clustering results can be obtained.

Figure 3: Variation ratio of bottom eigenvalues with increasing number of off-tree edges.
sparTH COIL-20 PenDigits USPS MNIST RCV1
0.05 2.24 12.52 28.68 277.60 42.04
0.10 0.66 0.49 0.93 1.15 1.36
0.15 0.15 0.36 0.48 0.43 0.59
0.20 0.15 0.11 0.40 0.21 0.42
0.25 0.13 0.10 0.19 0.18 0.3
0.30 0.11 0.08 0.15 0.10 0.28
Table 2: Stability result of bottom eigenvalues of the sparsified Laplacian after recovering small portions of off-tree edges to the initial spanning tree.
Data Set SC on Original k-NN Nyström LSC-K SC on u-NN
COIL-20 64.5 55.3 75.1 75.2
PenDigits 73.2 73.0 79.3 80.1
USPS 65.4 63.3 65.2 76.7
MNIST 66.1 53.7 67.0 66.8
RCV1 18.4 17.9 17.8 17.1

Table 3: Clustering accuracy results (%).
Data Set SC on Original k-NN Nyström LSC-K SC on u-NN
COIL-20 0.82 0.76 0.87 0.86
PenDigits 0.78 0.75 0.80 0.80
USPS 0.79 0.60 0.78 0.80
MNIST 0.72 0.59 0.71 0.70
RCV1 0.28 0.25 0.24 0.23
Table 4: NMI results.
Data Set SC on Original k-NN Nyström LSC-K SC on u-NN
COIL-20 0.87 0.78 0.73 0.56
PenDigits 0.67 0.39 0.52 0.45
USPS 1.7 0.45 1.53 0.41
MNIST 143 23.2 16.2 2.7
RCV1 - 282.46 276.5 170.3

Table 5: Runtime results (seconds).
Data Set NNZ (orig.) NNZ (spar.)
COIL-20 17624 3210
PenDigits 101216 18836
USPS 136762 24252
MNIST 1043418 178108
RCV1 23718138 503994
Table 6: Affinity matrix density comparison.
Figure 4: The original graph corresponding to the original affinity matrix (USPS).
Figure 5: The spanning tree of the original graph (USPS).
Figure 6: The sparsified graph corresponding to the affinity matrix (USPS) with sparTH=0.1.

Clustering quality results are provided in Table 3 and Table 4. Since it is impossible to run the standard spectral clustering algorithm on the original RCV1 data set due to its extremely large size, its clustering accuracy and NMI results are obtained from [6]; they represent the best results generated by distributed computing systems.

The runtime results are listed in Table 5. They show that for large data sets, the proposed method can dramatically improve runtime efficiency: spectral clustering of the sparsified MNIST data set is over 50X faster than clustering the original data set (2.7 seconds versus 143 seconds); the original RCV1 data set cannot be handled using the original k-NN graph on our server due to memory limitations, while only a few minutes are required for clustering the sparsified data set.

To show the effectiveness of spectral sparsification in reducing graph complexity, we also list the numbers of non-zero (NNZ) elements in the affinity matrices in Table 6. It is observed that the nearly-linear-sized Laplacians can be much smaller than the original ones, leading to dramatically improved memory/storage efficiency for spectral clustering tasks. It is expected that the proposed method will be a key enabler for storing and processing much bigger data sets on more energy-efficient computing platforms, such as FPGAs or even hand-held devices.

We also visualize the original graph, the initial spanning tree and the spectral sparsifier according to their affinity matrices for the USPS data set in Fig. 4, Fig. 5 and Fig. 6, respectively. It is observed that the initial spanning tree is a very poor approximation of the original graph, while adding only a small portion of off-tree edges (sparTH = 0.1) already leads to a good approximation of the original k-NN graph.

5 Conclusions

In this work, we introduce a spectrum-preserving graph sparsification algorithm that makes it possible to build ultra-sparse graph sparsifiers which well preserve the first few eigenvectors of the original graph Laplacian, thereby enabling highly scalable spectral clustering without loss of accuracy. Our method starts by constructing a low-stretch spanning tree (LSST), which is followed by a novel spectral off-tree edge embedding scheme for identifying and recovering a small portion of off-tree edges that are most critical to preserving the bottom eigenvalues and eigenvectors of the original Laplacian. Finally, extra edges are added to the sparsifier via an incremental graph densification procedure to form nearly-linear-sized spectral graph sparsifiers that immediately lead to highly scalable and accurate spectral clustering. Our extensive experimental results on a variety of well-known data sets demonstrate significant speedups over the traditional spectral clustering method without sacrificing clustering quality.

References

  • [1] D. Spielman and S. Teng, “Spectral partitioning works: Planar graphs and finite element meshes,” in Foundations of Computer Science (FOCS), 1996. Proceedings., 37th Annual Symposium on.   IEEE, 1996, pp. 96–105.
  • [2] A. Y. Ng, M. I. Jordan, Y. Weiss et al., “On spectral clustering: Analysis and an algorithm,” Advances in neural information processing systems, vol. 2, pp. 849–856, 2002.
  • [3] P. Kolev and K. Mehlhorn, “A note on spectral clustering,” arXiv preprint arXiv:1509.09188, 2015.
  • [4] R. Peng, H. Sun, and L. Zanetti, “Partitioning well-clustered graphs: Spectral clustering works,” in Proceedings of The 28th Conference on Learning Theory (COLT), 2015, pp. 1423–1455.
  • [5] J. R. Lee, S. O. Gharan, and L. Trevisan, “Multiway spectral partitioning and higher-order cheeger inequalities,” Journal of the ACM (JACM), vol. 61, no. 6, p. 37, 2014.
  • [6] W.-Y. Chen, Y. Song, H. Bai, C.-J. Lin, and E. Y. Chang, “Parallel spectral clustering in distributed systems,” IEEE transactions on pattern analysis and machine intelligence, vol. 33, no. 3, pp. 568–586, 2011.
  • [7] M. Muja and D. G. Lowe, “Scalable nearest neighbor algorithms for high dimensional data,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 11, pp. 2227–2240, 2014.
  • [8] C. Fowlkes, S. Belongie, F. Chung, and J. Malik, “Spectral grouping using the nystrom method,” IEEE transactions on pattern analysis and machine intelligence, vol. 26, no. 2, pp. 214–225, 2004.
  • [9] C. Williams and M. Seeger, “Using the nyström method to speed up kernel machines,” in Proceedings of the 14th annual conference on neural information processing systems, no. EPFL-CONF-161322, 2001, pp. 682–688.
  • [10] A. Choromanska, T. Jebara, H. Kim, M. Mohan, and C. Monteleoni, “Fast spectral clustering via the nyström method,” in International Conference on Algorithmic Learning Theory.   Springer, 2013, pp. 367–381.
  • [11] K. Zhang, I. W. Tsang, and J. T. Kwok, “Improved nyström low-rank approximation and error analysis,” in Proceedings of the 25th international conference on Machine learning.   ACM, 2008, pp. 1232–1239.
  • [12] X. Chen and D. Cai, “Large scale spectral clustering with landmark-based representation.” in AAAI, 2011.
  • [13] D. Yan, L. Huang, and M. I. Jordan, “Fast approximate spectral clustering,” in Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining.   ACM, 2009, pp. 907–916.
  • [14] J. Liu, C. Wang, M. Danilevsky, and J. Han, “Large-scale spectral clustering on graphs,” in Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence. AAAI Press, 2013, pp. 1486–1492.
  • [15] V. Satuluri, S. Parthasarathy, and Y. Ruan, “Local graph sparsification for scalable clustering,” in Proceedings of the 2011 ACM International Conference on Management of data (SIGMOD).   ACM, 2011, pp. 721–732.
  • [16] D. Spielman and S. Teng, “Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems,” in Proc. ACM STOC, 2004, pp. 81–90.
  • [17] D. Spielman and N. Srivastava, “Graph sparsification by effective resistances,” SIAM Journal on Computing, vol. 40, no. 6, pp. 1913–1926, 2011.
  • [18] D. Spielman, “Algorithms, graph theory, and linear equations in laplacian matrices,” in Proceedings of the International Congress of Mathematicians, vol. 4, 2010, pp. 2698–2722.
  • [19] A. Kolla, Y. Makarychev, A. Saberi, and S. Teng, “Subgraph sparsification and nearly optimal ultrasparsifiers,” in Proc. ACM STOC, 2010, pp. 57–66.
  • [20] I. Koutis, G. Miller, and R. Peng, “Approaching Optimality for Solving SDD Linear Systems,” in Proc. IEEE FOCS, 2010, pp. 235–244.
  • [21] W. Fung, R. Hariharan, N. Harvey, and D. Panigrahi, “A general framework for graph sparsification,” in Proc. ACM STOC, 2011, pp. 71–80.
  • [22] D. Spielman and S.-H. Teng, “Spectral sparsification of graphs,” SIAM Journal on Computing, vol. 40, no. 4, pp. 981–1025, 2011.
  • [23] P. Christiano, J. Kelner, A. Madry, D. Spielman, and S. Teng, “Electrical flows, laplacian systems, and faster approximation of maximum flow in undirected graphs,” in Proc. ACM STOC, 2011, pp. 273–282.
  • [24] D. Spielman and S. Teng, “Nearly linear time algorithms for preconditioning and solving symmetric, diagonally dominant linear systems,” SIAM Journal on Matrix Analysis and Applications, vol. 35, no. 3, pp. 835–885, 2014.
  • [25] Z. Feng, “Spectral graph sparsification in nearly-linear time leveraging efficient spectral perturbation analysis,” in Proceedings of the 53rd Annual Design Automation Conference.   ACM, 2016, p. 57.
  • [26] J. Batson, D. Spielman, N. Srivastava, and S.-H. Teng, “Spectral sparsification of graphs: theory and algorithms,” Communications of the ACM, vol. 56, no. 8, pp. 87–94, 2013.
  • [27] U. Von Luxburg, “A tutorial on spectral clustering,” Statistics and computing, vol. 17, no. 4, pp. 395–416, 2007.
  • [28] J. Shi and J. Malik, “Normalized cuts and image segmentation,” IEEE Transactions on pattern analysis and machine intelligence, vol. 22, no. 8, pp. 888–905, 2000.
  • [29] A. A. Benczúr and D. R. Karger, “Approximating s-t minimum cuts in Õ(n²) time,” in Proceedings of the twenty-eighth annual ACM symposium on Theory of computing. ACM, 1996, pp. 47–55.
  • [30] I. Abraham and O. Neiman, “Using petal-decompositions to build a low stretch spanning tree,” in Proceedings of the forty-fourth annual ACM symposium on Theory of computing.   ACM, 2012, pp. 395–406.
  • [31] D. Spielman and J. Woo, “A note on preconditioning by low-stretch spanning trees,” arXiv preprint arXiv:0903.2816, 2009.
  • [32] R. Lehoucq, D. Sorensen, and C. Yang, “Arpack users’ guide: Solution of large scale eigenvalue problems with implicitly restarted arnoldi methods.” Software Environ. Tools, vol. 6, 1997.
  • [33] F. Xue, “Numerical solution of eigenvalue problems with spectral transformations,” Ph.D. dissertation, 2009.
  • [34] L. Zelnik-Manor and P. Perona, “Self-tuning spectral clustering.” in NIPS, vol. 17, no. 1601-1608, 2004, p. 16.
  • [35] C. H. Papadimitriou and K. Steiglitz, Combinatorial optimization: algorithms and complexity.   Courier Corporation, 1982.
  • [36] A. Strehl and J. Ghosh, “Cluster ensembles—a knowledge reuse framework for combining multiple partitions,” Journal of machine learning research, vol. 3, no. Dec, pp. 583–617, 2002.