Clustering of graph vertex subset via Krylov subspace model reduction

09/09/2018
by Vladimir Druskin, et al.

Clustering via graph-Laplacian spectral imbedding is ubiquitous in data science and machine learning. However, it becomes less efficient for large data sets due to two factors. First, computing the partial eigendecomposition of the graph-Laplacian typically requires a large Krylov subspace. Second, after the spectral imbedding is complete, the clustering is typically performed with various relaxations of k-means, which are prone to getting stuck in local minima and whose computational cost scales poorly for large data sets. Here we propose two novel algorithms for spectral clustering of a subset of the graph vertices (the target subset) based on the theory of model order reduction. They rely on realizations of a reduced order model (ROM) that accurately approximates the diffusion transfer function of the original graph for inputs and outputs restricted to the target subset. While our focus is limited to this subset, our algorithms produce a clustering of it that is consistent with the overall structure of the graph. Moreover, working with a small target subset greatly reduces the required dimension of the Krylov subspace and allows us to exploit the approximations of k-means in the regimes where they are most robust and efficient, as verified by the numerical experiments. There are several uses for our algorithms. First, they can be employed on their own to cluster a representative subset in cases when clustering the full graph is either infeasible or not required. Second, they may be used for quality control. Third, as they drastically reduce the problem size, they enable the application of more powerful approximations of k-means, such as those based on semi-definite programming (SDP), instead of the conventional Lloyd's algorithm. Finally, they can be used as building blocks of a divide-and-conquer algorithm for clustering the full graph. This last application will be reported in a separate article.
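For orientation, the conventional pipeline that the abstract takes as its starting point (spectral imbedding of the whole graph, then grouping the imbedded vertices) can be sketched on a toy graph. This is a minimal illustration under stated assumptions, not the ROM-based algorithms proposed in the paper. The helper names (`laplacian`, `fiedler_vector`, `cluster_subset`), the two-clique test graph, and the target subset are all hypothetical. Plain power iteration stands in for a Krylov eigensolver, and a sign split of the Fiedler vector stands in for k-means in this two-cluster case.

```python
import math

def laplacian(n, edges):
    """Dense combinatorial Laplacian L = D - A of an undirected, unweighted graph."""
    L = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        L[i][i] += 1.0
        L[j][j] += 1.0
        L[i][j] -= 1.0
        L[j][i] -= 1.0
    return L

def fiedler_vector(L, iters=200):
    """Approximate the eigenvector of the second-smallest eigenvalue of L.

    Runs power iteration on the shifted matrix c*I - L and projects out the
    constant vector (the null vector of L) at every step.
    """
    n = len(L)
    # Gershgorin bound: the largest eigenvalue of L is at most 2 * max degree.
    c = 2.0 * max(L[i][i] for i in range(n))
    x = [float(i) for i in range(n)]  # deterministic start, an arbitrary choice
    for _ in range(iters):
        mean = sum(x) / n
        x = [xi - mean for xi in x]  # deflate the constant vector
        y = [c * x[i] - sum(L[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(yi * yi for yi in y))
        x = [yi / norm for yi in y]
    return x

def cluster_subset(L, target):
    """Label each target vertex by the sign of its Fiedler-vector entry."""
    v = fiedler_vector(L)
    return {i: int(v[i] > 0.0) for i in target}

# Toy graph: two 4-vertex cliques joined by the single bridge edge (3, 4).
clique = [(i, j) for i in range(4) for j in range(i + 1, 4)]
edges = clique + [(i + 4, j + 4) for i, j in clique] + [(3, 4)]

# Hypothetical target subset: two vertices taken from each clique.
labels = cluster_subset(laplacian(8, edges), target=[0, 2, 5, 7])
print(labels)
```

Note that the sketch still imbeds every vertex of the graph before reading off labels for the target subset. That full-graph cost is what the abstract says the ROM-based approach avoids.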


