Spectral Clustering, Spanning Forest, and Bayesian Forest Process

02/01/2022
by Leo L. Duan, et al.

Spectral clustering algorithms are widely used. Starting from a pairwise similarity matrix, spectral clustering produces a partition of the data that approximately minimizes the total similarity score across clusters. Because there is no need to model how data are distributed within each cluster, the method enjoys algorithmic simplicity and is robust when clustering non-Gaussian data, such as data lying near manifolds. Nevertheless, several important questions remain unaddressed, such as how to estimate the similarity scores and how to obtain cluster assignment probabilities, which serve as important uncertainty estimates in clustering. In this article, we propose to solve these problems by identifying a generative modeling counterpart to spectral clustering. Our clustering model is based on a spanning forest graph that consists of several disjoint spanning trees, with each tree corresponding to a cluster. Taking a Bayesian approach, we assign proper densities on the root and leaf nodes, and we prove that the posterior mode is almost the same as the spectral clustering estimate. Further, we show that the associated generative process, named the "forest process", is a continuous extension of the classic urn process; it therefore inherits many useful properties, such as unbounded support for the number of clusters and amenability to existing partition probability functions. At the same time, we carefully characterize the differences between the two processes. We demonstrate a novel application in the joint clustering of multiple-subject functional magnetic resonance imaging scans of the human brain.
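To make the spanning forest idea concrete, the following is a minimal sketch of how a forest of disjoint trees induces a clustering. It is not the Bayesian forest model or its posterior computation described in the abstract: it is a simple heuristic that runs Kruskal's minimum spanning tree algorithm on Euclidean distances and stops once a chosen number of trees remains. The function name `spanning_forest_clusters`, the use of Euclidean distance, and the fixed `n_clusters` argument are all assumptions made for illustration.

```python
import math


def spanning_forest_clusters(points, n_clusters):
    """Return (labels, forest_edges) for a forest with n_clusters trees.

    Illustrative heuristic only (an assumption, not the paper's model):
    Kruskal's algorithm on Euclidean distances, stopped early so that
    exactly n_clusters disjoint trees remain. Each tree is one cluster.
    """
    n = len(points)
    # All pairwise edges, sorted from shortest to longest.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i in range(n)
        for j in range(i + 1, n)
    )

    parent = list(range(n))  # union-find forest over node indices

    def find(i):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    forest_edges = []
    n_trees = n  # every node starts as its own single-node tree
    for _, i, j in edges:
        if n_trees == n_clusters:
            break
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # the edge joins two different trees
            parent[root_i] = root_j
            forest_edges.append((i, j))
            n_trees -= 1

    # Relabel tree roots as 0, 1, 2, ... in order of first appearance.
    root_to_label = {}
    labels = [root_to_label.setdefault(find(i), len(root_to_label)) for i in range(n)]
    return labels, forest_edges


points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels, forest_edges = spanning_forest_clusters(points, n_clusters=2)
print(labels)        # cluster label of each point
print(forest_edges)  # edges of the spanning forest
```

In this toy example the two groups of three points end up in separate trees. Unlike the model in the abstract, this sketch fixes the number of clusters in advance and gives no assignment probabilities or other uncertainty estimates.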

