Learning by Unsupervised Nonlinear Diffusion

10/15/2018
by Mauro Maggioni, et al.

This paper proposes and analyzes a novel clustering algorithm that combines graph-based diffusion geometry with density estimation. The method is suited to data generated from mixtures of distributions whose densities are multimodal and nonlinearly shaped. A crucial aspect of the algorithm is that it treats the time of a data-adapted diffusion process as a scale parameter, distinct from the local spatial scale parameter used in many clustering and learning algorithms. We prove estimates for the behavior of diffusion distances with respect to this time parameter under a flexible nonparametric data model, identifying a range of times in which the mesoscopic equilibria of the underlying process are revealed, corresponding to a gap between within-cluster and between-cluster diffusion distances. This analysis is leveraged to prove sufficient conditions guaranteeing the accuracy of the proposed learning by unsupervised nonlinear diffusion (LUND) algorithm. We implement the LUND algorithm numerically and confirm its theoretical properties on illustrative datasets, showing that the proposed method enjoys both theoretical and empirical advantages over current spectral clustering and density-based clustering techniques.
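
The combination described in the abstract (a density estimate plus diffusion distances at a chosen time t) can be illustrated with a minimal sketch. It is not the authors' implementation: the Gaussian kernel, the use of normalized kernel sums as the density estimate, the mode-scoring rule, and the names `pairwise_sq_dists`, `lund_sketch`, `sigma`, and `t` are all assumptions made for illustration. The sketch uses dense matrices and is only practical for small datasets.

```python
import numpy as np


def pairwise_sq_dists(A):
    """Squared Euclidean distances between all rows of A."""
    diff = A[:, None, :] - A[None, :, :]
    return np.sum(diff ** 2, axis=-1)


def lund_sketch(X, n_clusters, sigma=1.0, t=10):
    """Illustrative density-plus-diffusion clustering (hypothetical helper).

    sigma is the local spatial scale; t is the diffusion time and must be
    a non-negative integer so that powers of eigenvalues stay real.
    """
    # Gaussian affinities (assumed kernel choice).
    W = np.exp(-pairwise_sq_dists(X) / sigma ** 2)
    deg = W.sum(axis=1)

    # Density estimate: normalized kernel sums (a simple KDE stand-in).
    p = deg / deg.sum()

    # Random walk P = D^{-1} W, diagonalized through its symmetric
    # conjugate S = D^{-1/2} W D^{-1/2}, which has the same eigenvalues.
    S = W / np.sqrt(np.outer(deg, deg))
    lam, V = np.linalg.eigh(S)
    psi = V / np.sqrt(deg)[:, None]  # right eigenvectors of P

    # Distances between rows of the time-t diffusion map are diffusion
    # distances, up to a constant factor.
    D = np.sqrt(pairwise_sq_dists(psi * lam ** t))

    # rho[i]: diffusion distance to the nearest point of higher density;
    # the global density maximizer gets its largest distance instead.
    n = len(X)
    rho = np.empty(n)
    for i in range(n):
        higher = p > p[i]
        rho[i] = D[i, higher].min() if higher.any() else D[i].max()

    # Candidate modes: dense points that are far from any denser point.
    modes = np.argsort(-(p * rho))[:n_clusters]
    labels = np.full(n, -1)
    labels[modes] = np.arange(n_clusters)

    # Visit points from densest to sparsest; each inherits the label of
    # its nearest already-labeled point of higher density.
    for i in np.argsort(-p):
        if labels[i] >= 0:
            continue
        labeled = labels >= 0
        cand = labeled & (p > p[i])
        if not cand.any():  # fallback: any labeled point
            cand = labeled
        idx = np.flatnonzero(cand)
        labels[i] = labels[idx[np.argmin(D[i, idx])]]
    return labels
```

Calling `lund_sketch(X, n_clusters=2, sigma=1.0, t=10)` on an `(n, d)` array returns one integer label per row. Varying `t` while holding `sigma` fixed is the point of the exercise: in this sketch the two parameters enter separately, mirroring the abstract's distinction between diffusion time and local spatial scale.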

research
02/08/2019

Spectral-Spatial Diffusion Geometry for Hyperspectral Image Clustering

An unsupervised learning algorithm to cluster hyperspectral image (HSI) ...
research
10/11/2021

Density-Based Clustering with Kernel Diffusion

Finding a suitable density function is essential for density-based clust...
research
03/07/2020

Diffusion State Distances: Multitemporal Analysis, Fast Algorithms, and Applications to Biological Networks

Data-dependent metrics are powerful tools for learning the underlying st...
research
01/31/2021

A Multiscale Environment for Learning by Diffusion

Clustering algorithms partition a dataset into groups of similar points....
research
02/11/2010

Operator norm convergence of spectral clustering on level sets

Following Hartigan, a cluster is defined as a connected component of the...
research
06/19/2017

Capacity Releasing Diffusion for Speed and Locality

Diffusions and related random walk procedures are of central importance ...
research
07/07/2023

Fermat Distances: Metric Approximation, Spectral Convergence, and Clustering Algorithms

We analyze the convergence properties of Fermat distances, a family of d...