A Multiscale Environment for Learning by Diffusion

01/31/2021
by James M. Murphy, et al.

Clustering algorithms partition a dataset into groups of similar points. The clustering problem is very general, and different partitions of the same dataset could be considered correct and useful. To fully understand a dataset, one must consider it at a variety of scales, ranging from coarse to fine. We introduce the Multiscale Environment for Learning by Diffusion (MELD) data model, which is a family of clusterings parameterized by nonlinear diffusion on the dataset. We show that the MELD data model precisely captures latent multiscale structure in data and facilitates its analysis. To efficiently learn the multiscale structure observed in many real datasets, we introduce the Multiscale Learning by Unsupervised Nonlinear Diffusion (M-LUND) clustering algorithm, which is derived from a diffusion process at a range of temporal scales. We provide theoretical guarantees for the algorithm's performance and establish its computational efficiency. Finally, we show that the M-LUND clustering algorithm detects the latent structure in a range of synthetic and real datasets.
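
The idea of reading a dataset at several temporal scales of a diffusion process can be illustrated with a small sketch. This is not the M-LUND algorithm from the paper; it is a generic random walk built from a Gaussian kernel, and the toy points, the bandwidth `sigma`, the time steps, and all function names are assumptions chosen for illustration. It computes a standard diffusion distance between points after `t` steps of the walk, so that fine groups merge at small `t` while coarse groups remain separated much longer.

```python
import math


def transition_matrix(points, sigma):
    """Row-stochastic random-walk matrix from a Gaussian kernel (assumed kernel)."""
    n = len(points)
    weights = [
        [math.exp(-((points[i] - points[j]) ** 2) / sigma ** 2) for j in range(n)]
        for i in range(n)
    ]
    degrees = [sum(row) for row in weights]
    P = [[weights[i][j] / degrees[i] for j in range(n)] for i in range(n)]
    # For a symmetric kernel, the stationary distribution is proportional to the degrees.
    total = sum(degrees)
    pi = [d / total for d in degrees]
    return P, pi


def matmul(A, B):
    """Plain dense matrix product for small square matrices."""
    n = len(A)
    return [
        [sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def matrix_power(P, t):
    """Compute P^t by repeated squaring; t is the diffusion time (t >= 0)."""
    n = len(P)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    base = P
    while t > 0:
        if t & 1:
            result = matmul(result, base)
        base = matmul(base, base)
        t >>= 1
    return result


def diffusion_distance(P_t, pi, i, j):
    """Distance between rows i and j of P^t, weighted by 1/pi."""
    return math.sqrt(
        sum((P_t[i][k] - P_t[j][k]) ** 2 / pi[k] for k in range(len(pi)))
    )


if __name__ == "__main__":
    # Toy 1-D data (assumed): two coarse groups, each made of two fine groups.
    points = [0.0, 0.1, 1.0, 1.1, 5.0, 5.1, 6.0, 6.1]
    P, pi = transition_matrix(points, sigma=0.5)
    for t in (1, 64):
        P_t = matrix_power(P, t)
        within_coarse = diffusion_distance(P_t, pi, 0, 2)
        across_coarse = diffusion_distance(P_t, pi, 0, 4)
        print(f"t={t}: same coarse group {within_coarse:.4f}, "
              f"different coarse groups {across_coarse:.4f}")
```

In this sketch the distance between points in the same coarse group shrinks as `t` grows, while the distance across coarse groups stays large, which is one simple way a diffusion time parameter can index a coarse-to-fine family of groupings.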


research
03/29/2021

Multiscale Clustering of Hyperspectral Images Through Spectral-Spatial Diffusion Geometry

Clustering algorithms partition a dataset into groups of similar points....
research
05/07/2023

Persistent Homology of the Multiscale Clustering Filtration

In many applications in data clustering, it is desirable to find not jus...
research
10/15/2018

Learning by Unsupervised Nonlinear Diffusion

This paper proposes and analyzes a novel clustering algorithm that combi...
research
04/17/2018

Multivariate Gaussian Process Regression for Multiscale Data Assimilation and Uncertainty Reduction

We present a multivariate Gaussian process regression approach for param...
research
03/07/2020

Diffusion State Distances: Multitemporal Analysis, Fast Algorithms, and Applications to Biological Networks

Data-dependent metrics are powerful tools for learning the underlying st...
research
06/22/2020

A Multiscale Graph Convolutional Network Using Hierarchical Clustering

The information contained in hierarchical topology, intrinsic to many ne...
