Re-embedding data to strengthen recovery guarantees of clustering

01/26/2023
by Tao Jiang, et al.

We propose a clustering method that chains four known techniques into a pipeline, yielding an algorithm with stronger recovery guarantees than any of the four components achieves separately. Given n points in ℝ^d, the first component of our pipeline, which we call leapfrog distances, is reminiscent of density-based clustering and yields an n × n distance matrix. The leapfrog distances are then translated into new embeddings using multidimensional scaling and spectral methods, two other known techniques, producing new embeddings of the n points in ℝ^d', where in general d' ≪ d. Finally, sum-of-norms (SON) clustering is applied to the re-embedded points. Although the fourth step (SON clustering) could in principle be replaced by any other clustering method, our focus is on provable guarantees of recovery of the underlying structure. We therefore establish that the re-embedding strengthens the recovery guarantees of SON clustering, a well-studied method that already has provable guarantees.
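
The abstract does not define leapfrog distances or the exact spectral step, so the following is only a rough sketch of the pipeline's shape built from stand-ins, not the authors' method. Minimax (bottleneck) path distances substitute for leapfrog distances, classical MDS (itself an eigendecomposition) substitutes for the MDS-plus-spectral re-embedding, and a basic ADMM solver handles uniformly weighted SON clustering. The function names, the toy data, γ = 0.1, and the fusion tolerance are all hypothetical choices.

```python
import numpy as np


def minimax_path_distances(X):
    """Bottleneck (minimax) path distances: for each pair, the smallest achievable
    largest Euclidean hop over all paths between the two points.
    STAND-IN for the paper's leapfrog distances, which the abstract does not define."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    for k in range(D.shape[0]):
        # Floyd-Warshall in the (min, max) semiring: allow point k as an intermediate stop.
        D = np.minimum(D, np.maximum(D[:, k:k + 1], D[k:k + 1, :]))
    return D


def classical_mds(D, dim):
    """Classical (Torgerson) MDS: embed an n x n distance matrix into R^dim."""
    n = D.shape[0]
    C = np.eye(n) - np.ones((n, n)) / n           # centering matrix
    B = -0.5 * C @ (D ** 2) @ C                   # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)                # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:dim]            # keep the `dim` largest
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))


def son_cluster(X, gamma, nu=1.0, iters=3000, tol=1e-2):
    """Sum-of-norms clustering with uniform weights,
        min_U 0.5 * sum_i ||x_i - u_i||^2 + gamma * sum_{i<j} ||u_i - u_j||,
    solved by ADMM on the splitting v_ij = u_i - u_j."""
    n = X.shape[0]
    rows, cols = np.triu_indices(n, k=1)
    E = np.zeros((len(rows), n))                  # incidence matrix: (E @ U)[l] = u_i - u_j
    E[np.arange(len(rows)), rows] = 1.0
    E[np.arange(len(rows)), cols] = -1.0
    M = np.eye(n) + nu * E.T @ E                  # system matrix of the U-update
    U = X.copy()
    V = E @ U
    Lam = np.zeros_like(V)
    for _ in range(iters):
        U = np.linalg.solve(M, X + E.T @ (Lam + nu * V))
        Y = E @ U - Lam / nu
        norms = np.linalg.norm(Y, axis=1, keepdims=True)
        # Group soft-thresholding: proximal operator of (gamma / nu) * ||.||.
        V = np.maximum(0.0, 1.0 - (gamma / nu) / np.maximum(norms, 1e-12)) * Y
        Lam = Lam + nu * (V - E @ U)              # dual ascent on v_ij = u_i - u_j
    # Greedy grouping: points whose centroids u_i are within `tol` share a label.
    labels = -np.ones(n, dtype=int)
    for i in range(n):
        if labels[i] < 0:
            close = np.linalg.norm(U - U[i], axis=1) < tol
            labels[close & (labels < 0)] = labels.max() + 1
    return U, labels


# Hypothetical toy data: two tight groups of 5 points in R^3, about 3 units apart.
base = np.array([[0, 0, 0], [0.1, 0, 0], [0, 0.1, 0], [0.1, 0.1, 0], [0.05, 0.05, 0.05]])
X = np.vstack([base, base + [3.0, 0.0, 0.0]])
D = minimax_path_distances(X)         # step 1 (stand-in for leapfrog distances)
Y = classical_mds(D, dim=2)           # steps 2-3: re-embed in R^d' with d' = 2 < d = 3
U, labels = son_cluster(Y, gamma=0.1)  # step 4: SON clustering on the re-embedded points
print(labels)
```

In this uniformly weighted setting, two groups of sizes m_A and m_B that each fuse to a single centroid remain distinct only if γ(m_A + m_B) is smaller than the distance between their means, so γ has to be chosen relative to the cluster sizes and their separation; the value 0.1 above is picked to satisfy that for this toy data.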

research
09/12/2013

Recovery guarantees for exemplar-based clustering

For a certain class of distributions, we prove that the linear programmi...
research
02/18/2020

Hierarchical Correlation Clustering and Tree Preserving Embedding

We propose a hierarchical correlation clustering method that extends the...
research
05/29/2019

Clustering without Over-Representation

In this paper we consider clustering problems in which each point is end...
research
07/23/2021

The decomposition of the higher-order homology embedding constructed from the k-Laplacian

The null space of the k-th order Laplacian ℒ_k, known as the k-th homolo...
research
12/01/2022

Clustering – Basic concepts and methods

We review clustering as an analysis tool and the underlying concepts fro...
research
04/28/2021

Sum-of-norms clustering does not separate nearby balls

Sum-of-norms clustering is a popular convexification of K-means clusteri...
research
09/08/2019

Iterative Spectral Method for Alternative Clustering

Given a dataset and an existing clustering as input, alternative cluster...
