Clustering with t-SNE, provably

06/08/2017
by George C. Linderman, et al.

t-distributed Stochastic Neighbor Embedding (t-SNE), a clustering and visualization method proposed by van der Maaten & Hinton in 2008, has rapidly become a standard tool in a number of natural sciences. Despite its overwhelming success, it lacks mathematical foundations, and the inner workings of the algorithm are not well understood. The purpose of this paper is to prove that t-SNE is able to recover well-separated clusters; more precisely, we prove that t-SNE in the 'early exaggeration' phase, an optimization technique proposed by van der Maaten & Hinton (2008) and van der Maaten (2014), can be rigorously analyzed. As a byproduct, the proof suggests novel ways of setting the exaggeration parameter α and the step size h. Numerical examples illustrate the effectiveness of these rules: in particular, the quality of the embedding of topological structures (e.g. the swiss roll) improves. We also discuss a connection to spectral clustering methods.
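To make the roles of α and h concrete, here is a minimal sketch of one gradient-descent step on the standard t-SNE objective with the affinities multiplied by α. It is not taken from the paper: the function name `exaggerated_tsne_step`, the idealized block affinity matrix `P` (normally computed from data via a perplexity search), and the values α = 4 and h = 0.1 are all assumptions chosen only for illustration, not the parameter rules the paper proposes.

```python
import numpy as np

def exaggerated_tsne_step(Y, P, alpha, h):
    """One plain gradient step on the t-SNE loss with P replaced by alpha * P.

    Y: (n, d) embedding, P: (n, n) symmetric affinities summing to 1.
    Hypothetical helper for illustration; no momentum or gains.
    """
    diff = Y[:, None, :] - Y[None, :, :]            # y_i - y_j, shape (n, n, d)
    W = 1.0 / (1.0 + np.sum(diff ** 2, axis=-1))    # Student-t kernel weights
    np.fill_diagonal(W, 0.0)
    Q = W / W.sum()                                 # low-dimensional affinities
    coeff = (alpha * P - Q) * W                     # attraction minus repulsion
    grad = 4.0 * np.sum(coeff[:, :, None] * diff, axis=1)
    return Y - h * grad

def max_within_cluster_distance(Y, labels):
    """Largest pairwise distance between two points sharing a label."""
    D = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
    same = labels[:, None] == labels[None, :]
    return D[same].max()

# Idealized input (assumption): two clusters, nonzero affinity only inside a cluster.
labels = np.array([0, 0, 0, 1, 1, 1])
P = (labels[:, None] == labels[None, :]).astype(float)
np.fill_diagonal(P, 0.0)
P /= P.sum()

# Small deterministic initialization near the origin.
Y0 = np.array([[0.00, 0.00], [0.01, 0.00], [0.00, 0.01],
               [0.02, 0.01], [0.01, 0.02], [0.02, 0.02]])

Y = Y0.copy()
for _ in range(50):
    Y = exaggerated_tsne_step(Y, P, alpha=4.0, h=0.1)

before = max_within_cluster_distance(Y0, labels)
after = max_within_cluster_distance(Y, labels)
```

In this toy setting the attractive term dominates inside each cluster, so the points of a cluster contract toward one another while the weaker repulsive term pushes the two clusters apart. Comparing `before` and `after` gives a rough picture of the contraction behaviour that the early exaggeration phase is meant to produce.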


