UMAP does not reproduce high-dimensional similarities due to negative sampling

03/26/2021
by Sebastian Damrich, et al.

UMAP has supplanted t-SNE as the state of the art for visualizing high-dimensional datasets in many disciplines, but the reason for its success is not well understood. In this work, we investigate UMAP's sampling-based optimization scheme in detail. We derive UMAP's effective loss function in closed form and find that it differs from the published one. As a consequence, we show that UMAP does not aim to reproduce its theoretically motivated high-dimensional UMAP similarities. Instead, it tries to reproduce similarities that only encode the shared k-nearest-neighbor graph, which challenges the previous understanding of UMAP's effectiveness. We claim that the key to UMAP's success is rather its implicit balancing of attraction and repulsion resulting from negative sampling. This balancing in turn facilitates optimization via gradient descent. We corroborate our theoretical findings on toy data and single-cell RNA sequencing data.
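UMAP's sampling-based optimization can be pictured as stochastic gradient descent in which each sampled k-nearest-neighbor edge contributes an attractive step and a few uniformly drawn "negative" points contribute repulsive steps. The Python sketch below is illustrative only and is not the authors' code. It assumes the low-dimensional similarity q = 1/(1 + d²) (UMAP's kernel with a = b = 1), samples each edge with probability equal to its weight μ_ij (a simplification of umap-learn's deterministic per-edge schedule), and uses a constant learning rate. The function name `sgd_epoch` and its parameters are hypothetical; the clipping bound of 4 and the 0.001 offset mirror values used in umap-learn.

```python
import numpy as np


def sgd_epoch(Y, edges, weights, n_neg=5, lr=1.0, rng=None, eps=1e-3, clip=4.0):
    """One epoch of UMAP-style SGD with negative sampling (hypothetical sketch).

    Y:       (n, dim) float embedding, updated in place.
    edges:   (E, 2) int array of directed kNN-graph edges (i, j).
    weights: (E,) high-dimensional similarities mu_ij in [0, 1].
    Returns (number of attractive updates, number of repulsive updates).
    """
    rng = np.random.default_rng() if rng is None else rng
    n = Y.shape[0]
    n_attr = n_rep = 0
    for (i, j), mu in zip(edges, weights):
        # Assumption: edge is used this epoch with probability mu_ij.
        if rng.random() >= mu:
            continue
        # Attraction: descent step on -log q_ij with q = 1 / (1 + d^2);
        # the gradient w.r.t. y_i is 2 (y_i - y_j) / (1 + d^2).
        diff = Y[i] - Y[j]
        d2 = diff @ diff
        grad = np.clip(2.0 * diff / (1.0 + d2), -clip, clip)
        Y[i] -= lr * grad  # pull i toward j
        Y[j] += lr * grad  # pull j toward i
        n_attr += 1
        # Repulsion: descent step on -log(1 - q_ik) for uniformly drawn k;
        # eps keeps the step finite when the two points coincide.
        for k in rng.integers(0, n, size=n_neg):
            if k == i:
                continue
            diff = Y[i] - Y[k]
            d2 = diff @ diff
            grad = np.clip(2.0 * diff / ((eps + d2) * (1.0 + d2)), -clip, clip)
            Y[i] += lr * grad  # push i away from k; only i moves
            n_rep += 1
    return n_attr, n_rep
```

In this sketch a point i is drawn as the head of a sampled edge about Σ_j μ_ij times per epoch, and each time it is repelled from n_neg uniformly chosen points. The expected repulsion on a pair (i, k) therefore scales like n_neg · Σ_j μ_ij / n rather than with 1 − μ_ik, which illustrates why the loss actually optimized by negative sampling can differ from the published one.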

research
02/24/2023

Graph Laplacians on Shared Nearest Neighbor graphs and graph Laplacians on k-Nearest Neighbor graphs having the same limit

A Shared Nearest Neighbor (SNN) graph is a type of graph construction us...
research
12/01/2015

Implicit Sparse Code Hashing

We address the problem of converting large-scale high-dimensional image ...
research
07/19/2011

Unsupervised K-Nearest Neighbor Regression

In many scientific disciplines structures in high-dimensional data have ...
research
03/24/2018

Gradient descent in Gaussian random fields as a toy model for high-dimensional optimisation in deep learning

In this paper we model the loss function of high-dimensional optimizatio...
research
12/04/2021

Revisiting k-Nearest Neighbor Graph Construction on High-Dimensional Data : Experiments and Analyses

The k-nearest neighbor graph (KNNG) on high-dimensional data is a data s...
research
07/13/2022

Balancing polynomials, Fibonacci numbers and some new series for π

We evaluate some types of infinite series with balancing and Lucas-balan...