Clustering with UMAP: Why and How Connectivity Matters

08/12/2021
by Ayush Dalmia, et al.

Topology-based dimensionality reduction methods such as t-SNE and UMAP have seen increasing success and popularity on high-dimensional data. These methods have strong mathematical foundations and are based on the intuition that the topology in low dimensions should be close to that of high dimensions. Given that the initial topological structure is a precursor to the success of the algorithm, this naturally raises the question: What makes a "good" topological structure for dimensionality reduction? Answering this question should help design better algorithms which take into account both local and global structure. In this paper, which focuses on UMAP, we study the effects of node connectivity (k-Nearest Neighbors vs mutual k-Nearest Neighbors) and relative neighborhood (Adjacent via Path Neighbors) on dimensionality reduction. We explore these concepts through extensive ablation studies on 4 standard image and text datasets (MNIST, FMNIST, 20NG, AG), reducing to 2 and 64 dimensions. Our findings indicate that a more refined notion of connectivity (mutual k-Nearest Neighbors with minimum spanning tree), together with a flexible method of constructing the local neighborhood (Path Neighbors), can achieve a much better representation than default UMAP, as measured by downstream clustering performance.
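To make the connectivity distinction concrete, here is a minimal sketch in plain Python of a k-Nearest Neighbor graph, a mutual k-Nearest Neighbor graph, and one way of reconnecting the mutual graph with minimum spanning tree edges. It is an illustration under stated assumptions, not the authors' implementation: the toy points, the function names, and the choice to add every edge of the Euclidean minimum spanning tree are all assumptions made for this example, and the paper's own variant may differ.

```python
import math
from itertools import combinations


class UnionFind:
    """Disjoint-set structure used for component counting and Kruskal's MST."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[ra] = rb
        return True


def knn_sets(points, k):
    """For each point, the set of indices of its k nearest neighbors (Euclidean)."""
    result = []
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: math.dist(p, points[j]))
        result.append(set(others[:k]))
    return result


def knn_edges(nbrs):
    """Undirected kNN graph: keep i-j if either point lists the other."""
    return {frozenset((i, j)) for i, ns in enumerate(nbrs) for j in ns}


def mutual_knn_edges(nbrs):
    """Mutual kNN graph: keep i-j only if each point lists the other."""
    return {frozenset((i, j))
            for i, ns in enumerate(nbrs) for j in ns if i in nbrs[j]}


def mst_edges(points):
    """Euclidean minimum spanning tree over all points (Kruskal's algorithm)."""
    uf = UnionFind(len(points))
    pairs = sorted(combinations(range(len(points)), 2),
                   key=lambda e: math.dist(points[e[0]], points[e[1]]))
    return {frozenset((i, j)) for i, j in pairs if uf.union(i, j)}


def count_components(n, edges):
    """Number of connected components of an undirected graph on n nodes."""
    uf = UnionFind(n)
    for edge in edges:
        a, b = tuple(edge)
        uf.union(a, b)
    return len({uf.find(i) for i in range(n)})


# Toy data (an assumption for illustration): two tight groups and one outlier.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (10, 0)]
nbrs = knn_sets(points, k=2)
knn = knn_edges(nbrs)
mutual = mutual_knn_edges(nbrs)
connected = mutual | mst_edges(points)  # mutual kNN plus MST edges

print("kNN components:         ", count_components(len(points), knn))
print("mutual kNN components:  ", count_components(len(points), mutual))
print("mutual kNN + MST comps.:", count_components(len(points), connected))
```

The mutual graph is always a subset of the kNN graph, so it tends to drop one-sided links to outliers and can leave points isolated. Adding spanning tree edges is one way to restore a single connected component without bringing back every one-sided link.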


