Hierarchical Nearest Neighbor Graph Embedding for Efficient Dimensionality Reduction

03/24/2022
by M. Saquib Sarfraz, et al.

Dimensionality reduction is crucial both for visualization and for preprocessing high-dimensional data for machine learning. We introduce a novel method based on a hierarchy built on 1-nearest neighbor graphs in the original space, which is used to preserve the grouping properties of the data distribution on multiple levels. The core of the proposal is an optimization-free projection that is competitive with the latest versions of t-SNE and UMAP in performance and visualization quality while being an order of magnitude faster in run-time. Furthermore, its interpretable mechanics, the ability to project new data, and the natural separation of data clusters in visualizations make it a general-purpose unsupervised dimension reduction technique. In the paper, we argue for the soundness of the proposed method and evaluate it on a diverse collection of datasets with sizes varying from 1K to 11M samples and dimensions from 28 to 16K. We compare it with other state-of-the-art methods on multiple metrics and target dimensions, highlighting its efficiency and performance. Code is available at https://github.com/koulakis/h-nne


