Revisiting k-Nearest Neighbor Graph Construction on High-Dimensional Data: Experiments and Analyses

12/04/2021
by Liu Yingfan, et al.

The k-nearest neighbor graph (KNNG) on high-dimensional data is a data structure widely used in applications such as similarity search, dimension reduction and clustering. Due to its increasing popularity, several methods under the same framework have been proposed in the past decade. This framework contains two steps: building an initial KNNG, and then refining it by neighborhood propagation. However, several questions remain open. First, the literature lacks a comprehensive experimental comparison among representative solutions. Second, some recently proposed indexing structures, e.g., SW and HNSW, have not been used or tested for building an initial KNNG. Third, the relationship between the data properties and the effectiveness of neighborhood propagation is still not clear. To address these issues, we comprehensively compare the representative approaches on real-world high-dimensional data sets to provide practical and insightful suggestions for users. As a first attempt, we take SW and HNSW as alternatives for the initialization step in our experiments. Moreover, we investigate the effectiveness of neighborhood propagation and find a strong correlation between the hubness phenomenon and its performance.
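
The two-step framework described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation or any of the methods they compare. It assumes a random initial graph as a stand-in for the initialization step (the abstract mentions SW and HNSW as alternatives, which are not implemented here) and a simple neighbors-of-neighbors refinement as a stand-in for neighborhood propagation. All function names (`random_init`, `propagate`, `recall`, `k_occurrence`) are hypothetical. The `k_occurrence` helper counts how often each point appears in other points' neighbor lists, which is one common way to look at hubness.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def exact_knng(points, k):
    """Brute-force KNNG, used only as ground truth on small inputs."""
    n = len(points)
    return [
        sorted((j for j in range(n) if j != i),
               key=lambda j: sq_dist(points[i], points[j]))[:k]
        for i in range(n)
    ]


def random_init(n, k, rng):
    """Step 1 (placeholder): give every point k distinct random neighbors."""
    return [rng.sample([j for j in range(n) if j != i], k) for i in range(n)]


def propagate(points, graph, k, rounds=5):
    """Step 2 (assumed variant): refine each list via neighbors of neighbors."""
    n = len(points)
    for _ in range(rounds):
        # Reverse neighbors let information flow in both directions.
        reverse = [[] for _ in range(n)]
        for i, nbrs in enumerate(graph):
            for j in nbrs:
                reverse[j].append(i)
        new_graph = []
        for i in range(n):
            local = set(graph[i]) | set(reverse[i])
            candidates = set(local)
            for j in local:
                candidates.update(graph[j])
                candidates.update(reverse[j])
            candidates.discard(i)
            # Keep the k closest candidates as the new neighbor list.
            new_graph.append(
                sorted(candidates,
                       key=lambda j: sq_dist(points[i], points[j]))[:k])
        graph = new_graph
    return graph


def recall(approx, exact):
    """Fraction of true neighbors that the approximate graph recovered."""
    hits = sum(len(set(a) & set(e)) for a, e in zip(approx, exact))
    return hits / sum(len(e) for e in exact)


def k_occurrence(graph):
    """Number of neighbor lists each point appears in (a hubness indicator)."""
    counts = [0] * len(graph)
    for nbrs in graph:
        for j in nbrs:
            counts[j] += 1
    return counts


if __name__ == "__main__":
    rng = random.Random(0)
    pts = [[rng.random() for _ in range(8)] for _ in range(200)]
    truth = exact_knng(pts, 10)
    init = random_init(len(pts), 10, rng)
    refined = propagate(pts, init, 10)
    print("recall before:", recall(init, truth))
    print("recall after: ", recall(refined, truth))
    print("max k-occurrence:", max(k_occurrence(refined)))
```

Because each round selects the k closest points from a candidate set that always contains the current neighbor list, the quality of a list cannot get worse from one round to the next in this sketch. A heavily skewed `k_occurrence` distribution (a few points appearing in very many lists) is the kind of hubness effect the abstract relates to propagation performance.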


