HD-Index: Pushing the Scalability-Accuracy Boundary for Approximate kNN Search in High-Dimensional Spaces

04/18/2018
by Akhil Arora, et al.

Nearest neighbor search over large databases in high-dimensional spaces is inherently difficult due to the curse of dimensionality, so some form of approximation is necessary to solve the problem in practice. In this paper, we propose a novel yet simple indexing scheme, HD-Index, for answering approximate k-nearest neighbor queries in massive high-dimensional databases. HD-Index consists of a set of novel hierarchical structures called RDB-trees built on Hilbert keys of database objects. The leaves of the RDB-trees store distances of database objects to reference objects, thereby allowing efficient pruning using distance filters. In addition to the triangle inequality, we also use the Ptolemaic inequality to produce better lower bounds. Experiments on massive (up to billion-scale) high-dimensional (up to 1000+ dimensions) datasets show that HD-Index is effective, efficient, and scalable.
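
The distance filters mentioned in the abstract can be illustrated with a minimal sketch. It assumes Euclidean distance and a small set of reference objects whose distances to each database object are precomputed. The function names (`triangle_lower_bound`, `ptolemaic_lower_bound`, `can_prune`) and the data layout are hypothetical, chosen for illustration only; this is not the paper's implementation and does not model RDB-trees or Hilbert keys.

```python
import itertools
import math


def triangle_lower_bound(q_to_refs, o_to_refs):
    """Lower bound on d(q, o) from the triangle inequality.

    For any reference r: |d(q, r) - d(o, r)| <= d(q, o).
    The tightest bound over all references is returned.
    """
    return max(abs(dq - do) for dq, do in zip(q_to_refs, o_to_refs))


def ptolemaic_lower_bound(q_to_refs, o_to_refs, ref_to_ref):
    """Lower bound on d(q, o) from Ptolemy's inequality (valid for Euclidean distance).

    For any two references r_i, r_j with d(r_i, r_j) > 0:
    |d(q, r_i) * d(o, r_j) - d(q, r_j) * d(o, r_i)| / d(r_i, r_j) <= d(q, o).
    """
    best = 0.0
    for i, j in itertools.combinations(range(len(q_to_refs)), 2):
        if ref_to_ref[i][j] == 0.0:
            continue  # coincident references give no usable bound
        numerator = abs(q_to_refs[i] * o_to_refs[j] - q_to_refs[j] * o_to_refs[i])
        best = max(best, numerator / ref_to_ref[i][j])
    return best


def can_prune(q_to_refs, o_to_refs, ref_to_ref, radius):
    """Return True if the object provably lies farther than `radius` from the query.

    Only stored distances are used; the object itself is never touched.
    """
    bound = max(
        triangle_lower_bound(q_to_refs, o_to_refs),
        ptolemaic_lower_bound(q_to_refs, o_to_refs, ref_to_ref),
    )
    return bound > radius


def distances_to_refs(point, refs):
    """Hypothetical helper: distances from one point to every reference object."""
    return [math.dist(point, r) for r in refs]
```

In use, `distances_to_refs` would be computed once per database object at index-build time and once per query at search time; a candidate is skipped whenever `can_prune` returns True for the current k-th best distance. Because both bounds never exceed the true distance, pruning by them does not discard any object that is actually within the radius.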

research
11/24/2020

Efficient Approximate Nearest Neighbor Search for Multiple Weighted l_p≤2 Distance Functions

Nearest neighbor search is fundamental to a wide range of applications. ...
research
12/08/2017

Exploiting Modern Hardware for High-Dimensional Nearest Neighbor Search

Many multimedia information retrieval or machine learning problems requi...
research
01/15/2020

Complete and Sufficient Spatial Domination of Multidimensional Rectangles

Rectangles are used to approximate objects, or sets of objects, in a ple...
research
11/06/2018

High Dimensional Clustering with r-nets

Clustering, a fundamental task in data science and machine learning, gro...
research
02/18/2011

Searching in one billion vectors: re-rank with source coding

Recent indexing techniques inspired by source coding have been shown suc...
research
09/23/2015

Fast k-NN search

Efficient index structures for fast approximate nearest neighbor queries...
research
08/05/2023

DeDrift: Robust Similarity Search under Content Drift

The statistical distribution of content uploaded and searched on media s...
