CLAM-Accelerated K-Nearest Neighbors Entropy-Scaling Search of Large High-Dimensional Datasets via an Actualization of the Manifold Hypothesis

09/11/2023
by Morgan E. Prior, et al.

Many fields are experiencing a Big Data explosion, with data collection rates outpacing the rate of computing performance improvements predicted by Moore's Law. Researchers are often interested in similarity search on such data. We present CAKES (CLAM-Accelerated K-NN Entropy Scaling Search), a novel algorithm for k-nearest-neighbor (k-NN) search that leverages geometric and topological properties inherent in large datasets. CAKES assumes the manifold hypothesis and performs best when the data occupy a low-dimensional manifold, even if that manifold is embedded in a very high-dimensional space. On 5 standard benchmark datasets, we demonstrate speedups ranging from hundreds to tens of thousands of times over state-of-the-art approaches such as FAISS and HNSW. Unlike locality-sensitive hashing approaches, CAKES can work with any user-defined distance function. When the data occupy a metric space, CAKES exhibits perfect recall.
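To make the terms "user-defined distance function" and "perfect recall" concrete, here is a minimal sketch of an exact k-NN baseline by linear scan. It is not the CAKES algorithm, whose internals the abstract does not describe. The names `knn_linear`, `recall`, and `hamming` are hypothetical and chosen only for this illustration.

```python
import heapq
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def knn_linear(
    data: Sequence[T],
    query: T,
    k: int,
    distance: Callable[[T, T], float],
) -> List[Tuple[float, int]]:
    """Exact k-NN by scanning every point (illustrative baseline, not CAKES).

    Returns up to k (distance, index) pairs sorted by increasing distance.
    Any callable can serve as the distance function.
    """
    scored = ((distance(query, x), i) for i, x in enumerate(data))
    return heapq.nsmallest(k, scored)


def recall(found_ids: Sequence[int], true_ids: Sequence[int]) -> float:
    """Fraction of the true k nearest neighbors that a search returned."""
    truth = set(true_ids)
    if not truth:
        return 1.0
    return len(truth & set(found_ids)) / len(truth)


def hamming(a: str, b: str) -> int:
    """Hamming distance on strings; a length difference counts as mismatches."""
    return sum(ca != cb for ca, cb in zip(a, b)) + abs(len(a) - len(b))


# Toy data, assumed for illustration only.
sequences = ["ACGT", "ACGA", "TTTT", "ACCA"]
hits = knn_linear(sequences, "ACGT", k=2, distance=hamming)
true_ids = [i for _, i in hits]

# A search with perfect recall returns exactly the true neighbors.
print(hits, recall(true_ids, true_ids))
```

A recall of 1.0 means the returned set matches the true k nearest neighbors. The linear scan achieves this trivially at a cost proportional to the dataset size, which is the cost that accelerated exact methods aim to avoid.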

research
08/22/2019

Clustered Hierarchical Entropy-Scaling Search of Astronomical and Biological Data

Both astronomy and biology are experiencing explosive growth of data, re...
research
02/04/2019

2-D Embedding of Large and High-dimensional Data with Minimal Memory and Computational Time Requirements

In the advent of big data era, interactive visualization of large data s...
research
12/13/2021

Fast Single-Core K-Nearest Neighbor Graph Computation

Fast and reliable K-Nearest Neighbor Graph algorithms are more important...
research
12/05/2018

Improving Similarity Search with High-dimensional Locality-sensitive Hashing

We propose a new class of data-independent locality-sensitive hashing (L...
research
04/24/2017

Accelerated Nearest Neighbor Search with Quick ADC

Efficient Nearest Neighbor (NN) search in high-dimensional spaces is a f...
research
07/01/2019

Geodesic Centroidal Voronoi Tessellations: Theories, Algorithms and Applications

Nowadays, big data of digital media (including images, videos and 3D gra...
research
09/04/2017

FLASH: Randomized Algorithms Accelerated over CPU-GPU for Ultra-High Dimensional Similarity Search

We present FLASH (Fast LSH Algorithm for Similarity search accelerat...
