OOD-DiskANN: Efficient and Scalable Graph ANNS for Out-of-Distribution Queries

10/22/2022
by Shikhar Jaiswal, et al.

State-of-the-art algorithms for Approximate Nearest Neighbor Search (ANNS) such as DiskANN, FAISS-IVF, and HNSW build data-dependent indices that offer substantially better accuracy and search efficiency than data-agnostic indices by overfitting to the index data distribution. When the query data is drawn from a different distribution - e.g., when the index represents image embeddings and the queries represent textual embeddings - such algorithms lose much of this performance advantage. On a variety of datasets, for a fixed recall target, latency is worse by an order of magnitude or more for Out-Of-Distribution (OOD) queries than for In-Distribution (ID) queries. The question we address in this work is whether ANNS algorithms can be made efficient for OOD queries if the index construction is given access to a small sample set of these queries. We answer positively by presenting OOD-DiskANN, which uses a sparing sample (1% of index set size) of OOD queries, and provides up to 40% improvement in mean query latency over SoTA algorithms of a similar memory footprint. OOD-DiskANN is scalable and has the efficiency of graph-based ANNS indices. Some of our contributions can improve query efficiency for ID queries as well.
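To make the ID-versus-OOD comparison concrete, the following is a minimal sketch of how recall at a fixed search budget could be measured for both kinds of queries. It is not OOD-DiskANN and not a graph index: it swaps in a toy partition index (loosely in the spirit of an inverted-file index) because that is the shortest data-dependent index to write down. Every name and parameter here (`compare_recall`, `nprobe`, `shift`, the Gaussian data, using sampled points as centroids instead of running k-means) is an assumption made for illustration, and the numbers it prints say nothing about the paper's results.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def brute_force_knn(points, query, k):
    """Exact k nearest neighbours, returned as point indices (ground truth)."""
    order = sorted(range(len(points)), key=lambda i: sq_dist(points[i], query))
    return order[:k]


def build_partition_index(points, centroids):
    """Toy data-dependent index: bucket each point under its nearest centroid."""
    buckets = [[] for _ in centroids]
    for i, p in enumerate(points):
        nearest = min(range(len(centroids)), key=lambda j: sq_dist(centroids[j], p))
        buckets[nearest].append(i)
    return buckets


def partition_search(points, centroids, buckets, query, k, nprobe):
    """Approximate search: scan only the nprobe buckets closest to the query."""
    probe = sorted(range(len(centroids)),
                   key=lambda j: sq_dist(centroids[j], query))[:nprobe]
    candidates = [i for j in probe for i in buckets[j]]
    candidates.sort(key=lambda i: sq_dist(points[i], query))
    return candidates[:k]


def recall_at_k(found, truth):
    """Fraction of the true neighbours that the approximate search returned."""
    return len(set(found) & set(truth)) / len(truth)


def compare_recall(seed=0, n=2000, dim=8, n_centroids=32,
                   nprobe=4, k=10, n_queries=50, shift=3.0):
    """Mean recall@k for ID and OOD queries at the same nprobe budget.

    Assumption: ID queries come from the index distribution N(0, 1);
    OOD queries come from a shifted distribution N(shift, 1).
    """
    rng = random.Random(seed)
    points = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]
    # Assumption: sampled data points stand in for learned (k-means) centroids.
    centroids = rng.sample(points, n_centroids)
    buckets = build_partition_index(points, centroids)
    id_queries = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_queries)]
    ood_queries = [[rng.gauss(shift, 1.0) for _ in range(dim)] for _ in range(n_queries)]

    def mean_recall(queries):
        total = 0.0
        for q in queries:
            truth = brute_force_knn(points, q, k)
            found = partition_search(points, centroids, buckets, q, k, nprobe)
            total += recall_at_k(found, truth)
        return total / len(queries)

    return mean_recall(id_queries), mean_recall(ood_queries)


if __name__ == "__main__":
    id_recall, ood_recall = compare_recall()
    print(f"ID recall@10:  {id_recall:.3f}")
    print(f"OOD recall@10: {ood_recall:.3f}")
```

Raising `nprobe` until both query sets reach the same recall, and comparing how many candidates each had to scan, gives a rough stand-in for the "latency at a fixed recall target" comparison described in the abstract.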

research
05/20/2021

FreshDiskANN: A Fast and Accurate Graph-Based ANN Index for Streaming Similarity Search

Approximate nearest neighbor search (ANNS) is a fundamental building blo...
research
08/29/2023

CAPS: A Practical Partition Index for Filtered Similarity Search

With the surging popularity of approximate near-neighbor search (ANNS), ...
research
05/07/2023

Scaling Graph-Based ANNS Algorithms to Billion-Size Datasets: A Comparative Analysis

Algorithms for approximate nearest-neighbor search (ANNS) have been the ...
research
09/11/2018

Zoom: SSD-based Vector Search for Optimizing Accuracy, Latency and Memory

With the advancement of machine learning and deep learning, vector searc...
research
04/07/2023

Similarity search in the blink of an eye with compressed indices

Nowadays, data is represented by vectors. Retrieving those vectors, amon...
research
02/07/2018

Revisiting the Inverted Indices for Billion-Scale Approximate Nearest Neighbors

This work addresses the problem of billion-scale nearest neighbor search...
research
02/02/2019

Learned Indexes for Dynamic Workloads

The recent proposal of learned index structures opens up a new perspecti...
