FreshDiskANN: A Fast and Accurate Graph-Based ANN Index for Streaming Similarity Search

05/20/2021
by Aditi Singh, et al.

Approximate nearest neighbor search (ANNS) is a fundamental building block in information retrieval, and graph-based indices are the current state of the art and widely used in industry. Recent advances in graph-based indices have made it possible to index and search billion-point datasets with high recall and millisecond-level latency on a single commodity machine with an SSD. However, existing graph algorithms for ANNS support only static indices that cannot reflect the real-time changes to the corpus required by many key real-world scenarios (e.g., an index of sentences in documents, email, or news). To work around this limitation, the current industry practice for reflecting updates in such indices is to periodically rebuild them, which can be prohibitively expensive. In this paper, we present the first graph-based ANNS index that reflects corpus updates in real time without compromising search performance. Using update rules for this index, we design FreshDiskANN, a system that can index over a billion points on a workstation with an SSD and limited memory, and supports thousands of concurrent real-time inserts, deletes, and searches per second each, while retaining over 95% 5-recall@5. This represents a 5-10x reduction in the cost of maintaining freshness in indices compared to existing methods.
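To make the abstract's terms concrete, here is a minimal, single-threaded Python sketch of a toy in-memory graph index that supports greedy search, inserts, and lazy (tombstone) deletes, together with the k-recall@k metric the abstract reports. 5-recall@5 is the fraction of the true 5 nearest neighbors that appear among the 5 returned results. The class ToyGraphIndex, its parameters max_degree and beam_width, and its consolidate step are hypothetical illustrations, not FreshDiskANN's update rules or code. The real system is designed to run with an SSD and limited memory and to serve updates and searches concurrently, and DiskANN-style indices prune neighbor lists with an α-parameterized rule that this sketch replaces with plain nearest-by-distance pruning. The batch consolidate step, which rewires edges around deleted nodes, is our assumption of one reasonable way to purge tombstones.

```python
import random


def dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_recall_at_k(retrieved, true_nn, k):
    """k-recall@k: fraction of the true k nearest neighbors found in the top k results."""
    return len(set(retrieved[:k]) & set(true_nn[:k])) / k


class ToyGraphIndex:
    """Hypothetical toy graph index with greedy search, inserts and lazy deletes.

    Neighbors are pruned by plain distance, not by an alpha-based pruning rule.
    """

    def __init__(self, max_degree=8, beam_width=16):
        self.max_degree = max_degree  # cap on out-neighbors per node
        self.beam_width = beam_width  # candidate list size during search
        self.vectors = {}             # id -> vector
        self.edges = {}               # id -> set of out-neighbor ids
        self.deleted = set()          # tombstoned ids: hidden from results, still traversed
        self.entry = None             # fixed entry point for every search

    def _closest(self, ids, target, n):
        """Return the n ids in `ids` closest to the vector `target`, nearest first."""
        return sorted(ids, key=lambda c: dist(self.vectors[c], target))[:n]

    def _greedy_search(self, query):
        """Beam search from the entry point; returns (beam sorted by distance, visited ids)."""
        if self.entry is None:
            return [], set()
        beam, visited = [self.entry], set()
        while True:
            frontier = [c for c in beam if c not in visited]
            if not frontier:
                return beam, visited
            p = frontier[0]  # beam is sorted, so this is the closest unexpanded node
            visited.add(p)
            beam = self._closest(set(beam) | self.edges[p], query, self.beam_width)

    def search(self, query, k):
        """Return up to k nearest ids, skipping tombstoned ones."""
        beam, _ = self._greedy_search(query)
        return [c for c in beam if c not in self.deleted][:k]

    def insert(self, pid, vec):
        """Link a new point to the closest nodes visited while searching for it."""
        self.vectors[pid], self.edges[pid] = vec, set()
        if self.entry is None:
            self.entry = pid
            return
        _, visited = self._greedy_search(vec)
        self.edges[pid] = set(self._closest(visited, vec, self.max_degree))
        for n in self.edges[pid]:
            self.edges[n].add(pid)  # back edge so the new point is reachable
            if len(self.edges[n]) > self.max_degree:
                self.edges[n] = set(self._closest(self.edges[n], self.vectors[n], self.max_degree))

    def delete(self, pid):
        """Lazy delete: mark a tombstone; the node keeps routing searches."""
        if pid in self.vectors:
            self.deleted.add(pid)

    def consolidate(self):
        """Drop tombstoned nodes in one batch, rewiring edges that pointed at them."""
        dead = self.deleted
        for p, out in self.edges.items():
            if p in dead or not (out & dead):
                continue
            cand = out - dead
            for d in out & dead:
                cand |= self.edges[d] - dead  # inherit the deleted node's live neighbors
            cand.discard(p)
            self.edges[p] = set(self._closest(cand, self.vectors[p], self.max_degree))
        for d in dead:
            del self.vectors[d], self.edges[d]
        if self.entry in dead:
            self.entry = next(iter(self.vectors), None)
        self.deleted = set()


if __name__ == "__main__":
    random.seed(0)
    data = {i: [random.random() for _ in range(4)] for i in range(500)}
    index = ToyGraphIndex()
    for i, v in data.items():
        index.insert(i, v)
    for i in range(1, 500, 3):
        index.delete(i)
    query = [0.5] * 4
    live = [i for i in data if i not in index.deleted]
    truth = sorted(live, key=lambda i: dist(data[i], query))[:5]
    print("5-recall@5:", k_recall_at_k(index.search(query, 5), truth, 5))
```

Marking deletes as tombstones keeps existing search paths intact, so a delete does not force an index rebuild; the cost is that tombstoned nodes are still traversed until a consolidation pass removes them and patches their in-neighbors' edge lists.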

research
10/22/2022

OOD-DiskANN: Efficient and Scalable Graph ANNS for Out-of-Distribution Queries

State-of-the-art algorithms for Approximate Nearest Neighbor Search (ANN...
research
08/29/2023

CAPS: A Practical Partition Index for Filtered Similarity Search

With the surging popularity of approximate near-neighbor search (ANNS), ...
research
04/07/2023

Similarity search in the blink of an eye with compressed indices

Nowadays, data is represented by vectors. Retrieving those vectors, amon...
research
04/07/2021

Graph Reordering for Cache-Efficient Near Neighbor Search

Graph search is one of the most successful algorithmic trends in near ne...
research
09/01/2023

General and Practical Tuning Method for Off-the-Shelf Graph-Based Index: SISAP Indexing Challenge Report by Team UTokyo

Despite the efficacy of graph-based algorithms for Approximate Nearest N...
research
02/07/2018

Revisiting the Inverted Indices for Billion-Scale Approximate Nearest Neighbors

This work addresses the problem of billion-scale nearest neighbor search...
research
08/05/2019

FLuID: A Meta Model to Flexibly Define Schema-level Indices for the Web of Data

Schema-level indices are vital for summarizing large collections of grap...