Billion-scale similarity search with GPUs

02/28/2017
by Jeff Johnson, et al.

Similarity search finds application in specialized database systems handling complex data such as images or videos, which are typically represented by high-dimensional features and require specific indexing structures. This paper tackles the problem of better utilizing GPUs for this task. While GPUs excel at data-parallel tasks, prior approaches are bottlenecked by algorithms that expose less parallelism, such as k-min selection, or that make poor use of the memory hierarchy. We propose a design for k-selection that operates at up to 55% of theoretical peak performance, enabling a nearest neighbor implementation that is 8.5x faster than the prior GPU state of the art. We apply it in different similarity search scenarios by proposing optimized designs for brute-force, approximate, and compressed-domain search based on product quantization. In all these setups, we outperform the state of the art by large margins. Our implementation enables the construction of a high-accuracy k-NN graph on 95 million images from the YFCC100M dataset in 35 minutes, and of a graph connecting 1 billion vectors in less than 12 hours on 4 Maxwell Titan X GPUs. We have open-sourced our approach for the sake of comparison and reproducibility.
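
To make the role of k-selection in brute-force search concrete, here is a minimal CPU-only sketch in plain Python. It is not the paper's GPU design and not the faiss API. The function names `squared_l2` and `knn_brute_force` are hypothetical, and the heap-based `heapq.nsmallest` stands in for the k-selection step that the paper optimizes on the GPU.

```python
import heapq


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_brute_force(queries, database, k):
    """Exhaustive k-NN: score every database vector, then keep the k smallest.

    Returns, for each query, a list of (distance, index) pairs sorted by
    increasing distance. The k-selection step is delegated to
    heapq.nsmallest (an illustrative stand-in, not the paper's method).
    """
    results = []
    for q in queries:
        # Distance computation: one score per database vector.
        scored = ((squared_l2(q, v), i) for i, v in enumerate(database))
        # k-selection: keep only the k smallest distances.
        results.append(heapq.nsmallest(k, scored))
    return results


# Toy data, purely illustrative.
database = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
queries = [[0.9, 0.1]]
neighbors = knn_brute_force(queries, database, k=2)
nearest_indices = [i for _, i in neighbors[0]]
```

In this toy example the query's two nearest database vectors are those at indices 1 and 0, in that order. The distance pass is trivially data-parallel, whereas the selection pass is the part the abstract identifies as a bottleneck on GPUs.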

08/05/2020

Fast top-K Cosine Similarity Search through XOR-Friendly Binary Quantization on GPUs

We explore the use of GPU for accelerating large scale nearest neighbor ...
01/02/2019

Vector and Line Quantization for Billion-scale Similarity Search on GPUs

Billion-scale high-dimensional approximate nearest neighbour (ANN) searc...
01/31/2022

Speed-ANN: Low-Latency and High-Accuracy Nearest Neighbor Search via Intra-Query Parallelism

Nearest Neighbor Search (NNS) has recently drawn a rapid increase of int...
03/29/2021

Large-Scale Approximate k-NN Graph Construction on GPU

k-nearest neighbor graph is a key data structure in many disciplines suc...
12/10/2014

Memory vectors for similarity search in high-dimensional spaces

We study an indexing architecture to store and search in a database of h...
06/05/2017

To Index or Not to Index: Optimizing Maximum Inner Product Search

Making top-K predictions for state-of-the-art Matrix Factorization model...
09/07/2016

Polysemous codes

This paper considers the problem of approximate nearest neighbor search ...

Code Repositories

faiss

A library for efficient similarity search and clustering of dense vectors.

