
Searching in one billion vectors: re-rank with source coding

by Hervé Jégou, et al.

Recent indexing techniques inspired by source coding have proven successful at indexing billions of high-dimensional vectors in memory. In this paper, we propose an approach that re-ranks the neighbor hypotheses obtained by these compressed-domain indexing methods. In contrast to the usual post-verification scheme, which computes exact distances on the short-list of hypotheses, our approach refines the estimated distances using short quantization codes, which avoids reading the full vectors from disk. We have released a new public dataset of one billion 128-dimensional vectors and propose an experimental setup for evaluating high-dimensional indexing algorithms at a realistic scale. Experiments show that our method re-ranks the neighbor hypotheses accurately and efficiently while using little memory compared to the full vector representation.
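To make the re-ranking idea concrete, here is a minimal sketch of one plausible reading of the abstract, not the authors' implementation. It assumes each database vector is stored as two short codes: a first-stage code used to build the short-list, and a refinement code that quantizes the residual left by the first stage. The tiny hand-picked codebooks, the plain nearest-centroid quantizers, and the names `encode` and `search` are all illustrative assumptions; the abstract does not specify the quantizers used.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(codebook, v):
    """Index of the codebook entry closest to v."""
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i], v))

def encode(vectors, first_cb, refine_cb):
    """Store each vector as (first-stage code, refinement code).

    The refinement code quantizes the residual x - first_cb[c], so the
    full vector is never needed again at query time.
    """
    codes = []
    for x in vectors:
        c = nearest(first_cb, x)
        residual = [xi - ci for xi, ci in zip(x, first_cb[c])]
        codes.append((c, nearest(refine_cb, residual)))
    return codes

def search(query, codes, first_cb, refine_cb, shortlist_size, k):
    """Build a short-list from first-stage estimates, then re-rank it."""
    # Stage 1: distance estimated from the first-stage code only.
    first_est = [sq_dist(query, first_cb[c]) for c, _ in codes]
    shortlist = sorted(range(len(codes)), key=first_est.__getitem__)
    shortlist = shortlist[:shortlist_size]

    # Stage 2: refine the estimate with the residual code, using only
    # the compact codes (no access to the original vectors).
    def refined_est(i):
        c, r = codes[i]
        approx = [ci + ri for ci, ri in zip(first_cb[c], refine_cb[r])]
        return sq_dist(query, approx)

    return sorted(shortlist, key=refined_est)[:k]

# Hypothetical toy data in 2 dimensions (hand-picked, not trained).
first_cb = [[0, 0], [10, 10]]
refine_cb = [[-1, 0], [1, 0], [0, -1], [0, 1]]
database = [[1, 0], [-1, 0], [0, 1], [10, 11], [11, 10]]

codes = encode(database, first_cb, refine_cb)
print(search([0.9, 0.1], codes, first_cb, refine_cb, shortlist_size=3, k=2))
```

In this toy setting the first three database vectors share the same first-stage code, so stage 1 cannot tell them apart; the refinement codes are what place vector 0 ahead of vectors 2 and 1. A realistic system would learn both codebooks from data and use much larger, structured quantizers.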



Scalable Image Retrieval by Sparse Product Quantization

Fast Approximate Nearest Neighbor (ANN) search technique for high-dimens...

HD-Index: Pushing the Scalability-Accuracy Boundary for Approximate kNN Search in High-Dimensional Spaces

Nearest neighbor searching of large databases in high-dimensional spaces...

Language Recognition using Random Indexing

Random Indexing is a simple implementation of Random Projections with a ...

Accelerated Nearest Neighbor Search with Quick ADC

Efficient Nearest Neighbor (NN) search in high-dimensional spaces is a f...

Low-Precision Quantization for Efficient Nearest Neighbor Search

Fast k-Nearest Neighbor search over real-valued vector spaces (KNN) is a...

Indexing of CNN Features for Large Scale Image Search

Convolutional neural network (CNN) features which represent images with ...

Link and code: Fast indexing with graphs and compact regression codes

Similarity search approaches based on graph walks have recently attained...