IRLI: Iterative Re-partitioning for Learning to Index

03/17/2021
by Gaurav Gupta, et al.

Neural models have transformed the fundamental information retrieval problem of mapping a query to a giant set of items. However, the need for efficient, low-latency inference forces the community to reconsider approximate near-neighbor search in the item space. To this end, learning to index has attracted growing interest. Such methods must trade off high accuracy against load balance and scalability in distributed settings. We propose a novel approach called IRLI (pronounced "early"), which iteratively partitions the items by learning the relevant buckets directly from the query-item relevance data. Furthermore, IRLI employs a power-of-k-choices based load balancing strategy. We show mathematically that, under very natural assumptions, IRLI retrieves the correct item with high probability and provides superior load balancing. IRLI surpasses the best baseline's precision on multi-label classification while being 5x faster at inference. For near-neighbor search tasks, the same method outperforms the state-of-the-art learned hashing approach NeuralLSH, requiring only one-sixth of the candidates for the same recall. IRLI is both data and model parallel, making it ideal for distributed GPU implementation. We demonstrate this advantage by indexing 100 million dense vectors and surpassing the popular FAISS library by >10
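To make the load-balancing idea concrete, here is a minimal sketch of a power-of-k-choices assignment. It is an illustration under stated assumptions, not the authors' implementation: the function name `assign_power_of_k`, the dense score matrix, and the rule "take the k highest-scoring buckets, then pick the least loaded" are all assumptions made for this example, and the scores are random stand-ins for what a learned model would produce.

```python
import heapq
import random


def assign_power_of_k(scores, k):
    """Assign each item to one bucket (hypothetical helper, not from the paper).

    scores: list of rows, one per item, each holding one affinity per bucket.
    For every item, the k highest-scoring buckets are candidates and the
    item goes to the candidate that currently holds the fewest items.
    """
    n_buckets = len(scores[0])
    loads = [0] * n_buckets
    assignment = []
    for row in scores:
        # Candidate buckets: the k with the highest affinity for this item.
        candidates = heapq.nlargest(k, range(n_buckets), key=row.__getitem__)
        # Among the candidates, choose the least-loaded bucket.
        chosen = min(candidates, key=loads.__getitem__)
        assignment.append(chosen)
        loads[chosen] += 1
    return assignment, loads


# Synthetic, deliberately skewed scores: bucket 0 is every item's favorite.
random.seed(0)
scores = [
    [random.random() + (1.0 if b == 0 else 0.0) for b in range(20)]
    for _ in range(1000)
]

_, loads_k1 = assign_power_of_k(scores, k=1)  # greedy: every item lands in bucket 0
_, loads_k5 = assign_power_of_k(scores, k=5)  # five candidates per item
print("max load, k=1:", max(loads_k1))
print("max load, k=5:", max(loads_k5))
```

With `k=1` the skew sends every item to the same bucket, while allowing a handful of candidates spreads the load at the cost of sometimes placing an item in a lower-scoring bucket.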


research
08/30/2020

SOLAR: Sparse Orthogonal Learned and Random Embeddings

Dense embedding models are commonly deployed in commercial search engine...
research
08/29/2023

CAPS: A Practical Partition Index for Filtered Similarity Search

With the surging popularity of approximate near-neighbor search (ANNS), ...
research
01/31/2022

Speed-ANN: Low-Latency and High-Accuracy Nearest Neighbor Search via Intra-Query Parallelism

Nearest Neighbor Search (NNS) has recently drawn a rapid increase of int...
research
02/12/2018

Revisiting the Vector Space Model: Sparse Weighted Nearest-Neighbor Method for Extreme Multi-Label Classification

Machine learning has played an important role in information retrieval (...
research
08/26/2020

Item Tagging for Information Retrieval: A Tripartite Graph Neural Network based Approach

Tagging has been recognized as a successful practice to boost relevance ...
research
05/04/2023

Adaptive Selection of Anchor Items for CUR-based k-NN search with Cross-Encoders

Cross-encoder models, which jointly encode and score a query-item pair, ...
