PLAID: An Efficient Engine for Late Interaction Retrieval

05/19/2022
by   Keshav Santhanam, et al.
5

Pre-trained language models are increasingly important components across multiple information retrieval (IR) paradigms. Late interaction, introduced with the ColBERT model and recently refined in ColBERTv2, is a popular paradigm that holds state-of-the-art status across many benchmarks. To dramatically speed up the search latency of late interaction, we introduce the Performance-optimized Late Interaction Driver (PLAID). Without impacting quality, PLAID swiftly eliminates low-scoring passages using a novel centroid interaction mechanism that treats every passage as a lightweight bag of centroids. PLAID uses centroid interaction as well as centroid pruning, a mechanism for sparsifying the bag of centroids, within a highly-optimized engine to reduce late interaction search latency by up to 7× on a GPU and 45× on a CPU against vanilla ColBERTv2, while continuing to deliver state-of-the-art retrieval quality. This allows the PLAID engine with ColBERTv2 to achieve latency of tens of milliseconds on a GPU and tens or just few hundreds of milliseconds on a CPU at large scale, even at the largest scales we evaluate with 140M passages.

READ FULL TEXT

page 2

page 5

research
12/02/2021

ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction

Neural information retrieval (IR) has greatly advanced search and other ...
research
06/04/2023

I^3 Retriever: Incorporating Implicit Interaction in Pre-trained Language Models for Passage Retrieval

Passage retrieval is a fundamental task in many information systems, suc...
research
02/13/2023

SLIM: Sparsified Late Interaction for Multi-Vector Retrieval with Inverted Indexes

This paper introduces a method called Sparsified Late Interaction for Mu...
research
08/31/2022

LexMAE: Lexicon-Bottlenecked Pretraining for Large-Scale Retrieval

In large-scale retrieval, the lexicon-weighting paradigm, learning weigh...
research
05/12/2023

Knowledge Refinement via Interaction Between Search Engines and Large Language Models

Information retrieval (IR) plays a crucial role in locating relevant res...
research
02/18/2018

Recurrent Binary Embedding for GPU-Enabled Exhaustive Retrieval from Billion-Scale Semantic Vectors

Rapid advances in GPU hardware and multiple areas of Deep Learning open ...
research
03/11/2022

Tevatron: An Efficient and Flexible Toolkit for Dense Retrieval

Recent rapid advancements in deep pre-trained language models and the in...

Please sign up or login with your details

Forgot password? Click here to reset