CITADEL: Conditional Token Interaction via Dynamic Lexical Routing for Efficient and Effective Multi-Vector Retrieval

11/18/2022
by Minghan Li, et al.

Multi-vector retrieval methods combine the merits of sparse (e.g. BM25) and dense (e.g. DPR) retrievers and have achieved state-of-the-art performance on various retrieval tasks. These methods, however, are orders of magnitude slower and need much more space to store their indices compared to their single-vector counterparts. In this paper, we unify different multi-vector retrieval models from a token routing viewpoint and propose conditional token interaction via dynamic lexical routing, namely CITADEL, for efficient and effective multi-vector retrieval. CITADEL learns to route different token vectors to the predicted lexical “keys” such that a query token vector only interacts with document token vectors routed to the same key. This design significantly reduces the computation cost while maintaining high accuracy. Notably, CITADEL achieves the same or slightly better performance than the previous state of the art, ColBERT-v2, on both in-domain (MS MARCO) and out-of-domain (BEIR) evaluations, while being nearly 40 times faster. Code and data are available at https://github.com/facebookresearch/dpr-scale.
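
The routing idea in the abstract can be illustrated with a small, simplified sketch. It is not the authors' implementation: the router here is replaced by hand-assigned keys, the scoring rule (sum over query tokens of the best dot product among document vectors sharing the same key) is an assumption chosen for illustration, and all names (`build_index`, `routed_score`, `all_to_all_score`) and the toy vectors are hypothetical. An all-to-all late-interaction scorer in the style of ColBERT's MaxSim is included only to compare how many dot products each approach computes.

```python
from collections import defaultdict


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def build_index(doc_tokens):
    """Group document token vectors by the lexical key they were routed to.

    Assumption: each token arrives as (key, vector); in a real system the
    key would be predicted by a learned router, not assigned by hand.
    """
    index = defaultdict(list)
    for key, vec in doc_tokens:
        index[key].append(vec)
    return index


def routed_score(query_tokens, doc_index):
    """Score a document using only same-key token interactions.

    Returns (score, number of dot products computed).
    """
    score, n_dots = 0.0, 0
    for key, q_vec in query_tokens:
        candidates = doc_index.get(key, [])
        if not candidates:
            continue  # no document token shares this key, so no interaction
        n_dots += len(candidates)
        score += max(dot(q_vec, d_vec) for d_vec in candidates)
    return score, n_dots


def all_to_all_score(query_tokens, doc_tokens):
    """Baseline: every query token is compared with every document token."""
    score, n_dots = 0.0, 0
    for _, q_vec in query_tokens:
        n_dots += len(doc_tokens)
        score += max(dot(q_vec, d_vec) for _, d_vec in doc_tokens)
    return score, n_dots


# Hypothetical toy data: (routed key, token vector)
query = [("bank", [1.0, 0.0]), ("river", [0.0, 1.0])]
doc = [
    ("bank", [0.5, 0.5]),
    ("river", [0.0, 2.0]),
    ("water", [1.0, 1.0]),
    ("bank", [2.0, 0.0]),
]

routed, routed_dots = routed_score(query, build_index(doc))
full, full_dots = all_to_all_score(query, doc)
print(routed, routed_dots, full, full_dots)
```

On this toy input both scorers return 4.0, but the routed version computes 3 dot products against 8 for the all-to-all baseline. The matching scores are a property of the toy data, not a general guarantee; the point is only that restricting interactions to shared keys shrinks the number of comparisons.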

research
02/13/2023

SLIM: Sparsified Late Interaction for Multi-Vector Retrieval with Inverted Indexes

This paper introduces a method called Sparsified Late Interaction for Mu...
research
03/24/2022

Introducing Neural Bag of Whole-Words with ColBERTer: Contextualized Late Interactions using Enhanced Reduction

Recent progress in neural information retrieval has demonstrated large g...
research
11/20/2022

An Algorithm for Routing Vectors in Sequences

We propose a routing algorithm that takes a sequence of vectors and comp...
research
11/02/2022

Multi-Vector Retrieval as Sparse Alignment

Multi-vector retrieval models improve over single-vector dual encoders o...
research
12/02/2021

ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction

Neural information retrieval (IR) has greatly advanced search and other ...
research
04/04/2023

Rethinking the Role of Token Retrieval in Multi-Vector Retrieval

Multi-vector retrieval models such as ColBERT [Khattab and Zaharia, 2020...
research
02/13/2023

Improving Out-of-Distribution Generalization of Neural Rerankers with Contextualized Late Interaction

Recent progress in information retrieval finds that embedding query and ...
