SLIM: Sparsified Late Interaction for Multi-Vector Retrieval with Inverted Indexes

02/13/2023
by Minghan Li, et al.

This paper introduces Sparsified Late Interaction for Multi-vector retrieval with inverted indexes (SLIM). Although multi-vector models have proven effective across a range of information retrieval tasks, most of their pipelines require custom optimization to be efficient in both time and space. Among them, ColBERT is probably the most established method; it is based on late interaction between the contextualized token embeddings of pre-trained language models. Unlike ColBERT, whose token embeddings are all low-dimensional and dense, SLIM projects each token embedding into a high-dimensional, sparse lexical space before performing late interaction. In practice, we further propose to approximate SLIM using the lower and upper bounds of the late interaction to reduce latency and storage. The resulting sparse outputs can be easily incorporated into an inverted search index and are fully compatible with off-the-shelf search tools such as Pyserini and Elasticsearch. Compared to ColBERT, SLIM achieves competitive accuracy on information retrieval benchmarks such as MS MARCO Passages and BEIR while being much smaller and faster on CPUs. Source code and data will be available at https://github.com/castorini/pyserini/blob/master/docs/experiments-slim.md.
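To make the idea in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a random projection matrix in place of a trained model, a hypothetical top-k sparsification step, and toy sizes; the function names are illustrative only. It scores a query against a document with ColBERT-style late interaction (sum over query tokens of the maximum similarity over document tokens) on sparse, non-negative token vectors, and then computes one simple upper bound obtained by max-pooling the document tokens into a single sparse vector. The exact bounds used in the paper may differ from this one.

```python
import numpy as np


def sparsify_top_k(scores, k):
    """Clip to non-negative values and keep only the k largest entries per row."""
    sparse = np.maximum(scores, 0.0)
    # Column indices of every entry that is NOT among the k largest in its row.
    drop = np.argpartition(sparse, -k, axis=1)[:, :-k]
    np.put_along_axis(sparse, drop, 0.0, axis=1)
    return sparse


def late_interaction(q, d):
    """Sum over query tokens of the max dot product over document tokens."""
    return float((q @ d.T).max(axis=1).sum())


def upper_bound_score(q, d):
    """Dot product of the sum-pooled query and the max-pooled document.

    For non-negative vectors this can never be smaller than late_interaction.
    """
    return float(q.sum(axis=0) @ d.max(axis=0))


rng = np.random.default_rng(0)
dim, vocab, k = 8, 50, 5                      # toy sizes (assumptions)
projection = rng.normal(size=(dim, vocab))    # stand-in for a learned projection

query_tokens = rng.normal(size=(3, dim))      # 3 dense query token embeddings
doc_tokens = rng.normal(size=(6, dim))        # 6 dense document token embeddings

q_sparse = sparsify_top_k(query_tokens @ projection, k)
d_sparse = sparsify_top_k(doc_tokens @ projection, k)

exact = late_interaction(q_sparse, d_sparse)
upper = upper_bound_score(q_sparse, d_sparse)
print(f"late interaction: {exact:.4f}  upper bound: {upper:.4f}")
```

In this sketch the upper bound needs only one sparse vector of term weights per document, which is the kind of representation an inverted index stores; the exact late-interaction score could then be computed on a short candidate list.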

research
11/18/2022

CITADEL: Conditional Token Interaction via Dynamic Lexical Routing for Efficient and Effective Multi-Vector Retrieval

Multi-vector retrieval methods combine the merits of sparse (e.g. BM25) ...
research
05/06/2022

Dimension Reduction for Efficient Dense Retrieval via Conditional Autoencoder

Dense retrievers encode texts and map them in an embedding space using p...
research
12/02/2021

ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction

Neural information retrieval (IR) has greatly advanced search and other ...
research
02/13/2023

Improving Out-of-Distribution Generalization of Neural Rerankers with Contextualized Late Interaction

Recent progress in information retrieval finds that embedding query and ...
research
05/19/2022

PLAID: An Efficient Engine for Late Interaction Retrieval

Pre-trained language models are increasingly important components across...
research
03/24/2022

Introducing Neural Bag of Whole-Words with ColBERTer: Contextualized Late Interactions using Enhanced Reduction

Recent progress in neural information retrieval has demonstrated large g...
research
02/28/2020

Fast Indexes for Gapped Pattern Matching

We describe indexes for searching large data sets for variable-length-ga...
