
On Single and Multiple Representations in Dense Passage Retrieval

08/13/2021
by Craig Macdonald, et al.

The advent of contextualised language models has brought gains in search effectiveness, not just when applied for re-ranking the output of classical weighting models such as BM25, but also when used directly for passage indexing and retrieval, a technique called dense retrieval. In the existing neural ranking literature, two families of dense retrieval have become apparent: single representation, where an entire passage is represented by a single embedding (usually that of BERT's [CLS] token, as exemplified by the recent ANCE approach), and multiple representations, where each token in a passage is represented by its own embedding (as exemplified by the recent ColBERT approach). These two families have not been directly compared; however, given the likely importance of dense retrieval moving forward, a clear understanding of their advantages and disadvantages is paramount. To this end, this paper contributes a direct study of their comparative effectiveness, noting where each method under- or over-performs with respect to the other, and with respect to a BM25 baseline. We observe that, while ANCE is more efficient than ColBERT in terms of response time and memory usage, multiple representations are statistically more effective than single representations for MAP and MRR@10. We also show that multiple representations achieve larger improvements than single representations on queries that are hardest for BM25, on definitional queries, and on queries with complex information needs.
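The contrast between the two families can be made concrete. Below is a minimal NumPy sketch of the two scoring schemes (illustrative only: the function names, dimensions, and random vectors are assumptions, not the paper's implementation). A single-representation model such as ANCE scores a passage with one dot product between pooled [CLS] embeddings; a multiple-representation model such as ColBERT keeps one embedding per token and scores by summing, over query tokens, the maximum similarity against any passage token (the "MaxSim" late-interaction operator).

    import numpy as np

    def single_rep_score(q_cls: np.ndarray, p_cls: np.ndarray) -> float:
        """Single-representation scoring (ANCE-style): one embedding per
        query and per passage; relevance is their dot product."""
        return float(np.dot(q_cls, p_cls))

    def multi_rep_score(q_toks: np.ndarray, p_toks: np.ndarray) -> float:
        """Multiple-representation scoring (ColBERT-style late interaction):
        for each query token embedding, take the maximum similarity over all
        passage token embeddings, then sum over the query tokens."""
        sim = q_toks @ p_toks.T              # sim[i, j]: query token i vs passage token j
        return float(sim.max(axis=1).sum())  # MaxSim per query token, summed

    # Toy example: random unit vectors stand in for BERT embeddings.
    rng = np.random.default_rng(0)
    dim = 128
    q_cls = rng.normal(size=dim)
    p_cls = rng.normal(size=dim)
    q_toks = rng.normal(size=(8, dim))       # 8 query token embeddings
    p_toks = rng.normal(size=(40, dim))      # 40 passage token embeddings
    # ColBERT normalises token embeddings, so dot products are cosine similarities.
    q_toks /= np.linalg.norm(q_toks, axis=1, keepdims=True)
    p_toks /= np.linalg.norm(p_toks, axis=1, keepdims=True)
    print(single_rep_score(q_cls, p_cls))
    print(multi_rep_score(q_toks, p_toks))

The sketch also hints at the efficiency trade-off the paper measures: a single-representation index stores one vector per passage, whereas a multiple-representation index stores one vector per token, which explains ANCE's advantage in memory usage and response time.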

Related research

06/21/2021
Pseudo-Relevance Feedback for Multiple Representation Dense Retrieval
Pseudo-relevance feedback mechanisms, from Rocchio to the relevance mode...

12/14/2021
Boosted Dense Retriever
We propose DrBoost, a dense retrieval ensemble inspired by boosting. DrB...

08/25/2021
On Approximate Nearest Neighbour Selection for Multi-Stage Dense Retrieval
Dense retrieval, which describes the use of contextualised language mode...

08/23/2021
Query Embedding Pruning for Dense Retrieval
Recent advances in dense retrieval techniques have offered the promise o...

03/21/2022
Evaluating Token-Level and Passage-Level Dense Retrieval Models for Math Information Retrieval
With the recent success of dense retrieval methods based on bi-encoders,...

04/01/2022
CharacterBERT and Self-Teaching for Improving the Robustness of Dense Retrievers on Queries with Typos
Current dense retrievers are not robust to out-of-domain and outlier que...

02/19/2021
Pyserini: An Easy-to-Use Python Toolkit to Support Replicable IR Research with Sparse and Dense Representations
Pyserini is an easy-to-use Python toolkit that supports replicable IR re...