On Single and Multiple Representations in Dense Passage Retrieval

08/13/2021
by Craig Macdonald, et al.

The advent of contextualised language models has brought gains in search effectiveness, not only when applied to re-rank the output of classical weighting models such as BM25, but also when used directly for passage indexing and retrieval, a technique called dense retrieval. In the existing neural ranking literature, two dense retrieval families have become apparent: single representation, where an entire passage is represented by a single embedding (usually BERT's [CLS] token, as exemplified by the recent ANCE approach), and multiple representations, where each token in a passage is represented by its own embedding (as exemplified by the recent ColBERT approach). These two families have not been directly compared. However, given the likely importance of dense retrieval moving forward, a clear understanding of their advantages and disadvantages is paramount. To this end, this paper contributes a direct study of their comparative effectiveness, noting situations where each method under- or over-performs with respect to the other, and with respect to a BM25 baseline. We observe that, while ANCE is more efficient than ColBERT in terms of response time and memory usage, multiple representations are statistically significantly more effective than single representations in terms of MAP and MRR@10. We also show that multiple representations obtain larger improvements than single representations for the queries that are hardest for BM25, for definitional queries, and for queries with complex information needs.
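To make the distinction concrete, the following is a minimal sketch of the two scoring schemes described above, not the authors' implementation: a single-representation scorer compares one [CLS]-style embedding per query and per passage with a dot product, while a ColBERT-style multiple-representation scorer matches every query token embedding to its most similar passage token embedding (the MaxSim operator) and sums those maxima. The function names and the toy random embeddings are illustrative assumptions.

```python
import numpy as np

def single_rep_score(q_cls: np.ndarray, p_cls: np.ndarray) -> float:
    """Single-representation scoring (ANCE-style): one embedding per
    query and per passage, compared with a single dot product."""
    return float(np.dot(q_cls, p_cls))

def multi_rep_score(q_tokens: np.ndarray, p_tokens: np.ndarray) -> float:
    """Multiple-representation late interaction (ColBERT-style MaxSim):
    each query token embedding is matched to its most similar passage
    token embedding, and these per-token maxima are summed."""
    # (num_query_tokens, num_passage_tokens) token-level similarity matrix
    sim = q_tokens @ p_tokens.T
    return float(sim.max(axis=1).sum())

# Toy example with random 4-dimensional embeddings; real systems use
# higher-dimensional vectors (e.g. 128 for ColBERT, 768 for BERT's [CLS]).
rng = np.random.default_rng(0)
q_cls, p_cls = rng.normal(size=4), rng.normal(size=4)
q_tokens, p_tokens = rng.normal(size=(3, 4)), rng.normal(size=(12, 4))
print(single_rep_score(q_cls, p_cls))
print(multi_rep_score(q_tokens, p_tokens))
```

The sketch also makes the efficiency trade-off reported above visible: a single-representation index stores one vector per passage, whereas a multiple-representation index stores one vector per token, which is consistent with ANCE requiring less memory and responding faster than ColBERT.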

Related research

06/21/2021 · Pseudo-Relevance Feedback for Multiple Representation Dense Retrieval
Pseudo-relevance feedback mechanisms, from Rocchio to the relevance mode...

08/25/2021 · On Approximate Nearest Neighbour Selection for Multi-Stage Dense Retrieval
Dense retrieval, which describes the use of contextualised language mode...

12/14/2021 · Boosted Dense Retriever
We propose DrBoost, a dense retrieval ensemble inspired by boosting. DrB...

03/31/2023 · Dense Sparse Retrieval: Using Sparse Language Models for Inference Efficient Dense Retrieval
Vector-based retrieval systems have become a common staple for academic ...

05/10/2023 · Evaluating Embedding APIs for Information Retrieval
The ever-increasing size of language models curtails their widespread ac...

03/21/2022 · Evaluating Token-Level and Passage-Level Dense Retrieval Models for Math Information Retrieval
With the recent success of dense retrieval methods based on bi-encoders,...

04/01/2022 · CharacterBERT and Self-Teaching for Improving the Robustness of Dense Retrievers on Queries with Typos
Current dense retrievers are not robust to out-of-domain and outlier que...
