Evaluating Extrapolation Performance of Dense Retrieval

by Jingtao Zhan, et al.

A retrieval model should not only interpolate the training data but also extrapolate well to queries that differ substantially from it. While dense retrieval (DR) models have been shown to outperform traditional term-based retrieval models, little is known about whether they can extrapolate. To shed light on this question, we investigate how DR models perform in both the interpolation and extrapolation regimes. We first examine the distribution of training and test data on popular retrieval benchmarks and identify considerable overlap in query entities, query intent, and relevance labels. This finding implies that performance on these test sets is biased towards interpolation and cannot accurately reflect extrapolation capacity. Therefore, to evaluate the extrapolation performance of DR models, we propose two resampling strategies for existing retrieval benchmarks and comprehensively investigate how DR models perform. Results show that DR models may interpolate as well as complex interaction-based models (e.g., BERT and ColBERT) but extrapolate substantially worse. Among various DR training strategies, text-encoding pretraining and target-domain pretraining are particularly effective for improving extrapolation capacity. Finally, we compare extrapolation capacity with domain transfer ability. Although our evaluation is simple and easy to use, the extrapolation performance it measures reflects domain transfer ability in some domains of the BEIR dataset, further supporting the feasibility of our approach for evaluating the generalizability of DR models.
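To make the interpolation/extrapolation distinction concrete, the following is a minimal sketch of how test queries might be partitioned by their overlap with training queries. It is not the resampling procedure proposed in the paper, whose details the abstract does not give. The token-level Jaccard similarity, the `threshold` value, and the function names `tokens`, `jaccard`, and `split_by_overlap` are all illustrative assumptions.

```python
def tokens(text):
    """Lowercased whitespace tokens (an assumed, crude proxy for entity/intent overlap)."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity between two token sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def split_by_overlap(train_queries, test_queries, threshold=0.5):
    """Partition test queries by their maximum similarity to any training query.

    Queries at or above the (assumed) threshold are treated as interpolation
    cases; the rest are treated as extrapolation cases.
    """
    train_sets = [tokens(q) for q in train_queries]
    interpolation, extrapolation = [], []
    for query in test_queries:
        query_set = tokens(query)
        best = max((jaccard(query_set, t) for t in train_sets), default=0.0)
        if best >= threshold:
            interpolation.append(query)
        else:
            extrapolation.append(query)
    return interpolation, extrapolation


# Toy data, invented purely for illustration.
train = ["what is the capital of france", "symptoms of the flu"]
test = ["what is the capital of germany", "how do solar panels work"]
interp, extrap = split_by_overlap(train, test)
```

Under this sketch, a retrieval metric computed separately on `interp` and `extrap` would give a rough view of the two regimes. A real study would need a more faithful similarity measure than token overlap.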




Disentangled Modeling of Domain and Relevance for Adaptable Dense Retrieval

Recent advance in Dense Retrieval (DR) techniques has significantly impr...

Interpreting Dense Retrieval as Mixture of Topics

Dense Retrieval (DR) reaches state-of-the-art results in first-stage ret...

COCO-DR: Combating Distribution Shifts in Zero-Shot Dense Retrieval with Contrastive and Distributionally Robust Learning

We present a new zero-shot dense retrieval (ZeroDR) method, COCO-DR, to ...

Dealing with Typos for BERT-based Passage Retrieval and Ranking

Passage retrieval and ranking is a key task in open-domain question answ...

Isotropic Representation Can Improve Dense Retrieval

The recent advancement in language representation modeling has broadly a...

Asyncval: A Toolkit for Asynchronously Validating Dense Retriever Checkpoints during Training

The process of model checkpoint validation refers to the evaluation of t...

Optimizing Dense Retrieval Model Training with Hard Negatives

Ranking has always been one of the top concerns in information retrieval...
