CODER: An efficient framework for improving retrieval through COntextualized Document Embedding Reranking

12/16/2021
by George Zerveas, et al.

We present a framework for improving the performance of a wide class of retrieval models at minimal computational cost. It uses precomputed document representations extracted by a base dense retrieval method and trains a model to jointly score a large set of retrieved candidate documents for each query, while potentially transforming each document's representation on the fly in the context of the other candidates and of the query itself. When scoring a document representation by its similarity to a query, the model is thus aware of the representations of its "peer" documents. We show that our approach leads to substantial improvement in retrieval performance over the base method and over scoring candidate documents in isolation from one another, as in a pair-wise training setting. Crucially, unlike term-interaction rerankers based on BERT-like encoders, it incurs negligible computational overhead on top of any first-stage method at run time, so it can be easily combined with any state-of-the-art dense retrieval method. Finally, concurrently considering a set of candidate documents for a given query enables additional valuable capabilities in retrieval, such as score calibration and mitigation of societal biases in ranking.
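To make the contrast between isolated and joint scoring concrete, here is a minimal sketch in plain Python. It is not the authors' architecture, which the abstract does not specify. It assumes a single untrained self-attention step over the query and candidate embeddings, a residual mixing weight `mix`, dot-product similarity, and a softmax cross-entropy loss over the candidate list. All function names and parameters are hypothetical and chosen only for illustration.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def softmax(xs):
    # Subtract the max for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def score_in_isolation(query, docs):
    # Baseline: each precomputed document embedding is scored on its own.
    return [dot(query, d) for d in docs]

def contextualize(query, docs, mix=0.5):
    # ASSUMPTION: one untrained self-attention step (no learned projections).
    # Each document attends to the query and to all candidates, and the
    # attended vector is blended with the original embedding (residual mix).
    dim = len(query)
    scale = math.sqrt(dim)
    context = [query] + docs
    transformed = []
    for d in docs:
        weights = softmax([dot(d, c) / scale for c in context])
        attended = [sum(w * c[i] for w, c in zip(weights, context))
                    for i in range(dim)]
        transformed.append([(1 - mix) * d[i] + mix * attended[i]
                            for i in range(dim)])
    return transformed

def score_jointly(query, docs, mix=0.5):
    # A document's score now depends on its "peer" candidates as well.
    return [dot(query, d) for d in contextualize(query, docs, mix)]

def listwise_loss(scores, relevant_index):
    # ASSUMPTION: softmax cross-entropy over the whole candidate list,
    # used here as a stand-in for a joint training objective.
    return -math.log(softmax(scores)[relevant_index])

query = [1.0, 0.0]
d1, d2, d3 = [1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]
print(score_in_isolation(query, [d1, d2]))
print(score_jointly(query, [d1, d2]))
print(score_jointly(query, [d1, d3]))
```

In this sketch the isolated score of `d1` is the same whichever other candidates were retrieved, while its joint score shifts when `d2` is swapped for `d3`. The only per-query work is a small computation over already-computed vectors, with no re-encoding of document text. This matches the kind of overhead the abstract describes, although the actual cost depends on the model used.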


research
01/05/2022

PARM: A Paragraph Aggregation Retrieval Model for Dense Document-to-Document Retrieval

Dense passage retrieval (DPR) models show great effectiveness gains in f...
research
05/10/2021

Not All Relevance Scores are Equal: Efficient Uncertainty and Calibration Modeling for Deep Retrieval Models

In any ranking system, the retrieval model outputs a single score for a ...
research
09/01/2022

Isotropic Representation Can Improve Dense Retrieval

The recent advancement in language representation modeling has broadly a...
research
10/04/2021

A Proposed Conceptual Framework for a Representational Approach to Information Retrieval

This paper outlines a conceptual framework for understanding recent deve...
research
09/11/2017

A Short Note on Proximity-based Scoring of Documents with Multiple Fields

The BM25 ranking function is one of the most well known query relevance ...
research
03/12/2022

Information retrieval for label noise document ranking by bag sampling and group-wise loss

Long Document retrieval (DR) has always been a tremendous challenge for ...
research
03/02/2023

Retrieval for Extremely Long Queries and Documents with RPRS: a Highly Efficient and Effective Transformer-based Re-Ranker

Retrieval with extremely long queries and documents is a well-known and ...
