Curriculum Sampling for Dense Retrieval with Document Expansion

12/18/2022
by Xingwei He, et al.

The dual-encoder has become the de facto architecture for dense retrieval. Typically, it computes the latent representations of the query and document independently, and thus fails to fully capture the interactions between them. To alleviate this, recent work aims to obtain query-informed document representations: during training, it expands the document with a real query, while at inference it replaces the real query with a generated pseudo query. This discrepancy between training and inference makes the dense retrieval model attend mostly to the query and largely ignore the document when computing the document representation. As a result, it can even perform worse than the vanilla dense retrieval model, since its performance depends heavily on how well the generated queries match the real query. In this paper, we propose a curriculum sampling strategy that also uses pseudo queries during training, and gradually increases the relevance of the sampled pseudo query to the real query. In this way, the retrieval model learns to extend its attention from the document alone to both the document and the query, and thereby produces high-quality query-informed document representations. Experimental results on several passage retrieval datasets show that our approach outperforms previous dense retrieval methods.
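The curriculum described above can be sketched in a few lines. The snippet below is an illustrative sketch, not the paper's exact procedure: it assumes each candidate pseudo query comes with a precomputed relevance score to the real query, and it slides a sampling window from the least relevant candidates toward the most relevant ones as training progresses. The window size and linear schedule are assumptions for illustration.

```python
import random


def curriculum_sample(candidates, step, total_steps, window=4, rng=random):
    """Pick one pseudo query to expand the document with at this step.

    candidates: list of (pseudo_query, relevance) pairs, where relevance
        is an (assumed, precomputed) similarity score to the real query.
    Early in training the window covers the least relevant pseudo queries,
    forcing the encoder to rely on the document itself; as training
    progresses, the window slides toward the most relevant ones.
    """
    ranked = sorted(candidates, key=lambda c: c[1])  # ascending relevance
    progress = min(step / max(total_steps, 1), 1.0)  # 0 -> 1 over training
    # Window start slides from the low-relevance end to the high-relevance end.
    start = round(progress * max(len(ranked) - window, 0))
    pool = ranked[start:start + window]
    return rng.choice(pool)[0]
```

For example, with ten candidates scored 0-9 and a window of four, step 0 samples from the four least relevant pseudo queries, while the final step samples only from the four most relevant ones.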


