Query-as-context Pre-training for Dense Passage Retrieval

12/19/2022
by Xing Wu, et al.

This paper presents a pre-training technique called query-as-context that uses query prediction to improve dense retrieval. Previous work has applied query prediction to document expansion to alleviate the lexical-mismatch problem in sparse retrieval, but query prediction has not yet been studied in the context of dense retrieval. Query-as-context pre-training treats a query predicted for a document as a context specific to that document, and uses either contrastive learning or contextual masked auto-encoding to compress the document and query into dense vectors. On large-scale passage retrieval benchmarks, the technique shows considerable improvements over strong existing baselines such as coCondenser and CoT-MAE, demonstrating its effectiveness. Our code will be available at https://github.com/caskcsg/ir/tree/main/cotmae-qc.
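To make the contrastive variant concrete, below is a minimal sketch, not the authors' released implementation. It assumes a BERT-style encoder and pre-generated predicted queries; the names model_name, encode, and query_as_context_loss, the temperature value, and the use of in-batch negatives are illustrative assumptions rather than details taken from the paper.

```python
# Sketch of the contrastive flavor of query-as-context pre-training:
# each document is paired with a query predicted for it, both are
# encoded into dense vectors, and an InfoNCE loss with in-batch
# negatives pulls each document toward its own predicted query.
# Assumption: queries were produced offline by a query-prediction model.

import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

model_name = "bert-base-uncased"  # assumption: any BERT-style encoder works here
tokenizer = AutoTokenizer.from_pretrained(model_name)
encoder = AutoModel.from_pretrained(model_name)

def encode(texts):
    """Encode a list of strings into [CLS] dense vectors."""
    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=128, return_tensors="pt")
    hidden = encoder(**batch).last_hidden_state
    return hidden[:, 0]  # [CLS] vector as the passage/query representation

def query_as_context_loss(documents, predicted_queries, temperature=0.05):
    """InfoNCE over document / predicted-query pairs with in-batch negatives."""
    doc_vecs = encode(documents)                # (B, H)
    qry_vecs = encode(predicted_queries)        # (B, H)
    sim = doc_vecs @ qry_vecs.T / temperature   # (B, B) similarity matrix
    labels = torch.arange(sim.size(0))          # i-th doc matches i-th query
    return F.cross_entropy(sim, labels)

# Toy usage; in real pre-training the queries come from a query-prediction
# model (e.g., a docT5query-style generator) run over the corpus.
docs = ["Dense retrieval maps passages to vectors.",
        "Sparse retrieval relies on lexical overlap."]
queries = ["what is dense retrieval", "how does sparse retrieval work"]
loss = query_as_context_loss(docs, queries)
loss.backward()
```

In-batch negatives are a common design choice for this kind of objective: every other document/query pair in the batch serves as a negative for free, so no separate negative mining is needed during pre-training.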


Related research

08/16/2023 · Pre-training with Large Language Model-based Document Expansion for Dense Passage Retrieval
In this paper, we systematically study the potential of pre-training wit...

08/16/2022 · ConTextual Mask Auto-Encoder for Dense Passage Retrieval
Dense passage retrieval aims to retrieve the relevant passages of a quer...

06/05/2023 · Unsupervised Dense Retrieval with Relevance-Aware Contrastive Pre-Training
Dense retrievers have achieved impressive performance, but their demand ...

03/23/2023 · A Unified Framework for Learned Sparse Retrieval
Learned sparse retrieval (LSR) is a family of first-stage retrieval meth...

08/01/2023 · On the Effects of Regional Spelling Conventions in Retrieval Models
One advantage of neural ranking models is that they are meant to general...

02/10/2020 · Pre-training Tasks for Embedding-based Large-scale Retrieval
We consider the large-scale query-document retrieval problem: given a qu...

06/05/2023 · Benchmarking Middle-Trained Language Models for Neural Search
Middle training methods aim to bridge the gap between the Masked Languag...
