Pre-training with Large Language Model-based Document Expansion for Dense Passage Retrieval

08/16/2023
by Guangyuan Ma, et al.

In this paper, we systematically study the potential of pre-training with Large Language Model (LLM)-based document expansion for dense passage retrieval. Concretely, we use LLMs for document expansion, i.e., query generation, and transfer the expanded knowledge to retrievers through pre-training strategies tailored for passage retrieval. These strategies include contrastive learning and bottlenecked query generation. Furthermore, we incorporate a curriculum learning strategy to reduce the reliance on LLM inference. Experimental results demonstrate that pre-training with LLM-based document expansion significantly boosts retrieval performance on large-scale web-search tasks. Our approach also shows strong zero-shot and out-of-domain retrieval abilities, making it more widely applicable when a retriever is initialized without human-labeled data.
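To make the contrastive pre-training idea concrete, the following is a minimal sketch of an in-batch contrastive (InfoNCE-style) loss between passage embeddings and embeddings of their LLM-generated queries. It is an illustration under stated assumptions, not the authors' implementation: the function names, the dot-product scoring, and the `temperature` parameter are hypothetical choices, and the toy vectors stand in for encoder outputs.

```python
import math


def dot(u, v):
    """Dot-product similarity between two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def in_batch_contrastive_loss(passage_vecs, query_vecs, temperature=1.0):
    """Mean InfoNCE loss over a batch (hypothetical helper).

    Assumes query_vecs[i] embeds a query generated for passage_vecs[i];
    every other passage in the batch serves as a negative.
    """
    assert passage_vecs and len(passage_vecs) == len(query_vecs)
    total = 0.0
    for i, q in enumerate(query_vecs):
        scores = [dot(q, p) / temperature for p in passage_vecs]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(scores)
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        # Negative log-probability of the matching passage.
        total += log_z - scores[i]
    return total / len(query_vecs)


# Toy embeddings standing in for encoder outputs (assumed values).
passages = [[1.0, 0.0], [0.0, 1.0]]
aligned_queries = [[1.0, 0.0], [0.0, 1.0]]
shuffled_queries = [[0.0, 1.0], [1.0, 0.0]]

aligned_loss = in_batch_contrastive_loss(passages, aligned_queries)
shuffled_loss = in_batch_contrastive_loss(passages, shuffled_queries)
print(aligned_loss < shuffled_loss)
```

The loss is lower when each generated query sits closer to its own passage than to the other passages in the batch, which is the signal a retriever would be trained on in this kind of setup.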

research
12/19/2022

Query-as-context Pre-training for Dense Passage Retrieval

This paper presents a pre-training technique called query-as-context tha...
research
02/27/2023

Pretraining De-Biased Language Model with Large-scale Click Logs for Document Ranking

Pre-trained language models have achieved great success in various large...
research
10/27/2022

Retrieval Oriented Masking Pre-training Language Model for Dense Passage Retrieval

Pre-trained language model (PTM) has been shown to yield powerful text r...
research
12/15/2021

Value Retrieval with Arbitrary Queries for Form-like Documents

We propose value retrieval with arbitrary queries for form-like document...
research
12/15/2022

Efficient Pre-training of Masked Language Model via Concept-based Curriculum Masking

Masked language modeling (MLM) has been widely used for pre-training eff...
research
04/27/2023

Large Language Models are Strong Zero-Shot Retriever

In this work, we propose a simple method that applies a large language m...
research
01/13/2023

Do the Findings of Document and Passage Retrieval Generalize to the Retrieval of Responses for Dialogues?

A number of learned sparse and dense retrieval approaches have recently ...
