Unsupervised Dense Retrieval Deserves Better Positive Pairs: Scalable Augmentation with Query Extraction and Generation

12/17/2022
by Rui Meng, et al.

Dense retrievers have made significant strides in obtaining state-of-the-art results on text retrieval and open-domain question answering (ODQA). Yet most of these achievements depended on large annotated datasets, and unsupervised learning for dense retrieval models remains an open problem. In this work, we explore two categories of methods for creating pseudo query-document pairs, named query extraction (QExt) and transferred query generation (TQGen), to augment retriever training in an annotation-free and scalable manner. Specifically, QExt extracts pseudo queries using document structures or by selecting salient random spans, and TQGen uses generation models trained for other NLP tasks (e.g., summarization) to produce pseudo queries. Extensive experiments show that dense retrievers trained with individual augmentation methods perform comparably to multiple strong baselines, and that combining the methods leads to further improvements, achieving state-of-the-art performance for unsupervised dense retrieval on both BEIR and ODQA datasets.
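
To make the random-span variant of QExt concrete, here is a minimal sketch of how pseudo query-document pairs might be built from raw text. It is an illustration under stated assumptions, not the authors' implementation: the function names (`extract_random_span_query`, `build_pairs`), whitespace tokenization, and the span-length bounds are hypothetical choices, and the span is sampled uniformly with no salience scoring, which the abstract mentions but does not specify.

```python
import random


def extract_random_span_query(document, min_len=4, max_len=16, rng=None):
    """Return (pseudo_query, document), where the query is a random
    contiguous span of whitespace-separated words from the document.

    min_len and max_len are illustrative defaults, not values from the paper.
    """
    rng = rng or random.Random()
    tokens = document.split()
    if not tokens:
        raise ValueError("document is empty")
    # Clamp the span length to the document length for short documents.
    low = min(min_len, len(tokens))
    high = min(max_len, len(tokens))
    span_len = rng.randint(low, high)
    start = rng.randint(0, len(tokens) - span_len)
    query = " ".join(tokens[start:start + span_len])
    return query, document


def build_pairs(documents, seed=0):
    """Build one pseudo (query, document) pair per non-blank document."""
    rng = random.Random(seed)
    return [
        extract_random_span_query(doc, rng=rng)
        for doc in documents
        if doc.strip()
    ]


if __name__ == "__main__":
    corpus = [
        "Dense retrievers encode queries and documents into a shared vector space.",
        "BM25 is a sparse lexical baseline.",
    ]
    for query, doc in build_pairs(corpus):
        print(repr(query), "<-", repr(doc))
```

Pairs produced this way could serve as positives for a contrastive training objective. A structure-based variant would instead take a field such as the title as the query, and TQGen would replace the span sampler with a generation model trained on another task.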

research
12/14/2021

Learning to Retrieve Passages without Supervision

Dense retrievers for open-domain question answering (ODQA) have been sho...
research
09/17/2020

Generation-Augmented Retrieval for Open-domain Question Answering

Conventional sparse retrieval methods such as TF-IDF and BM25 are simple...
research
03/15/2022

Augmenting Document Representations for Dense Retrieval with Interpolation and Perturbation

Dense retrieval models, which aim at retrieving the most relevant docume...
research
08/05/2023

Towards Consistency Filtering-Free Unsupervised Learning for Dense Retrieval

Domain transfer is a prevalent challenge in modern neural Information Re...
research
05/10/2023

Unsupervised Dense Retrieval Training with Web Anchors

In this work, we present an unsupervised retrieval method with contrasti...
research
05/04/2022

Analysing the Robustness of Dual Encoders for Dense Retrieval Against Misspellings

Dense retrieval is becoming one of the standard approaches for document ...
research
08/01/2016

Keyphrase Extraction using Sequential Labeling

Keyphrases efficiently summarize a document's content and are used in va...
