Questions Are All You Need to Train a Dense Passage Retriever

06/21/2022
by Devendra Singh Sachan, et al.

We introduce ART, a new corpus-level autoencoding approach for training dense retrieval models that does not require any labeled training data. Dense retrieval is a central challenge for open-domain tasks, such as Open QA, where state-of-the-art methods typically require large supervised datasets with custom hard-negative mining and denoising of positive examples. ART, in contrast, only requires access to unpaired inputs and outputs (e.g. questions and potential answer documents). It uses a new document-retrieval autoencoding scheme, where (1) an input question is used to retrieve a set of evidence documents, and (2) the documents are then used to compute the probability of reconstructing the original question. Training for retrieval based on question reconstruction enables effective unsupervised learning of both document and question encoders, which can be later incorporated into complete Open QA systems without any further finetuning. Extensive experiments demonstrate that ART obtains state-of-the-art results on multiple QA retrieval benchmarks with only generic initialization from a pre-trained language model, removing the need for labeled data and task-specific losses.
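The corpus-level autoencoding objective described above can be illustrated with a short, hedged sketch. The code below is not the authors' implementation: it uses toy encoders, a hypothetical frozen scorer (fake_recon_log_likelihood) standing in for the pre-trained language model that computes the question-reconstruction likelihood, and a KL-divergence loss that, under these assumptions, pushes the retriever's passage distribution toward the distribution implied by those reconstruction likelihoods.

import torch
import torch.nn.functional as F
from torch import nn

class ToyEncoder(nn.Module):
    """Stand-in for a BERT-style dual-encoder tower; maps token ids to one vector."""
    def __init__(self, vocab=1000, dim=128):
        super().__init__()
        self.emb = nn.EmbeddingBag(vocab, dim)

    def forward(self, ids):
        return self.emb(ids)

def art_style_step(q_ids, passage_ids, q_enc, p_enc, recon_log_likelihood, optimizer, tau=1.0):
    """One sketch of an ART-style update:
    1. score the retrieved passages with the question/passage encoders,
    2. score question reconstruction with a frozen scorer (no gradients),
    3. minimize KL between the teacher distribution and the retriever distribution."""
    q = q_enc(q_ids)                                   # [1, d]
    p = p_enc(passage_ids)                             # [k, d]
    retriever_logits = (q @ p.t()).squeeze(0) / tau    # [k] dot-product scores
    with torch.no_grad():
        teacher_logp = recon_log_likelihood(q_ids, passage_ids)   # [k] log p(question | passage)
        teacher = F.softmax(teacher_logp, dim=-1)                  # soft labels from reconstruction
    loss = F.kl_div(F.log_softmax(retriever_logits, dim=-1), teacher, reduction="sum")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def fake_recon_log_likelihood(q_ids, passage_ids):
    # Placeholder only: random scores standing in for a frozen pre-trained LM's
    # log-likelihood of reconstructing the question from each passage.
    return torch.randn(passage_ids.size(0))

# Toy usage: one question, 8 "retrieved" passages of random token ids.
q_enc, p_enc = ToyEncoder(), ToyEncoder()
opt = torch.optim.Adam(list(q_enc.parameters()) + list(p_enc.parameters()), lr=1e-4)
q_ids = torch.randint(0, 1000, (1, 12))
passages = torch.randint(0, 1000, (8, 100))
print(art_style_step(q_ids, passages, q_enc, p_enc, fake_recon_log_likelihood, opt))

Because the teacher signal comes only from how well each passage supports reconstructing the question, no labeled question-passage pairs are needed; only the encoders receive gradients, consistent with the abstract's description of a frozen pre-trained language model providing the training signal.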


Related research

01/03/2023 · PIE-QG: Paraphrased Information Extraction for Unsupervised Question Generation from Small Corpora
Supervised Question Answering systems (QA systems) rely on domain-specif...

10/28/2021 · Dense Hierarchical Retrieval for Open-Domain Question Answering
Dense neural text retrieval has achieved promising results on open-domai...

02/15/2022 · Saving Dense Retriever from Shortcut Dependency in Conversational Search
In conversational search (CS), it needs holistic understanding over conv...

12/16/2022 · Self-Prompting Large Language Models for Open-Domain QA
Open-Domain Question Answering (ODQA) requires models to answer factoid ...

03/22/2021 · Mitigating False-Negative Contexts in Multi-document Question Answering with Retrieval Marginalization
Question Answering (QA) tasks requiring information from multiple docume...

05/19/2022 · Two-Step Question Retrieval for Open-Domain QA
The retriever-reader pipeline has shown promising performance in open-do...

06/18/2021 · Weakly Supervised Pre-Training for Multi-Hop Retriever
In multi-hop QA, answering complex questions entails iterative document ...
