Learning to Retrieve Passages without Supervision

12/14/2021
by Ori Ram, et al.

Dense retrievers for open-domain question answering (ODQA) have been shown to achieve impressive performance by training on large datasets of question-passage pairs. We investigate whether dense retrievers can be learned in a self-supervised fashion, and applied effectively without any annotations. We observe that existing pretrained models for retrieval struggle in this scenario, and propose a new pretraining scheme designed for retrieval: recurring span retrieval. We use recurring spans across passages in a document to create pseudo examples for contrastive learning. The resulting model – Spider – performs surprisingly well without any labeled examples on a wide range of ODQA datasets, and is competitive with BM25, a strong sparse baseline. In addition, Spider often outperforms strong baselines like DPR trained on Natural Questions, when evaluated on questions from other datasets. Our hybrid retriever, which combines Spider with BM25, improves over its components across all datasets, and is often competitive with in-domain DPR models, which are trained on tens of thousands of examples.
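
The idea of turning recurring spans into pseudo examples can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`find_recurring_spans`, `build_pseudo_examples`), the whitespace tokenization, the n-gram length limits, and the choice of a single negative passage are all assumptions made for illustration. The sketch finds n-grams that occur in at least two passages of the same document, then pairs one containing passage (the pseudo query) with another containing passage (the positive) and with a passage that lacks the span (the negative).

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Span = Tuple[str, ...]


def find_recurring_spans(
    passages: List[List[str]], min_len: int = 2, max_len: int = 3
) -> Dict[Span, Set[int]]:
    """Return n-grams that occur in at least two different passages.

    Hypothetical helper: the length limits are illustrative assumptions.
    """
    occurrences: Dict[Span, Set[int]] = defaultdict(set)
    for idx, tokens in enumerate(passages):
        for n in range(min_len, max_len + 1):
            for start in range(len(tokens) - n + 1):
                occurrences[tuple(tokens[start:start + n])].add(idx)
    # Keep only spans shared by two or more passages of the document.
    return {span: ids for span, ids in occurrences.items() if len(ids) >= 2}


def build_pseudo_examples(
    passages: List[List[str]], recurring: Dict[Span, Set[int]]
) -> List[dict]:
    """Build (query, positive, negative) passage-index triples per span.

    Hypothetical helper: picking the first two containing passages and the
    first non-containing passage is a simplification for illustration.
    """
    examples = []
    for span, ids in sorted(recurring.items()):
        ordered = sorted(ids)
        negatives = [i for i in range(len(passages)) if i not in ids]
        if not negatives:
            continue  # no passage without the span, so no negative available
        examples.append(
            {
                "span": " ".join(span),
                "query": ordered[0],
                "positive": ordered[1],
                "negative": negatives[0],
            }
        )
    return examples


# Toy document split into three passages (made-up text).
passages = [
    "the eiffel tower was completed in 1889".split(),
    "visitors climb the eiffel tower every day".split(),
    "paris is the capital of france".split(),
]
recurring = find_recurring_spans(passages)
examples = build_pseudo_examples(passages, recurring)
for example in examples:
    print(example)
```

In a contrastive setup, triples like these would be fed to an encoder so that the query passage scores higher against the positive than against the negative; how the query is formed from its passage and how negatives are selected in the actual model is not specified in the abstract.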


