Towards Unsupervised Dense Information Retrieval with Contrastive Learning

12/16/2021
by Gautier Izacard, et al.

Information retrieval is an important component of natural language processing for knowledge-intensive tasks such as question answering and fact checking. Recently, information retrieval has seen the emergence of dense retrievers, based on neural networks, as an alternative to classical sparse methods based on term frequency. These models have obtained state-of-the-art results on datasets and tasks where large training sets are available. However, they do not transfer well to new domains or applications with no training data, and are often outperformed by unsupervised term-frequency methods such as BM25. A natural question is therefore whether it is possible to train dense retrievers without supervision. In this work, we explore the limits of contrastive learning as a way to train unsupervised dense retrievers and show that it leads to strong retrieval performance. More precisely, we show on the BEIR benchmark that our model outperforms BM25 on 11 out of 15 datasets. Furthermore, when a few thousand examples are available, we show that fine-tuning our model on them leads to strong improvements over BM25. Finally, when used as pre-training before fine-tuning on the MS-MARCO dataset, our technique obtains state-of-the-art results on the BEIR benchmark.
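The abstract describes training a dense retriever with contrastive learning. Below is a minimal sketch of the kind of in-batch InfoNCE objective commonly used for a bi-encoder retriever; the encoder, temperature value, and positive-pair construction (e.g. two random spans of the same document) are illustrative assumptions, not the paper's exact recipe.

```python
# Sketch of an InfoNCE-style contrastive loss for a bi-encoder dense retriever.
# Assumptions (not from the paper): cosine similarity, temperature 0.05,
# positives built from two views of the same passage, in-batch negatives.
import torch
import torch.nn.functional as F

def contrastive_loss(query_emb: torch.Tensor,
                     doc_emb: torch.Tensor,
                     temperature: float = 0.05) -> torch.Tensor:
    """query_emb, doc_emb: (batch, dim) embeddings of paired views.
    The i-th document is the positive for the i-th query; all other
    in-batch documents act as negatives."""
    q = F.normalize(query_emb, dim=-1)
    d = F.normalize(doc_emb, dim=-1)
    logits = q @ d.T / temperature                      # (batch, batch) similarity scores
    targets = torch.arange(q.size(0), device=q.device)  # diagonal entries are positives
    return F.cross_entropy(logits, targets)

if __name__ == "__main__":
    # Stand-ins for encoder outputs of two views of the same passages.
    q = torch.randn(8, 128)
    d = torch.randn(8, 128)
    print(contrastive_loss(q, d).item())
```

In practice the two views would be produced by the same encoder from unlabeled text, so the loss pulls matching views together and pushes apart the other passages in the batch.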

Related research

08/12/2021 – Unsupervised Corpus Aware Language Model Pre-training for Dense Passage Retrieval
Recent research demonstrates the effectiveness of using fine-tuned langu...

12/08/2020 – Distilling Knowledge from Reader to Retriever for Question Answering
The task of information retrieval is an important component of many natu...

06/21/2023 – Resources and Evaluations for Multi-Distribution Dense Information Retrieval
We introduce and define the novel problem of multi-distribution informat...

07/28/2021 – Domain-matched Pre-training Tasks for Dense Retrieval
Pre-training on larger datasets with ever increasing model size is now a...

11/17/2022 – Data-Efficient Autoregressive Document Retrieval for Fact Verification
Document retrieval is a core component of many knowledge-intensive natur...

03/23/2023 – Parameter-Efficient Sparse Retrievers and Rerankers using Adapters
Parameter-Efficient transfer learning with Adapters have been studied in...

01/08/2018 – Web2Text: Deep Structured Boilerplate Removal
Web pages are a valuable source of information for many natural language...
