Domain-matched Pre-training Tasks for Dense Retrieval

07/28/2021
by Barlas Oguz, et al.

Pre-training on larger datasets with ever-increasing model sizes is now a proven recipe for improving performance across almost all NLP tasks. A notable exception is information retrieval, where additional pre-training has so far failed to produce convincing results. We show that, with the right pre-training setup, this barrier can be overcome. We demonstrate this by pre-training large bi-encoder models on 1) a recently released set of 65 million synthetically generated questions, and 2) 200 million post-comment pairs from a preexisting dataset of Reddit conversations made available by pushshift.io. We evaluate on a set of information retrieval and dialogue retrieval benchmarks, showing substantial improvements over supervised baselines.
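
The abstract does not spell out how a bi-encoder is scored or trained, so the following is only a minimal sketch under stated assumptions. It assumes queries and passages have already been encoded into fixed-size vectors, that relevance is a dot product, and that training uses a contrastive loss with in-batch negatives (a common choice for dense retrieval, not confirmed by this abstract). All function names and the toy vectors are hypothetical.

```python
import math


def dot(u, v):
    """Dot-product relevance score between two embedding vectors."""
    return sum(a * b for a, b in zip(u, v))


def in_batch_contrastive_loss(queries, passages):
    """Mean cross-entropy over a batch of (query, passage) pairs.

    Assumption: passages[i] is the positive for queries[i], and every
    other passage in the batch acts as a negative for that query.
    """
    total = 0.0
    for i, q in enumerate(queries):
        row = [dot(q, p) for p in passages]
        # Log-sum-exp with the max subtracted for numerical stability.
        m = max(row)
        log_z = m + math.log(sum(math.exp(s - m) for s in row))
        total += log_z - row[i]
    return total / len(queries)


def rank(query, passages):
    """Return passage indices sorted from most to least relevant."""
    scores = [dot(query, p) for p in passages]
    return sorted(range(len(passages)), key=lambda j: scores[j], reverse=True)


# Toy embeddings standing in for encoder outputs (hypothetical values).
queries = [[1.0, 0.0], [0.0, 1.0]]
passages = [[2.0, 0.0], [0.0, 2.0]]

aligned_loss = in_batch_contrastive_loss(queries, passages)
# Swapping the passages pairs each query with the wrong positive.
misaligned_loss = in_batch_contrastive_loss(queries, passages[::-1])
top_for_first_query = rank(queries[0], passages)
```

In this toy batch the correctly paired embeddings give a lower loss than the swapped pairing, which is the signal a pre-training run of this kind would push the two encoders toward.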


