How to Train Your DRAGON: Diverse Augmentation Towards Generalizable Dense Retrieval

02/15/2023
by Sheng-Chieh Lin, et al.

Various techniques have been developed in recent years to improve dense retrieval (DR), such as unsupervised contrastive learning and pseudo-query generation. Existing DRs, however, often suffer from an effectiveness tradeoff between supervised and zero-shot retrieval, which some argue is due to limited model capacity. We contradict this hypothesis and show that a generalizable DR can be trained to achieve high accuracy in both supervised and zero-shot retrieval without increasing model size. In particular, we systematically examine the contrastive learning of DRs under the framework of Data Augmentation (DA). Our study shows that common DA practices, such as query augmentation with generative models and pseudo-relevance label creation using a cross-encoder, are often inefficient and sub-optimal. We hence propose a new DA approach with diverse queries and sources of supervision to progressively train a generalizable DR. As a result, DRAGON, our dense retriever trained with diverse augmentation, is the first BERT-base-sized DR to achieve state-of-the-art effectiveness in both supervised and zero-shot evaluations, and it even competes with models using more complex late interaction (ColBERTv2 and SPLADE++).
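As a concrete illustration of the contrastive-learning setup the abstract refers to, the sketch below shows a standard InfoNCE loss with in-batch negatives for a dual-encoder dense retriever; the (query, positive passage) pairs it consumes could, in principle, come from any of the diverse supervision sources mentioned above (human labels, generated pseudo-queries, or cross-encoder pseudo-relevance labels). This is a minimal, hypothetical sketch, not DRAGON's released training code; the function name, batch size, and embedding dimension are illustrative assumptions.

import torch
import torch.nn.functional as F

def in_batch_contrastive_loss(query_emb, passage_emb, temperature=0.05):
    # query_emb:   [B, D] embeddings of (possibly augmented) queries
    # passage_emb: [B, D] embeddings of their positive passages; every other
    #              passage in the batch serves as an in-batch negative.
    scores = query_emb @ passage_emb.T / temperature               # [B, B] similarity matrix
    targets = torch.arange(scores.size(0), device=scores.device)  # passage i is the positive for query i
    return F.cross_entropy(scores, targets)

# Toy usage with random tensors standing in for encoder outputs.
B, D = 8, 768
queries = F.normalize(torch.randn(B, D), dim=-1)
passages = F.normalize(torch.randn(B, D), dim=-1)
print(in_batch_contrastive_loss(queries, passages).item())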


Related research

10/27/2022 · COCO-DR: Combating Distribution Shifts in Zero-Shot Dense Retrieval with Contrastive and Distributionally Robust Learning
We present a new zero-shot dense retrieval (ZeroDR) method, COCO-DR, to ...

07/02/2021 · Supervised Contrastive Learning for Accented Speech Recognition
Neural network based speech recognition systems suffer from performance ...

07/17/2023 · Soft Prompt Tuning for Augmenting Dense Retrieval with Large Language Models
Dense retrieval (DR) converts queries and documents into dense embedding...

10/14/2021 · Zero-Shot Dense Retrieval with Momentum Adversarial Domain Invariant Representations
Dense retrieval (DR) methods conduct text retrieval by first encoding te...

03/11/2022 · LaPraDoR: Unsupervised Pretrained Dense Retriever for Zero-Shot Text Retrieval
In this paper, we propose LaPraDoR, a pretrained dual-tower dense retrie...

02/07/2023 · Augmenting Zero-Shot Dense Retrievers with Plug-in Mixture-of-Memories
In this paper we improve the zero-shot generalization ability of languag...

04/12/2023 · Rethinking Dense Retrieval's Few-Shot Ability
Few-shot dense retrieval (DR) aims to effectively generalize to novel se...
