LaPraDoR: Unsupervised Pretrained Dense Retriever for Zero-Shot Text Retrieval

03/11/2022
by   Canwen Xu, et al.

In this paper, we propose LaPraDoR, a pretrained dual-tower dense retriever that requires no supervised data for training. Specifically, we first present Iterative Contrastive Learning (ICoL), which iteratively trains the query and document encoders with a cache mechanism. ICoL not only enlarges the number of negative instances but also keeps the representations of cached examples in the same hidden space. We then propose Lexicon-Enhanced Dense Retrieval (LEDR), a simple yet effective way to enhance dense retrieval with lexical matching. We evaluate LaPraDoR on the recently proposed BEIR benchmark, which comprises 18 datasets covering 9 zero-shot text retrieval tasks. Experimental results show that LaPraDoR achieves state-of-the-art performance compared with supervised dense retrieval models, and further analysis confirms the effectiveness of our training strategy and objectives. Compared to re-ranking, our lexicon-enhanced approach runs in milliseconds (22.5x faster) while achieving superior performance.
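
The following is a minimal PyTorch sketch, not the authors' released code, of how the two components described above could fit together. The function names `icol_step` and `ledr_score`, the FIFO cache update, and the multiplicative fusion of the BM25 and dense scores are illustrative assumptions inferred from the abstract.

```python
# Minimal sketch of ICoL and LEDR as described in the abstract.
# Assumptions (not from the source): a MoCo-style FIFO cache of document
# embeddings, and a multiplicative BM25-dense score fusion for LEDR.
import torch
import torch.nn.functional as F


def icol_step(query_encoder, doc_encoder, queries, docs, cache,
              cache_size=4096, tau=0.05):
    """One iterative contrastive step: train the query tower while the
    document tower is frozen, so cached document embeddings remain in the
    same hidden space as freshly encoded ones. Encoders are callables
    that map a batch of inputs to (B, H) embeddings."""
    q = F.normalize(query_encoder(queries), dim=-1)       # (B, H), with gradients
    with torch.no_grad():                                 # frozen tower
        d = F.normalize(doc_encoder(docs), dim=-1)        # (B, H)
    negatives = torch.cat([d, cache], dim=0)              # in-batch + cached negatives
    logits = q @ negatives.t() / tau                      # (B, B + |cache|)
    labels = torch.arange(q.size(0), device=q.device)     # positive for query i is doc i
    loss = F.cross_entropy(logits, labels)
    cache = torch.cat([d, cache], dim=0)[:cache_size]     # FIFO cache update
    return loss, cache


def ledr_score(bm25_score, q_emb, d_emb):
    """Lexicon-enhanced dense retrieval: fuse the lexical (BM25) score with
    the dense similarity at ranking time, avoiding a separate re-ranking
    pass. The multiplicative fusion here is one plausible instantiation."""
    return bm25_score * (q_emb @ d_emb)
```

In the iterative scheme the two towers would swap roles between iterations (query tower trained while the document tower is frozen, then vice versa), so the cache always holds embeddings produced by a frozen encoder; this is one reading of how cached representations stay in the same hidden space.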


Related research

05/24/2023 · PESCO: Prompt-enhanced Self Contrastive Learning for Zero-shot Text Classification
We present PESCO, a novel contrastive learning framework that substantia...

11/03/2022 · Zero-shot Video Moment Retrieval With Off-the-Shelf Models
For the majority of the machine learning community, the expensive nature...

06/13/2023 · Resources for Brewing BEIR: Reproducible Reference Models and an Official Leaderboard
BEIR is a benchmark dataset for zero-shot evaluation of information retr...

12/20/2022 · Precise Zero-Shot Dense Retrieval without Relevance Labels
While dense retrieval has been shown effective and efficient across task...

06/20/2022 · A Dense Representation Framework for Lexical and Semantic Matching
Lexical and semantic matching capture different successful approaches to...

02/15/2023 · How to Train Your DRAGON: Diverse Augmentation Towards Generalizable Dense Retrieval
Various techniques have been developed in recent years to improve dense ...

01/14/2021 · The Expando-Mono-Duo Design Pattern for Text Ranking with Pretrained Sequence-to-Sequence Models
We propose a design pattern for tackling text ranking problems, dubbed "...
