Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval

07/01/2020
by   Lee Xiong, et al.
0

Conducting text retrieval in a dense learned representation space has many intriguing advantages over sparse retrieval. Yet the effectiveness of dense retrieval (DR) often requires combination with sparse retrieval. In this paper, we identify that the main bottleneck is in the training mechanisms, where the negative instances used in training are not representative of the irrelevant documents in testing. This paper presents Approximate nearest neighbor Negative Contrastive Estimation (ANCE), a training mechanism that constructs negatives from an Approximate Nearest Neighbor (ANN) index of the corpus, which is parallelly updated with the learning process to select more realistic negative training instances. This fundamentally resolves the discrepancy between the data distribution used in the training and testing of DR. In our experiments, ANCE boosts the BERT-Siamese DR model to outperform all competitive dense and sparse retrieval baselines. It nearly matches the accuracy of sparse-retrieval-and-BERT-reranking using dot-product in the ANCE-learned representation space and provides almost 100x speed-up.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
04/24/2023

Constructing Tree-based Index for Efficient and Effective Dense Retrieval

Recent studies have shown that Dense Retrieval (DR) techniques can signi...
research
10/20/2020

Learning To Retrieve: How to Train a Dense Retrieval Model Effectively and Efficiently

Ranking has always been one of the top concerns in information retrieval...
research
10/14/2021

Zero-Shot Dense Retrieval with Momentum Adversarial Domain Invariant Representations

Dense retrieval (DR) methods conduct text retrieval by first encoding te...
research
06/26/2022

Are We There Yet? A Decision Framework for Replacing Term Based Retrieval with Dense Retrieval Systems

Recently, several dense retrieval (DR) models have demonstrated competit...
research
04/14/2021

Efficiently Teaching an Effective Dense Retriever with Balanced Topic Aware Sampling

A vital step towards the widespread adoption of neural retrieval models ...
research
11/19/2018

End-to-End Retrieval in Continuous Space

Most text-based information retrieval (IR) systems index objects by word...
research
03/27/2023

Improving Dual-Encoder Training through Dynamic Indexes for Negative Mining

Dual encoder models are ubiquitous in modern classification and retrieva...

Please sign up or login with your details

Forgot password? Click here to reset