Enhancing the Ranking Context of Dense Retrieval Methods through Reciprocal Nearest Neighbors

05/25/2023
by George Zerveas, et al.

Sparse annotation poses persistent challenges to training dense retrieval models. One such challenge is the problem of false negatives, i.e., unlabeled relevant documents that are spuriously used as negatives in contrastive learning, which distorts the training signal. To alleviate this problem, we introduce evidence-based label smoothing, a computationally efficient method that avoids penalizing the model for assigning high relevance to false negatives. To compute the target relevance distribution over candidate documents within the ranking context of a given query, the candidates most similar to the ground truth are assigned a non-zero relevance probability based on the degree of their similarity to the ground-truth document(s). As a relevance estimate, we leverage an improved similarity metric based on reciprocal nearest neighbors, which can also be used independently to rerank candidates in post-processing. Through extensive experiments on two large-scale ad hoc text retrieval datasets, we demonstrate that both methods can improve the ranking effectiveness of dense retrieval models.
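
The abstract does not spell out the exact formulation, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes that similarity to the ground truth is measured as the Jaccard overlap of k-reciprocal neighbor sets, and that a fixed share of the probability mass (`gt_mass`, a hypothetical parameter) stays on the labeled document while the rest is spread over the other candidates in proportion to that overlap. All function names, the choice of `k`, and the toy embeddings are illustrative.

```python
import numpy as np

def reciprocal_neighbor_mask(sim: np.ndarray, k: int) -> np.ndarray:
    """Boolean matrix: entry (i, j) is True when i and j are in each other's top-k."""
    n = sim.shape[0]
    # Indices of the k most similar items per row (an item counts as its own neighbor).
    topk = np.argsort(-sim, axis=1)[:, :k]
    is_neighbor = np.zeros((n, n), dtype=bool)
    is_neighbor[np.arange(n)[:, None], topk] = True
    # Keep only mutual (reciprocal) neighbor relations.
    return is_neighbor & is_neighbor.T

def jaccard_to_ground_truth(recip: np.ndarray, gt: int) -> np.ndarray:
    """Jaccard overlap between each candidate's reciprocal set and the ground truth's."""
    inter = (recip & recip[gt]).sum(axis=1)
    union = (recip | recip[gt]).sum(axis=1)
    return inter / np.maximum(union, 1)

def smoothed_targets(scores: np.ndarray, gt: int, gt_mass: float = 0.7) -> np.ndarray:
    """Assumed smoothing rule: gt_mass on the label, the rest proportional to scores."""
    evidence = scores.astype(float).copy()
    evidence[gt] = 0.0
    total = evidence.sum()
    if total == 0:
        # No supporting evidence: fall back to a one-hot target.
        target = np.zeros_like(evidence)
        target[gt] = 1.0
        return target
    target = (1.0 - gt_mass) * evidence / total
    target[gt] = gt_mass
    return target

# Toy ranking context: six candidates forming two tight clusters of unit vectors.
angles = np.deg2rad([0, 5, 10, 90, 95, 100])
emb = np.stack([np.cos(angles), np.sin(angles)], axis=1)
sim = emb @ emb.T

recip = reciprocal_neighbor_mask(sim, k=3)
jac = jaccard_to_ground_truth(recip, gt=0)   # candidate 0 is the labeled document
targets = smoothed_targets(jac, gt=0)
print(targets)
```

In this toy setup, candidates 1 and 2 share the ground truth's reciprocal neighborhood and so receive part of the target mass, while the unrelated cluster remains at zero. The same `jac` scores could in principle be used to rerank candidates in post-processing, which is the second use the abstract mentions.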

research
12/15/2020

Weakly Supervised Label Smoothing

We study Label Smoothing (LS), a widely used regularization technique, i...
research
04/17/2021

Co-BERT: A Context-Aware BERT Retrieval Model Incorporating Local and Query-specific Context

BERT-based text ranking models have dramatically advanced the state-of-t...
research
01/14/2022

Progressively Optimized Bi-Granular Document Representation for Scalable Embedding Based Retrieval

Ad-hoc search calls for the selection of appropriate answers from a mass...
research
08/19/2019

Relevance Proximity Graphs for Fast Relevance Retrieval

In plenty of machine learning applications, the most relevant items for ...
research
04/01/2022

Distill-VQ: Learning Retrieval Oriented Vector Quantization By Distilling Knowledge from Dense Embeddings

Vector quantization (VQ) based ANN indexes, such as Inverted File System...
research
09/01/2022

Isotropic Representation Can Improve Dense Retrieval

The recent advancement in language representation modeling has broadly a...
research
09/03/2019

Finding Salient Context based on Semantic Matching for Relevance Ranking

In this paper, we propose a salient-context based semantic matching meth...
