To Interpolate or not to Interpolate: PRF, Dense and Sparse Retrievers

04/30/2022
by Hang Li et al.

Current pre-trained language model approaches to information retrieval can be broadly divided into two categories: sparse retrievers (a category that also includes non-neural bag-of-words methods such as BM25) and dense retrievers. Each category appears to capture different characteristics of relevance. Previous work has investigated combining the relevance signals from sparse and dense retrievers via interpolation, and such interpolation generally leads to higher retrieval effectiveness. In this paper we consider the problem of combining sparse and dense relevance signals in the context of Pseudo Relevance Feedback (PRF). This context poses two key questions: (1) when should interpolation occur: before, after, or both before and after the PRF process? (2) which sparse representation should be considered: a zero-shot bag-of-words model (BM25), or a learnt sparse representation? To answer these questions we perform a thorough empirical evaluation using an effective and scalable neural PRF approach (Vector-PRF), three effective dense retrievers (ANCE, TCTv2, DistilBERT), and one state-of-the-art learnt sparse retriever (uniCOIL). Our empirical findings suggest that, regardless of the sparse representation and dense retriever used, interpolating both before and after PRF achieves the highest effectiveness across most datasets and metrics.
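To make the setup concrete, below is a minimal Python sketch of the two ingredients studied here: linear score interpolation between a sparse and a dense run, and the Average variant of Vector-PRF, wired together in the "both before and after" configuration. The function names, the min-max score normalisation, the default alpha=0.5, and the placeholder inputs are all illustrative assumptions, not the paper's implementation or its tuned parameters.

import numpy as np

def interpolate(sparse_scores, dense_scores, alpha=0.5):
    """Linearly fuse sparse (e.g. BM25 or uniCOIL) and dense retrieval scores.

    Inputs are dicts mapping doc_id -> score; each run is min-max normalised
    first so the two signals live on a comparable scale (an assumption:
    interpolation can also be applied to raw scores).
    """
    def minmax(scores):
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0
        return {doc: (s - lo) / span for doc, s in scores.items()}

    s, d = minmax(sparse_scores), minmax(dense_scores)
    return {doc: alpha * s.get(doc, 0.0) + (1.0 - alpha) * d.get(doc, 0.0)
            for doc in set(s) | set(d)}

def vector_prf_average(query_emb, doc_embs, ranking, k=3):
    """Vector-PRF (Average variant): the new query embedding is the mean of
    the original query embedding and the top-k feedback document embeddings."""
    feedback = [doc_embs[doc_id] for doc_id, _ in ranking[:k]]
    return np.mean([query_emb] + feedback, axis=0)

def search_before_and_after(bm25_scores, dense_scores, q_emb, doc_embs, k=3):
    """The "both before and after" configuration, end to end:
    (1) interpolate the first-stage sparse and dense runs (before PRF),
    (2) take the top-k of the fused ranking as the PRF signal,
    (3) re-score documents against the PRF-modified query embedding,
    (4) interpolate the PRF run with the sparse run again (after PRF).
    bm25_scores, dense_scores, q_emb and doc_embs are hypothetical
    placeholders assumed to come from a prior retrieval/encoding step."""
    fused = interpolate(bm25_scores, dense_scores)              # before PRF
    ranking = sorted(fused.items(), key=lambda kv: -kv[1])
    q_prf = vector_prf_average(q_emb, doc_embs, ranking, k=k)
    prf_scores = {doc: float(emb @ q_prf) for doc, emb in doc_embs.items()}
    return interpolate(bm25_scores, prf_scores)                 # after PRF

The "before" interpolation changes which documents are selected as feedback; the "after" interpolation re-injects the exact-match sparse signal into the final ranking. The paper's finding is that applying both steps is the most effective configuration on most datasets and metrics.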


Related research

08/25/2021
Pseudo Relevance Feedback with Deep Language Models and Dense Retrievers: Successes and Pitfalls
Pseudo Relevance Feedback (PRF) is known to improve the effectiveness of...

05/12/2022
How does Feedback Signal Quality Impact Effectiveness of Pseudo Relevance Feedback for Passage Retrieval?
Pseudo-Relevance Feedback (PRF) assumes that the top results retrieved b...

12/13/2021
Improving Query Representations for Dense Retrieval with Pseudo Relevance Feedback: A Reproducibility Study
Pseudo-Relevance Feedback (PRF) utilises the relevance signals from the ...

12/20/2022
Precise Zero-Shot Dense Retrieval without Relevance Labels
While dense retrieval has been shown effective and efficient across task...

08/20/2023
Offline Pseudo Relevance Feedback for Efficient and Effective Single-pass Dense Retrieval
Dense retrieval has made significant advancements in information retriev...

01/12/2015
Combined modeling of sparse and dense noise for improvement of Relevance Vector Machine
Using a Bayesian approach, we consider the problem of recovering sparse ...

05/10/2022
From Distillation to Hard Negative Sampling: Making Sparse Neural IR Models More Effective
Neural retrievers based on dense representations combined with Approxima...
