On the Interpolation of Contextualized Term-based Ranking with BM25 for Query-by-Example Retrieval

10/11/2022
by Amin Abolghasemi, et al.

Term-based ranking with pre-trained transformer-based language models has recently gained attention, as these models bring the contextualization power of transformers into highly efficient term-based retrieval. In this work, we examine the generalizability of two such deep contextualized term-based models in the context of query-by-example (QBE) retrieval, in which a seed document acts as the query for finding relevant documents. In this setting, where queries are much longer than common keyword queries, BERT inference at query time is problematic because self-attention scales quadratically with query length. We investigate TILDE and TILDEv2, both of which use the BERT tokenizer as their query encoder. With this approach, no BERT inference is needed at query time, and the query can be of any length. Our extensive evaluation on the four QBE tasks of the SciDocs benchmark shows that, in a query-by-example retrieval setting, TILDE and TILDEv2 are still less effective than a cross-encoder BERT ranker. However, we observe that BM25 shows competitive ranking quality compared to TILDE and TILDEv2, which contrasts with the relative performance of these three models on short-query retrieval reported in prior work. This result raises the question of whether contextualized term-based ranking models are beneficial in a QBE setting. We follow up on our findings by studying the interpolation of the relevance scores of TILDE (TILDEv2) and BM25. We conclude that these two contextualized term-based ranking models capture relevance signals different from those of BM25, and that combining the different term-based rankers yields statistically significant improvements in QBE retrieval. Our work sheds light on the challenges of retrieval settings that differ from common evaluation benchmarks.
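
As a minimal illustration of the tokenizer-as-query-encoder idea described above, the following Python sketch shows how a TILDE-style model could score a document at query time. The document token weights would be precomputed offline by a BERT-based document encoder; the function name and the `doc_token_weights` layout are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch of query-time scoring with a tokenizer-only query
# encoder, in the spirit of TILDE/TILDEv2. The weight layout and the
# scoring function below are illustrative assumptions.
from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

def score(query: str, doc_token_weights: dict) -> float:
    """Score one document against a (possibly very long) QBE query.

    `doc_token_weights` maps BERT token ids to contextualized weights
    precomputed offline by a BERT-based document encoder.
    """
    # The tokenizer is the entire query encoder: no BERT forward pass,
    # so cost grows linearly with query length instead of quadratically.
    query_ids = tokenizer(query, add_special_tokens=False)["input_ids"]
    # Exact-term matching: sum the precomputed weights of query tokens
    # that occur in the document's (expanded) term set.
    return sum(doc_token_weights.get(tid, 0.0) for tid in query_ids)
```

Because no transformer forward pass is run over the query, a whole seed document can serve as the query at negligible encoding cost.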
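The score interpolation studied in the paper can likewise be sketched in a few lines. Per-query min-max normalization and the mixing weight alpha = 0.5 are assumptions here; the paper may normalize and tune these differently.

```python
# A minimal sketch of interpolating BM25 with a contextualized
# term-based ranker's scores for the candidates of a single query.
def min_max(scores):
    # Rescale a query's candidate scores to [0, 1] so the two
    # rankers' score ranges are comparable before mixing.
    lo, hi = min(scores), max(scores)
    return [0.0] * len(scores) if hi == lo else [(s - lo) / (hi - lo) for s in scores]

def interpolate(bm25_scores, tilde_scores, alpha=0.5):
    # Linear combination of the two term-based relevance signals.
    b, t = min_max(bm25_scores), min_max(tilde_scores)
    return [alpha * sb + (1 - alpha) * st for sb, st in zip(b, t)]

# Example: fuse the scores of three candidates for one QBE query.
fused = interpolate([12.3, 8.1, 10.4], [0.92, 0.88, 0.95])
```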

Related research

07/20/2020 · Conformer-Kernel with Query Term Independence for Document Retrieval
The Transformer-Kernel (TK) model has demonstrated strong reranking perf...

01/23/2023 · Injecting the BM25 Score as Text Improves BERT-Based Re-rankers
In this paper we propose a novel approach for combining first-stage lexi...

03/02/2023 · Retrieval for Extremely Long Queries and Documents with RPRS: a Highly Efficient and Effective Transformer-based Re-Ranker
Retrieval with extremely long queries and documents is a well-known and ...

04/25/2022 · Groupwise Query Performance Prediction with BERT
While large-scale pre-trained language models like BERT have advanced th...

08/19/2021 · Fast Passage Re-ranking with Contextualized Exact Term Matching and Efficient Passage Expansion
BERT-based information retrieval models are expensive, in both time (que...

01/21/2022 · Less is Less: When Are Snippets Insufficient for Human vs Machine Relevance Estimation?
Traditional information retrieval (IR) ranking models process the full t...

04/28/2020 · EARL: Speedup Transformer-based Rankers with Pre-computed Representation
Recent innovations in Transformer-based ranking models have advanced the...
