Repeatability Corner Cases in Document Ranking: The Impact of Score Ties

07/16/2018
by Jimmy Lin, et al.

Document ranking experiments should be repeatable: running the same ranking model over the same collection with the same queries should yield exactly the same output. However, when different documents receive the same score, the resulting rankings may be non-deterministic, which makes repeatability less trivial than one might imagine. In the context of our work with the open-source Lucene search engine, score ties are broken by internal document ids, which are assigned at index time. Because of multi-threaded indexing, which makes experimentation with large modern document collections practical, internal document ids are not assigned consistently across different index instances of the same collection, and thus score ties are broken unpredictably. This short paper examines the effectiveness impact of such score ties, quantifying the variability that can be attributed to this phenomenon. The obvious way to eliminate this non-determinism and ensure repeatable document ranking is to break score ties using external collection document ids. This approach, however, comes with measurable efficiency costs, because external identifiers must be consulted in the inner loop of query evaluation.
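
The tie-breaking problem can be illustrated with a small, self-contained sketch. It is not Lucene code; it simply sorts scored documents in Python. It assumes a toy collection of four documents with hypothetical external ids (doc-A to doc-D), two hypothetical index instances whose internal ids differ only in the order of the two tied documents, a single hypothetical relevant document, and, as an assumption, that the lower internal id ranks first on a tie. The helper names `rank` and `average_precision` are illustrative only.

```python
from typing import Dict, List, Set


def rank(scores: Dict[str, float], internal_ids: Dict[str, int], tie_break: str) -> List[str]:
    """Order external doc ids by descending score, breaking ties by the chosen key."""
    if tie_break == "internal":
        # Assumption: on equal scores the lower internal (index-time) id ranks first.
        return sorted(scores, key=lambda d: (-scores[d], internal_ids[d]))
    if tie_break == "external":
        # Deterministic: ties broken by the external collection doc id itself.
        return sorted(scores, key=lambda d: (-scores[d], d))
    raise ValueError(f"unknown tie_break: {tie_break}")


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Standard (uninterpolated) average precision for one query."""
    hits, total = 0, 0.0
    for position, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / position
    return total / len(relevant) if relevant else 0.0


# Hypothetical scores: doc-B and doc-C are tied.
scores = {"doc-A": 2.5, "doc-B": 1.7, "doc-C": 1.7, "doc-D": 0.9}
relevant = {"doc-C"}

# Two hypothetical index instances of the same collection; multi-threaded
# indexing happened to assign the tied documents' internal ids in different orders.
index_1 = {"doc-A": 0, "doc-B": 1, "doc-C": 2, "doc-D": 3}
index_2 = {"doc-A": 0, "doc-C": 1, "doc-B": 2, "doc-D": 3}

for name, ids in (("index_1", index_1), ("index_2", index_2)):
    for mode in ("internal", "external"):
        ranking = rank(scores, ids, mode)
        print(name, mode, ranking, round(average_precision(ranking, relevant), 4))
```

With internal-id tie-breaking, the two index instances produce different rankings and different average precision (1/3 versus 1/2 for the single relevant document doc-C), even though every score is identical; breaking ties on the external id yields the same ranking for both. In the sketch the external id is free because it is already the dictionary key, whereas in a real engine each tied candidate's external identifier has to be looked up during query evaluation, which is where the efficiency cost described above comes from.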

research
05/09/2022

Long Document Re-ranking with Modular Re-ranker

Long document re-ranking has been a challenging problem for neural re-ra...
research
02/04/2020

Interpretable Time-Budget-Constrained Contextualization for Re-Ranking

Search engines operate under a strict time constraint as a fast response...
research
04/18/2021

Anytime Ranking on Document-Ordered Indexes

Inverted indexes continue to be a mainstay of text search engines, allow...
research
04/29/2020

Efficient Document Re-Ranking for Transformers by Precomputing Term Representations

Deep pretrained transformer networks are effective at various ranking ta...
research
11/25/2017

Neural Ranking Models with Multiple Document Fields

Deep neural networks have recently shown promise in the ad-hoc retrieval...
research
12/26/2019

On the Reproducibility of Experiments of Indexing Repetitive Document Collections

This work introduces a companion reproducible paper with the aim of allo...
research
05/07/2023

Empowering Language Model with Guided Knowledge Fusion for Biomedical Document Re-ranking

Pre-trained language models (PLMs) have proven to be effective for docum...