Stat-weight: Improving the Estimator of Interleaved Methods Outcomes with Statistical Hypothesis Testing

03/17/2023
by   Alessandro Benedetti, et al.

Interleaving is an online evaluation approach for information retrieval systems that compares the effectiveness of ranking functions by interpreting users' implicit feedback. Previous work, such as Hofmann et al. (2011), evaluated the most promising interleaved methods of the time on uniform distributions of queries. In the real world, repeated queries usually follow an unbalanced distribution shaped by the long tail of the users' search demand curve. The more often a query is executed, by different users or in different sessions, the higher the probability of collecting implicit feedback (interactions/clicks) on the related search results. This paper first aims to replicate the Team Draft Interleaving accuracy evaluation on uniform query distributions and then assesses how well this method generalizes to long-tailed real-world scenarios. The reproducibility work raised interesting considerations on how the winning ranking function for each query should affect the overall winner of the entire evaluation. Based on these observations, we propose that queries should not all contribute to the final decision in equal proportion. Following these insights, we designed two variations of the Δ_AB score winner estimator that assign each query a credit based on statistical hypothesis testing. To replicate, reproduce, and extend the original work, we developed from scratch a system that simulates a search engine and users' interactions using industry datasets. Our experiments confirm our intuition and show that our methods are promising in terms of accuracy, sensitivity, and robustness to noise.
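To make the idea of a per-query credit concrete, the following is a minimal sketch and not the paper's implementation. It assumes that clicks are aggregated per query into counts for ranker A and ranker B, that the hypothesis test is an exact two-sided binomial test against p = 0.5, and that the credit is 1 minus the p-value; the abstract specifies none of these choices. The names `binomial_two_sided_p` and `delta_ab` and the example queries are hypothetical. The Δ_AB formula used is the common form (wins(A) + 0.5 · ties) / (wins(A) + wins(B) + ties) − 0.5.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value: total probability of all
    outcomes that are no more likely than the observed count k."""
    if n == 0:
        return 1.0  # no clicks, no evidence either way
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    threshold = pmf[k] * (1 + 1e-9)  # small tolerance for float ties
    return min(1.0, sum(x for x in pmf if x <= threshold))


def delta_ab(per_query_clicks: dict, weighted: bool = False) -> float:
    """Delta_AB over queries; a positive value favours ranker A.

    per_query_clicks maps a query to (clicks_a, clicks_b).
    With weighted=True each query contributes a credit of
    1 - p_value (an assumed credit function) instead of 1.
    """
    wins_a = wins_b = ties = 0.0
    for clicks_a, clicks_b in per_query_clicks.values():
        if weighted:
            n = clicks_a + clicks_b
            credit = 1.0 - binomial_two_sided_p(clicks_a, n)
        else:
            credit = 1.0
        if clicks_a > clicks_b:
            wins_a += credit
        elif clicks_b > clicks_a:
            wins_b += credit
        else:
            ties += credit
    total = wins_a + wins_b + ties
    if total == 0:
        return 0.0  # no query carried any credit
    return (wins_a + 0.5 * ties) / total - 0.5


# Hypothetical long-tailed example: one frequent query, two rare ones.
clicks = {
    "head query": (90, 60),   # many clicks, A clearly ahead
    "tail query 1": (0, 1),   # a single click for B
    "tail query 2": (0, 1),   # a single click for B
}
unweighted = delta_ab(clicks)
stat_weighted = delta_ab(clicks, weighted=True)
```

In this toy data the unweighted estimator favours B, because two single-click tail queries outvote the head query, while the weighted variant favours A, because a single click carries no statistical evidence and therefore receives zero credit under the assumed credit function.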


