Sensitive and Scalable Online Evaluation with Theoretical Guarantees

by Harrie Oosterhuis, et al.
University of Amsterdam

Multileaved comparison methods generalize interleaved comparison methods to provide a scalable approach for comparing ranking systems based on regular user interactions. Such methods enable the increasingly rapid research and development of search engines. However, existing multileaved comparison methods that provide reliable outcomes do so by degrading the user experience during evaluation. Conversely, current multileaved comparison methods that maintain the user experience cannot guarantee correctness. Our contribution is two-fold. First, we propose a theoretical framework for systematically comparing multileaved comparison methods using the notions of considerateness, which concerns maintaining the user experience, and fidelity, which concerns reliably correct outcomes. Second, we introduce a novel multileaved comparison method, Pairwise Preference Multileaving (PPM), that performs comparisons based on document-pair preferences, and prove that it is considerate and has fidelity. We show empirically that, compared to previous multileaved comparison methods, PPM is more sensitive to user preferences and scales better with the number of rankers being compared.
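To make the idea of comparing rankers through document-pair preferences more concrete, the following is a minimal sketch. It is not PPM as defined in the paper: it assumes a simple click heuristic (a clicked document is preferred over every unclicked document shown above it) and omits the probabilistic weighting that a method with proven fidelity would need. The function names `infer_pair_preferences` and `credit_rankers` are hypothetical and chosen only for illustration.

```python
def infer_pair_preferences(displayed, clicked):
    """Infer (preferred, other) document pairs from clicks.

    Assumed heuristic, not taken from the paper: a clicked document
    is preferred over each unclicked document displayed above it.
    """
    clicked = set(clicked)
    prefs = []
    for i, doc in enumerate(displayed):
        if doc in clicked:
            for above in displayed[:i]:
                if above not in clicked:
                    prefs.append((doc, above))
    return prefs


def credit_rankers(rankings, prefs):
    """Score each ranker by agreement with the inferred preferences.

    A ranker gains 1 for every pair it orders the same way as the
    inferred preference and loses 1 for every pair it orders the
    opposite way. Pairs with a document the ranker did not rank
    are skipped.
    """
    credits = {}
    for name, ranking in rankings.items():
        position = {doc: i for i, doc in enumerate(ranking)}
        score = 0
        for preferred, other in prefs:
            if preferred in position and other in position:
                score += 1 if position[preferred] < position[other] else -1
        credits[name] = score
    return credits


# Hypothetical example: one displayed result list and one click.
displayed = ["d1", "d2", "d3"]
prefs = infer_pair_preferences(displayed, clicked=["d3"])
rankings = {"A": ["d3", "d1", "d2"], "B": ["d1", "d2", "d3"]}
credits = credit_rankers(rankings, prefs)
```

Because every ranker is scored against the same set of inferred pairs, adding another ranker to the comparison only adds one more pass over those pairs, which gives an intuition for why pair-based credit can scale with the number of rankers.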




Comparing Conventional and Conversational Search Interaction using Implicit Evaluation Methods

Conversational search applications offer the prospect of improved user e...

Just Sort It! A Simple and Effective Approach to Active Preference Learning

We address the problem of learning a ranking by using adaptively chosen ...

Dynamic Ranking with the BTL Model: A Nearest Neighbor based Rank Centrality Method

Many applications such as recommendation systems or sports tournaments i...

Merge Double Thompson Sampling for Large Scale Online Ranker Evaluation

Online ranker evaluation is one of the key challenges in information ret...

Refining Recency Search Results with User Click Feedback

Traditional machine-learned ranking systems for web search are often tra...

Differentiable Unbiased Online Learning to Rank

Online Learning to Rank (OLTR) methods optimize rankers based on user in...

Finding Favourite Tuples on Data Streams with Provably Few Comparisons

One of the most fundamental tasks in data science is to assist a user wi...
