The Simpson's Paradox in the Offline Evaluation of Recommendation Systems

04/18/2021
by   Amir H. Jadidinejad, et al.

Recommendation systems are often evaluated on users' interactions collected from an existing, already deployed recommendation system. In this setting, users only provide feedback on the items they were exposed to; they cannot leave feedback on items the deployed system never showed them. As a result, the feedback dataset used to evaluate a new model is influenced by the deployed system, in a form of closed-loop feedback. In this paper, we show that the typical offline evaluation of recommender systems suffers from the so-called Simpson's paradox. Simpson's paradox is the name given to a phenomenon in which a significant trend appears in several different sub-populations of observational data but disappears, or even reverses, when these sub-populations are combined. Our in-depth experiments based on stratified sampling reveal that a very small minority of items that are frequently exposed by the deployed system acts as a confounding factor in the offline evaluation of recommendation systems. In addition, we propose a novel evaluation methodology that takes this confounder, i.e., the deployed system's characteristics, into account. Using the relative comparison of many recommendation models, as in the typical offline evaluation of recommender systems, and based on the Kendall rank correlation coefficient, we show that our proposed evaluation methodology exhibits statistically significant improvements of 14 (Yahoo! and Coat) in reflecting the true ranking of systems under an open-loop (randomised) evaluation, compared to the standard evaluation.
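The paradox described in the abstract can be reproduced with a few lines of arithmetic. The sketch below uses hypothetical hit counts (all numbers are invented for illustration and are not taken from the paper): model A has a higher hit rate than model B within each item stratum, yet the pooled comparison reverses because B's trials are concentrated on the frequently exposed "head" items.

```python
# Hypothetical (hits, trials) per item stratum for two recommendation models.
# "head_items" are frequently exposed by the deployed system, "tail_items" rarely.
strata = {
    "head_items": {"A": (90, 100), "B": (800, 1000)},
    "tail_items": {"A": (250, 1000), "B": (20, 100)},
}

def rate(hits_trials):
    hits, trials = hits_trials
    return hits / trials

# Within each stratum, A has the higher hit rate...
for group in strata.values():
    assert rate(group["A"]) > rate(group["B"])

# ...but pooled over all items, B comes out ahead (Simpson's paradox).
pooled = {
    m: sum(g[m][0] for g in strata.values()) / sum(g[m][1] for g in strata.values())
    for m in ("A", "B")
}
print(pooled)  # pooled rate favours B even though A wins every stratum
```

The reversal happens precisely because exposure (how often an item stratum is sampled) differs between the models, which is the confounding role the paper attributes to the deployed system.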
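The Kendall rank correlation coefficient used above to compare system rankings can be sketched in pure Python. The model scores below are hypothetical placeholders, not results from the paper; the point is only how agreement between a closed-loop offline ranking and an open-loop (randomised) ranking is quantified.

```python
from itertools import combinations

def kendall_tau(x, y):
    """Kendall's tau: (concordant - discordant) pairs over all pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) / 2
    return (concordant - discordant) / n_pairs

# Hypothetical effectiveness scores for five models under the two evaluations.
open_loop   = [0.31, 0.28, 0.25, 0.22, 0.19]  # ground-truth ordering
closed_loop = [0.40, 0.33, 0.35, 0.21, 0.18]  # one adjacent pair swapped

print(kendall_tau(open_loop, closed_loop))  # → 0.8
```

A tau of 1.0 means the offline evaluation ranks all systems exactly as the randomised evaluation does; each swapped pair lowers it, so a higher tau is the sense in which the proposed methodology "better reflects the true ranking". In practice one would use `scipy.stats.kendalltau`, which also handles ties.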

research
02/01/2019

Sequential Evaluation and Generation Framework for Combinatorial Recommender System

Typical recommender systems push K items at once in the result page in t...
07/25/2020

Feedback Loop and Bias Amplification in Recommender Systems

Recommendation algorithms are known to suffer from popularity bias; a fe...
02/22/2022

KuaiRec: A Fully-observed Dataset for Recommender Systems

Recommender systems are usually developed and evaluated on the historica...
10/21/2020

On Offline Evaluation of Recommender Systems

In academic research, recommender models are often evaluated offline on ...
10/30/2017

How Algorithmic Confounding in Recommendation Systems Increases Homogeneity and Decreases Utility

Recommendation systems occupy an expanding role in everyday decision mak...
04/27/2018

Offline Evaluation of Ranking Policies with Click Models

Many web systems rank and present a list of items to users, from recomme...
06/27/2019

User Validation of Recommendation Serendipity Metrics

Though it has been recognized that recommending serendipitous (i.e., sur...
