On Offline Evaluation of Recommender Systems

by Yitong Ji, et al.

In academic research, recommender models are often evaluated offline on benchmark datasets. The offline dataset is first split into training and test instances. All training instances are then modeled in a user-item interaction matrix, and supervised learning models are trained. Many such offline evaluations ignore the global timeline in the data, which leads to "data leakage": a model learns from future data to predict a current value, making the evaluation unrealistic. In this paper, we evaluate the impact of "data leakage" using two widely adopted baseline models, BPR and NeuMF, on the MovieLens dataset. We show that access to different amounts of future data may improve or deteriorate a model's recommendation accuracy. That is, ignoring the global timeline in offline evaluation makes the performance of recommendation models incomparable. Our experiments also show that more historical data in the training set does not necessarily lead to better recommendation accuracy. We share our understanding of these observations, highlight the importance of preserving the global timeline, and call for a revisit of recommender system offline evaluation.
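The leakage the abstract describes comes from the splitting step itself: a conventional random split can place earlier interactions in the test set and later ones in the training set. A minimal sketch of the two splitting strategies (all function names and the data layout here are illustrative, not from the paper):

```python
# Sketch: a random split vs. a global-timeline split of interaction
# records, illustrating the "data leakage" the abstract describes.
# Interactions are (user, item, timestamp) tuples.
import random

def random_split(interactions, test_ratio=0.2, seed=0):
    """Conventional split that ignores the global timeline: test
    interactions may precede training interactions in time."""
    rng = random.Random(seed)
    shuffled = interactions[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_ratio))
    return shuffled[:cut], shuffled[cut:]

def timeline_split(interactions, test_ratio=0.2):
    """Leakage-free split: sort by timestamp and cut once, so every
    training interaction happens before every test interaction."""
    ordered = sorted(interactions, key=lambda x: x[2])
    cut = int(len(ordered) * (1 - test_ratio))
    return ordered[:cut], ordered[cut:]

def leaks_future(train, test):
    """True if some training interaction occurs after the earliest
    test interaction, i.e. the model can 'see the future'."""
    earliest_test = min(t for _, _, t in test)
    return any(t > earliest_test for _, _, t in train)
```

With `timeline_split`, every training timestamp precedes every test timestamp, so `leaks_future` is always false; with `random_split` on time-ordered data it will almost always be true.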







Study of a bias in the offline evaluation of a recommendation algorithm

Recommendation systems have been integrated into the majority of large o...

Do Offline Metrics Predict Online Performance in Recommender Systems?

Recommender systems operate in an inherently dynamical setting. Past rec...

On the Value of Bandit Feedback for Offline Recommender System Evaluation

In academic literature, recommender systems are often evaluated on the t...

The Simpson's Paradox in the Offline Evaluation of Recommendation Systems

Recommendation systems are often evaluated based on user's interactions ...

Reducing Offline Evaluation Bias in Recommendation Systems

Recommendation systems have been integrated into the majority of large o...

Estimating Error and Bias in Offline Evaluation Results

Offline evaluations of recommender systems attempt to estimate users' sa...

A Methodology for the Offline Evaluation of Recommender Systems in a User Interface with Multiple Carousels

Many video-on-demand and music streaming services provide the user with ...