On Offline Evaluation of Recommender Systems

by Yitong Ji, et al.

In academic research, recommender models are often evaluated offline on benchmark datasets. The offline dataset is first split into training and test instances. All training instances are then modeled in a user-item interaction matrix, and supervised learning models are trained. Many such offline evaluations ignore the global timeline in the data, which leads to "data leakage": a model learns from future data to predict a current value, making the evaluation unrealistic. In this paper, we evaluate the impact of "data leakage" using two widely adopted baseline models, BPR and NeuMF, on the MovieLens dataset. We show that access to different amounts of future data may improve or deteriorate a model's recommendation accuracy. That is, ignoring the global timeline in offline evaluation makes the performance of recommendation models incomparable. Our experiments also show that more historical data in the training set does not necessarily lead to better recommendation accuracy. We share our understanding of these observations, highlight the importance of preserving the global timeline, and call for a revisit of recommender system offline evaluation.
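The leakage the abstract describes comes from the splitting step itself: a conventional random split can place earlier interactions in the test set and later ones in the training set. A minimal sketch of the two splitting strategies (all function names and the data layout here are illustrative, not from the paper):

```python
# Sketch: a random split vs. a global-timeline split of interaction
# records, illustrating the "data leakage" the abstract describes.
# Interactions are (user, item, timestamp) tuples.
import random

def random_split(interactions, test_ratio=0.2, seed=0):
    """Conventional split that ignores the global timeline: test
    interactions may precede training interactions in time."""
    rng = random.Random(seed)
    shuffled = interactions[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_ratio))
    return shuffled[:cut], shuffled[cut:]

def timeline_split(interactions, test_ratio=0.2):
    """Leakage-free split: sort by timestamp and cut once, so every
    training interaction happens before every test interaction."""
    ordered = sorted(interactions, key=lambda x: x[2])
    cut = int(len(ordered) * (1 - test_ratio))
    return ordered[:cut], ordered[cut:]

def leaks_future(train, test):
    """True if some training interaction occurs after the earliest
    test interaction, i.e. the model can 'see the future'."""
    earliest_test = min(t for _, _, t in test)
    return any(t > earliest_test for _, _, t in train)
```

With `timeline_split`, every training timestamp precedes every test timestamp, so `leaks_future` is always false; with `random_split` on time-ordered data it will almost always be true.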







Study of a bias in the offline evaluation of a recommendation algorithm

Recommendation systems have been integrated into the majority of large o...

Do Offline Metrics Predict Online Performance in Recommender Systems?

Recommender systems operate in an inherently dynamical setting. Past rec...

On the Value of Bandit Feedback for Offline Recommender System Evaluation

In academic literature, recommender systems are often evaluated on the t...

The Simpson's Paradox in the Offline Evaluation of Recommendation Systems

Recommendation systems are often evaluated based on user's interactions ...

Reducing Offline Evaluation Bias in Recommendation Systems

Recommendation systems have been integrated into the majority of large o...

Estimating Error and Bias in Offline Evaluation Results

Offline evaluations of recommender systems attempt to estimate users' sa...

A Methodology for the Offline Evaluation of Recommender Systems in a User Interface with Multiple Carousels

Many video-on-demand and music streaming services provide the user with ...