Where Do We Go From Here? Guidelines For Offline Recommender Evaluation

11/02/2022
by Tobias Schnabel, et al.

Various studies in recent years have pointed out significant issues in the offline evaluation of recommender systems, making it difficult to assess whether true progress has been made. However, there has been little research into what set of practices should serve as a starting point during experimentation. In this paper, we examine four major issues in recommender system research regarding uncertainty estimation, generalization, hyperparameter optimization, and dataset pre-processing in more detail to arrive at a set of guidelines. We present TrainRec, a lightweight and flexible toolkit for the offline training and evaluation of recommender systems that implements these guidelines. Unlike other frameworks, TrainRec focuses on experimentation alone, offering flexible modules that can be used together or in isolation. Finally, we demonstrate TrainRec's usefulness by evaluating a diverse set of twelve baselines across ten datasets. Our results show that (i) many results on smaller datasets are likely not statistically significant, (ii) at least three baselines perform well on most datasets and should be considered in future experiments, and (iii) improved uncertainty quantification (via nested cross-validation and statistical testing) rules out some reported differences between linear and neural methods. Given these results, we advocate that future research standardize evaluation using our suggested guidelines.
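The abstract does not show TrainRec's API, but the uncertainty-quantification guideline it names (nested cross-validation plus statistical testing) can be illustrated with a minimal sketch. Everything below is hypothetical scaffolding: the `evaluate` stub, the fold counts, and the metric values are placeholders for illustration, not TrainRec code.

```python
# Minimal sketch of nested CV + paired significance testing for comparing
# two recommenders. The evaluate() stub stands in for a real train/eval
# routine (e.g. fit a model on train_idx, compute nDCG@10 on test_idx).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def evaluate(model_name, train_idx, test_idx):
    # Placeholder: returns a synthetic metric value instead of training a model.
    base = 0.30 if model_name == "linear" else 0.31
    return float(rng.normal(loc=base, scale=0.02))

def make_folds(n_users=1000, n_folds=5, seed=42):
    # Split users into disjoint outer folds, shared by all compared methods.
    fold_rng = np.random.default_rng(seed)
    return np.array_split(fold_rng.permutation(n_users), n_folds)

def outer_cv_scores(model_name, folds):
    scores = []
    for k, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != k])
        # An inner CV loop for hyperparameter tuning would run here,
        # using train_idx only -- never the held-out outer test fold.
        scores.append(evaluate(model_name, train_idx, test_idx))
    return np.array(scores)

folds = make_folds()
scores_linear = outer_cv_scores("linear", folds)
scores_neural = outer_cv_scores("neural", folds)

# Paired test across the same outer folds; report the p-value rather than
# declaring a winner from a single split.
t_stat, p_value = stats.ttest_rel(scores_linear, scores_neural)
print(f"linear={scores_linear.mean():.3f} neural={scores_neural.mean():.3f} p={p_value:.3f}")
```

Pairing the scores fold-by-fold is what makes the test valid: both methods must be evaluated on identical splits, which is why the folds are built once and shared.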


research · 07/27/2023
Widespread Flaws in Offline Evaluation of Recommender Systems
Even though offline evaluation is just an imperfect proxy of online perf...

research · 06/27/2023
Wespeaker baselines for VoxSRC2023
This report showcases the results achieved using the wespeaker toolkit f...

research · 10/11/2018
A Distributed and Accountable Approach to Offline Recommender Systems Evaluation
Different software tools have been developed with the purpose of perform...

research · 09/10/2018
The LKPY Package for Recommender Systems Experiments: Next-Generation Tools and Lessons Learned from the LensKit Project
Since 2010, we have built and maintained LensKit, an open-source toolkit...

research · 01/31/2021
Improving Accountability in Recommender Systems Research Through Reproducibility
Reproducibility is a key requirement for scientific progress. It allows ...

research · 10/11/2018
Sequeval: A Framework to Assess and Benchmark Sequence-based Recommender Systems
In this paper, we present sequeval, a software tool capable of performin...

research · 05/04/2019
On the Difficulty of Evaluating Baselines: A Study on Recommender Systems
Numerical evaluations with comparisons to baselines play a central role ...
