Offline Evaluation of Reward-Optimizing Recommender Systems: The Case of Simulation

09/18/2022
by   Imad Aouali, et al.

In both academic and industry research, online evaluation methods are seen as the gold standard for interactive applications like recommendation systems. The reason is that they directly measure utility metrics that depend on interventions, namely the recommendations shown to users. Nevertheless, online evaluation methods are costly for a number of reasons, and a clear need remains for reliable offline evaluation procedures. In industry, offline metrics are often used as a first-line evaluation to select promising candidate models to evaluate online. In academic work, limited access to online systems makes offline metrics the de facto approach to validating novel methods. Two classes of offline metrics exist: proxy-based methods and counterfactual methods. The former are often poorly correlated with the online metrics we care about, and the latter only provide theoretical guarantees under assumptions that cannot be fulfilled in real-world environments. Here, we make the case that simulation-based comparisons provide a way forward beyond offline metrics, and argue that they are a preferable means of evaluation.
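To make the contrast between a counterfactual estimate and a simulation-based comparison concrete, here is a minimal sketch. It is not the authors' method: it assumes a toy simulator with three items and known click probabilities, and it uses inverse propensity scoring (IPS) as one common counterfactual estimator. All names and numbers (`click_prob`, `logging_policy`, `target_policy`, the sample size) are hypothetical and chosen only for illustration.

```python
import random

def true_value(policy, click_prob):
    """Expected click rate of a policy, computable only because the simulator is known."""
    return sum(p * q for p, q in zip(policy, click_prob))

def simulate_logs(logging_policy, click_prob, n, rng):
    """Draw n (action, reward) pairs by running the logging policy in the simulator."""
    actions = list(range(len(click_prob)))
    logs = []
    for _ in range(n):
        a = rng.choices(actions, weights=logging_policy)[0]
        r = 1 if rng.random() < click_prob[a] else 0
        logs.append((a, r))
    return logs

def ips_estimate(logs, logging_policy, target_policy):
    """IPS: reweight each logged reward by target probability / logging probability."""
    total = sum(r * target_policy[a] / logging_policy[a] for a, r in logs)
    return total / len(logs)

# Hypothetical toy environment: three items with assumed click probabilities.
click_prob = [0.02, 0.05, 0.10]
logging_policy = [1 / 3, 1 / 3, 1 / 3]   # uniform logging, so every action has support
target_policy = [0.1, 0.1, 0.8]          # candidate policy we want to evaluate

rng = random.Random(0)
logs = simulate_logs(logging_policy, click_prob, 50_000, rng)

simulated_value = true_value(target_policy, click_prob)
offline_estimate = ips_estimate(logs, logging_policy, target_policy)
print(f"simulated value: {simulated_value:.4f}, IPS estimate: {offline_estimate:.4f}")
```

In this toy setting the IPS estimate should land close to the simulated value because its assumptions hold by construction: the logging propensities are known exactly and every action has non-zero logging probability. The abstract's concern is that such assumptions cannot be guaranteed in real-world environments.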


research
08/14/2023

Bridging Offline-Online Evaluation with a Time-dependent and Popularity Bias-free Offline Metric for Recommenders

The evaluation of recommendation systems is a complex task. The offline ...
research
11/07/2020

Do Offline Metrics Predict Online Performance in Recommender Systems?

Recommender systems operate in an inherently dynamical setting. Past rec...
research
07/27/2023

On (Normalised) Discounted Cumulative Gain as an Offline Evaluation Metric for Top-n Recommendation

Approaches to recommendation are typically evaluated in one of two ways:...
research
06/17/2020

Causal Meta-Mediation Analysis: Inferring Dose-Response Function From Summary Statistics of Many Randomized Experiments

It is common in the internet industry to use offline-developed algorithm...
research
06/26/2022

Quality Metrics in Recommender Systems: Do We Calculate Metrics Consistently?

Offline evaluation is a popular approach to determine the best algorithm...
research
03/02/2022

Counterfactually Evaluating Explanations in Recommender Systems

Modern recommender systems face an increasing need to explain their reco...
research
07/26/2019

On the Value of Bandit Feedback for Offline Recommender System Evaluation

In academic literature, recommender systems are often evaluated on the t...
