On (Normalised) Discounted Cumulative Gain as an Offline Evaluation Metric for Top-n Recommendation

07/27/2023
by Olivier Jeunen, et al.

Approaches to recommendation are typically evaluated in one of two ways: (1) via a (simulated) online experiment, often seen as the gold standard, or (2) via some offline evaluation procedure, where the goal is to approximate the outcome of an online experiment. Several offline evaluation metrics have been adopted in the literature, inspired by ranking metrics prevalent in the field of Information Retrieval. (Normalised) Discounted Cumulative Gain (nDCG) is one such metric that has seen widespread adoption in empirical studies, and higher (n)DCG values have been used to present new methods as the state-of-the-art in top-n recommendation for many years. Our work takes a critical look at this approach, and investigates when we can expect such metrics to approximate the gold standard outcome of an online experiment. We formally present the assumptions that are necessary to consider DCG an unbiased estimator of online reward and provide a derivation for this metric from first principles, highlighting where we deviate from its traditional uses in IR. Importantly, we show that normalising the metric renders it inconsistent, in that even when DCG is unbiased, ranking competing methods by their normalised DCG can invert their relative order. Through a correlation analysis between off- and on-line experiments conducted on a large-scale recommendation platform, we show that our unbiased DCG estimates strongly correlate with online reward, even when some of the metric's inherent assumptions are violated. This statement no longer holds for its normalised variant, suggesting that nDCG's practical utility may be limited.
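
The inconsistency claim can be made concrete with a small numerical sketch. The code below is not the paper's derivation or its unbiased estimator: it assumes the textbook logarithmic discount 1 / log2(rank + 1), binary relevance, and a toy population of two hypothetical users, and it shows one way in which per-user normalisation can flip the ordering of two methods when metrics are averaged over users. The names `method_x`, `method_y`, and `n_relevant` are illustrative only.

```python
import math


def dcg(relevances):
    """Textbook DCG: sum of rel / log2(rank + 1) over a ranked list (ranks start at 1)."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))


def ndcg(relevances, n_relevant):
    """DCG divided by the ideal DCG for a user with `n_relevant` relevant items."""
    ideal = dcg([1] * min(n_relevant, len(relevances)))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Assumed toy data: user A has 1 relevant item, user B has 3.
n_relevant = {"A": 1, "B": 3}

# Binary relevance of each method's top-3 list, per user.
method_x = {"A": [1, 0, 0], "B": [0, 0, 0]}  # perfect for A, nothing for B
method_y = {"A": [0, 0, 0], "B": [1, 1, 0]}  # nothing for A, two hits for B

mean_dcg_x = mean(dcg(r) for r in method_x.values())
mean_dcg_y = mean(dcg(r) for r in method_y.values())
mean_ndcg_x = mean(ndcg(r, n_relevant[u]) for u, r in method_x.items())
mean_ndcg_y = mean(ndcg(r, n_relevant[u]) for u, r in method_y.items())

print(f"mean DCG : X={mean_dcg_x:.3f}  Y={mean_dcg_y:.3f}")
print(f"mean nDCG: X={mean_ndcg_x:.3f}  Y={mean_ndcg_y:.3f}")
```

In this toy setting method Y collects more discounted gain on average (about 0.82 against 0.50), yet method X has the higher mean nDCG (0.50 against about 0.38), because user A's single hit is normalised up to a perfect score while user B's two hits are normalised down. If DCG tracks online reward, ranking the two methods by nDCG would pick the wrong one here.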
