Large-scale Validation of Counterfactual Learning Methods: A Test-Bed

12/01/2016
by Damien Lefortier, et al.

The ability to perform effective off-policy learning would revolutionize the process of building better interactive systems, such as search engines and recommendation systems for e-commerce, computational advertising and news. Recent approaches for off-policy evaluation and learning in these settings appear promising. With this paper, we provide real-world data and a standardized test-bed to systematically investigate these algorithms using data from display advertising. In particular, we consider the problem of filling a banner ad with an aggregate of multiple products the user may want to purchase. This paper presents our test-bed and the sanity checks we ran to ensure its validity, and reports results comparing state-of-the-art off-policy learning methods such as doubly robust optimization, POEM, and reductions to supervised learning using regression baselines. Our results provide experimental evidence that recent off-policy learning methods can improve upon state-of-the-art supervised learning techniques on a large-scale real-world data set.
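To make the idea of off-policy evaluation concrete, here is a minimal sketch of inverse propensity scoring (IPS) and its self-normalized variant, two standard estimators in this area. It is an illustration only and is not taken from the paper: the function names, argument names, and toy numbers are all assumptions, and the test-bed's own estimators and data format may differ.

```python
# Illustrative sketch only: function names and the toy data are assumptions,
# not the paper's implementation or data format.

def importance_weights(logging_probs, target_probs):
    """Ratio of target-policy to logging-policy probability for each logged action."""
    if len(logging_probs) != len(target_probs):
        raise ValueError("probability lists must have the same length")
    if any(p <= 0 for p in logging_probs):
        raise ValueError("logging probabilities must be strictly positive")
    return [t / l for t, l in zip(target_probs, logging_probs)]


def ips_estimate(rewards, logging_probs, target_probs):
    """Inverse propensity scoring: mean of importance-weighted rewards."""
    weights = importance_weights(logging_probs, target_probs)
    return sum(w * r for w, r in zip(weights, rewards)) / len(rewards)


def snips_estimate(rewards, logging_probs, target_probs):
    """Self-normalized IPS: weighted rewards divided by the sum of weights."""
    weights = importance_weights(logging_probs, target_probs)
    return sum(w * r for w, r in zip(weights, rewards)) / sum(weights)


# Toy logged data: reward observed for the action the logging policy took,
# the probability the logging policy assigned to that action, and the
# probability a hypothetical new (target) policy would assign to it.
rewards = [1.0, 0.0, 1.0, 0.0]
logging_probs = [0.5, 0.5, 0.25, 0.25]
target_probs = [0.5, 0.5, 0.5, 0.5]

ips_value = ips_estimate(rewards, logging_probs, target_probs)
snips_value = snips_estimate(rewards, logging_probs, target_probs)
print(ips_value, snips_value)
```

Both estimators reweight logged rewards by how much more (or less) likely the new policy is to take the logged action. The self-normalized variant trades a small bias for lower variance when the weights are large.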
