Reinforcement Learning for Slate-based Recommender Systems: A Tractable Decomposition and Practical Methodology

05/29/2019
by Eugene Ie, et al.

Most practical recommender systems focus on estimating immediate user engagement without considering the long-term effects of recommendations on user behavior. Reinforcement learning (RL) methods offer the potential to optimize recommendations for long-term user engagement. However, since users are often presented with slates of multiple items (which may have interacting effects on user choice), methods are required to deal with the combinatorics of the RL action space. In this work, we address the challenge of making slate-based recommendations to optimize long-term value using RL. Our contributions are three-fold. (i) We develop SlateQ, a decomposition of value-based temporal-difference and Q-learning that renders RL tractable with slates. Under mild assumptions on user choice behavior, we show that the long-term value (LTV) of a slate can be decomposed into a tractable function of its component item-wise LTVs. (ii) We outline a methodology that leverages existing myopic learning-based recommenders to quickly develop a recommender that handles LTV. (iii) We demonstrate our methods in simulation, and validate the scalability of decomposed TD-learning using SlateQ in live experiments on YouTube.
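To make the decomposition idea concrete, here is a minimal sketch, assuming the user picks at most one item from the slate (or a "null" no-choice option) according to a conditional-logit choice model. Under that assumption the slate value is the choice-probability-weighted sum of item-wise values, Q(s, A) = Σ_{i∈A} P(i | s, A) · Q̄(s, i). The function names (`choice_probs`, `slate_q`, `item_td_target`, `best_slate`), the logit scores, and the `null_q` value for the no-choice outcome are all hypothetical illustrations, not the authors' implementation.

```python
import itertools
import numpy as np

def choice_probs(scores, null_score):
    """Assumed conditional-logit model: P(user picks item i | slate).

    The null (no-choice) option competes with the slate items, so the
    returned probabilities sum to less than 1.
    """
    exp_items = np.exp(np.asarray(scores, dtype=float))
    return exp_items / (exp_items.sum() + np.exp(null_score))

def slate_q(item_q, scores, null_score, null_q=0.0):
    """Decomposed slate value: sum_i P(i | s, A) * Qbar(s, i), plus the null outcome."""
    p = choice_probs(scores, null_score)
    p_null = 1.0 - p.sum()
    return float(p @ np.asarray(item_q, dtype=float) + p_null * null_q)

def item_td_target(reward, gamma, next_item_q, next_scores, null_score):
    """SARSA-style target for the item-wise value of the item the user chose.

    The next-state value is the decomposed value of the slate shown next,
    so only item-wise Qbar estimates are ever learned.
    """
    return reward + gamma * slate_q(next_item_q, next_scores, null_score)

def best_slate(catalog_q, catalog_scores, null_score, k, null_q=0.0):
    """Brute-force search over all size-k slates (only feasible for tiny catalogs)."""
    catalog_q = np.asarray(catalog_q, dtype=float)
    catalog_scores = np.asarray(catalog_scores, dtype=float)
    best, best_val = None, -np.inf
    for slate in itertools.combinations(range(len(catalog_q)), k):
        idx = list(slate)
        val = slate_q(catalog_q[idx], catalog_scores[idx], null_score, null_q)
        if val > best_val:
            best, best_val = slate, val
    return best, best_val
```

For example, with two equally scored items and a null option of the same score, each item is chosen with probability 1/3, so item values of 3 and 6 give a slate value of 3. The brute-force `best_slate` only illustrates that evaluating any slate is cheap once item-wise values are known; the number of candidate slates still grows combinatorially, so a practical system would need a more efficient slate-optimization step.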

research
05/23/2023

Optimizing Long-term Value for Auction-Based Recommender Systems via On-Policy Reinforcement Learning

Auction-based recommender systems are prevalent in online advertising pl...
research
02/17/2022

Should I send this notification? Optimizing push notifications decision making by modeling the future

Most recommender systems are myopic, that is they optimize based on the ...
research
01/20/2023

Generative Slate Recommendation with Reinforcement Learning

Recent research has employed reinforcement learning (RL) algorithms to o...
research
12/20/2020

Reinforcement Learning-based Product Delivery Frequency Control

Frequency control is an important problem in modern recommender systems....
research
09/13/2019

Towards an Adaptive Robot for Sports and Rehabilitation Coaching

The work presented in this paper aims to explore how, and to what extent...
research
02/07/2023

Optimizing Audio Recommendations for the Long-Term: A Reinforcement Learning Perspective

We study the problem of optimizing a recommender system for outcomes tha...
