Local Policy Improvement for Recommender Systems

12/22/2022
by Dawen Liang, et al.

Recommender systems aim to answer the following question: given the items that a user has interacted with, what items will this user likely interact with next? Historically, this problem has often been framed as a predictive task via (self-)supervised learning. In recent years, more emphasis has been placed on approaching the recommendation problem from a policy optimization perspective: learning a policy that maximizes some reward function (e.g., user engagement). However, in recommender systems it is commonly the case that a new policy can only be trained on data collected from a previously deployed policy. The conventional way to address such a policy mismatch is importance sampling correction, which unfortunately comes with its own limitations. In this paper, we suggest an alternative approach: local policy improvement without off-policy correction. Drawing on a number of related results in causal inference, bandits, and reinforcement learning, we present a suite of methods that compute and optimize a lower bound on the expected reward of the target policy. Crucially, this lower bound is easy to estimate from data and does not involve density ratios (such as those appearing in importance sampling correction). We argue that this local policy improvement paradigm is particularly well suited to recommender systems, given that in practice the previously deployed policy is typically of reasonably high quality, and furthermore it tends to be retrained frequently and updated continuously. We also discuss practical recipes for applying some of the proposed techniques in a sequential recommendation setting.
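The abstract contrasts importance sampling correction with a lower bound that needs no density ratios. As a hedged illustration only, the sketch below uses one well-known bound of this kind, based on the inequality x >= 1 + log x; it is not necessarily the exact bound derived in the paper. For nonnegative rewards, pi(a)/beta(a) >= 1 + log pi(a) - log beta(a), so E_pi[r] >= E_beta[r (1 + log pi(a) - log beta(a))]. Only log pi(a) depends on the target policy, and the bound is tight when pi equals the logging policy beta, which is why it is "local". All function names, policies, and reward probabilities here are hypothetical toy values.

```python
import math
import random

def sample_logs(beta, reward_prob, n, seed=0):
    """Simulate (item, reward) logs collected under a hypothetical logging policy beta."""
    rng = random.Random(seed)
    items = list(range(len(beta)))
    logs = []
    for _ in range(n):
        a = rng.choices(items, weights=beta)[0]           # item shown by the old policy
        r = 1.0 if rng.random() < reward_prob[a] else 0.0  # binary engagement reward
        logs.append((a, r))
    return logs

def ips_estimate(logs, pi, beta):
    """Importance-sampling estimate of E_pi[r]; relies on the density ratio pi/beta."""
    return sum(r * pi[a] / beta[a] for a, r in logs) / len(logs)

def ratio_free_lower_bound(logs, pi, beta):
    """Lower bound on E_pi[r] for r >= 0, from pi/beta >= 1 + log(pi) - log(beta).

    No ratio pi/beta is formed; log(beta) is a constant with respect to pi.
    """
    return sum(r * (1.0 + math.log(pi[a]) - math.log(beta[a]))
               for a, r in logs) / len(logs)

def surrogate_objective(logs, pi):
    """Reward-weighted log-likelihood: the only pi-dependent part of the bound."""
    return sum(r * math.log(pi[a]) for a, r in logs) / len(logs)

if __name__ == "__main__":
    beta = [0.5, 0.3, 0.2]         # hypothetical logging policy
    reward_prob = [0.1, 0.4, 0.6]  # hypothetical per-item engagement probabilities
    pi = [0.3, 0.4, 0.3]           # a nearby candidate target policy
    logs = sample_logs(beta, reward_prob, 10_000)
    print("IPS estimate:", ips_estimate(logs, pi, beta))
    print("Lower bound: ", ratio_free_lower_bound(logs, pi, beta))
```

Because the inequality holds sample by sample, the empirical lower bound never exceeds the empirical importance-sampling estimate, and the two coincide when pi equals beta. Maximizing the bound over pi amounts to maximizing `surrogate_objective`, which is straightforward to estimate from logged data.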

research
06/10/2020

Self-Supervised Reinforcement Learning for Recommender Systems

In session-based or sequential recommendation, it is important to consid...
research
12/06/2022

PrefRec: Preference-based Recommender Systems for Reinforcing Long-term User Engagement

Current advances in recommender systems have been remarkably successful ...
research
08/20/2018

The Deconfounded Recommender: A Causal Inference Approach to Recommendation

The goal of a recommender system is to show its users items that they wi...
research
06/06/2022

Pessimistic Off-Policy Optimization for Learning to Rank

Off-policy learning is a framework for optimizing policies without deplo...
research
04/17/2023

Causal Decision Transformer for Recommender Systems via Offline Reinforcement Learning

Reinforcement learning-based recommender systems have recently gained po...
research
08/08/2022

Fast Offline Policy Optimization for Large Scale Recommendation

Personalised interactive systems such as recommender systems require sel...
research
12/06/2018

Top-K Off-Policy Correction for a REINFORCE Recommender System

Industrial recommender systems deal with extremely large action spaces -...
