Value Penalized Q-Learning for Recommender Systems

10/15/2021
by Chengqian Gao, et al.

Scaling reinforcement learning (RL) to recommender systems (RS) is promising because maximizing an RL agent's expected cumulative reward matches the objective of RS, i.e., improving customers' long-term satisfaction. A key approach to this goal is offline RL, which aims to learn policies from logged data. However, the high-dimensional action space and non-stationary dynamics of commercial RS intensify distributional shift, making it challenging to apply offline RL methods to RS. To alleviate action distribution shift when extracting an RL policy from static trajectories, we propose Value Penalized Q-learning (VPQ), an uncertainty-based offline RL algorithm. It penalizes unstable Q-values in the regression target with uncertainty-aware weights and does not need to estimate the behavior policy, which makes it suitable for RS with a large number of items. We derive the penalty weights from the variance across an ensemble of Q-functions. To alleviate distributional shift at test time, we further introduce a critic framework that integrates the proposed method with classic RS models. Extensive experiments on two real-world datasets show that the proposed method can serve as a plug-in that improves existing RS models.
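
As a rough illustration of the idea in the abstract (down-weighting Q-values on which an ensemble disagrees before they enter the regression target), the sketch below may help. It is not the paper's formula: the weight form 1 / (1 + lam * variance), the function name `penalized_target`, and the parameters `lam` and `gamma` are assumptions made for this example, and the multiplicative penalty is only conservative when Q-values are non-negative.

```python
from statistics import mean, pvariance


def penalized_target(reward, next_q_ensemble, gamma=0.99, lam=1.0):
    """Build a one-step regression target from an ensemble of Q-estimates.

    next_q_ensemble: list of K lists; entry k holds the k-th Q-function's
    values for every candidate action (item) in the next state.
    The weight 1 / (1 + lam * variance) is a hypothetical choice for this
    sketch, not the weighting used in the paper.
    """
    num_actions = len(next_q_ensemble[0])
    penalized = []
    for a in range(num_actions):
        values = [q[a] for q in next_q_ensemble]
        q_mean = mean(values)
        q_var = pvariance(values)          # disagreement across the ensemble
        weight = 1.0 / (1.0 + lam * q_var)  # high variance -> small weight
        penalized.append(weight * q_mean)
    # Greedy backup over the penalized values; no behavior policy is estimated.
    return reward + gamma * max(penalized)


# Two ensemble members, two actions: they agree on action 0, disagree on action 1.
ensemble = [[1.0, 5.0], [1.0, 1.0]]
target_with_penalty = penalized_target(0.0, ensemble, gamma=0.5, lam=1.0)
target_without_penalty = penalized_target(0.0, ensemble, gamma=0.5, lam=0.0)
```

With the penalty switched on, the backup prefers the action the ensemble agrees on, even though the other action has the higher mean estimate; with `lam=0.0` it reduces to an ordinary greedy backup over the ensemble mean.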

research
10/18/2021

RL4RS: A Real-World Benchmark for Reinforcement Learning based Recommender System

Reinforcement learning based recommender systems (RL-based RS) aims at l...
research
11/05/2021

Supervised Advantage Actor-Critic for Recommender Systems

Casting session-based or sequential recommendation as reinforcement lear...
research
02/13/2023

On Modeling Long-Term User Engagement from Stochastic Feedback

An ultimate goal of recommender systems (RS) is to improve user engageme...
research
07/14/2021

Plan-Based Relaxed Reward Shaping for Goal-Directed Tasks

In high-dimensional state spaces, the usefulness of Reinforcement Learni...
research
08/09/2022

Multi-Task Fusion via Reinforcement Learning for Long-Term User Satisfaction in Recommender Systems

Recommender System (RS) is an important online application that affects ...
research
05/30/2023

Robust Reinforcement Learning Objectives for Sequential Recommender Systems

Attention-based sequential recommendation methods have demonstrated prom...
research
07/18/2022

Back to the Manifold: Recovering from Out-of-Distribution States

Learning from previously collected datasets of expert data offers the pr...
