Supervised Advantage Actor-Critic for Recommender Systems

11/05/2021
by   Xin Xin, et al.

Casting session-based or sequential recommendation as reinforcement learning (RL) through reward signals is a promising research direction towards recommender systems (RS) that maximize cumulative profits. However, the direct use of RL algorithms in the RS setting is impractical due to challenges such as off-policy training, huge action spaces, and a lack of sufficient reward signals. Recent RL approaches for RS attempt to tackle these challenges by combining RL and (self-)supervised sequential learning, but still suffer from certain limitations. For example, the estimation of Q-values tends to be biased toward positive values due to the lack of negative reward signals. Moreover, the Q-values also depend heavily on the specific timestamp of a sequence. To address these problems, we propose a negative sampling strategy for training the RL component and combine it with supervised sequential learning. We call this method Supervised Negative Q-learning (SNQN). Based on sampled (negative) actions (items), we can calculate the "advantage" of a positive action over the average case, which can be further utilized as a normalized weight for learning the supervised sequential part. This leads to another learning framework: Supervised Advantage Actor-Critic (SA2C). We instantiate SNQN and SA2C with four state-of-the-art sequential recommendation models and conduct experiments on two real-world datasets. Experimental results show that the proposed approaches achieve significantly better performance than state-of-the-art supervised methods and existing self-supervised RL methods. Code will be open-sourced.
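The advantage idea in the abstract can be illustrated with a small sketch. This is one plausible reading, not the authors' implementation: it assumes the "average case" is the mean Q-value over the positive item plus uniformly sampled negatives, and that the advantage multiplies a softmax cross-entropy loss. The function names (`sample_negatives`, `advantage`, `weighted_cross_entropy`) and all numbers are hypothetical.

```python
import numpy as np

def sample_negatives(rng, num_items, positive, k):
    """Uniformly sample k distinct item ids other than the positive item (assumed strategy)."""
    candidates = np.setdiff1d(np.arange(num_items), [positive])
    return rng.choice(candidates, size=k, replace=False)

def advantage(q_values, positive, negatives):
    """Q-value of the positive action minus the mean Q over the positive and sampled negatives."""
    sampled = np.concatenate(([positive], negatives))
    return q_values[positive] - q_values[sampled].mean()

def weighted_cross_entropy(logits, positive, weight):
    """Softmax cross-entropy on the positive item, scaled by the given weight."""
    shifted = logits - logits.max()  # subtract the max for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -weight * log_probs[positive]

# Toy values standing in for the outputs of a Q-head and a supervised head.
rng = np.random.default_rng(0)
num_items = 6
q_values = np.array([0.2, 1.0, 0.1, 0.4, 0.3, 0.0])
logits = np.array([0.5, 2.0, 0.1, 0.3, 0.2, 0.0])
positive = 1

negatives = sample_negatives(rng, num_items, positive, k=3)
adv = advantage(q_values, positive, negatives)
loss = weighted_cross_entropy(logits, positive, adv)
```

In this toy setup the positive item has the largest Q-value, so its advantage is positive and the supervised loss for that step is up-weighted. A positive item whose Q-value falls below the sampled average would instead get a negative weight under this reading.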

research
06/10/2020

Self-Supervised Reinforcement Learning for Recommender Systems

In session-based or sequential recommendation, it is important to consid...
research
05/30/2023

Robust Reinforcement Learning Objectives for Sequential Recommender Systems

Attention-based sequential recommendation methods have demonstrated prom...
research
10/15/2021

Value Penalized Q-Learning for Recommender Systems

Scaling reinforcement learning (RL) to recommender systems (RS) is promi...
research
02/07/2023

Multi-Task Recommendations with Reinforcement Learning

In recent years, Multi-task Learning (MTL) has yielded immense success i...
research
11/14/2020

A Geometric Perspective on Self-Supervised Policy Adaptation

One of the most challenging aspects of real-world reinforcement learning...
research
07/04/2018

Supervised Reinforcement Learning with Recurrent Neural Network for Dynamic Treatment Recommendation

Dynamic treatment recommendation systems based on large-scale electronic...
