MBCAL: A Simple and Efficient Reinforcement Learning Method for Recommendation Systems

by   Fan Wang, et al.

It has been widely regarded that only considering the immediate user feedback is not sufficient for modern industrial recommendation systems. Many previous works attempt to maximize the long term rewards with Reinforcement Learning(RL). However, model-free RL suffers from problems including significant variance in gradient, long convergence period, and requirement of sophisticated online infrastructures. While model-based RL provides a sample-efficient choice, the cost of planning in an online system is unacceptable. To achieve high sample efficiency in practical situations, we propose a novel model-based reinforcement learning method, namely the model-based counterfactual advantage learning(MBCAL). In the proposed method, a masking item is introduced in the environment model learning. With the masking item and the environment model, we introduce the counterfactual future advantage, which eliminates most of the noises in long term rewards. The proposed method selects through approximating the immediate reward and future advantage separately. It is easy to implement, yet it requires reasonable cost in both training and inference processes. In the experiments, we compare our methods with several baselines, including supervised learning, model-free RL, and other model-based RL methods in carefully designed experiments. Results show that our method transcends all the baselines in both sample efficiency and asymptotic performance.


page 1

page 2

page 3

page 4


Model-free Reinforcement Learning with Stochastic Reward Stabilization for Recommender Systems

Model-free RL-based recommender systems have recently received increasin...

Discovering Hierarchical Achievements in Reinforcement Learning via Contrastive Learning

Discovering achievements with a hierarchical structure on procedurally g...

An End-to-End Deep RL Framework for Task Arrangement in Crowdsourcing Platforms

In this paper, we propose a Deep Reinforcement Learning (RL) framework f...

Data-Efficient Reinforcement Learning for Malaria Control

Sequential decision-making under cost-sensitive tasks is prohibitively d...

Learning Dynamics Model in Reinforcement Learning by Incorporating the Long Term Future

In model-based reinforcement learning, the agent interleaves between mod...

The LoCA Regret: A Consistent Metric to Evaluate Model-Based Behavior in Reinforcement Learning

Deep model-based Reinforcement Learning (RL) has the potential to substa...

Benchmarking Model-Based Reinforcement Learning

Model-based reinforcement learning (MBRL) is widely seen as having the p...

Please sign up or login with your details

Forgot password? Click here to reset