
RL4RS: A Real-World Benchmark for Reinforcement Learning based Recommender System

by Kai Wang, et al.
NetEase, Inc.

Reinforcement learning based recommender systems (RL-based RS) aim to learn a good policy from a batch of collected data by casting sequential recommendation as a multi-step decision-making task. However, current RL-based RS benchmarks commonly have a large reality gap, because they rely on artificial RL datasets or semi-simulated RS datasets, and the trained policy is evaluated directly in the simulation environment. In real-world situations, not all recommendation problems are suitable for being cast as reinforcement learning problems. Unlike previous academic RL research, RL-based RS suffers from extrapolation error and from the difficulty of being well validated before deployment. In this paper, we introduce the RL4RS (Reinforcement Learning for Recommender Systems) benchmark, a new resource collected entirely from industrial applications to train and evaluate RL algorithms with special attention to the above issues. It contains two datasets, tuned simulation environments, related advanced RL baselines, data understanding tools, and counterfactual policy evaluation algorithms. The RL4RS suite is publicly available. In addition to RL-based recommender systems, we expect the resource to contribute to research in reinforcement learning and neural combinatorial optimization.
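
To make the idea of counterfactual policy evaluation concrete, the following is a minimal sketch of a trajectory-level importance sampling estimator, one standard way to score a new policy using only logged data. It is not taken from the RL4RS code: the function name `importance_sampling_value`, the `Step` tuple layout, and the toy log are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical log format (an assumption, not the RL4RS schema): each step holds
# (probability of the logged action under the behavior policy,
#  probability of the same action under the target policy,
#  observed reward).
Step = Tuple[float, float, float]


def importance_sampling_value(trajectories: List[List[Step]], gamma: float = 1.0) -> float:
    """Estimate the target policy's expected return from logged trajectories."""
    total = 0.0
    for trajectory in trajectories:
        weight = 1.0  # product of per-step probability ratios
        ret = 0.0     # discounted return actually observed in the log
        for t, (p_behavior, p_target, reward) in enumerate(trajectory):
            weight *= p_target / p_behavior
            ret += (gamma ** t) * reward
        total += weight * ret
    return total / len(trajectories)


# Toy log with two sessions of two recommendation steps each.
logged = [
    [(0.5, 0.5, 1.0), (0.5, 0.5, 0.0)],
    [(0.5, 1.0, 0.0), (0.5, 0.5, 2.0)],
]
estimate = importance_sampling_value(logged)
```

The estimator is unbiased only when the behavior policy gives non-zero probability to every action the target policy might take. Its variance grows quickly with trajectory length, which is one reason validating a policy before deployment is hard.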



NeoRL: A Near Real-World Benchmark for Offline Reinforcement Learning

Offline reinforcement learning (RL) aims at learning a good policy from ...

RecSim: A Configurable Simulation Platform for Recommender Systems

We propose RecSim, a configurable platform for authoring simulation envi...

Value Penalized Q-Learning for Recommender Systems

Scaling reinforcement learning (RL) to recommender systems (RS) is promi...

Automatic Representation for Lifetime Value Recommender Systems

Many modern commercial sites employ recommender systems to propose relev...

Robust Reinforcement Learning Objectives for Sequential Recommender Systems

Attention-based sequential recommendation methods have demonstrated prom...

Offline Evaluation for Reinforcement Learning-based Recommendation: A Critical Issue and Some Alternatives

In this paper, we argue that the paradigm commonly adopted for offline e...

Supervised Advantage Actor-Critic for Recommender Systems

Casting session-based or sequential recommendation as reinforcement lear...