Intrinsically Motivated Reinforcement Learning based Recommendation with Counterfactual Data Augmentation

09/17/2022
by Xiaocong Chen, et al.

Recent literature has shown that deep reinforcement learning (DRL) is effective at capturing users' dynamic interests. However, training a DRL agent is challenging: because the environment in recommender systems (RS) is sparse, an agent must divide its time between exploring informative user-item interaction trajectories and using existing trajectories for policy learning. This is the exploration-exploitation trade-off, which significantly affects recommendation performance when the environment is sparse. The balance is harder to strike in DRL-based RS, where the agent needs to explore informative trajectories deeply and exploit them efficiently. As a step toward addressing this issue, we design a novel intrinsically motivated reinforcement learning method that increases the capability of exploring informative interaction trajectories in the sparse environment; these trajectories are further enriched via a counterfactual augmentation strategy for more efficient exploitation. Extensive experiments on six offline datasets and three online simulation platforms demonstrate the superiority of our model over a set of existing state-of-the-art methods.

research
08/10/2022

Deep Reinforcement Learning for Dynamic Recommendation with Model-agnostic Counterfactual Policy Synthesis

Recent advances in recommender systems have proved the potential of Rein...
research
08/22/2023

On the Opportunities and Challenges of Offline Reinforcement Learning for Recommender Systems

Reinforcement learning serves as a potent tool for modeling dynamic user...
research
03/12/2023

AutoDenoise: Automatic Data Instance Denoising for Recommendations

Historical user-item interaction datasets are essential in training mode...
research
05/14/2014

Improving offline evaluation of contextual bandit algorithms via bootstrapping techniques

In many recommendation applications such as news recommendation, the ite...
research
10/21/2021

Locality-Sensitive Experience Replay for Online Recommendation

Online recommendation requires handling rapidly changing user preference...
research
12/29/2018

Learn to Interpret Atari Agents

Deep Reinforcement Learning (DeepRL) models surpass human-level performa...
research
05/28/2019

Learning Efficient and Effective Exploration Policies with Counterfactual Meta Policy

A fundamental issue in reinforcement learning algorithms is the balance ...
