Model-Based Reinforcement Learning with Adversarial Training for Online Recommendation

11/10/2019
by Xueying Bai, et al.

Reinforcement learning is effective in optimizing policies for recommender systems. Current solutions mostly focus on model-free approaches, which require frequent interactions with a real environment and thus make model learning expensive. Offline evaluation methods, such as importance sampling, can alleviate these limitations, but they usually require a large amount of logged data and do not work well when the action space is large. In this work, we propose a model-based reinforcement learning solution that models the user-agent interaction for offline policy learning via a generative adversarial network. To reduce bias in the learnt policy, we use the discriminator to evaluate the quality of generated sequences and rescale the generated rewards. Our theoretical analysis and empirical evaluations demonstrate the effectiveness of our solution in identifying patterns in the given offline data and in learning policies from both the offline and the generated data.
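The reward-rescaling idea described in the abstract can be illustrated with a minimal sketch. It assumes, for illustration only, that the discriminator emits one logit per step of a generated user-agent interaction sequence, that each simulated reward is multiplied by the discriminator's probability that the sequence so far is real, and that the rescaled rewards then feed an ordinary discounted return of the kind used in policy-gradient training. The function names `rescale_rewards` and `discounted_returns` are hypothetical, and the paper's exact rescaling rule may differ.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Map a raw discriminator logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def rescale_rewards(generated_rewards: Sequence[float],
                    discriminator_logits: Sequence[float]) -> List[float]:
    """Hypothetical scheme: scale each simulated reward by the discriminator's
    probability that the generated sequence up to that step is real."""
    if len(generated_rewards) != len(discriminator_logits):
        raise ValueError("need one discriminator score per step")
    return [r * sigmoid(d) for r, d in zip(generated_rewards, discriminator_logits)]


def discounted_returns(rewards: Sequence[float], gamma: float = 0.9) -> List[float]:
    """Compute G_t = r_t + gamma * G_{t+1}, sweeping backwards over one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Toy episode: simulated clicks at steps 0 and 2, one discriminator logit per step.
rewards = rescale_rewards([1.0, 0.0, 1.0], [0.0, 2.0, -2.0])
print(rewards)  # the click at step 2 is heavily down-weighted: the sequence looks fake
print(discounted_returns(rewards, gamma=0.9))
```

Multiplying by a probability keeps rewards from realistic-looking sequences close to their simulated value while shrinking rewards from sequences the discriminator rejects, which is one simple way to limit how much a biased user model can steer the learnt policy.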

06/04/2022

Hybrid Value Estimation for Off-policy Evaluation and Offline Reinforcement Learning

Value function estimation is an indispensable subroutine in reinforcemen...
02/21/2023

Adversarial Model for Offline Reinforcement Learning

We propose a novel model-based offline Reinforcement Learning (RL) frame...
08/12/2020

Model-Based Offline Planning

Offline learning is a key part of making reinforcement learning (RL) use...
04/17/2023

Causal Decision Transformer for Recommender Systems via Offline Reinforcement Learning

Reinforcement learning-based recommender systems have recently gained po...
10/05/2020

Offline Learning for Planning: A Summary

The training of autonomous agents often requires expensive and unsafe tr...
11/08/2022

ARMOR: A Model-based Framework for Improving Arbitrary Baseline Policies with Offline Data

We propose a new model-based offline RL framework, called Adversarial Mo...