Generative Adversarial Reward Learning for Generalized Behavior Tendency Inference

05/03/2021
by   Xiaocong Chen, et al.

Recent advances in reinforcement learning have inspired increasing interest in modeling users adaptively through dynamic interactions, e.g., in reinforcement-learning-based recommender systems. The reward function is crucial for most reinforcement learning applications because it guides the optimization. However, current reinforcement-learning-based methods rely on manually defined reward functions, which cannot adapt to dynamic and noisy environments. Moreover, they generally use task-specific reward functions that sacrifice generalization ability. To address these issues, we propose a generative inverse reinforcement learning approach for user behavioral preference modeling. Instead of using predefined reward functions, our model automatically learns rewards from users' actions, based on a discriminative actor-critic network and a Wasserstein GAN. Our model provides a general way of characterizing and explaining underlying behavioral tendencies, and our experiments show that our method outperforms state-of-the-art methods in a variety of scenarios, namely traffic signal control, online recommender systems, and scanpath prediction.


research
01/19/2020

Discriminator Soft Actor Critic without Extrinsic Rewards

It is difficult to be able to imitate well in unknown states from a smal...
research
10/12/2012

Autonomous Reinforcement of Behavioral Sequences in Neural Dynamics

We introduce a dynamic neural algorithm called Dynamic Neural (DN) SARSA...
research
08/15/2017

Towards Learning Reward Functions from User Interactions

In the physical world, people have dynamic preferences, e.g., the same s...
research
11/07/2010

Reinforcement Learning Based on Active Learning Method

In this paper, a new reinforcement learning approach is proposed which i...
research
05/17/2019

Enforcing constraints for time series prediction in supervised, unsupervised and reinforcement learning

We assume that we are given a time series of data from a dynamical syste...
research
09/27/2021

From internal models toward metacognitive AI

In several papers published in Biological Cybernetics in the 1980s and 1...
research
04/28/2020

Learned Garbage Collection

Several programming languages use garbage collectors (GCs) to automatica...
