Pretraining Deep Actor-Critic Reinforcement Learning Algorithms With Expert Demonstrations

01/31/2018
by   Xiaoqin Zhang, et al.
0

Pretraining with expert demonstrations have been found useful in speeding up the training process of deep reinforcement learning algorithms since less online simulation data is required. Some people use supervised learning to speed up the process of feature learning, others pretrain the policies by imitating expert demonstrations. However, these methods are unstable and not suitable for actor-critic reinforcement learning algorithms. Also, some existing methods rely on the global optimum assumption, which is not true in most scenarios. In this paper, we employ expert demonstrations in a actor-critic reinforcement learning framework, and meanwhile ensure that the performance is not affected by the fact that expert demonstrations are not global optimal. We theoretically derive a method for computing policy gradients and value estimators with only expert demonstrations. Our method is theoretically plausible for actor-critic reinforcement learning algorithms that pretrains both policy and value functions. We apply our method to two of the typical actor-critic reinforcement learning algorithms, DDPG and ACER, and demonstrate with experiments that our method not only outperforms the RL algorithms without pretraining process, but also is more simulation efficient.

READ FULL TEXT

page 4

page 5

research
05/09/2019

Pretrain Soft Q-Learning with Imperfect Demonstrations

Pretraining reinforcement learning methods with demonstrations has been ...
research
09/16/2018

Inspiration Learning through Preferences

Current imitation learning techniques are too restrictive because they r...
research
10/02/2019

Stabilizing Off-Policy Reinforcement Learning with Conservative Policy Gradients

In recent years, advances in deep learning have enabled the application ...
research
10/14/2022

Monte Carlo Augmented Actor-Critic for Sparse Reward Deep Reinforcement Learning from Suboptimal Demonstrations

Providing densely shaped reward functions for RL algorithms is often exc...
research
10/23/2018

Efficient Eligibility Traces for Deep Reinforcement Learning

Eligibility traces are an effective technique to accelerate reinforcemen...
research
11/25/2020

Diluted Near-Optimal Expert Demonstrations for Guiding Dialogue Stochastic Policy Optimisation

A learning dialogue agent can infer its behaviour from interactions with...

Please sign up or login with your details

Forgot password? Click here to reset