Pretrain Soft Q-Learning with Imperfect Demonstrations

05/09/2019
by Xiaoqin Zhang, et al.

Pretraining reinforcement learning methods with demonstrations has been an important concept in the study of reinforcement learning, since a large amount of computing power is spent on online simulations with existing reinforcement learning algorithms. Pretraining remains a significant challenge, however, because it must exploit expert demonstrations while preserving exploration potential, especially for value-based methods. In this paper, we propose a pretraining method for soft Q-learning. Our work is inspired by pretraining methods for actor-critic algorithms, since soft Q-learning is a value-based algorithm that is equivalent to policy gradient. The proposed method is based on γ-discounted biased policy evaluation with entropy regularization, which is also the updating target of soft Q-learning. Our method is evaluated on various tasks from Atari 2600. Experiments show that our method effectively learns from imperfect demonstrations and outperforms other state-of-the-art methods that learn from expert demonstrations.

Related research:

01/31/2018
Pretraining Deep Actor-Critic Reinforcement Learning Algorithms With Expert Demonstrations
Pretraining with expert demonstrations have been found useful in speedin...

11/16/2019
Reinforcement Learning from Imperfect Demonstrations under Soft Expert Guidance
In this paper, we study Reinforcement Learning from Demonstrations (RLfD...

10/27/2021
Learning from demonstrations with SACR2: Soft Actor-Critic with Reward Relabeling
During recent years, deep reinforcement learning (DRL) has made successf...

09/17/2022
Sample-Efficient Multi-Agent Reinforcement Learning with Demonstrations for Flocking Control
Flocking control is a significant problem in multi-agent systems such as...

10/01/2022
Bayesian Q-learning With Imperfect Expert Demonstrations
Guided exploration with expert demonstrations improves data efficiency f...

11/25/2020
Diluted Near-Optimal Expert Demonstrations for Guiding Dialogue Stochastic Policy Optimisation
A learning dialogue agent can infer its behaviour from interactions with...
