Reinforcement Learning from Imperfect Demonstrations under Soft Expert Guidance

11/16/2019
by Mingxuan Jing, et al.

In this paper, we study Reinforcement Learning from Demonstrations (RLfD), which improves the exploration efficiency of Reinforcement Learning (RL) by providing expert demonstrations. Most existing RLfD methods require the demonstrations to be perfect and sufficient, a requirement that is unrealistic in practice. To handle imperfect demonstrations, we first formally define an imperfect expert setting for RLfD, and then point out that previous methods suffer from two issues, one concerning optimality and the other convergence. Building on our theoretical findings, we tackle these two issues by treating the expert guidance as a soft constraint that regulates the agent's policy exploration, which leads to a constrained optimization problem. We further demonstrate that this problem can be solved efficiently by performing a local linear search on its dual form. Extensive empirical evaluations on a comprehensive collection of benchmarks indicate that our method attains consistent improvement over other RLfD counterparts.
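
To make the idea of a soft constraint handled through its dual more concrete, the following is a minimal toy sketch, not the method from the paper. It assumes a hypothetical one-dimensional "policy" parameter x, a quadratic reward peaking at `r_opt`, an imperfect expert located at `x_e`, and a tolerance `d` on the squared distance to the expert. It uses plain dual ascent on a single Lagrange multiplier rather than the local linear search mentioned in the abstract; all names and values are illustrative assumptions.

```python
def solve_soft_constrained(r_opt=3.0, x_e=0.0, d=1.0, step=0.5, iters=500):
    """Toy problem: maximize -(x - r_opt)^2  subject to  (x - x_e)^2 <= d.

    r_opt is where the (hypothetical) reward peaks, x_e is the imperfect
    expert, and d is how far the agent may stray from the expert.
    Solved by dual ascent on the multiplier lam >= 0.
    """
    lam = 0.0
    x = r_opt
    for _ in range(iters):
        # Inner step: maximize the Lagrangian
        #   -(x - r_opt)^2 - lam * ((x - x_e)^2 - d)
        # over x, which has this closed-form solution.
        x = (r_opt + lam * x_e) / (1.0 + lam)
        # Outer step: raise lam when the constraint is violated,
        # lower it (never below zero) when there is slack.
        violation = (x - x_e) ** 2 - d
        lam = max(0.0, lam + step * violation)
    return x, lam


if __name__ == "__main__":
    x, lam = solve_soft_constrained()
    print(f"tight tolerance: x = {x:.4f}, lambda = {lam:.4f}")
    x, lam = solve_soft_constrained(d=100.0)
    print(f"loose tolerance: x = {x:.4f}, lambda = {lam:.4f}")
```

With a tight tolerance the solution is held near the expert, on the boundary of the constraint. With a loose tolerance the multiplier stays at zero and the reward optimum is recovered, which is the sense in which the expert acts only as a soft guide in this toy setting.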


