DeepAI AI Chat
Log In Sign Up

Reinforcement Learning from Imperfect Demonstrations under Soft Expert Guidance

11/16/2019
by   Mingxuan Jing, et al.
Tsinghua University
0

In this paper, we study Reinforcement Learning from Demonstrations (RLfD) that improves the exploration efficiency of Reinforcement Learning (RL) by providing expert demonstrations. Most of existing RLfD methods require demonstrations to be perfect and sufficient, which yet is unrealistic to meet in practice. To work on imperfect demonstrations, we first define an imperfect expert setting for RLfD in a formal way, and then point out that previous methods suffer from two issues in terms of optimality and convergence, respectively. Upon the theoretical findings we have derived, we tackle these two issues by regarding the expert guidance as a soft constraint on regulating the policy exploration of the agent, which eventually leads to a constrained optimization problem. We further demonstrate that such problem is able to be addressed efficiently by performing a local linear search on its dual form. Considerable empirical evaluations on a comprehensive collection of benchmarks indicate our method attains consistent improvement over other RLfD counterparts.

READ FULL TEXT

page 1

page 2

page 3

page 4

05/09/2019

Pretrain Soft Q-Learning with Imperfect Demonstrations

Pretraining reinforcement learning methods with demonstrations has been ...
10/01/2022

Bayesian Q-learning With Imperfect Expert Demonstrations

Guided exploration with expert demonstrations improves data efficiency f...
10/05/2020

Policy Learning Using Weak Supervision

Most existing policy learning solutions require the learning agents to r...
06/17/2021

Learning from Demonstration without Demonstrations

State-of-the-art reinforcement learning (RL) algorithms suffer from high...
03/03/2023

Guarded Policy Optimization with Imperfect Online Demonstrations

The Teacher-Student Framework (TSF) is a reinforcement learning setting ...
12/18/2019

Hierarchical Deep Q-Network with Forgetting from Imperfect Demonstrations in Minecraft

We present hierarchical Deep Q-Network with Forgetting (HDQF) that took ...
09/16/2021

Marginal MAP Estimation for Inverse RL under Occlusion with Observer Noise

We consider the problem of learning the behavioral preferences of an exp...