On Pathologies in KL-Regularized Reinforcement Learning from Expert Demonstrations

12/28/2022
by Tim G. J. Rudner, et al.

KL-regularized reinforcement learning from expert demonstrations has proved successful in improving the sample efficiency of deep reinforcement learning algorithms, allowing them to be applied to challenging physical real-world tasks. However, we show that KL-regularized reinforcement learning with behavioral reference policies derived from expert demonstrations can suffer from pathological training dynamics that can lead to slow, unstable, and suboptimal online learning. We show empirically that the pathology occurs for commonly chosen behavioral policy classes and demonstrate its impact on sample efficiency and online policy performance. Finally, we show that the pathology can be remedied by non-parametric behavioral reference policies and that this allows KL-regularized reinforcement learning to significantly outperform state-of-the-art approaches on a variety of challenging locomotion and dexterous hand manipulation tasks.
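To make the KL-regularized setting concrete, the toy sketch below scores a single action distribution. It uses a task reward minus a penalty α · KL(π ‖ π₀), where π is the online policy and π₀ is the behavioral reference policy. This is an illustration under stated assumptions, not the authors' method or analysis. Both policies are assumed to be univariate Gaussians over a one-dimensional action, and the weight `alpha` and all function names are hypothetical. The sketch shows one plausible way the KL term can destabilize training. If the reference policy's standard deviation is small in some state, the KL penalty and its gradient with respect to the online policy's mean grow roughly like 1/σ_ref², so the regularizer can overwhelm the task reward.

```python
import math


def gaussian_kl(mu_p: float, sigma_p: float, mu_q: float, sigma_q: float) -> float:
    """Closed-form KL(p || q) for univariate Gaussians p = N(mu_p, sigma_p^2), q = N(mu_q, sigma_q^2)."""
    return (
        math.log(sigma_q / sigma_p)
        + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2.0 * sigma_q ** 2)
        - 0.5
    )


def kl_grad_wrt_policy_mean(mu_p: float, mu_q: float, sigma_q: float) -> float:
    """Gradient of KL(p || q) with respect to mu_p; it scales as 1 / sigma_q^2."""
    return (mu_p - mu_q) / sigma_q ** 2


def kl_regularized_reward(reward, mu_pi, sigma_pi, mu_ref, sigma_ref, alpha=0.1):
    """Hypothetical per-step objective: task reward minus alpha * KL(pi || pi_0)."""
    return reward - alpha * gaussian_kl(mu_pi, sigma_pi, mu_ref, sigma_ref)


if __name__ == "__main__":
    # Hypothetical scenario: the online policy pi is fixed at N(0.5, 0.2^2) and the
    # reference policy pi_0 is centred at 0.0 with a shrinking standard deviation.
    for sigma_ref in (1.0, 0.1, 0.01):
        kl = gaussian_kl(0.5, 0.2, 0.0, sigma_ref)
        grad = kl_grad_wrt_policy_mean(0.5, 0.0, sigma_ref)
        r = kl_regularized_reward(1.0, 0.5, 0.2, 0.0, sigma_ref)
        print(f"sigma_ref={sigma_ref:<5} KL={kl:10.2f}  dKL/dmu={grad:8.1f}  reward={r:9.2f}")
```

Running the script shows the KL penalty rising from about 1.25 to over 1,000 as `sigma_ref` drops from 1.0 to 0.01, even though the online policy itself is unchanged. The mean gradient grows from 0.5 to about 5,000 over the same range.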
