D-Shape: Demonstration-Shaped Reinforcement Learning via Goal Conditioning

10/26/2022
by Caroline Wang, et al.

While combining imitation learning (IL) and reinforcement learning (RL) is a promising way to address poor sample efficiency in autonomous behavior acquisition, methods that do so typically assume that the requisite behavior demonstrations are provided by an expert that behaves optimally with respect to a task reward. If, however, suboptimal demonstrations are provided, a fundamental challenge appears in that the demonstration-matching objective of IL conflicts with the return-maximization objective of RL. This paper introduces D-Shape, a new method for combining IL and RL that uses ideas from reward shaping and goal-conditioned RL to resolve the above conflict. D-Shape allows learning from suboptimal demonstrations while retaining the ability to find the optimal policy with respect to the task reward. We experimentally validate D-Shape in sparse-reward gridworld domains, showing that it both improves over RL in terms of sample efficiency and converges consistently to the optimal policy in the presence of suboptimal demonstrations.
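The abstract names reward shaping and goal-conditioned RL as ingredients but does not spell out how D-Shape combines them. As a rough illustration only, the sketch below shows generic potential-based reward shaping with a goal-dependent potential in a gridworld. It is not the paper's algorithm: the names `potential` and `shaped_reward`, the Manhattan-distance potential, and the idea of taking the goal from a demonstration state are all assumptions made for this example.

```python
from typing import Tuple

State = Tuple[int, int]  # (row, col) cell in a gridworld; an assumed representation


def potential(state: State, goal: State) -> float:
    """Hypothetical potential: negative Manhattan distance from state to goal."""
    return -float(abs(state[0] - goal[0]) + abs(state[1] - goal[1]))


def shaped_reward(
    task_reward: float,
    state: State,
    next_state: State,
    goal: State,
    gamma: float = 0.99,
) -> float:
    """Task reward plus the potential-based shaping term
    gamma * phi(next_state, goal) - phi(state, goal).

    The goal is assumed here to be a state taken from a (possibly
    suboptimal) demonstration; that choice is illustrative only.
    """
    shaping = gamma * potential(next_state, goal) - potential(state, goal)
    return task_reward + shaping


if __name__ == "__main__":
    demo_goal = (0, 3)  # hypothetical demonstration state used as the goal
    # One step toward the goal in a sparse-reward setting (task reward is 0).
    print(shaped_reward(0.0, (0, 0), (0, 1), demo_goal, gamma=1.0))
```

Because the shaping term is a difference of potentials, its undiscounted sum around any closed loop of states is zero. This is the standard reason potential-based shaping with a fixed potential leaves the optimal policy for the task reward unchanged, which fits the abstract's goal of using suboptimal demonstrations without giving up optimality.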
