Shaping Rewards for Reinforcement Learning with Imperfect Demonstrations using Generative Models

11/02/2020
by Yuchen Wu, et al.

The potential benefits of model-free reinforcement learning for real robotic systems are limited by its uninformed exploration, which leads to slow convergence, poor data efficiency, and unnecessary interactions with the environment. To address these drawbacks, we propose a method that combines reinforcement and imitation learning by shaping the reward function with a state-and-action-dependent potential, trained from demonstration data using a generative model. We show that this accelerates policy learning by specifying high-value areas of the state and action space that are worth exploring first. Unlike most existing methods, which assume optimal demonstrations and incorporate the demonstration data as hard constraints on policy optimization, we incorporate demonstration data as advice, in the form of a reward-shaping potential trained as a generative model of states and actions. In particular, we examine both normalizing flows and Generative Adversarial Networks to represent these potentials. We show that, unlike many existing approaches that treat demonstrations as hard constraints, our approach remains unbiased even with suboptimal and noisy demonstrations. We present an extensive range of simulations, as well as experiments on the Franka Emika 7-DOF arm, to demonstrate the practicality of our method.
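
To make the idea of a state-and-action-dependent shaping potential concrete, the following is a minimal sketch and not the authors' implementation. It assumes one common form of potential-based shaping for state-action potentials, r' = r + γΦ(s', a') − Φ(s, a), and it substitutes a single Gaussian density fitted to demonstration pairs for the normalizing flow or GAN named in the abstract. The names `GaussianPotential` and `shaped_reward` are hypothetical.

```python
import numpy as np


class GaussianPotential:
    """Assumed stand-in for a generative model: a Gaussian fitted to
    concatenated (state, action) demonstration vectors. The potential is
    the log-density, so pairs resembling the demonstrations score higher."""

    def __init__(self, demos, reg=1e-3):
        # demos: array of shape (N, state_dim + action_dim)
        demos = np.asarray(demos, dtype=float)
        self.dim = demos.shape[1]
        self.mean = demos.mean(axis=0)
        # Regularize the covariance so it stays invertible.
        cov = np.atleast_2d(np.cov(demos, rowvar=False)) + reg * np.eye(self.dim)
        self.precision = np.linalg.inv(cov)
        _, self.logdet = np.linalg.slogdet(cov)

    def __call__(self, state, action):
        x = np.concatenate([np.ravel(state), np.ravel(action)]) - self.mean
        mahalanobis = x @ self.precision @ x
        return -0.5 * (mahalanobis + self.logdet + self.dim * np.log(2.0 * np.pi))


def shaped_reward(reward, potential, s, a, s_next, a_next, gamma=0.99):
    """Environment reward plus the shaping term gamma * Phi(s', a') - Phi(s, a)."""
    return reward + gamma * potential(s_next, a_next) - potential(s, a)
```

Because the shaping term is a discounted difference of potentials, its discounted sum along a trajectory telescopes to γ^T Φ(s_T, a_T) − Φ(s_0, a_0). The demonstrations therefore act as advice about where to explore rather than as a constraint on the policy.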


