DeepAI AI Chat
Log In Sign Up

Random Expert Distillation: Imitation Learning via Expert Policy Support Estimation

05/16/2019
by   Ruohan Wang, et al.
Imperial College London
0

We consider the problem of imitation learning from a finite set of expert trajectories, without access to reinforcement signals. The classical approach of extracting the expert's reward function via inverse reinforcement learning, followed by reinforcement learning is indirect and may be computationally expensive. Recent generative adversarial methods based on matching the policy distribution between the expert and the agent could be unstable during training. We propose a new framework for imitation learning by estimating the support of the expert policy to compute a fixed reward function, which allows us to re-frame imitation learning within the standard reinforcement learning setting. We demonstrate the efficacy of our reward function on both discrete and continuous domains, achieving comparable or better performance than the state of the art under different reinforcement learning algorithms.

READ FULL TEXT

page 11

page 12

05/03/2020

Off-Policy Adversarial Inverse Reinforcement Learning

Adversarial Imitation Learning (AIL) is a class of algorithms in Reinfor...
04/06/2019

Reinforced Imitation in Heterogeneous Action Space

Imitation learning is an effective alternative approach to learn a polic...
06/10/2016

Generative Adversarial Imitation Learning

Consider learning a policy from example expert behavior, without interac...
04/20/2020

Energy-Based Imitation Learning

We tackle a common scenario in imitation learning (IL), where agents try...
06/05/2020

Wasserstein Distance guided Adversarial Imitation Learning with Reward Shape Exploration

The generative adversarial imitation learning (GAIL) has provided an adv...
01/19/2020

Discriminator Soft Actor Critic without Extrinsic Rewards

It is difficult to be able to imitate well in unknown states from a smal...
06/28/2020

Lipschitzness Is All You Need To Tame Off-policy Generative Adversarial Imitation Learning

Despite the recent success of reinforcement learning in various domains,...