
Random Expert Distillation: Imitation Learning via Expert Policy Support Estimation

by Ruohan Wang, et al. (Imperial College London)

We consider the problem of imitation learning from a finite set of expert trajectories, without access to reinforcement signals. The classical approach of extracting the expert's reward function via inverse reinforcement learning, followed by reinforcement learning, is indirect and may be computationally expensive. Recent generative adversarial methods, which match the policy distribution between the expert and the agent, can be unstable during training. We propose a new framework for imitation learning that estimates the support of the expert policy to compute a fixed reward function, allowing us to re-frame imitation learning within the standard reinforcement learning setting. We demonstrate the efficacy of our reward function on both discrete and continuous domains, achieving performance comparable to or better than the state of the art under different reinforcement learning algorithms.
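The core idea, estimating the support of the expert policy and turning it into a fixed reward, can be sketched with a random-network-distillation-style estimator: a predictor network is fit to a fixed, randomly initialized target network on expert data only, and the prediction error then serves as a support measure (low error on a state suggests it lies in the expert support). The network sizes, the `sigma` bandwidth, and the exponential reward shape below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_net(in_dim, hidden, out_dim, rng):
    """Random two-layer network: x -> tanh(x @ W1) @ W2."""
    W1 = rng.normal(size=(in_dim, hidden)) / np.sqrt(in_dim)
    W2 = rng.normal(size=(hidden, out_dim)) / np.sqrt(hidden)
    return W1, W2

def forward(net, x):
    W1, W2 = net
    return np.tanh(x @ W1) @ W2

# Fixed, randomly initialized target network (never trained).
target = make_net(4, 32, 8, rng)

# Predictor: a random feature layer plus an output layer fit to the
# target's outputs on expert data only (closed-form least squares).
W1_pred = rng.normal(size=(4, 32)) / 2.0
expert_states = rng.normal(size=(256, 4))          # stand-in for expert data
H = np.tanh(expert_states @ W1_pred)
Y = forward(target, expert_states)
W2_pred, *_ = np.linalg.lstsq(H, Y, rcond=None)
predictor = (W1_pred, W2_pred)

def reward(s, sigma=1.0):
    """Low prediction error => state in expert support => high reward."""
    err = np.sum((forward(predictor, s) - forward(target, s)) ** 2, axis=-1)
    return np.exp(-sigma * err)

novel_states = rng.normal(loc=5.0, size=(256, 4))  # far from the expert data
r_expert = reward(expert_states).mean()
r_novel = reward(novel_states).mean()
```

Because the predictor was fit only on expert states, its error grows off-support, so the reward is higher on expert-like states than on novel ones. The reward is fixed once computed, so the agent can maximize it with any standard reinforcement learning algorithm.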



