
Random Expert Distillation: Imitation Learning via Expert Policy Support Estimation

by Ruohan Wang, et al. (Imperial College London)

We consider the problem of imitation learning from a finite set of expert trajectories, without access to reinforcement signals. The classical approach of extracting the expert's reward function via inverse reinforcement learning, followed by reinforcement learning, is indirect and may be computationally expensive. Recent generative adversarial methods, which match the policy distribution between the expert and the agent, can be unstable during training. We propose a new framework for imitation learning that estimates the support of the expert policy to compute a fixed reward function, allowing us to re-frame imitation learning within the standard reinforcement learning setting. We demonstrate the efficacy of our reward function on both discrete and continuous domains, achieving performance comparable to or better than the state of the art under different reinforcement learning algorithms.
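The core idea, estimating the support of the expert policy and turning it into a fixed reward, can be sketched with a random-network-distillation-style estimator: a predictor network is fit to a fixed, randomly initialized target network on expert data only, and the prediction error then serves as a support measure (low error on a state suggests it lies in the expert support). The network sizes, the `sigma` bandwidth, and the exponential reward shape below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_net(in_dim, hidden, out_dim, rng):
    """Random two-layer network: x -> tanh(x @ W1) @ W2."""
    W1 = rng.normal(size=(in_dim, hidden)) / np.sqrt(in_dim)
    W2 = rng.normal(size=(hidden, out_dim)) / np.sqrt(hidden)
    return W1, W2

def forward(net, x):
    W1, W2 = net
    return np.tanh(x @ W1) @ W2

# Fixed, randomly initialized target network (never trained).
target = make_net(4, 32, 8, rng)

# Predictor: a random feature layer plus an output layer fit to the
# target's outputs on expert data only (closed-form least squares).
W1_pred = rng.normal(size=(4, 32)) / 2.0
expert_states = rng.normal(size=(256, 4))          # stand-in for expert data
H = np.tanh(expert_states @ W1_pred)
Y = forward(target, expert_states)
W2_pred, *_ = np.linalg.lstsq(H, Y, rcond=None)
predictor = (W1_pred, W2_pred)

def reward(s, sigma=1.0):
    """Low prediction error => state in expert support => high reward."""
    err = np.sum((forward(predictor, s) - forward(target, s)) ** 2, axis=-1)
    return np.exp(-sigma * err)

novel_states = rng.normal(loc=5.0, size=(256, 4))  # far from the expert data
r_expert = reward(expert_states).mean()
r_novel = reward(novel_states).mean()
```

Because the predictor was fit only on expert states, its error grows off-support, so the reward is higher on expert-like states than on novel ones. The reward is fixed once computed, so the agent can maximize it with any standard reinforcement learning algorithm.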



