Bridging the Imitation Gap by Adaptive Insubordination

07/23/2020
by Luca Weihs, et al.

Why do agents often obtain better reinforcement learning policies when imitating a worse expert? We show that privileged information used by the expert is marginalized in the learned agent policy, resulting in an "imitation gap." Prior work bridges this gap via a progression from imitation learning to reinforcement learning. While often successful, gradual progression fails for tasks that require frequent switches between exploration and memorization skills. To better address these tasks and alleviate the imitation gap we propose 'Adaptive Insubordination' (ADVISOR), which dynamically reweights imitation and reward-based reinforcement learning losses during training, enabling switching between imitation and exploration. On a suite of challenging tasks, we show that ADVISOR outperforms pure imitation, pure reinforcement learning, as well as sequential combinations of these approaches.
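
The reweighting idea described in the abstract can be illustrated with a small sketch. The abstract does not say how the weights are computed, so everything below is an assumption made for illustration: the function names, the `alpha` parameter, and the rule that sets the weight from the KL divergence between the expert policy and a hypothetical auxiliary policy. It is not presented as the authors' implementation; it only shows what a per-state convex combination of an imitation loss and a reinforcement learning loss could look like.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as lists of probabilities."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def agreement_weight(expert_probs, aux_probs, alpha=1.0):
    """Hypothetical weighting rule (an assumption, not taken from the abstract).

    Returns a value in (0, 1]: close to 1 when the auxiliary policy can
    reproduce the expert in this state, close to 0 when it cannot.
    """
    return math.exp(-alpha * kl_divergence(expert_probs, aux_probs))


def imitation_loss(student_probs, expert_action):
    """Cross-entropy of the student policy against the expert's chosen action."""
    return -math.log(student_probs[expert_action])


def mixed_loss(weights, imitation_losses, rl_losses):
    """Batch mean of w * imitation + (1 - w) * RL, with one weight per state."""
    total = 0.0
    for w, l_im, l_rl in zip(weights, imitation_losses, rl_losses):
        total += w * l_im + (1.0 - w) * l_rl
    return total / len(weights)


if __name__ == "__main__":
    # State 0: the expert is imitable, so the weight favors the imitation loss.
    w0 = agreement_weight([0.9, 0.1], [0.9, 0.1])
    # State 1: the expert relies on information the student lacks, so the
    # weight shifts toward the reward-based loss.
    w1 = agreement_weight([0.99, 0.01], [0.5, 0.5], alpha=4.0)
    losses_im = [imitation_loss([0.8, 0.2], 0), imitation_loss([0.5, 0.5], 0)]
    losses_rl = [0.3, 0.7]  # placeholder RL losses, e.g. from a policy-gradient method
    print(w0, w1, mixed_loss([w0, w1], losses_im, losses_rl))
```

With a weight of 1 the update reduces to pure imitation and with a weight of 0 to pure reinforcement learning, so computing the weight per state is what would allow switching between the two within a single training run.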

research
10/22/2020

Error Bounds of Imitating Policies and Environments

Imitation learning trains a policy by mimicking expert demonstrations. V...
research
07/24/2019

Efficient Exploration with Self-Imitation Learning via Trajectory-Conditioned Policy

This paper proposes a method for learning a trajectory-conditioned polic...
research
03/11/2019

Hybrid Reinforcement Learning with Expert State Sequences

Existing imitation learning approaches often require that the complete d...
research
10/13/2021

On Covariate Shift of Latent Confounders in Imitation and Reinforcement Learning

We consider the problem of using expert data with unobserved confounders...
research
11/12/2019

Accelerating Training in Pommerman with Imitation and Reinforcement Learning

The Pommerman simulation was recently developed to mimic the classic Jap...
research
06/03/2011

Accelerating Reinforcement Learning through Implicit Imitation

Imitation can be viewed as a means of enhancing learning in multiagent e...
research
04/19/2023

Learning Representative Trajectories of Dynamical Systems via Domain-Adaptive Imitation

Domain-adaptive trajectory imitation is a skill that some predators lear...
