Model-Based Imitation Learning with Accelerated Convergence

06/12/2018

∙

Sample efficiency is critical in solving real-world reinforcement learning problems, where agent-environment interactions can be costly. Imitation learning from expert advice has proved to be an effective strategy for reducing the number of interactions required to train a policy. Online imitation learning, a specific type of imitation learning that interleaves policy evaluation and policy optimization, is a particularly effective framework for training policies with provable performance guarantees. In this work, we seek to further accelerate the convergence rate of online imitation learning, making it more sample efficient. We propose two model-based algorithms inspired by Follow-the-Leader (FTL) with prediction: MoBIL-VI based on solving variational inequalities and MoBIL-Prox based on stochastic first-order updates. When a dynamics model is learned online, these algorithms can provably accelerate the best known convergence rate up to an order. Our algorithms can be viewed as a generalization of stochastic Mirror-Prox by Juditsky et al. (2011), and admit a simple constructive FTL-style analysis of performance. The algorithms are also empirically validated in simulation.

READ FULL TEXT

Model-Based Imitation Learning with Accelerated Convergence

Sign in with Google

Consider DeepAI Pro