Model-Based Imitation Learning with Accelerated Convergence

06/12/2018
by Ching-An Cheng, et al.

Sample efficiency is critical in solving real-world reinforcement learning problems, where agent-environment interactions can be costly. Imitation learning from expert advice has proved to be an effective strategy for reducing the number of interactions required to train a policy. Online imitation learning, a specific type of imitation learning that interleaves policy evaluation and policy optimization, is a particularly effective framework for training policies with provable performance guarantees. In this work, we seek to further accelerate the convergence rate of online imitation learning, making it more sample efficient. We propose two model-based algorithms inspired by Follow-the-Leader (FTL) with prediction: MoBIL-VI, based on solving variational inequalities, and MoBIL-Prox, based on stochastic first-order updates. When a dynamics model is learned online, these algorithms can provably accelerate the best known convergence rate by up to an order. Our algorithms can be viewed as a generalization of stochastic Mirror-Prox by Juditsky et al. (2011), and they admit a simple constructive FTL-style analysis of performance. The algorithms are also empirically validated in simulation.
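To make the Mirror-Prox connection concrete, the following is a minimal sketch of the deterministic, Euclidean form of Mirror-Prox (the extragradient method) on a toy saddle-point problem, min_x max_y x*y. It is not MoBIL-VI or MoBIL-Prox, and it involves no policy, expert, or dynamics model; the toy problem, the function names, the step size, and the iteration count are all assumptions chosen for illustration. It only shows the two-step pattern of taking a look-ahead (prediction) step and then updating with the operator evaluated at the predicted point.

```python
import math

def operator(z):
    """Monotone operator of the toy problem min_x max_y x*y: F(x, y) = (y, -x)."""
    x, y = z
    return (y, -x)

def step_along(z, g, step):
    """Move from z against direction g with the given step size."""
    return (z[0] - step * g[0], z[1] - step * g[1])

def mirror_prox(z0, step=0.1, iters=500):
    """Euclidean Mirror-Prox (extragradient): predict, then correct."""
    z = z0
    for _ in range(iters):
        # Prediction step: look ahead using the operator at the current point.
        z_half = step_along(z, operator(z), step)
        # Correction step: update from z using the operator at the predicted point.
        z = step_along(z, operator(z_half), step)
    return z

def plain_descent_ascent(z0, step=0.1, iters=500):
    """Baseline without the prediction step (simultaneous descent-ascent)."""
    z = z0
    for _ in range(iters):
        z = step_along(z, operator(z), step)
    return z

def norm(z):
    """Distance from the saddle point at the origin."""
    return math.hypot(z[0], z[1])

if __name__ == "__main__":
    start = (1.0, 1.0)
    print("with prediction:   ", norm(mirror_prox(start)))
    print("without prediction:", norm(plain_descent_ascent(start)))
```

On this toy problem the iterate with the prediction step moves toward the saddle point at the origin, while the baseline without it spirals away. This illustrates why a look-ahead can help, but it says nothing about the specific rates claimed for the MoBIL algorithms.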


