A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning

11/02/2010
by Stéphane Ross, et al.

Sequential prediction problems such as imitation learning, where future observations depend on previous predictions (actions), violate the common i.i.d. assumptions made in statistical learning. This leads to poor performance in theory and often in practice. Some recent approaches provide stronger guarantees in this setting, but remain somewhat unsatisfactory as they train either non-stationary or stochastic policies and require a large number of iterations. In this paper, we propose a new iterative algorithm, which trains a stationary deterministic policy, that can be seen as a no-regret algorithm in an online learning setting. We show that any such no-regret algorithm, combined with additional reduction assumptions, must find a policy with good performance under the distribution of observations it induces in such sequential settings. We demonstrate that this new approach outperforms previous approaches on two challenging imitation learning problems and a benchmark sequence labeling problem.
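The iterative scheme the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's implementation: `env_rollout`, `expert`, and the tabular `train_policy` learner are hypothetical placeholders standing in for an environment simulator, an expert labeler, and any no-regret online learner. The key idea it shows is aggregation: each iteration labels the observations visited by the *current* policy with expert actions, adds them to one growing dataset, and retrains a stationary deterministic policy on everything collected so far.

```python
from collections import Counter, defaultdict

def train_policy(dataset):
    # Hypothetical supervised learner: memorizes the majority expert
    # action per observation (a stand-in for any no-regret learner).
    by_obs = defaultdict(Counter)
    for obs, act in dataset:
        by_obs[obs][act] += 1
    table = {obs: cnts.most_common(1)[0][0] for obs, cnts in by_obs.items()}
    return lambda obs: table.get(obs, 0)

def imitate(env_rollout, expert, n_iters=5):
    """Iterative scheme from the abstract: roll out the current policy,
    have the expert label the observations it visits, aggregate all
    labeled data, and retrain. Returns a stationary deterministic policy."""
    dataset = []
    policy = expert  # first iteration: collect data under the expert itself
    for _ in range(n_iters):
        for obs in env_rollout(policy):
            # The expert labels states induced by the learner's own policy,
            # so training and test distributions are brought into agreement.
            dataset.append((obs, expert(obs)))
        policy = train_policy(dataset)
    return policy
```

Because each retraining pass uses the aggregated dataset rather than only the latest rollout, the final policy is a single stationary one, in contrast to earlier approaches that output non-stationary or stochastic mixtures of per-iteration policies.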

