Reinforcement and Imitation Learning via Interactive No-Regret Learning

06/23/2014
by Stéphane Ross, et al.

Recent work has demonstrated that problems in which a learner's predictions influence the input distribution it is tested on, particularly imitation learning and structured prediction, can be naturally addressed by an interactive approach and analyzed using no-regret online learning. These approaches to imitation learning, however, neither require nor benefit from information about the cost of actions. We extend existing results in two directions: first, we develop an interactive imitation learning approach that leverages cost information; second, we extend the technique to address reinforcement learning. The results provide theoretical support for the commonly observed successes of online approximate policy iteration. Our approach suggests a broad new family of algorithms and provides a unifying view of existing techniques for imitation and reinforcement learning.

research
11/02/2010

A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning

Sequential prediction problems such as imitation learning, where future ...
research
09/26/2022

On Efficient Online Imitation Learning via Classification

Imitation learning (IL) is a general learning paradigm for tackling sequ...
research
02/04/2021

Feedback in Imitation Learning: The Three Regimes of Covariate Shift

Imitation learning practitioners have often noted that conditioning poli...
research
02/17/2021

Fully General Online Imitation Learning

In imitation learning, imitators and demonstrators are policies for pick...
research
02/13/2021

Interactive Learning from Activity Description

We present a novel interactive learning protocol that enables training r...
research
03/04/2021

Of Moments and Matching: Trade-offs and Treatments in Imitation Learning

We provide a unifying view of a large family of previous imitation learn...
research
10/15/2018

Predictor-Corrector Policy Optimization

We present a predictor-corrector framework, called PicCoLO, that can tra...
