Fully General Online Imitation Learning

02/17/2021
by Michael K. Cohen, et al.

In imitation learning, imitators and demonstrators are policies for picking actions given past interactions with the environment. If we run an imitator, we probably want events to unfold similarly to the way they would have if the demonstrator had been acting the whole time. No existing work provides formal guidance on how this might be accomplished; instead, prior work restricts its focus to environments that restart, which makes learning unusually easy and conveniently limits the significance of any mistake. We address a fully general setting, in which the (stochastic) environment and demonstrator never reset, not even for training purposes. Our new conservative Bayesian imitation learner underestimates the probability of each available action, and queries the demonstrator for more data with the remaining probability. Our main result: if an event would have been unlikely had the demonstrator acted the whole time, that event's likelihood can be bounded above when running the (initially totally ignorant) imitator instead. Meanwhile, queries to the demonstrator rapidly diminish in frequency.
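
The "underestimate, then query with the remainder" idea can be illustrated with a toy sketch. This is not the paper's algorithm. It assumes a small finite set of candidate demonstrator models, each a fixed distribution over actions that ignores history, and it underestimates each action's probability by taking the minimum over the models whose posterior weight is within a factor `alpha` of the best one. The names `conservative_policy`, `posterior_update`, `step`, and `alpha` are hypothetical and chosen only for illustration.

```python
import random


def posterior_update(weights, models, action):
    """Bayes update of the model weights after seeing a demonstrator action."""
    unnorm = [w * m[action] for w, m in zip(weights, models)]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def conservative_policy(weights, models, alpha):
    """Return (action_probs, query_prob) for the toy conservative imitator.

    Assumption of this sketch: each action's probability is the minimum
    assigned to it by any "plausible" model, where plausible means a
    posterior weight of at least alpha times the largest weight.
    """
    top = max(weights)
    plausible = [m for w, m in zip(weights, models) if w >= alpha * top]
    n_actions = len(models[0])
    probs = [min(m[a] for m in plausible) for a in range(n_actions)]
    # The minima sum to at most 1; the leftover mass is spent on querying.
    query_prob = max(0.0, 1.0 - sum(probs))
    return probs, query_prob


def step(weights, models, alpha, demonstrator, rng):
    """Act once. Returns (action, new_weights, queried)."""
    probs, _ = conservative_policy(weights, models, alpha)
    u = rng.random()
    cumulative = 0.0
    for action, p in enumerate(probs):
        cumulative += p
        if u < cumulative:
            return action, weights, False  # imitator acts on its own
    # Remaining probability: defer to the demonstrator and learn from it.
    action = demonstrator(rng)
    return action, posterior_update(weights, models, action), True
```

In this toy setting, when the plausible models disagree, the minima leave a large remainder and the imitator queries often. Once the posterior concentrates on one model, the minima sum to nearly 1 and queries become rare.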


