Selective Sampling and Imitation Learning via Online Regression

07/11/2023
by   Ayush Sekhari, et al.

We consider the problem of Imitation Learning (IL) with feedback obtained by actively querying a noisy expert. While imitation learning has been empirically successful, much prior work assumes access to noiseless expert feedback, which is impractical in many applications. In fact, when only noisy expert feedback is available, algorithms that rely on purely offline data (non-interactive IL) can be shown to need a prohibitively large number of samples to succeed. In contrast, in this work we provide an interactive algorithm for IL that uses selective sampling to actively query the noisy expert for feedback. Our contributions are twofold: first, we provide a new selective sampling algorithm that works with general function classes and multiple actions, and obtains the best-known bounds on the regret and the number of queries. Next, we extend this analysis to IL with noisy expert feedback and provide a new IL algorithm that makes limited queries. Our selective sampling algorithm leverages function approximation, and relies on an online regression oracle w.r.t. the given model class to predict actions and to decide whether to query the expert for its label. On the theoretical side, the regret bound of our algorithm is upper bounded by the regret of the online regression oracle, while the query complexity additionally depends on the eluder dimension of the model class. We complement this with a lower bound demonstrating that our results are tight. We then extend our selective sampling algorithm to IL with general function approximation and provide bounds on both the regret and the number of queries made to the noisy expert. A key novelty is that our regret and query complexity bounds depend only on the number of times the optimal policy (and not the noisy expert or the learner) visits states that have a small margin.
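The abstract's core loop (predict with an online regression oracle, query the expert only when the margin between the top two predicted actions is small) can be sketched in a few lines. This is a minimal illustration, not the paper's algorithm: the `OnlineRidgeOracle` class, the fixed margin threshold `tau`, and the synthetic noisy-expert model are all assumptions standing in for the general function class and oracle the paper works with.

```python
import numpy as np

rng = np.random.default_rng(0)

class OnlineRidgeOracle:
    """Hypothetical stand-in for the paper's online regression oracle:
    one ridge regressor per action, predicting the noisy expert's label."""
    def __init__(self, dim, n_actions, reg=1.0):
        self.A = [reg * np.eye(dim) for _ in range(n_actions)]
        self.b = [np.zeros(dim) for _ in range(n_actions)]

    def predict(self, x):
        # Predicted expert-label score for each action.
        return np.array([x @ np.linalg.solve(A, b)
                         for A, b in zip(self.A, self.b)])

    def update(self, x, a, y):
        # Online ridge update for action a with observed label y.
        self.A[a] += np.outer(x, x)
        self.b[a] += y * x

def run(T=500, dim=5, n_actions=3, tau=0.2, noise=0.1):
    oracle = OnlineRidgeOracle(dim, n_actions)
    true_w = rng.normal(size=(n_actions, dim))  # synthetic expert
    n_queries = 0
    for _ in range(T):
        x = rng.normal(size=dim)
        scores = oracle.predict(x)
        top2 = np.argsort(scores)[::-1][:2]
        margin = scores[top2[0]] - scores[top2[1]]
        if margin < tau:  # small margin: uncertain, so query the expert
            n_queries += 1
            best = int(np.argmax(true_w @ x))
            for a in range(n_actions):
                # Noisy binary feedback: is `a` the expert's action?
                y = float(a == best) + rng.normal(scale=noise)
                oracle.update(x, a, y)
        # Otherwise commit to top2[0] without querying.
    return n_queries

if __name__ == "__main__":
    print("expert queries out of 500 rounds:", run())
```

As the oracle's predictions sharpen, margins grow and queries taper off, which is the mechanism behind the query-complexity bounds described above.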


Related research

- 07/24/2023 · Contextual Bandits and Imitation Learning via Preference-Based Active Queries
- 09/26/2022 · On Efficient Online Imitation Learning via Classification
- 10/19/2020 · Online Active Model Selection for Pre-trained Classifiers
- 10/27/2021 · Online Selective Classification with Limited Feedback
- 01/03/2023 · DADAgger: Disagreement-Augmented Dataset Aggregation
- 02/16/2023 · Adaptive Selective Sampling for Online Prediction with Experts
- 05/26/2020 · Active Imitation Learning with Noisy Guidance
