Leveraging Demonstrations to Improve Online Learning: Quality Matters

02/07/2023
by Botao Hao, et al.

We investigate the extent to which offline demonstration data can improve online learning. It is natural to expect some improvement, but the question is how, and by how much? We show that the degree of improvement must depend on the quality of the demonstration data. To generate portable insights, we focus on Thompson sampling (TS) applied to a multi-armed bandit as a prototypical online learning algorithm and model. The demonstration data is generated by an expert with a given competence level, a notion we introduce. We propose an informed TS algorithm that utilizes the demonstration data in a coherent way through Bayes' rule and derive a prior-dependent Bayesian regret bound. This offers insight into how pretraining can greatly improve online performance and how the degree of improvement increases with the expert's competence level. We also develop a practical, approximate informed TS algorithm through Bayesian bootstrapping and show substantial empirical regret reduction through experiments.
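
To make the setting concrete, here is a minimal sketch of warm-starting Beta-Bernoulli Thompson sampling with demonstration data. It is not the informed TS algorithm from the paper, which conditions on the expert's competence level through Bayes' rule; this simpler stand-in only folds the observed demonstration rewards into the per-arm posteriors. The names `make_demos`, `warm_start_counts`, and `thompson_sampling` are hypothetical, as is the expert model used here (pick the best arm with probability `competence`, otherwise a uniformly random arm), which is an assumed illustration rather than the paper's definition of competence.

```python
import random


def make_demos(true_means, n, competence, rng):
    """Hypothetical expert: plays the best arm with probability `competence`,
    otherwise a uniformly random arm. Returns a list of (arm, reward) pairs."""
    best_arm = max(range(len(true_means)), key=true_means.__getitem__)
    demos = []
    for _ in range(n):
        if rng.random() < competence:
            arm = best_arm
        else:
            arm = rng.randrange(len(true_means))
        reward = 1 if rng.random() < true_means[arm] else 0
        demos.append((arm, reward))
    return demos


def warm_start_counts(demos, n_arms):
    """Fold demonstration rewards into Beta(1, 1) priors, one per arm."""
    alpha = [1.0] * n_arms
    beta = [1.0] * n_arms
    for arm, reward in demos:
        alpha[arm] += reward
        beta[arm] += 1 - reward
    return alpha, beta


def thompson_sampling(true_means, horizon, alpha, beta, rng):
    """Run Beta-Bernoulli Thompson sampling and return cumulative regret."""
    alpha, beta = list(alpha), list(beta)  # do not mutate the caller's lists
    best_mean = max(true_means)
    regret = 0.0
    for _ in range(horizon):
        # Sample a plausible mean for each arm from its posterior.
        samples = [rng.betavariate(a, b) for a, b in zip(alpha, beta)]
        arm = max(range(len(samples)), key=samples.__getitem__)
        reward = 1 if rng.random() < true_means[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
        regret += best_mean - true_means[arm]
    return regret


if __name__ == "__main__":
    rng = random.Random(0)
    means = [0.3, 0.5, 0.7]  # assumed arm means, for illustration only
    n_arms = len(means)
    cold = thompson_sampling(means, 1000, [1.0] * n_arms, [1.0] * n_arms, rng)
    for competence in (0.0, 0.5, 1.0):
        a, b = warm_start_counts(make_demos(means, 200, competence, rng), n_arms)
        warm = thompson_sampling(means, 1000, a, b, rng)
        print(f"competence={competence}: warm={warm:.1f} vs cold={cold:.1f}")
```

Because this sketch treats demonstrations as ordinary reward observations, it ignores the information carried by which arms the expert chose. Exploiting that choice information is the part the abstract attributes to the informed TS algorithm.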

research
03/20/2023

Bridging Imitation and Online Reinforcement Learning: An Optimistic Tale

In this paper, we address the following problem: Given an offline demons...
research
07/25/2013

Sequential Transfer in Multi-armed Bandit with Finite Set of Models

Learning from prior tasks and transferring that experience to improve fu...
research
07/24/2018

Decision Variance in Online Learning

Online learning has classically focused on the expected behaviour of lea...
research
09/23/2022

An Efficient Algorithm for Fair Multi-Agent Multi-Armed Bandit with Low Regret

Recently a multi-agent variant of the classical multi-armed bandit was p...
research
06/23/2016

Adaptive Task Assignment in Online Learning Environments

With the increasing popularity of online learning, intelligent tutoring ...
research
01/29/2019

Decentralized Online Learning: Take Benefits from Others' Data without Sharing Your Own to Track Global Trend

Decentralized Online Learning (online learning in decentralized networks...
research
09/19/2023

Decentralized Online Learning in Task Assignment Games for Mobile Crowdsensing

The problem of coordinated data collection is studied for a mobile crowd...
