Bridging Imitation and Online Reinforcement Learning: An Optimistic Tale

03/20/2023
by   Botao Hao, et al.

In this paper, we address the following problem: given an offline demonstration dataset from an imperfect expert, what is the best way to leverage it to bootstrap online learning performance in MDPs? We first propose an Informed Posterior Sampling-based RL (iPSRL) algorithm that uses both the offline dataset and information about the expert's behavioral policy used to generate it. If the expert is competent enough, its cumulative Bayesian regret goes to zero exponentially fast in N, the offline dataset size. Since this algorithm is computationally impractical, we then propose the iRLSVI algorithm, which can be seen as a combination of the RLSVI algorithm for online RL and imitation learning. Our empirical results show that iRLSVI achieves a significant reduction in regret compared to two baselines: one with no offline data, and one that uses the offline dataset without information about the generative policy. Our algorithm bridges online RL and imitation learning for the first time.
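To make the idea of bootstrapping posterior sampling with offline data concrete, here is a minimal sketch in a much simpler setting than the paper's: a Bernoulli bandit rather than an MDP. It is not the authors' iPSRL or iRLSVI. The function names, the Beta(1, 1) prior, and the bandit setup are all assumptions chosen for illustration. The sketch folds offline (action, reward) pairs into the prior and ignores any information about the expert's behavioral policy, so it is closer in spirit to the baseline that uses the offline dataset without knowledge of the generative policy.

```python
import numpy as np

def warm_start_posterior(offline_actions, offline_rewards, n_arms):
    """Update Beta(1, 1) priors with offline (action, reward) pairs.

    Hypothetical helper: the offline data are treated as ordinary
    observations; nothing about the expert's policy is used.
    """
    alpha = np.ones(n_arms)
    beta = np.ones(n_arms)
    for a, r in zip(offline_actions, offline_rewards):
        alpha[a] += r        # count of observed successes
        beta[a] += 1 - r     # count of observed failures
    return alpha, beta

def thompson_sampling(true_means, alpha, beta, horizon, rng):
    """Run Thompson sampling and return the cumulative pseudo-regret."""
    alpha, beta = alpha.copy(), beta.copy()
    best = np.max(true_means)
    regret = 0.0
    for _ in range(horizon):
        sampled = rng.beta(alpha, beta)           # one posterior sample per arm
        a = int(np.argmax(sampled))               # act greedily w.r.t. the sample
        r = float(rng.random() < true_means[a])   # Bernoulli reward
        alpha[a] += r
        beta[a] += 1 - r
        regret += best - true_means[a]
    return regret

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_means = np.array([0.2, 0.8])
    # Assumed offline data: a hypothetical expert that mostly pulls arm 1.
    actions = [1, 1, 1, 0, 1]
    rewards = [1, 1, 0, 0, 1]
    alpha, beta = warm_start_posterior(actions, rewards, n_arms=2)
    print("regret with offline data:", thompson_sampling(true_means, alpha, beta, 200, rng))
    print("regret without:", thompson_sampling(true_means, np.ones(2), np.ones(2), 200, rng))
```

Comparing the two printed regrets over many seeds would give a rough feel for how much a warm-started posterior can help in this toy setting. It says nothing about the paper's guarantees, which additionally rely on modeling the expert's behavioral policy.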

research
02/06/2023

DITTO: Offline Imitation Learning with World Models

We propose DITTO, an offline imitation learning algorithm which uses wor...
research
08/19/2021

Provably Efficient Generative Adversarial Imitation Learning for Online and Offline Setting with Linear Function Approximation

In generative adversarial imitation learning (GAIL), the agent aims to l...
research
03/22/2021

Bridging Offline Reinforcement Learning and Imitation Learning: A Tale of Pessimism

Offline (or batch) reinforcement learning (RL) algorithms seek to learn ...
research
11/03/2021

Curriculum Offline Imitation Learning

Offline reinforcement learning (RL) tasks require the agent to learn fro...
research
11/29/2020

Distilled Thompson Sampling: Practical and Efficient Thompson Sampling via Imitation Learning

Thompson sampling (TS) has emerged as a robust technique for contextual ...
research
02/07/2023

Leveraging Demonstrations to Improve Online Learning: Quality Matters

We investigate the extent to which offline demonstration data can improv...
research
07/31/2022

Robot Policy Learning from Demonstration Using Advantage Weighting and Early Termination

Learning robotic tasks in the real world is still highly challenging and...
