Reward-agnostic Fine-tuning: Provable Statistical Benefits of Hybrid Reinforcement Learning

05/17/2023
by Gen Li, et al.

This paper studies tabular reinforcement learning (RL) in the hybrid setting, which assumes access to both an offline dataset and online interactions with the unknown environment. The central question is how to use online data collection efficiently to strengthen and complement the offline dataset and enable effective policy fine-tuning. Leveraging recent advances in reward-agnostic exploration and model-based offline RL, we design a three-stage hybrid RL algorithm that beats the best of both worlds – pure offline RL and pure online RL – in terms of sample complexity. The proposed algorithm does not require any reward information during data collection. Our theory is built on a new notion called single-policy partial concentrability, which captures the trade-off between distribution mismatch and miscoverage and guides the interplay between offline and online data.
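To make the trade-off between distribution mismatch and miscoverage concrete, the sketch below works through a deliberately simplified, single-step toy version of the idea. It is not the paper's definition of single-policy partial concentrability, which the abstract does not state. The function name `toy_partial_concentrability`, the dictionaries `d_star` (occupancy of the target policy) and `d_off` (offline data distribution), and the budget `sigma` are all hypothetical names chosen for illustration. The assumption is that up to `sigma` of the target policy's probability mass may be left uncovered by offline data (to be handled by online exploration), and that the mismatch is measured as the worst density ratio on the part that remains.

```python
import math


def toy_partial_concentrability(d_star, d_off, sigma):
    """Toy illustration only (not the paper's definition).

    Drops state-action pairs carrying at most `sigma` total mass under
    d_star, always discarding the worst-covered pairs first, and returns
    (worst remaining ratio d_star/d_off, mass actually dropped).
    """
    pairs = []
    for key, p in d_star.items():
        if p <= 0:
            continue  # pairs the target policy never visits do not matter
        q = d_off.get(key, 0.0)
        ratio = p / q if q > 0 else math.inf  # uncovered pair: infinite mismatch
        pairs.append((ratio, p))

    # Worst-covered pairs first; removing a prefix of this order is the only
    # way to lower the maximum ratio over the retained set.
    pairs.sort(reverse=True)

    dropped = 0.0
    i = 0
    while i < len(pairs) and dropped + pairs[i][1] <= sigma + 1e-12:
        dropped += pairs[i][1]
        i += 1

    worst = pairs[i][0] if i < len(pairs) else 0.0
    return worst, dropped


# Hypothetical distributions: the offline data never visits ("s2", "a1").
d_star = {("s0", "a0"): 0.7, ("s1", "a0"): 0.2, ("s2", "a1"): 0.1}
d_off = {("s0", "a0"): 0.5, ("s1", "a0"): 0.4, ("s0", "a1"): 0.1}

for sigma in (0.0, 0.1, 0.85):
    print(sigma, toy_partial_concentrability(d_star, d_off, sigma))
```

In this toy setting, a budget of zero recovers an ordinary worst-case density ratio, which is infinite whenever the offline data misses any pair the target policy visits. Allowing a small uncovered mass makes the ratio finite, at the cost of leaving more work for online data collection.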

research
06/12/2023

Ensemble-based Offline-to-Online Reinforcement Learning: From Pessimistic Learning to Optimistic Exploration

Offline reinforcement learning (RL) is a learning paradigm where an agen...
research
10/13/2022

Hybrid RL: Using Both Offline and Online Data Can Make RL Efficient

We consider a hybrid reinforcement learning setting (Hybrid RL), in whic...
research
12/16/2022

Offline Reinforcement Learning for Visual Navigation

Reinforcement learning can enable robots to navigate to distant goals wh...
research
02/23/2021

MUSBO: Model-based Uncertainty Regularized and Sample Efficient Batch Optimization for Deployment Constrained Reinforcement Learning

In many contemporary applications such as healthcare, finance, robotics,...
research
11/21/2022

Improving TD3-BC: Relaxed Policy Constraint for Offline Learning and Stable Online Fine-Tuning

The ability to discover optimal behaviour from fixed data sets has the p...
research
06/05/2023

Survival Instinct in Offline Reinforcement Learning

We present a novel observation about the behavior of offline reinforceme...
research
01/23/2023

Learning to View: Decision Transformers for Active Object Detection

Active perception describes a broad class of techniques that couple plan...
