Ensemble-based Offline-to-Online Reinforcement Learning: From Pessimistic Learning to Optimistic Exploration

06/12/2023
by Kai Zhao, et al.

Offline reinforcement learning (RL) is a learning paradigm in which an agent learns from a fixed dataset of experience. However, learning solely from a static dataset can limit performance because the agent cannot explore. To overcome this limitation, offline-to-online RL combines offline pre-training with online fine-tuning, which lets the agent further refine its policy by interacting with the environment in real time. Despite these benefits, existing offline-to-online RL methods suffer from performance degradation and slow improvement during the online phase. To tackle these challenges, we propose a novel framework called Ensemble-based Offline-to-Online (E2O) RL. By increasing the number of Q-networks, we seamlessly bridge offline pre-training and online fine-tuning without degrading performance. Moreover, to speed up online improvement, we appropriately loosen the pessimism of Q-value estimation and incorporate ensemble-based exploration mechanisms into our framework. Experimental results demonstrate that E2O can substantially improve the training stability, learning efficiency, and final performance of existing offline RL methods during online fine-tuning on a range of locomotion and navigation tasks, significantly outperforming existing offline-to-online RL methods.
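
The abstract does not say how the pessimism is loosened or which exploration mechanism is used, so the following is only a generic sketch of the three ideas it names, not the authors' implementation. It assumes an ensemble that returns one Q-value per network, a hypothetical `beta` weight that controls how pessimistic the target is, and a hypothetical UCB-style `bonus` for exploration. All function and parameter names are illustrative.

```python
from statistics import mean, pstdev


def pessimistic_target(q_values):
    """Fully pessimistic estimate: the minimum over the ensemble."""
    return min(q_values)


def loosened_target(q_values, beta):
    """Assumed form of a looser estimate: ensemble mean minus beta * std.

    beta = 0 gives the plain ensemble mean; a larger beta is more pessimistic.
    """
    return mean(q_values) - beta * pstdev(q_values)


def optimistic_action(q_by_action, bonus):
    """Assumed UCB-style exploration: pick the action with the highest
    mean + bonus * std, so ensemble disagreement attracts the agent."""
    def score(action):
        qs = q_by_action[action]
        return mean(qs) + bonus * pstdev(qs)
    return max(q_by_action, key=score)


# Four hypothetical Q-network outputs for one state-action pair.
ensemble_q = [1.0, 2.0, 3.0, 4.0]
offline_estimate = pessimistic_target(ensemble_q)      # minimum of the ensemble
online_estimate = loosened_target(ensemble_q, beta=1)  # less pessimistic here

# Hypothetical per-action ensemble outputs: "right" has a lower mean
# but higher disagreement than "left".
q_by_action = {"left": [1.0, 1.0, 1.0], "right": [0.0, 1.0, 1.7]}
greedy_choice = optimistic_action(q_by_action, bonus=0.0)
exploring_choice = optimistic_action(q_by_action, bonus=1.0)
```

In this toy example the looser target sits between the ensemble minimum and the ensemble mean, and the exploration bonus switches the chosen action from the one with the higher mean to the one the ensemble disagrees about most.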


