Adaptive Behavior Cloning Regularization for Stable Offline-to-Online Reinforcement Learning

by Yi Zhao, et al.

Offline reinforcement learning, by learning from a fixed dataset, makes it possible to learn agent behaviors without interacting with the environment. However, depending on the quality of the offline dataset, such pre-trained agents may have limited performance and may need further fine-tuning through online interaction with the environment. During online fine-tuning, the performance of the pre-trained agent may collapse quickly due to the sudden distribution shift from offline to online data. Constraints enforced by offline RL methods, such as a behavior cloning loss, mitigate this collapse to an extent, but they also significantly slow down online fine-tuning by forcing the agent to stay close to the behavior policy. We propose to adaptively weight the behavior cloning loss during online fine-tuning based on the agent's performance and training stability. Moreover, we use a randomized ensemble of Q-functions to further increase the sample efficiency of online fine-tuning by performing a large number of learning updates. Experiments show that the proposed method yields state-of-the-art offline-to-online reinforcement learning performance on the popular D4RL benchmark. Code is available: <>.
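The abstract names two mechanisms without giving formulas. The sketch below illustrates them under stated assumptions, and is not the authors' implementation. The actor loss assumes a TD3+BC-style form: maximize Q while adding an `alpha`-weighted squared-error behavior cloning term. The `alpha` update rule, including `step_size` and the clipping bounds, is a hypothetical stand-in that reacts only to the change in episodic return; it ignores the training-stability signal the abstract also mentions. The ensemble target follows REDQ, taking the minimum over a random subset of the Q-ensemble's estimates. All function and parameter names here are hypothetical.

```python
import random


def adapt_bc_weight(alpha, recent_return, previous_return,
                    step_size=0.1, alpha_min=0.0, alpha_max=1.0):
    """HYPOTHETICAL BC-weight update driven only by the change in episodic return.

    If the return dropped, increase alpha to pull the policy back toward the
    offline data; otherwise decrease alpha to relax the constraint. The result
    is clipped to [alpha_min, alpha_max]. Not the paper's exact rule.
    """
    if recent_return < previous_return:
        alpha += step_size
    else:
        alpha -= step_size
    return min(alpha_max, max(alpha_min, alpha))


def actor_loss(q_value, policy_action, data_action, alpha):
    """Per-sample actor loss (assumed TD3+BC-style form).

    The -q_value term pushes the policy toward high-value actions; the
    alpha-weighted squared distance keeps it near the dataset action.
    """
    bc_penalty = sum((p - d) ** 2 for p, d in zip(policy_action, data_action))
    return -q_value + alpha * bc_penalty


def ensemble_td_target(reward, next_q_values, gamma=0.99, subset_size=2,
                       done=False, rng=random):
    """REDQ-style TD target.

    next_q_values holds one next-state estimate per ensemble member; the
    bootstrap uses the minimum over a randomly sampled subset of them.
    """
    subset = rng.sample(next_q_values, subset_size)
    bootstrap = 0.0 if done else gamma * min(subset)
    return reward + bootstrap


if __name__ == "__main__":
    alpha = 0.5
    # Returns fell from 100 to 80, so the hypothetical rule raises the BC weight.
    alpha = adapt_bc_weight(alpha, recent_return=80.0, previous_return=100.0)
    print(round(alpha, 2))  # 0.6
    print(actor_loss(2.0, [0.5, 0.5], [0.0, 0.0], alpha))
    ensemble = [3.2, 2.9, 3.5, 3.1, 3.0]  # illustrative next-state Q estimates
    print(ensemble_td_target(1.0, ensemble, rng=random.Random(0)))
```

In a full agent, the "large number of learning updates" would correspond to repeating the critic update built on `ensemble_td_target` many times per collected transition; that training loop, the networks, and the replay buffer are omitted here. Sampling a small random subset for the minimum is REDQ's way of controlling overestimation bias when many updates are taken per environment step.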



Ensemble-based Offline-to-Online Reinforcement Learning: From Pessimistic Learning to Optimistic Exploration

Offline reinforcement learning (RL) is a learning paradigm where an agen...

Offline-to-Online Reinforcement Learning via Balanced Replay and Pessimistic Q-Ensemble

Recent advance in deep offline reinforcement learning (RL) has made it p...

Learning to Assist Agents by Observing Them

The ability of an AI agent to assist other agents, such as humans, is an...

Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning

A compelling use case of offline reinforcement learning (RL) is to obtai...

Policy Expansion for Bridging Offline-to-Online Reinforcement Learning

Pre-training with offline data and online fine-tuning using reinforcemen...

Deep reinforcement learning for smart calibration of radio telescopes

Modern radio telescopes produce unprecedented amounts of data, which are...

Balancing policy constraint and ensemble size in uncertainty-based offline reinforcement learning

Offline reinforcement learning agents seek optimal policies from fixed d...
