Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning

03/09/2023
by Mitsuhiko Nakamoto, et al.

A compelling use case of offline reinforcement learning (RL) is to obtain a policy initialization from existing datasets, which allows efficient fine-tuning with limited amounts of active online interaction. However, several existing offline RL methods tend to exhibit poor online fine-tuning performance. On the other hand, online RL methods can learn effectively through online interaction, but struggle to incorporate offline data, which can make them very slow to learn in settings where exploration is challenging or pre-training is necessary. In this paper, we devise an approach for learning an effective initialization from offline data that also enables fast online fine-tuning. Our approach, calibrated Q-learning (Cal-QL), accomplishes this by learning a conservative value function initialization that underestimates the value of the learned policy from offline data, while also keeping the learned Q-values at a reasonable scale. We refer to this property as calibration, and define it formally as providing a lower bound on the true value function of the learned policy and an upper bound on the value of some other (suboptimal) reference policy, which may simply be the behavior policy. We show that offline RL algorithms that learn such calibrated value functions lead to effective online fine-tuning, enabling us to reap the benefits of offline initializations during online fine-tuning. In practice, Cal-QL can be implemented on top of existing conservative methods for offline RL with a one-line code change. Empirically, Cal-QL outperforms state-of-the-art methods on 10 of the 11 fine-tuning benchmark tasks that we study in this paper.
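To make the "one-line code change" concrete, below is a minimal sketch of how the calibration idea can sit inside a CQL-style conservative regularizer. The abstract does not specify an implementation, so the PyTorch framing, the function name, the `alpha` weight, and the use of behavior-policy Monte Carlo returns as the reference values are illustrative assumptions rather than the authors' exact code.

```python
import torch

def calibrated_conservative_loss(q_policy, q_data, reference_values, alpha=5.0):
    """CQL-style conservative regularizer with a Cal-QL-style calibration change.

    Args:
        q_policy:         Q(s, a') for actions a' drawn from the learned policy,
                          shape (batch,).
        q_data:           Q(s, a) for (s, a) pairs from the offline dataset,
                          shape (batch,).
        reference_values: estimated values of a reference policy at the batch
                          states (e.g., Monte Carlo returns of the behavior
                          policy; an assumed choice here), shape (batch,).
        alpha:            weight on the conservative term (illustrative default).
    """
    # A plain CQL regularizer would push down Q-values on policy actions:
    #     push_down = q_policy
    # The calibration change: stop pushing once a Q-value falls below the
    # reference value, so the learned Q-function can lower-bound the policy's
    # true value without dropping below the reference policy's value.
    push_down = torch.maximum(q_policy, reference_values)
    return alpha * (push_down.mean() - q_data.mean())
```

The one-line change is the `torch.maximum` call: using `q_policy` directly can drive Q-values arbitrarily low, whereas clipping at the reference value keeps the initialization at a calibrated scale, which is what the abstract argues enables fast online fine-tuning.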

