Offline-to-Online Reinforcement Learning via Balanced Replay and Pessimistic Q-Ensemble

07/01/2021
by Seunghyun Lee, et al.

Recent advances in deep offline reinforcement learning (RL) have made it possible to train strong robotic agents from offline datasets. However, depending on the quality of the trained agents and the target application, it is often desirable to fine-tune such agents through further online interaction. In this paper, we observe that state-action distribution shift may lead to severe bootstrap error during fine-tuning, which destroys the good initial policy obtained via offline RL. To address this issue, we first propose a balanced replay scheme that prioritizes samples encountered online while also encouraging the use of near-on-policy samples from the offline dataset. Furthermore, we leverage multiple Q-functions trained pessimistically offline, thereby preventing overoptimism about unfamiliar actions at novel states during the initial training phase. We show that the proposed method improves both the sample efficiency and the final performance of the fine-tuned robotic agents on various locomotion and manipulation tasks. Our code is available at: https://github.com/shlee94/Off2OnRL.
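
The two ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation (see the linked repository for that), and every name in it is hypothetical. It assumes that each offline transition comes with an `on_policy_score` in (0, 1] estimating how close it is to the current policy's state-action distribution, that online transitions receive a fixed higher priority, and that the ensemble is aggregated by a plain mean. How the score is estimated and how each Q-function is trained pessimistically offline are left out.

```python
import numpy as np


class BalancedReplay:
    """Toy prioritized buffer mixing offline and online transitions (illustrative)."""

    def __init__(self, online_priority=1.0, seed=0):
        self.items = []        # stored transitions (any Python object)
        self.priorities = []   # one positive sampling weight per transition
        self.is_online = []    # True if the transition was collected online
        self.online_priority = float(online_priority)
        self.rng = np.random.default_rng(seed)

    def _add(self, transition, priority, online):
        self.items.append(transition)
        self.priorities.append(priority)
        self.is_online.append(online)

    def add_offline(self, transition, on_policy_score):
        # Assumption: on_policy_score is supplied by some external estimator;
        # near-on-policy offline samples get a larger weight.
        self._add(transition, float(on_policy_score), False)

    def add_online(self, transition):
        # Assumption: every online sample gets the same fixed priority.
        self._add(transition, self.online_priority, True)

    def sample(self, batch_size):
        # Draw indices with probability proportional to priority.
        p = np.asarray(self.priorities, dtype=np.float64)
        p /= p.sum()
        return self.rng.choice(len(self.items), size=batch_size, p=p)


def ensemble_q(q_values):
    """Average per-member Q estimates of shape (n_members, batch).

    Assumption: the mean is only one plausible aggregation. The pessimism is
    meant to come from how each member was trained offline, which is not shown.
    """
    return np.mean(np.asarray(q_values, dtype=np.float64), axis=0)


# Usage: 100 far-from-policy offline samples and 10 online samples.
buf = BalancedReplay(online_priority=1.0, seed=0)
for i in range(100):
    buf.add_offline(("offline", i), on_policy_score=0.01)
for i in range(10):
    buf.add_online(("online", i))

idx = buf.sample(1000)
online_fraction = np.mean([buf.is_online[i] for i in idx])
```

With these weights the offline samples carry a total weight of 1.0 and the online samples a total of 10.0, so most of each batch is drawn from online data even though the offline set is ten times larger. Raising an offline sample's score moves it back into regular use.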
