DeepAI
Log In Sign Up

Offline-to-Online Reinforcement Learning via Balanced Replay and Pessimistic Q-Ensemble

07/01/2021
by   Seunghyun Lee, et al.
9

Recent advance in deep offline reinforcement learning (RL) has made it possible to train strong robotic agents from offline datasets. However, depending on the quality of the trained agents and the application being considered, it is often desirable to fine-tune such agents via further online interactions. In this paper, we observe that state-action distribution shift may lead to severe bootstrap error during fine-tuning, which destroys the good initial policy obtained via offline RL. To address this issue, we first propose a balanced replay scheme that prioritizes samples encountered online while also encouraging the use of near-on-policy samples from the offline dataset. Furthermore, we leverage multiple Q-functions trained pessimistically offline, thereby preventing overoptimism concerning unfamiliar actions at novel states during the initial training phase. We show that the proposed method improves sample-efficiency and final performance of the fine-tuned robotic agents on various locomotion and manipulation tasks. Our code is available at: https://github.com/shlee94/Off2OnRL.

READ FULL TEXT

page 6

page 15

page 17

10/25/2022

Adaptive Behavior Cloning Regularization for Stable Offline-to-Online Reinforcement Learning

Offline reinforcement learning, by learning from a fixed dataset, makes ...
06/19/2021

Boosting Offline Reinforcement Learning with Residual Generative Modeling

Offline reinforcement learning (RL) tries to learn the near-optimal poli...
10/21/2021

Efficient Robotic Manipulation Through Offline-to-Online Reinforcement Learning and Goal-Aware State Information

End-to-end learning robotic manipulation with high data efficiency is on...
10/09/2022

State Advantage Weighting for Offline RL

We present state advantage weighting for offline reinforcement learning ...
10/07/2021

Offline RL With Resource Constrained Online Deployment

Offline reinforcement learning is used to train policies in scenarios wh...
11/21/2022

Improving TD3-BC: Relaxed Policy Constraint for Offline Learning and Stable Online Fine-Tuning

The ability to discover optimal behaviour from fixed data sets has the p...
11/14/2022

Towards Data-Driven Offline Simulations for Online Reinforcement Learning

Modern decision-making systems, from robots to web recommendation engine...