MOORe: Model-based Offline-to-Online Reinforcement Learning

01/25/2022
by   Yihuan Mao, et al.

With the success of offline reinforcement learning (RL), offline-trained RL policies have the potential to be further improved when deployed online. A smooth transfer of the policy matters for safe real-world deployment, and fast adaptation of the policy plays a vital role in practical online performance improvement. To tackle these challenges, we propose a simple yet efficient algorithm, Model-based Offline-to-Online Reinforcement Learning (MOORe), which employs a prioritized sampling scheme that dynamically adjusts the balance between offline and online data for smooth and efficient online adaptation of the policy. We provide a theoretical foundation for our algorithm's design. Experimental results on the D4RL benchmark show that our algorithm transfers smoothly from the offline to the online stage while enabling sample-efficient online adaptation, and that it significantly outperforms existing methods.
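
The abstract does not spell out the priority rule, so the following is only an illustrative sketch of the general idea of drawing training batches from both offline and online data with a share that changes over time. The linear schedule, the function names `online_priority` and `sample_batch`, and all default values are assumptions made for this example, not the scheme from the paper.

```python
import random


def online_priority(step, warmup_steps=1000, start=0.1, end=0.9):
    """Hypothetical schedule: linearly raise the online share from start to end."""
    frac = min(max(step / warmup_steps, 0.0), 1.0)
    return start + (end - start) * frac


def sample_batch(offline_data, online_data, step, batch_size, rng):
    """Draw a batch whose expected online share follows online_priority(step).

    Assumption: each transition is an opaque item in a non-empty list;
    a real replay buffer would store (state, action, reward, next_state).
    """
    if not online_data:
        p_online = 0.0  # nothing collected online yet: use offline data only
    elif not offline_data:
        p_online = 1.0  # no offline data available: use online data only
    else:
        p_online = online_priority(step)

    batch = []
    for _ in range(batch_size):
        source = online_data if rng.random() < p_online else offline_data
        batch.append(rng.choice(source))
    return batch


if __name__ == "__main__":
    rng = random.Random(0)
    offline = [("offline", i) for i in range(100)]
    online = [("online", i) for i in range(10)]
    for step in (0, 500, 1000):
        batch = sample_batch(offline, online, step, 256, rng)
        share = sum(1 for tag, _ in batch if tag == "online") / len(batch)
        print(step, round(share, 2))
```

Starting with a small online share keeps early updates close to the offline data distribution, which is one plausible way to read "smooth transfer"; raising the share later lets newly collected experience dominate. How MOORe actually sets these priorities is described in the full text, not here.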

research
03/30/2023

Finetuning from Offline Reinforcement Learning: Challenges, Trade-offs and Practical Solutions

Offline reinforcement learning (RL) allows for the training of competent...
research
06/07/2022

On the Role of Discount Factor in Offline Reinforcement Learning

Offline reinforcement learning (RL) enables effective learning from prev...
research
06/03/2022

Offline Reinforcement Learning with Causal Structured World Models

Model-based methods have recently shown promising for offline reinforcem...
research
06/05/2020

Deployment-Efficient Reinforcement Learning via Model-Based Offline Optimization

Most reinforcement learning (RL) algorithms assume online access to the ...
research
10/19/2022

On the Feasibility of Cross-Task Transfer with Model-Based Reinforcement Learning

Reinforcement Learning (RL) algorithms can solve challenging control pro...
research
02/10/2021

Personalization for Web-based Services using Offline Reinforcement Learning

Large-scale Web-based services present opportunities for improving UI po...
research
05/25/2023

PROTO: Iterative Policy Regularized Offline-to-Online Reinforcement Learning

Offline-to-online reinforcement learning (RL), by combining the benefits...
