PROTO: Iterative Policy Regularized Offline-to-Online Reinforcement Learning

05/25/2023
by Jianxiong Li, et al.

Offline-to-online reinforcement learning (RL), by combining the benefits of offline pretraining and online finetuning, promises enhanced sample efficiency and policy performance. However, existing methods, effective as they are, suffer from suboptimal performance, limited adaptability, and unsatisfactory computational efficiency. We propose a novel framework, PROTO, which overcomes these limitations by augmenting the standard RL objective with an iteratively evolving regularization term. Performing a trust-region-style update, PROTO yields stable initial finetuning and optimal final performance by gradually evolving the regularization term to relax the constraint strength. By adjusting only a few lines of code, PROTO can bridge any offline policy pretraining and standard off-policy RL finetuning to form a powerful offline-to-online RL pathway, making it highly adaptable to diverse methods. Simple yet elegant, PROTO imposes minimal additional computation and enables highly efficient online finetuning. Extensive experiments demonstrate that PROTO achieves superior performance over SOTA baselines, offering an adaptable and efficient offline-to-online RL framework.
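The abstract does not give the form of the regularization term, so the following is only a toy sketch of the general idea, not the paper's implementation. It assumes a single-state problem with fixed action values, a KL penalty toward the previous policy iterate, and a geometric decay of the penalty weight `alpha`; the function names, the decay schedule, and all numeric values are hypothetical choices made for illustration.

```python
import math


def regularized_step(prev_log_policy, q_values, alpha):
    """One trust-region-style update (illustrative assumption, not the paper's code).

    Returns the log-probabilities of the closed-form maximizer of
        E_pi[Q] - alpha * KL(pi || prev_policy),
    which is pi(a) proportional to prev_policy(a) * exp(Q(a) / alpha).
    """
    logits = [lp + q / alpha for lp, q in zip(prev_log_policy, q_values)]
    # Log-sum-exp with the max subtracted for numerical stability.
    top = max(logits)
    log_norm = top + math.log(sum(math.exp(l - top) for l in logits))
    return [l - log_norm for l in logits]


def finetune(pretrained_policy, q_values, alpha0=10.0, decay=0.5,
             alpha_min=0.05, iterations=20):
    """Iterate the regularized update while relaxing the constraint strength.

    pretrained_policy must have strictly positive probabilities.
    Returns the list of policies (as probabilities), starting with the
    pretrained one.
    """
    log_policy = [math.log(p) for p in pretrained_policy]
    alpha = alpha0
    history = [[math.exp(lp) for lp in log_policy]]
    for _ in range(iterations):
        # The anchor is always the previous iterate, so it evolves over time.
        log_policy = regularized_step(log_policy, q_values, alpha)
        history.append([math.exp(lp) for lp in log_policy])
        # Hypothetical schedule: shrink alpha geometrically down to a floor.
        alpha = max(alpha * decay, alpha_min)
    return history


if __name__ == "__main__":
    # Hypothetical pretrained policy that prefers a suboptimal action.
    history = finetune([0.7, 0.2, 0.1], q_values=[0.0, 1.0, 0.5])
    print("after 1 step:", [round(p, 3) for p in history[1]])
    print("final policy:", [round(p, 3) for p in history[-1]])
```

With a large initial `alpha` the first update stays close to the pretrained policy, and as `alpha` shrinks the policy is free to concentrate on the highest-value action, which mirrors the "stable initial finetuning, then relaxed constraint" behavior the abstract describes.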

research
03/30/2023

Finetuning from Offline Reinforcement Learning: Challenges, Trade-offs and Practical Solutions

Offline reinforcement learning (RL) allows for the training of competent...
research
06/12/2023

Ensemble-based Offline-to-Online Reinforcement Learning: From Pessimistic Learning to Optimistic Exploration

Offline reinforcement learning (RL) is a learning paradigm where an agen...
research
01/25/2022

MOORe: Model-based Offline-to-Online Reinforcement Learning

With the success of offline reinforcement learning (RL), offline trained...
research
02/13/2022

Supported Policy Optimization for Offline Reinforcement Learning

Policy constraint methods to offline reinforcement learning (RL) typical...
research
10/13/2022

Hybrid RL: Using Both Offline and Online Data Can Make RL Efficient

We consider a hybrid reinforcement learning setting (Hybrid RL), in whic...
research
04/11/2023

Control invariant set enhanced reinforcement learning for process control: improved sampling efficiency and guaranteed stability

Reinforcement learning (RL) is an area of significant research interest,...
research
09/04/2023

Hundreds Guide Millions: Adaptive Offline Reinforcement Learning with Expert Guidance

Offline reinforcement learning (RL) optimizes the policy on a previously...
