Policy Expansion for Bridging Offline-to-Online Reinforcement Learning

02/02/2023
by Haichao Zhang, et al.

Pre-training with offline data followed by online fine-tuning with reinforcement learning is a promising strategy for learning control policies, because it combines the sample efficiency of offline learning with the performance of online learning. One natural approach is to initialize the policy for online learning with the one trained offline. In this work, we introduce a policy expansion scheme for this task. After learning the offline policy, we use it as one candidate policy in a policy set. We then expand the policy set with another policy, which is responsible for further learning. The two policies are composed in an adaptive manner when interacting with the environment. With this approach, the policy learned offline is fully retained during online learning. This mitigates potential issues such as destroying the useful behaviors of the offline policy in the initial stage of online learning, while still allowing the offline policy to participate naturally and adaptively in exploration. Moreover, new useful behaviors can potentially be captured by the newly added policy through learning. Experiments are conducted on a number of tasks, and the results demonstrate the effectiveness of the proposed approach.
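
The abstract does not say how the two policies are composed, so the following is only a minimal sketch under one assumed rule: each policy in the set proposes an action, a critic scores the proposals, and one proposal is sampled with probability proportional to the softmax of its score. The names `select_action`, `offline_policy`, `new_policy`, `q_fn`, and `temperature` are hypothetical and chosen for illustration; they are not taken from the paper.

```python
import numpy as np


def select_action(state, policies, q_fn, temperature=1.0, rng=None):
    """Sample one action from the proposals of a policy set.

    Assumed composition rule (not confirmed by the abstract): a softmax
    over critic values of the proposed actions.
    """
    if rng is None:
        rng = np.random.default_rng()
    # Every policy in the set proposes an action for the same state.
    candidates = [policy(state) for policy in policies]
    # Score each proposal with the critic.
    q_values = np.array([q_fn(state, a) for a in candidates], dtype=float)
    # Numerically stable softmax over the scores.
    logits = q_values / temperature
    logits -= logits.max()
    probs = np.exp(logits) / np.exp(logits).sum()
    # Sample which policy gets to act at this step.
    idx = int(rng.choice(len(candidates), p=probs))
    return candidates[idx], idx, probs


# Toy stand-ins (hypothetical): a frozen "offline" policy, a newly added
# policy that would be trained online, and a hand-written critic.
def offline_policy(state):
    return np.clip(0.5 * state, -1.0, 1.0)


def new_policy(state):
    return np.clip(-0.5 * state, -1.0, 1.0)


def q_fn(state, action):
    # Prefers actions close to 0.5 * state.
    return -float(np.sum((action - 0.5 * state) ** 2))


state = np.array([1.0, -1.0])
action, idx, probs = select_action(
    state, [offline_policy, new_policy], q_fn,
    temperature=0.1, rng=np.random.default_rng(0),
)
```

In this sketch the offline policy is never updated, so its behavior stays intact; only the new policy and the critic would be trained online, and the selection probabilities shift toward whichever policy the critic currently favors in a given state.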

research
03/30/2023

Finetuning from Offline Reinforcement Learning: Challenges, Trade-offs and Practical Solutions

Offline reinforcement learning (RL) allows for the training of competent...
research
10/25/2022

Adaptive Behavior Cloning Regularization for Stable Offline-to-Online Reinforcement Learning

Offline reinforcement learning, by learning from a fixed dataset, makes ...
research
06/18/2021

Active Offline Policy Selection

This paper addresses the problem of policy selection in domains with abu...
research
08/10/2020

Fault-Tolerant Control of Degrading Systems with On-Policy Reinforcement Learning

We propose a novel adaptive reinforcement learning control approach for ...
research
06/16/2023

π2vec: Policy Representations with Successor Features

This paper describes π2vec, a method for representing behaviors of black...
research
03/20/2020

An Energy-Aware Online Learning Framework for Resource Management in Heterogeneous Platforms

Mobile platforms must satisfy the contradictory requirements of fast res...
research
05/22/2010

Incremental Training of a Detector Using Online Sparse Eigen-decomposition

The ability to efficiently and accurately detect objects plays a very cr...
