Collaborative World Models: An Online-Offline Transfer RL Approach

05/24/2023
by Qi Wang, et al.

Training visual reinforcement learning (RL) models on offline datasets is challenging due to overfitting in representation learning and overestimation in the value function. In this paper, we propose a transfer learning method called Collaborative World Models (CoWorld) to improve the performance of visual RL under offline conditions. The core idea is to use an easy-to-interact, off-the-shelf simulator to train an auxiliary RL model as the online "test bed" for the offline policy learned in the target domain, which provides a flexible constraint for the value function. Intuitively, we want to mitigate the overestimation of values outside the offline data distribution without impeding the exploration of actions with potential advantages. Specifically, CoWorld performs domain-collaborative representation learning to bridge the gap between online and offline hidden state distributions. Furthermore, it performs domain-collaborative behavior learning that enables the source RL agent to provide target-aware value estimation, allowing for effective offline policy regularization. Experiments show that CoWorld significantly outperforms existing methods on offline visual control tasks in DeepMind Control and Meta-World.
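
The "flexible constraint" idea can be illustrated with a minimal sketch. This is an assumption made for illustration, not CoWorld's actual objective: the function name `constrained_value_loss`, the one-sided hinge penalty, and the `penalty_weight` parameter are all hypothetical. The sketch penalizes the offline (target) critic only when its estimate exceeds the value suggested by the auxiliary source critic, so lower or matching estimates are left unconstrained.

```python
from typing import Sequence


def constrained_value_loss(
    target_values: Sequence[float],
    td_targets: Sequence[float],
    source_values: Sequence[float],
    penalty_weight: float = 1.0,
) -> float:
    """Mean squared TD error plus a one-sided overestimation penalty.

    Hypothetical illustration only: the penalty is active for a sample
    only when the target critic's value exceeds the source critic's value.
    """
    n = len(target_values)
    if n == 0 or n != len(td_targets) or n != len(source_values):
        raise ValueError("inputs must be non-empty and of equal length")

    total = 0.0
    for v, y, v_src in zip(target_values, td_targets, source_values):
        td_error = (v - y) ** 2          # standard regression to the TD target
        excess = max(0.0, v - v_src)     # zero unless the target critic overshoots
        total += td_error + penalty_weight * excess
    return total / n
```

For example, with `target_values=[1.0, 3.0]`, `td_targets=[1.0, 3.0]`, and `source_values=[2.0, 2.0]`, only the second sample overshoots the source estimate, so only it contributes a penalty.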

research
06/19/2021

Boosting Offline Reinforcement Learning with Residual Generative Modeling

Offline reinforcement learning (RL) tries to learn the near-optimal poli...
research
06/06/2023

Vid2Act: Activate Offline Videos for Visual RL

Pretraining RL models on offline video datasets is a promising way to im...
research
11/09/2021

Dealing with the Unknown: Pessimistic Offline Reinforcement Learning

Reinforcement Learning (RL) has been shown effective in domains where th...
research
03/09/2023

Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning

A compelling use case of offline reinforcement learning (RL) is to obtai...
research
02/01/2023

Selective Uncertainty Propagation in Offline RL

We study the finite-horizon offline reinforcement learning (RL) problem....
research
03/03/2023

Learning to Influence Human Behavior with Offline Reinforcement Learning

In the real world, some of the most complex settings for learned agents ...
research
07/01/2022

Offline Policy Optimization with Eligible Actions

Offline policy optimization could have a large impact on many real-world...
