Model-Based Offline Meta-Reinforcement Learning with Regularization

02/07/2022
by Sen Lin, et al.

Existing offline reinforcement learning (RL) methods face several major challenges, particularly the distributional shift between the learned policy and the behavior policy. Offline Meta-RL is emerging as a promising approach to address these challenges, aiming to learn an informative meta-policy from a collection of tasks. Nevertheless, as shown in our empirical studies, offline Meta-RL can be outperformed by offline single-task RL methods on tasks with high-quality datasets, indicating that the right balance must be carefully struck between "exploring" out-of-distribution state-actions by following the meta-policy and "exploiting" the offline dataset by staying close to the behavior policy. Motivated by this empirical analysis, we propose model-based offline Meta-RL with regularized Policy Optimization (MerPO), which learns a meta-model for efficient task structure inference and an informative meta-policy for safe exploration of out-of-distribution state-actions. In particular, we devise a new meta-Regularized model-based Actor-Critic (RAC) method for within-task policy optimization, as a key building block of MerPO, using conservative policy evaluation and regularized policy improvement; the intrinsic tradeoff therein is achieved by striking the right balance between two regularizers, one based on the behavior policy and the other on the meta-policy. We theoretically show that the learned policy offers guaranteed improvement over both the behavior policy and the meta-policy, thus ensuring performance improvement on new tasks via offline Meta-RL. Experiments corroborate the superior performance of MerPO over existing offline Meta-RL methods.
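The core tradeoff described above — balancing a regularizer toward the behavior policy against one toward the meta-policy — can be illustrated with a minimal sketch. The example below is *not* MerPO's actual objective; it is a simplified, hypothetical discrete-action version in which policy improvement maximizes expected Q-value minus two KL penalties, one per reference policy. With this objective the optimizer has a closed form: a geometric interpolation of the two reference policies, reweighted by the exponentiated Q-values. The names, weights, and toy numbers are illustrative assumptions.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D array of logits
    z = np.exp(x - x.max())
    return z / z.sum()

def improved_policy(q_values, pi_b, pi_meta, alpha_b, alpha_m):
    """Maximize  E_pi[Q] - alpha_b*KL(pi || pi_b) - alpha_m*KL(pi || pi_meta)
    over the probability simplex. Setting the gradient of the Lagrangian to
    zero gives the closed form:
        pi ∝ exp( (Q + alpha_b*log(pi_b) + alpha_m*log(pi_meta)) / (alpha_b + alpha_m) )
    i.e., a weighted geometric mean of the two reference policies,
    tilted toward high-Q actions."""
    alpha = alpha_b + alpha_m
    logits = (q_values + alpha_b * np.log(pi_b) + alpha_m * np.log(pi_meta)) / alpha
    return softmax(logits)

# Toy setting (3 actions): the behavior policy favors action 0, the
# meta-policy favors action 2, and Q mildly prefers action 2.
q = np.array([1.0, 0.2, 1.5])
pi_b = np.array([0.7, 0.2, 0.1])      # behavior policy (from the dataset)
pi_meta = np.array([0.1, 0.2, 0.7])   # meta-policy (from other tasks)

# Heavy behavior regularization -> stays close to the data ("exploit").
pi_conservative = improved_policy(q, pi_b, pi_meta, alpha_b=10.0, alpha_m=0.1)
# Heavy meta regularization -> follows the meta-policy ("explore").
pi_exploratory = improved_policy(q, pi_b, pi_meta, alpha_b=0.1, alpha_m=10.0)
```

Sweeping the two weights between these extremes traces out exactly the "exploit the dataset" versus "trust the meta-policy" tradeoff that the abstract argues must be calibrated per task.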


Related research

- Behavior Proximal Policy Optimization (02/22/2023) — Offline reinforcement learning (RL) is a challenging setting where exist...
- BRAC+: Improved Behavior Regularized Actor Critic for Offline Reinforcement Learning (10/02/2021) — Online interactions with the environment to collect data samples for tra...
- Mutual Information Regularized Offline Reinforcement Learning (10/14/2022) — Offline reinforcement learning (RL) aims at learning an effective policy...
- Efficient Fully-Offline Meta-Reinforcement Learning via Distance Metric Learning and Behavior Regularization (10/02/2020) — We study the offline meta-reinforcement learning (OMRL) problem, a parad...
- Efficient Offline Policy Optimization with a Learned Model (10/12/2022) — MuZero Unplugged presents a promising approach for offline policy learni...
- SCORE: Spurious COrrelation REduction for Offline Reinforcement Learning (10/24/2021) — Offline reinforcement learning (RL) aims to learn the optimal policy fro...
- Model-Based Offline Reinforcement Learning with Pessimism-Modulated Dynamics Belief (10/13/2022) — Model-based offline reinforcement learning (RL) aims to find highly rewa...
