Offline Meta Reinforcement Learning with In-Distribution Online Adaptation

05/31/2023
by   Jianhao Wang, et al.

Recent offline meta-reinforcement learning (meta-RL) methods typically use task-dependent behavior policies (e.g., RL agents trained on each individual task) to collect a multi-task dataset. However, these methods require extra information for fast adaptation, such as offline context for the test tasks. To address this problem, we first formally characterize a challenge unique to offline meta-RL: the transition-reward distribution shift between offline datasets and online adaptation. Our theory shows that out-of-distribution adaptation episodes may lead to unreliable policy evaluation, whereas online adaptation with in-distribution episodes provides a guarantee on adaptation performance. Based on these theoretical insights, we propose a novel adaptation framework, In-Distribution online Adaptation with uncertainty Quantification (IDAQ), which generates in-distribution context using a given uncertainty quantification and performs effective task belief inference to address new tasks. We find that a return-based uncertainty quantification works effectively for IDAQ. Experiments show that IDAQ achieves state-of-the-art performance on the Meta-World ML1 benchmark compared with baselines both with and without offline adaptation.
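To make the idea of building an in-distribution context from a return-based uncertainty score more concrete, the following is a minimal sketch and not the paper's implementation. It assumes, purely for illustration, that an adaptation episode counts as in-distribution when its return falls no more than a threshold below a reference return taken from the offline data. The names `Episode`, `return_uncertainty`, `select_in_distribution`, `reference_return`, and `threshold` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Episode:
    """One online adaptation episode (hypothetical container)."""
    episode_return: float
    transitions: list = field(default_factory=list)


def return_uncertainty(episode: Episode, reference_return: float) -> float:
    """Assumed score: how far the episode return falls below a reference
    return taken from the offline data. Zero means no shortfall."""
    return max(0.0, reference_return - episode.episode_return)


def select_in_distribution(
    episodes: List[Episode], reference_return: float, threshold: float
) -> List[Episode]:
    """Keep only episodes whose uncertainty score is within the threshold.
    The kept episodes would serve as context for task belief inference."""
    return [
        ep for ep in episodes
        if return_uncertainty(ep, reference_return) <= threshold
    ]


# Usage: three adaptation episodes, reference return 10, tolerance 2.
episodes = [Episode(10.0), Episode(4.0), Episode(9.0)]
context = select_in_distribution(episodes, reference_return=10.0, threshold=2.0)
kept_returns = [ep.episode_return for ep in context]
```

In this sketch the low-return episode is discarded and the remaining two form the context. How the reference return and threshold would be chosen is left open here, since the abstract does not specify it.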


