Offline Meta Reinforcement Learning

08/06/2020
by Ron Dorfman, et al.

Consider the following problem, which we term Offline Meta Reinforcement Learning (OMRL): given the complete training histories of N conventional RL agents, trained on N different tasks, design a learning agent that can quickly maximize reward in a new, unseen task from the same task distribution. In particular, while each conventional RL agent explored and exploited its own task, the OMRL agent must identify regularities in the data that lead to effective exploration/exploitation in the unseen task. To solve OMRL, we take a Bayesian RL (BRL) view, and seek to learn a Bayes-optimal policy from the offline data. We extend the recently proposed VariBAD BRL algorithm to the off-policy setting, and demonstrate learning of Bayes-optimal exploration strategies from offline data using deep neural networks. Furthermore, when applied to the online meta-RL setting (in which the agent simultaneously collects data and improves its meta-RL policy), our method is significantly more sample efficient than conventional VariBAD.
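
To make the data setting concrete, the following is a minimal sketch of what an OMRL dataset could look like. It is not taken from the paper: the two-armed Bernoulli bandit tasks, the epsilon-greedy learner standing in for a "conventional RL agent", and the names `TaskHistory`, `train_epsilon_greedy`, and `collect_offline_dataset` are all illustrative assumptions. It only shows the input side of the problem (N complete training histories from N sampled tasks), not the VariBAD-based method.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple

# One logged step: (action, reward). A toy bandit has no state, which keeps
# the sketch short; a full MDP would also log states.
Transition = Tuple[int, float]


@dataclass
class TaskHistory:
    """Complete training history of one conventional agent on one task."""
    task_id: int
    transitions: List[Transition]


def train_epsilon_greedy(arm_probs, steps, rng, epsilon=0.1):
    """Run a simple epsilon-greedy learner (an assumed stand-in for a
    conventional RL agent) and log every action and reward it produced."""
    counts = [0] * len(arm_probs)
    means = [0.0] * len(arm_probs)
    log = []
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(len(arm_probs))  # explore
        else:
            arm = max(range(len(arm_probs)), key=lambda a: means[a])  # exploit
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # running average
        log.append((arm, reward))
    return log


def collect_offline_dataset(num_tasks, steps, seed=0):
    """Sample num_tasks tasks and store one full training history per task."""
    rng = random.Random(seed)
    dataset = []
    for task_id in range(num_tasks):
        # Hypothetical task distribution: arm success probabilities ~ U(0, 1).
        arm_probs = [rng.random(), rng.random()]
        history = train_epsilon_greedy(arm_probs, steps, rng)
        dataset.append(TaskHistory(task_id, history))
    return dataset


# N = 5 tasks, each with a 100-step training history.
dataset = collect_offline_dataset(num_tasks=5, steps=100)
```

In this sketch the OMRL agent would receive only `dataset`: it never interacts with the five training tasks, and the unseen test task would be drawn from the same (here, uniform) distribution over arm probabilities.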

research
07/08/2021

Offline Meta-Reinforcement Learning with Online Self-Supervision

Meta-reinforcement learning (RL) can meta-train policies that adapt to n...
research
06/21/2022

Meta Reinforcement Learning with Finite Training Tasks – a Density Estimation Approach

In meta reinforcement learning (meta RL), an agent learns from a set of ...
research
01/27/2022

The Challenges of Exploration for Offline Reinforcement Learning

Offline Reinforcement Learning (ORL) enables us to separately study the t...
research
01/18/2023

Human-Timescale Adaptation in an Open-Ended Task Space

Foundation models have shown impressive adaptation and scalability in su...
research
10/18/2019

VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning

Trading off exploration and exploitation in an unknown environment is ke...
research
06/04/2023

ContraBAR: Contrastive Bayes-Adaptive Deep RL

In meta reinforcement learning (meta RL), an agent seeks a Bayes-optimal...
research
02/24/2022

All You Need Is Supervised Learning: From Imitation Learning to Meta-RL With Upside Down RL

Upside down reinforcement learning (UDRL) flips the conventional use of ...
