Learning to reinforcement learn

11/17/2016
by Jane X Wang, et al.

In recent years deep reinforcement learning (RL) systems have attained superhuman performance in a number of challenging task domains. However, a major limitation of such applications is their demand for massive amounts of training data. A critical present objective is thus to develop deep RL methods that can adapt rapidly to new tasks. In the present work we introduce a novel approach to this challenge, which we refer to as deep meta-reinforcement learning. Previous work has shown that recurrent networks can support meta-learning in a fully supervised context. We extend this approach to the RL setting. What emerges is a system that is trained using one RL algorithm, but whose recurrent dynamics implement a second, quite separate RL procedure. This second, learned RL algorithm can differ from the original one in arbitrary ways. Importantly, because it is learned, it is configured to exploit structure in the training domain. We unpack these points in a series of seven proof-of-concept experiments, each of which examines a key aspect of deep meta-RL. We consider prospects for extending and scaling up the approach, and also point out some potentially important implications for neuroscience.
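The core idea can be sketched in plain Python. An outer algorithm slowly tunes the weights of a recurrent policy across many related tasks, while the policy's hidden-state dynamics handle fast adaptation within each episode. This is a toy illustration under stated assumptions, not the authors' implementation. The task family (two-armed Bernoulli bandits with arm probabilities p and 1 - p), the input encoding (one-hot previous action plus previous reward), the tiny bias-free tanh RNN, and all function names are hypothetical choices made here. The outer loop uses simple random-search hill climbing as a stand-in for the deep RL training algorithm used in the paper.

```python
import math
import random

N_ARMS = 2
HIDDEN = 4
INPUT = N_ARMS + 1  # assumed encoding: one-hot previous action + previous reward


def sample_task(rng):
    """Hypothetical task family: a two-armed Bernoulli bandit with arm probabilities p and 1 - p."""
    p = rng.random()
    return [p, 1.0 - p]


def init_params(rng):
    # Weights of a tiny tanh RNN with a softmax policy head (no biases, for brevity).
    def mat(rows, cols):
        return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]
    return {"W_in": mat(HIDDEN, INPUT), "W_h": mat(HIDDEN, HIDDEN), "W_out": mat(N_ARMS, HIDDEN)}


def rnn_step(params, h, x):
    # h_t = tanh(W_in x_t + W_h h_{t-1}). Weights are frozen inside an episode,
    # so any within-episode adaptation has to be carried by h.
    return [math.tanh(sum(params["W_in"][i][j] * x[j] for j in range(INPUT))
                      + sum(params["W_h"][i][k] * h[k] for k in range(HIDDEN)))
            for i in range(HIDDEN)]


def policy(params, h):
    # Softmax over arm logits (max subtracted for numerical stability).
    logits = [sum(params["W_out"][a][k] * h[k] for k in range(HIDDEN)) for a in range(N_ARMS)]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def run_episode(params, task, n_trials, rng):
    """One episode on one task: hidden state starts at zero and persists across trials."""
    h = [0.0] * HIDDEN
    prev_action, prev_reward = None, 0.0
    total = 0.0
    for _ in range(n_trials):
        x = [1.0 if a == prev_action else 0.0 for a in range(N_ARMS)] + [prev_reward]
        h = rnn_step(params, h, x)
        action = rng.choices(range(N_ARMS), weights=policy(params, h))[0]
        reward = 1.0 if rng.random() < task[action] else 0.0
        total += reward
        prev_action, prev_reward = action, reward
    return total / n_trials


def evaluate(params, seed, n_tasks=20, n_trials=20):
    # Same seed -> same sampled tasks and noise for every candidate (common random numbers).
    rng = random.Random(seed)
    return sum(run_episode(params, sample_task(rng), n_trials, rng) for _ in range(n_tasks)) / n_tasks


def perturb(params, rng, scale):
    return {name: [[w + rng.gauss(0.0, scale) for w in row] for row in m] for name, m in params.items()}


def meta_train(n_iters=30, seed=0, scale=0.1):
    """Outer loop: slow weight changes across many tasks (hill climbing as a stand-in optimizer)."""
    rng = random.Random(seed)
    params = init_params(rng)
    best = evaluate(params, seed=1)
    history = [best]
    for _ in range(n_iters):
        candidate = perturb(params, rng, scale)
        score = evaluate(candidate, seed=1)
        if score >= best:  # keep only candidates that do not score worse
            params, best = candidate, score
        history.append(best)
    return params, history


params, history = meta_train(n_iters=30)
print(f"mean reward per trial: {history[0]:.3f} -> {history[-1]:.3f}")
```

Two design points carry the argument. The hidden state `h` is reset at the start of every episode but never between trials inside one, and `run_episode` never changes `params`. Any adaptation within an episode therefore has to come from the recurrent dynamics, while `meta_train` only adjusts weights between evaluations. Scoring every candidate on the same seeded task set keeps comparisons fair but can overfit to those tasks; evaluating with a different seed gives a held-out estimate.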

02/11/2023

A large parametrized space of meta-reinforcement learning tasks

We describe a parametrized space for simple meta-reinforcement-learning ...
01/19/2023

A Survey of Meta-Reinforcement Learning

While deep reinforcement learning (RL) has fueled multiple high-profile ...
05/28/2021

Improving Generalization in Meta-RL with Imaginary Tasks from Latent Dynamics Mixture

The generalization ability of most meta-reinforcement learning (meta-RL)...
10/08/2021

Offline Meta-Reinforcement Learning for Industrial Insertion

Reinforcement learning (RL) can in principle make it possible for robots...
07/11/2021

Out-of-Distribution Dynamics Detection: RL-Relevant Benchmarks and Results

We study the problem of out-of-distribution dynamics (OODD) detection, w...
02/08/2022

Local Explanations for Reinforcement Learning

Many works in explainable AI have focused on explaining black-box classi...

Code Repositories

gym-bandit-environments

Multi-armed bandit environments for OpenAI Gym
