Learning to reinforcement learn

by Jane X Wang, et al.

In recent years deep reinforcement learning (RL) systems have attained superhuman performance in a number of challenging task domains. However, a major limitation of such applications is their demand for massive amounts of training data. A critical present objective is thus to develop deep RL methods that can adapt rapidly to new tasks. In the present work we introduce a novel approach to this challenge, which we refer to as deep meta-reinforcement learning. Previous work has shown that recurrent networks can support meta-learning in a fully supervised context. We extend this approach to the RL setting. What emerges is a system that is trained using one RL algorithm, but whose recurrent dynamics implement a second, quite separate RL procedure. This second, learned RL algorithm can differ from the original one in arbitrary ways. Importantly, because it is learned, it is configured to exploit structure in the training domain. We unpack these points in a series of seven proof-of-concept experiments, each of which examines a key aspect of deep meta-RL. We consider prospects for extending and scaling up the approach, and also point out some potentially important implications for neuroscience.
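The setup described above can be illustrated with a minimal sketch of the inner interaction loop. A recurrent policy faces a freshly sampled task each episode, and its hidden state is the only place where task-specific knowledge can accumulate. Several details are assumptions made for illustration and are not stated in the abstract: the task family (two-armed Bernoulli bandits), the choice to feed the previous action and reward back in as inputs, and the names `sample_bandit`, `RecurrentPolicy`, and `run_episode`. The weights are random and untrained. The outer RL algorithm that would adjust them across many episodes is omitted.

```python
import numpy as np


def sample_bandit(rng, n_arms=2):
    """Draw a new task: one Bernoulli reward probability per arm (assumed task family)."""
    return rng.uniform(0.0, 1.0, size=n_arms)


class RecurrentPolicy:
    """Tiny vanilla RNN policy. Hypothetical: weights are random, not trained."""

    def __init__(self, n_arms, n_hidden, rng):
        n_in = n_arms + 1  # one-hot previous action plus previous reward
        self.n_arms = n_arms
        self.W_in = rng.normal(0.0, 0.1, (n_hidden, n_in))
        self.W_h = rng.normal(0.0, 0.1, (n_hidden, n_hidden))
        self.W_out = rng.normal(0.0, 0.1, (n_arms, n_hidden))

    def initial_state(self):
        # The hidden state is reset at the start of every episode (new task).
        return np.zeros(self.W_h.shape[0])

    def step(self, h, prev_action, prev_reward):
        # Build the input from what happened on the previous trial.
        x = np.zeros(self.n_arms + 1)
        if prev_action is not None:
            x[prev_action] = 1.0
        x[-1] = prev_reward
        # Recurrent update: any within-episode adaptation happens here,
        # in the activations, while the weights stay fixed.
        h = np.tanh(self.W_in @ x + self.W_h @ h)
        logits = self.W_out @ h
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        return h, probs


def run_episode(policy, rng, n_trials=20):
    """Run one episode on a freshly sampled bandit with fixed weights."""
    arm_probs = sample_bandit(rng, policy.n_arms)
    h = policy.initial_state()
    prev_action, prev_reward = None, 0.0
    actions, rewards = [], []
    for _ in range(n_trials):
        h, probs = policy.step(h, prev_action, prev_reward)
        action = int(rng.choice(policy.n_arms, p=probs))
        reward = float(rng.random() < arm_probs[action])
        actions.append(action)
        rewards.append(reward)
        prev_action, prev_reward = action, reward
    return actions, rewards


rng = np.random.default_rng(0)
policy = RecurrentPolicy(n_arms=2, n_hidden=8, rng=rng)
actions, rewards = run_episode(policy, rng, n_trials=20)
```

With untrained weights the policy behaves close to randomly. The point of the sketch is the data flow: after training with an outer RL algorithm, the update of `h` inside `step` would be the place where the second, learned RL procedure runs.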




A large parametrized space of meta-reinforcement learning tasks

We describe a parametrized space for simple meta-reinforcement-learning ...

A Survey of Meta-Reinforcement Learning

While deep reinforcement learning (RL) has fueled multiple high-profile ...

Improving Generalization in Meta-RL with Imaginary Tasks from Latent Dynamics Mixture

The generalization ability of most meta-reinforcement learning (meta-RL)...

Offline Meta-Reinforcement Learning for Industrial Insertion

Reinforcement learning (RL) can in principle make it possible for robots...

Out-of-Distribution Dynamics Detection: RL-Relevant Benchmarks and Results

We study the problem of out-of-distribution dynamics (OODD) detection, w...

Local Explanations for Reinforcement Learning

Many works in explainable AI have focused on explaining black-box classi...

Code Repositories


Multi-armed bandits environments for OpenAI Gym

