Meta reinforcement learning as task inference

by   Jan Humplik, et al.

Humans achieve efficient learning by relying on prior knowledge about the structure of naturally occurring tasks. There has been considerable interest in designing reinforcement learning algorithms with similar properties. This includes several proposals to learn the learning algorithm itself, an idea also referred to as meta learning. One formal interpretation of this idea is in terms of a partially observable multi-task reinforcement learning problem in which information about the task is hidden from the agent. Although agents that solve partially observable environments can be trained from rewards alone, shaping an agent's memory with additional supervision has been shown to boost learning efficiency. It is thus natural to ask what kind of supervision, if any, facilitates meta-learning. Here we explore several choices and develop an architecture that separates learning of the belief about the unknown task from learning of the policy, and that can be used effectively with privileged information about the task during training. We show that this approach can be very effective at solving standard meta-RL environments, as well as a complex continuous control environment in which a simulated robot has to execute various movement sequences.


page 7

page 9

page 11


Unsupervised Meta-Learning for Reinforcement Learning

Meta-learning is a powerful tool that builds on multi-task learning to l...

What is Going on Inside Recurrent Meta Reinforcement Learning Agents?

Recurrent meta reinforcement learning (meta-RL) agents are agents that e...

Evolving Hierarchical Memory-Prediction Machines in Multi-Task Reinforcement Learning

A fundamental aspect of behaviour is the ability to encode salient featu...

Been There, Done That: Meta-Learning with Episodic Recall

Meta-learning agents excel at rapidly learning new tasks from open-ended...

Generalization, Mayhems and Limits in Recurrent Proximal Policy Optimization

At first sight it may seem straightforward to use recurrent layers in De...

Collision Avoidance Robotics Via Meta-Learning (CARML)

This paper presents an approach to exploring a multi-objective reinforce...

Using Meta Reinforcement Learning to Bridge the Gap between Simulation and Experiment in Energy Demand Response

Our team is proposing to run a full-scale energy demand response experim...

Code Repositories


Implementation of VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning - Zintgraf et al. (ICLR 2020)

view repo

Please sign up or login with your details

Forgot password? Click here to reset