Meta reinforcement learning as task inference

05/15/2019
by Jan Humplik, et al.

Humans achieve efficient learning by relying on prior knowledge about the structure of naturally occurring tasks. There has been considerable interest in designing reinforcement learning algorithms with similar properties. This includes several proposals to learn the learning algorithm itself, an idea also referred to as meta-learning. One formal interpretation of this idea is in terms of a partially observable multi-task reinforcement learning problem in which information about the task is hidden from the agent. Although agents that solve partially observable environments can be trained from rewards alone, shaping an agent's memory with additional supervision has been shown to boost learning efficiency. It is thus natural to ask what kind of supervision, if any, facilitates meta-learning. Here we explore several choices and develop an architecture that separates learning of the belief about the unknown task from learning of the policy, and that can be used effectively with privileged information about the task during training. We show that this approach can be very effective at solving standard meta-RL environments, as well as a complex continuous control environment in which a simulated robot has to execute various movement sequences.
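
To make the separation between "belief about the unknown task" and "policy" concrete, here is a minimal sketch in a toy setting. It is not the architecture from the paper: the paper learns the belief with supervision from privileged task information, whereas this sketch computes the belief exactly with Bayes' rule in a small Bernoulli bandit family. The task family, the reward probabilities, and all function names (`make_reward_probs`, `update_belief`, `belief_policy`, `run_episode`) are assumptions made for illustration.

```python
import numpy as np


def make_reward_probs(num_tasks, p_good=0.9, p_bad=0.1):
    """Hypothetical task family: in task k, arm k pays with p_good, all others with p_bad."""
    probs = np.full((num_tasks, num_tasks), p_bad)
    np.fill_diagonal(probs, p_good)
    return probs  # indexed as probs[task, arm]


def update_belief(belief, arm, reward, reward_probs):
    """One Bayes step: posterior over tasks after observing (arm, reward)."""
    p_reward = reward_probs[:, arm]
    likelihood = p_reward if reward == 1 else 1.0 - p_reward
    posterior = belief * likelihood
    return posterior / posterior.sum()


def belief_policy(belief, reward_probs, rng):
    """Policy that sees only the belief, never the true task.

    Thompson sampling is an assumed choice here: sample a task from the
    belief, then pull the arm that is best under that sampled task.
    """
    task = rng.choice(len(belief), p=belief)
    return int(np.argmax(reward_probs[task]))


def run_episode(true_task, num_steps, reward_probs, seed=0):
    """Interact with one hidden task and return the final belief over tasks."""
    rng = np.random.default_rng(seed)
    num_tasks = reward_probs.shape[0]
    belief = np.full(num_tasks, 1.0 / num_tasks)  # uniform prior over tasks
    for _ in range(num_steps):
        arm = belief_policy(belief, reward_probs, rng)
        reward = int(rng.random() < reward_probs[true_task, arm])
        belief = update_belief(belief, arm, reward, reward_probs)
    return belief


if __name__ == "__main__":
    probs = make_reward_probs(num_tasks=3)
    print(run_episode(true_task=2, num_steps=50, reward_probs=probs))
```

In this toy case the true task identity is only used to generate rewards; the policy acts on the belief alone. A learned version would replace `update_belief` with a trained model, which is where supervision from privileged task information during training could enter.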


