Guided Meta-Policy Search

04/01/2019
by   Russell Mendonca, et al.
8

Reinforcement learning (RL) algorithms have demonstrated promising results on complex tasks, yet often require impractical numbers of samples because they learn from scratch. Meta-RL aims to address this challenge by leveraging experience from previous tasks in order to more quickly solve new tasks. However, in practice, these algorithms generally also require large amounts of on-policy experience during the meta-training process, making them impractical for use in many problems. To this end, we propose to learn a reinforcement learning procedure through imitation of expert policies that solve previously-seen tasks. This involves a nested optimization, with RL in the inner loop and supervised imitation learning in the outer loop. Because the outer loop imitation learning can be done with off-policy data, we can achieve significant gains in meta-learning sample efficiency. In this paper, we show how this general idea can be used both for meta-reinforcement learning and for learning fast RL procedures from multi-task demonstration data. The former results in an approach that can leverage policies learned for previous tasks without significant amounts of on-policy data during meta-training, whereas the latter is particularly useful in cases where demonstrations are easy for a person to provide. Across a number of continuous control meta-RL problems, we demonstrate significant improvements in meta-RL sample efficiency in comparison to prior work as well as the ability to scale to domains with visual observations.

READ FULL TEXT
research
03/19/2019

Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables

Deep reinforcement learning algorithms require large amounts of experien...
research
06/07/2019

Watch, Try, Learn: Meta-Learning from Demonstrations and Reward

Imitation learning allows agents to learn complex behaviors from demonst...
research
12/08/2021

CoMPS: Continual Meta Policy Search

We develop a new continual meta-learning method to address challenges in...
research
05/26/2018

Fast Policy Learning through Imitation and Reinforcement

Imitation learning (IL) consists of a set of tools that leverage expert ...
research
03/01/2022

FIRL: Fast Imitation and Policy Reuse Learning

Intelligent robotics policies have been widely researched for challengin...
research
10/11/2018

One-Shot High-Fidelity Imitation: Training Large-Scale Deep Nets with RL

Humans are experts at high-fidelity imitation -- closely mimicking a dem...
research
08/13/2020

Offline Meta-Reinforcement Learning with Advantage Weighting

Massive datasets have proven critical to successfully applying deep lear...

Please sign up or login with your details

Forgot password? Click here to reset