ProMP: Proximal Meta-Policy Search

10/16/2018
by Jonas Rothfuss, et al.

Credit assignment in meta-reinforcement learning (Meta-RL) is still poorly understood. Existing methods either neglect credit assignment to pre-adaptation behavior or implement it naively. This leads to poor sample efficiency during meta-training as well as ineffective task identification strategies. This paper provides a theoretical analysis of credit assignment in gradient-based Meta-RL. Building on these insights, we develop a novel meta-learning algorithm that overcomes both the issue of poor credit assignment and previous difficulties in estimating meta-policy gradients. By controlling the statistical distance of both pre-adaptation and adapted policies during meta-policy search, the proposed algorithm enables efficient and stable meta-learning. Our approach leads to superior pre-adaptation policy behavior and consistently outperforms previous Meta-RL algorithms in sample efficiency, wall-clock time, and asymptotic performance. Our code is available at https://github.com/jonasrothfuss/promp.
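
The abstract does not spell out how the statistical distance between policies is controlled. As a rough illustration only, the sketch below shows one common way to do this in proximal policy methods: a clipped likelihood-ratio surrogate combined with a KL-divergence penalty. The function names, the discrete-action setting, and the values of `clip_eps` and `kl_coeff` are assumptions made for this example and are not taken from the paper or its code repository.

```python
import math

def clipped_surrogate(logp_new, logp_old, advantages, clip_eps=0.2):
    """PPO-style clipped surrogate, averaged over samples (to be maximized).

    Hypothetical helper: clipping the likelihood ratio keeps the updated
    policy close to the policy that collected the data.
    """
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)  # pi_new(a|s) / pi_old(a|s)
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)

def categorical_kl(p, q):
    """KL(p || q) for two discrete action distributions over the same actions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def penalized_objective(logp_new, logp_old, advantages,
                        pre_old, pre_new, clip_eps=0.2, kl_coeff=0.5):
    """Clipped surrogate minus a KL penalty between an old and a new
    action distribution (here standing in for the pre-adaptation policy).
    The coefficient values are illustrative assumptions."""
    surrogate = clipped_surrogate(logp_new, logp_old, advantages, clip_eps)
    return surrogate - kl_coeff * categorical_kl(pre_old, pre_new)
```

With identical old and new policies the ratio is 1 and the KL term is 0, so the objective reduces to the mean advantage; as the new policy drifts away, the clipping and the penalty both limit how much the objective can reward that drift.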


research
06/04/2020

Meta-Model-Based Meta-Policy Optimization

Model-based reinforcement learning (MBRL) has been applied to meta-learn...
research
01/27/2019

Reward Shaping via Meta-Learning

Reward shaping is one of the most effective methods to tackle the crucia...
research
06/16/2020

META-Learning Eligibility Traces for More Sample Efficient Temporal Difference Learning

Temporal-Difference (TD) learning is a standard and very successful rein...
research
09/25/2019

ES-MAML: Simple Hessian-Free Meta Learning

We introduce ES-MAML, a new framework for solving the model agnostic met...
research
06/05/2023

Meta-SAGE: Scale Meta-Learning Scheduled Adaptation with Guided Exploration for Mitigating Scale Shift on Combinatorial Optimization

This paper proposes Meta-SAGE, a novel approach for improving the scalab...
research
03/09/2021

Scalable Online Recurrent Learning Using Columnar Neural Networks

Structural credit assignment for recurrent learning is challenging. An a...
research
10/22/2019

Bottom-Up Meta-Policy Search

Despite of the recent progress in agents that learn through interaction,...
