On First-Order Meta-Reinforcement Learning with Moreau Envelopes

05/20/2023
by Mohammad Taha Toghani, et al.

Meta-Reinforcement Learning (MRL) is a promising framework for training agents that can quickly adapt to new environments and tasks. In this work, we study the MRL problem under the policy gradient formulation and propose a novel algorithm that uses Moreau envelope surrogate regularizers to jointly learn a meta-policy that can be adjusted to the environment of each individual task. Our algorithm, called Moreau Envelope Meta-Reinforcement Learning (MEMRL), learns a meta-policy that adapts to a distribution of tasks by efficiently updating the policy parameters through a combination of gradient-based optimization and Moreau envelope regularization. Moreau envelopes provide a smooth approximation of the policy optimization problem, which enables us to apply standard optimization techniques and converge to an appropriate stationary point. We provide a detailed analysis of the MEMRL algorithm and show a sublinear convergence rate to a first-order stationary point for non-convex policy gradient optimization. Finally, we demonstrate the effectiveness of MEMRL on a multi-task 2D-navigation problem.
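
To make the Moreau envelope idea concrete, here is a minimal sketch on a toy problem. It is not the MEMRL algorithm from the paper. It swaps the policy-gradient reward objective for scalar quadratic task losses f_i(theta) = 0.5 * a_i * (theta - c_i)^2. Every name, the task values, and the hyperparameters are assumptions made for illustration. For each task, the envelope at the meta-parameter w is the minimum over theta of f_i(theta) + (lam / 2) * (theta - w)^2, and its gradient with respect to w is lam * (w - theta_i), where theta_i is the inner minimizer. Because of this, the outer update needs only first-order information.

```python
from typing import List, Tuple

# Hypothetical toy task: f(theta) = 0.5 * a * (theta - c)^2, stored as (a, c).
Task = Tuple[float, float]


def task_grad(task: Task, theta: float) -> float:
    """Gradient of the toy quadratic task loss at theta."""
    a, c = task
    return a * (theta - c)


def adapt(task: Task, w: float, lam: float, steps: int = 50, lr: float = 0.1) -> float:
    """Approximately minimize f(theta) + (lam / 2) * (theta - w)^2 by gradient descent.

    The result plays the role of the task-adapted parameter.
    """
    theta = w
    for _ in range(steps):
        theta -= lr * (task_grad(task, theta) + lam * (theta - w))
    return theta


def envelope_grad(task: Task, w: float, lam: float) -> float:
    """First-order gradient of the Moreau envelope at w: lam * (w - theta_adapted)."""
    return lam * (w - adapt(task, w, lam))


def meta_train(tasks: List[Task], lam: float = 1.0, meta_lr: float = 0.5,
               rounds: int = 200) -> float:
    """Gradient descent on the average Moreau envelope over all tasks."""
    w = 0.0
    for _ in range(rounds):
        g = sum(envelope_grad(t, w, lam) for t in tasks) / len(tasks)
        w -= meta_lr * g
    return w


if __name__ == "__main__":
    toy_tasks: List[Task] = [(1.0, -1.0), (2.0, 3.0)]  # assumed values
    w_meta = meta_train(toy_tasks)
    print("meta-parameter:", w_meta)
    for t in toy_tasks:
        print("task", t, "adapted parameter:", adapt(t, w_meta, lam=1.0))
```

For these quadratics the inner problem has the closed-form solution theta_i = (a_i * c_i + lam * w) / (a_i + lam). With the assumed tasks and lam = 1, the exact meta stationary point is therefore w = 9/7, which the loop should approach up to the error of the inexact inner solve. In a reinforcement learning setting the task gradient would be a sampled policy-gradient estimate, not an exact derivative.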


research
11/21/2020

Double Meta-Learning for Data Efficient Policy Optimization in Non-Stationary Environments

We are interested in learning models of non-stationary environments, whi...
research
10/26/2019

Convergent Policy Optimization for Safe Reinforcement Learning

We study the safe reinforcement learning problem with nonlinear function...
research
04/05/2022

Model Based Meta Learning of Critics for Policy Gradients

Being able to seamlessly generalize across different tasks is fundamenta...
research
10/22/2019

Bottom-Up Meta-Policy Search

Despite the recent progress in agents that learn through interaction,...
research
12/29/2017

Smoothed Dual Embedding Control

We revisit the Bellman optimality equation with Nesterov's smoothing tec...
research
01/08/2021

Learning Low-Correlation GPS Spreading Codes with a Policy Gradient Algorithm

With the birth of the next-generation GPS III constellation and the upco...
research
05/18/2021

Meta-Reinforcement Learning by Tracking Task Non-stationarity

Many real-world domains are subject to a structured non-stationarity whi...
