Provably Convergent Policy Gradient Methods for Model-Agnostic Meta-Reinforcement Learning

02/12/2020
by Alireza Fallah, et al.

We consider Model-Agnostic Meta-Learning (MAML) methods for Reinforcement Learning (RL) problems, where the goal is to use data from several tasks, each represented by a Markov Decision Process (MDP), to find a policy that can be updated by one step of stochastic policy gradient for the realized MDP. Using stochastic gradients in the MAML update step is crucial for RL problems, since computing exact gradients requires access to a large number of possible trajectories. For this formulation, we propose a variant of the MAML method, named Stochastic Gradient Meta-Reinforcement Learning (SG-MRL), and study its convergence properties. We derive the iteration and sample complexity of SG-MRL for finding an ϵ-first-order stationary point, which, to the best of our knowledge, provides the first convergence guarantee for model-agnostic meta-reinforcement learning algorithms. We further show how our results extend to the case where more than one step of the stochastic policy gradient method is used in the update at test time.
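To make the formulation concrete, the following minimal sketch estimates the quantity described above: the expected return of a policy after it has been adapted to a sampled task by one stochastic policy gradient step. It is not the SG-MRL algorithm itself, which also needs the gradient of this quantity with respect to the initial parameters. Everything specific here is an assumption made for illustration: the tasks are toy three-armed bandits (single-step MDPs), the policy is a softmax over arms, the gradient estimator is plain REINFORCE, and all function names, the step size `alpha`, and the batch size are hypothetical.

```python
import math
import random


def softmax(theta):
    """Softmax policy over arms (assumed policy class, not from the paper)."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]


def sample_batch(theta, arm_means, batch_size, rng):
    """Sample actions from the policy and noisy rewards from a toy bandit task."""
    probs = softmax(theta)
    actions = rng.choices(range(len(theta)), weights=probs, k=batch_size)
    rewards = [arm_means[a] + rng.gauss(0.0, 0.1) for a in actions]
    return actions, rewards


def stochastic_policy_gradient(theta, actions, rewards):
    """REINFORCE estimate: batch mean of reward * grad log pi(action)."""
    probs = softmax(theta)
    grad = [0.0] * len(theta)
    for a, r in zip(actions, rewards):
        for k in range(len(theta)):
            indicator = 1.0 if k == a else 0.0
            # d/d theta_k of log softmax(theta)[a] is 1[k == a] - pi_k
            grad[k] += r * (indicator - probs[k]) / len(actions)
    return grad


def adapt(theta, arm_means, alpha, batch_size, rng):
    """One step of stochastic policy gradient ascent on the realized task."""
    actions, rewards = sample_batch(theta, arm_means, batch_size, rng)
    grad = stochastic_policy_gradient(theta, actions, rewards)
    return [t + alpha * g for t, g in zip(theta, grad)]


def expected_reward(theta, arm_means):
    """Exact expected reward of the softmax policy on one bandit task."""
    return sum(p * m for p, m in zip(softmax(theta), arm_means))


def meta_objective_estimate(theta, tasks, alpha, batch_size, rng, n_draws=200):
    """Monte Carlo estimate of the post-adaptation return, averaged over tasks."""
    total = 0.0
    for _ in range(n_draws):
        arm_means = rng.choice(tasks)  # draw a task uniformly at random
        adapted = adapt(theta, arm_means, alpha, batch_size, rng)
        total += expected_reward(adapted, arm_means)
    return total / n_draws


# Hypothetical task family: each task rewards a different arm.
TASKS = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

rng = random.Random(0)
theta0 = [0.0, 0.0, 0.0]
print("before adaptation:", expected_reward(theta0, TASKS[0]))
print("after one step:   ", meta_objective_estimate(theta0, TASKS, 1.0, 20, rng))
```

In this toy setting no single fixed policy does well on every task, but a uniform initial policy improves on each task after a single noisy gradient step, which is the kind of initialization the meta-objective rewards. The inner step uses only a small batch of sampled actions, mirroring the point that exact gradients are not available in RL.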


