Stochastic Variance-Reduced Policy Gradient

06/14/2018
by   Matteo Papini, et al.
0

In this paper, we propose a novel reinforcement- learning algorithm consisting in a stochastic variance-reduced version of policy gradient for solving Markov Decision Processes (MDPs). Stochastic variance-reduced gradient (SVRG) methods have proven to be very successful in supervised learning. However, their adaptation to policy gradient is not straightforward and needs to account for I) a non-concave objective func- tion; II) approximations in the full gradient com- putation; and III) a non-stationary sampling pro- cess. The result is SVRPG, a stochastic variance- reduced policy gradient algorithm that leverages on importance weights to preserve the unbiased- ness of the gradient estimate. Under standard as- sumptions on the MDP, we provide convergence guarantees for SVRPG with a convergence rate that is linear under increasing batch sizes. Finally, we suggest practical variants of SVRPG, and we empirically evaluate them on continuous MDPs.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
05/04/2021

On the Linear convergence of Natural Policy Gradient Algorithm

Markov Decision Processes are classically solved using Value Iteration a...
research
06/25/2019

Policy Optimization with Stochastic Mirror Descent

Stochastic mirror descent (SMD) keeps the advantages of simplicity of im...
research
05/29/2019

An Improved Convergence Analysis of Stochastic Variance-Reduced Policy Gradient

We revisit the stochastic variance-reduced policy gradient (SVRPG) metho...
research
03/09/2020

Stochastic Recursive Momentum for Policy Gradient Methods

In this paper, we propose a novel algorithm named STOchastic Recursive M...
research
06/10/2022

Anchor-Changing Regularized Natural Policy Gradient for Multi-Objective Reinforcement Learning

We study policy optimization for Markov decision processes (MDPs) with m...
research
10/16/2020

Policy Gradient for Continuing Tasks in Non-stationary Markov Decision Processes

Reinforcement learning considers the problem of finding policies that ma...
research
03/09/2022

ReVar: Strengthening Policy Evaluation via Reduced Variance Sampling

This paper studies the problem of data collection for policy evaluation ...

Please sign up or login with your details

Forgot password? Click here to reset