An Improved Convergence Analysis of Stochastic Variance-Reduced Policy Gradient

05/29/2019
by Pan Xu, et al.
We revisit the stochastic variance-reduced policy gradient (SVRPG) method proposed by Papini et al. (2018) for reinforcement learning. We provide an improved convergence analysis of SVRPG and show that it can find an ϵ-approximate stationary point of the performance function within O(1/ϵ^{5/3}) trajectories. This sample complexity improves upon the best known result, O(1/ϵ^2), by a factor of O(1/ϵ^{1/3}). At the core of our analysis are (i) a tighter upper bound on the variance of the importance sampling weights, where we prove that the variance can be controlled by the parameter distance between different policies; and (ii) a fine-grained analysis of the epoch length and batch size parameters, which lets us significantly reduce the number of trajectories required in each iteration of SVRPG. We also empirically validate our theoretical claims about batch sizes on reinforcement learning benchmark tasks.
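To make the size of the improvement concrete, here is a minimal sketch that compares the two trajectory bounds quoted in the abstract. It assumes the constants hidden by the O(·) notation are equal to 1, which the abstract does not state, so the numbers show only how the bounds scale with ϵ and are not actual trajectory counts. The function names are hypothetical and chosen for illustration.

```python
def trajectories_previous(eps: float) -> float:
    """Previous bound O(1/eps^2), with the hidden constant assumed to be 1."""
    return eps ** -2.0


def trajectories_improved(eps: float) -> float:
    """Improved bound O(1/eps^(5/3)), with the hidden constant assumed to be 1."""
    return eps ** (-5.0 / 3.0)


def improvement_factor(eps: float) -> float:
    """Ratio of the two bounds; algebraically this equals eps^(-1/3)."""
    return trajectories_previous(eps) / trajectories_improved(eps)
```

For example, at ϵ = 10^-3 the ratio is (10^-3)^(-1/3) = 10, so under the unit-constant assumption the improved bound asks for about ten times fewer trajectories. The gap widens as ϵ shrinks.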

research
07/13/2020

Momentum-Based Policy Gradient Methods

In the paper, we propose a class of efficient momentum-based policy grad...
research
06/14/2018

Stochastic Variance-Reduced Policy Gradient

In this paper, we propose a novel reinforcement-learning algorithm cons...
research
10/22/2020

Sample Efficient Reinforcement Learning with REINFORCE

Policy gradient methods are among the most effective methods for large-s...
research
06/23/2021

Bregman Gradient Policy Optimization

In this paper, we design a novel Bregman gradient policy optimization fr...
research
05/25/2022

Stochastic Second-Order Methods Provably Beat SGD For Gradient-Dominated Functions

We study the performance of Stochastic Cubic Regularized Newton (SCRN) o...
research
01/07/2020

Reanalysis of Variance Reduced Temporal Difference Learning

Temporal difference (TD) learning is a popular algorithm for policy eval...
research
11/15/2022

An Improved Analysis of (Variance-Reduced) Policy Gradient and Natural Policy Gradient Methods

In this paper, we revisit and improve the convergence of policy gradient...
