PAGE-PG: A Simple and Loopless Variance-Reduced Policy Gradient Method with Probabilistic Gradient Estimation

02/01/2022
by   Matilde Gargiani, et al.
0

Despite their success, policy gradient methods suffer from high variance of the gradient estimate, which can result in unsatisfactory sample complexity. Recently, numerous variance-reduced extensions of policy gradient methods with provably better sample complexity and competitive numerical performance have been proposed. After a compact survey on some of the main variance-reduced REINFORCE-type methods, we propose ProbAbilistic Gradient Estimation for Policy Gradient (PAGE-PG), a novel loopless variance-reduced policy gradient method based on a probabilistic switch between two types of updates. Our method is inspired by the PAGE estimator for supervised learning and leverages importance sampling to obtain an unbiased gradient estimator. We show that PAGE-PG enjoys a 𝒪( ϵ^-3) average sample complexity to reach an ϵ-stationary solution, which matches the sample complexity of its most competitive counterparts under the same setting. A numerical evaluation confirms the competitive performance of our method on classical control tasks.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
03/09/2020

Stochastic Recursive Momentum for Policy Gradient Methods

In this paper, we propose a novel algorithm named STOchastic Recursive M...
research
11/15/2022

An Improved Analysis of (Variance-Reduced) Policy Gradient and Natural Policy Gradient Methods

In this paper, we revisit and improve the convergence of policy gradient...
research
01/30/2023

SoftTreeMax: Exponential Variance Reduction in Policy Gradient via Tree Search

Despite the popularity of policy gradient methods, they are known to suf...
research
09/28/2022

SoftTreeMax: Policy Gradient with Tree Search

Policy-gradient methods are widely used for learning control policies. T...
research
10/30/2022

Robust Data Valuation via Variance Reduced Data Shapley

Data valuation, especially quantifying data value in algorithmic predict...
research
06/28/2020

Deep Bayesian Quadrature Policy Optimization

We study the problem of obtaining accurate policy gradient estimates. Th...
research
05/24/2023

Adaptive Policy Learning to Additional Tasks

This paper develops a policy learning method for tuning a pre-trained po...

Please sign up or login with your details

Forgot password? Click here to reset