Sample Efficient Policy Gradient Methods with Recursive Variance Reduction

09/18/2019
by Pan Xu, et al.

Improving sample efficiency in reinforcement learning has been a long-standing research problem. In this work, we aim to reduce the sample complexity of existing policy gradient methods. We propose a novel policy gradient algorithm called SRVR-PG, which requires only O(1/ϵ^{3/2}) episodes to find an ϵ-approximate stationary point of the nonconcave performance function J(θ) (i.e., θ such that ‖∇J(θ)‖_2^2 ≤ ϵ). This sample complexity improves upon the best known result for policy gradient algorithms, O(1/ϵ^{5/3}), by a factor of O(1/ϵ^{1/6}). We also propose a variant of SRVR-PG with parameter exploration, which samples the initial policy parameter from a prior probability distribution. We conduct numerical experiments on classic control problems in reinforcement learning to validate the performance of both proposed algorithms.
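
The recursive variance reduction named in the title follows the general SARAH/SPIDER pattern: maintain a running gradient estimate that is refreshed from a large reference batch of episodes at the start of each epoch and, in between, corrected with small minibatches, using importance weights so that episodes sampled under the current policy can also estimate the gradient at the previous iterate. The Python sketch below illustrates that pattern on a toy one-step Gaussian-policy problem; the environment, hyperparameters, and helper names (sample_episode, pg_estimate, importance_weight) are illustrative assumptions, not the paper's SRVR-PG algorithm or settings.

import numpy as np

rng = np.random.default_rng(0)

def sample_episode(theta):
    # Toy one-step "episode": action a ~ N(theta, 1), reward r = -(a - 3)^2.
    a = theta + rng.standard_normal()
    return a, -(a - 3.0) ** 2

def grad_log_pi(theta, a):
    # Score function of the Gaussian policy N(theta, 1).
    return a - theta

def pg_estimate(theta, episodes):
    # Vanilla REINFORCE gradient estimate from a batch of episodes.
    return np.mean([grad_log_pi(theta, a) * r for a, r in episodes])

def importance_weight(theta_new, theta_old, a):
    # Likelihood ratio pi_{theta_old}(a) / pi_{theta_new}(a) for the Gaussian policy.
    return np.exp(-0.5 * (a - theta_old) ** 2 + 0.5 * (a - theta_new) ** 2)

theta, eta = 0.0, 0.05
N, B, m = 50, 10, 5  # reference batch, minibatch, epoch length (assumed values)

for epoch in range(20):
    # Refresh the running gradient estimate v from a large reference batch.
    v = pg_estimate(theta, [sample_episode(theta) for _ in range(N)])
    theta_prev, theta = theta, theta + eta * v  # gradient ascent on J(theta)
    for t in range(m - 1):
        episodes = [sample_episode(theta) for _ in range(B)]
        # Recursive correction: minibatch gradient at the new iterate minus an
        # importance-weighted minibatch gradient at the previous iterate.
        delta = np.mean([
            grad_log_pi(theta, a) * r
            - importance_weight(theta, theta_prev, a) * grad_log_pi(theta_prev, a) * r
            for a, r in episodes
        ])
        v = v + delta
        theta_prev, theta = theta, theta + eta * v

print("final policy mean:", theta)  # should approach the optimum at 3.0

In SARAH/SPIDER-style analyses, the periodic refresh from the reference batch is what keeps the error accumulated by the recursive corrections under control.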


Related research

03/09/2020  Stochastic Recursive Momentum for Policy Gradient Methods
02/17/2021  On the Convergence and Sample Efficiency of Variance-Reduced Policy Gradient Method
12/02/2020  Sample Complexity of Policy Gradient Finding Second-Order Stationary Points
03/01/2020  A Hybrid Stochastic Policy Gradient Algorithm for Reinforcement Learning
12/12/2022  Variance-Reduced Conservative Policy Iteration
06/26/2020  DDPG++: Striving for Simplicity in Continuous-control Off-Policy Reinforcement Learning
08/12/2021  A functional mirror ascent view of policy gradient methods with function approximation
