Stochastic Policy Gradient Methods: Improved Sample Complexity for Fisher-non-degenerate Policies

02/03/2023
by Ilyas Fatkhullin, et al.

The impressive empirical success of policy gradient (PG) methods has recently catalyzed the development of their theoretical foundations. Despite the considerable effort directed at the design of efficient stochastic PG-type algorithms, the understanding of their convergence to a globally optimal policy is still limited. In this work, we develop improved global convergence guarantees for a general class of Fisher-non-degenerate parameterized policies, which allows us to address the case of continuous state-action spaces. First, we propose a Normalized Policy Gradient method with Implicit Gradient Transport (N-PG-IGT) and derive an 𝒪̃(ε^-2.5) sample complexity for finding a globally ε-optimal policy. Improving on the previously known 𝒪̃(ε^-3) complexity, this algorithm does not require importance sampling or second-order information and samples only one trajectory per iteration. Second, we further improve this complexity to 𝒪̃(ε^-2) by considering a Hessian-Aided Recursive Policy Gradient ((N)-HARPG) algorithm enhanced with a correction based on a Hessian-vector product. Interestingly, both algorithms are (i) simple and easy to implement: they are single-loop, avoid large batches of trajectories, and sample at most two trajectories per iteration; and (ii) computationally and memory efficient: they require no expensive subroutines at each iteration and can be implemented with memory linear in the dimension of the parameters.
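To make the two update rules concrete, here is a minimal, illustrative Python sketch, not the authors' implementation: the step-size and momentum schedules, the toy objective, and the stochastic oracles stoch_grad and stoch_hvp (stand-ins for single-trajectory gradient and Hessian-vector-product estimators such as REINFORCE/GPOMDP) are all assumptions made solely to produce a runnable example.

```python
# Illustrative sketch of the N-PG-IGT and (N)-HARPG updates described in
# the abstract. NOT the authors' code: schedules, oracles, and the toy
# objective (a noisy concave quadratic standing in for the expected
# return J) are assumptions.
import numpy as np

rng = np.random.default_rng(0)

def stoch_grad(theta):
    # Stand-in for a single-trajectory policy-gradient estimate of grad J.
    # Toy objective: J(theta) = -0.5 * ||theta - 1||^2, plus noise.
    return -(theta - 1.0) + 0.1 * rng.standard_normal(theta.shape)

def stoch_hvp(theta, v):
    # Stand-in for a stochastic Hessian-vector product of J at theta.
    # For the toy quadratic the Hessian is -I.
    return -v + 0.1 * rng.standard_normal(v.shape)

def n_pg_igt(theta, T=2000):
    # Normalized PG with Implicit Gradient Transport:
    # one trajectory per iteration, normalized ascent step.
    theta_prev = theta.copy()
    d = np.zeros_like(theta)
    for t in range(1, T + 1):
        eta, gamma = t ** -0.6, 0.5 * t ** -0.4   # assumed schedules
        # Gradient transport: query the oracle at an extrapolated point
        # instead of at the current iterate (eta = 1 at t = 1, so z = theta).
        z = theta + (1.0 - eta) / eta * (theta - theta_prev)
        d = (1.0 - eta) * d + eta * stoch_grad(z)
        theta_prev, theta = theta, theta + gamma * d / (np.linalg.norm(d) + 1e-12)
    return theta

def n_harpg(theta, T=2000, eta=0.1, gamma=0.05):
    # (N)-HARPG: recursive gradient estimator with a Hessian-vector-product
    # correction; at most two trajectories (gradient + HVP) per iteration.
    d = stoch_grad(theta)
    for _ in range(T):
        theta_next = theta + gamma * d / (np.linalg.norm(d) + 1e-12)
        # Correct the running estimate along the segment theta -> theta_next,
        # with the HVP evaluated at a uniformly random point of the segment.
        q = rng.uniform()
        mid = (1.0 - q) * theta + q * theta_next
        d = ((1.0 - eta) * (d + stoch_hvp(mid, theta_next - theta))
             + eta * stoch_grad(theta_next))
        theta = theta_next
    return theta

print(n_pg_igt(np.zeros(3)))   # both should end up near the optimum at 1
print(n_harpg(np.zeros(3)))
```

Note how the normalization by ‖d‖ ties into the abstract's claims (i) and (ii): the step length is governed by the schedule γ alone rather than by the magnitude of a noisy estimate, which is what lets each iteration get by with one or two sampled trajectories and no large batches, while the per-iteration state is just the pair (θ, d), i.e. memory linear in the parameter dimension.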

Related research

Momentum-Based Policy Gradient Methods (07/13/2020)
In the paper, we propose a class of efficient momentum-based policy grad...

On the Global Convergence of Momentum-based Policy Gradient (10/19/2021)
Policy gradient (PG) methods are popular and efficient for large-scale r...

Quasi-Newton Iteration in Deterministic Policy Gradient (03/25/2022)
This paper presents a model-free approximation for the Hessian of the pe...

Stochastic Dimension-reduced Second-order Methods for Policy Optimization (01/28/2023)
In this paper, we propose several new stochastic second-order algorithms...

Deterministic Policy Optimization by Combining Pathwise and Score Function Estimators for Discrete Action Spaces (11/21/2017)
Policy optimization methods have shown great promise in solving complex ...

Algorithms and Bounds for Rollout Sampling Approximate Policy Iteration (05/14/2008)
Several approximate policy iteration schemes without value functions, wh...

Homotopic Policy Mirror Descent: Policy Convergence, Implicit Regularization, and Improved Sample Complexity (01/24/2022)
We propose the homotopic policy mirror descent (HPMD) method for solving...
