Momentum-Based Policy Gradient Methods

07/13/2020
by Feihu Huang, et al.

In this paper, we propose a class of efficient momentum-based policy gradient methods for model-free reinforcement learning that use adaptive learning rates and do not require large batches. Specifically, we propose a fast importance-sampling momentum-based policy gradient (IS-MBPG) method based on a new momentum-based variance reduction technique and the importance sampling technique. We also propose a fast Hessian-aided momentum-based policy gradient (HA-MBPG) method based on the momentum-based variance reduction technique and the Hessian-aided technique. Moreover, we prove that both IS-MBPG and HA-MBPG reach the best known sample complexity of O(ϵ^-3) for finding an ϵ-stationary point of the non-concave performance function, while requiring only one trajectory per iteration. In particular, we present a non-adaptive version of IS-MBPG, i.e., IS-MBPG*, which also reaches the best known sample complexity of O(ϵ^-3) without any large batches. In the experiments, we use four benchmark tasks to demonstrate the effectiveness of our algorithms.
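Below is a minimal, self-contained NumPy sketch of the kind of update the abstract describes: a momentum-based variance-reduced policy-gradient estimator with an importance-sampling correction, in the spirit of IS-MBPG. The toy three-armed bandit, the softmax policy, and the particular momentum and step-size schedules are illustrative assumptions for this sketch, not the paper's exact algorithm or hyperparameters.

```python
# Sketch of a momentum-based variance-reduced policy-gradient update with an
# importance-sampling correction (IS-MBPG-style).  Environment, policy, and
# schedules below are illustrative assumptions, not the paper's algorithm.
import numpy as np

rng = np.random.default_rng(0)
true_rewards = np.array([0.2, 0.5, 0.9])    # toy 3-armed bandit
K = len(true_rewards)

def softmax(theta):
    p = np.exp(theta - theta.max())
    return p / p.sum()

def log_prob_grad(theta, a):
    # gradient of log softmax probability of action a
    p = softmax(theta)
    g = -p
    g[a] += 1.0
    return g, p[a]

def policy_grad(theta, a, r):
    # single-sample REINFORCE-style gradient estimate
    g, _ = log_prob_grad(theta, a)
    return r * g

theta_prev = np.zeros(K)
theta = np.zeros(K)
u = np.zeros(K)                              # momentum-based VR estimator

for t in range(1, 2001):
    beta = min(1.0, 2.0 / t ** (2 / 3))      # assumed momentum schedule
    eta = 0.5 / t ** (1 / 3)                 # assumed step-size schedule

    # one sample per iteration, drawn from the current policy pi_theta
    a = rng.choice(K, p=softmax(theta))
    r = rng.normal(true_rewards[a], 0.1)

    g_new = policy_grad(theta, a, r)
    g_old = policy_grad(theta_prev, a, r)

    # importance weight pi_{theta_prev}(a) / pi_theta(a): the sample was drawn
    # from the current policy, so the old-parameter gradient is reweighted
    _, p_new = log_prob_grad(theta, a)
    _, p_old = log_prob_grad(theta_prev, a)
    w = p_old / p_new

    # STORM-style momentum-based variance-reduced estimator
    u = beta * g_new + (1 - beta) * (u + g_new - w * g_old)

    theta_prev = theta.copy()
    theta = theta + eta * u                  # gradient ascent on expected reward

print("learned preferences:", np.round(theta, 2))
print("best arm:", int(np.argmax(theta)))
```

Each iteration draws a single sample from the current policy, mirroring the one-trajectory-per-iteration property highlighted above; the importance weight reweights the gradient at the previous parameters so that the recursive correction term accounts for the distribution shift between consecutive policies.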
