Bregman Gradient Policy Optimization

06/23/2021
by   Feihu Huang, et al.
0

In this paper, we design a novel Bregman gradient policy optimization framework for reinforcement learning based on Bregman divergences and momentum techniques. Specifically, we propose a Bregman gradient policy optimization (BGPO) algorithm based on the basic momentum technique and mirror descent iteration. At the same time, we present an accelerated Bregman gradient policy optimization (VR-BGPO) algorithm based on a momentum variance-reduced technique. Moreover, we introduce a convergence analysis framework for our Bregman gradient policy optimization under the nonconvex setting. Specifically, we prove that BGPO achieves the sample complexity of Õ(ϵ^-4) for finding ϵ-stationary point only requiring one trajectory at each iteration, and VR-BGPO reaches the best known sample complexity of Õ(ϵ^-3) for finding an ϵ-stationary point, which also only requires one trajectory at each iteration. In particular, by using different Bregman divergences, our methods unify many existing policy optimization algorithms and their new variants such as the existing (variance-reduced) policy gradient algorithms and (variance-reduced) natural policy gradient algorithms. Extensive experimental results on multiple reinforcement learning tasks demonstrate the efficiency of our new algorithms.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
07/13/2020

Momentum-Based Policy Gradient Methods

In the paper, we propose a class of efficient momentum-based policy grad...
research
06/30/2021

AdaGDA: Faster Adaptive Gradient Descent Ascent Methods for Minimax Optimization

In the paper, we propose a class of faster adaptive gradient descent asc...
research
10/21/2019

Momentum in Reinforcement Learning

We adapt the optimization's concept of momentum to reinforcement learnin...
research
06/21/2021

BiAdam: Fast Adaptive Bilevel Optimization Methods

Bilevel optimization recently has attracted increased interest in machin...
research
05/17/2022

Adaptive Momentum-Based Policy Gradient with Second-Order Information

The variance reduced gradient estimators for policy gradient methods has...
research
05/29/2019

An Improved Convergence Analysis of Stochastic Variance-Reduced Policy Gradient

We revisit the stochastic variance-reduced policy gradient (SVRPG) metho...
research
01/28/2023

Stochastic Dimension-reduced Second-order Methods for Policy Optimization

In this paper, we propose several new stochastic second-order algorithms...

Please sign up or login with your details

Forgot password? Click here to reset