On the Convergence of Discounted Policy Gradient Methods

12/28/2022
by Chris Nota, et al.

Many popular policy gradient methods for reinforcement learning follow a biased approximation of the policy gradient known as the discounted approximation. While it has been shown that the discounted approximation of the policy gradient is not the gradient of any objective function, little else is known about its convergence behavior or properties. In this paper, we show that if the discounted approximation is followed while the discount factor is slowly increased at a rate tied to a decreasing learning rate, the resulting method recovers the standard convergence guarantees of gradient ascent on the undiscounted objective.
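
To make the idea concrete, here is a minimal sketch (not the paper's algorithm or its exact rate conditions): a REINFORCE-style update that follows the discounted approximation of the policy gradient on a small hypothetical tabular MDP, while the discount factor gamma_k is annealed toward 1 as the step size alpha_k decays. The toy MDP, the softmax parameterization, and both schedules below are assumptions made only for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-state, 2-action MDP used only for illustration:
# P[s, a, s'] are transition probabilities, R[s, a] are expected rewards.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.7, 0.3], [0.1, 0.9]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])

theta = np.zeros((2, 2))   # softmax policy logits, one per (state, action)
H = 50                     # fixed episode length for the undiscounted objective

def policy(s):
    """Softmax action distribution for state s."""
    logits = theta[s] - theta[s].max()
    p = np.exp(logits)
    return p / p.sum()

for k in range(2000):
    alpha = 0.5 / (k + 1) ** 0.6        # decreasing step size (assumed schedule)
    gamma = 1.0 - 1.0 / (k + 1) ** 0.4  # discount factor annealed toward 1 (assumed schedule)

    # Sample one episode from the current policy.
    s, traj = 0, []
    for _ in range(H):
        a = rng.choice(2, p=policy(s))
        traj.append((s, a, R[s, a]))
        s = rng.choice(2, p=P[s, a])

    # Discounted-approximation gradient estimate: each log-probability gradient is
    # weighted by the discounted return from its time step, but the gamma^t weighting
    # of the state distribution is omitted -- that omission is the source of the bias.
    g = np.zeros_like(theta)
    G = 0.0
    for s_t, a_t, r_t in reversed(traj):
        G = r_t + gamma * G
        grad_log = -policy(s_t)     # grad of log softmax: e_a - pi(.|s)
        grad_log[a_t] += 1.0
        g[s_t] += grad_log * G

    theta += alpha * g   # ascent step along the biased (discounted) direction

print("final policy:", np.array([policy(s) for s in range(2)]))
```

The coupled schedules are the point: the bias of the discounted direction shrinks as gamma_k approaches 1, while the decaying alpha_k plays the usual stochastic-approximation role. The specific exponents above are placeholders for illustration, not the rates analyzed in the paper.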

Related research

Policy Gradient Methods for Reinforcement Learning with Function Approximation and Action-Dependent Baselines (06/20/2017)
We show how an action-dependent baseline can be used by the policy gradi...

Is the Policy Gradient a Gradient? (06/17/2019)
The policy gradient theorem describes the gradient of the expected disco...

Proximal Policy Gradient: PPO with Policy Gradient (10/20/2020)
In this paper, we propose a new algorithm PPG (Proximal Policy Gradient)...

Joint Optimization of Multi-Objective Reinforcement Learning with Policy Gradient Based Algorithm (05/28/2021)
Many engineering problems have multiple objectives, and the overall aim ...

Inverse Design of Grating Couplers Using the Policy Gradient Method from Reinforcement Learning (06/30/2021)
We present a proof-of-concept technique for the inverse design of electr...

Competitive Policy Optimization (06/18/2020)
A core challenge in policy optimization in competitive Markov decision p...

Quasi-Newton Iteration in Deterministic Policy Gradient (03/25/2022)
This paper presents a model-free approximation for the Hessian of the pe...
