Is the Policy Gradient a Gradient?

06/17/2019
by Chris Nota, et al.

The policy gradient theorem describes the gradient of the expected discounted return with respect to an agent's policy parameters. However, most policy gradient methods do not use the discount factor in the manner originally prescribed, and therefore do not optimize the discounted objective. It has been an open question in RL which objective, if any, they optimize instead. We show that the direction followed by these methods is not the gradient of any objective, and reclassify them as semi-gradient methods with respect to the undiscounted objective. Further, we show that they are not guaranteed to converge to a locally optimal policy, and construct a counterexample in which they converge to the globally pessimal policy with respect to both the discounted and undiscounted objectives.
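To make the distinction concrete, here is a minimal sketch of the two update directions for a single sampled trajectory, assuming a REINFORCE-style Monte Carlo estimate. The function names and the representation of the per-step score vectors (the gradient of log pi(A_t | S_t) with respect to the policy parameters) are illustrative assumptions, not taken from the paper. The only difference between the two directions is whether the term at time step t is weighted by gamma^t.

```python
def discounted_returns(rewards, gamma):
    """Return G_t = sum_k gamma^k * r_{t+k} for every time step t."""
    returns = [0.0] * len(rewards)
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns


def update_directions(scores, rewards, gamma):
    """Compute two single-trajectory update directions (hypothetical helper).

    scores[t] is assumed to be the score vector grad log pi(A_t | S_t).
    with_factor keeps the gamma^t weight from the policy gradient theorem;
    without_factor drops it, as many implementations do in practice.
    """
    returns = discounted_returns(rewards, gamma)
    dim = len(scores[0])
    with_factor = [0.0] * dim
    without_factor = [0.0] * dim
    for t, (score, g) in enumerate(zip(scores, returns)):
        for i in range(dim):
            with_factor[i] += (gamma ** t) * g * score[i]
            without_factor[i] += g * score[i]
    return with_factor, without_factor


if __name__ == "__main__":
    # Toy two-step trajectory with made-up score vectors.
    scores = [[1.0, 0.0], [0.0, 1.0]]
    rewards = [0.0, 1.0]
    print(update_directions(scores, rewards, gamma=0.5))
```

In this toy trajectory the two directions differ only in the component contributed by the second time step, which the gamma^t weight scales down.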


research
12/28/2022

On the Convergence of Discounted Policy Gradient Methods

Many popular policy gradient methods for reinforcement learning follow a...
research
11/12/2019

On Policy Gradients

The goal of policy gradient approaches is to find a policy in a given cl...
research
12/04/2019

AlgaeDICE: Policy Gradient from Arbitrary Experience

In many real-world applications of reinforcement learning (RL), interact...
research
03/04/2021

On the Convergence and Optimality of Policy Gradient for Markov Coherent Risk

In order to model risk aversion in reinforcement learning, an emerging l...
research
04/10/2023

Epidemic Control on a Large-Scale-Agent-Based Epidemiology Model using Deep Deterministic Policy Gradient

To mitigate the impact of the pandemic, several measures include lockdow...
research
05/11/2023

Policy Gradient Algorithms Implicitly Optimize by Continuation

Direct policy optimization in reinforcement learning is usually solved w...
research
03/15/2023

Policy Gradient Converges to the Globally Optimal Policy for Nearly Linear-Quadratic Regulators

Nonlinear control systems with partial information to the decision maker...
