Are Deep Policy Gradient Algorithms Truly Policy Gradient Algorithms?

11/06/2018
by Andrew Ilyas, et al.

We study how the behavior of deep policy gradient algorithms reflects the conceptual framework motivating their development. We propose a fine-grained analysis of state-of-the-art methods based on key aspects of this framework: gradient estimation, value prediction, optimization landscapes, and trust region enforcement. We find that from this perspective, the behavior of deep policy gradient algorithms often deviates from what their motivating framework would predict. Our analysis suggests first steps towards solidifying the foundations of these algorithms, and in particular indicates that we may need to move beyond the current benchmark-centric evaluation methodology.
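For context, the "conceptual framework" the abstract refers to is the classical policy gradient: the true gradient of expected return is approximated by sampling trajectories and averaging the score-function terms. Below is a minimal sketch of that estimator, not the paper's code; the network shape, the toy batch, and the function names are illustrative assumptions.

```python
# Minimal sketch (assumed, not from the paper) of the REINFORCE-style
# estimator underlying deep policy gradient methods: the gradient of the
# surrogate below equals the sampled estimate of
#   grad J(theta) ~ mean_t [ grad log pi(a_t | s_t) * G_t ],
# where G_t is the discounted return-to-go.
import torch
import torch.nn as nn

# Illustrative discrete-action policy: 4-dim states, 2 actions.
policy = nn.Sequential(nn.Linear(4, 32), nn.Tanh(), nn.Linear(32, 2))

def surrogate_loss(states, actions, rewards, gamma=0.99):
    """Scalar whose gradient is the policy gradient estimate."""
    returns, g = [], 0.0
    for r in reversed(rewards.tolist()):  # discounted return-to-go
        g = r + gamma * g
        returns.insert(0, g)
    returns = torch.tensor(returns)
    dist = torch.distributions.Categorical(logits=policy(states))
    # Differentiating this sum reproduces sum_t grad log pi(a_t|s_t) * G_t.
    return -(dist.log_prob(actions) * returns).mean()

# Toy batch: one 5-step trajectory (synthetic data, for illustration only).
states = torch.randn(5, 4)
actions = torch.randint(0, 2, (5,))
rewards = torch.ones(5)
surrogate_loss(states, actions, rewards).backward()
# policy.parameters() now hold the sampled gradient estimate.
```

The paper's analysis asks, among other things, how well such sampled estimates agree with the true gradient; variance reduction via a value-function baseline and trust region constraints (as in TRPO/PPO) are refinements layered on top of this basic estimator.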


Related Research

10/17/2017: Stochastic Variance Reduction for Policy Gradient Estimation
Recent advances in policy gradient methods and deep learning have demons...

06/14/2022: How are policy gradient methods affected by the limits of control?
We study stochastic policy gradient methods from the perspective of cont...

05/18/2023: Deep Metric Tensor Regularized Policy Gradient
Policy gradient algorithms are an important family of deep reinforcement...

05/06/2020: Robotic Arm Control and Task Training through Deep Reinforcement Learning
This paper proposes a detailed and extensive comparison of the Trust Reg...

03/24/2022: Non-Parametric Stochastic Policy Gradient with Strategic Retreat for Non-Stationary Environment
In modern robotics, effectively computing optimal control policies under...

06/18/2020: Competitive Policy Optimization
A core challenge in policy optimization in competitive Markov decision p...
