Proximal Policy Gradient: PPO with Policy Gradient

10/20/2020
by Ju-Seung Byun, et al.

In this paper, we propose a new algorithm, PPG (Proximal Policy Gradient), which is close to both VPG (vanilla policy gradient) and PPO (proximal policy optimization). The PPG objective is a partial variation of the VPG objective, and the gradient of the PPG objective is exactly the same as the gradient of the VPG objective. To increase the number of policy update iterations, we introduce the advantage-policy plane and design a new clipping strategy. We run experiments in OpenAI Gym and Bullet robotics environments with ten random seeds. The performance of PPG is comparable to that of PPO, and its entropy decays more slowly than PPO's. We thus show that performance similar to PPO can be obtained by using the gradient formula from the original policy gradient theorem.
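The abstract does not spell out the exact clipping rule on the advantage-policy plane, so the sketch below is only an illustration under assumptions: it keeps a vanilla-policy-gradient surrogate (advantage times log-probability, so its gradient matches the VPG gradient on the retained samples) and applies a hypothetical PPO-style mask that drops samples whose probability ratio has already moved past a clip threshold in the direction favoured by the advantage. The function name ppg_style_loss and the masking rule are assumptions for illustration, not the paper's exact method.

    # Minimal, illustrative sketch of a VPG-style surrogate with a clipping mask.
    # The precise clipping strategy is NOT given in the abstract; the mask used
    # here is an assumption made only to show the general shape of such a loss.
    import torch

    def ppg_style_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
        """Surrogate whose gradient equals the vanilla policy gradient
        (advantage * grad log pi) on the samples kept by the clipping mask."""
        ratios = torch.exp(new_log_probs - old_log_probs.detach())
        # Hypothetical mask: keep a sample only while the updated policy has not
        # moved more than clip_eps in the direction favoured by its advantage.
        keep = torch.where(
            advantages >= 0,
            ratios <= 1.0 + clip_eps,
            ratios >= 1.0 - clip_eps,
        ).float()
        # VPG-style term: gradient is advantage * grad log pi on kept samples.
        surrogate = advantages.detach() * new_log_probs
        return -(keep.detach() * surrogate).mean()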
