Policy Optimization With Penalized Point Probability Distance: An Alternative To Proximal Policy Optimization

07/02/2018
by   Xiangxiang Chu, et al.

This paper proposes a first-order gradient reinforcement learning algorithm that can be seen as a variant of Trust Region Policy Optimization (TRPO). The method, which we call policy optimization with penalized point probability distance (POP3D), keeps almost all the advantages of proximal policy optimization (PPO), such as easy implementation, fast learning, and high scores. Like PPO, we use a single surrogate objective without constraints, in which a penalty term based on the point probability distance keeps the update step from growing too large. Experiments show that POP3D is state-of-the-art within 40 million frame steps on 49 Atari games under two common metrics, which makes it a competitive alternative to PPO. Comparison experiments against PPO in MuJoCo environments show that POP3D is also competitive in continuous domains. We release the code on GitHub: https://github.com/cxxgtxy/POP3D.git.
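The abstract does not spell out the objective, so the following is only a minimal sketch of what an unconstrained surrogate with a point probability penalty could look like. It assumes the point probability distance is the squared difference between the old and new probabilities of the sampled action; the function name `penalized_surrogate`, its arguments, and the default value of `beta` are hypothetical and are not taken from the released code.

```python
from typing import Sequence


def penalized_surrogate(
    new_probs: Sequence[float],
    old_probs: Sequence[float],
    advantages: Sequence[float],
    beta: float = 5.0,  # hypothetical penalty coefficient, not from the paper
) -> float:
    """Mean of ratio * advantage minus beta * point probability distance.

    new_probs / old_probs hold the probability that the new / old policy
    assigns to the action actually taken at each step.
    ASSUMPTION: the distance is (p_new - p_old) ** 2 for the sampled action.
    """
    if not (len(new_probs) == len(old_probs) == len(advantages)):
        raise ValueError("all inputs must have the same length")
    if len(new_probs) == 0:
        raise ValueError("inputs must be non-empty")

    total = 0.0
    for p_new, p_old, adv in zip(new_probs, old_probs, advantages):
        ratio = p_new / p_old              # importance sampling ratio
        distance = (p_new - p_old) ** 2    # assumed point probability distance
        total += ratio * adv - beta * distance
    return total / len(new_probs)
```

In this sketch the penalty vanishes when the two policies agree on the sampled actions and grows as they diverge, which is the role the abstract assigns to the penalty term. A training loop would maximize this value with a first-order optimizer.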


