A Logarithmic Barrier Method For Proximal Policy Optimization

12/16/2018
by Cheng Zeng, et al.

Proximal policy optimization (PPO) has been proposed as a first-order optimization method for reinforcement learning. Notably, it relies on an exterior penalty method. The minimizers of exterior penalty functions often approach feasibility only in the limit as the penalty parameter grows increasingly large, which may lead to low sampling efficiency. The method proposed here, which we call proximal policy optimization with barrier method (PPO-B), keeps almost all the advantages of PPO, such as easy implementation and good generalization. Specifically, a new surrogate objective based on an interior penalty method is proposed to avoid the defect arising from the exterior penalty method. We conclude that PPO-B outperforms PPO in terms of sampling efficiency, since PPO-B achieved clearly better performance than PPO on Atari and MuJoCo environments.
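The contrast between exterior and interior penalties can be seen on a toy problem. The sketch below is only an illustration and is not the PPO-B surrogate objective, which the abstract does not spell out. It assumes a hypothetical one-dimensional problem, minimize (x - 2)^2 subject to x <= 1, and compares the closed-form minimizers of a quadratic exterior penalty (weight `mu`) with those of a logarithmic barrier (weight `t`). The function names and the choice of problem are assumptions made for this example.

```python
import math


def exterior_penalty_minimizer(mu: float) -> float:
    """Minimizer of (x - 2)^2 + mu * max(0, x - 1)^2 for mu > 0.

    For x > 1 the stationarity condition 2(x - 2) + 2*mu*(x - 1) = 0
    gives x = (2 + mu) / (1 + mu), which is always greater than 1.
    """
    return (2.0 + mu) / (1.0 + mu)


def log_barrier_minimizer(t: float) -> float:
    """Minimizer of (x - 2)^2 - t * log(1 - x) over x < 1 for t > 0.

    Stationarity 2(x - 2) + t / (1 - x) = 0 reduces to
    2x^2 - 6x + (4 - t) = 0; the root below is the one with x < 1.
    """
    return (3.0 - math.sqrt(1.0 + 2.0 * t)) / 2.0


if __name__ == "__main__":
    # Tighten both methods: larger mu for the penalty, smaller t for the barrier.
    for mu, t in [(1.0, 1.0), (10.0, 0.1), (100.0, 0.01)]:
        x_pen = exterior_penalty_minimizer(mu)
        x_bar = log_barrier_minimizer(t)
        print(f"mu={mu:>6}: penalty x={x_pen:.4f} (feasible: {x_pen <= 1.0}) | "
              f"t={t:>5}: barrier x={x_bar:.4f} (feasible: {x_bar <= 1.0})")
```

In this toy setting the exterior-penalty minimizer violates the constraint for every finite `mu` and reaches the boundary only in the limit, while the barrier minimizer stays strictly feasible for every `t > 0`. That is the general behavior the abstract attributes to the two families of methods.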


research
04/17/2018

An Adaptive Clipping Approach for Proximal Policy Optimization

Very recently proximal policy optimization (PPO) algorithms have been pr...
research
12/04/2020

Proximal Policy Optimization Smoothed Algorithm

Proximal policy optimization (PPO) has yielded state-of-the-art results ...
research
10/21/2019

IPO: Interior-point Policy Optimization under Constraints

In this paper, we study reinforcement learning (RL) algorithms to solve ...
research
05/24/2022

Penalized Proximal Policy Optimization for Safe Reinforcement Learning

Safe reinforcement learning aims to learn the optimal policy while satis...
research
10/20/2021

CIM-PPO:Proximal Policy Optimization with Liu-Correntropy Induced Metric

As an algorithm based on deep reinforcement learning, Proximal Policy Op...
research
07/25/2016

Accelerating Stochastic Composition Optimization

Consider the stochastic composition optimization problem where the objec...
research
07/02/2018

Policy Optimization With Penalized Point Probability Distance: An Alternative To Proximal Policy Optimization

This paper proposes a first order gradient reinforcement learning algori...
