Trust Region-Guided Proximal Policy Optimization

01/29/2019
by   Yuhui Wang, et al.
0

Model-free reinforcement learning relies heavily on a safe yet exploratory policy search. Proximal policy optimization (PPO) is a prominent algorithm to address the safe search problem, by exploiting a heuristic clipping mechanism motivated by a theoretically-justified "trust region" guidance. However, we found that the clipping mechanism of PPO could lead to a lack of exploration issue. Based on this finding, we improve the original PPO with an adaptive clipping mechanism guided by a "trust region" criterion. Our method, termed as Trust Region-Guided PPO (TRPPO), improves PPO with more exploration and better sample efficiency, while maintains the safe search property and design simplicity of PPO. On several benchmark tasks, TRPPO significantly outperforms the original PPO and is competitive with several state-of-the-art methods.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
03/19/2019

Truly Proximal Policy Optimization

Proximal policy optimization (PPO) is one of the most successful deep re...
research
11/11/2019

Multi-Path Policy Optimization

Recent years have witnessed a tremendous improvement of deep reinforceme...
research
09/27/2018

Boosting Trust Region Policy Optimization by Normalizing Flows Policy

We propose to improve trust region policy search with normalizing flows ...
research
04/17/2018

An Adaptive Clipping Approach for Proximal Policy Optimization

Very recently proximal policy optimization (PPO) algorithms have been pr...
research
09/06/2019

Adaptive Trust Region Policy Optimization: Global Convergence and Faster Rates for Regularized MDPs

Trust region policy optimization (TRPO) is a popular and empirically suc...
research
05/29/2018

Supervised Policy Update

We propose a new sample-efficient methodology, called Supervised Policy ...
research
07/13/2022

Grassmanian packings: Trust region stochastic tuning for matrix incoherence

We provide a new numerical procedure for constructing low coherence matr...

Please sign up or login with your details

Forgot password? Click here to reset