Generalized Proximal Policy Optimization with Sample Reuse

10/29/2021
by James Queeney, et al.

In real-world decision-making tasks, it is critical for data-driven reinforcement learning methods to be both stable and sample efficient. On-policy methods typically generate reliable policy improvement throughout training, while off-policy methods make more efficient use of data through sample reuse. In this work, we combine the theoretically supported stability benefits of on-policy algorithms with the sample efficiency of off-policy algorithms. We develop policy improvement guarantees that are suitable for the off-policy setting, and connect these bounds to the clipping mechanism used in Proximal Policy Optimization (PPO). This motivates an off-policy version of PPO that we call Generalized Proximal Policy Optimization with Sample Reuse. We demonstrate both theoretically and empirically that our algorithm delivers improved performance by effectively balancing the competing goals of stability and sample efficiency.
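For readers unfamiliar with the clipping mechanism mentioned above, here is a minimal sketch of the standard PPO clipped surrogate objective in plain Python. The function name `clipped_surrogate`, the default `eps=0.2`, and the optional `centers` argument are illustrative assumptions, not taken from the abstract. In particular, `centers` only shows one way a clipping range could be shifted when samples come from an older policy; it should not be read as the paper's exact objective.

```python
def clipped_surrogate(ratios, advantages, eps=0.2, centers=None):
    """Average clipped surrogate objective over a batch of samples.

    ratios[i]     : probability ratio pi_new(a|s) / pi_data(a|s) for sample i,
                    where pi_data is the policy that collected the sample
    advantages[i] : advantage estimate for sample i
    centers[i]    : center of the clipping range (hypothetical parameter);
                    1.0 for every sample recovers standard on-policy PPO
    """
    if centers is None:
        centers = [1.0] * len(ratios)  # standard PPO clips around 1
    total = 0.0
    for r, adv, c in zip(ratios, advantages, centers):
        # Restrict the ratio to the interval [c - eps, c + eps].
        clipped = min(max(r, c - eps), c + eps)
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(r * adv, clipped * adv)
    return total / len(ratios)


if __name__ == "__main__":
    # Standard PPO: a large ratio with positive advantage is clipped.
    print(clipped_surrogate([1.5], [1.0]))
    # Shifted range (assumption): the same ratio falls inside [1.2, 1.6].
    print(clipped_surrogate([1.5], [1.0], centers=[1.4]))
```

Taking the minimum removes any incentive to push the ratio beyond the clipping range in the direction that would increase the objective, which is how clipping limits the size of each policy update.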


