P3O: Policy-on Policy-off Policy Optimization

05/05/2019
by Rasool Fakoor et al.

On-policy reinforcement learning (RL) algorithms have high sample complexity, while off-policy algorithms are difficult to tune. Merging the two holds the promise of efficient algorithms that generalize across diverse environments. In practice, however, it is challenging to find suitable hyper-parameters that govern this trade-off. This paper develops a simple algorithm named P3O that interleaves off-policy updates with on-policy updates. P3O uses the effective sample size between the behavior policy and the target policy to control how far apart the two policies can be, and it does not introduce any additional hyper-parameters. Extensive experiments on the Atari-2600 and MuJoCo benchmark suites show that this simple technique is highly effective at reducing the sample complexity of state-of-the-art algorithms.
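The abstract names the effective sample size (ESS) between the behavior policy μ and the target policy π as P3O's control signal but does not define it. As a minimal sketch, assuming the standard Kish estimator over importance weights w_i = π(a_i|s_i) / μ(a_i|s_i), the Python below computes a normalized ESS in (0, 1] from per-sample log-probabilities. The function name `normalized_ess` and its inputs are hypothetical, and how P3O turns this number into a constraint on its off-policy updates is not stated in the abstract.

```python
import math
from typing import Sequence


def normalized_ess(target_logp: Sequence[float], behavior_logp: Sequence[float]) -> float:
    """Normalized Kish effective sample size of importance weights.

    Hypothetical helper (not taken from the P3O paper). Each weight is
    w_i = pi(a_i|s_i) / mu(a_i|s_i), given as log-probabilities under the
    target policy pi and the behavior policy mu. Returns
    (sum w)^2 / (n * sum w^2), which lies in (0, 1] and equals 1 when
    every weight is the same (e.g. identical policies).
    """
    if len(target_logp) != len(behavior_logp) or len(target_logp) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    log_w = [t - b for t, b in zip(target_logp, behavior_logp)]
    # Shift by the max log-weight before exponentiating to avoid overflow;
    # the ratio is unchanged because it is invariant to rescaling all weights.
    shift = max(log_w)
    w = [math.exp(x - shift) for x in log_w]
    n = len(w)
    return sum(w) ** 2 / (n * sum(x * x for x in w))


if __name__ == "__main__":
    # Identical policies: every weight is 1, so the normalized ESS is 1.
    print(normalized_ess([0.0, 0.0], [0.0, 0.0]))
    # Weights 2 and 0.5: ESS = 2.5^2 / (2 * 4.25), roughly 0.735.
    print(normalized_ess([math.log(2.0), math.log(0.5)], [0.0, 0.0]))
```

Values near 1 mean the sampled actions are about equally likely under both policies; values near 0 mean a few samples dominate the weights, a sign that the behavior data has drifted far from the target policy.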

research
05/18/2023

Optimistic Natural Policy Gradient: a Simple Efficient Policy Optimization Framework for Online RL

While policy optimization algorithms have played an important role in re...
research
06/24/2019

Ranking Policy Gradient

Sample inefficiency is a long-lasting problem in reinforcement learning ...
research
03/02/2023

The Ladder in Chaos: A Simple and Effective Improvement to General DRL Algorithms by Policy Path Trimming and Boosting

Knowing the learning dynamics of policy is significant to unveiling the ...
research
06/26/2020

DDPG++: Striving for Simplicity in Continuous-control Off-Policy Reinforcement Learning

This paper prescribes a suite of techniques for off-policy Reinforcement...
research
03/08/2018

Learning with Rules

Complex classifiers may exhibit "embarassing" failures in cases that wou...
research
02/11/2015

Off-Policy Reward Shaping with Ensembles

Potential-based reward shaping (PBRS) is an effective and popular techni...
research
12/28/2016

Efficient iterative policy optimization

We tackle the issue of finding a good policy when the number of policy u...
