Generalized Policy Improvement Algorithms with Theoretically Supported Sample Reuse

06/28/2022
by   James Queeney, et al.
0

Real-world sequential decision making requires data-driven algorithms that provide practical guarantees on performance throughout training while also making efficient use of data. Model-free deep reinforcement learning represents a framework for such data-driven decision making, but existing algorithms typically only focus on one of these goals while sacrificing performance with respect to the other. On-policy algorithms guarantee policy improvement throughout training but suffer from high sample complexity, while off-policy algorithms make efficient use of data through sample reuse but lack theoretical guarantees. In order to balance these competing goals, we develop a class of Generalized Policy Improvement algorithms that combines the policy improvement guarantees of on-policy methods with the efficiency of theoretically supported sample reuse. We demonstrate the benefits of this new class of algorithms through extensive experimental analysis on a variety of continuous control tasks from the DeepMind Control Suite.

READ FULL TEXT

page 16

page 26

page 27

page 28

research
10/29/2021

Generalized Proximal Policy Optimization with Sample Reuse

In real-world decision making tasks, it is critical for data-driven rein...
research
03/07/2021

The Effect of Q-function Reuse on the Total Regret of Tabular, Model-Free, Reinforcement Learning

Some reinforcement learning methods suffer from high sample complexity c...
research
01/31/2023

Optimal Transport Perturbations for Safe Reinforcement Learning with Robustness Guarantees

Robustness and safety are critical for the trustworthy deployment of dee...
research
09/11/2019

Safe Policy Improvement with an Estimated Baseline Policy

Previous work has shown the unreliability of existing algorithms in the ...
research
07/10/2018

Algorithmic Framework for Model-based Reinforcement Learning with Theoretical Guarantees

While model-based reinforcement learning has empirically been shown to s...
research
09/14/2021

Learning and Decision-Making with Data: Optimal Formulations and Phase Transitions

We study the problem of designing optimal learning and decision-making f...
research
08/17/2018

Importance mixing: Improving sample reuse in evolutionary policy search methods

Deep neuroevolution, that is evolutionary policy search methods based on...

Please sign up or login with your details

Forgot password? Click here to reset