Safe Policy Improvement with Baseline Bootstrapping

12/19/2017
by Romain Laroche, et al.

A common goal in Reinforcement Learning is to derive a good strategy from a limited batch of data. In this paper, we adopt the safe policy improvement (SPI) approach: we compute a target policy that is guaranteed to perform at least as well as a given baseline policy. Our SPI strategy, inspired by the knows-what-it-knows paradigm, consists of bootstrapping the target policy with the baseline policy when the target policy does not know. We develop two computationally efficient bootstrapping algorithms, one value-based and one policy-based, both accompanied by theoretical SPI bounds. Three algorithm variants are also proposed. We empirically show the limits of algorithms from the literature on a small stochastic gridworld problem, and then demonstrate that our five algorithms improve not only the worst-case scenarios but also the mean performance.
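The bootstrapping idea can be illustrated with a minimal sketch. The abstract does not specify the algorithms, so everything below is an assumption made for illustration: the function name `bootstrapped_greedy_policy`, the count threshold `n_min` used to decide when the policy "does not know", and the tabular arrays `q`, `baseline`, and `counts`. It shows one plausible policy-based step, in which state-action pairs seen fewer than `n_min` times in the batch keep their baseline probability and the remaining probability mass goes to the best-valued well-observed action. It is not the authors' implementation.

```python
import numpy as np


def bootstrapped_greedy_policy(q, baseline, counts, n_min):
    """One greedy improvement step that defers to the baseline on rarely seen pairs.

    q, baseline, counts: arrays of shape (n_states, n_actions).
    n_min: hypothetical count threshold; pairs seen fewer times are "unknown".
    """
    q = np.asarray(q, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    uncertain = np.asarray(counts) < n_min

    # Copy the baseline probabilities on every uncertain state-action pair.
    policy = np.where(uncertain, baseline, 0.0)

    # Probability mass still unassigned in each state.
    free_mass = 1.0 - policy.sum(axis=1)

    # Among well-observed actions only, pick the one with the highest value.
    masked_q = np.where(uncertain, -np.inf, q)
    best = masked_q.argmax(axis=1)

    # States with no well-observed action keep the baseline unchanged.
    rows = np.flatnonzero((~uncertain).any(axis=1))
    policy[rows, best[rows]] += free_mass[rows]
    return policy


q = [[1.0, 2.0, 3.0], [0.0, 5.0, 1.0]]
baseline = [[0.2, 0.3, 0.5], [0.6, 0.2, 0.2]]
counts = [[10, 10, 1], [0, 0, 0]]
policy = bootstrapped_greedy_policy(q, baseline, counts, n_min=5)
```

In the first state of this toy input, the third action has the highest estimated value but was observed only once, so it keeps its baseline probability and the freed mass goes to the second action. In the second state no action is well observed, so the baseline is returned unchanged.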


research
09/11/2019

Safe Policy Improvement with an Estimated Baseline Policy

Previous work has shown the unreliability of existing algorithms in the ...
research
07/13/2016

Safe Policy Improvement by Minimizing Robust Baseline Regret

An important problem in sequential decision-making under uncertainty is ...
research
05/20/2018

Safe Policy Learning from Observations

In this paper, we consider the problem of learning a policy by observing...
research
11/08/2021

Safe Optimal Design with Applications in Policy Learning

Motivated by practical needs in online experimentation and off-policy le...
research
02/26/2022

Safe Exploration for Efficient Policy Evaluation and Comparison

High-quality data plays a central role in ensuring the accuracy of polic...
research
06/15/2020

Pessimism About Unknown Unknowns Inspires Conservatism

If we could define the set of all bad outcomes, we could hard-code an ag...
research
09/15/2020

The Importance of Pessimism in Fixed-Dataset Policy Optimization

We study worst-case guarantees on the expected return of fixed-dataset p...
