Safe Policy Improvement with Baseline Bootstrapping

12/19/2017
by   Romain Laroche, et al.

A common goal in Reinforcement Learning is to derive a good strategy given a limited batch of data. In this paper, we adopt the safe policy improvement (SPI) approach: we compute a target policy guaranteed to perform at least as well as a given baseline policy. Our SPI strategy, inspired by the knows-what-it-knows paradigm, consists in bootstrapping the target policy with the baseline policy whenever it does not know. We develop two computationally efficient bootstrapping algorithms, one value-based and one policy-based, both accompanied by theoretical SPI bounds. Three algorithm variants are also proposed. We empirically show the limits of existing algorithms from the literature on a small stochastic gridworld problem, and then demonstrate that our five algorithms improve not only the worst-case scenarios but also the mean performance.
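
The abstract does not spell out the algorithms, so the following is only a minimal sketch of what "bootstrapping the target policy with the baseline policy" could look like in a tabular setting. It assumes that "not knowing" is approximated by a state-action visit count falling below a threshold; the function name `bootstrapped_greedy_step` and the parameter `n_min` are hypothetical and are not taken from the paper.

```python
import numpy as np

def bootstrapped_greedy_step(q, pi_b, counts, n_min):
    """One greedy improvement step that defers to the baseline where data is scarce.

    q, pi_b, counts: arrays of shape (n_states, n_actions) holding estimated
    action values, baseline action probabilities, and visit counts in the batch.
    n_min: hypothetical count threshold (an assumption of this sketch) below
    which a state-action pair is treated as "not known".
    """
    q = np.asarray(q, dtype=float)
    pi_b = np.asarray(pi_b, dtype=float)
    uncertain = np.asarray(counts) < n_min

    # On uncertain pairs, copy the baseline probability unchanged.
    pi = np.where(uncertain, pi_b, 0.0)

    # Probability mass still free to be reassigned in each state.
    free_mass = 1.0 - pi.sum(axis=1)

    # Among well-observed actions, pick the one with the highest estimated value.
    masked_q = np.where(uncertain, -np.inf, q)
    best = masked_q.argmax(axis=1)

    # States where every action is uncertain simply keep the baseline policy.
    rows = np.flatnonzero(~uncertain.all(axis=1))
    pi[rows, best[rows]] += free_mass[rows]
    return pi
```

In a full batch algorithm, a step like this would typically alternate with policy evaluation on a model estimated from the data. The sketch shows only the improvement step and carries none of the paper's theoretical guarantees.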

09/11/2019

Safe Policy Improvement with an Estimated Baseline Policy

Previous work has shown the unreliability of existing algorithms in the ...
07/13/2016

Safe Policy Improvement by Minimizing Robust Baseline Regret

An important problem in sequential decision-making under uncertainty is ...
05/20/2018

Safe Policy Learning from Observations

In this paper, we consider the problem of learning a policy by observing...
02/26/2022

Safe Exploration for Efficient Policy Evaluation and Comparison

High-quality data plays a central role in ensuring the accuracy of polic...
11/08/2021

Safe Optimal Design with Applications in Policy Learning

Motivated by practical needs in online experimentation and off-policy le...
09/15/2020

The Importance of Pessimism in Fixed-Dataset Policy Optimization

We study worst-case guarantees on the expected return of fixed-dataset p...
06/15/2020

Pessimism About Unknown Unknowns Inspires Conservatism

If we could define the set of all bad outcomes, we could hard-code an ag...