Safe Policy Improvement Approaches and their Limitations

08/01/2022
by   Philipp Scholl, et al.
0

Safe Policy Improvement (SPI) is an important technique for offline reinforcement learning in safety critical applications as it improves the behavior policy with a high probability. We classify various SPI approaches from the literature into two groups, based on how they utilize the uncertainty of state-action pairs. Focusing on the Soft-SPIBB (Safe Policy Improvement with Soft Baseline Bootstrapping) algorithms, we show that their claim of being provably safe does not hold. Based on this finding, we develop adaptations, the Adv-Soft-SPIBB algorithms, and show that they are provably safe. A heuristic adaptation, Lower-Approx-Soft-SPIBB, yields the best performance among all SPIBB algorithms in extensive experiments on two benchmarks. We also check the safety guarantees of the provably safe algorithms and show that huge amounts of data are necessary such that the safety bounds become useful in practice.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
01/28/2022

Safe Policy Improvement Approaches on Discrete Markov Decision Processes

Safe Policy Improvement (SPI) aims at provable guarantees that a learned...
research
07/11/2019

Safe Policy Improvement with Soft Baseline Bootstrapping

Batch Reinforcement Learning (Batch RL) consists in training a policy us...
research
09/11/2019

Safe Policy Improvement with an Estimated Baseline Policy

Previous work has shown the unreliability of existing algorithms in the ...
research
05/13/2023

More for Less: Safe Policy Improvement With Stronger Performance Guarantees

In an offline reinforcement learning setting, the safe policy improvemen...
research
09/05/2023

Provably safe systems: the only path to controllable AGI

We describe a path to humanity safely thriving with powerful Artificial ...
research
05/20/2018

Safe Policy Learning from Observations

In this paper, we consider the problem of learning a policy by observing...
research
11/08/2021

Safe Optimal Design with Applications in Policy Learning

Motivated by practical needs in online experimentation and off-policy le...

Please sign up or login with your details

Forgot password? Click here to reset