Towards Safe Policy Improvement for Non-Stationary MDPs

10/23/2020
by Yash Chandak, et al.

Many real-world sequential decision-making problems involve critical systems with financial risks and risks to human life. While several prior works have proposed methods that are safe for deployment, they assume that the underlying problem is stationary. However, many real-world problems of interest exhibit non-stationarity, and when stakes are high, the cost of a false stationarity assumption may be unacceptable. We take the first steps towards ensuring safety, with high confidence, for smoothly-varying non-stationary decision problems. Our proposed method extends a type of safe algorithm, called a Seldonian algorithm, through a synthesis of model-free reinforcement learning with time-series analysis. Safety is ensured using sequential hypothesis testing of a policy's forecasted performance, and confidence intervals are obtained using the wild bootstrap.
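To make the last sentence of the abstract concrete, here is a minimal sketch of a wild-bootstrap confidence interval for a forecasted performance value. It is an illustration under stated assumptions, not the procedure from the paper: it assumes a plain linear trend over past per-episode performance estimates, Rademacher (random sign) multipliers on the residuals, and a simple percentile interval. The function names, the `horizon` parameter, the sample data, and the `baseline` threshold are all hypothetical.

```python
import random


def fit_line(ts, ys):
    """Ordinary least squares fit of y = a + b * t; returns (a, b)."""
    n = len(ts)
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in ts)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
    b = sxy / sxx
    a = y_mean - b * t_mean
    return a, b


def wild_bootstrap_forecast_interval(ys, horizon=1, n_boot=2000, alpha=0.05, seed=0):
    """Forecast performance `horizon` steps ahead and bootstrap an interval.

    Assumption (not from the source): a linear trend in time, and a
    percentile interval over forecasts refit on resampled data.
    Returns (point_forecast, lower, upper).
    """
    rng = random.Random(seed)
    ts = list(range(1, len(ys) + 1))
    a, b = fit_line(ts, ys)
    fitted = [a + b * t for t in ts]
    residuals = [y - f for y, f in zip(ys, fitted)]
    t_future = len(ys) + horizon
    point = a + b * t_future

    forecasts = []
    for _ in range(n_boot):
        # Wild bootstrap step: flip the sign of each residual at random,
        # keeping every residual attached to its own time index.
        y_star = [f + rng.choice((-1.0, 1.0)) * e for f, e in zip(fitted, residuals)]
        a_s, b_s = fit_line(ts, y_star)
        forecasts.append(a_s + b_s * t_future)

    forecasts.sort()
    lower = forecasts[int((alpha / 2) * (n_boot - 1))]
    upper = forecasts[int((1 - alpha / 2) * (n_boot - 1))]
    return point, lower, upper


# Hypothetical past performance estimates of a candidate policy.
past_performance = [1.0, 1.2, 1.1, 1.4, 1.5, 1.7, 1.6, 1.9]
point, lower, upper = wild_bootstrap_forecast_interval(past_performance)

# Hypothetical safety check: accept only if the lower bound clears a baseline.
baseline = 1.0
is_safe = lower >= baseline
```

Because each residual stays at its own time index and only its sign changes, the resampled series keep any time-varying noise scale, which is the usual motivation for a wild bootstrap over a plain residual bootstrap. A safety test in this spirit would compare the lower bound of the candidate policy's forecast against the performance of the existing policy.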


research
06/01/2023

Non-stationary Reinforcement Learning under General Function Approximation

General function approximation is a powerful tool to handle large state ...
research
05/17/2020

Optimizing for the Future in Non-Stationary MDPs

Most reinforcement learning methods are based upon the key assumption th...
research
01/24/2023

Off-Policy Evaluation for Action-Dependent Non-Stationary Environments

Methods for sequential decision-making are often built upon a foundation...
research
03/04/2018

SAFE: Spectral Evolution Analysis Feature Extraction for Non-Stationary Time Series Prediction

This paper presents a practical approach for detecting non-stationarity ...
research
01/02/2021

Context-Aware Safe Reinforcement Learning for Non-Stationary Environments

Safety is a critical concern when deploying reinforcement learning agent...
research
09/05/2023

Neurosymbolic Meta-Reinforcement Lookahead Learning Achieves Safe Self-Driving in Non-Stationary Environments

In the area of learning-driven artificial intelligence advancement, the ...
research
09/13/2021

Pre-emptive learning-to-defer for sequential medical decision-making under uncertainty

We propose SLTD (`Sequential Learning-to-Defer') a framework for learnin...
