Robust Anytime Learning of Markov Decision Processes

05/31/2022
by Marnix Suilen et al.

Markov decision processes (MDPs) are formal models commonly used in sequential decision-making. MDPs capture the stochasticity that may arise, for instance, from imprecise actuators via probabilities in the transition function. However, in data-driven applications, deriving precise probabilities from (limited) data introduces statistical errors that may lead to unexpected or undesirable outcomes. Uncertain MDPs (uMDPs) do not require precise probabilities but instead use so-called uncertainty sets in the transitions, accounting for such limited data. Tools from the formal verification community efficiently compute robust policies that provably adhere to formal specifications, like safety constraints, under the worst-case instance in the uncertainty set. We continuously learn the transition probabilities of an MDP in a robust anytime-learning approach that combines a dedicated Bayesian inference scheme with the computation of robust policies. In particular, our method (1) approximates probabilities as intervals, (2) adapts to new data that may be inconsistent with an intermediate model, and (3) may be stopped at any time to compute a robust policy on the uMDP that faithfully captures the data so far. In an experimental evaluation on several benchmarks, we show the effectiveness of our approach and compare it to robust policies computed on uMDPs learned by the UCRL2 reinforcement learning algorithm.
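The two core ingredients of the abstract can be sketched in code: deriving an interval for a transition probability from observed counts, and computing a worst-case (robust) policy on an interval uMDP. This is an illustrative sketch only, not the paper's exact Bayesian scheme or solver; the normal approximation to the Beta posterior, the function names, and the greedy inner minimization are all assumptions made for this example.

```python
import numpy as np

def beta_interval(successes, trials, z=1.96):
    """Approximate credible interval for a transition probability from counts,
    via a normal approximation to the Beta(1+s, 1+f) posterior.
    Illustrative stand-in for the paper's dedicated Bayesian scheme."""
    a, b = 1 + successes, 1 + trials - successes
    mean = a / (a + b)
    std = np.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    return max(0.0, mean - z * std), min(1.0, mean + z * std)

def worst_case_expectation(lo, hi, V):
    """Minimize sum_i p_i * V_i subject to lo <= p <= hi and sum(p) = 1.
    Greedy: start at the lower bounds and pour the remaining probability
    mass into the successor states with the smallest value first."""
    p = lo.copy()
    remaining = 1.0 - p.sum()
    for i in np.argsort(V):
        add = min(hi[i] - lo[i], remaining)
        p[i] += add
        remaining -= add
        if remaining <= 1e-12:
            break
    return p @ V

def robust_value_iteration(lo, hi, reward, gamma=0.9, tol=1e-8):
    """Robust value iteration on an interval MDP: at each state-action pair,
    an adversary picks the worst-case distribution within the intervals.
    lo, hi: [S, A, S] arrays of transition-probability bounds;
    reward: [S, A] array. Returns the robust values and a greedy policy."""
    S, A, _ = lo.shape
    V = np.zeros(S)
    while True:
        Q = np.empty((S, A))
        for s in range(S):
            for a in range(A):
                Q[s, a] = reward[s, a] + gamma * worst_case_expectation(
                    lo[s, a], hi[s, a], V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new
```

When the lower and upper bounds coincide, the inner minimization is trivial and the procedure reduces to standard value iteration on a precise MDP; widening the intervals (e.g. from fewer observations) can only decrease the robust value, which is what makes the resulting policy conservative under limited data.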


Related research

01/02/2023
Robust Average-Reward Markov Decision Processes
In robust Markov decision processes (MDPs), the uncertainty in the trans...

03/10/2023
Decision-Making Under Uncertainty: Beyond Probabilities
This position paper reflects on the state-of-the-art in decision-making ...

10/20/2017
Multi-Objective Approaches to Markov Decision Processes with Uncertain Transition Parameters
Markov decision processes (MDPs) are a popular model for performance ana...

07/16/2018
Shielded Decision-Making in MDPs
A prominent problem in artificial intelligence and machine learning is t...

09/30/2022
Prioritizing emergency evacuations under compounding levels of uncertainty
Well-executed emergency evacuations can save lives and reduce suffering....

01/22/2020
Cohort state-transition models in R: From conceptualization to implementation
Decision models can synthesize evidence from different sources to provid...

05/20/2019
A Bayesian Approach to Robust Reinforcement Learning
Robust Markov Decision Processes (RMDPs) intend to ensure robustness wit...
