Multi-Objective Approaches to Markov Decision Processes with Uncertain Transition Parameters

10/20/2017
by   Dimitri Scheftelowitsch, et al.
0

Markov decision processes (MDPs) are a popular model for performance analysis and optimization of stochastic systems. The parameters of stochastic behavior of MDPs are estimates from empirical observations of a system; their values are not known precisely. Different types of MDPs with uncertain, imprecise or bounded transition rates or probabilities and rewards exist in the literature. Commonly, analysis of models with uncertainties amounts to searching for the most robust policy which means that the goal is to generate a policy with the greatest lower bound on performance (or, symmetrically, the lowest upper bound on costs). However, hedging against an unlikely worst case may lead to losses in other situations. In general, one is interested in policies that behave well in all situations which results in a multi-objective view on decision making. In this paper, we consider policies for the expected discounted reward measure of MDPs with uncertain parameters. In particular, the approach is defined for bounded-parameter MDPs (BMDPs) [8]. In this setting the worst, best and average case performances of a policy are analyzed simultaneously, which yields a multi-scenario multi-objective optimization problem. The paper presents and evaluates approaches to compute the pure Pareto optimal policies in the value vector space.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
01/23/2013

On the Complexity of Policy Iteration

Decision-making problems in uncertain or stochastic domains are often fo...
research
05/31/2022

Robust Anytime Learning of Markov Decision Processes

Markov decision processes (MDPs) are formal models commonly used in sequ...
research
10/24/2019

Simple Strategies in Multi-Objective MDPs (Technical Report)

We consider the verification of multiple expected reward objectives at o...
research
11/30/2022

The Smoothed Complexity of Policy Iteration for Markov Decision Processes

We show subexponential lower bounds (i.e., 2^Ω (n^c)) on the smoothed co...
research
01/31/2023

Reducing Blackwell and Average Optimality to Discounted MDPs via the Blackwell Discount Factor

We introduce the Blackwell discount factor for Markov Decision Processes...
research
06/13/2018

Parameter-Independent Strategies for pMDPs via POMDPs

Markov Decision Processes (MDPs) are a popular class of models suitable ...
research
05/10/2021

Multi-Objective Controller Synthesis with Uncertain Human Preferences

Multi-objective controller synthesis concerns the problem of computing a...

Please sign up or login with your details

Forgot password? Click here to reset