Optimizing Quantiles in Preference-based Markov Decision Processes

12/01/2016
by   Hugo Gilbert, et al.

In the Markov decision process (MDP) model, policies are usually evaluated by their expected cumulative reward. Because this decision criterion is not always suitable, we propose in this paper an algorithm for computing a policy that is optimal for the quantile criterion. Both finite and infinite horizons are considered. Finally, we experimentally evaluate our approach on random MDPs and on a data center control problem.
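
To illustrate why the expectation and the quantile criterion can rank policies differently, here is a minimal sketch. It is not the algorithm proposed in the paper. The two outcome distributions (`risky`, `safe`), the function names, and the choice of the lower quantile (the smallest outcome x with P(X <= x) >= tau) are all assumptions made for illustration only.

```python
from typing import Dict


def expected_value(dist: Dict[float, float]) -> float:
    """Expectation of a discrete distribution given as {outcome: probability}."""
    return sum(x * p for x, p in dist.items())


def lower_quantile(dist: Dict[float, float], tau: float) -> float:
    """Smallest outcome x such that P(X <= x) >= tau (assumed definition)."""
    if not 0.0 < tau <= 1.0:
        raise ValueError("tau must lie in (0, 1]")
    cumulative = 0.0
    for x in sorted(dist):
        cumulative += dist[x]
        # Small tolerance guards against floating-point rounding in the sum.
        if cumulative >= tau - 1e-12:
            return x
    return max(dist)


# Hypothetical cumulative-reward distributions induced by two policies.
risky = {0.0: 0.6, 100.0: 0.4}  # high payoff, but fails more often than not
safe = {30.0: 1.0}              # always yields a moderate payoff

# Which policy each criterion prefers (tau = 0.5 is the median).
best_by_mean = "risky" if expected_value(risky) > expected_value(safe) else "safe"
best_by_median = "risky" if lower_quantile(risky, 0.5) > lower_quantile(safe, 0.5) else "safe"
```

Under these assumed numbers the expectation favors the risky policy while the median favors the safe one, which is the kind of disagreement that motivates optimizing a quantile instead of the mean.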


