Solving Multi-Objective MDP with Lexicographic Preference: An application to stochastic planning with multiple quantile objective

05/10/2017
by   Yan Li, et al.
In the most common setting of a Markov Decision Process (MDP), an agent evaluates a policy by the expectation of the (discounted) sum of rewards. In many applications, however, this criterion is unsuitable for two reasons. First, in risk-averse settings the expectation of accumulated reward is not robust, particularly when the distribution of accumulated reward is heavily skewed. Second, many applications naturally weigh several objectives when evaluating a policy; in autonomous driving, for instance, an agent must balance speed and safety when choosing a decision. In this paper, we evaluate a policy by the sequence of quantiles it induces on a set of target states. Our idea is to reformulate the original problem as a multi-objective MDP with a naturally defined lexicographic preference. To compute an optimal policy, we propose FLMDP, an algorithm that solves general multi-objective MDPs with lexicographic reward preferences.
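The two ingredients of the abstract can be illustrated with a minimal sketch: comparing policy value vectors under a lexicographic preference, and estimating quantiles of accumulated reward by Monte Carlo sampling. The function names, the tolerance parameter, and the uniform-return toy model below are illustrative assumptions, not part of the paper's FLMDP algorithm.

```python
import random

def lex_better(u, v, eps=1e-6):
    """Return True if value vector u lexicographically dominates v.

    Objectives are compared in priority order; the first strict
    difference (beyond tolerance eps) decides the comparison.
    eps is an illustrative numerical tolerance, not from the paper.
    """
    for a, b in zip(u, v):
        if a > b + eps:
            return True
        if b > a + eps:
            return False
    return False  # equal within tolerance on every objective

def return_quantiles(sample_return, taus, n=10000, seed=0):
    """Estimate quantiles of a policy's accumulated-reward distribution.

    sample_return(rng) is a hypothetical simulator that draws one
    accumulated-reward sample under the policy being evaluated;
    taus is a list of quantile levels in [0, 1].
    """
    rng = random.Random(seed)
    returns = sorted(sample_return(rng) for _ in range(n))
    return [returns[min(int(t * n), n - 1)] for t in taus]
```

A policy is then preferred if its vector of quantile values lexicographically dominates another's, e.g. `lex_better(return_quantiles(p1, taus), return_quantiles(p2, taus))`.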


