A Geometric Traversal Algorithm for Reward-Uncertain MDPs

02/14/2012
by Eunsoo Oh, et al.

Markov decision processes (MDPs) are widely used to model decision-making problems in stochastic environments. However, precisely specifying the reward function of an MDP is often difficult. Recent approaches compute a policy under the minimax regret criterion in order to obtain a robust policy when the reward function is uncertain. One of the core tasks in computing the minimax regret policy is obtaining the set of all policies that can be optimal for some candidate reward function. In this paper, we propose an efficient algorithm that exploits the geometric properties of the reward functions associated with the policies. We also present an approximate version of the method for further speed-up. We experimentally demonstrate that our algorithm improves performance by orders of magnitude.
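The geometric property the abstract alludes to is that a policy's value is a *linear* function of the reward vector: V_π(r) = f_π · r, where f_π is the policy's discounted state-action visitation frequency vector. A policy can be optimal for some reward exactly when its hyperplane lies on the upper envelope over the reward uncertainty set, and the max regret of a policy is the worst-case gap to that envelope. The sketch below illustrates these definitions (not the paper's traversal algorithm) by brute-force enumeration on a hypothetical two-state, two-action MDP, with reward uncertainty given as a finite set of candidate reward vectors; all numbers and names are illustrative assumptions.

```python
import itertools
import numpy as np

gamma = 0.9
n_states, n_actions = 2, 2
# P[s, a] = next-state distribution (hypothetical toy MDP)
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.9, 0.1]]])
mu = np.array([0.5, 0.5])  # initial state distribution

def occupancy(policy):
    """Discounted state-action visitation frequencies f_pi.

    Solves d = mu + gamma * P_pi^T d, so that V_pi(r) = f_pi . r:
    the value of a fixed policy is linear in the reward vector.
    """
    P_pi = np.array([P[s, policy[s]] for s in range(n_states)])
    d = np.linalg.solve((np.eye(n_states) - gamma * P_pi).T, mu)
    f = np.zeros((n_states, n_actions))
    for s in range(n_states):
        f[s, policy[s]] = d[s]
    return f.ravel()

# Enumerate all deterministic policies (feasible only for tiny MDPs;
# the paper's contribution is avoiding exactly this blow-up).
policies = list(itertools.product(range(n_actions), repeat=n_states))
F = np.array([occupancy(pi) for pi in policies])

# Reward uncertainty as a finite set of candidate reward vectors,
# e.g. the vertices of a bounded reward polytope (regret is maximized
# at a vertex because it is linear in r for fixed policies).
R = [np.array([1.0, 0.0, 0.0, 1.0]),
     np.array([0.0, 1.0, 1.0, 0.0])]

# Max regret of each policy: worst-case value gap to the best
# competing policy over all candidate rewards.
regret = np.array([max((F @ r).max() - F[i] @ r for r in R)
                   for i in range(len(policies))])
best = int(np.argmin(regret))
print("minimax-regret policy:", policies[best], "max regret:", regret[best])
```

The inner maximum `(F @ r).max()` is where the set of potentially optimal policies matters: only policies on the upper envelope can ever attain it, so pruning the rest, as the paper's geometric traversal does, shrinks the minimax computation without changing its result.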


