On the Expressivity of Multidimensional Markov Reward

07/22/2023
by Shuwa Miura, et al.

We consider the expressivity of Markov rewards in sequential decision-making under uncertainty. We view reward functions in Markov Decision Processes (MDPs) as a means of characterizing desired agent behaviors. Assuming desired behaviors are specified as a set of acceptable policies, we investigate whether there exists a scalar or multidimensional Markov reward function that makes the policies in the set more desirable than all other policies. Our main result gives both necessary and sufficient conditions for the existence of such reward functions. We also show that for every non-degenerate set of deterministic policies, there exists a multidimensional Markov reward function that characterizes it.
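The question the abstract poses can be made concrete with a small feasibility check. The sketch below (not from the paper; a hypothetical toy construction) uses a two-state, two-action MDP and compares policies by the discounted value r·d_π, where d_π is the policy's discounted state-action occupancy under a uniform start distribution — one standard way to formalize "more desirable." Whether a scalar Markov reward exists that makes every acceptable policy strictly better than every other policy is then a linear-program feasibility question. The acceptable set "always take the same action," {(0,0), (1,1)}, is a classic scalar-inexpressible task of the kind studied in the cited "On the Expressivity of Markov Reward," while a singleton set is expressible:

```python
import numpy as np
from scipy.optimize import linprog

# Toy 2-state, 2-action MDP (hypothetical example): taking action a
# deterministically moves the agent to state a, from either state.
n_states, n_actions = 2, 2
P = np.zeros((n_states, n_actions, n_states))
for s in range(n_states):
    for a in range(n_actions):
        P[s, a, a] = 1.0

GAMMA = 0.9
MU0 = np.array([0.5, 0.5])  # uniform start distribution

def occupancy(policy):
    """Discounted state-action occupancy d_pi, flattened to a vector."""
    Ppi = np.array([P[s, policy[s]] for s in range(n_states)])
    # d solves d = mu0 + gamma * Ppi^T d
    d_states = np.linalg.solve(np.eye(n_states) - GAMMA * Ppi.T, MU0)
    d = np.zeros((n_states, n_actions))
    for s in range(n_states):
        d[s, policy[s]] = d_states[s]
    return d.ravel()

def scalar_reward_exists(acceptable, others, eps=1e-3):
    """LP feasibility: is there r with r.d_a >= r.d_u + eps for all pairs
    of acceptable occupancy d_a and other occupancy d_u?"""
    A_ub = np.array([u - a for a in acceptable for u in others])
    res = linprog(np.zeros(A_ub.shape[1]), A_ub=A_ub,
                  b_ub=-eps * np.ones(len(A_ub)),
                  bounds=[(-1.0, 1.0)] * A_ub.shape[1], method="highs")
    return res.success

# The four deterministic policies, written as (action in s0, action in s1).
d00, d01, d10, d11 = (occupancy(p) for p in [(0, 0), (0, 1), (1, 0), (1, 1)])

# A singleton acceptable set is scalar-expressible...
print(scalar_reward_exists([d00], [d01, d10, d11]))   # True
# ...but "always take the same action" is not:
print(scalar_reward_exists([d00, d11], [d01, d10]))   # False
```

The infeasibility in the second call can be seen directly: summing the constraints "(0,0) beats (0,1)" and "(1,1) beats (0,1)" forces r to weight mixed state-action pairs more, while summing "(0,0) beats (1,0)" and "(1,1) beats (1,0)" forces the opposite, so no scalar reward separates the set — the kind of gap a multidimensional reward is meant to close.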
