Utility Theory for Sequential Decision Making

06/27/2022
by   Mehran Shakerinava, et al.

The von Neumann-Morgenstern (VNM) utility theorem shows that, under certain axioms of rationality, decision-making reduces to maximizing the expectation of some utility function. We extend these axioms to increasingly structured sequential decision-making settings and identify the structure of the corresponding utility functions. In particular, we show that memoryless preferences lead to a utility in the form of a per-transition reward and a multiplicative factor on the future return. This result motivates a generalization of Markov Decision Processes (MDPs) with this structure on the agent's returns, which we call Affine-Reward MDPs. A stronger constraint on preferences is needed to recover the commonly used cumulative sum of scalar rewards in MDPs. A yet stronger constraint simplifies the utility function for goal-seeking agents to a difference in some function of states, which we call a potential function. Our necessary and sufficient conditions demystify the reward hypothesis that underlies the design of rational agents in reinforcement learning by adding an axiom to the VNM rationality axioms, and they motivate new directions for AI research involving sequential decision making.
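The return structures named in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's exact formulation, so the recursion below (return = reward + factor × future return, with a terminal return of zero) and the names `affine_return` and `potential_rewards` are assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple


def affine_return(transitions: Sequence[Tuple[float, float]]) -> float:
    """Fold (reward, factor) pairs backward: G_t = r_t + m_t * G_{t+1}.

    Assumption: the return after the last transition is 0.
    """
    g = 0.0
    for reward, factor in reversed(transitions):
        g = reward + factor * g
    return g


def potential_rewards(potentials: Sequence[float]) -> List[float]:
    """Hypothetical goal-seeking case: each reward is the difference in a
    potential function between consecutive states, phi(s') - phi(s)."""
    return [nxt - cur for cur, nxt in zip(potentials, potentials[1:])]


# Per-transition reward with a multiplicative factor on the future return.
# A constant factor of 0.5 behaves like ordinary discounting.
trajectory = [(1.0, 0.5), (2.0, 0.5), (4.0, 0.5)]
discounted = affine_return(trajectory)

# Fixing every factor to 1 gives the plain cumulative sum of rewards.
cumulative = affine_return([(r, 1.0) for r, _ in trajectory])

# With potential-difference rewards the sum telescopes to
# phi(last state) - phi(first state).
phi = [0.0, 2.0, 5.0, 3.0]
goal_return = affine_return([(r, 1.0) for r in potential_rewards(phi)])

print(discounted, cumulative, goal_return)
```

Under these assumptions, the cumulative-sum and potential-difference cases appear as special cases of the same backward recursion, which mirrors the progression of increasingly strong constraints that the abstract describes.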


research
07/22/2023

On the Expressivity of Multidimensional Markov Reward

We consider the expressivity of Markov rewards in sequential decision ma...
research
12/12/2012

Qualitative MDPs and POMDPs: An Order-Of-Magnitude Approximation

We develop a qualitative theory of Markov Decision Processes (MDPs) and ...
research
02/03/2023

A Reduction-based Framework for Sequential Decision Making with Delayed Feedback

We study stochastic delayed feedback in general multi-agent sequential d...
research
06/03/2021

A Provably-Efficient Model-Free Algorithm for Constrained Markov Decision Processes

This paper presents the first model-free, simulator-free reinforcement l...
research
07/11/2023

Sequential Language-based Decisions

In earlier work, we introduced the framework of language-based decisions...
research
01/21/2021

Mechanism Design for Cumulative Prospect Theoretic Agents: A General Framework and the Revelation Principle

This paper initiates a discussion of mechanism design when the participa...
research
12/08/2021

Application of Deep Reinforcement Learning to Payment Fraud

The large variety of digital payment choices available to consumers toda...
