Markov decision processes with observation costs

01/19/2022
by Christoph Reisinger, et al.

We present a framework for a controlled Markov chain where the state of the chain is only available at chosen observation times and at a cost. Optimal strategies therefore involve the choice of observation times as well as the subsequent control values. We show that the corresponding value function satisfies a dynamic programming principle, which leads to a system of quasi-variational inequalities (QVIs). Next, we give an extension where the model parameters are not known a priori but are inferred from the costly observations by Bayesian updates. We then prove a comparison principle for a larger class of QVIs, which implies uniqueness of solutions to our proposed problem. We utilise penalty methods to obtain arbitrarily accurate solutions. Finally, we perform numerical experiments on three applications which illustrate our framework.
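To make the trade-off between observation times and control values concrete, here is a minimal toy sketch in Python. It is not the paper's QVI system or its penalty method. It assumes a finite chain, a control held fixed between observations, a discount factor, and a cap on the waiting time. All names (`solve_observation_cost_mdp`, `P`, `f`, `c_obs`, `max_wait`) are hypothetical and chosen only for illustration. At each observation of state i, the controller picks an action a and a number of steps n to wait before paying c_obs for the next observation.

```python
import numpy as np


def solve_observation_cost_mdp(P, f, c_obs, gamma=0.95, max_wait=10,
                               tol=1e-10, max_iter=10_000):
    """Toy value iteration over (action, waiting time) pairs.

    P: array (A, S, S), P[a] is the transition matrix under action a.
    f: array (A, S), running cost per step under action a in each state.
    c_obs: cost paid at every observation (an assumption of this sketch).
    Returns the value at observation times, the chosen action per state,
    and the chosen waiting time per state.
    """
    n_actions, n_states, _ = P.shape
    # run[a, n-1]  : expected discounted running cost over n unobserved steps
    # kern[a, n-1] : gamma^n * P[a]^n, the discounted n-step kernel
    run = np.zeros((n_actions, max_wait, n_states))
    kern = np.zeros((n_actions, max_wait, n_states, n_states))
    for a in range(n_actions):
        Pk = np.eye(n_states)          # holds gamma^k * P[a]^k
        acc = np.zeros(n_states)
        for n in range(1, max_wait + 1):
            acc = acc + Pk @ f[a]      # add the cost of step n-1
            Pk = gamma * (Pk @ P[a])
            run[a, n - 1] = acc
            kern[a, n - 1] = Pk

    V = np.zeros(n_states)
    for _ in range(max_iter):
        # Q[a, n-1, i]: cost of choosing (a, n) after observing state i
        Q = run + kern @ (c_obs + V)
        V_new = Q.min(axis=(0, 1))
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break

    Q = run + kern @ (c_obs + V)
    flat = Q.reshape(-1, n_states).argmin(axis=0)
    actions, waits = np.unravel_index(flat, (n_actions, max_wait))
    return V, actions, waits + 1


if __name__ == "__main__":
    # Hypothetical two-state, two-action example with made-up numbers.
    P = np.array([[[0.9, 0.1], [0.2, 0.8]],
                  [[0.5, 0.5], [0.6, 0.4]]])
    f = np.array([[0.0, 2.0],
                  [0.5, 1.0]])
    V, actions, waits = solve_observation_cost_mdp(P, f, c_obs=0.3)
    print("value:", V, "actions:", actions, "waits:", waits)
```

Capping the waiting time keeps the minimisation finite. With a discount factor below one the update is a contraction, so the iteration converges in this simplified setting.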

research
07/11/2012

Dynamic Programming for Structured Continuous Markov Decision Problems

We describe an approach for exploiting structure in Markov Decision Proc...
research
12/15/2022

Bridging POMDPs and Bayesian decision making for robust maintenance planning under model uncertainty: An application to railway systems

Structural Health Monitoring (SHM) describes a process for inferring qua...
research
12/22/2021

Entropy-Regularized Partially Observed Markov Decision Processes

We investigate partially observed Markov decision processes (POMDPs) wit...
research
06/13/2022

Markov Decision Processes under Model Uncertainty

We introduce a general framework for Markov decision problems under mode...
research
09/26/2013

Approximation of Lorenz-Optimal Solutions in Multiobjective Markov Decision Processes

This paper is devoted to fair optimization in Multiobjective Markov Deci...
research
07/06/2019

Entropic Regularization of Markov Decision Processes

An optimal feedback controller for a given Markov decision process (MDP)...
research
03/29/2017

Optimal Policies for Observing Time Series and Related Restless Bandit Problems

The trade-off between the cost of acquiring and processing data, and unc...
