Stochastic Processes with Expected Stopping Time

04/15/2021
by Krishnendu Chatterjee, et al.

Markov chains are the de facto finite-state model for stochastic dynamical systems, and Markov decision processes (MDPs) extend Markov chains by incorporating non-deterministic behavior. Given an MDP and rewards on states, a classical optimization criterion is the maximal expected total reward when the MDP stops after T steps, which can be computed by a simple dynamic programming algorithm. We consider a natural generalization of the problem in which the stopping time can be chosen according to a probability distribution, subject to the expected stopping time being T, so as to optimize the expected total reward. Quite surprisingly, we establish inter-reducibility of the expected stopping-time problem for Markov chains with the Positivity problem (which is related to the well-known Skolem problem), for which establishing either decidability or undecidability would be a major breakthrough. Given the hardness of the exact problem, we consider the approximate version: we show that it can be solved in exponential time for Markov chains and in exponential space for MDPs.
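The "simple dynamic programming algorithm" mentioned for the classical T-step criterion can be sketched as a finite-horizon backward recursion. The states, actions, transition probabilities, and rewards below are illustrative assumptions, not taken from the paper:

```python
def max_expected_total_reward(states, actions, P, r, T):
    """Finite-horizon dynamic program for an MDP stopped after T steps:
    V_0(s) = 0;  V_{t+1}(s) = r(s) + max_a sum_{s'} P[s][a][s'] * V_t(s').
    Returns V_T, the maximal expected total reward from each state."""
    V = {s: 0.0 for s in states}
    for _ in range(T):
        V = {s: r[s] + max(
                 sum(p * V[s2] for s2, p in P[s][a].items())
                 for a in actions[s])
             for s in states}
    return V

# Tiny two-state example (all numbers illustrative):
# from s0, action "a" stays or moves to the absorbing state s1.
states = ["s0", "s1"]
actions = {"s0": ["a", "b"], "s1": ["a"]}
P = {
    "s0": {"a": {"s0": 0.5, "s1": 0.5}, "b": {"s1": 1.0}},
    "s1": {"a": {"s1": 1.0}},
}
r = {"s0": 1.0, "s1": 0.0}

V = max_expected_total_reward(states, actions, P, r, T=2)
```

The recursion runs in time O(T · |states| · |actions| · |states|), which is the baseline the paper's generalization (randomized stopping times with expected value T) departs from.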


