Under-Approximating Expected Total Rewards in POMDPs

01/21/2022
by Alexander Bork et al.

We consider the problem: is the optimal expected total reward to reach a goal state in a partially observable Markov decision process (POMDP) below a given threshold? We tackle this problem, which is undecidable in general, by computing under-approximations of the optimal expected total reward. This is done by abstracting finite unfoldings of the infinite belief MDP of the POMDP. The key issue is to find a suitable under-approximation of the value function. We provide two techniques: a simple (cut-off) technique that uses a good policy on the POMDP, and a more advanced technique (belief clipping) that uses minimal shifts of probability mass between beliefs. We use mixed-integer linear programming (MILP) to find such minimal probability shifts and experimentally show that our techniques scale quite well while providing tight lower bounds on the expected total reward.
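To make the belief-clipping idea concrete: a belief b is clipped onto a candidate belief b̃ by removing probability mass δ(s) ≥ 0 from each state so that the remaining mass, rescaled, equals b̃, i.e. b(s) - δ(s) = (1 - Δ)·b̃(s) with Δ = Σ_s δ(s). The paper finds minimal shifts with an MILP over a grid of candidate beliefs; purely as an illustration, here is a minimal Python sketch under the assumption that the candidate set is given explicitly, in which case the smallest Δ per candidate has a closed form and one can enumerate. Beliefs are plain dicts; the names clipping_value, best_clipping_candidate, and the example grid are hypothetical and not taken from the paper or its tool.

```python
import math

def clipping_value(b, b_tilde):
    """Minimal clipping value Delta for clipping belief b onto b_tilde.

    Beliefs are dicts mapping states to probabilities summing to 1.
    Feasibility requires delta(s) = b(s) - (1 - Delta) * b_tilde(s) >= 0,
    hence Delta >= 1 - b(s) / b_tilde(s) for every s in supp(b_tilde).
    Returns math.inf if only Delta >= 1 would work (clip infeasible).
    """
    delta = 0.0
    for s, p in b_tilde.items():
        if p > 0.0:
            delta = max(delta, 1.0 - b.get(s, 0.0) / p)
    return delta if delta < 1.0 else math.inf

def best_clipping_candidate(b, candidates):
    """Over a finite candidate set (e.g. grid beliefs), return the
    candidate with the smallest clipping value, together with that value."""
    best = min(candidates, key=lambda bt: clipping_value(b, bt))
    return best, clipping_value(b, best)

# Tiny example: clip a belief over {s0, s1, s2} onto grid beliefs whose
# entries are multiples of 1/2 (a hypothetical resolution-2 grid).
b = {"s0": 0.5, "s1": 0.3, "s2": 0.2}
grid = [
    {"s0": 0.5, "s1": 0.5, "s2": 0.0},
    {"s0": 0.5, "s1": 0.0, "s2": 0.5},
    {"s0": 1.0, "s1": 0.0, "s2": 0.0},
]
cand, delta = best_clipping_candidate(b, grid)
print(cand, delta)  # picks the first grid belief, with Delta = 0.4
```

Roughly speaking, the shifted mass Δ is then treated pessimistically (as if it contributed no further reward), which is what keeps the resulting value a valid lower bound on the expected total reward.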


Related research

01/16/2013 · Value-Directed Belief State Approximation for POMDPs
We consider the problem of belief-state monitoring for the purposes of impl...

12/28/2016 · Efficient iterative policy optimization
We tackle the issue of finding a good policy when the number of policy u...

06/20/2019 · Near-optimal Bayesian Solution For Unknown Discrete Markov Decision Process
We tackle the problem of acting in an unknown finite and discrete Markov...

04/15/2021 · Stochastic Processes with Expected Stopping Time
Markov chains are the de facto finite-state model for stochastic dynamic...

08/12/2020 · Deceptive Kernel Function on Observations of Discrete POMDP
This paper studies deception applied to an agent in a partially observa...

07/10/2020 · Efficient MDP Analysis for Selfish-Mining in Blockchains
A proof of work (PoW) blockchain protocol distributes rewards to its par...

02/19/2019 · Computation of the expected value of a function of a chi-distributed random variable
We consider the problem of numerically evaluating the expected value of ...
