Hindsight is Only 50/50: Unsuitability of MDP based Approximate POMDP Solvers for Multi-resolution Information Gathering

04/07/2018
by Sankalp Arora, et al.

Partially Observable Markov Decision Processes (POMDPs) offer an elegant framework for modeling sequential decision making in uncertain environments. Solving POMDPs online is an active area of research, and given the size of real-world problems, approximate solvers are used. Recently, a few approaches have been suggested for solving POMDPs by using MDP solvers in conjunction with imitation learning. MDP-based POMDP solvers work well in some cases while failing catastrophically in others. The main failure point of such solvers is that MDP solvers have no motivation to gain information, since under their assumptions the environment is either already known as well as it can be, or the uncertainty will disappear after the next step. However, when solving POMDP problems, gaining information can lead to efficient solutions. In this paper, we derive a set of conditions under which MDP-based POMDP solvers are provably sub-optimal. We then use the well-known tiger problem to demonstrate such sub-optimality. We show that multi-resolution, budgeted information gathering cannot be addressed using MDP-based POMDP solvers. The contribution of this paper helps identify the properties of a POMDP problem for which the use of MDP-based POMDP solvers is inappropriate, enabling better design choices.
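The following Python sketch makes the failure mode above concrete. It applies the QMDP approximation, Q(b, a) = Σ_s b(s) Q_MDP(s, a), to the tiger problem. QMDP is one common MDP-based POMDP heuristic and is chosen here only for illustration. The solvers analyzed in the paper may differ. Several details are assumptions for illustration, not values taken from the paper: the rewards (-1 to listen, +10 for the safe door, -100 for the tiger's door), the 0.85 listening accuracy, the 0.95 discount, the uniform reset after a door is opened, and all function names.

```python
import numpy as np

# Classic tiger parameters (assumed for illustration; not necessarily the paper's).
GAMMA = 0.95
ACTIONS = ["listen", "open-left", "open-right"]
LISTEN_ACCURACY = 0.85  # used only by the belief update, never by QMDP

# R[s, a] for s in (tiger-left, tiger-right) and a in ACTIONS.
R = np.array([
    [-1.0, -100.0, 10.0],   # tiger behind the left door
    [-1.0, 10.0, -100.0],   # tiger behind the right door
])

# T[a, s, s']: listening leaves the state unchanged; opening a door
# resets the tiger uniformly at random (assumed reset model).
T = np.empty((3, 2, 2))
T[0] = np.eye(2)
T[1] = 0.5
T[2] = 0.5


def mdp_q_values(tol=1e-10):
    """Value iteration on the underlying, fully observable MDP; returns Q[s, a]."""
    v = np.zeros(2)
    while True:
        q = R + GAMMA * np.einsum("ast,t->sa", T, v)
        v_new = q.max(axis=1)
        if np.max(np.abs(v_new - v)) < tol:
            return q
        v = v_new


def qmdp_values(belief, q):
    """QMDP: belief-weighted MDP Q-values. No observation model is consulted."""
    return belief @ q


def listen_update(belief, heard_left, accuracy=LISTEN_ACCURACY):
    """Bayes update of [P(tiger-left), P(tiger-right)] after a listen action."""
    if heard_left:
        likelihood = np.array([accuracy, 1.0 - accuracy])
    else:
        likelihood = np.array([1.0 - accuracy, accuracy])
    posterior = belief * likelihood
    return posterior / posterior.sum()


if __name__ == "__main__":
    q = mdp_q_values()
    belief = np.array([0.5, 0.5])
    for _ in range(3):
        values = qmdp_values(belief, q)
        action = ACTIONS[int(np.argmax(values))]
        print(f"b(tiger-left)={belief[0]:.3f} Q={np.round(values, 1)} -> {action}")
        if action != "listen":
            break
        # Assume the agent keeps hearing the tiger on the left.
        belief = listen_update(belief, heard_left=True)
```

Under these assumptions, QMDP listens at b(tiger-left) = 0.5 and again at 0.85. It opens the right door once two consistent "heard left" observations raise the belief to about 0.97. The value it assigns to listening is about 189 (that is, -1 + 0.95 · 200) for every belief. It is also 189 for any `LISTEN_ACCURACY`, including 0.5, where listening reveals nothing. The continuation term assumes the state is fully known after the next step, so the quality of the information gained never enters the decision.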

research
10/10/2022

Generalized Optimality Guarantees for Solving Continuous Observation POMDPs through Particle Belief MDP Approximation

Partially observable Markov decision processes (POMDPs) provide a flexib...
research
07/30/2022

A Bayesian Approach to Learning Bandit Structure in Markov Decision Processes

In the reinforcement learning literature, there are many algorithms deve...
research
04/07/2016

Planning with Information-Processing Constraints and Model Uncertainty in Markov Decision Processes

Information-theoretic principles for learning and acting have been propo...
research
09/09/2019

Parameter Tuning for Self-optimizing Software at Scale

Efficiency of self-optimizing systems is heavily dependent on their opti...
research
04/26/2010

An approach to visualize the course of solving of a research task in humans

A technique to study the dynamics of solving of a research task is sugge...
research
12/28/2019

Value of structural health monitoring quantification in partially observable stochastic environments

Sequential decision-making under uncertainty for optimal life-cycle cont...
research
09/02/2021

Optimal Path Planning of Autonomous Marine Vehicles in Stochastic Dynamic Ocean Flows using a GPU-Accelerated Algorithm

Autonomous marine vehicles play an essential role in many ocean science ...
