B^3RTDP: A Belief Branch and Bound Real-Time Dynamic Programming Approach to Solving POMDPs

Partially Observable Markov Decision Processes (POMDPs) offer a promising world representation for autonomous agents, as they can model both transitional and perceptual uncertainties. Calculating the optimal solution to a POMDP problem can be computationally expensive, as it requires reasoning over the (possibly infinite) space of beliefs. Several approaches have been proposed to overcome this difficulty, such as discretizing the belief space, point-based belief sampling, and Monte Carlo tree search. The Real-Time Dynamic Programming approach of the RTDP-Bel algorithm approximates the value function by storing it in a hashtable with discretized belief keys. We propose an extension to the RTDP-Bel algorithm, which we call Belief Branch and Bound RTDP (B^3RTDP). Our algorithm uses a bounded value function representation and takes advantage of this in two novel ways: a search-bounding technique based on action selection convergence probabilities, and a method for leveraging early action convergence called the Convergence Frontier. Finally, we empirically demonstrate that B^3RTDP can achieve greater returns in less time than the state-of-the-art SARSOP solver on known POMDP problems.
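
To make the hashtable idea concrete, the following is a minimal sketch of a value table keyed by discretized beliefs, in the spirit of RTDP-Bel. It is an illustration under stated assumptions, not the authors' implementation: the names `discretize` and `BeliefValueTable`, the `resolution` parameter, the ceiling-based rounding rule, and the default value of 0.0 are all hypothetical choices made for this example.

```python
import math


def discretize(belief, resolution=10):
    """Map a belief (dict: state -> probability) to a hashable key.

    Assumption: each probability is scaled by `resolution` and rounded up,
    so beliefs that differ only slightly collapse onto the same key.
    """
    return tuple(sorted(
        (state, math.ceil(resolution * p))
        for state, p in belief.items()
        if p > 0.0
    ))


class BeliefValueTable:
    """Value function stored in a dict keyed by discretized beliefs."""

    def __init__(self, resolution=10, default=0.0):
        self.resolution = resolution
        self.default = default  # value returned for beliefs not yet visited
        self.table = {}

    def get(self, belief):
        key = discretize(belief, self.resolution)
        return self.table.get(key, self.default)

    def set(self, belief, value):
        key = discretize(belief, self.resolution)
        self.table[key] = value


# Usage: two nearby beliefs share one entry, a distant belief does not.
values = BeliefValueTable(resolution=10)
values.set({"left": 0.42, "right": 0.58}, 3.0)
nearby = values.get({"left": 0.45, "right": 0.55})
distant = values.get({"left": 0.9, "right": 0.1})
```

A coarser `resolution` merges more beliefs into each entry, trading accuracy of the stored values for a smaller table. A bounded representation such as the one the abstract describes would presumably keep an upper and a lower value per entry rather than a single number.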

research
10/07/2020

Bayesian Optimized Monte Carlo Planning

Online solvers for partially observable Markov decision processes have d...
research
03/21/2021

Monte Carlo Information-Oriented Planning

In this article, we discuss how to solve information-gathering problems ...
research
10/25/2021

HSVI for zs-POSGs using Concavity, Convexity and Lipschitz Properties

Dynamic programming and heuristic search are at the core of state-of-the...
research
01/10/2013

A Tractable POMDP for a Class of Sequencing Problems

We consider a partially observable Markov decision problem (POMDP) that ...
research
07/11/2012

Region-Based Incremental Pruning for POMDPs

We present a major improvement to the incremental pruning algorithm for ...
research
09/30/2011

Anytime Point-Based Approximations for Large POMDPs

The Partially Observable Markov Decision Process has long been recognize...
research
02/23/2021

Blending Dynamic Programming with Monte Carlo Simulation for Bounding the Running Time of Evolutionary Algorithms

With the goal to provide absolute lower bounds for the best possible run...
