Optimal and Approximate Q-value Functions for Decentralized POMDPs

10/31/2011
by Frans A. Oliehoek, et al.

Decision-theoretic planning is a popular approach to sequential decision making problems, because it treats uncertainty in sensing and acting in a principled way. In single-agent frameworks like MDPs and POMDPs, planning can be carried out by resorting to Q-value functions: an optimal Q-value function Q* is computed in a recursive manner by dynamic programming, and then an optimal policy is extracted from Q*. In this paper we study whether similar Q-value functions can be defined for decentralized POMDP models (Dec-POMDPs), and how policies can be extracted from such value functions. We define two forms of the optimal Q-value function for Dec-POMDPs: one that gives a normative description as the Q-value function of an optimal pure joint policy and another one that is sequentially rational and thus gives a recipe for computation. This computation, however, is infeasible for all but the smallest problems. Therefore, we analyze various approximate Q-value functions that allow for efficient computation. We describe how they relate, and we prove that they all provide an upper bound to the optimal Q-value function Q*. Finally, unifying some previous approaches for solving Dec-POMDPs, we describe a family of algorithms for extracting policies from such Q-value functions, and perform an experimental evaluation on existing test problems, including a new firefighting benchmark problem.
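To make the single-agent case referenced above concrete, the sketch below computes Q* by dynamic programming (Q-value iteration) on a toy fully observable MDP and then extracts a greedy policy from it. The 2-state, 2-action MDP, its transition model `T`, and reward model `R` are hypothetical examples for illustration, not problems from the paper; the Dec-POMDP case studied in the paper does not reduce to this simple backup.

```python
# Minimal sketch of planning via a Q-value function in the single-agent
# (MDP) setting: compute Q* with dynamic programming, extract a greedy
# policy. The MDP below (T, R, gamma) is a made-up illustrative example.
import numpy as np

n_states, n_actions, gamma = 2, 2, 0.9
# T[s, a, s']: transition probabilities; R[s, a]: expected immediate reward
T = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.0, 1.0]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])

Q = np.zeros((n_states, n_actions))
for _ in range(1000):
    # Bellman optimality backup:
    # Q(s,a) = R(s,a) + gamma * sum_s' T(s,a,s') * max_a' Q(s',a')
    Q_new = R + gamma * (T @ Q.max(axis=1))
    if np.max(np.abs(Q_new - Q)) < 1e-10:
        Q = Q_new
        break
    Q = Q_new

# An optimal policy is extracted by acting greedily with respect to Q*.
policy = Q.argmax(axis=1)
```

The backup is a contraction for gamma < 1, so the loop converges to Q* from any initialization; this recursive computation-then-extraction scheme is exactly what the paper investigates for the decentralized setting.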


