Sparse Stochastic Finite-State Controllers for POMDPs

06/13/2012
by Eric A. Hansen, et al.

Bounded policy iteration is an approach to solving infinite-horizon POMDPs that represents a policy as a stochastic finite-state controller and iteratively improves the controller by adjusting the parameters of each node using linear programming. In the original algorithm, the size of the linear programs, and thus the complexity of policy improvement, depends on the number of parameters per node, which grows with the size of the controller. In practice, however, the number of non-zero parameters of a node is often very small and does not grow with the size of the controller. Based on this observation, we develop a version of bounded policy iteration that leverages the sparse structure of a stochastic finite-state controller. Each iteration improves the policy by the same amount as the original algorithm, but with much better scalability.
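To make the abstract's point concrete, below is a minimal sketch of the standard bounded-policy-iteration node-improvement linear program, with the successor nodes restricted to a small candidate set so the LP size tracks the support rather than the full controller. This is an illustration, not the paper's implementation: `improve_node`, the variable layout, and the tiny model shapes are all assumptions, and `scipy` is assumed available.

```python
# Bounded-policy-iteration node-improvement LP (sketch), restricted to a small
# set of candidate successor nodes. Variables are eps, x(a) = P(a|n), and
# x(a,o,n') = P(a|n) * P(n'|n,a,o); we maximize eps, the uniform improvement
# of node n's value vector over all states.
import numpy as np
from scipy.optimize import linprog

def improve_node(n, V, R, T, O, gamma, support):
    """Solve the node-improvement LP for controller node n.

    V[m][s]    : value of controller node m in state s (from policy evaluation)
    R[s][a]    : immediate reward
    T[s][a][s']: transition probabilities;  O[s'][a][o]: observation probabilities
    support    : candidate successor nodes (kept small => small LP)
    Returns (eps, action distribution) on success, else None.
    """
    S, A = R.shape
    Obs = O.shape[2]
    K = len(support)
    # Variable layout: [eps, x(a) for each a, x(a,o,n') flattened over (a,o,n')]
    nvar = 1 + A + A * Obs * K
    def xi(a): return 1 + a
    def yi(a, o, k): return 1 + A + (a * Obs + o) * K + k

    c = np.zeros(nvar); c[0] = -1.0                 # linprog minimizes: maximize eps
    # One inequality per state: eps + V[n][s] <= value of the new node in s
    A_ub = np.zeros((S, nvar)); b_ub = np.zeros(S)
    for s in range(S):
        A_ub[s, 0] = 1.0
        for a in range(A):
            A_ub[s, xi(a)] = -R[s, a]
            for o in range(Obs):
                for k, m in enumerate(support):
                    A_ub[s, yi(a, o, k)] = -gamma * sum(
                        T[s, a, s2] * O[s2, a, o] * V[m][s2] for s2 in range(S))
        b_ub[s] = -V[n][s]
    # Probability constraints: sum_a x(a) = 1, and for each (a,o),
    # sum_{n'} x(a,o,n') = x(a), which recovers valid conditional distributions.
    A_eq = np.zeros((1 + A * Obs, nvar)); b_eq = np.zeros(1 + A * Obs)
    A_eq[0, 1:1 + A] = 1.0; b_eq[0] = 1.0
    for a in range(A):
        for o in range(Obs):
            row = 1 + a * Obs + o
            A_eq[row, xi(a)] = -1.0
            for k in range(K):
                A_eq[row, yi(a, o, k)] = 1.0
    bounds = [(None, None)] + [(0, None)] * (nvar - 1)   # eps is free, rest >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return (res.x[0], res.x[1:1 + A]) if res.success else None
```

The scalability observation from the abstract shows up in the variable count: the LP has `1 + |A| + |A||O||support|` variables, so fixing `support` to a small set keeps each improvement step constant-sized as the controller grows, whereas the original algorithm lets `support` range over every node in the controller.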


Related research

- Solving POMDPs by Searching in Policy Space (01/30/2013)
- Solving infinite-horizon Dec-POMDPs using Finite State Controllers within JESP (09/17/2021)
- Policy Iteration for Decentralized Control of Markov Decision Processes (01/15/2014)
- Robust Policy Iteration for Continuous-time Linear Quadratic Regulation (05/19/2020)
- Stochastic Finite State Control of POMDPs with LTL Specifications (01/21/2020)
- Stick-Breaking Policy Learning in Dec-POMDPs (05/01/2015)
- Technical Report: The Policy Graph Improvement Algorithm (09/04/2020)
