Multiagent Rollout and Policy Iteration for POMDP with Application to Multi-Robot Repair Problems

11/09/2020
by   Sushmita Bhattacharya, et al.

In this paper we consider infinite-horizon discounted dynamic programming problems with finite state and control spaces, partial state observations, and a multiagent structure. We discuss and compare algorithms that simultaneously or sequentially optimize the agents' controls by using multistep lookahead, truncated rollout with a known base policy, and a terminal cost function approximation. Our methods specifically address the computational challenges of partially observable multiagent problems. In particular: 1) We consider rollout algorithms that dramatically reduce the required computation while preserving the key cost improvement property of the standard rollout method. The per-step computational requirements of our methods are on the order of O(Cm), as compared with O(C^m) for standard rollout, where C is the maximum cardinality of the constraint set for the control component of each agent, and m is the number of agents. 2) We show that our methods can be applied to challenging problems with a graph structure, including a class of robot repair problems in which multiple robots collaboratively inspect and repair a system under partial information. 3) We provide a simulation study that compares our methods with existing methods, and demonstrate that our methods can handle larger and more complex partially observable multiagent problems (with state space size 10^37 and control space size 10^7). Finally, we incorporate our multiagent rollout algorithms as building blocks in an approximate policy iteration scheme, where successive rollout policies are approximated by neural network classifiers. While this scheme requires a strictly off-line implementation, it works well in our computational experiments and produces significant additional performance improvement over the single online rollout iteration method.
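The gap between O(Cm) and O(C^m) per-step computation can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `toy_q` cost, and the `targets` tuple are hypothetical, and `q_value` stands in for a Q-factor that would in practice be estimated by simulating a truncated rollout with the base policy plus a terminal cost approximation. The sketch only contrasts a minimization over all joint controls with a one-agent-at-a-time minimization, in which earlier agents' choices are fixed and later agents keep their base-policy controls.

```python
from itertools import product


def joint_rollout_step(q_value, controls_per_agent):
    """Standard rollout step: minimise over every joint control (about C^m evaluations)."""
    evaluations = 0
    best_u, best_q = None, float("inf")
    for u in product(*controls_per_agent):
        evaluations += 1
        q = q_value(u)
        if q < best_q:
            best_u, best_q = u, q
    return best_u, evaluations


def multiagent_rollout_step(q_value, controls_per_agent, base_controls):
    """One-agent-at-a-time step (about C*m evaluations).

    Agent i optimises only its own component. Agents before i keep the
    controls they have already chosen; agents after i keep base_controls.
    """
    u = list(base_controls)
    evaluations = 0
    for i, choices in enumerate(controls_per_agent):
        best_c, best_q = u[i], float("inf")
        for c in choices:
            evaluations += 1
            trial = tuple(u[:i] + [c] + u[i + 1:])
            q = q_value(trial)
            if q < best_q:
                best_c, best_q = c, q
        u[i] = best_c
    return tuple(u), evaluations


# Hypothetical toy Q-factor (an assumption for illustration only):
# each agent pays the squared distance to its own target control.
targets = (3, 1, 0, 2, 1)


def toy_q(u):
    return float(sum((ui - ti) ** 2 for ui, ti in zip(u, targets)))


controls = [range(4)] * 5   # C = 4 controls per agent, m = 5 agents
base = (0, 0, 0, 0, 0)      # controls suggested by a hypothetical base policy

u_joint, n_joint = joint_rollout_step(toy_q, controls)
u_seq, n_seq = multiagent_rollout_step(toy_q, controls, base)
print(n_joint, n_seq)       # 1024 20
```

In this toy case the cost is separable across agents, so both procedures reach the same control; in general the sequential variant may select a different joint control than the full minimization, while, as the abstract states, retaining the cost improvement property relative to the base policy.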

research
05/04/2020

Multiagent Value Iteration Algorithms in Dynamic Programming and Reinforcement Learning

We consider infinite horizon dynamic programming problems, where the con...
research
09/30/2019

Multiagent Rollout Algorithms and Reinforcement Learning

We consider finite and infinite horizon dynamic programming problems, wh...
research
06/01/2021

On-Line Policy Iteration for Infinite Horizon Dynamic Programming

In this paper we propose an on-line policy iteration (PI) algorithm for ...
research
09/04/2020

Technical Report: The Policy Graph Improvement Algorithm

Optimizing a partially observable Markov decision process (POMDP) policy...
research
03/28/2023

Worst-Case Control and Learning Using Partial Observations Over an Infinite Time-Horizon

Safety-critical cyber-physical systems require control strategies whose ...
research
10/04/2019

Approximate policy iteration using neural networks for storage problems

We consider the stochastic single node energy storage problem (SNES) and...
