Distributed Policy Iteration for Scalable Approximation of Cooperative Multi-Agent Policies

01/25/2019
by Thomy Phan, et al.

Decision making in multi-agent systems (MAS) is challenging because of enormous state and joint action spaces as well as uncertainty, which make centralized control generally infeasible. Decentralized control offers better scalability and robustness but requires mechanisms to coordinate on joint tasks and to avoid conflicts. Common approaches to learning decentralized policies for cooperative MAS suffer from non-stationarity and a lack of credit assignment, which can lead to unstable and uncoordinated behavior in complex environments. In this paper, we propose Strong Emergent Policy approximation (STEP), a scalable approach to learning strong decentralized policies for cooperative MAS with a distributed variant of policy iteration. To this end, we use function approximation to learn from the action recommendations of a decentralized multi-agent planning algorithm. STEP combines decentralized multi-agent planning with centralized learning and requires only a generative model for distributed black box optimization. We experimentally evaluate STEP in two challenging stochastic domains with large state and joint action spaces and show that, when multi-agent open-loop planning is combined with centralized function approximation, STEP learns stronger policies than standard multi-agent reinforcement learning algorithms. The learned policies can be reintegrated into the multi-agent planning process to further improve performance.
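
The loop described in the abstract (plan per agent with a generative model, learn from the planner's action recommendations, feed the learned policy back into planning) can be illustrated with a small sketch. This is not the authors' implementation: the toy domain, all function names, the one-step lookahead planner, and the tabular pseudo-count policy (swapped in for function approximation) are assumptions made purely for illustration.

```python
import random

# Hypothetical toy domain: 2 agents, 2 actions each, 3 states.
N_AGENTS, N_ACTIONS, N_STATES = 2, 2, 3

def generative_model(state, joint_action, rng):
    """Black-box simulator (assumed): reward 1 only if every agent
    picks action state % N_ACTIONS; the next state is uniformly random."""
    reward = 1.0 if all(a == state % N_ACTIONS for a in joint_action) else 0.0
    return rng.randrange(N_STATES), reward

def sample_action(policy, agent, state, rng):
    """Sample from an agent's learned action distribution (pseudo-counts)."""
    return rng.choices(range(N_ACTIONS), weights=policy[agent][state])[0]

def plan(policy, agent, state, rng, n_sims=20):
    """Decentralized planning for one agent: evaluate each own action by
    simulation, modelling teammates with the currently learned policy."""
    values = []
    for action in range(N_ACTIONS):
        total = 0.0
        for _ in range(n_sims):
            joint = [action if j == agent
                     else sample_action(policy, j, state, rng)
                     for j in range(N_AGENTS)]
            _, reward = generative_model(state, joint, rng)
            total += reward
        values.append(total / n_sims)
    # Recommend the action with the highest estimated value.
    return max(range(N_ACTIONS), key=values.__getitem__)

def train(iterations=200, seed=0):
    """Alternate planning and learning, in the spirit of policy iteration."""
    rng = random.Random(seed)
    # policy[agent][state][action] holds pseudo-counts, initially uniform.
    policy = [[[1.0] * N_ACTIONS for _ in range(N_STATES)]
              for _ in range(N_AGENTS)]
    state = 0
    for _ in range(iterations):
        # Each agent plans on its own.
        recommended = [plan(policy, i, state, rng) for i in range(N_AGENTS)]
        # Centralized learning: imitate the planner's recommendations.
        for i, a in enumerate(recommended):
            policy[i][state][a] += 1.0
        state, _ = generative_model(state, recommended, rng)
    return policy

def greedy(policy, agent, state):
    """Most frequently recommended action for this agent and state."""
    return max(range(N_ACTIONS), key=policy[agent][state].__getitem__)
```

Calling `train()` and then `greedy(policy, agent, state)` for each agent and state shows whether the agents have settled on matching actions. In this toy setting the planner's recommendations improve as the teammate model improves, which is the feedback effect the abstract attributes to reintegrating learned policies into planning.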


research
09/26/2022

More Centralized Training, Still Decentralized Execution: Multi-Agent Conditional Policy Factorization

In cooperative multi-agent reinforcement learning (MARL), combining valu...
research
04/17/2018

Leveraging Statistical Multi-Agent Online Planning with Emergent Value Function Approximation

Making decisions is a great challenge in distributed autonomous environm...
research
10/16/2021

Learning Cooperation and Online Planning Through Simulation and Graph Convolutional Network

Multi-agent Markov Decision Process (MMDP) has been an effective way of ...
research
03/19/2020

Decentralized MCTS via Learned Teammate Models

A key difficulty of cooperative decentralized planning lies in making ac...
research
04/19/2021

Approximate Multi-Agent Fitted Q Iteration

We formulate an efficient approximation for multi-agent batch reinforcem...
research
03/15/2012

Rollout Sampling Policy Iteration for Decentralized POMDPs

We present decentralized rollout sampling policy iteration (DecRSPI) - a...
research
12/05/2018

Cooperative Multi-Agent Policy Gradients with Sub-optimal Demonstration

Many reality tasks such as robot coordination can be naturally modelled ...
