Multi-agent Policy Optimization with Approximatively Synchronous Advantage Estimation

12/07/2020
by Lipeng Wan, et al.

Cooperative multi-agent tasks require agents to deduce their own contributions from shared global rewards, a problem known as credit assignment. General methods for policy-based multi-agent reinforcement learning address this challenge by introducing differentiated value functions or advantage functions for individual agents. In a multi-agent system, the policies of different agents need to be evaluated jointly. To update the policies synchronously, such value functions or advantage functions also need to be evaluated synchronously. However, in current methods, value functions or advantage functions use counterfactual joint actions that are evaluated asynchronously, and thus suffer from inherent estimation bias. In this work, we propose approximatively synchronous advantage estimation. We first derive the marginal advantage function, an extension of the single-agent advantage function to multi-agent systems. Furthermore, we introduce a policy approximation for synchronous advantage estimation and break down the multi-agent policy optimization problem into multiple sub-problems of single-agent policy optimization. Our method is compared with baseline algorithms on the StarCraft Multi-Agent Challenge (SMAC) and shows the best performance on most of the tasks.
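For orientation, the sketch below shows one existing advantage estimator built from counterfactual joint actions: the counterfactual baseline used in COMA. It is not the marginal advantage function proposed in this work, whose definition the abstract does not give. It assumes a centralized critic has already produced Q-values for every alternative action of agent i while the other agents' actions stay fixed at their sampled values; the function name `counterfactual_advantage` and the toy Q-table are hypothetical.

```python
import numpy as np


def counterfactual_advantage(q_values, policy_i, action_i):
    """COMA-style counterfactual advantage for a single agent i (illustrative sketch).

    q_values: shape (n_actions_i,), Q(s, (u_-i, u_i')) for each alternative
        action u_i' of agent i, with the other agents' actions u_-i held fixed.
    policy_i: shape (n_actions_i,), agent i's action probabilities pi_i(u_i' | tau_i).
    action_i: index of the action agent i actually took.
    """
    q_values = np.asarray(q_values, dtype=float)
    policy_i = np.asarray(policy_i, dtype=float)
    # Baseline: expected Q when only agent i's own action is marginalized out.
    baseline = np.dot(policy_i, q_values)
    return q_values[action_i] - baseline


# Hypothetical two-agent joint Q-table: rows = agent 0's action, cols = agent 1's action.
q_joint = np.array([[1.0, 0.0],
                    [0.0, 2.0]])

# Agent 0's advantage, holding agent 1's sampled action (index 1) fixed.
adv_agent0 = counterfactual_advantage(q_joint[:, 1], [0.5, 0.5], action_i=1)
```

Here agent 1's column is frozen while agent 0's advantage is computed. One reading of the asynchrony described above is that, when all agents then update their policies at once, each agent's advantage was computed against the other agents' old, sampled actions rather than their updated policies.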
