Cooperative Multi-Agent Policy Gradients with Sub-optimal Demonstration

12/05/2018
by Peixi Peng, et al.

Many real-world tasks, such as robot coordination, can be naturally modelled as cooperative multi-agent systems with sparse rewards. This paper focuses on learning decentralized policies for such tasks from sub-optimal demonstrations. To learn multi-agent cooperation effectively while coping with the sub-optimality of the demonstrations, a self-improving learning method is proposed. On the one hand, the centralized state-action values are initialized from the demonstrations and then updated with the learned decentralized policies, so that the values can improve beyond the sub-optimal demonstrations. On the other hand, Nash equilibria are computed from the current state-action values and used as a guide for learning the policies. The proposed method is evaluated on combat RTS games, which require a high level of multi-agent cooperation. Extensive experimental results on various combat scenarios demonstrate that the proposed method learns multi-agent cooperation effectively and significantly outperforms many state-of-the-art demonstration-based approaches.
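A minimal tabular sketch of the self-improving loop described above may help fix the idea. It rests on assumptions that are not taken from the paper: two agents, a single-state (matrix) cooperative game, centralized values stored as a joint-action table, initialization by averaging demonstrated returns, only pure-strategy Nash equilibria, and the highest-valued equilibrium chosen as the guide. All function names are hypothetical, and the update rules are illustrative stand-ins rather than the authors' algorithm, which targets learned policies in combat RTS scenarios.

```python
import itertools

import numpy as np


def init_q_from_demo(demo, shape, default=0.0):
    """Hypothetical initialization: average the returns observed for each
    demonstrated joint action; unseen joint actions keep `default`."""
    total = np.zeros(shape)
    count = np.zeros(shape)
    for (a1, a2), ret in demo:
        total[a1, a2] += ret
        count[a1, a2] += 1
    q = np.full(shape, default)
    seen = count > 0
    q[seen] = total[seen] / count[seen]
    return q


def update_q(q, a1, a2, ret, lr=0.5):
    """Move the centralized estimate toward a return collected by the learned policies."""
    q[a1, a2] += lr * (ret - q[a1, a2])
    return q


def pure_nash_equilibria(q):
    """Joint actions where neither agent can raise the shared value by deviating alone."""
    n1, n2 = q.shape
    equilibria = []
    for a1, a2 in itertools.product(range(n1), range(n2)):
        best_for_agent1 = q[a1, a2] >= q[:, a2].max()
        best_for_agent2 = q[a1, a2] >= q[a1, :].max()
        if best_for_agent1 and best_for_agent2:
            equilibria.append((a1, a2))
    return equilibria


def best_equilibrium(q):
    """Assumption: use the highest-valued pure equilibrium as the guide."""
    return max(pure_nash_equilibria(q), key=lambda joint: q[joint])


def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()


def guide_policy(logits, target, lr=1.0):
    """One gradient-ascent step on log pi(target) for an agent's own softmax policy."""
    grad = -softmax(logits)
    grad[target] += 1.0
    return logits + lr * grad


# Toy game: joint action (0, 0) pays 10, but the demonstration only shows (1, 1).
true_returns = np.array([[10.0, 0.0], [0.0, 5.0]])
demo = [((1, 1), 5.0), ((1, 1), 5.0)]
q = init_q_from_demo(demo, true_returns.shape)
print(best_equilibrium(q))  # (1, 1): the guide mirrors the sub-optimal demo

# The learned policies explore (0, 0) and the centralized value is corrected.
q = update_q(q, 0, 0, true_returns[0, 0], lr=1.0)
target = best_equilibrium(q)
print(target)  # (0, 0)

# Each decentralized policy is pulled toward its own component of the equilibrium.
logits = [np.zeros(2), np.zeros(2)]
for _ in range(50):
    logits = [guide_policy(logits[i], target[i]) for i in range(2)]
print([round(float(softmax(l)[0]), 2) for l in logits])
```

The toy run shows why the values must keep being updated: with demonstration-only values, the sub-optimal joint action (1, 1) is an equilibrium and would be imitated, whereas after one corrected estimate the better equilibrium (0, 0) becomes the guide. Note that in a cooperative game a pure equilibrium need not be the global optimum, so picking among equilibria is a design choice here, not something the abstract specifies.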

research
09/17/2022

MA2QL: A Minimalist Approach to Fully Decentralized Multi-Agent Reinforcement Learning

Decentralized learning has shown great promise for cooperative multi-age...
research
01/25/2019

Distributed Policy Iteration for Scalable Approximation of Cooperative Multi-Agent Policies

Decision making in multi-agent systems (MAS) is a great challenge due to...
research
07/12/2022

Towards Global Optimality in Cooperative MARL with Sequential Transformation

Policy learning in multi-agent reinforcement learning (MARL) is challeng...
research
07/06/2020

Learning Implicit Credit Assignment for Cooperative Multi-Agent Reinforcement Learning

We present a multi-agent actor-critic method that aims to implicitly add...
research
01/16/2019

ReNeg and Backseat Driver: Learning from Demonstration with Continuous Human Feedback

In autonomous vehicle (AV) control, allowing mistakes can be quite dange...
research
03/20/2018

Generative Multi-Agent Behavioral Cloning

We propose and study the problem of generative multi-agent behavioral cl...
research
06/27/2021

Policy Perturbation via Noisy Advantage Values for Cooperative Multi-agent Actor-Critic methods

Recent works have applied the Proximal Policy Optimization (PPO) to the ...
