Learning From Good Trajectories in Offline Multi-Agent Reinforcement Learning

11/28/2022
by   Qi Tian, et al.

Offline multi-agent reinforcement learning (MARL) aims to learn effective multi-agent policies from pre-collected datasets, an important step toward deploying multi-agent systems in real-world applications. In practice, however, the individual behavior policies that generate the joint trajectories often perform at different levels, e.g., one agent follows a random policy while the other agents follow medium-quality policies. In cooperative games with a global reward, an agent trained by existing offline MARL methods often inherits this random policy, jeopardizing the performance of the entire team. In this paper, we investigate offline MARL with explicit consideration of the diversity of agent-wise trajectories and propose a novel framework called Shared Individual Trajectories (SIT) to address this problem. Specifically, an attention-based reward decomposition network assigns credit to each agent through a differentiable key-value memory mechanism in an offline manner. These decomposed credits are then used to reconstruct the joint offline dataset into a prioritized experience replay of individual trajectories, after which agents can share their good trajectories and conservatively train their policies with a graph attention network (GAT) based critic. We evaluate our method on both discrete control (i.e., StarCraft II and the multi-agent particle environment) and continuous control (i.e., multi-agent MuJoCo). The results indicate that our method achieves significantly better performance on complex and mixed offline multi-agent datasets, especially when the quality gap between individual trajectories is large.
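To make the credit-assignment step concrete, the sketch below shows one way an attention readout can split a global reward into per-agent credits that sum back to the original reward. This is a minimal illustration, not the paper's actual network: the single learned query, the projection matrices `W_q`/`W_k`, and the function name `decompose_reward` are all hypothetical simplifications of SIT's differentiable key-value memory mechanism.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def decompose_reward(agent_embeddings, global_reward, W_q, W_k):
    """Assign per-agent credits via an attention readout (hypothetical sketch).

    agent_embeddings: (n_agents, d) features of each agent's observation-action.
    Returns an (n_agents,) array of credits that sums to global_reward,
    because the softmax attention weights sum to 1.
    """
    # One query summarizing the team attends over per-agent keys.
    query = agent_embeddings.mean(axis=0) @ W_q       # (d_k,)
    keys = agent_embeddings @ W_k                     # (n_agents, d_k)
    scores = keys @ query / np.sqrt(keys.shape[-1])   # scaled dot-product scores
    weights = softmax(scores)                         # (n_agents,), sums to 1
    return weights * global_reward
```

Because the weights form a convex combination, the decomposition is exactly reward-preserving; in the paper these credits would then rank each agent's individual trajectory inside the prioritized replay so that good trajectories can be shared across agents.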

research
07/21/2023

Offline Multi-Agent Reinforcement Learning with Implicit Global-to-Local Value Regularization

Offline reinforcement learning (RL) has received considerable attention ...
research
07/04/2023

Beyond Conservatism: Diffusion Policies in Offline Multi-agent Reinforcement Learning

We present a novel Diffusion Offline Multi-agent Model (DOM2) for offlin...
research
05/26/2023

A Model-Based Solution to the Offline Multi-Agent Reinforcement Learning Coordination Problem

Training multiple agents to coordinate is an important problem with appl...
research
06/01/2022

Policy Diagnosis via Measuring Role Diversity in Cooperative Multi-agent RL

Cooperative multi-agent reinforcement learning (MARL) is making rapid pr...
research
11/02/2020

Multi-Agent Reinforcement Learning for Persistent Monitoring

The Persistent Monitoring (PM) problem seeks to find a set of trajectori...
research
02/01/2023

Off-the-Grid MARL: a Framework for Dataset Generation with Baselines for Cooperative Offline Multi-Agent Reinforcement Learning

Being able to harness the power of large, static datasets for developing...
research
08/04/2021

Offline Decentralized Multi-Agent Reinforcement Learning

In many real-world multi-agent cooperative tasks, due to high cost and r...
