Offline Multi-Agent Reinforcement Learning with Coupled Value Factorization

06/15/2023
by   Xiangsen Wang, et al.

Offline reinforcement learning (RL), which learns policies from offline datasets without environment interaction, has received considerable attention in recent years. Compared with the rich literature on the single-agent case, offline multi-agent RL remains relatively underexplored. Most existing methods directly apply offline RL ingredients to the multi-agent setting without fully leveraging the decomposable problem structure, which leads to less satisfactory performance on complex tasks. We present OMAC, a new offline multi-agent RL algorithm with coupled value factorization. OMAC decomposes the global value function into local and shared components while maintaining consistent credit assignment between the state-value and Q-value functions. Moreover, OMAC performs in-sample learning on the decomposed local state-value functions, which implicitly conducts the max-Q operation at the local level while avoiding the distributional shift caused by evaluating out-of-distribution actions. In comprehensive evaluations on offline multi-agent StarCraft II micromanagement tasks, OMAC outperforms state-of-the-art offline multi-agent RL methods.
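
The two ideas in the abstract can be illustrated with a small sketch. It is not the authors' implementation, and the abstract does not give the exact form of either component. The sketch assumes that (a) "coupled" factorization means the global Q-value and state-value are built from local values using the same mixing weights and the same shared term, and (b) in-sample learning is done with expectile regression, a common choice in single-agent offline RL. All names here (`expectile_loss`, `fit_expectile`, `coupled_totals`, `tau`) are hypothetical.

```python
import numpy as np

def expectile_loss(diff, tau=0.7):
    """Asymmetric squared loss: positive residuals weighted by tau, others by 1 - tau."""
    weight = np.where(diff > 0, tau, 1.0 - tau)
    return weight * diff ** 2

def fit_expectile(samples, tau, lr=0.1, steps=2000):
    """Fit a scalar v to dataset values by gradient descent on the mean expectile loss.

    Only values that appear in the dataset are used, so no out-of-distribution
    action is ever evaluated. As tau approaches 1, v moves toward the largest
    in-sample value, which approximates a max over dataset actions.
    """
    samples = np.asarray(samples, dtype=float)
    v = float(np.mean(samples))
    for _ in range(steps):
        diff = samples - v
        weight = np.where(diff > 0, tau, 1.0 - tau)
        grad = -2.0 * np.mean(weight * diff)  # derivative of the mean loss w.r.t. v
        v -= lr * grad
    return v

def coupled_totals(local_q, local_v, weights, shared):
    """Assumed coupling: Q_tot and V_tot share the same weights and shared term."""
    q_tot = float(np.dot(weights, local_q) + shared)
    v_tot = float(np.dot(weights, local_v) + shared)
    return q_tot, v_tot

# Hypothetical Q-values of the dataset actions for one agent in one state.
q_in_sample = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
v_mean = fit_expectile(q_in_sample, tau=0.5)   # tau = 0.5 recovers the mean
v_upper = fit_expectile(q_in_sample, tau=0.9)  # larger tau leans toward the in-sample max

# Two agents with hypothetical local values and mixing weights.
q_tot, v_tot = coupled_totals(
    local_q=np.array([1.0, 3.0]),
    local_v=np.array([0.0, 2.0]),
    weights=np.array([0.5, 0.5]),
    shared=1.0,
)
```

Under this assumed coupling, the global advantage `q_tot - v_tot` equals the weighted sum of the local advantages, which is one way to read "credit assignment consistency" between the two value functions.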

research
07/21/2023

Offline Multi-Agent Reinforcement Learning with Implicit Global-to-Local Value Regularization

Offline reinforcement learning (RL) has received considerable attention ...
research
11/09/2021

Dealing with the Unknown: Pessimistic Offline Reinforcement Learning

Reinforcement Learning (RL) has been shown effective in domains where th...
research
06/04/2023

A Unified Framework for Factorizing Distributional Value Functions for Multi-Agent Reinforcement Learning

In fully cooperative multi-agent reinforcement learning (MARL) settings,...
research
11/22/2021

Plan Better Amid Conservatism: Offline Multi-Agent Reinforcement Learning with Actor Rectification

The idea of conservatism has led to significant progress in offline rein...
research
02/16/2021

DFAC Framework: Factorizing the Value Function via Quantile Mixture for Multi-Agent Distributional Q-Learning

In fully cooperative multi-agent reinforcement learning (MARL) settings,...
research
10/13/2022

Multi-agent Dynamic Algorithm Configuration

Automated algorithm configuration relieves users from tedious, trial-and...
research
08/03/2020

QPLEX: Duplex Dueling Multi-Agent Q-Learning

We explore value-based multi-agent reinforcement learning (MARL) in the ...
