Plan Better Amid Conservatism: Offline Multi-Agent Reinforcement Learning with Actor Rectification

11/22/2021
by   Ling Pan, et al.
0

The idea of conservatism has led to significant progress in offline reinforcement learning (RL) where an agent learns from pre-collected datasets. However, it is still an open question to resolve offline RL in the more practical multi-agent setting as many real-world scenarios involve interaction among multiple agents. Given the recent success of transferring online RL algorithms to the multi-agent setting, one may expect that offline RL algorithms will also transfer to multi-agent settings directly. Surprisingly, when conservatism-based algorithms are applied to the multi-agent setting, the performance degrades significantly with an increasing number of agents. Towards mitigating the degradation, we identify that a key issue that the landscape of the value function can be non-concave and policy gradient improvements are prone to local optima. Multiple agents exacerbate the problem since the suboptimal policy by any agent could lead to uncoordinated global failure. Following this intuition, we propose a simple yet effective method, Offline Multi-Agent RL with Actor Rectification (OMAR), to tackle this critical challenge via an effective combination of first-order policy gradient and zeroth-order optimization methods for the actor to better optimize the conservative value function. Despite the simplicity, OMAR significantly outperforms strong baselines with state-of-the-art performance in multi-agent continuous control benchmarks.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
06/07/2021

Believe What You See: Implicit Constraint Approach for Offline Multi-Agent Reinforcement Learning

Learning from datasets without interaction with environments (Offline Le...
research
02/01/2023

Off-the-Grid MARL: a Framework for Dataset Generation with Baselines for Cooperative Offline Multi-Agent Reinforcement Learning

Being able to harness the power of large, static datasets for developing...
research
06/15/2023

Offline Multi-Agent Reinforcement Learning with Coupled Value Factorization

Offline reinforcement learning (RL) that learns policies from offline da...
research
02/28/2017

Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning

Many real-world problems, such as network packet routing and urban traff...
research
09/13/2023

Characterizing Speed Performance of Multi-Agent Reinforcement Learning

Multi-Agent Reinforcement Learning (MARL) has achieved significant succe...
research
11/11/2022

Efficient Domain Coverage for Vehicles with Second Order Dynamics via Multi-Agent Reinforcement Learning

Collaborative autonomous multi-agent systems covering a specified area h...
research
09/17/2018

Learning to Collaborate: Multi-Scenario Ranking via Multi-Agent Reinforcement Learning

Ranking is a fundamental and widely studied problem in scenarios such as...

Please sign up or login with your details

Forgot password? Click here to reset