DeepAI AI Chat
Log In Sign Up

Plan Better Amid Conservatism: Offline Multi-Agent Reinforcement Learning with Actor Rectification

by   Ling Pan, et al.
Tsinghua University
Stanford University

The idea of conservatism has led to significant progress in offline reinforcement learning (RL) where an agent learns from pre-collected datasets. However, it is still an open question to resolve offline RL in the more practical multi-agent setting as many real-world scenarios involve interaction among multiple agents. Given the recent success of transferring online RL algorithms to the multi-agent setting, one may expect that offline RL algorithms will also transfer to multi-agent settings directly. Surprisingly, when conservatism-based algorithms are applied to the multi-agent setting, the performance degrades significantly with an increasing number of agents. Towards mitigating the degradation, we identify that a key issue that the landscape of the value function can be non-concave and policy gradient improvements are prone to local optima. Multiple agents exacerbate the problem since the suboptimal policy by any agent could lead to uncoordinated global failure. Following this intuition, we propose a simple yet effective method, Offline Multi-Agent RL with Actor Rectification (OMAR), to tackle this critical challenge via an effective combination of first-order policy gradient and zeroth-order optimization methods for the actor to better optimize the conservative value function. Despite the simplicity, OMAR significantly outperforms strong baselines with state-of-the-art performance in multi-agent continuous control benchmarks.


page 1

page 2

page 3

page 4


Believe What You See: Implicit Constraint Approach for Offline Multi-Agent Reinforcement Learning

Learning from datasets without interaction with environments (Offline Le...

POPO: Pessimistic Offline Policy Optimization

Offline reinforcement learning (RL), also known as batch RL, aims to opt...

Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning

Many real-world problems, such as network packet routing and urban traff...

QPLEX: Duplex Dueling Multi-Agent Q-Learning

We explore value-based multi-agent reinforcement learning (MARL) in the ...

Efficient Domain Coverage for Vehicles with Second Order Dynamics via Multi-Agent Reinforcement Learning

Collaborative autonomous multi-agent systems covering a specified area h...

Learning to Collaborate: Multi-Scenario Ranking via Multi-Agent Reinforcement Learning

Ranking is a fundamental and widely studied problem in scenarios such as...