
Plan Better Amid Conservatism: Offline Multi-Agent Reinforcement Learning with Actor Rectification

11/22/2021
by Ling Pan, et al.
Tsinghua University
Stanford University

The idea of conservatism has led to significant progress in offline reinforcement learning (RL), where an agent learns from pre-collected datasets. However, how to address offline RL in the more practical multi-agent setting remains an open question, as many real-world scenarios involve interaction among multiple agents. Given the recent success of transferring online RL algorithms to the multi-agent setting, one may expect that offline RL algorithms will also transfer to multi-agent settings directly. Surprisingly, when conservatism-based algorithms are applied to the multi-agent setting, performance degrades significantly as the number of agents increases. Towards mitigating the degradation, we identify a key issue: the landscape of the value function can be non-concave, and policy gradient improvements are prone to local optima. Multiple agents exacerbate the problem, since a suboptimal policy by any agent could lead to uncoordinated global failure. Following this intuition, we propose a simple yet effective method, Offline Multi-Agent RL with Actor Rectification (OMAR), which tackles this critical challenge by combining first-order policy gradient with zeroth-order optimization methods so that the actor better optimizes the conservative value function. Despite its simplicity, OMAR significantly outperforms strong baselines, achieving state-of-the-art performance on multi-agent continuous control benchmarks.
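
The local-optimum issue described above can be illustrated with a toy example. The sketch below is not the OMAR algorithm from the paper; it is a minimal one-dimensional illustration under stated assumptions. The value landscape `q_value`, the cross-entropy-style sampler `sampled_search`, the helper `rectify`, and all hyperparameters are hypothetical choices made here for illustration. It shows a first-order update settling on a lower peak of a non-concave function, and a zeroth-order (sampling-based) search proposing a better action that then replaces it.

```python
import math
import random


def q_value(a):
    """Hypothetical non-concave value landscape on [-1, 1]:
    a low peak near a = -0.5 and a higher peak near a = 0.6."""
    return 0.5 * math.exp(-(a + 0.5) ** 2 / 0.02) + math.exp(-(a - 0.6) ** 2 / 0.02)


def q_grad(a, eps=1e-5):
    """Central finite-difference estimate of dQ/da."""
    return (q_value(a + eps) - q_value(a - eps)) / (2 * eps)


def clip(a):
    return max(-1.0, min(1.0, a))


def gradient_ascent(a, lr=0.01, steps=500):
    """First-order improvement: follows the local slope only."""
    for _ in range(steps):
        a = clip(a + lr * q_grad(a))
    return a


def sampled_search(rng, mean=0.0, std=1.0, iters=5, n=256, k=16):
    """Zeroth-order search (cross-entropy style, an assumption of this sketch):
    sample actions, keep the k best, refit the Gaussian, repeat."""
    best = clip(mean)
    for _ in range(iters):
        cands = sorted((clip(rng.gauss(mean, std)) for _ in range(n)),
                       key=q_value, reverse=True)
        elites = cands[:k]
        if q_value(elites[0]) > q_value(best):
            best = elites[0]
        mean = sum(elites) / k
        std = max(1e-3, math.sqrt(sum((e - mean) ** 2 for e in elites) / k))
    return best


def rectify(a_actor, a_sampled):
    """Keep whichever action the value function scores higher."""
    return a_sampled if q_value(a_sampled) > q_value(a_actor) else a_actor


a_grad = gradient_ascent(-0.4)                  # starts on the slope of the low peak
a_zero = sampled_search(random.Random(0))
a_rect = rectify(a_grad, a_zero)
print(f"gradient only: a={a_grad:.3f}, Q={q_value(a_grad):.3f}")
print(f"rectified:     a={a_rect:.3f}, Q={q_value(a_rect):.3f}")
```

In this toy setting the gradient-only action stays near the lower peak because the slope there carries no information about the higher peak elsewhere; the sampling step is what surfaces a better candidate. How the paper integrates the two components in the actor's training objective is not specified in the abstract and is not modeled here.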

