Offline Decentralized Multi-Agent Reinforcement Learning

08/04/2021
by Jiechuan Jiang, et al.

In many real-world multi-agent cooperative tasks, high cost and risk prevent agents from interacting with the environment and collecting experience during learning, so they must learn from offline datasets. However, the transition probabilities estimated from a dataset can differ substantially from the transition probabilities induced by the learned policies of other agents, creating large errors in value estimates. Moreover, the experience distributions of agents' datasets may vary widely due to diverse behavior policies, causing large differences in value estimates between agents. Consequently, agents learn uncoordinated, suboptimal policies. In this paper, we propose MABCQ, which exploits value deviation and transition normalization to modify the transition probabilities. Value deviation optimistically increases the transition probabilities of high-value next states, and transition normalization normalizes the biased transition probabilities of next states. Together, they encourage agents to discover potentially optimal and coordinated policies. Mathematically, we prove the convergence of Q-learning under the non-stationary transition probabilities that result from the modification. Empirically, we show that MABCQ greatly outperforms baselines and reduces the difference in value estimates between agents.
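The abstract does not give the formulas for value deviation or transition normalization, so the following is only a minimal tabular sketch of the general idea, not the paper's definitions. It assumes that normalization flattens the empirical next-state distribution over the observed next states, and that value deviation reweights each next state by how far its value lies above or below the average. The function names `modify_transitions` and `backup_target`, the specific weighting formula, and the `eps` parameter are all hypothetical.

```python
import numpy as np


def modify_transitions(p_hat, v_next, eps=1e-8):
    """Reweight empirical next-state probabilities for one (state, action) pair.

    Assumed, illustrative forms only (not taken from the paper):
      * transition normalization: uniform weight over observed next states
      * value deviation: weight 1 + (V(s') - mean V) / |mean V|, clipped at 0
    """
    p_hat = np.asarray(p_hat, dtype=float)
    v_next = np.asarray(v_next, dtype=float)

    # Assumed normalization: remove the dataset's bias by treating every
    # observed next state as equally likely.
    support = p_hat > 0
    p_norm = support / support.sum()

    # Assumed value deviation: optimistically up-weight next states whose
    # value exceeds the average over observed next states.
    baseline = np.sum(p_norm * v_next)
    deviation = 1.0 + (v_next - baseline) / (np.abs(baseline) + eps)
    weights = p_norm * np.clip(deviation, 0.0, None)

    total = weights.sum()
    if total <= 0:
        # Fall back to the normalized distribution if all weights vanish.
        return p_norm
    return weights / total


def backup_target(p, rewards, v_next, gamma=0.99):
    """Expected one-step target: sum over s' of p(s') * (r(s') + gamma * V(s'))."""
    p = np.asarray(p, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    v_next = np.asarray(v_next, dtype=float)
    return float(np.sum(p * (rewards + gamma * v_next)))


# Toy data: the dataset mostly shows the low-value next state.
p_hat = np.array([0.9, 0.1, 0.0])    # empirical probabilities; third state unseen
v_next = np.array([1.0, 3.0, 10.0])  # current value estimates of next states
p_mod = modify_transitions(p_hat, v_next)
```

In this toy case, the modified distribution shifts probability mass toward the higher-valued observed next state while leaving the unobserved next state at zero, so the backup target computed with `p_mod` is more optimistic than the one computed with `p_hat`.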

research
02/02/2023

Best Possible Q-Learning

Fully decentralized learning, where the global information, i.e., the ac...
research
07/17/2023

Non-Stationary Policy Learning for Multi-Timescale Multi-Agent Reinforcement Learning

In multi-timescale multi-agent reinforcement learning (MARL), agents int...
research
11/28/2022

Learning From Good Trajectories in Offline Multi-Agent Reinforcement Learning

Offline multi-agent reinforcement learning (MARL) aims to learn effectiv...
research
01/25/2023

Discriminative Experience Replay for Efficient Multi-agent Reinforcement Learning

In cooperative multi-agent tasks, parameter sharing among agents is a co...
research
06/10/2021

Informative Policy Representations in Multi-Agent Reinforcement Learning via Joint-Action Distributions

In multi-agent reinforcement learning, the inherent non-stationarity of ...
research
10/10/2021

A Closer Look at Advantage-Filtered Behavioral Cloning in High-Noise Datasets

Recent Offline Reinforcement Learning methods have succeeded in learning...
research
01/22/2020

Cohort state-transition models in R: From conceptualization to implementation

Decision models can synthesize evidence from different sources to provid...
