Best Possible Q-Learning

02/02/2023
by   Jiechuan Jiang, et al.
0

Fully decentralized learning, where the global information, i.e., the actions of other agents, is inaccessible, is a fundamental challenge in cooperative multi-agent reinforcement learning. However, the convergence and optimality of most decentralized algorithms are not theoretically guaranteed, since the transition probabilities are non-stationary as all agents are updating policies simultaneously. To tackle this challenge, we propose best possible operator, a novel decentralized operator, and prove that the policies of agents will converge to the optimal joint policy if each agent independently updates its individual state-action value by the operator. Further, to make the update more efficient and practical, we simplify the operator and prove that the convergence and optimality still hold with the simplified one. By instantiating the simplified operator, the derived fully decentralized algorithm, best possible Q-learning (BQL), does not suffer from non-stationarity. Empirically, we show that BQL achieves remarkable improvement over baselines in a variety of cooperative multi-agent tasks.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
02/06/2023

Dealing With Non-stationarity in Decentralized Cooperative Multi-Agent Deep Reinforcement Learning via Multi-Timescale Learning

Decentralized cooperative multi-agent deep reinforcement learning (MARL)...
research
09/17/2022

MA2QL: A Minimalist Approach to Fully Decentralized Multi-Agent Reinforcement Learning

Decentralized learning has shown great promise for cooperative multi-age...
research
03/16/2023

Decentralized Multi-Agent Reinforcement Learning for Continuous-Space Stochastic Games

Stochastic games are a popular framework for studying multi-agent reinfo...
research
08/04/2021

Offline Decentralized Multi-Agent Reinforcement Learning

In many real-world multi-agent cooperative tasks, due to high cost and r...
research
11/09/2020

BayGo: Joint Bayesian Learning and Information-Aware Graph Optimization

This article deals with the problem of distributed machine learning, in ...
research
02/16/2023

Model-Based Decentralized Policy Optimization

Decentralized policy optimization has been commonly used in cooperative ...
research
03/19/2020

Decentralized MCTS via Learned Teammate Models

A key difficulty of cooperative decentralized planning lies in making ac...

Please sign up or login with your details

Forgot password? Click here to reset