Decentralized Multi-Agent Reinforcement Learning: An Off-Policy Method

10/31/2021
by Kuo Li, et al.

This work addresses decentralized multi-agent reinforcement learning (MARL). In our setting, the global state, action, and reward are assumed to be fully observable, while each agent keeps its local policy private and does not share it with others. The agents exchange information with their neighbors over a communication graph. Each agent makes its own decisions, and the agents cooperate to maximize the accumulated reward. To this end, we first propose a decentralized actor-critic (AC) setting. We then design policy evaluation and policy improvement algorithms for Markov decision processes (MDPs) with discrete and with continuous state-action spaces, respectively. For the discrete case, we give a convergence analysis, which guarantees that alternating between policy evaluation and policy improvement improves the policy. To validate the algorithms, we design experiments and compare against previous algorithms, e.g., Q-learning <cit.> and MADDPG <cit.>. The results show that our algorithms perform better in both learning speed and final performance. Moreover, the algorithms can be executed in an off-policy manner, which greatly improves data efficiency compared with on-policy algorithms.
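
As a rough illustration of the alternation between policy evaluation and policy improvement described above, the following is a minimal single-agent, tabular sketch. It is not the decentralized algorithm of the paper: the two-state MDP, the names `P`, `R`, `GAMMA`, and the functions `evaluate`, `improve`, and `policy_iteration` are all hypothetical and chosen only for illustration.

```python
# Hypothetical deterministic MDP with 2 states and 2 actions (not from the paper).
# P[s][a] is the next state, R[s][a] is the immediate reward.
P = [[0, 1], [0, 1]]
R = [[0.0, 1.0], [0.0, 2.0]]
GAMMA = 0.9  # discount factor (assumed value)


def evaluate(policy, sweeps=500):
    """Policy evaluation: repeat the Bellman expectation backup for Q under `policy`."""
    q = [[0.0, 0.0] for _ in P]
    for _ in range(sweeps):
        q = [
            [R[s][a] + GAMMA * q[P[s][a]][policy[P[s][a]]] for a in range(2)]
            for s in range(len(P))
        ]
    return q


def improve(q):
    """Policy improvement: act greedily with respect to the current Q estimate."""
    return [max(range(2), key=lambda a: q[s][a]) for s in range(len(q))]


def policy_iteration():
    """Alternate evaluation and improvement until the policy stops changing."""
    policy = [0, 0]
    while True:
        q = evaluate(policy)
        new_policy = improve(q)
        if new_policy == policy:
            return policy, q
        policy = new_policy


final_policy, final_q = policy_iteration()
```

In this toy problem each improvement step yields a policy at least as good as the previous one, which is the single-agent analogue of the guarantee the abstract states for the discrete case.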


