Distributed Off-Policy Actor-Critic Reinforcement Learning with Policy Consensus

03/21/2019
by Yan Zhang, et al.

In this paper, we propose a distributed off-policy actor-critic method for multi-agent reinforcement learning problems. Specifically, we assume that every agent keeps a local estimate of the globally optimal policy parameter and updates its local value function estimate independently. We then introduce an additional consensus step that lets all agents asymptotically agree on the globally optimal policy function. We provide a convergence analysis of the proposed algorithm and validate its effectiveness on a distributed resource allocation example. Unlike related distributed actor-critic methods, the agents here do not share information about their local tasks; instead, they coordinate to estimate the global policy function.
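The consensus idea can be illustrated with a small sketch. It is not the authors' algorithm; it is a generic consensus-based distributed gradient scheme that mimics the structure described above. Each agent holds its own copy of the policy parameter, takes a local ascent step using only private information, and then averages its copy with its neighbors. Several pieces are assumptions introduced only for illustration:

- The ring communication graph and the equal 1/3 mixing weights.
- The quadratic local objective `local_gradient`, a hypothetical stand-in for each agent's critic-based off-policy policy gradient.
- The diminishing step-size schedule.

All function names are hypothetical.

```python
import numpy as np


def ring_mixing_matrix(n_agents: int) -> np.ndarray:
    """Doubly stochastic weights for a ring graph (assumed topology):
    each agent averages itself and its two neighbors with weight 1/3."""
    if n_agents < 3:
        raise ValueError("a ring with distinct neighbors needs at least 3 agents")
    W = np.zeros((n_agents, n_agents))
    for i in range(n_agents):
        for j in (i - 1, i, i + 1):
            W[i, j % n_agents] = 1.0 / 3.0
    return W


def local_gradient(theta: np.ndarray, targets: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for each agent's critic-based policy gradient.

    Agent i ascends J_i(theta) = -0.5 * ||theta - targets[i]||^2, so its
    gradient is targets[i] - theta_i. The team objective sum_i J_i is
    maximized at the mean of the targets, which no single agent knows.
    """
    return targets - theta


def consensus_step(theta, targets, W, step_size):
    """Local actor update on every agent, then one round of neighbor averaging."""
    local = theta + step_size * local_gradient(theta, targets)
    return W @ local


def run(n_agents=5, dim=3, iters=5000, seed=0):
    rng = np.random.default_rng(seed)
    targets = rng.normal(size=(n_agents, dim))  # private to each agent
    theta = np.zeros((n_agents, dim))           # local copies of the policy parameter
    W = ring_mixing_matrix(n_agents)
    for k in range(iters):
        step = 1.0 / (k + 1) ** 0.6             # diminishing step size (assumed schedule)
        theta = consensus_step(theta, targets, W, step)
    return theta, targets


if __name__ == "__main__":
    theta, targets = run()
    # Largest deviation of any agent's copy from the network average.
    print("disagreement:", np.abs(theta - theta.mean(axis=0)).max())
    # The network average should sit at the team optimum (mean of targets).
    print("at team optimum:", np.allclose(theta.mean(axis=0), targets.mean(axis=0)))
```

Because the mixing matrix is doubly stochastic, the averaging step preserves the network-wide mean of the parameters, so the mean follows the team gradient while the agents' copies are pulled together. With a constant step size, this kind of scheme typically leaves a residual disagreement proportional to the step; a diminishing step drives that disagreement toward zero, mirroring the asymptotic agreement described in the abstract.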


research
03/15/2019

A Multi-Agent Off-Policy Actor-Critic Algorithm for Distributed Reinforcement Learning

This paper extends off-policy reinforcement learning to the multi-agent ...
research
05/10/2021

AoI-Aware Resource Allocation for Platoon-Based C-V2X Networks via Multi-Agent Multi-Task Reinforcement Learning

This paper investigates the problem of age of information (AoI) aware ra...
research
09/18/2017

Guided Deep Reinforcement Learning for Swarm Systems

In this paper, we investigate how to learn to control a group of coopera...
research
12/14/2019

Natural Actor-Critic Converges Globally for Hierarchical Linear Quadratic Regulator

Multi-agent reinforcement learning has been successfully applied to a nu...
research
05/29/2021

MARL with General Utilities via Decentralized Shadow Reward Actor-Critic

We posit a new mechanism for cooperation in multi-agent reinforcement le...
research
05/29/2023

DoMo-AC: Doubly Multi-step Off-policy Actor-Critic Algorithm

Multi-step learning applies lookahead over multiple time steps and has p...
research
08/16/2021

Optimal Actor-Critic Policy with Optimized Training Datasets

Actor-critic (AC) algorithms are known for their efficacy and high perfo...
