QD-Learning: A Collaborative Distributed Strategy for Multi-Agent Reinforcement Learning Through Consensus + Innovations

04/30/2012
by Soummya Kar, et al.

The paper considers a class of multi-agent Markov decision processes (MDPs) in which the network agents respond differently (as manifested by their instantaneous one-stage random costs) to a global controlled state and to the control actions of a remote controller. The paper investigates a distributed reinforcement learning setup with no prior information on the global state transition statistics or the local agent cost statistics. Specifically, with the agents' objective being to minimize a network-averaged infinite-horizon discounted cost, the paper proposes a distributed version of Q-learning, QD-learning, in which the network agents collaborate through local processing and mutual information exchange over a sparse (possibly stochastic) communication network to achieve the network goal. Under the assumption that each agent is aware only of its local online cost data and that the inter-agent communication network is weakly connected, the proposed distributed scheme is shown to yield, almost surely (a.s.), the desired value function and the optimal stationary control policy asymptotically at each network agent. The analytical techniques developed in the paper to address the mixed time-scale stochastic dynamics of the consensus + innovations form, which arise from the proposed interactive distributed scheme, are of independent interest.
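
The consensus + innovations structure described above can be illustrated with a minimal NumPy sketch, written from our reading of the abstract and not taken from the paper. Each agent keeps its own Q-table; whenever the global state-action pair (x, u) is visited, every agent moves its estimate for that pair toward its neighbors' estimates (consensus) and along its own Q-learning temporal-difference error computed from its private cost (innovation). Several details are assumptions not stated in the abstract: a fixed ring network (the paper allows sparse, possibly stochastic graphs), a remote controller that picks actions uniformly at random, Gaussian cost noise, weights indexed by the number of visits to (x, u), and the constants `a`, `b`, `tau1`, `tau2`, chosen so that the consensus weight decays more slowly than the innovation weight. The names `qd_learning` and `optimal_q` are hypothetical.

```python
import numpy as np


def optimal_q(P, avg_cost, gamma, iters=500):
    """Centralized value iteration on the network-averaged mean cost.

    Used only as a reference; no agent in the sketch has access to it.
    P[x, u, y] is the transition probability, avg_cost[x, u] the mean cost.
    """
    Q = np.zeros_like(avg_cost)
    for _ in range(iters):
        # Bellman optimality operator for cost minimization.
        Q = avg_cost + gamma * P @ Q.min(axis=1)
    return Q


def qd_learning(P, mean_costs, adj, gamma, steps, a=1.0, b=0.2,
                tau1=0.7, tau2=0.1, noise_std=0.5, seed=0):
    """Sketch of a consensus + innovations Q-learning update (assumed form).

    mean_costs[n, x, u]: mean one-stage cost of agent n (private to agent n).
    adj: symmetric 0/1 adjacency matrix of the communication graph.
    """
    rng = np.random.default_rng(seed)
    N, S, A = mean_costs.shape
    L = np.diag(adj.sum(axis=1)) - adj     # graph Laplacian
    Q = np.zeros((N, S, A))                # one Q-table per agent
    visits = np.zeros((S, A), dtype=int)   # assumption: shared per-pair clock
    x = 0
    for _ in range(steps):
        u = rng.integers(A)                # assumption: uniform exploration
        y = rng.choice(S, p=P[x, u])       # transition unknown to the agents
        # Each agent observes only its own noisy cost (assumed Gaussian).
        c = mean_costs[:, x, u] + noise_std * rng.standard_normal(N)
        k = visits[x, u]
        alpha = a / (k + 1) ** tau1        # innovation weight, decays faster
        beta = b / (k + 1) ** tau2         # consensus weight, decays slower
        q = Q[:, x, u].copy()
        # (L @ q)[n] = sum over neighbors l of (q[n] - q[l]).
        consensus = L @ q
        # Local temporal-difference error using each agent's own table.
        innovation = c + gamma * Q[:, y, :].min(axis=1) - q
        Q[:, x, u] = q - beta * consensus + alpha * innovation
        visits[x, u] += 1
        x = y
    return Q


# Illustrative example: 4 agents on a ring, 2 global states, 2 actions.
P = np.array([[[0.8, 0.2], [0.3, 0.7]],
              [[0.5, 0.5], [0.1, 0.9]]])            # P[x, u, y]
mean_costs = np.array([[[1.0, 2.0], [0.5, 1.5]],
                       [[2.0, 0.0], [1.0, 0.5]],
                       [[0.5, 1.0], [2.0, 1.0]],
                       [[1.5, 1.0], [0.5, 2.0]]])    # mean_costs[n, x, u]
ring = np.array([[0, 1, 0, 1],
                 [1, 0, 1, 0],
                 [0, 1, 0, 1],
                 [1, 0, 1, 0]])
Q_agents = qd_learning(P, mean_costs, ring, gamma=0.5, steps=40_000)
Q_star = optimal_q(P, mean_costs.mean(axis=0), gamma=0.5)
print("max deviation from Q*:", np.abs(Q_agents - Q_star).max())
```

In this toy setup the agents' individual costs differ, yet each agent's table should approach the Q-function of the network-averaged cost, and the greedy policies should agree, even though no agent ever sees another agent's costs. Letting the consensus weight decay more slowly than the innovation weight is what lets the agents keep agreeing while the innovation steps average out the cost noise; the exact step-size conditions the paper requires may differ from the values used here.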
