Multi-Agent Reinforcement Learning with Reward Delays

12/02/2022 ∙ by Yuyang Zhang, et al.

This paper considers multi-agent reinforcement learning (MARL) in which rewards are received after delays and the delay time varies across agents. Building on the V-learning framework, it proposes MARL algorithms that handle reward delays efficiently. When the delays are finite, the algorithm reaches a coarse correlated equilibrium (CCE) at rate $\tilde{\mathcal{O}}\left(\frac{H^3\sqrt{S\mathcal{T}_K}}{K} + \frac{H^3\sqrt{SA}}{\sqrt{K}}\right)$, where $K$ is the number of episodes, $H$ is the planning horizon, $S$ is the size of the state space, $A$ is the size of the largest action space, and $\mathcal{T}_K$ is the measure of the total delay defined in the paper. Moreover, the algorithm extends to cases with infinite delays through a reward skipping scheme, achieving a convergence rate similar to that of the finite-delay case.
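To get a feel for how the two terms of the finite-delay rate trade off, the following is a minimal sketch that evaluates H^3 * sqrt(S * T_K) / K + H^3 * sqrt(S * A) / sqrt(K) numerically. It is only an illustration: the constants and logarithmic factors hidden by the tilde-O notation are ignored, the function names `cce_rate_bound` and `delay_term_dominates` are hypothetical, and `total_delay` stands in for the paper's total-delay measure, whose exact definition is not given in the abstract.

```python
import math


def cce_rate_bound(H, S, A, K, total_delay):
    """Evaluate H^3*sqrt(S*T_K)/K + H^3*sqrt(S*A)/sqrt(K).

    Constants and log factors hidden by the tilde-O are ignored, so the
    value is only meaningful for comparing parameter settings.
    `total_delay` is a stand-in for the paper's total-delay measure T_K.
    """
    delay_term = H**3 * math.sqrt(S * total_delay) / K
    learning_term = H**3 * math.sqrt(S * A) / math.sqrt(K)
    return delay_term + learning_term


def delay_term_dominates(A, K, total_delay):
    """Return True when the delay term exceeds the delay-free term.

    sqrt(S*T_K)/K > sqrt(S*A)/sqrt(K) simplifies to T_K > A*K.
    """
    return total_delay > A * K


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for illustration.
    for K in (1_000, 10_000, 100_000):
        bound = cce_rate_bound(H=5, S=20, A=4, K=K, total_delay=50_000)
        print(K, round(bound, 3), delay_term_dominates(4, K, 50_000))
```

Under these assumptions the delay term matters only while the total delay exceeds A times K. Once the total delay grows no faster than linearly in K, the bound is of the same order as the delay-free term H^3 * sqrt(S * A) / sqrt(K).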


research
∙ 12/03/2022

DACOM: Learning Delay-Aware Communication for Multi-Agent Reinforcement Learning

Communication is supposed to improve multi-agent collaboration and overa...
research
∙ 10/25/2020

Enhancing reinforcement learning by a finite reward response filter with a case study in intelligent structural control

In many reinforcement learning (RL) problems, it takes some time until a...
research
∙ 05/11/2020

Delay-Aware Multi-Agent Reinforcement Learning for Cooperative and Competitive Environments

Action and observation delays exist prevalently in the real-world cyber-...
research
∙ 06/04/2021

Stochastic Multi-Armed Bandits with Unrestricted Delay Distributions

We study the stochastic Multi-Armed Bandit (MAB) problem with random del...
research
∙ 07/18/2020

Tomography Based Learning for Load Distribution through Opaque Networks

Applications such as virtual reality and online gaming require low delay...
research
∙ 12/21/2020

Multi-Agent Online Optimization with Delays: Asynchronicity, Adaptivity, and Optimism

Online learning has been successfully applied to many problems in which ...
research
∙ 02/17/2020

Reward Design for Driver Repositioning Using Multi-Agent Reinforcement Learning

A large portion of the passenger requests is reportedly unserviced, part...
