Multi-Agent Reinforcement Learning with Reward Delays

12/02/2022
by Yuyang Zhang, et al.

This paper considers multi-agent reinforcement learning (MARL) in which rewards are received after delays and the delay time varies among agents. Based on the V-learning framework, this paper proposes MARL algorithms that efficiently deal with reward delays. When the delays are finite, our algorithm reaches a coarse correlated equilibrium (CCE) with rate $\tilde{\mathcal{O}}\left(\frac{H^3\sqrt{S\mathcal{T}_K}}{K}+\frac{H^3\sqrt{SA}}{\sqrt{K}}\right)$, where K is the number of episodes, H is the planning horizon, S is the size of the state space, A is the size of the largest action space, and $\mathcal{T}_K$ is the measure of the total delay defined in the paper. Moreover, our algorithm can be extended to cases with infinite delays through a reward skipping scheme, achieving a convergence rate similar to that of the finite-delay case.
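To get a feel for how the two terms of the stated rate trade off, the following minimal sketch evaluates each term numerically. It is only an illustration, not the paper's algorithm: the constants and logarithmic factors hidden by the tilde-O notation are dropped, the total-delay measure is passed in as a plain number `T_K` (its actual definition is in the paper), and the function names are hypothetical. Comparing the two terms algebraically, the delay-dependent term exceeds the delay-free term exactly when `T_K > A * K`.

```python
import math


def cce_rate_terms(K, H, S, A, T_K):
    """Return (delay_term, no_delay_term) of the stated rate.

    Assumption: constants and log factors hidden by the tilde-O are dropped,
    and T_K is supplied as a number rather than computed from observed delays.
    """
    delay_term = H**3 * math.sqrt(S * T_K) / K
    no_delay_term = H**3 * math.sqrt(S * A) / math.sqrt(K)
    return delay_term, no_delay_term


def delay_term_dominates(K, A, T_K):
    """True when the delay-dependent term is the larger of the two.

    sqrt(S*T_K)/K > sqrt(S*A)/sqrt(K)  is equivalent to  T_K > A*K.
    """
    return T_K > A * K


if __name__ == "__main__":
    # Hypothetical problem sizes, chosen only for illustration.
    K, H, S, A = 10_000, 10, 20, 5
    for T_K in (20_000, 80_000):
        d, n = cce_rate_terms(K, H, S, A, T_K)
        print(T_K, round(d, 2), round(n, 2), delay_term_dominates(K, A, T_K))
```

Under these simplifications, the delay-dependent term stays below the delay-free term as long as the total delay measure is at most A times the number of episodes.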
