Multi-Agent Reinforcement Learning with Reward Delays

12/02/2022

This paper considers multi-agent reinforcement learning (MARL) in which rewards are received after delays and the delay time varies among agents. Based on the V-learning framework, this paper proposes MARL algorithms that efficiently deal with reward delays. When the delays are finite, our algorithm reaches a coarse correlated equilibrium (CCE) at rate $\tilde{\mathcal{O}}\left(\frac{H^3\sqrt{S\mathcal{T}_K}}{K}+\frac{H^3\sqrt{SA}}{\sqrt{K}}\right)$, where $K$ is the number of episodes, $H$ is the planning horizon, $S$ is the size of the state space, $A$ is the size of the largest action space, and $\mathcal{T}_K$ is the measure of the total delay defined in the paper. Moreover, our algorithm can be extended to cases with infinite delays through a reward skipping scheme, achieving a convergence rate similar to that of the finite delay case.
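
To get a feel for how the two terms of the stated rate trade off, the bound can be evaluated numerically. The following is a minimal sketch, not code from the paper: the function name `cce_rate_bound` and all parameter values are hypothetical, and the constants and logarithmic factors hidden by the tilde-O notation are dropped, so only the scaling is meaningful.

```python
import math


def cce_rate_bound(K, H, S, A, total_delay):
    """Evaluate H^3*sqrt(S*T_K)/K + H^3*sqrt(S*A)/sqrt(K).

    Illustrative only: constants and log factors are omitted, and
    `total_delay` stands in for the paper's total delay measure T_K,
    whose exact definition is given in the paper, not here.
    """
    if K <= 0:
        raise ValueError("K must be a positive number of episodes")
    if total_delay < 0:
        raise ValueError("total_delay must be non-negative")
    # Delay-dependent term: decays as 1/K for a fixed total delay.
    delay_term = H**3 * math.sqrt(S * total_delay) / K
    # Delay-free term: decays as 1/sqrt(K).
    base_term = H**3 * math.sqrt(S * A) / math.sqrt(K)
    return delay_term + base_term


if __name__ == "__main__":
    # Hypothetical problem sizes, chosen only for illustration.
    for K in (10**3, 10**4, 10**5):
        value = cce_rate_bound(K, H=5, S=10, A=4, total_delay=2 * K)
        print(f"K={K:>6}  bound (up to constants/logs) = {value:.3f}")
```

In this sketch the total delay is assumed, purely for illustration, to grow linearly with K. Under that assumption the first term becomes proportional to 1/√K, the same order as the second term, which follows directly from the formula.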
