Multi-Agent Reinforcement Learning with Reward Delays
This paper considers multi-agent reinforcement learning (MARL) in which rewards are received after delays and the delay time varies across agents. Based on the V-learning framework, we propose MARL algorithms that efficiently handle reward delays. When the delays are finite, our algorithm reaches a coarse correlated equilibrium (CCE) at rate $\tilde{\mathcal{O}}(H^3\sqrt{S\mathcal{T}_K}/K + H^3\sqrt{SA}/\sqrt{K})$, where $K$ is the number of episodes, $H$ is the planning horizon, $S$ is the size of the state space, $A$ is the size of the largest action space, and $\mathcal{T}_K$ is the measure of total delay defined in the paper. Moreover, our algorithm extends to cases with infinite delays through a reward-skipping scheme, achieving a convergence rate similar to the finite-delay case.
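To make the delayed-feedback mechanics concrete, below is a minimal Python sketch of the bookkeeping involved: each episode's rewards arrive only after a random, possibly long delay, and rewards delayed beyond a cutoff are dropped, in the spirit of the paper's reward-skipping scheme. This is an illustrative toy, not the paper's algorithm; the placeholder policy/dynamics, the `MAX_DELAY` cutoff, and the `(H+1)/(H+n)` step size (a common V-learning choice) are all assumptions, and the adversarial-bandit policy updates and optimism bonuses of actual V-learning are omitted.

```python
import random
from collections import defaultdict

# Hypothetical toy setup: S states, A actions, horizon H, K episodes.
S, A, H, K = 4, 2, 3, 200
MAX_DELAY = 50  # assumed cutoff: rewards delayed beyond this are skipped,
                # loosely mirroring the reward-skipping scheme for infinite delays

V = defaultdict(float)          # V[(h, s)]: value estimate per step/state
visits = defaultdict(int)       # visit counts driving the step-size schedule
pending = defaultdict(list)     # arrival episode -> delayed feedback tuples

def step_size(n):
    # (H + 1) / (H + n): a step size commonly used in V-learning-style analyses
    return (H + 1) / (H + n)

for k in range(1, K + 1):
    # 1. Run one episode; each reward is revealed only after a random delay.
    s = 0
    for h in range(H):
        a = random.randrange(A)            # placeholder policy
        s_next = random.randrange(S)       # placeholder dynamics
        r = random.random()                # placeholder reward
        delay = random.randint(0, 100)     # possibly very long delay
        if delay <= MAX_DELAY:             # skip overly late rewards
            pending[k + delay].append((h, s, r, s_next))
        s = s_next

    # 2. Apply whatever delayed feedback arrives during this episode.
    for (h, s_fb, r, s_next) in pending.pop(k, []):
        visits[(h, s_fb)] += 1
        alpha = step_size(visits[(h, s_fb)])
        target = r + V[(h + 1, s_next)]    # one-step bootstrapped target
        V[(h, s_fb)] += alpha * (target - V[(h, s_fb)])

print({key: round(v, 2) for key, v in list(V.items())[:4]})
```

In this sketch, the quantity analogous to $\mathcal{T}_K$ would be the accumulated delay across all episodes, which is what inflates the first term of the rate above relative to delay-free V-learning.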