Finite-Time Analysis of Fully Decentralized Single-Timescale Actor-Critic
Decentralized Actor-Critic (AC) algorithms have been widely utilized for multi-agent reinforcement learning (MARL) and have achieved remarkable success. Despite this empirical success, the theoretical convergence properties of decentralized AC algorithms remain largely unexplored. The existing finite-time convergence results are derived based on either a double-loop update or a two-timescale step-size rule, neither of which is often adopted in practical implementations. In this work, we introduce a fully decentralized AC algorithm in which the actor, critic, and global reward estimator are updated in an alternating manner with step sizes of the same order; that is, we adopt a single-timescale update. Theoretically, using linear approximation for value and reward estimation, we show that our algorithm has a sample complexity of 𝒪̃(ϵ^-2) under Markovian sampling, which matches the optimal complexity of the double-loop implementation (here, 𝒪̃ hides a logarithmic factor). The sample complexity can be improved to 𝒪(ϵ^-2) under the i.i.d. sampling scheme. Central to establishing our complexity results is the hidden smoothness of the optimal critic variable that we reveal. We also provide a local-action-privacy-preserving version of our algorithm and its analysis. Finally, we conduct experiments to show the superiority of our algorithm over existing decentralized AC algorithms.
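To make the single-timescale update concrete, the sketch below illustrates one alternating iteration in which local critic and reward-estimator parameters are mixed via a consensus matrix and then the critic, reward estimator, and actor are each updated with step sizes of the same order. This is a minimal illustration under assumed notation (theta, w, lam, W_mix, alpha, beta, eta, phi, score are all hypothetical names), not the paper's exact algorithm.

```python
import numpy as np

# Hypothetical sketch of one single-timescale decentralized AC iteration.
# All symbols here are illustrative assumptions, not the paper's notation.

rng = np.random.default_rng(0)
N, d = 4, 8                                  # number of agents, feature/parameter dimension
W_mix = np.full((N, N), 1.0 / N)             # doubly stochastic mixing (consensus) matrix

theta = rng.normal(size=(N, d))              # local actor parameters
w = np.zeros((N, d))                         # local linear critic parameters
lam = np.zeros(N)                            # local estimates of the global average reward
alpha = beta = eta = 0.01                    # same-order step sizes (single-timescale)

def step(phi, phi_next, r_local, score):
    """One alternating update given features phi, phi_next (N x d),
    local rewards r_local (N,), and policy score vectors score (N x d)."""
    global theta, w, lam
    w_c, lam_c = W_mix @ w, W_mix @ lam      # consensus on critic / reward estimator
    for i in range(N):
        # TD error uses the agent's reward estimate in place of the unobserved global reward.
        td = r_local[i] - lam_c[i] + phi_next[i] @ w_c[i] - phi[i] @ w_c[i]
        w[i] = w_c[i] + beta * td * phi[i]                  # critic: linear TD update
        lam[i] = lam_c[i] + eta * (r_local[i] - lam_c[i])   # reward estimator tracking
        theta[i] += alpha * td * score[i]                   # actor: policy-gradient step

# Example call with random placeholders for one transition.
step(rng.normal(size=(N, d)), rng.normal(size=(N, d)),
     rng.normal(size=N), rng.normal(size=(N, d)))
```

The key point of the single-timescale scheme is visible above: alpha, beta, and eta are all of the same order, in contrast to two-timescale schemes where the critic step size must asymptotically dominate the actor's.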