Finite-Time Analysis of Fully Decentralized Single-Timescale Actor-Critic

06/12/2022 ∙ by Qijun Luo, et al.
Decentralized actor-critic (AC) algorithms have been widely used in multi-agent reinforcement learning (MARL) and have achieved remarkable success. Despite this empirical success, the theoretical convergence properties of decentralized AC algorithms remain largely unexplored. Existing finite-time convergence results are derived under either a double-loop update or a two-timescale step-size rule, neither of which is often adopted in real implementations. In this work, we introduce a fully decentralized AC algorithm in which the actor, critic, and global reward estimator are updated in an alternating manner with step sizes of the same order; that is, we adopt a single-timescale update. Theoretically, using linear approximation for value and reward estimation, we show that our algorithm has sample complexity of 𝒪̃(ϵ^-2) under Markovian sampling, which matches the optimal complexity of double-loop implementations (here, 𝒪̃ hides a log term). The sample complexity improves to 𝒪(ϵ^-2) under an i.i.d. sampling scheme. Central to establishing our complexity results is the hidden smoothness of the optimal critic variable that we reveal. We also provide a local-action privacy-preserving version of our algorithm and its analysis. Finally, we conduct experiments demonstrating the superiority of our algorithm over existing decentralized AC algorithms.
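To make the single-timescale update rule concrete, here is a minimal sketch of alternating actor and critic updates with step sizes of the same order, on a toy two-state, two-action MDP. This is an illustrative single-agent, discounted-reward simplification, not the paper's decentralized multi-agent algorithm: the MDP, step sizes, and softmax parameterization are all assumptions, and the global reward estimator is omitted since a single agent observes the reward directly.

```python
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions = 2, 2
# Hypothetical toy MDP: P[s, a] is the next-state distribution, R[s, a] the reward.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.8, 0.2], [0.1, 0.9]]])
R = np.array([[1.0, 0.0],
              [0.0, 1.0]])

theta = np.zeros((n_states, n_actions))  # actor parameters (softmax logits)
w = np.zeros(n_states)                   # critic: tabular (linear) value estimate
gamma = 0.95

def policy(s):
    # Numerically stable softmax over the logits for state s.
    z = np.exp(theta[s] - theta[s].max())
    return z / z.sum()

# Single-timescale rule: actor and critic step sizes are of the same order
# (here simply equal), rather than a two-timescale or double-loop schedule.
alpha = beta = 0.05

s = 0
for t in range(20000):
    pi = policy(s)
    a = rng.choice(n_actions, p=pi)
    s_next = rng.choice(n_states, p=P[s, a])
    r = R[s, a]
    # Critic update: one TD(0) step on the value estimate.
    delta = r + gamma * w[s_next] - w[s]
    w[s] += beta * delta
    # Actor update: policy-gradient step using the TD error as the advantage signal.
    grad_log = -pi
    grad_log[a] += 1.0
    theta[s] += alpha * delta * grad_log
    s = s_next
```

After training, the policy should favor the rewarding action in each state (action 0 in state 0, action 1 in state 1), illustrating that the alternating same-order updates converge on this toy problem.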

Related research

- Global Convergence of Two-timescale Actor-Critic for Solving Linear Quadratic Regulator (08/18/2022): The actor-critic (AC) reinforcement learning algorithms have been the po...
- A Small Gain Analysis of Single Timescale Actor Critic (03/04/2022): We consider a version of actor-critic which uses proportional step-sizes...
- Finite-time analysis of single-timescale actor-critic (10/18/2022): Despite the great empirical success of actor-critic methods, its finite-...
- Sample and Communication-Efficient Decentralized Actor-Critic Algorithms with Finite-Time Analysis (09/08/2021): Actor-critic (AC) algorithms have been widely adopted in decentralized m...
- Asynchronous Advantage Actor Critic: Non-asymptotic Analysis and Linear Speedup (12/31/2020): Asynchronous and parallel implementation of standard reinforcement learn...
- Convergence Rates for Localized Actor-Critic in Networked Markov Potential Games (03/08/2023): We introduce a class of networked Markov potential games where agents ar...
- Cooperative Actor-Critic via TD Error Aggregation (07/25/2022): In decentralized cooperative multi-agent reinforcement learning, agents ...
