Asynchronous stochastic approximations with asymptotically biased errors and deep multi-agent learning

02/22/2018
by   Arunselvan Ramaswamy, et al.

Asynchronous stochastic approximations are an important class of model-free algorithms, readily applicable to multi-agent reinforcement learning (RL) and distributed control applications. When the system size is large, these algorithms are used in conjunction with function approximations. In this paper, we present a complete analysis, including stability (almost sure boundedness) and convergence, of asynchronous stochastic approximations with asymptotically bounded, possibly biased, errors, under easily verifiable sufficient conditions. As an application, we analyze policy gradient algorithms and the more general value iteration based algorithms with noise; these are popular reinforcement learning algorithms owing to their simplicity and effectiveness. Specifically, we analyze the asynchronous approximate counterparts of the policy gradient (A2PG) and value iteration (A2VI) schemes. We show that the stability of these algorithms is unaffected when the approximation errors are guaranteed to be asymptotically bounded, although possibly biased. Regarding convergence, A2VI is shown to converge to a fixed point of the perturbed Bellman operator when balanced step-sizes are used, and a relationship between these fixed points and the approximation errors is established. A similar analysis is presented for A2PG.
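To make the setting concrete, here is a minimal sketch (not the paper's exact scheme or assumptions) of an asynchronous stochastic approximation in which, at each step, only a random subset of components is updated, each using a step-size driven by its own local update count, and the update carries both martingale-difference noise and a small persistent bias. All names, constants, and the choice of mean field `h(x) = target - x` are illustrative assumptions; the iterate should settle within roughly `O(bias)` of the unperturbed fixed point.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 5
target = np.arange(1.0, d + 1.0)   # fixed point of the (bias-free) mean field
x = np.zeros(d)
counts = np.zeros(d, dtype=int)    # per-component "local clock"

for n in range(20000):
    active = rng.random(d) < 0.5            # asynchrony: random components update
    h = target - x                          # mean field h(x) with fixed point `target`
    noise = rng.normal(0.0, 0.1, size=d)    # zero-mean martingale-difference noise
    bias = 0.01                             # asymptotically bounded, biased error
    for i in np.where(active)[0]:
        counts[i] += 1
        a = 1.0 / counts[i]                 # step-size based on the local clock
        x[i] += a * (h[i] + noise[i] + bias)

# The iterate lands near `target`, off by roughly the bias magnitude.
print(np.max(np.abs(x - target)))
```

With `a = 1/counts[i]`, each component is effectively a running average of `target[i] + noise + bias`, so the limit is shifted from `target` by the bias — a toy analogue of converging to a fixed point of a perturbed operator rather than the exact one.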


