Convergence of Batch Asynchronous Stochastic Approximation With Applications to Reinforcement Learning

09/08/2021, by Rajeeva L. Karandikar, et al.
The stochastic approximation (SA) algorithm is a widely used probabilistic method for finding a solution to an equation of the form f(θ) = 0, where f : ℝ^d → ℝ^d, when only noisy measurements of f(·) are available. In the literature to date, one can distinguish between "synchronous" updating, whereby the entire vector of the current guess θ_t is updated at each time, and "asynchronous" updating, whereby only one component of θ_t is updated. In convex and nonconvex optimization, there is also the notion of "batch" updating, whereby some but not all components of θ_t are updated at each time t. In addition, there is a distinction between using a "local" clock versus a "global" clock. In the literature to date, convergence proofs when a local clock is used assume that the measurement noise is an i.i.d. sequence, an assumption that does not hold in Reinforcement Learning (RL). In this note, we provide a general theory of convergence for batch asynchronous stochastic approximation (BASA) that works whether the updates use a local clock or a global clock, for the case where the measurement noises form a martingale difference sequence. This is the most general result to date and encompasses all others.
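To make the batch asynchronous update concrete, here is a minimal Python sketch of the scheme the abstract describes: at each step a random subset of components of θ_t is updated using a noisy measurement of f, with the step size driven either by a per-component counter ("local clock") or by the global iteration counter ("global clock"). The names (basa, noisy_f), the uniform batch-selection rule, and the 1/(n+1) step-size schedule are illustrative assumptions, not the paper's exact construction.

import numpy as np

def basa(noisy_f, theta0, num_steps=10_000, batch_size=2, local_clock=True, seed=0):
    """Illustrative sketch of Batch Asynchronous Stochastic Approximation (BASA).

    At each step t a random subset S_t of components is updated:
        theta_{t+1}[i] = theta_t[i] + alpha_i(t) * noisy_f(theta_t)[i]   for i in S_t,
    while the remaining components are left unchanged.  With a local clock,
    the step size of component i is driven by the number of times that
    component has been updated so far; with a global clock it is driven by
    the global iteration counter t.
    """
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float).copy()
    d = theta.size
    local_counts = np.zeros(d)           # per-component update counters (local clocks)

    for t in range(num_steps):
        # Choose which components to update at this step (the "batch").
        idx = rng.choice(d, size=batch_size, replace=False)

        # Step sizes: local clock uses per-component counters, global clock uses t.
        if local_clock:
            alpha = 1.0 / (local_counts[idx] + 1.0)
        else:
            alpha = np.full(batch_size, 1.0 / (t + 1.0))

        # Noisy measurement of f(theta); only the selected components are used.
        g = noisy_f(theta)
        theta[idx] += alpha * g[idx]
        local_counts[idx] += 1

    return theta


if __name__ == "__main__":
    # Toy example: f(theta) = b - theta, whose root is theta = b, observed
    # through additive zero-mean (martingale-difference) noise.
    b = np.array([1.0, -2.0, 0.5])
    rng = np.random.default_rng(1)
    noisy_f = lambda th: (b - th) + 0.1 * rng.standard_normal(th.size)
    print(basa(noisy_f, np.zeros(3)))    # should approach [1.0, -2.0, 0.5]

In this toy run, the iterates converge to the root of f regardless of whether the local or global clock drives the step sizes, which is the kind of behavior the paper's convergence theory guarantees under its martingale-difference noise assumption.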
