Simple and optimal methods for stochastic variational inequalities, II: Markovian noise and policy evaluation in reinforcement learning

11/15/2020
by   Georgios Kotsalis, et al.

The focus of this paper is on stochastic variational inequalities (VIs) under Markovian noise. A prominent application of our algorithmic developments is the stochastic policy evaluation problem in reinforcement learning. Prior investigations in the literature focused on temporal difference (TD) learning, employing nonsmooth finite-time analyses motivated by stochastic subgradient descent that lead to certain limitations: the need to analyze a modified TD algorithm involving projection onto an a priori defined Euclidean ball, the attainment of a suboptimal convergence rate, and no clear way to quantify the benefits of parallel implementation. Our approach remedies these shortcomings in the broader context of stochastic VIs, and in particular for stochastic policy evaluation. We develop a variety of simple TD-learning-type algorithms, motivated by the original version of TD, that retain its simplicity while offering distinct advantages from a non-asymptotic analysis point of view. We first provide an improved analysis of the standard TD algorithm that can benefit from parallel implementation. We then present versions of a conditional TD (CTD) algorithm that involve periodic updates of the stochastic iterates, which reduce the bias and therefore exhibit improved iteration complexity. This brings us to the fast TD (FTD) algorithm, which combines elements of CTD with the stochastic operator extrapolation method of the companion paper. Equipped with a novel index-resetting policy, FTD exhibits the best known convergence rate. We also devise a robust version of the algorithm that is particularly suitable for discount factors close to 1.
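For concreteness, here is a minimal sketch of the vanilla TD(0) iteration with linear value-function approximation that the paper's analysis starts from; it is not the CTD or FTD algorithm itself. The names env_step and phi, the constant step size, and the toy random-walk example are illustrative assumptions, and the update is deliberately left unprojected, in line with the paper's point that no a priori projection ball is required.

```python
import numpy as np

def td0_linear(init_state, env_step, phi, dim, alpha=0.01, gamma=0.9, iters=10_000):
    """Vanilla TD(0) with linear approximation V(s) ~ phi(s) @ theta.

    env_step(s) is assumed to return (reward, next_state) sampled along a
    single trajectory of the Markov chain induced by the policy under
    evaluation, so the noise is Markovian rather than i.i.d.
    """
    theta = np.zeros(dim)
    s = init_state
    for _ in range(iters):
        r, s_next = env_step(s)
        # Temporal-difference error of the observed transition.
        delta = r + gamma * (phi(s_next) @ theta) - phi(s) @ theta
        # Unprojected stochastic update in the TD direction.
        theta += alpha * delta * phi(s)
        s = s_next
    return theta

# Toy usage: one-hot features on a 5-state random walk with reward at state 0.
rng = np.random.default_rng(0)
def step(s):
    s_next = (s + rng.choice([-1, 1])) % 5
    return float(s_next == 0), s_next

theta = td0_linear(0, step, lambda s: np.eye(5)[s], dim=5)
```

The CTD and FTD variants described in the abstract modify this loop (periodic updates of the iterate to reduce bias, and an operator-extrapolation step with index resetting, respectively); their precise schedules are given in the paper.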


