Optimal variance-reduced stochastic approximation in Banach spaces

01/21/2022
by   Wenlong Mou, et al.

We study the problem of estimating the fixed point of a contractive operator defined on a separable Banach space. Focusing on a stochastic query model that provides noisy evaluations of the operator, we analyze a variance-reduced stochastic approximation scheme, and establish non-asymptotic bounds for both the operator defect and the estimation error, measured in an arbitrary semi-norm. In contrast to worst-case guarantees, our bounds are instance-dependent, and achieve the local asymptotic minimax risk non-asymptotically. For linear operators, contractivity can be relaxed to multi-step contractivity, so that the theory applies to problems such as average-reward policy evaluation in reinforcement learning. We illustrate the theory via applications to stochastic shortest path problems, two-player zero-sum Markov games, as well as policy evaluation and Q-learning for tabular Markov decision processes.
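
To make the idea of variance-reduced stochastic approximation concrete, here is a minimal Python sketch for one of the applications named above, tabular policy evaluation. It is an illustration under stated assumptions, not the scheme analyzed in the paper: the three-state chain `P`, the rewards `R`, the discount `GAMMA`, the step-size rule, and the epoch lengths are all hypothetical choices. Each epoch first averages many noisy operator evaluations at a reference point, then runs stochastic approximation on recentered evaluations that reuse the same randomness at the current iterate and at the reference point.

```python
import random

# Hypothetical 3-state Markov reward process (illustrative values only).
GAMMA = 0.9
P = [[0.5, 0.5, 0.0],
     [0.1, 0.6, 0.3],
     [0.2, 0.0, 0.8]]
R = [1.0, 0.0, 2.0]


def draw_next_states(rng):
    """Sample one next state for every state (one query's randomness)."""
    return [rng.choices(range(len(P)), weights=row)[0] for row in P]


def noisy_operator(theta, next_states):
    """Noisy evaluation of h(theta) = R + GAMMA * P theta for fixed samples."""
    return [R[s] + GAMMA * theta[next_states[s]] for s in range(len(R))]


def exact_operator(theta):
    """Exact operator, used here only to check the answer."""
    return [R[s] + GAMMA * sum(p * t for p, t in zip(P[s], theta))
            for s in range(len(R))]


def variance_reduced_sa(epochs=6, recenter_samples=5000,
                        inner_steps=2000, seed=0):
    rng = random.Random(seed)
    n = len(R)
    theta_ref = [0.0] * n
    for _ in range(epochs):
        # Recentering: Monte Carlo estimate of h(theta_ref).
        h_ref = [0.0] * n
        for _ in range(recenter_samples):
            out = noisy_operator(theta_ref, draw_next_states(rng))
            h_ref = [a + b / recenter_samples for a, b in zip(h_ref, out)]
        # Inner loop: stochastic approximation on recentered evaluations.
        theta = list(theta_ref)
        for k in range(1, inner_steps + 1):
            step = 1.0 / (1.0 + (1.0 - GAMMA) * k)  # assumed step-size rule
            nxt = draw_next_states(rng)
            at_theta = noisy_operator(theta, nxt)   # same samples at both
            at_ref = noisy_operator(theta_ref, nxt) # points cancel noise
            theta = [(1.0 - step) * t + step * (a - b + h)
                     for t, a, b, h in zip(theta, at_theta, at_ref, h_ref)]
        theta_ref = theta
    return theta_ref


def fixed_point(iterations=2000):
    """Reference solution obtained by iterating the exact operator."""
    theta = [0.0] * len(R)
    for _ in range(iterations):
        theta = exact_operator(theta)
    return theta


if __name__ == "__main__":
    print("estimate:   ", variance_reduced_sa())
    print("fixed point:", fixed_point())
```

Because the difference `at_theta - at_ref` shrinks as the iterate approaches the reference point, the noise in the inner loop shrinks with it, and the remaining error is governed mainly by the accuracy of the recentering average.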

research
03/16/2020

Is Temporal Difference Learning Optimal? An Instance-Dependent Analysis

We address the problem of policy evaluation in discounted Markov decisio...
research
05/15/2019

Stochastic approximation with cone-contractive operators: Sharp ℓ_∞-bounds for Q-learning

Motivated by the study of Q-learning algorithms in reinforcement learnin...
research
09/24/2021

Optimal policy evaluation using kernel-based temporal difference methods

We study methods based on reproducing kernel Hilbert spaces for estimati...
research
06/28/2021

Instance-optimality in optimal value estimation: Adaptivity via variance-reduced Q-learning

Various algorithms in reinforcement learning exhibit dramatic variabilit...
research
09/17/2015

Generalized Emphatic Temporal Difference Learning: Bias-Variance Analysis

We consider the off-policy evaluation problem in Markov decision process...
research
10/02/2020

Self-Play Reinforcement Learning for Fast Image Retargeting

In this study, we address image retargeting, which is a task that adjust...
research
08/14/2015

Emphatic TD Bellman Operator is a Contraction

Recently, SuttonMW15 introduced the emphatic temporal differences (ETD) ...
