Stochastic approximation with cone-contractive operators: Sharp ℓ_∞-bounds for Q-learning

05/15/2019
by Martin J. Wainwright, et al.

Motivated by the study of Q-learning algorithms in reinforcement learning, we study a class of stochastic approximation procedures based on operators that satisfy monotonicity and quasi-contractivity conditions with respect to an underlying cone. We prove a general sandwich relation on the iterate error at each time, and use it to derive non-asymptotic bounds on the error in terms of a cone-induced gauge norm. These results are derived within a deterministic framework, requiring no assumptions on the noise. We illustrate the general bounds by applying them to synchronous Q-learning for discounted Markov decision processes with discrete state-action spaces, in particular by deriving non-asymptotic bounds on the ℓ_∞-norm for a range of stepsizes. These bounds are the sharpest known to date, and we show via simulation that the dependence of our bounds cannot be improved in a worst-case sense. They also show that, relative to model-based Q-iteration, the ℓ_∞-based sample complexity of Q-learning is suboptimal in terms of the discount factor γ.
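
To make the setting concrete, here is a minimal sketch of synchronous Q-learning next to model-based Q-iteration on a small tabular MDP. It is an illustration under stated assumptions, not the paper's code: the function names, the deterministic rewards, the random test MDP, and the stepsize α_k = 1/(1 + (1 − γ)k) are all choices made here for the example.

```python
import numpy as np


def q_iteration(P, r, gamma, iters=2000):
    """Model-based Q-iteration: repeatedly apply the exact Bellman operator.

    P has shape (S, A, S) with P[s, a, s'] the transition probability,
    r has shape (S, A) and holds deterministic rewards (an assumption).
    """
    Q = np.zeros_like(r, dtype=float)
    for _ in range(iters):
        # (P @ v)[s, a] = sum_{s'} P[s, a, s'] * v[s'], with v[s'] = max_a' Q[s', a']
        Q = r + gamma * P @ Q.max(axis=1)
    return Q


def synchronous_q_learning(P, r, gamma, num_iters, rng):
    """Synchronous Q-learning: at every step, draw one next state for EVERY
    state-action pair and update the whole table at once."""
    S, A = r.shape
    Q = np.zeros((S, A))
    cdf = P.cumsum(axis=2)  # per-(s, a) cumulative distribution over next states
    for k in range(1, num_iters + 1):
        # Illustrative stepsize choice (assumed here, one of many possible).
        alpha = 1.0 / (1.0 + (1.0 - gamma) * k)
        # Inverse-CDF sampling of one next state per (s, a) pair.
        u = rng.random((S, A, 1))
        next_states = np.minimum((u > cdf).sum(axis=2), S - 1)
        # Noisy (empirical) Bellman target built from the sampled next states.
        target = r + gamma * Q.max(axis=1)[next_states]
        Q = (1.0 - alpha) * Q + alpha * target
    return Q


def random_mdp(S, A, rng):
    """Hypothetical test problem: Dirichlet transitions, rewards in [0, 1]."""
    P = rng.dirichlet(np.ones(S), size=(S, A))  # shape (S, A, S)
    r = rng.random((S, A))
    return P, r


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    P, r = random_mdp(S=4, A=3, rng=rng)
    gamma = 0.5
    Q_star = q_iteration(P, r, gamma)
    Q_hat = synchronous_q_learning(P, r, gamma, num_iters=5000, rng=rng)
    # The quantity the abstract's bounds control: the l_inf error of the iterate.
    print("l_inf error:", np.abs(Q_hat - Q_star).max())
```

The printed value is the ℓ_∞ distance between the Q-learning iterate and the fixed point computed by Q-iteration. Rerunning with γ closer to 1 gives a rough feel for how the error depends on the discount factor.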


