Instance-optimality in optimal value estimation: Adaptivity via variance-reduced Q-learning

06/28/2021
by Koulik Khamaru, et al.

Various algorithms in reinforcement learning exhibit dramatic variability in their convergence rates and ultimate accuracy as a function of the problem structure. Such instance-specific behavior is not captured by existing global minimax bounds, which are worst-case in nature. We analyze the problem of estimating optimal Q-value functions for a discounted Markov decision process with discrete states and actions and identify an instance-dependent functional that controls the difficulty of estimation in the ℓ_∞-norm. Using a local minimax framework, we show that this functional arises in lower bounds on the accuracy of any estimation procedure. In the other direction, we establish the sharpness of our lower bounds, up to factors logarithmic in the size of the state and action spaces, by analyzing a variance-reduced version of Q-learning. Our theory provides a precise way of distinguishing "easy" problems from "hard" ones in the context of Q-learning, as illustrated by an ensemble with a continuum of difficulty.
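To make the setting concrete, the following is a minimal sketch of one synchronous, generative-model form of variance-reduced Q-learning on a small tabular MDP, together with the ℓ_∞ error against the exact optimal Q-function. The recentered update, the step size 1/(1 + (1 - γ)k), the epoch and sample counts, and every function name (`random_mdp`, `q_star`, `vr_q_learning`, and so on) are assumptions chosen for illustration; they are not taken from the paper and need not match its exact algorithm or tuning.

```python
import numpy as np

def random_mdp(S, A, seed=0):
    # Hypothetical test instance: random transitions P[s, a, s'] and rewards in [0, 1].
    rng = np.random.default_rng(seed)
    P = rng.dirichlet(np.ones(S), size=(S, A))  # shape (S, A, S), rows sum to 1
    r = rng.random((S, A))
    return P, r

def bellman_optimal(Q, P, r, gamma):
    # Population Bellman optimality operator: r + gamma * E[max_a' Q(s', a')].
    return r + gamma * (P @ Q.max(axis=1))

def q_star(P, r, gamma, iters=2000):
    # Exact Q* (up to numerical precision) by fixed-point iteration.
    Q = np.zeros_like(r)
    for _ in range(iters):
        Q = bellman_optimal(Q, P, r, gamma)
    return Q

def sample_next_states(P, rng, n):
    # Draw n next states for every (s, a) by inverse-CDF sampling; shape (n, S, A).
    S = P.shape[0]
    cdf = P.cumsum(axis=2)
    u = rng.random((n,) + P.shape[:2] + (1,))
    idx = (u > cdf[None]).sum(axis=3)
    return np.minimum(idx, S - 1)  # guard against floating-point round-off

def empirical_bellman(Q, r, gamma, next_states):
    # Empirical Bellman operator averaged over the sampled next states.
    v = Q.max(axis=1)
    return r + gamma * v[next_states].mean(axis=0)

def vr_q_learning(P, r, gamma, epochs=5, K=200, N=2000, seed=0):
    rng = np.random.default_rng(seed)
    Q_bar = np.zeros_like(r)  # reference point, refreshed once per epoch
    for _ in range(epochs):
        # Monte Carlo estimate of the Bellman operator at the reference point.
        T_bar = empirical_bellman(Q_bar, r, gamma, sample_next_states(P, rng, N))
        Q = Q_bar.copy()
        for k in range(1, K + 1):
            ns = sample_next_states(P, rng, 1)  # one fresh sample per (s, a)
            step = 1.0 / (1.0 + (1.0 - gamma) * k)
            # Recentered target: the same sample is used at Q and Q_bar,
            # so their noise largely cancels when Q is close to Q_bar.
            target = (empirical_bellman(Q, r, gamma, ns)
                      - empirical_bellman(Q_bar, r, gamma, ns)
                      + T_bar)
            Q = (1.0 - step) * Q + step * target
        Q_bar = Q
    return Q_bar
```

For example, with `P, r = random_mdp(3, 2)` one can compare `vr_q_learning(P, r, 0.5)` against `q_star(P, r, 0.5)` through `np.max(np.abs(Q_hat - Q_true))`, which is the ℓ_∞ error the abstract refers to. How quickly that error shrinks depends on the particular `P` and `r`, which is the instance dependence under study.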

research
01/17/2022

On Well-posedness and Minimax Optimal Rates of Nonparametric Q-function Estimation in Off-policy Evaluation

We study the off-policy evaluation (OPE) problem in an infinite-horizon ...
research
06/11/2019

Variance-reduced Q-learning is minimax optimal

We introduce and analyze a form of variance-reduced Q-learning. For γ-di...
research
01/21/2022

Instance-Dependent Confidence and Early Stopping for Reinforcement Learning

Various algorithms for reinforcement learning (RL) exhibit dramatic vari...
research
03/16/2020

Is Temporal Difference Learning Optimal? An Instance-Dependent Analysis

We address the problem of policy evaluation in discounted Markov decisio...
research
01/21/2022

Optimal variance-reduced stochastic approximation in Banach spaces

We study the problem of estimating the fixed point of a contractive oper...
research
12/24/2021

Accelerated and instance-optimal policy evaluation with linear function approximation

We study the problem of policy evaluation with linear function approxima...
research
10/07/2012

Privacy Aware Learning

We study statistical risk minimization problems under a privacy model in...