
Is Temporal Difference Learning Optimal? An Instance-Dependent Analysis

by Koulik Khamaru, et al.

We address the problem of policy evaluation in discounted Markov decision processes, and provide instance-dependent guarantees on the ℓ_∞-error under a generative model. We establish both asymptotic and non-asymptotic versions of local minimax lower bounds for policy evaluation, thereby providing an instance-dependent baseline against which to compare algorithms. Theory-inspired simulations show that the widely used temporal difference (TD) algorithm is strictly suboptimal when evaluated in a non-asymptotic setting, even when combined with Polyak-Ruppert iterate averaging. We remedy this issue by introducing and analyzing variance-reduced forms of stochastic approximation, showing that they achieve non-asymptotic, instance-dependent optimality up to logarithmic factors.
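To make the setting concrete, here is a minimal sketch of synchronous TD(0) with Polyak-Ruppert averaging under a generative model, for a fixed policy whose transition matrix `P` and reward vector `r` are known to the simulator. It is an illustration only, not the authors' code: the function names, the polynomial step size `k ** -step_exponent`, and the default exponent are assumptions made for this example, and the variance-reduced scheme analyzed in the paper is not implemented here.

```python
import numpy as np


def exact_value(P, r, gamma):
    """Solve the Bellman equation V = r + gamma * P V exactly."""
    n = P.shape[0]
    return np.linalg.solve(np.eye(n) - gamma * P, r)


def td_polyak_ruppert(P, r, gamma, num_iters, step_exponent=0.75, seed=0):
    """Synchronous TD(0) with iterate averaging (illustrative sketch).

    Generative model: at every iteration, each state s draws one
    next state s' ~ P[s, :], independently of the other states.
    Returns the last iterate and the Polyak-Ruppert average.
    """
    rng = np.random.default_rng(seed)
    n = P.shape[0]
    cdf = np.cumsum(P, axis=1)
    V = np.zeros(n)
    V_avg = np.zeros(n)
    for k in range(1, num_iters + 1):
        # Inverse-CDF sampling of one next state per current state.
        u = rng.random(n)
        next_states = (u[:, None] > cdf).sum(axis=1)
        next_states = np.minimum(next_states, n - 1)  # guard against rounding
        # Assumed step size: polynomial decay, alpha_k = k^(-step_exponent).
        alpha = k ** (-step_exponent)
        # TD(0) update applied to all states at once.
        V = V + alpha * (r + gamma * V[next_states] - V)
        # Running average of the iterates V_1, ..., V_k.
        V_avg += (V - V_avg) / k
    return V, V_avg


def sup_norm_error(V_hat, V_star):
    """The l_infinity error used as the performance measure."""
    return float(np.max(np.abs(V_hat - V_star)))
```

Comparing `sup_norm_error(V_avg, exact_value(P, r, gamma))` across sample sizes and discount factors is one way to reproduce the kind of simulation the abstract refers to, although the specific instances used in the paper are not given here.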




Accelerated and instance-optimal policy evaluation with linear function approximation

We study the problem of policy evaluation with linear function approxima...

Optimal variance-reduced stochastic approximation in Banach spaces

We study the problem of estimating the fixed point of a contractive oper...

Optimal and instance-dependent guarantees for Markovian linear stochastic approximation

We study stochastic approximation procedures for approximately solving a...

Stochastic approximation with decision-dependent distributions: asymptotic normality and optimality

We analyze a stochastic approximation algorithm for decision-dependent p...

Instance-optimality in optimal value estimation: Adaptivity via variance-reduced Q-learning

Various algorithms in reinforcement learning exhibit dramatic variabilit...

Optimal oracle inequalities for solving projected fixed-point equations

Linear fixed point equations in Hilbert spaces arise in a variety of set...

Instance-Dependent Confidence and Early Stopping for Reinforcement Learning

Various algorithms for reinforcement learning (RL) exhibit dramatic vari...