Optimal oracle inequalities for solving projected fixed-point equations

12/09/2020
by Wenlong Mou, et al.

Linear fixed-point equations in Hilbert spaces arise in a variety of settings, including reinforcement learning as well as computational methods for solving differential and integral equations. We study methods that use a collection of random observations to compute approximate solutions by searching over a known low-dimensional subspace of the Hilbert space. First, we prove an instance-dependent upper bound on the mean-squared error for a linear stochastic approximation scheme that exploits Polyak–Ruppert averaging. This bound consists of two terms: an approximation error term with an instance-dependent approximation factor, and a statistical error term that captures the instance-specific complexity of the noise when projected onto the low-dimensional subspace. Using information-theoretic methods, we also establish lower bounds showing that neither of these terms can be improved, again in an instance-dependent sense. A concrete consequence of our characterization is that the optimal approximation factor in this problem can be much larger than a universal constant. We show how our results precisely characterize the error of a class of temporal difference learning methods for the policy evaluation problem with linear function approximation, thereby establishing their optimality.
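To make the setup concrete, the sketch below (not the authors' code) shows one member of the family of procedures the abstract describes: temporal difference learning with linear function approximation, combined with Polyak–Ruppert averaging of the iterates. In generic notation, the target is a solution of a linear fixed-point equation v = Lv + b, and the estimator searches the low-dimensional subspace spanned by the feature vectors. The Markov reward process, feature map, discount factor, and constant step size used here are purely illustrative assumptions.

import numpy as np

def td0_polyak_ruppert(transitions, features, gamma=0.9, step_size=0.1):
    # theta is the last TD(0) iterate; theta_bar is its Polyak-Ruppert running average.
    d = features.shape[1]
    theta = np.zeros(d)
    theta_bar = np.zeros(d)
    for t, (s, r, s_next) in enumerate(transitions, start=1):
        phi, phi_next = features[s], features[s_next]
        # Temporal-difference error for the observed transition (s, r, s_next).
        delta = r + gamma * phi_next @ theta - phi @ theta
        # Linear stochastic approximation step on the subspace parameter.
        theta = theta + step_size * delta * phi
        # Running average of the iterates, returned as the final estimate.
        theta_bar = theta_bar + (theta - theta_bar) / t
    return theta_bar

# Illustrative usage: transitions sampled from a 20-state random-walk reward process.
rng = np.random.default_rng(0)
n_states, d = 20, 5
features = rng.standard_normal((n_states, d)) / np.sqrt(d)
transitions, s = [], 0
for _ in range(5000):
    s_next = (s + rng.choice((-1, 1))) % n_states
    transitions.append((s, 1.0 if s_next == 0 else 0.0, s_next))
    s = s_next
theta_hat = td0_polyak_ruppert(transitions, features)
values_hat = features @ theta_hat  # value estimates restricted to span(features)

In the paper's framework, the mean-squared error of the averaged iterate decomposes into an approximation term, governed by how well the feature subspace can represent the true value function (with an instance-dependent approximation factor), and a statistical term, governed by the noise projected onto that subspace.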


Related research

12/23/2021  Optimal and instance-dependent guarantees for Markovian linear stochastic approximation
We study stochastic approximation procedures for approximately solving a...

12/24/2021  Accelerated and instance-optimal policy evaluation with linear function approximation
We study the problem of policy evaluation with linear function approxima...

12/30/2020  Adversarial Estimation of Riesz Representers
We provide an adversarial approach to estimating Riesz representers of l...

09/24/2021  Optimal policy evaluation using kernel-based temporal difference methods
We study methods based on reproducing kernel Hilbert spaces for estimati...

03/16/2020  Is Temporal Difference Learning Optimal? An Instance-Dependent Analysis
We address the problem of policy evaluation in discounted Markov decisio...

02/21/2020  Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation
This paper studies the statistical theory of batch data reinforcement le...

04/28/2021  A Generalized Projected Bellman Error for Off-policy Value Estimation in Reinforcement Learning
Many reinforcement learning algorithms rely on value estimation. However...
