Infinite-Horizon Offline Reinforcement Learning with Linear Function Approximation: Curse of Dimensionality and Algorithm

03/17/2021
by   Lin Chen, et al.

In this paper, we investigate the sample complexity of policy evaluation in infinite-horizon offline reinforcement learning (also known as the off-policy evaluation problem) with linear function approximation. We identify a hard regime $d\gamma^{2} > 1$, where $d$ is the dimension of the feature vector and $\gamma$ is the discount rate. In this regime, for any $q \in [\gamma^{2}, 1]$, we can construct a hard instance such that the smallest eigenvalue of its feature covariance matrix is $q/d$ and it requires $\Omega\left(\frac{d}{\gamma^{2}(q-\gamma^{2})\varepsilon^{2}}\exp\left(\Theta(d\gamma^{2})\right)\right)$ samples to approximate the value function up to an additive error $\varepsilon$. Note that the lower bound of the sample complexity is exponential in $d$. If $q = \gamma^{2}$, even infinite data cannot suffice. Under the low distribution shift assumption, we show that there is an algorithm that needs at most $O\left(\max\left\{\frac{\|\theta^{\pi}\|_{2}^{4}}{\varepsilon^{4}}\log\frac{d}{\delta},\ \frac{1}{\varepsilon^{2}}\left(d+\log\frac{1}{\delta}\right)\right\}\right)$ samples ($\theta^{\pi}$ is the parameter of the policy in linear function approximation) and guarantees approximation to the value function up to an additive error of $\varepsilon$ with probability at least $1-\delta$.
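To get a feel for how the two quantities in the abstract behave, the following is a minimal Python sketch, not the paper's algorithm. It checks the hard-regime condition dγ^2 > 1 and evaluates the expression inside the O(·) upper bound. The function names are hypothetical, and the hidden constant of the bound, which the abstract does not state, is assumed to be 1, so only the scaling in ε, d, δ and ‖θ^π‖_2 is meaningful.

```python
import math


def in_hard_regime(d: int, gamma: float) -> bool:
    """Return True when d * gamma^2 > 1, the hard regime named in the abstract."""
    return d * gamma ** 2 > 1


def upper_bound_scale(theta_norm: float, eps: float, d: int, delta: float) -> float:
    """Evaluate max{ ||theta||_2^4 / eps^4 * log(d/delta), (d + log(1/delta)) / eps^2 }.

    Assumption: the constant hidden by O(.) is taken to be 1, so the
    returned number indicates scaling only, not an actual sample count.
    """
    # Term driven by the norm of the policy parameter theta^pi.
    term_norm = theta_norm ** 4 / eps ** 4 * math.log(d / delta)
    # Term driven by the feature dimension d.
    term_dim = (d + math.log(1 / delta)) / eps ** 2
    return max(term_norm, term_dim)


if __name__ == "__main__":
    print(in_hard_regime(d=100, gamma=0.9))
    print(upper_bound_scale(theta_norm=1.0, eps=0.1, d=10, delta=0.1))
```

When the first term dominates, halving ε multiplies the expression by 16, reflecting the 1/ε^4 dependence. When the second term dominates, the dependence is 1/ε^2.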


research
10/22/2020

What are the Statistical Limits of Offline RL with Linear Function Approximation?

Offline reinforcement learning seeks to utilize offline (observational) ...
research
11/21/2021

Offline Reinforcement Learning: Fundamental Barriers for Value Function Approximation

We consider the offline reinforcement learning problem, where the aim is...
research
03/08/2022

A Sharp Characterization of Linear Estimators for Offline Policy Evaluation

Offline policy evaluation is a fundamental statistical problem in reinfo...
research
03/02/2021

Sample Complexity and Overparameterization Bounds for Projection-Free Neural TD Learning

We study the dynamics of temporal-difference learning with neural networ...
research
04/02/2021

Linear Systems can be Hard to Learn

In this paper, we investigate when system identification is statisticall...
research
03/27/2023

Dimensionality Collapse: Optimal Measurement Selection for Low-Error Infinite-Horizon Forecasting

This work introduces a method to select linear functional measurements o...
research
05/30/2023

Sharp high-probability sample complexities for policy evaluation with linear function approximation

This paper is concerned with the problem of policy evaluation with linea...
