Sample Complexity of Nonparametric Off-Policy Evaluation on Low-Dimensional Manifolds using Deep Networks

06/06/2022
by Xiang Ji, et al.

We consider the off-policy evaluation problem of reinforcement learning using deep neural networks. We analyze the deep fitted Q-evaluation method for estimating the expected cumulative reward of a target policy, when the data are generated from an unknown behavior policy. We show that, by choosing network size appropriately, one can leverage the low-dimensional manifold structure in the Markov decision process and obtain a sample-efficient estimator without suffering from the curse of high representation dimensionality. Specifically, we establish a sharp error bound for the fitted Q-evaluation that depends on the intrinsic low dimension, the smoothness of the state-action space, and a function class-restricted χ^2-divergence. It is noteworthy that the restricted χ^2-divergence measures the behavior and target policies' mismatch in the function space, which can be small even if the two policies are not close to each other in their tabular forms. Numerical experiments are provided to support our theoretical analysis.
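
To make the fitted Q-evaluation idea concrete, the following is a minimal sketch rather than the authors' method. It swaps the deep network for one-hot features fitted by least squares, and it assumes a small synthetic discounted MDP with a uniform behavior policy and a uniform initial-state distribution. The setup in the paper itself may differ. All names here (`fitted_q_evaluation`, `one_hot_features`, the toy `P`, `R`, `pi`) are hypothetical and chosen only for illustration. Each iteration regresses onto the target r + γ E_{a'~π}[Q(s', a')], and the resulting estimate is compared against the exact policy value obtained by solving the Bellman equation.

```python
import numpy as np


def one_hot_features(s, a, n_states, n_actions):
    """One-hot encoding of (state, action) pairs; stands in for a neural network."""
    phi = np.zeros((len(s), n_states * n_actions))
    phi[np.arange(len(s)), s * n_actions + a] = 1.0
    return phi


def fitted_q_evaluation(s, a, r, s_next, pi, gamma, n_iters):
    """Iteratively regress Q onto r + gamma * E_{a' ~ pi}[Q(s', a')].

    Because the features are one-hot, the weight vector can be read
    directly as a table of Q-values (an assumption of this sketch).
    """
    n_states, n_actions = pi.shape
    phi = one_hot_features(s, a, n_states, n_actions)
    w = np.zeros(n_states * n_actions)
    for _ in range(n_iters):
        q = w.reshape(n_states, n_actions)
        # Expected next-state value under the target policy.
        v_next = np.sum(pi[s_next] * q[s_next], axis=1)
        y = r + gamma * v_next
        # Least-squares regression step (the "fitted" part of the method).
        w, *_ = np.linalg.lstsq(phi, y, rcond=None)
    return w.reshape(n_states, n_actions)


# Toy MDP (assumed for illustration only).
rng = np.random.default_rng(0)
n_states, n_actions, gamma = 5, 2, 0.8
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
R = rng.uniform(size=(n_states, n_actions))                       # deterministic rewards
pi = rng.dirichlet(np.ones(n_actions), size=n_states)             # target policy pi[s, a]

# Off-policy data: states and actions drawn uniformly (the behavior policy).
n = 50_000
s = rng.integers(n_states, size=n)
a = rng.integers(n_actions, size=n)
cum = P[s, a].cumsum(axis=1)
u = rng.random(n)
s_next = np.minimum((u[:, None] > cum).sum(axis=1), n_states - 1)
r = R[s, a]

q_hat = fitted_q_evaluation(s, a, r, s_next, pi, gamma, n_iters=60)
estimate = np.mean(np.sum(pi * q_hat, axis=1))  # uniform initial-state distribution

# Exact value for comparison: solve Q = R + gamma * P_pi Q.
d = n_states * n_actions
P_pi = (P[:, :, :, None] * pi[None, None, :, :]).reshape(d, d)
q_true = np.linalg.solve(np.eye(d) - gamma * P_pi, R.reshape(-1))
q_true = q_true.reshape(n_states, n_actions)
true_value = np.mean(np.sum(pi * q_true, axis=1))

print(f"FQE estimate: {estimate:.4f}, exact value: {true_value:.4f}")
```

In this toy setting the behavior policy covers every state-action pair, so the regression recovers the target policy's value even though the data were not collected by that policy. The abstract's restricted χ^2-divergence quantifies how much harder this becomes when coverage is poorer.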
