On Finite-Sample Analysis of Offline Reinforcement Learning with Deep ReLU Networks

03/11/2021
by Thanh Nguyen-Tang, et al.
This paper studies the statistical theory of offline reinforcement learning with deep ReLU networks. We consider the off-policy evaluation (OPE) problem, where the goal is to estimate the expected discounted reward of a target policy given logged data generated by unknown behaviour policies. We study a regression-based fitted Q evaluation (FQE) method using deep ReLU networks and establish a finite-sample bound on its estimation error under mild assumptions. Prior works on OPE with either general function approximation or deep ReLU networks ignore the data-dependent structure in the algorithm, sidestepping the technical bottleneck of OPE, while requiring rather restrictive regularity assumptions. In this work, we overcome these limitations and provide a comprehensive analysis of OPE with deep ReLU networks. In particular, we precisely quantify how the distribution shift of the offline data, the dimension of the input space, and the regularity of the system control the OPE estimation error. Consequently, we provide insights into the interplay between offline reinforcement learning and deep learning.
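The regression-based fitted Q evaluation procedure described above can be sketched in a few lines: repeatedly regress Bellman targets built from the logged data onto a ReLU network, then average the learned Q-function at initial states under the target policy. The following is a minimal, hypothetical illustration only; the toy transition data, the `target_policy` rule, and the state-action feature map are invented for this sketch and are not the paper's setup.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor  # a ReLU network regressor

# Hypothetical logged dataset (s, a, r, s') from an unknown behaviour policy.
rng = np.random.default_rng(0)
n, d, gamma = 500, 4, 0.9
S = rng.normal(size=(n, d))                       # states
A = rng.integers(0, 2, size=n)                    # binary actions
R = S[:, 0] + 0.1 * rng.normal(size=n)            # toy rewards
S_next = S + 0.1 * rng.normal(size=(n, d))        # next states

def target_policy(states):
    """Illustrative deterministic target policy: action 1 iff first feature > 0."""
    return (states[:, 0] > 0).astype(int)

def features(states, actions):
    """Concatenate the state with a one-hot encoding of the action."""
    return np.hstack([states, np.eye(2)[actions]])

# Fitted Q evaluation: each round fits a ReLU network to the Bellman targets
# r + gamma * Q_k(s', pi(s')) formed with the previous round's estimate.
Q = None
for _ in range(5):
    if Q is None:
        targets = R                               # first round: regress rewards
    else:
        a_next = target_policy(S_next)
        targets = R + gamma * Q.predict(features(S_next, a_next))
    Q = MLPRegressor(hidden_layer_sizes=(32, 32), activation="relu",
                     max_iter=500, random_state=0).fit(features(S, A), targets)

# OPE estimate: average Q at sampled initial states under the target policy.
S0 = rng.normal(size=(100, d))
value_estimate = Q.predict(features(S0, target_policy(S0))).mean()
print(float(value_estimate))
```

The distribution shift the paper analyses shows up here as the mismatch between the logged actions `A` and the actions `target_policy` would take at the same states; the bound in the paper controls how that mismatch propagates through the regression rounds.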


