Sparse Feature Selection Makes Batch Reinforcement Learning More Sample Efficient

11/08/2020
by Botao Hao, et al.

This paper provides a statistical analysis of high-dimensional batch reinforcement learning (RL) using sparse linear function approximation. When there is a large number of candidate features, our results show that sparsity-aware methods can make batch RL more sample efficient. We first consider the off-policy policy evaluation problem. To evaluate a new target policy, we analyze a Lasso fitted Q-evaluation method and establish a finite-sample error bound that has no polynomial dependence on the ambient dimension. To reduce the Lasso bias, we further propose a post-model-selection estimator that applies fitted Q-evaluation to the features selected via group Lasso. Under an additional signal-strength assumption, we derive a sharper instance-dependent error bound that depends on a divergence function measuring the distribution mismatch between the data distribution and the occupancy measure of the target policy. We then study Lasso fitted Q-iteration for batch policy optimization and establish a finite-sample error bound that depends on the ratio between the number of relevant features and the restricted minimal eigenvalue of the data covariance matrix. Finally, we complement these results with minimax lower bounds for batch-data policy evaluation and optimization that nearly match our upper bounds. The results suggest that having well-conditioned data is crucial for sparse batch policy learning.
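For intuition, here is a minimal Python sketch of the Lasso fitted Q-evaluation loop described in the abstract: each iteration regresses bootstrapped targets onto the candidate features with an l1 penalty, so only a sparse subset of features receives nonzero weight. The function name, penalty level, and use of scikit-learn's Lasso solver are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.linear_model import Lasso

def lasso_fitted_q_evaluation(phi_sa, rewards, phi_next_pi,
                              gamma=0.99, lam=0.1, n_iters=50):
    """Sketch: iteratively fit a sparse linear Q-function for a fixed target policy.

    phi_sa      : (n, d) features of the logged (state, action) pairs
    rewards     : (n,)   logged rewards
    phi_next_pi : (n, d) features of (next state, target-policy action) pairs
    lam         : Lasso penalty level (a tuning parameter, chosen here arbitrarily)
    """
    w = np.zeros(phi_sa.shape[1])
    for _ in range(n_iters):
        # Bootstrapped regression targets from the current Q estimate.
        targets = rewards + gamma * (phi_next_pi @ w)
        # Sparse regression step: the l1 penalty zeroes out irrelevant features.
        w = Lasso(alpha=lam, fit_intercept=False, max_iter=10000).fit(phi_sa, targets).coef_
    return w

# The value of the target policy can then be estimated by averaging
# phi(s0, pi(s0)) @ w over initial states s0.
```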
