Statistical Inference of the Value Function for Reinforcement Learning in Infinite Horizon Settings

01/13/2020
by C. Shi, et al.

Reinforcement learning is a general technique that allows an agent to learn an optimal policy by interacting with an environment in sequential decision-making problems. The quality of a policy is measured by its value function starting from some initial state. The focus of this paper is to construct confidence intervals (CIs) for a policy's value in infinite horizon settings, where the number of decision points diverges to infinity. We propose to model the state-action value function (Q-function) associated with a policy using the series/sieve method and to derive a confidence interval for the policy's value from it. When the target policy itself depends on the observed data, we propose a SequentiAl Value Evaluation (SAVE) method that recursively updates the estimated policy and its value estimator. As long as either the number of trajectories or the number of decision points diverges to infinity, we show that the proposed CI achieves nominal coverage, even in cases where the optimal policy is not unique. Simulation studies support our theoretical findings. We apply the proposed method to a dataset from mobile health studies and find that reinforcement learning algorithms could help improve patients' health status.
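
To make the sieve idea above concrete, the following is a minimal sketch, not the authors' implementation. It assumes a fixed (not data-dependent) target policy, a polynomial basis for the Q-function, a Bellman-equation estimator with a sandwich variance, and a toy simulated environment. The names `basis` and `value_ci`, the dynamics, and all tuning constants are hypothetical and chosen only for illustration.

```python
import numpy as np

def basis(s, a, degree=3, n_actions=2):
    """Polynomial sieve basis in the state, with one coefficient block per action."""
    s = np.atleast_1d(s).astype(float)
    a = np.atleast_1d(a).astype(int)
    poly = np.vander(s, degree + 1, increasing=True)   # columns 1, s, ..., s^degree
    width = degree + 1
    out = np.zeros((s.size, n_actions * width))
    for k in range(n_actions):
        out[a == k, k * width:(k + 1) * width] = poly[a == k]
    return out

def value_ci(S, A, R, S_next, policy, s0, gamma=0.9, z=1.96):
    """Point estimate and Wald-type CI for the value of `policy` at state s0.

    Approximates Q(s, a) by basis(s, a) @ beta and solves the sample Bellman equation
    mean[ phi * (R + gamma * phi_next @ beta - phi @ beta) ] = 0.
    """
    phi = basis(S, A)
    phi_next = basis(S_next, policy(S_next))           # next action follows the target policy
    n = len(R)
    Sigma = phi.T @ (phi - gamma * phi_next) / n
    beta = np.linalg.solve(Sigma, phi.T @ R / n)
    resid = R + gamma * phi_next @ beta - phi @ beta   # temporal-difference residuals
    u = basis(s0, policy(np.atleast_1d(s0)))[0]        # features of (s0, policy(s0))
    w = np.linalg.solve(Sigma.T, u)
    var = np.mean((phi @ w) ** 2 * resid ** 2) / n     # sandwich variance of u @ beta
    est, se = u @ beta, np.sqrt(var)
    return est, est - z * se, est + z * se

# Toy data (assumed dynamics): actions drawn uniformly at random, reward = next state.
rng = np.random.default_rng(0)
n_traj, T = 50, 40
S = np.empty((n_traj, T + 1))
S[:, 0] = rng.normal(0.0, 0.5, n_traj)
A = rng.integers(0, 2, size=(n_traj, T))
for t in range(T):
    S[:, t + 1] = 0.5 * S[:, t] + 0.3 * (2 * A[:, t] - 1) + 0.2 * rng.normal(size=n_traj)
R = S[:, 1:]

policy = lambda s: (np.asarray(s) > 0).astype(int)     # hypothetical target policy
est, lo, hi = value_ci(S[:, :-1].ravel(), A.ravel(), R.ravel(),
                       S[:, 1:].ravel(), policy, s0=0.0)
print(f"estimated value at s0=0: {est:.3f}, 95% CI: [{lo:.3f}, {hi:.3f}]")
```

The sketch pools transitions across trajectories and time points, so the effective sample size grows when either the number of trajectories or the number of decision points grows. It does not cover the data-dependent-policy case that SAVE is designed for.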

