Semi-Supervised Off Policy Reinforcement Learning

by Aaron Sonabend W, et al.

Reinforcement learning (RL) has shown great success in estimating sequential treatment strategies (STR) that account for patient heterogeneity. However, health-outcome information is often not well coded but rather embedded in clinical notes. Extracting precise outcome information is a resource-intensive task, so only small well-annotated cohorts are available. We propose a semi-supervised learning (SSL) approach that efficiently leverages a small labeled data set ℒ, in which the true outcome is observed, and a large unlabeled data set 𝒰 containing outcome surrogates W. In particular, we propose a theoretically justified SSL approach to Q-learning and develop a robust and efficient SSL approach to estimating the value function of the derived optimal STR, defined as the expected counterfactual outcome under the optimal STR. Generalizing SSL to learning STR brings interesting challenges. First, the feature distribution for predicting Y_t is unknown in the Q-learning procedure, as it includes the unknown Y_{t-1} due to the sequential nature of the problem. Our methods for estimating the optimal STR and its associated value function carefully adapt to this sequentially missing data structure. Second, we modify the SSL framework to handle surrogate variables W, which are predictive of the outcome through the joint law ℙ_{Y,O,W} but are not part of the conditional distribution of interest ℙ_{Y|O}. We provide theoretical results to understand when and to what degree efficiency can be gained from W and O. Our approach is robust to misspecification of the imputation models. Further, we provide a doubly robust value function estimator for the derived STR: if either the Q-functions or the propensity score functions are correctly specified, our value function estimators are consistent for the true value function.
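The double-robustness property stated at the end of the abstract can be illustrated with a small simulation. The sketch below is not the authors' semi-supervised, multi-stage estimator. It is a generic single-stage, fully labeled augmented inverse-probability-weighted (AIPW) value estimate, and the data-generating model, the policy, and all names (`dr_value`, `p_treat`, `policy_a`, and so on) are assumptions made for illustration only. It checks that the estimate stays close to the true value when only the Q-function or only the propensity score is correct.

```python
import math
import numpy as np

def dr_value(y, a, policy_a, q_hat, prop_hat):
    """Generic single-stage doubly robust (AIPW) value estimate of a policy.

    y        : observed outcomes, shape (n,)
    a        : observed binary treatments, shape (n,)
    policy_a : treatment the policy would assign, shape (n,)
    q_hat    : estimated Q(o, policy_a) per unit, shape (n,)
    prop_hat : estimated P(A = policy_a | o) per unit, shape (n,)
    """
    follows = (a == policy_a).astype(float)  # 1 where observed action matches the policy
    return float(np.mean(q_hat + follows / prop_hat * (y - q_hat)))

# Hypothetical data-generating model (an assumption, not from the paper).
rng = np.random.default_rng(0)
n = 200_000
o = rng.normal(size=n)                        # baseline covariate
p_treat = 1.0 / (1.0 + np.exp(-0.5 * o))      # true P(A = 1 | o)
a = rng.binomial(1, p_treat)                  # observed treatment
y = o + a * (1.0 + o) + rng.normal(size=n)    # observed outcome

# Policy under evaluation: treat when the treatment effect 1 + o is positive.
policy_a = (1.0 + o > 0).astype(int)
q_true = o + policy_a * (1.0 + o)             # true Q(o, policy_a)
prop_true = np.where(policy_a == 1, p_treat, 1.0 - p_treat)

# True value E[o + max(1 + o, 0)] = Phi(1) + phi(1) for standard normal o.
true_value = (0.5 * (1.0 + math.erf(1.0 / math.sqrt(2.0)))
              + math.exp(-0.5) / math.sqrt(2.0 * math.pi))

# Correct Q-function, deliberately wrong (constant) propensity score.
v_q_right = dr_value(y, a, policy_a, q_true, np.full(n, 0.5))
# Deliberately wrong (zero) Q-function, correct propensity score.
v_prop_right = dr_value(y, a, policy_a, np.zeros(n), prop_true)

print(true_value, v_q_right, v_prop_right)
```

In this toy setting both misspecified variants should land near the true value, which is the sense in which the estimator is doubly robust. The setting described in the abstract additionally handles multiple stages, missing outcomes in 𝒰, and surrogates W, none of which this sketch attempts.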



Efficient and Robust Semi-Supervised Estimation of Average Treatment Effects in Electronic Medical Records Data

There is strong interest in conducting comparative effectiveness researc...

Optimal Semi-supervised Estimation and Inference for High-dimensional Linear Regression

There are many scenarios such as the electronic health records where the...

Efficient and robust transfer learning of optimal individualized treatment regimes with right-censored survival data

An individualized treatment regime (ITR) is a decision rule that assigns...

Double Robust Semi-Supervised Inference for the Mean: Selection Bias under MAR Labeling with Decaying Overlap

Semi-supervised (SS) inference has received much attention in recent yea...

Efficient Reinforcement Learning in Deterministic Systems with Value Function Generalization

We consider the problem of reinforcement learning over episodes of a fin...

Information-Theoretic Considerations in Batch Reinforcement Learning

Value-function approximation methods that operate in batch mode have fou...

Beyond the Return: Off-policy Function Estimation under User-specified Error-measuring Distributions

Off-policy evaluation often refers to two related tasks: estimating the ...
