Double Robust Semi-Supervised Inference for the Mean: Selection Bias under MAR Labeling with Decaying Overlap

04/14/2021
by   Yuqian Zhang, et al.

Semi-supervised (SS) inference has received much attention in recent years. Apart from a moderate-sized labeled dataset, L, the SS setting is characterized by an additional, much larger, unlabeled dataset, U. The setting |U| >> |L| makes SS inference distinct from standard missing data problems, because it naturally violates the so-called 'positivity' or 'overlap' assumption. However, most of the SS literature implicitly assumes that L and U are equally distributed, i.e., that there is no selection bias in the labeling. When labeling is missing at random (MAR) and selection bias is allowed, the inferential challenges are inevitably exacerbated by the decaying nature of the propensity score (PS). We address this gap for a prototype problem: estimating the mean of the response. We propose a double robust SS (DRSS) mean estimator and give a complete characterization of its asymptotic properties. The proposed estimator is consistent as long as either the outcome model or the PS model is correctly specified. When both models are correctly specified, we provide inference results with a non-standard consistency rate that depends on the smaller size |L|. The results are also extended to causal inference with imbalanced treatment groups. Further, we provide several novel choices of models and estimators for the decaying PS, including a novel offset logistic model and a stratified labeling model, and we present their properties in both high and low dimensional settings. These may be of independent interest. Lastly, we present extensive simulations and a real data application.
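The double robustness property described in the abstract can be illustrated with a minimal sketch. It uses the standard augmented inverse probability weighting (AIPW) form of a double robust mean estimator, which is an assumption here: the abstract does not spell out the DRSS estimator, and the paper's version may differ in details such as how the nuisance models are fitted. The function name `dr_mean`, the simulated data, and the logistic labeling probability are all hypothetical, and known nuisance functions stand in for fitted models to keep the example short.

```python
import numpy as np


def dr_mean(y, r, m_hat, pi_hat):
    """AIPW-style double robust estimate of E[Y] (illustrative, not the paper's code).

    y      : responses; values where r == 0 are ignored
    r      : labeling indicator (1 = labeled, 0 = unlabeled)
    m_hat  : outcome-model predictions of E[Y | X]
    pi_hat : propensity-score values P(R = 1 | X)
    """
    # Residuals are only defined on labeled rows; unlabeled rows contribute 0.
    resid = np.where(r == 1, y - m_hat, 0.0)
    return float(np.mean(m_hat + r * resid / pi_hat))


# Hypothetical simulation: the true mean of Y is 1, and labeling depends on X (MAR),
# so the labeled subsample is a biased sample of the population.
rng = np.random.default_rng(0)
n = 200_000
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
pi_true = 1.0 / (1.0 + np.exp(-(-3.0 + x)))  # small labeling probabilities
r = (rng.uniform(size=n) < pi_true).astype(int)
y_obs = np.where(r == 1, y, 0.0)  # unlabeled responses are never used

m_true = 1.0 + 2.0 * x

# Labeled-only average: ignores the selection bias in the labeling.
naive = float(y_obs[r == 1].mean())
# Correct PS, misspecified outcome model (identically zero).
dr_wrong_outcome = dr_mean(y_obs, r, np.zeros(n), pi_true)
# Correct outcome model, misspecified PS (constant 0.5).
dr_wrong_ps = dr_mean(y_obs, r, m_true, np.full(n, 0.5))

print("labeled-only mean:", naive)
print("DR, wrong outcome model:", dr_wrong_outcome)
print("DR, wrong PS model:", dr_wrong_ps)
```

In this sketch the labeled-only average drifts away from the true mean of 1, while both double robust estimates stay close to it, each with only one of the two nuisance models correct.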
