
Semi-supervised learning and the question of true versus estimated propensity scores
A straightforward application of semi-supervised machine learning to the problem of treatment effect estimation would be to consider data as "unlabeled" if treatment assignment and covariates are observed but outcomes are unobserved. Under this formulation, large unlabeled data sets could be used to estimate a high-dimensional propensity function, and causal inference using a much smaller labeled data set could proceed via weighted estimators using the learned propensity scores. In the limiting case of infinite unlabeled data, one may estimate the high-dimensional propensity function exactly. However, longstanding advice in the causal inference community suggests that estimated propensity scores (from labeled data alone) are actually preferable to true propensity scores, implying that the unlabeled data is useless in this context. In this paper we examine this paradox and propose a simple procedure that reconciles the strong intuition that a known propensity function should be useful for estimating treatment effects with the previous literature suggesting otherwise. Further, simulation studies suggest that direct regression may be preferable to inverse-propensity weighted estimators in many circumstances.
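The paradox described above can be illustrated with a small simulation. The sketch below is not the procedure proposed in the paper; it is a minimal example under assumed conditions: a single Gaussian covariate, a logistic propensity function, a linear outcome model with a true average treatment effect of 1, and a plain Horvitz-Thompson form of the inverse-propensity weighted estimator. All function names (`simulate`, `ipw_ate`, `fit_logistic`, `regression_ate`, `compare`) and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np


def simulate(n, rng):
    """Draw one labeled data set (assumed design, not taken from the paper)."""
    x = rng.normal(size=n)                    # single covariate
    p_true = 1.0 / (1.0 + np.exp(-0.5 * x))   # true propensity score
    z = rng.binomial(1, p_true)               # treatment assignment
    y = 1.0 * z + 2.0 * x + rng.normal(size=n)  # outcome, true ATE = 1
    return x, z, y, p_true


def ipw_ate(y, z, p):
    """Horvitz-Thompson inverse-propensity weighted estimate of the ATE."""
    return np.mean(z * y / p) - np.mean((1 - z) * y / (1 - p))


def fit_logistic(x, z, iters=25):
    """Estimate propensity scores by logistic regression (Newton's method)."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (z - p)                            # score vector
        hess = (X * (p * (1 - p))[:, None]).T @ X       # Fisher information
        beta = beta + np.linalg.solve(hess, grad)
    return 1.0 / (1.0 + np.exp(-X @ beta))


def regression_ate(x, z, y):
    """Direct regression: OLS coefficient on treatment, adjusting for x."""
    X = np.column_stack([np.ones_like(x), z, x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]


def compare(n=500, reps=300, seed=0):
    """Return (mean, std) of each estimator across repeated simulations."""
    rng = np.random.default_rng(seed)
    est = {"true_ps": [], "estimated_ps": [], "regression": []}
    for _ in range(reps):
        x, z, y, p_true = simulate(n, rng)
        est["true_ps"].append(ipw_ate(y, z, p_true))
        est["estimated_ps"].append(ipw_ate(y, z, fit_logistic(x, z)))
        est["regression"].append(regression_ate(x, z, y))
    return {k: (float(np.mean(v)), float(np.std(v))) for k, v in est.items()}


if __name__ == "__main__":
    for name, (mean, std) in compare().items():
        print(f"{name:>13}: mean={mean:.3f}  std={std:.3f}")
```

Under these assumptions, weighting by the estimated propensity scores should show a smaller spread across replications than weighting by the true scores, which is the counterintuitive pattern the abstract refers to. The direct regression estimator is included only as a point of comparison, and here it benefits from an outcome model that is correctly specified by construction.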