Semi-supervised learning and the question of true versus estimated propensity scores

09/14/2020
by Andrew Herren, et al.

A straightforward application of semi-supervised machine learning to the problem of treatment effect estimation would be to consider data as "unlabeled" if treatment assignment and covariates are observed but outcomes are not. Under this formulation, large unlabeled data sets could be used to estimate a high-dimensional propensity function, and causal inference using a much smaller labeled data set could then proceed via weighted estimators that use the learned propensity scores. In the limiting case of infinite unlabeled data, one could recover the high-dimensional propensity function exactly. However, longstanding advice in the causal inference community holds that propensity scores estimated from the labeled data alone are preferable to the true propensity scores, which implies that the unlabeled data are useless in this context. In this paper we examine this paradox and propose a simple procedure that reconciles the strong intuition that a known propensity function should be useful for estimating treatment effects with the previous literature suggesting otherwise. Further, simulation studies suggest that direct regression may be preferable to inverse-propensity-weighted estimators in many circumstances.
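The "true versus estimated propensity score" paradox can be illustrated with a small Monte Carlo simulation. The sketch below is not the authors' proposed procedure. It is a minimal illustration under hypothetical assumptions: a single binary covariate `x`, known propensities of 0.3 and 0.7, and a constant treatment effect of 1. It compares a Horvitz-Thompson style inverse propensity weighted (IPW) estimator of the average treatment effect, computed once with the true propensity and once with the propensity estimated from the labeled sample (the treated fraction within each stratum of `x`), against a direct OLS regression of the outcome on treatment and covariate. All function names are illustrative.

```python
import numpy as np


def simulate(n, rng):
    # Hypothetical data-generating process (an assumption, not from the paper):
    # binary covariate X, known propensity e(x), constant treatment effect of 1.
    x = rng.integers(0, 2, size=n)
    e = np.where(x == 1, 0.7, 0.3)  # true propensity e(x) = P(Z = 1 | X = x)
    z = rng.binomial(1, e)
    y = 1.0 * z + 2.0 * x + rng.normal(size=n)
    return x, z, y, e


def ipw_ate(y, z, p):
    # Horvitz-Thompson style inverse propensity weighted ATE estimate.
    return np.mean(z * y / p - (1 - z) * y / (1 - p))


def estimated_propensity(x, z):
    # With a discrete covariate, the nonparametric propensity estimate is the
    # treated fraction within each stratum of X, using labeled data only.
    # (Assumes every stratum contains both treated and control units.)
    p_hat = np.empty(len(x))
    for level in np.unique(x):
        mask = x == level
        p_hat[mask] = z[mask].mean()
    return p_hat


def regression_ate(y, z, x):
    # Direct regression: OLS of Y on an intercept, Z and X; return Z's coefficient.
    design = np.column_stack([np.ones(len(y)), z, x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]


def monte_carlo(n_reps=2000, n=200, seed=0):
    # Repeatedly simulate labeled samples and record each estimator's draws.
    rng = np.random.default_rng(seed)
    draws = {"ipw_true": [], "ipw_estimated": [], "regression": []}
    for _ in range(n_reps):
        x, z, y, e = simulate(n, rng)
        draws["ipw_true"].append(ipw_ate(y, z, e))
        draws["ipw_estimated"].append(ipw_ate(y, z, estimated_propensity(x, z)))
        draws["regression"].append(regression_ate(y, z, x))
    # Return (Monte Carlo mean, Monte Carlo variance) for each estimator.
    return {name: (float(np.mean(v)), float(np.var(v))) for name, v in draws.items()}


if __name__ == "__main__":
    for name, (mean, var) in monte_carlo().items():
        print(f"{name:14s} mean={mean:.3f} var={var:.4f}")
```

Under this particular setup, all three estimators should be centered near the true effect of 1. However, IPW with the true propensity should show a noticeably larger sampling variance than either IPW with the estimated propensity or direct regression, which matches the longstanding advice the abstract refers to. With a discrete covariate, IPW with stratum-level estimated propensities reduces to a post-stratified difference in means. This is one common explanation for its smaller variance.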

research
01/03/2022

A General Framework for Treatment Effect Estimation in Semi-Supervised and High Dimensional Settings

In this article, we aim to provide a general and complete understanding ...
research
07/08/2019

Asymptotic Bayes risk for Gaussian mixture in a semi-supervised setting

Semi-supervised learning (SSL) uses unlabeled data for training and has ...
research
05/11/2020

Counterfactual Propagation for Semi-Supervised Individual Treatment Effect Estimation

Individual treatment effect (ITE) represents the expected improvement in...
research
01/25/2022

Semi-Supervised Quantile Estimation: Robust and Efficient Inference in High Dimensional Settings

We consider quantile estimation in a semi-supervised setting, characteri...
research
03/31/2018

Efficient and Robust Semi-Supervised Estimation of Average Treatment Effects in Electronic Medical Records Data

There is strong interest in conducting comparative effectiveness researc...
research
06/17/2023

Distributed Semi-Supervised Sparse Statistical Inference

This paper is devoted to studying the semi-supervised sparse statistical...
research
05/18/2023

On true versus estimated propensity scores for treatment effect estimation with discrete controls

The finite sample variance of an inverse propensity weighted estimator i...