Using Embeddings to Correct for Unobserved Confounding
We consider causal inference in the presence of unobserved confounding. In particular, we study the case where a proxy is available for the confounder but the proxy has non-iid structure. As one example, the link structure of a social network carries information about its members. As another, the text of a document collection carries information about the meaning of its documents. In both settings, we show how to use the proxy effectively for causal inference. The main idea is to reduce the causal estimation problem to semi-supervised prediction of both the treatments and the outcomes. Networks and text both admit high-quality embedding models that can be used for this semi-supervised prediction. Our method yields valid inferences under suitable (weak) conditions on the quality of the predictive model. We validate the method with experiments on a semi-synthetic social network dataset. We demonstrate the method by estimating the causal effect of properties of computer science submissions on whether they are accepted at a conference.
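The reduction to predicting treatments and outcomes can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not name an estimator, so the augmented inverse probability weighting (AIPW) estimator is used here as an assumption, the embeddings are simulated random vectors standing in for learned network or text embeddings, and every function name (`simulate`, `estimate_effect`, `aipw_ate`, and so on) is hypothetical. The confounder is never shown to the estimator; only the embedding, treatment, and outcome are.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def fit_logistic(X, t, n_iter=25, ridge=1e-6):
    """Treatment model: logistic regression fitted with Newton's method."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ w)
        grad = X.T @ (p - t) + ridge * w
        hess = (X * (p * (1.0 - p))[:, None]).T @ X + ridge * np.eye(X.shape[1])
        w -= np.linalg.solve(hess, grad)
    return w


def fit_linear(X, y):
    """Outcome model: ordinary least squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def aipw_ate(y, t, q0, q1, g, eps=1e-3):
    """AIPW estimate of the average treatment effect.

    q0, q1: predicted outcomes under control and treatment.
    g: predicted probability of treatment, clipped away from 0 and 1.
    """
    g = np.clip(g, eps, 1.0 - eps)
    psi = q1 - q0 + t * (y - q1) / g - (1.0 - t) * (y - q0) / (1.0 - g)
    return float(np.mean(psi))


def simulate(n=5000, effect=1.0, seed=0):
    """Synthetic data (assumption): confounder z is a function of the embedding."""
    rng = np.random.default_rng(seed)
    emb = rng.normal(size=(n, 4))               # stand-in for learned embeddings
    z = emb @ np.array([1.0, -0.5, 0.25, 0.0])  # unobserved confounder
    t = (rng.random(n) < sigmoid(z)).astype(float)
    y = effect * t + 2.0 * z + rng.normal(size=n)
    return emb, t, y


def estimate_effect(emb, t, y):
    """Predict treatment and outcome from embeddings, then plug into AIPW."""
    X = np.column_stack([np.ones(len(t)), emb])
    g = sigmoid(X @ fit_logistic(X, t))
    q1 = X @ fit_linear(X[t == 1], y[t == 1])
    q0 = X @ fit_linear(X[t == 0], y[t == 0])
    return aipw_ate(y, t, q0, q1, g)


def naive_difference(t, y):
    """Unadjusted difference in mean outcomes, biased under confounding."""
    return float(y[t == 1].mean() - y[t == 0].mean())


if __name__ == "__main__":
    emb, t, y = simulate()
    print("naive difference:", naive_difference(t, y))
    print("AIPW estimate:   ", estimate_effect(emb, t, y))
```

In this simulation the true effect is 1.0, and the naive difference is inflated because treated units have larger values of the confounder. The sketch fits and evaluates the predictive models on the same data for brevity; a more careful version would likely fit them on held-out folds.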