How to Make Causal Inferences Using Texts

by Naoki Egami, et al.
Princeton University
The University of Chicago
University of California, San Diego
Stanford University

New text-as-data techniques offer great promise: the ability to inductively discover, from large collections of text, measures that are useful for testing social science theories of interest. We introduce a conceptual framework for making causal inferences with discovered measures as a treatment or outcome. Our framework enables researchers to discover high-dimensional textual interventions and to estimate the ways that observed treatments affect text-based outcomes. We argue that nearly all text-based causal inferences depend upon a latent representation of the text, and we provide a framework to learn that representation. But estimating this latent representation, we show, creates new risks: we may introduce an identification problem or overfit. To address these risks, we describe a split-sample framework and apply it to estimate causal effects from an experiment on immigration attitudes and a study on bureaucratic response. Our work provides a rigorous foundation for text-based causal inferences.
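The split-sample idea can be illustrated with a minimal sketch. It is not the authors' implementation: the keyword-based discovery step, the stratified split, the toy documents, and all function names (`stratified_split`, `discover_measure`, `difference_in_means`) are assumptions made for illustration. The point it shows is that the text measure is learned on one half of the data and the effect is estimated only on the held-out half.

```python
import random
from collections import Counter


def stratified_split(treatments, train_frac=0.5, seed=0):
    """Randomly split indices into train/test within each treatment arm (assumed design)."""
    rng = random.Random(seed)
    train, test = [], []
    for arm in sorted(set(treatments)):
        arm_idx = [i for i, d in enumerate(treatments) if d == arm]
        rng.shuffle(arm_idx)
        cut = int(len(arm_idx) * train_frac)
        train.extend(arm_idx[:cut])
        test.extend(arm_idx[cut:])
    return train, test


def discover_measure(docs, treatments, idx):
    """Hypothetical discovery step: choose the word whose document frequency
    differs most between treated and control units, using only the rows in idx."""
    treated, control = Counter(), Counter()
    n_t = n_c = 0
    for i in idx:
        words = set(docs[i].lower().split())
        if treatments[i] == 1:
            treated.update(words)
            n_t += 1
        else:
            control.update(words)
            n_c += 1
    vocab = sorted(set(treated) | set(control))
    keyword = max(
        vocab,
        key=lambda w: abs(treated[w] / max(n_t, 1) - control[w] / max(n_c, 1)),
    )
    # The returned function maps a document to a binary text-based outcome.
    return (lambda text: int(keyword in text.lower().split())), keyword


def difference_in_means(outcomes, treatments):
    """Difference in mean outcome between treated (1) and control (0) units."""
    t = [y for y, d in zip(outcomes, treatments) if d == 1]
    c = [y for y, d in zip(outcomes, treatments) if d == 0]
    return sum(t) / len(t) - sum(c) / len(c)


# Toy data (fabricated for illustration only): responses under a binary treatment.
docs, treatments = [], []
for i in range(200):
    d = i % 2
    hawkish = (i // 2) % 4 != 0 if d == 1 else (i // 2) % 4 == 1
    docs.append("deport him now" if hawkish else "let him stay")
    treatments.append(d)

# Step 1: split. Step 2: discover the measure on the training half only.
train_idx, test_idx = stratified_split(treatments)
g, keyword = discover_measure(docs, treatments, train_idx)

# Step 3: apply the fixed measure to the held-out half and estimate the effect there.
test_outcomes = [g(docs[i]) for i in test_idx]
test_treatments = [treatments[i] for i in test_idx]
effect = difference_in_means(test_outcomes, test_treatments)
print(keyword, round(effect, 3))
```

Because the measure is fixed before the test half is examined, choices made during discovery cannot be tuned to the data used for estimation, which is the overfitting risk the abstract describes.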

