Generating Synthetic Text Data to Evaluate Causal Inference Methods

02/10/2021
by Zach Wood-Doughty, et al.

Drawing causal conclusions from observational data requires making assumptions about the true data-generating process. Causal inference research typically considers low-dimensional data, such as categorical or numerical fields in structured medical records. High-dimensional, unstructured data such as natural language complicates the evaluation of causal inference methods, because such evaluations rely on synthetic datasets with known causal effects. Models for natural language generation have been widely studied and perform well empirically. However, existing generation methods are not immediately applicable to producing synthetic datasets for causal evaluations, as they do not allow for quantifying a causal effect on the text itself. In this work, we develop a framework for adapting existing generation models to produce synthetic text datasets with known causal effects. We use this framework to perform an empirical comparison of four recently proposed methods for estimating causal effects from text data. We release our code and synthetic datasets.
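The core idea, a synthetic dataset whose true causal effect is known so that an estimator's error can be measured, can be illustrated with a deliberately tiny sketch. This is not the authors' framework: the two toy vocabularies, the unigram word sampler standing in for a pretrained generation model, the linear outcome model, and all parameter values (`tau`, `beta`, `signal`, `length`) are hypothetical choices made only for illustration. A binary confounder drives both treatment and outcome and also shapes the generated text; a naive difference in means is then compared against a backdoor adjustment that stratifies on a crude proxy for the confounder recovered from the text.

```python
import random
from statistics import mean

# Hypothetical toy vocabularies (an assumption, not from the paper):
# the latent confounder c shifts which words are likely to appear.
VOCAB_C0 = ["calm", "routine", "stable", "mild"]
VOCAB_C1 = ["severe", "urgent", "acute", "critical"]


def generate_text(c, rng, length=9, signal=0.8):
    """Toy stand-in for a text generator conditioned on confounder c.

    Each token comes from the vocabulary matching c with probability
    `signal`, otherwise from the other vocabulary.
    """
    own, other = (VOCAB_C1, VOCAB_C0) if c else (VOCAB_C0, VOCAB_C1)
    return [rng.choice(own if rng.random() < signal else other) for _ in range(length)]


def simulate(n=20000, tau=2.0, beta=3.0, seed=0):
    """Sample (text, T, Y) rows whose true average treatment effect is `tau`."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        c = int(rng.random() < 0.5)                   # latent binary confounder
        t = int(rng.random() < 0.2 + 0.6 * c)         # treatment more likely when c = 1
        y = tau * t + beta * c + rng.gauss(0.0, 1.0)  # outcome depends on t and c
        rows.append((generate_text(c, rng), t, y))
    return rows


def naive_ate(rows):
    """Difference in means that ignores the text, so it is confounded."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return mean(treated) - mean(control)


def text_proxy(words):
    """Crude proxy for c: majority vote over the toy vocabularies."""
    hits = sum(w in VOCAB_C1 for w in words)
    return int(hits > len(words) / 2)


def adjusted_ate(rows):
    """Backdoor adjustment, stratifying on the text-derived proxy.

    Assumes every stratum contains both treated and control rows,
    which holds for the default simulation sizes.
    """
    strata = {}
    for words, t, y in rows:
        strata.setdefault(text_proxy(words), []).append((t, y))
    total = len(rows)
    estimate = 0.0
    for group in strata.values():
        treated = [y for t, y in group if t == 1]
        control = [y for t, y in group if t == 0]
        estimate += len(group) / total * (mean(treated) - mean(control))
    return estimate


if __name__ == "__main__":
    rows = simulate()
    print("true ATE: 2.0")
    print("naive:   ", round(naive_ate(rows), 3))
    print("adjusted:", round(adjusted_ate(rows), 3))
```

Under these assumptions the naive estimate should land near 3.8 (the true effect of 2.0 plus confounding bias), while the text-adjusted estimate lands much closer to 2.0 but not exactly on it, because the proxy occasionally misreads the confounder. Because the true effect is fixed by construction, that residual error is directly measurable, which is the kind of comparison a synthetic dataset with a known causal effect makes possible.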


research
10/01/2018

Challenges of Using Text Classifiers for Causal Inference

Causal understanding is essential for many kinds of decision-making, but...
research
05/18/2020

Towards Causal Inference for Spatio-Temporal Data: Conflict and Forest Loss in Colombia

In many data scientific problems, we are interested not only in modeling...
research
05/29/2019

Using Text Embeddings for Causal Inference

We address causal inference with text documents. For example, does addin...
research
03/18/2022

Multi-Modal Causal Inference with Deep Structural Equation Models

Accounting for the effects of confounders is one of the central challeng...
research
09/21/2020

Adjusting for Confounders with Text: Challenges and an Empirical Evaluation Framework for Causal Inference

Leveraging text, such as social media posts, for causal inferences requi...
research
04/09/2021

Automated Meta-Analysis: A Causal Learning Perspective

Meta-analysis is a systematic approach for understanding a phenomenon by...
research
09/11/2021

Bayesian Topic Regression for Causal Inference

Causal inference using observational text data is becoming increasingly ...
