Machine Learning Tests for Effects on Multiple Outcomes

07/05/2017
by Jens Ludwig, et al.

A core challenge in the analysis of experimental data is that the impact of some intervention is often not entirely captured by a single, well-defined outcome. Instead there may be a large number of outcome variables that are potentially affected and of interest. In this paper, we propose a data-driven approach rooted in machine learning to the problem of testing effects on such groups of outcome variables. It is based on two simple observations. First, the 'false-positive' problem that arises when searching across a group of outcomes is similar to the concern of 'over-fitting,' which has been the focus of a large literature in statistics and computer science. We can thus leverage sample-splitting methods from the machine-learning playbook that are designed to control over-fitting, ensuring that statistical models express generalizable insights about treatment effects. The second simple observation is that the question of whether treatment affects a group of variables is equivalent to the question of whether treatment is predictable from these variables better than some trivial benchmark (provided treatment is assigned randomly). This formulation allows us to leverage data-driven predictors from the machine-learning literature to flexibly mine for effects, rather than rely on more rigid approaches like multiple-testing corrections and pre-analysis plans. We formulate a specific methodology and present three kinds of results: first, our test is exactly sized for the null hypothesis of no effect; second, a specific version is asymptotically equivalent to a benchmark joint Wald test in a linear regression; and third, this methodology can guide inference on where an intervention has effects. Finally, we argue that our approach can naturally deal with typical features of real-world experiments, and be adapted to baseline balance checks.
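The second observation above can be illustrated with a minimal sketch: fit a predictor of the (randomized) treatment indicator from the outcome variables on one half of the sample, score it on the held-out half, and compare that score against a permutation distribution obtained by reassigning treatment at random. Exact size under the null follows from the random assignment. The function name, the OLS predictor, and the MSE statistic below are illustrative choices, not the paper's exact procedure:

```python
import numpy as np

def ml_outcome_test(Y, T, n_perm=200, seed=0):
    """Permutation test of whether treatment T is predictable from outcomes Y.

    Predicts T from Y by OLS (an illustrative stand-in for any ML predictor)
    on a training split, scores mean squared error on a held-out split, and
    compares against the permutation distribution obtained by reassigning T
    at random -- valid because T was randomly assigned.
    """
    rng = np.random.default_rng(seed)
    n = len(T)
    idx = rng.permutation(n)
    train, test = idx[: n // 2], idx[n // 2:]
    X = np.column_stack([np.ones(n), Y])  # add an intercept column

    def holdout_mse(t):
        beta, *_ = np.linalg.lstsq(X[train], t[train], rcond=None)
        resid = t[test] - X[test] @ beta
        return np.mean(resid ** 2)

    # Low held-out MSE means T is predictable from Y, i.e. evidence of effects.
    stat = holdout_mse(np.asarray(T, dtype=float))
    null = np.array([holdout_mse(rng.permutation(T).astype(float))
                     for _ in range(n_perm)])
    # One-sided p-value: share of permuted statistics at least as small.
    return (1 + np.sum(null <= stat)) / (1 + n_perm)
```

With a strong effect on even one of several outcomes, the held-out MSE drops well below the permutation distribution and the test rejects; under no effect, the statistic is exchangeable with the permuted ones, which is what delivers exact size.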


