Heteroscedasticity-aware sample trimming for causal inference

10/18/2022
by Samir Khan, et al.

A popular method for variance reduction in observational causal inference is propensity-based trimming, the practice of removing units with extreme propensities from the sample. This practice has theoretical grounding when the data are homoscedastic and the propensity model is parametric (Yang and Ding, 2018; Crump et al., 2009), but in modern settings where heteroscedastic data are analyzed with non-parametric models, existing theory fails to support current practice. In this work, we address this challenge by developing new methods and theory for sample trimming. Our contributions are three-fold: first, we describe novel procedures for selecting which units to trim. Our procedures differ from previous work in that we trim not only units with small propensities, but also units with extreme conditional variances. Second, we give new theoretical guarantees for inference after trimming. In particular, we show how to perform inference on the trimmed subpopulation without requiring that our regressions converge at parametric rates. Instead, we make only fourth-root rate assumptions like those in the double machine learning literature. This result applies to conventional propensity-based trimming as well and thus may be of independent interest. Finally, we propose a bootstrap-based method for constructing simultaneously valid confidence intervals for multiple trimmed sub-populations, which are valuable for navigating the trade-off between sample size and variance reduction inherent in trimming. We validate our methods in simulation, on the 2007-2008 National Health and Nutrition Examination Survey, and on a semi-synthetic Medicare dataset and find promising results in all settings.
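The trimming idea described above can be sketched in a few lines: fit a propensity model and a conditional-variance model, then drop units whose estimated propensity is extreme or whose estimated conditional variance is large. This is a minimal illustration, not the authors' implementation; the model choices and the thresholds `alpha` and `var_quantile` are illustrative assumptions, not the paper's selection rule.

```python
# Sketch of heteroscedasticity-aware trimming (illustrative only).
# Units are dropped if their estimated propensity e(x) is near 0 or 1,
# or if their estimated conditional variance is in the upper tail.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingRegressor

def trim_sample(X, T, Y, alpha=0.1, var_quantile=0.9):
    """Return a boolean mask of units kept after trimming.

    alpha: propensity cutoff (keep alpha < e_hat < 1 - alpha).
    var_quantile: drop units whose estimated conditional variance
    exceeds this quantile of the fitted variances.
    """
    # Estimated propensity e(x) = P(T = 1 | X = x)
    e_hat = LogisticRegression().fit(X, T).predict_proba(X)[:, 1]

    # Estimated conditional variance: regress squared residuals on X
    mu_hat = GradientBoostingRegressor().fit(X, Y).predict(X)
    resid2 = (Y - mu_hat) ** 2
    var_hat = GradientBoostingRegressor().fit(X, resid2).predict(X)

    keep_propensity = (e_hat > alpha) & (e_hat < 1 - alpha)
    keep_variance = var_hat < np.quantile(var_hat, var_quantile)
    return keep_propensity & keep_variance
```

Downstream, a treatment-effect estimator would be applied only to the units where the mask is `True`; the paper's theory concerns valid inference on that trimmed subpopulation.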


Related research

- 03/11/2021: Doubly robust confidence sequences for sequential causal inference
  This paper derives time-uniform confidence sequences (CS) for causal eff...

- 02/25/2022: Causal discovery for observational sciences using supervised machine learning
  Causal inference can estimate causal effects, but unless data are collec...

- 02/06/2023: A Fast Bootstrap Algorithm for Causal Inference with Large Data
  Estimating causal effects from large experimental and observational data...

- 10/30/2019: A Semiparametric Approach to Model-based Sensitivity Analysis in Observational Studies
  When drawing causal inference from observational data, there is always c...

- 12/10/2021: On the Assumptions of Synthetic Control Methods
  Synthetic control (SC) methods have been widely applied to estimate the ...

- 05/28/2021: Distribution-free inference for regression: discrete, continuous, and in between
  In data analysis problems where we are not able to rely on distributiona...

- 11/07/2018: Carving model-free inference
  Many scientific studies are modeled as hierarchical procedures where the ...
