"Robust-squared" Imputation Models Using BART

01/09/2018
by   Yaoyuan V. Tan, et al.

Examples of "doubly robust" estimators for missing data include augmented inverse probability weighting (AIPWT) models (Robins et al., 1994) and penalized splines of propensity prediction (PSPP) models (Zhang and Little, 2009). Doubly robust estimators have the property that, if either the response propensity model or the mean model is correctly specified, a consistent estimator of the population mean is obtained. However, doubly robust estimators can perform poorly when even modest misspecification is present in both models (Kang and Schafer, 2007). Here we consider extensions of the AIPWT and PSPP models that use Bayesian Additive Regression Trees (BART; Chipman et al., 2010) to provide highly robust propensity and mean model estimation. We term these "robust-squared" in the sense that the propensity score, the mean, or both can be estimated with minimal model misspecification and then applied to the doubly robust estimator. We examine their behavior via simulations in which the propensity model, the mean model, or both are misspecified. We apply our proposed method to impute missing instantaneous velocity (delta-v) values from the 2014 National Automotive Sampling System Crashworthiness Data System dataset and missing Blood Alcohol Concentration values from the 2015 Fatality Analysis Reporting System dataset. We find that applying BART to PSPP and AIPWT yields more robust and efficient estimates than PSPP and AIPWT alone, with the BART-estimated propensity score combined with PSPP providing the most efficient estimator with close to nominal coverage.
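The double-robustness property described above can be illustrated with a minimal sketch of an augmented inverse probability weighted estimate of a population mean. This is not the authors' implementation and does not use BART: the propensities and outcome-model predictions are passed in as plain lists, and the function names (`aipw_mean`, `simulate`), the data-generating model, and the deliberately wrong plug-in models are all assumptions made for illustration only.

```python
import math
import random


def aipw_mean(y, r, pi_hat, m_hat):
    """AIPW estimate of the population mean of y.

    y[i]      outcome (only read when r[i] == 1; may be None otherwise)
    r[i]      1 if y[i] is observed, 0 if missing
    pi_hat[i] estimated response propensity P(r = 1 | x_i)
    m_hat[i]  outcome-model prediction E[y | x_i]
    """
    total = 0.0
    for yi, ri, pi, mi in zip(y, r, pi_hat, m_hat):
        total += mi                      # outcome-model prediction
        if ri == 1:
            total += (yi - mi) / pi      # inverse-probability-weighted residual
    return total / len(r)


def simulate(n=20000, seed=0):
    """Hypothetical simulation: the true population mean of y is 1."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    true_pi = [1.0 / (1.0 + math.exp(-(0.5 + xi))) for xi in x]
    true_m = [1.0 + 2.0 * xi for xi in x]
    r = [1 if rng.random() < p else 0 for p in true_pi]
    # Outcomes are set to None when missing, so they cannot leak into the estimate.
    y = [mi + rng.gauss(0.0, 1.0) if ri == 1 else None
         for mi, ri in zip(true_m, r)]

    wrong_pi = [0.5] * n   # ignores the dependence of response on x
    wrong_m = [0.0] * n    # ignores the dependence of y on x

    return {
        "propensity_correct_only": aipw_mean(y, r, true_pi, wrong_m),
        "mean_correct_only": aipw_mean(y, r, wrong_pi, true_m),
        "both_wrong": aipw_mean(y, r, wrong_pi, wrong_m),
    }


results = simulate()
for name, estimate in results.items():
    print(f"{name}: {estimate:.3f}")
```

In this toy setting the estimate should stay near the true mean of 1 when either input is correct and drift away when both are wrong. Replacing the fixed lists with flexibly estimated propensities or predictions (for example from BART) is the idea behind the "robust-squared" estimators, though the fitting step itself is not shown here.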

research
06/27/2023

Triply robust estimation under missing at random

Missing data is frequently encountered in many areas of statistics. Impu...
research
05/07/2020

Robust location estimators in regression models with covariates and responses missing at random

This paper deals with robust marginal estimation under a general regress...
research
04/06/2022

Calibrated regression estimation using empirical likelihood under data fusion

Data analysis based on information from several sources is common in eco...
research
12/11/2022

On regression-adjusted imputation estimators of the average treatment effect

Imputing missing potential outcomes using an estimated regression functi...
research
12/14/2021

Navigating the corporate disclosure gap: Modelling of Missing Not at Random Carbon Data

Corporate carbon emissions data is disclosed by approximately 65 and mid...
research
06/20/2019

On Statistical Properties of A Veracity Scoring Method for Spatial Data

Measuring veracity or reliability of noisy data is of utmost importance,...
research
07/18/2020

Robust Optimal Design when Missing Data Happen at Random

In this article, we investigate the robust optimal design problem for th...
