"Robust-squared" Imputation Models Using BART
Examples of "doubly robust" estimators for missing data include augmented inverse probability weighting (AIPWT) models (Robins et al., 1994) and penalized splines of propensity prediction (PSPP) models (Zhang and Little, 2009). Doubly-robust estimators have the property that, if either the response propensity or the mean is modeled correctly, a consistent estimator of the population mean is obtained. However, doubly-robust estimators can perform poorly when modest misspecification is present in both models (Kang and Schafer, 2007). Here we consider extensions of the AIPWT and PSPP models that use Bayesian Additive Regression Trees (BART; Chipman et al., 2010) to provide highly robust propensity and mean model estimation. We term these "robust-squared" in the sense that the propensity score, the mean, or both can be estimated with minimal model misspecification and then applied to the doubly-robust estimator. We examine their behavior via simulations in which the propensity model, the mean model, or both are misspecified. We apply our proposed method to impute missing instantaneous velocity (delta-v) values from the 2014 National Automotive Sampling System Crashworthiness Data System dataset and missing Blood Alcohol Concentration values from the 2015 Fatality Analysis Reporting System dataset. We found that applying BART to PSPP and AIPWT provides more robust and efficient estimates than PSPP and AIPWT alone, with the BART-estimated propensity score combined with PSPP providing the most efficient estimator with close to nominal coverage.
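To make the doubly-robust idea concrete, the following is a minimal sketch of a textbook AIPW estimate of a population mean under missing outcomes. It is not the authors' implementation: the function name `aipw_mean` and its arguments are hypothetical, and the propensity and mean predictions are assumed to have been produced beforehand by whatever model is in use (in the setting described above, BART fits; here they are simply passed in as arrays).

```python
import numpy as np


def aipw_mean(y, observed, propensity, mean_pred):
    """Textbook AIPW estimate of the population mean of y.

    All names here are hypothetical, chosen for illustration only.
    y          : outcomes, with np.nan where the value is missing
    observed   : 1/True where y is observed, 0/False where it is missing
    propensity : estimated probability that each unit is observed
    mean_pred  : estimated conditional mean of y for every unit
    """
    y = np.asarray(y, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    propensity = np.asarray(propensity, dtype=float)
    mean_pred = np.asarray(mean_pred, dtype=float)

    # Residuals only contribute for observed units; missing units get 0.
    residual = np.where(observed, y - mean_pred, 0.0)

    # Mean-model prediction plus an inverse-probability-weighted correction.
    return float(np.mean(mean_pred + residual / propensity))


# Illustrative inputs (made up for the example, not data from the paper).
y = [1.0, 2.0, np.nan, 4.0]
observed = [1, 1, 0, 1]
propensity = [0.75, 0.75, 0.75, 0.75]
mean_pred = [1.0, 2.0, 3.0, 4.0]
estimate = aipw_mean(y, observed, propensity, mean_pred)
```

When the mean predictions match the observed outcomes exactly, the correction term vanishes and the estimate is just the average prediction. When the mean model is poor, the weighted residuals compensate, provided the propensities are right. This is the sense in which either model being correct suffices.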