Calibrated regression estimation using empirical likelihood under data fusion
Data analysis based on information from several sources is common in economic and biomedical studies. This setting is often referred to as the data fusion problem, which differs from traditional missing data problems in that complete data are not observed for any subject. We consider a regression analysis in which the outcome variable and some covariates are collected from two different sources. By leveraging the common variables observed in both data sets, doubly robust estimation procedures have been proposed in the literature to protect against possible model misspecification. However, they employ only a single propensity score model for the data fusion process and a single imputation model for the covariates available in one data set. In practice, it may be questionable to assume that either model is correctly specified. We therefore propose an approach that uses empirical likelihood to calibrate multiple propensity score and imputation models and thereby gain more protection. The resulting estimator is consistent when any one of those models is correctly specified and is robust against extreme values of the fitted propensity scores. We also establish its asymptotic normality and discuss its semiparametric efficiency. Simulation studies show that the proposed estimator has substantial advantages over existing doubly robust estimators, and an assembled U.S. household expenditure data example is used for illustration.
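To make the calibration idea concrete, the following is a minimal sketch of generic empirical likelihood weighting, not the estimator proposed in the paper. It computes weights p_i that maximize the sum of log p_i subject to the p_i summing to one and the weighted sum of p_i g_i being zero, where each column of g is a calibration constraint. In a multiple-model setting one might let each candidate propensity score or imputation model contribute a column, but that mapping is an assumption here, as are the names `el_weights` and `g`.

```python
import numpy as np

def el_weights(g, n_iter=100, tol=1e-10):
    """Empirical likelihood weights p_i = 1 / (n * (1 + lam' g_i)).

    lam is chosen so that sum_i g_i / (1 + lam' g_i) = 0, which is the
    first-order condition for maximizing sum_i log(1 + lam' g_i).
    Hypothetical helper for illustration only.
    """
    g = np.asarray(g, dtype=float)
    if g.ndim == 1:
        g = g[:, None]              # treat a vector as a single constraint
    n, k = g.shape
    lam = np.zeros(k)
    for _ in range(n_iter):
        denom = 1.0 + g @ lam
        grad = (g / denom[:, None]).sum(axis=0)
        if np.linalg.norm(grad) < tol:
            break
        hess = -(g / denom[:, None] ** 2).T @ g   # negative definite
        step = np.linalg.solve(hess, grad)
        obj = np.log(denom).sum()
        t = 1.0
        # Halve the Newton step until all weights stay positive and the
        # concave objective does not decrease.
        while t > 1e-12:
            new_denom = 1.0 + g @ (lam - t * step)
            if np.all(new_denom > 0) and np.log(new_denom).sum() >= obj:
                break
            t /= 2.0
        lam = lam - t * step
    return 1.0 / (n * (1.0 + g @ lam))

# Usage: calibrate weights so the weighted mean of x equals a target of 0.1.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
p = el_weights(x - 0.1)
```

The weights sum to one automatically at the solution, and the constraint requires the target to lie inside the convex hull of the observed values; otherwise no positive weights exist.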