A Joint Bayesian Framework for Causal Inference and Bipartite Matching for Record Linkage
The recent proliferation of digital health data has made it possible to gather information on a common set of entities from various government and non-government sources and to make causal inferences about important health outcomes. In such scenarios, the response may be obtained from a source different from the one that supplies the treatment assignment and covariates. In the absence of error-free direct identifiers (e.g., SSN), straightforward merging of separate files on these identifiers is not feasible, giving rise to the need for matching on imperfect linking variables (e.g., names, birth years). Causal inference in such situations generally follows a two-stage procedure: the first stage links the two files using a probabilistic linkage technique based on imperfect linking variables common to both files, and the second stage performs causal inference on the linked dataset. Rather than performing record linkage and causal inference sequentially, this article proposes a novel framework for simultaneous Bayesian inference on probabilistic linkage and the causal effect. In contrast with the two-stage approach, the proposed methodology facilitates borrowing of information between the models employed for causal inference and record linkage, thus improving the accuracy of inference in both models. Importantly, the joint modeling framework characterizes uncertainty in both causal inference and record linkage. An efficient computational template using Markov chain Monte Carlo (MCMC) is developed for the joint model. Simulation studies and real data analysis provide evidence of both improved accuracy in estimates of treatment effects and more accurate linking of the two files under the joint modeling framework relative to the two-stage option. These conclusions are further supported by theoretical insights presented in this article.
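To make the two-stage baseline concrete, the following is a minimal sketch of that sequential procedure, not the joint Bayesian model proposed in the article. It assumes toy data, hypothetical field names (`name`, `birth_year`, `treated`, `outcome`), made-up agreement weights, files of equal size, and a brute-force search over one-to-one matchings; a real linkage would estimate the weights and use a scalable assignment solver.

```python
from itertools import permutations
from statistics import mean

# Hypothetical toy files: file A holds the treatment, file B holds the outcome.
file_a = [
    {"name": "ann lee", "birth_year": 1980, "treated": 1},
    {"name": "bob ray", "birth_year": 1975, "treated": 0},
    {"name": "cy dunn", "birth_year": 1990, "treated": 1},
    {"name": "di park", "birth_year": 1968, "treated": 0},
]
file_b = [
    {"name": "bob ray", "birth_year": 1975, "outcome": 2.0},
    {"name": "ann lee", "birth_year": 1981, "outcome": 5.0},  # birth-year error
    {"name": "di park", "birth_year": 1968, "outcome": 1.0},
    {"name": "cy dun", "birth_year": 1990, "outcome": 4.0},   # name error
]

# Assumed (agreement, disagreement) weights per linking variable.
# These values are illustrative, not estimated from data.
WEIGHTS = {"name": (4.0, -2.0), "birth_year": (2.0, -1.0)}


def pair_score(a, b):
    """Sum the agreement or disagreement weight over the linking variables."""
    total = 0.0
    for field, (agree, disagree) in WEIGHTS.items():
        total += agree if a[field] == b[field] else disagree
    return total


def best_bipartite_matching(a_records, b_records):
    """Stage 1: return the one-to-one matching with the highest total score.

    Brute force over all permutations, so only suitable for tiny files
    of equal size. matching[i] is the index in file B linked to record i
    in file A.
    """
    n = len(a_records)

    def total_score(perm):
        return sum(pair_score(a_records[i], b_records[perm[i]]) for i in range(n))

    return list(max(permutations(range(n)), key=total_score))


def difference_in_means(a_records, b_records, matching):
    """Stage 2: plug-in treatment effect on the linked data.

    Treats the stage-1 links as if they were known to be correct, so
    linkage uncertainty is ignored.
    """
    treated = [b_records[j]["outcome"]
               for i, j in enumerate(matching) if a_records[i]["treated"] == 1]
    control = [b_records[j]["outcome"]
               for i, j in enumerate(matching) if a_records[i]["treated"] == 0]
    return mean(treated) - mean(control)


matching = best_bipartite_matching(file_a, file_b)
effect = difference_in_means(file_a, file_b, matching)
print(matching, effect)
```

Because stage 2 conditions on a single matching, any linkage error passes silently into the effect estimate, and the outcome data never inform the links. Those are the two limitations the joint framework is described as addressing.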