A General Framework for Regression with Mismatched Data Based on Mixture Modeling

06/01/2023
by Martin Slawski, et al.

Data sets obtained from linking multiple files are frequently affected by mismatch error, as a result of non-unique or noisy identifiers used during record linkage. Accounting for such mismatch error in downstream analysis performed on the linked file is critical to ensure valid statistical inference. In this paper, we present a general framework to enable valid post-linkage inference in the challenging secondary analysis setting in which only the linked file is given. The proposed framework covers a wide selection of statistical models and can flexibly incorporate additional information about the underlying record linkage process. Specifically, we propose a mixture model for pairs of linked records whose two components reflect distributions conditional on match status, i.e., correct match or mismatch. Regarding inference, we develop a method based on composite likelihood and the EM algorithm as well as an extension towards a fully Bayesian approach. Extensive simulations and several case studies involving contemporary record linkage applications corroborate the effectiveness of our framework.
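To make the two-component idea concrete, the following is a minimal sketch for the special case of Gaussian linear regression. It is not the authors' implementation. It assumes that correctly matched pairs follow y | x ~ N(x'beta, sigma^2), that in mismatched pairs y is independent of x with a Gaussian marginal estimated once from the observed responses, and that a single unknown mismatch rate alpha applies to all pairs. It runs plain EM on this mixture likelihood rather than the composite-likelihood or Bayesian variants mentioned above. The function name `fit_mismatch_mixture` and its parameters are hypothetical.

```python
import numpy as np

def fit_mismatch_mixture(X, y, n_iter=200, alpha0=0.1):
    """EM sketch for a two-component mixture over linked (x, y) pairs.

    Assumed model (illustrative only):
      correct match: y | x ~ N(x @ beta, sigma2)
      mismatch:      y ~ N(mu, tau2), independent of x
    Returns beta, sigma2, the estimated mismatch rate alpha, and the
    posterior probability w that each pair is a correct match.
    """
    # Mismatch component: Gaussian fit to the marginal of y, held fixed.
    mu, tau2 = y.mean(), y.var()
    log_f_mis = -0.5 * (np.log(2 * np.pi * tau2) + (y - mu) ** 2 / tau2)

    # Initialize with ordinary least squares, which ignores mismatches.
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    sigma2 = np.mean((y - X @ beta) ** 2)
    alpha = alpha0

    for _ in range(n_iter):
        # E-step: posterior probability of a correct match for each pair.
        resid = y - X @ beta
        log_f_match = -0.5 * (np.log(2 * np.pi * sigma2) + resid ** 2 / sigma2)
        log_ratio = (np.log(alpha) + log_f_mis) - (np.log1p(-alpha) + log_f_match)
        w = 1.0 / (1.0 + np.exp(np.clip(log_ratio, -700, 700)))

        # M-step: weighted least squares for beta, then sigma2 and alpha.
        Xw = X * w[:, None]
        beta = np.linalg.solve(X.T @ Xw, Xw.T @ y)
        sigma2 = max(np.sum(w * (y - X @ beta) ** 2) / np.sum(w), 1e-12)
        alpha = float(np.clip(1.0 - w.mean(), 1e-6, 1 - 1e-6))

    return beta, sigma2, alpha, w
```

In this sketch the weights `w` play the role of soft match indicators: pairs whose response is poorly explained by the regression line are down-weighted in the least-squares step instead of being discarded outright.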

research
08/10/2023

Bayesian Record Linkage with Variables in One File

In many healthcare and social science applications, information about un...
research
10/01/2020

Estimation in exponential family Regression based on linked data contaminated by mismatch error

Identification of matching records in multiple files can be a challengin...
research
12/01/2020

A Bayesian Approach to Linking Data Without Unique Identifiers

Existing file linkage methods may produce sub-optimal results because th...
research
03/12/2020

Extending the MaCSim approach using similarity weight matrix to assess the accuracy of record linkage

Record linkage is the process of bringing together the same entity from ...
research
11/02/2021

Regularization for Shuffled Data Problems via Exponential Family Priors on the Permutation Group

In the analysis of data sets consisting of (X, Y)-pairs, a tacit assumpt...
research
12/20/2012

An Experiment with Hierarchical Bayesian Record Linkage

In record linkage (RL), or exact file matching, the goal is to identify ...
research
10/11/2018

Generalized Bayesian Record Linkage and Regression with Exact Error Propagation

Record linkage (de-duplication or entity resolution) is the process of m...
