A Bayesian Approach to Linking Data Without Unique Identifiers

12/01/2020
by Edwin Farley, et al.
Existing file linkage methods may produce sub-optimal results because they consider neither the interactions between different pairs of matched records nor relationships between variables that are exclusive to one of the files. In addition, many current methods fail to address the uncertainty in the linkage, which may result in overly precise estimates of relationships between variables that are exclusive to one of the files. Bayesian methods for record linkage can reduce the bias in the estimation of scientific relationships of interest and provide interval estimates that account for the uncertainty in the linkage; however, implementing these methods is often complex and computationally intensive. This article presents the GFS package for the R programming language, which uses a Bayesian approach to file linkage. The linking procedure implemented in GFS samples from the joint posterior distribution of the model parameters and the linkage permutations. The algorithm treats file linkage as a missing data problem and generates multiple linked data sets. For computational efficiency, only the linkage permutations are stored, and the analysis of interest is performed separately under each permutation. This implementation reduces the computational complexity of the linking process and the expertise required of researchers analyzing linked data sets. We describe the algorithm implemented in the GFS package and its statistical basis, and demonstrate its use on a sample data set.
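
The idea of storing only the linkage permutations and analyzing each one separately can be illustrated with a minimal sketch. It is not the GFS package or its API: it is written in Python rather than R, and every name in it (`slope_and_variance`, `pool`, `analyze`, `perturbed`, the toy data) is hypothetical. The stored permutations are simulated by randomly swapping a few links in a known true permutation, standing in for posterior draws. The sketch also assumes that the per-permutation results are pooled with Rubin's multiple-imputation combining rules, which the abstract does not state.

```python
import random
import statistics


def slope_and_variance(x, y):
    """Ordinary least-squares slope of y on x and its sampling variance."""
    n = len(x)
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return slope, (rss / (n - 2)) / sxx


def pool(estimates, variances):
    """Rubin's combining rules (an assumption of this sketch)."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)   # pooled point estimate
    u_bar = statistics.fmean(variances)   # average within-permutation variance
    b = statistics.variance(estimates)    # between-permutation variance
    return q_bar, u_bar + (1 + 1 / m) * b


def analyze(x_file_a, y_file_b, permutations):
    """Repeat one analysis under each stored permutation, then pool."""
    results = []
    for perm in permutations:
        # perm[i] is the row of file B linked to row i of file A.
        y_linked = [y_file_b[j] for j in perm]
        results.append(slope_and_variance(x_file_a, y_linked))
    estimates, variances = zip(*results)
    return pool(estimates, variances)


# Toy data: x lives only in file A, y only in file B, true slope is 2.
rng = random.Random(0)
n = 200
x = [rng.gauss(0, 1) for _ in range(n)]
true_perm = list(range(n))
rng.shuffle(true_perm)
y_b = [0.0] * n
for i, j in enumerate(true_perm):
    y_b[j] = 2.0 * x[i] + rng.gauss(0, 0.5)


def perturbed(perm, swaps):
    """Hypothetical stand-in for one posterior draw: swap a few links."""
    p = list(perm)
    for _ in range(swaps):
        a, b = rng.randrange(n), rng.randrange(n)
        p[a], p[b] = p[b], p[a]
    return p


# Only the permutations are stored, never the linked data sets themselves.
stored = [perturbed(true_perm, 5) for _ in range(20)]
estimate, total_var = analyze(x, y_b, stored)
print(f"pooled slope = {estimate:.3f}, total variance = {total_var:.5f}")
```

Each stored permutation is a list of n integers, so keeping 20 of them costs far less than keeping 20 full linked data sets. The between-permutation term in `pool` is what carries the linkage uncertainty into the final interval estimate.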
