Fast Bayesian Record Linkage With Record-Specific Disagreement Parameters

03/09/2020
by Thomas Stringham, et al.

Applied researchers are often interested in linking individuals between two datasets that lack unique identifiers. Accuracy and computational feasibility are a challenge, particularly when linking large datasets. We develop a Bayesian method for automated probabilistic record linkage and show it recovers 40 matching of Union Army recruitment data to the 1900 US Census for which expert-labelled true matches are known. Our approach, which builds on a recent state-of-the-art Bayesian method, refines the modelling of comparison data, allowing disagreement probability parameters conditional on non-match status to be record-specific. To make this refinement computationally feasible, we implement a Gibbs sampler that achieves significant improvement in speed over comparable recent implementations. We also generalize the notion of comparison data to allow for treatment of very common first names that spuriously produce exact matches in record pairs and show how to estimate true positive rate and positive predictive value when ground truth is unavailable.
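
To make the idea of generalized comparison data concrete, here is a minimal Python sketch of how record pairs might be encoded as comparison vectors, with an extra level that separates exact first-name matches on very common names from exact matches on rarer names. This is an illustration under stated assumptions, not the authors' implementation: the level coding, the `common_share` threshold, the field names `first` and `birth_year`, and the toy records are all hypothetical.

```python
from collections import Counter
from itertools import product


def first_name_level(a, b, common_names):
    """Hypothetical 3-level coding for the first-name field:
    0 = exact match on a rarer name,
    1 = exact match on a very common name,
    2 = disagreement."""
    if a != b:
        return 2
    return 1 if a in common_names else 0


def comparison_vectors(file_a, file_b, common_share=0.5):
    """Build one comparison vector per record pair (i, j).

    A name is treated as 'very common' when it accounts for at least
    `common_share` of all records; this threshold is an assumption
    chosen for the toy data, not a value from the paper.
    """
    counts = Counter(r["first"] for r in file_a + file_b)
    total = sum(counts.values())
    common = {name for name, c in counts.items() if c / total >= common_share}

    vectors = {}
    for (i, ra), (j, rb) in product(enumerate(file_a), enumerate(file_b)):
        name_level = first_name_level(ra["first"], rb["first"], common)
        # Binary agreement on birth year: 0 = agree, 1 = disagree.
        year_level = 0 if ra["birth_year"] == rb["birth_year"] else 1
        vectors[(i, j)] = (name_level, year_level)
    return vectors


# Toy records (made up for illustration only).
file_a = [
    {"first": "john", "birth_year": 1840},
    {"first": "ezekiel", "birth_year": 1842},
]
file_b = [
    {"first": "john", "birth_year": 1840},
    {"first": "john", "birth_year": 1851},
    {"first": "ezekiel", "birth_year": 1842},
]

vectors = comparison_vectors(file_a, file_b)
for pair, vec in sorted(vectors.items()):
    print(pair, vec)
```

In this toy data, an exact match on "john" gets its own level because the name is shared by most records, so a downstream model can treat that agreement as weaker evidence of a true match than an exact match on "ezekiel".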
