Performance Bounds for Graphical Record Linkage

03/08/2017
by Rebecca C. Steorts, et al.

Record linkage involves merging records in large, noisy databases to remove duplicate entities. It has become an important area because of its widespread occurrence in bibliometrics, public health, official statistics production, political science, and beyond. Traditional linkage methods that directly link records to one another become computationally infeasible as the number of records grows. As a result, it is increasingly common for researchers to treat record linkage as a clustering task, in which each latent entity is associated with one or more noisy database records. We critically assess performance bounds using the Kullback-Leibler (KL) divergence under a Bayesian record linkage framework, making connections to Kolchin partition models. We provide an upper bound using the KL divergence and a lower bound on the minimum probability of misclassifying a latent entity. Using simulated data, we give insights into when our bounds hold, and we provide practical guidance for users.
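
The abstract does not state the bounds themselves. The sketch below is a rough, hypothetical illustration of how a KL divergence can drive a lower bound on misclassification. It applies the classical Fano inequality under these assumptions: there are M equally likely latent entities, and each entity generates records from its own discrete attribute distribution. Under those assumptions, the mutual information between entity and record is at most the average pairwise KL divergence. Any rule that assigns a record to an entity therefore errs with probability at least 1 - (average KL + log 2) / log M. This is a generic textbook bound, not the authors' result, and the function names are invented for the example.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions given as equal-length lists."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                # p puts mass where q has none, so the divergence is infinite.
                return math.inf
            total += pi * math.log(pi / qi)
    return total


def fano_lower_bound(distributions):
    """Hypothetical helper: Fano-type lower bound on the probability of
    misidentifying which of M candidate distributions (latent entities)
    generated an observation, assuming a uniform prior over the M candidates.

    Mutual information is upper-bounded by the average pairwise KL divergence
    (1 / M^2) * sum_{j,k} KL(P_j || P_k), which follows from the convexity of KL.
    """
    m = len(distributions)
    if m < 2:
        raise ValueError("need at least two candidate distributions")
    avg_kl = sum(
        kl_divergence(p, q) for p in distributions for q in distributions
    ) / (m * m)
    bound = 1.0 - (avg_kl + math.log(2)) / math.log(m)
    # The bound is vacuous (negative or -inf) when entities are easy to tell apart.
    return max(0.0, bound)


if __name__ == "__main__":
    # Hypothetical example: 8 latent entities with identical attribute distributions.
    same = [[0.25, 0.25, 0.25, 0.25]] * 8
    # Prints roughly 0.667: any assignment rule errs with at least this probability.
    print(fano_lower_bound(same))
```

When the entity-level distributions are nearly indistinguishable (small KL), the bound approaches 1 - log 2 / log M. No method, Bayesian or otherwise, can then reliably recover which latent entity produced a record. When the distributions are well separated, the bound drops to zero and says nothing.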
