A Hierarchical Graphical Model for Record Linkage

07/12/2012
by   Pradeep Ravikumar, et al.

The task of matching co-referent records is known, among other names, as record linkage. For large record-linkage problems, there is often little or no labeled data available, but the unlabeled data shows a reasonably clear structure. For such problems, unsupervised or semi-supervised methods are preferable to supervised methods. In this paper, we describe a hierarchical graphical model framework for the linkage problem in an unsupervised setting. In addition to proposing new methods, we also cast existing unsupervised probabilistic record-linkage methods in this framework. Some of the techniques we propose to minimize overfitting in this model are of interest in the general graphical model setting. We describe a method for incorporating monotonicity constraints in a graphical model. We also outline a bootstrapping approach that uses "single-field" classifiers to noisily label latent variables in a hierarchical model. Experimental results show that our proposed unsupervised methods are competitive even with fully supervised record-linkage methods.
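
For orientation, the following is a minimal sketch of the kind of unsupervised probabilistic linkage the abstract alludes to. It assumes the classic setup in which each candidate record pair is summarized as a binary field-agreement vector and the match status is a latent variable fit by EM with fields treated as conditionally independent. It is not the hierarchical model proposed in the paper; the function name, the initial values, and the toy data are illustrative assumptions.

```python
def fit_match_mixture(vectors, n_iter=50, p_match=0.1):
    """EM for a two-class mixture over binary field-agreement vectors.

    Hypothetical helper for illustration only.
    Returns (p_match, m, u, post) where
      m[k] = P(field k agrees | pair is a match)
      u[k] = P(field k agrees | pair is a non-match)
      post[i] = posterior probability that pair i is a match.
    """
    eps = 1e-6
    clamp = lambda x: min(max(x, eps), 1.0 - eps)
    n, n_fields = len(vectors), len(vectors[0])
    m = [0.9] * n_fields  # assumed starting guess: matches mostly agree
    u = [0.1] * n_fields  # assumed starting guess: non-matches mostly disagree
    post = [0.0] * n
    for _ in range(n_iter):
        # E-step: posterior match probability for every pair.
        for i, v in enumerate(vectors):
            lm, lu = p_match, 1.0 - p_match
            for k, agrees in enumerate(v):
                lm *= m[k] if agrees else 1.0 - m[k]
                lu *= u[k] if agrees else 1.0 - u[k]
            post[i] = lm / (lm + lu)
        # M-step: re-estimate the class prior and per-field agreement rates.
        total_m = max(sum(post), eps)
        total_u = max(n - sum(post), eps)
        p_match = clamp(total_m / n)
        for k in range(n_fields):
            agree_m = sum(w for w, v in zip(post, vectors) if v[k])
            agree_u = sum(1.0 - w for w, v in zip(post, vectors) if v[k])
            m[k] = clamp(agree_m / total_m)
            u[k] = clamp(agree_u / total_u)
    return p_match, m, u, post


# Toy agreement vectors over three fields (e.g. name, address, phone).
pairs = [(1, 1, 1)] * 3 + [(1, 1, 0)] + [(0, 0, 0)] * 6 + [(0, 1, 0), (1, 0, 0)]
p, m, u, post = fit_match_mixture(pairs)
```

Pairs with a high value in `post` would be linked. No labels are used at any point, which is the sense in which such methods are unsupervised; the paper's contributions (hierarchical latent structure, monotonicity constraints, bootstrapped single-field classifiers) go beyond this baseline.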

