Variational Bayes for Merging Noisy Databases

10/17/2014
by Tamara Broderick, et al.

Bayesian entity resolution merges multiple noisy databases and returns the minimal collection of unique individuals represented, together with their true, latent record values. Bayesian methods allow flexible generative models that share power across databases, as well as principled quantification of uncertainty for queries of the final, resolved database. However, existing Bayesian methods for entity resolution rely on Markov chain Monte Carlo (MCMC) approximations and are too slow to run on modern databases containing millions or billions of records. Instead, we propose applying variational approximations to allow scalable Bayesian inference in these models. We derive a coordinate-ascent approximation for mean-field variational Bayes, qualitatively compare our algorithm to existing methods, note unique challenges for inference that arise from the expected distribution of cluster sizes in entity resolution, and discuss directions for future work in this domain.
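
To make the phrase "coordinate-ascent approximation for mean-field variational Bayes" concrete, here is a minimal sketch of the general technique on a toy model. It is not the entity resolution model from the paper. It assumes a univariate Gaussian with unknown mean and precision under a conjugate Normal-Gamma prior, and the function name `cavi_gaussian` and the hyperparameters `mu0`, `lambda0`, `a0`, `b0` are illustrative choices. Each factor of the approximation q(mu) q(tau) is updated in turn while the other is held fixed.

```python
def cavi_gaussian(xs, mu0=0.0, lambda0=1.0, a0=1.0, b0=1.0, iters=50):
    """Toy coordinate-ascent mean-field VB (illustrative, not the paper's model).

    Assumed model:
        x_i | mu, tau ~ Normal(mu, 1 / tau)
        mu | tau      ~ Normal(mu0, 1 / (lambda0 * tau))
        tau           ~ Gamma(a0, rate=b0)
    Approximation: q(mu, tau) = q(mu) q(tau), with
        q(mu) = Normal(mu_n, 1 / lambda_n) and q(tau) = Gamma(a_n, rate=b_n).
    """
    n = len(xs)
    xbar = sum(xs) / n

    # In this model the mean of q(mu) and the shape of q(tau) are fixed.
    mu_n = (lambda0 * mu0 + n * xbar) / (lambda0 + n)
    a_n = a0 + (n + 1) / 2.0

    e_tau = a0 / b0  # initial guess for E_q[tau]
    b_n = b0
    for _ in range(iters):
        # Update q(mu) given the current E_q[tau].
        lambda_n = (lambda0 + n) * e_tau
        var_mu = 1.0 / lambda_n

        # Update q(tau) given the current q(mu), using
        # E_q[(x - mu)^2] = (x - mu_n)^2 + Var_q[mu].
        data_term = sum((x - mu_n) ** 2 for x in xs) + n * var_mu
        prior_term = lambda0 * ((mu_n - mu0) ** 2 + var_mu)
        b_n = b0 + 0.5 * (data_term + prior_term)
        e_tau = a_n / b_n

    lambda_n = (lambda0 + n) * e_tau
    return mu_n, lambda_n, a_n, b_n


mu_n, lambda_n, a_n, b_n = cavi_gaussian([1.0, 2.0, 3.0, 4.0, 5.0])
print(mu_n, lambda_n, a_n, b_n)
```

Each pass is deterministic and needs only expectations under the other factor, which is the property that makes this style of inference attractive relative to sampling when the data are large.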

research
09/13/2019

d-blink: Distributed End-to-End Bayesian Entity Resolution

Entity resolution (ER) (record linkage or de-duplication) is the process...
research
05/01/2019

Variational Bayesian Inference for Mixed Logit Models with Unobserved Inter- and Intra-Individual Heterogeneity

Variational Bayes (VB) methods have emerged as a fast and computationall...
research
03/23/2023

Variational Bayes latent class approach for EHR-based phenotyping with large real-world data

Bayesian approaches to clinical analyses for the purposes of patient phe...
research
05/26/2019

Variational Bayes: A report on approaches and applications

Deep neural networks have achieved impressive results on a wide variety ...
research
04/14/2021

Measuring diachronic sense change: new models and Monte Carlo methods for Bayesian inference

In a bag-of-words model, the senses of a word with multiple meanings, e....
research
12/25/2017

On Statistical Optimality of Variational Bayes

The article addresses a long-standing open problem on the justification ...
research
03/08/2017

Performance Bounds for Graphical Record Linkage

Record linkage involves merging records in large, noisy databases to rem...
