A Bayesian Approach for De-duplication in the Presence of Relational Data

09/14/2019
by Juan Sosa, et al.

In this paper we study the impact of combining profile and network data in a de-duplication setting. We also assess the influence of a range of prior distributions on the linkage structure, including a prior of our own. Our proposed prior makes it straightforward to specify prior beliefs and naturally enforces the microclustering property. Furthermore, we explore stochastic gradient Hamiltonian Monte Carlo methods as a faster alternative for sampling the network parameters. Our methodology is evaluated on RLdata500, a popular dataset in the record linkage literature.
