Scalable Entity Resolution Using Probabilistic Signatures on Parallel Databases

12/27/2017
by Yuhang Zhang, et al.

Accurate and efficient entity resolution is an open challenge of particular relevance to intelligence organisations that collect large datasets from disparate sources with differing levels of data quality and standardisation. Starting from a first-principles formulation of entity resolution, this paper presents a novel entity resolution algorithm that introduces a data-driven blocking and record-linkage technique based on the probabilistic identification of entity signatures in data. The scalability and accuracy of the proposed algorithm are evaluated on benchmark datasets and shown to achieve state-of-the-art results. The proposed algorithm can be implemented straightforwardly on modern parallel databases, so it can be deployed with relative ease in large industrial applications.
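The abstract does not spell out how signatures are identified, so the following is only a minimal Python sketch of signature-based blocking and linkage under stated assumptions, not the authors' algorithm. It assumes records are whitespace-tokenised strings and that tokens occur independently. Under that assumption, a token set is treated as an entity "signature" when at least two records share it while the expected number of records containing it by chance, N times the product of p(t) (where p(t) is the document frequency of token t divided by N), falls below a threshold. Records sharing a signature are then merged with union-find. The names `find_signatures` and `link` and the parameters `max_size` and `max_expected` are hypothetical.

```python
"""Hypothetical sketch of signature-based blocking and record linkage.

Assumptions (not from the paper): whitespace tokenisation, independent
tokens, and a fixed threshold on the chance co-occurrence count.
"""
from collections import Counter, defaultdict
from itertools import combinations
from math import prod


def tokenize(record: str) -> frozenset[str]:
    # Lower-cased, whitespace-split token set for one record.
    return frozenset(record.lower().split())


def find_signatures(records: list[str], max_size: int = 2,
                    max_expected: float = 0.5) -> dict[frozenset[str], set[int]]:
    """Map each candidate signature to the ids of the records containing it."""
    token_sets = [tokenize(r) for r in records]
    n = len(token_sets)
    doc_freq = Counter(t for ts in token_sets for t in ts)

    # Blocking: group record ids by every token subset of size 1..max_size.
    blocks: dict[frozenset[str], set[int]] = defaultdict(set)
    for rid, ts in enumerate(token_sets):
        for k in range(1, max_size + 1):
            for combo in combinations(sorted(ts), k):
                blocks[frozenset(combo)].add(rid)

    signatures = {}
    for key, rids in blocks.items():
        if len(rids) < 2:
            continue  # a key held by one record cannot link anything
        # Expected number of records containing `key` if tokens were independent.
        expected = n * prod(doc_freq[t] / n for t in key)
        if expected < max_expected:
            signatures[key] = rids
    return signatures


def link(records: list[str], **kwargs) -> list[set[int]]:
    """Cluster records that share a signature (transitively, via union-find)."""
    parent = list(range(len(records)))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for rids in find_signatures(records, **kwargs).values():
        first, *rest = sorted(rids)
        for r in rest:
            parent[find(r)] = find(first)

    clusters: dict[int, set[int]] = defaultdict(set)
    for rid in range(len(records)):
        clusters[find(rid)].add(rid)
    return [c for c in clusters.values() if len(c) > 1]


if __name__ == "__main__":
    records = [
        "john smith 12 baker street", "jon smith 12 baker street",
        "mary jones 4 elm road", "mary jones 4 elm rd",
        "peter brown 7 oak lane", "john white 9 pine road",
        "sarah green 3 hill street", "david black 5 lake road",
        "emma grey 8 park lane", "tom king 2 main street",
    ]
    print(link(records))  # expected clusters: records 0-1 and 2-3
```

In this toy run, records 0 and 1 are linked through rare pairs such as {smith, baker}, while records 0 and 5, which share only the common first name "john", are not. Because the blocking step is a grouping of record ids by token-subset key, it could plausibly be expressed as a GROUP BY in a parallel database, which is in the spirit of the abstract's deployment claim, though the paper's actual formulation may differ.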

research
08/10/2020

(Almost) All of Entity Resolution

Whether the goal is to estimate the number of people that live in a cong...
research
09/10/2015

Performance Bounds for Pairwise Entity Resolution

One significant challenge to scaling entity resolution algorithms to mas...
research
10/03/2022

Estimating the Performance of Entity Resolution Algorithms: Lessons Learned Through PatentsView.org

This paper introduces a novel evaluation methodology for entity resoluti...
research
04/19/2021

Large Scale Record Linkage in the Presence of Missing Data

Record linkage is aimed at the accurate and efficient identification of ...
research
09/13/2019

d-blink: Distributed End-to-End Bayesian Entity Resolution

Entity resolution (ER) (record linkage or de-duplication) is the process...
research
05/15/2019

Schema-agnostic Progressive Entity Resolution (extended version)

Entity Resolution (ER) is the task of finding entity profiles that corre...
research
09/12/2019

Accelerating Column Generation via Flexible Dual Optimal Inequalities with Application to Entity Resolution

In this paper, we introduce a new optimization approach to Entity Resolu...
